Psychology as Science
 

 


 

 

 

 

 

Choosing the correct Statistical Test

 

An important point to consider at the outset, particularly for those amongst you who don’t like sums: you will not be expected to calculate a level of statistical significance.  However, you will need to know when to use a particular test and, having been given an observed value, be able to decide whether it is significant.  This isn’t as complex as it sounds.  It’s simply a matter of looking up the information in a table, though you will need to understand what the table tells you!

When choosing a test there are three things to consider.  Two of these have already been covered in this booklet; the third was covered at AS, so here is a quick reminder.

 

1.  NOIR: What is the level of your data?

Nominal Data: is the simplest thing a number can do.  It can tell us how many things there are!  Basically nominal data is a headcount or a tally.  It doesn’t tell us if something is bigger, brighter or bolder, just how many.  For example, get a show of hands; how many people in the class study English.  Your head count provides nominal data.  If you were replicating Piaget’s research at a primary school you might count the number of five year olds who can successfully complete the three mountains task and compare this to the number of seven year olds.  Nominal data.

Ordinal data: allows us to put things in order.  For example A might be more attractive than B but uglier than C.  We have the order C A  B in terms of attractiveness.  Crucially however, we can’t be sure that the difference between C and A is the same as the difference between A and B.  C and A might both be very attractive whereas B might be a complete minger.  We can’t tell that the intervals are the same.  

Usain Bolt won the men’s 200m at Beijing, Shawn Crawford was second and Walter Dix third.  From this we can’t tell whether the difference between first and second was the same as the difference between second and third.  First, second, third provides ordinal data.

Interval and Ratio: allows us to put things in order (ascending or descending) just as ordinal, however this time we can be sure that the intervals are the same.  We know that the difference between 10cm and 11cm is the same as the difference between 15cm and 16cm.  The same applies to weight or mass, temperature and time. 

 

An odd one to consider is IQ.  The jury is out on this one.  Some psychologists believe it yields interval/ratio data, others that it is merely ordinal.

Generally speaking, if you need a piece of equipment to measure it, then it’s interval or ratio.

For the purposes of statistics interval and ratio are taken as the same.  There is, however, a subtle difference.  Ratio data has a true zero, so there are no minus values, e.g. time, weight, height.  Interval data has no true zero and can be minus, e.g. temperature in degrees Celsius.  As a result you can say that 20cm is twice as long as 10cm, but you cannot say that 20°C is twice as hot as 10°C.
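A quick sketch of why the "twice as hot" claim fails.  It assumes we convert Celsius to kelvin (a temperature scale that does have a true zero); the function name is made up purely for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin, a scale with a true zero."""
    return celsius + 273.15


# Length is ratio data, so dividing one length by another is meaningful.
length_ratio = 20 / 10

# Celsius is interval data, so compare the temperatures in kelvin instead.
temperature_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(length_ratio)                 # 2.0
print(round(temperature_ratio, 3))  # 1.035
```

On a scale with a true zero, 20°C turns out to be only about 3.5% "hotter" than 10°C, nowhere near double.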

 

2. Correlation or difference?

Provided you’ve given careful consideration to your procedure and are confident in what you’re looking for, this should be easy.  Some groups have appeared confused in the past, particularly with issues such as the relationship between attractiveness and punishment.  This could be done either way:

You could produce an ascending scale of attractiveness and compare this to the level of punishment given to each person.  You would predict a negative correlation; as attractiveness increases, level of punishment given decreases. 

Alternatively you could split your photographs into two groups, with the beautiful people in one group and the mingers in the other.  Then compare the level of punishment given to each group.  You are now looking for a difference between the two groups.  The danger is that, having formulated a hypothesis, you don’t stick to it.

Generally however, it should be obvious from your hypothesis what you’re looking for!

 

3.  Repeated or independent measures design

Again obvious since we’ve covered it many times.  If you’re using the same group of participants in both conditions it’s repeated measures.  If the participants in one condition differ from those in the other it’s independent measures.  There are times when the decision is made for you.  Sex differences, age differences, cultural differences… they have to be different participants in each condition.

 

Decision time

Having decided on the above three dimensions, use the chart below to decide which test to use.  You will be expected to know about these four: Chi squared, Wilcoxon’s sign test (more formally the Wilcoxon signed ranks test, not to be confused with the plain sign test used for nominal data), Mann-Whitney ‘U’ and Spearman’s ‘rho.’

  

 

Type of data   | Test of difference:               | Test of difference:                       | Test of correlation
               | repeated measures / matched pairs | independent measures / single participant | (relationship)
Nominal        | Sign test                         | Chi squared                               | Chi squared
Ordinal        | Wilcoxon sign test                | Mann-Whitney ‘U’                          | Spearman ‘rho’
Interval/ratio | Related ‘t’ test                  | Independent (unrelated) ‘t’ test          | Pearson product moment (‘r’)

 

For example, if you have ordinal data with an independent measures design and you’re looking for a difference, you will use Mann-Whitney ‘U.’
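If it helps to see the chart as a lookup, here is a minimal sketch in Python.  The dictionary simply restates the chart; the names `TESTS` and `choose_test` are invented for this example, and treating matched pairs as repeated measures follows the note later in this section.

```python
# The chart as a lookup: (level of data, what you are testing for, design) -> test.
# For correlations the design does not matter, so it is stored as None.
TESTS = {
    ("nominal", "difference", "repeated"): "Sign test",
    ("nominal", "difference", "independent"): "Chi squared",
    ("nominal", "correlation", None): "Chi squared",
    ("ordinal", "difference", "repeated"): "Wilcoxon",
    ("ordinal", "difference", "independent"): "Mann-Whitney U",
    ("ordinal", "correlation", None): "Spearman's rho",
    ("interval/ratio", "difference", "repeated"): "Related t test",
    ("interval/ratio", "difference", "independent"): "Independent t test",
    ("interval/ratio", "correlation", None): "Pearson's r",
}


def choose_test(level, purpose, design=None):
    """Return the test named in the chart for the three decisions."""
    if design == "matched pairs":
        design = "repeated"      # matched pairs is treated as repeated measures
    if purpose == "correlation":
        design = None            # design is not used when testing a correlation
    return TESTS[(level, purpose, design)]


print(choose_test("ordinal", "difference", "independent"))  # Mann-Whitney U
```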

Now a little bit of play acting or imagination.  Let’s pretend you’ve done your experiment, collected your raw data, chosen the correct test to use and made your calculation.  All your numbers will have been put into tables or grids, you’ll have calculated means and added things up, squared and square-rooted, subtracted one group from another and perhaps done some dividing too.  At the end of this you’ll have calculated ONE number.  This number will magically tell you whether your results are meaningful and statistically significant, or whether they’ve more than likely occurred by chance and are little more than a fluke. 

 

Critical and observed values

The number you calculate is your observed value.  This needs to be compared with the critical value in the appropriate table.  Each test has its own table with various critical values depending on the level of significance 5% (0.05), 1% (0.01), 0.5% (0.005) and so on.  The critical value also varies depending on the number of participants or degrees of freedom. 

With Spearman’s rho and chi squared tests the number you calculate needs to be equal to or greater than the critical value for your findings to be significant.

Aide memoire

 ‘Spearman’s rho’ and ‘chi squared’ both contain ‘Rs’ as does the word gReater

‘Mann Whitney U’ and ‘Wilcoxon’s sign’ do not contain R.  With these two tests the observed value needs to be equal to or smaller than the critical value.
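The gReater / smaller rule can be written down as a small check.  This is only a sketch: the function and set names are invented, the numbers in the usage lines are placeholders rather than values read from a real table, and ignoring the minus sign on a negative rho is an assumption about how the table is used.

```python
# Tests where the observed value must be equal to or gReater than the critical value.
GREATER_OR_EQUAL = {"spearman's rho", "chi squared"}
# Tests where the observed value must be equal to or smaller than the critical value.
SMALLER_OR_EQUAL = {"mann-whitney u", "wilcoxon"}


def is_significant(test, observed, critical):
    """Apply the aide memoire to one observed value and one critical value."""
    name = test.lower()
    if name in GREATER_OR_EQUAL:
        # Assumption: the sign of a negative rho is ignored when using the table.
        return abs(observed) >= critical
    if name in SMALLER_OR_EQUAL:
        return observed <= critical
    raise ValueError(f"No rule stored for {test!r}")


# Placeholder numbers, purely to show the direction of each comparison.
print(is_significant("Chi squared", 5.2, 3.84))    # True
print(is_significant("Mann-Whitney U", 12, 8))     # False
```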

 

Type one and type two errors

Type 1

This is believing you have found a significant result when you haven’t.  You reject the null hypothesis when it should be retained.  For example you might set too lenient a level of significance. 

Type 2

You’ve guessed it… this is believing you have found nothing of significance when you have.  This one is particularly annoying for an undergraduate piece of research.  You have accepted the null hypothesis when it should have been rejected.  This could happen if you set yourself too stringent a level of significance, for example 0.1%.

 

 

Chi squared test

Use when you have nominal data with an independent measures design.  Unlike the other tests, chi-squared can be used to test for a correlation or a difference.

For example: Piaget’s three mountains test:

 

               | 5 year olds | 7 year olds | Totals
Successful     | a. 4        | b. 18       | 22
Not successful | c. 16       | d. 2        | 18
Total          | 20          | 20          | 40

 

You would put your raw data into a grid and then calculate the expected frequencies for each cell (a,b,c,d)

You then compare the scores you obtained with what would be expected by chance.  With some appropriate and very repetitive number crunching (especially if you have 20 cells) you calculate your observed value.

The chi squared test uses degrees of freedom, calculated as:

(Number of columns - 1) x (Number of rows - 1)

In this case (2 - 1) x (2 - 1) = 1 x 1 = 1

You compare your observed value with the critical value in the appropriate table for 1 degree of freedom at the 5% level.

Your number needs to be equal to or greater than the critical value.
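For anyone curious about the number crunching, here is a sketch using the three mountains grid above.  It assumes the plain chi squared formula, the sum of (observed - expected) squared divided by expected, with no continuity correction; the variable names are invented.  The critical value of 3.84 is the standard table value for 1 degree of freedom at the 5% level.

```python
# Observed frequencies from the grid: rows are successful / not successful,
# columns are 5 year olds / 7 year olds.
observed = [[4, 18], [16, 2]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for a cell = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi squared = sum over all cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# (Number of columns - 1) x (Number of rows - 1)
degrees_of_freedom = (len(observed[0]) - 1) * (len(observed) - 1)

CRITICAL_VALUE = 3.84  # 1 degree of freedom, 5% level

print(expected)                       # [[11.0, 11.0], [9.0, 9.0]]
print(round(chi_squared, 2))          # 19.8
print(degrees_of_freedom)             # 1
print(chi_squared >= CRITICAL_VALUE)  # True
```

The observed value of about 19.8 is far gReater than 3.84, so on these figures the difference between the two age groups would be significant.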

 

 

If asked to justify a choice of test do so in terms of whether you’re looking for a correlation or a difference, using an independent or repeated measures design and level of data obtained.

For example:  I chose to use Mann Whitney ‘U’ because I was looking for a difference with an independent measures design and would be obtaining data at the ordinal level.

Note: if using matched pairs design treat as repeated measures. 


 

 

Spearman’s Rho

Use when you are looking for an association (for example a correlation) with ordinal level of data.

For example, testing the matching hypothesis which predicts that men and women with similar levels of attractiveness are more likely to get married. 

This time you put your raw data in a table that looks like this:

 

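Couple | Groom | Bride | Rank (groom) | Rank (bride) | Difference between ranks | Difference squared
A      | 4     | 5     |              |              |                          |
B      | 4     | 4     |              |              |                          |
C      | 9     | 8     |              |              |                          |
D      | 2     | 10    |              |              |                          |
E      | 7     | 7     |              |              |                          |
F      | 8     | 8     |              |              |                          |
G      | 3     | 4     |              |              |                          |
H      | 8     | 9     |              |              |                          |
I      | 6     | 6     |              |              |                          |
J      | 4     | 5     |              |              |                          |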

You can complete the rest when we look at ranking a set of data.

Essentially you give each groom a rank dependent on their attractiveness compared to the other grooms and then repeat the process for the brides.  The higher the correlation, the more similar the two sets of ranks (i.e. the more similar their levels of attractiveness).  When you calculate the difference in ranks, the more similar the attractiveness the smaller the differences.  You square the values to get rid of any negative values (remember -2 squared is 4 not -4!).

After a little more jiggery pokery you end up with an observed value… this time always between -1 and +1.

You look it up in the appropriate table.  This time the number of pairs is important.  There is a critical value at 5% that varies depending upon the number of pairs of participants.  Your observed value needs to be gReater than or equal to the critical value.
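The jiggery pokery can be sketched as follows, using the groom and bride ratings from the table above.  Several things here are assumptions rather than part of these notes: the lowest rating is given rank 1, tied ratings share the average of the ranks they would have occupied, and rho is found with the usual shortcut formula 1 - 6Σd² / (n(n² - 1)), which is only approximate when there are ties.  The function name is invented.

```python
def average_ranks(scores):
    """Rank scores in ascending order; tied scores share the mean of their ranks."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1   # 1-based position of the first tie
        count = ordered.count(value)       # how many scores share this value
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[score] for score in scores]


# Attractiveness ratings for couples A to J, taken from the table.
grooms = [4, 4, 9, 2, 7, 8, 3, 8, 6, 4]
brides = [5, 4, 8, 10, 7, 8, 4, 9, 6, 5]

groom_ranks = average_ranks(grooms)
bride_ranks = average_ranks(brides)

# Difference between ranks for each couple, then squared and summed.
differences = [g - b for g, b in zip(groom_ranks, bride_ranks)]
sum_d_squared = sum(d ** 2 for d in differences)

n = len(grooms)  # number of pairs
rho = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(groom_ranks)      # [4.0, 4.0, 10.0, 1.0, 7.0, 8.5, 2.0, 8.5, 6.0, 4.0]
print(sum_d_squared)    # 97.5
print(round(rho, 3))    # 0.409
```

Couple D (groom 2, bride 10) accounts for 81 of the 97.5, which shows how one badly mismatched pair can drag the correlation down.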

 

 

 

Mann Whitney ‘U’ Test

Use when you are looking for a difference with ordinal data and an independent measures design. 

For example you might want to test the hypothesis that boys and girls take different subjects at A-level, boys preferring spatial and mathematical, girls preferring subjects that are more verbal.

To do this you allocate a score for each A-level subject, for example allocating spatial and mathematical subjects a low score: physics and maths (1), chemistry (2) etc., and verbal subjects a high score: English, French, German (10), politics and history (9) and so on.

You put your raw data in a table that looks like this:

 

Boys scores | Girls scores | Rank (boys) | Rank (girls)
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            |              |             |
            | Σ            |             |

Unlike the correlational test (Spearman’s), the boys and girls can go in any order… this is independent measures so there are no pairs as such.  Also, unlike Spearman’s, the number of boys’ and girls’ scores can be different.  You could have 10 boys and 12 girls for example.

This time you rank all the scores together… place all the boys AND girls scores in ascending order and calculate a rank.  For the calculation you only need to add up one set of ranks, in this case the boys.  Then following some other number crunching you end up with TWO values.  The smaller value is called ‘U’ and the larger value ‘U’’ (pronounced U prime). 

You check U (smaller number) against the critical value for the number of participants in each column.  This time the observed value needs to be equal to or smaller than the critical value. 
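Here is a rough sketch of that number crunching.  The scores are entirely made up for illustration (they are not real data), the function name is invented, and it assumes the usual formula U = sum of ranks - n(n + 1)/2 for the group whose ranks you added up, with tied scores sharing an average rank.

```python
def average_ranks(scores):
    """Rank scores in ascending order; tied scores share the mean of their ranks."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1   # 1-based position of the first tie
        count = ordered.count(value)       # how many scores share this value
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[score] for score in scores]


# Hypothetical subject scores (low = spatial/mathematical, high = verbal).
boys = [1, 2, 1, 5, 3]
girls = [9, 10, 4, 8, 10, 6]

# Rank ALL the scores together, then add up the ranks for the boys only.
ranks = average_ranks(boys + girls)
boys_rank_sum = sum(ranks[:len(boys)])

n_boys, n_girls = len(boys), len(girls)
u_boys = boys_rank_sum - n_boys * (n_boys + 1) / 2
u_girls = n_boys * n_girls - u_boys   # the two values always add up to n_boys x n_girls

u = min(u_boys, u_girls)         # U: the smaller value, checked against the table
u_prime = max(u_boys, u_girls)   # U prime: the larger value

print(boys_rank_sum)  # 16.0
print(u, u_prime)     # 1.0 29.0
```

Note that the two groups are different sizes (5 boys, 6 girls), which is fine for this test.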

 

 

 

 

Wilcoxon’s sign test

Use when you are looking for a difference, with a repeated measures design and ordinal data.

For example, investigating the Mozart effect.  This is the idea that listening to the music of Wolfgang Amadeus Mozart (he was baptised Johannes Chrysostomus Wolfgangus Theophilus Mozart, but I digress) will improve all manner of cognitive functions.  This could be tested using a repeated measures design.  Day 1 you get your participants to complete a memory task whilst listening to a popular contemporary instrumental track.  Day 2 they return and complete a similar task listening to Mozart.

Obviously a better design here would be to use counterbalancing, or ABBA if you prefer, so that order effects are spread across both conditions.

Raw data would go on a table like this:

 

Participant | Mozart | Non-Mozart | Difference | Rank
A           |        |            |            |
B           |        |            |            |
C           |        |            |            |
D           |        |            |            |
E           |        |            |            |
F           |        |            |            |
G           |        |            |            |
H           |        |            |            |
I           |        |            |            |
J           |        |            |            |

Any differences of ‘0’ are ignored and the remaining differences are ranked by size, ignoring the sign.  The ranks of the positive differences are added up, and then the ranks of the negative differences.  The smaller of the two totals is taken and then it’s a very quick job to look up the value in an appropriate table for the appropriate number of participants (in the above case 10, less any participants with a difference of zero).  The simplest of all inferential tests to calculate.

Wilcoxon’s sign test contains no letter ‘R’ so this time the observed value needs to be equal to or smaller than the critical value found in the table.
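A sketch of the whole calculation is given below.  The memory scores are invented for illustration only (they are not results from any study), the function name is made up, and tied differences are assumed to share an average rank.

```python
def average_ranks(scores):
    """Rank scores in ascending order; tied scores share the mean of their ranks."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1   # 1-based position of the first tie
        count = ordered.count(value)       # how many scores share this value
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[score] for score in scores]


# Hypothetical memory scores for participants A to J.
mozart = [12, 15, 9, 14, 10, 13, 11, 16, 12, 10]
non_mozart = [10, 15, 11, 9, 9, 10, 12, 12, 6, 3]

# Difference for each participant; differences of zero are ignored.
differences = [m - n for m, n in zip(mozart, non_mozart)]
non_zero = [d for d in differences if d != 0]

# Rank the differences by size, ignoring whether they are plus or minus.
ranks = average_ranks([abs(d) for d in non_zero])

positive_sum = sum(r for d, r in zip(non_zero, ranks) if d > 0)
negative_sum = sum(r for d, r in zip(non_zero, ranks) if d < 0)

observed = min(positive_sum, negative_sum)  # the smaller total is the observed value
n = len(non_zero)                           # participants left after dropping zeros

print(differences)                 # [2, 0, -2, 5, 1, 3, -1, 4, 6, 7]
print(positive_sum, negative_sum)  # 40.0 5.0
print(observed, n)                 # 5.0 9
```

Participant B scored the same in both conditions, so the table would be consulted for 9 participants rather than 10.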