S3: Chapter 5 Regression and Correlation
Dr J Frost ([email protected])
www.drfrostmaths.com
Last modified: 30th August 2015

What this chapter is mostly about

11+ NVR Score    Avg AS point score
119              287
103              265
110              137
37               300

* Disclaimer: These values are made up!

One question that naturally arises amongst teachers at Tiffin is whether the 11+ tests are an effective predictor of academic success later on. Tiffin has recently dropped its Non-Verbal/Verbal 11+ tests in favour of English/Maths tests. It will be interesting to see to what extent the correlation between 11+ scores and later metrics (e.g. average A2 point scores) increases.

We could just calculate the PMCC of the two variables, but we might just be interested in comparing the rankings.

Tiffin Data Fun Facts

Fun True Fact: The PMCC between the 11+ ranks of students in the current L6 and their last JMC score ranks is 0.280.
Fun True Fact: The PMCC between the NVR ranks and C1 test ranks (when taken in Year 11) is 0.101. (For Year 11s in 2010.)
Fun True Fact: The PMCC between Year 7 end-of-year test rank and Year 9 test rank is 0.502. (For Year 11s in 2010.)
Fun True Fact: The PMCC between Year 8 end-of-year test rank and Year 9 test rank is 0.794. (For Year 11s in 2010.)
Fun True Fact: The PMCC between:
NVR + Year 9 test rank: 0.12
VR + Year 9 test rank: 0.16
(For Year 9s in 2009.)

                  NVR%    VR%
All Tiffinians    84%     84%
Oxbridge Tiffs    89%     85%
(Year 7s in 2007.)

RECAP: Product Moment Correlation Coefficient

11+ NVR Score    Avg AS point score
119              287
103              265
110              137
37               300

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

Spearman's rank correlation coefficient

However, if we're simply interested in how the rankings are correlated, we might discard the original data and use the rankings instead.

11+ NVR Score    Rank    Avg AS point score    Rank
119              1       287                   2
103              3       265                   3
110              2       137                   4
37               4       300                   1

Using your STATS mode on the ranks: $r_s = -0.4$

! Spearman's rank correlation coefficient $r_s$ is the PMCC calculated after the data has been converted to rankings.

Interpreting $r_s$

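$r_s = 1$: Rankings in perfect agreement.
$r_s = -1$: Ranks in reverse order.
$r_s = 0$: No correlation in rankings.

Calculating $r_s$ more easily

! If no tied ranks: $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$ where $d$ is the difference between each pair of ranks. (If tied ranks, calculate the normal PMCC on the ranked data.)

11+ rank    AS rank    $d$    $d^2$
1           2          -1     1
3           3          0      0
2           4          -2     4
4           1          3      9

$\sum d^2 = 14$, so $r_s = 1 - \frac{6 \times 14}{4(4^2 - 1)} = -0.4$

Proof of $r_s$ and PMCC equivalence (not in textbook/exam)

Since each set of ranks consists of the values 1 to $n$, their sums and sums of squares are known, and the PMCC of the ranks reduces to the formula above.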
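The individual steps of the equivalence proof did not survive on the slide, so the following is one possible way to fill them in, not necessarily the route the slide took. It assumes no tied ranks and writes $x$ and $y$ for the two lists of ranks (these symbols are an assumption), with $d = x - y$.

```latex
\begin{align*}
\sum x = \sum y &= \frac{n(n+1)}{2},
  \qquad \sum x^2 = \sum y^2 = \frac{n(n+1)(2n+1)}{6} \\
S_{xx} = S_{yy} &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}
  = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4}
  = \frac{n(n^2-1)}{12} \\
\sum d^2 &= \sum (x-y)^2 = \sum x^2 + \sum y^2 - 2\sum xy
  \;\Rightarrow\; \sum xy = \sum x^2 - \tfrac{1}{2}\sum d^2 \\
S_{xy} &= \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}
  = S_{xx} - \tfrac{1}{2}\sum d^2 \\
r_s &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
  = 1 - \frac{\sum d^2}{2 S_{xx}}
  = 1 - \frac{6\sum d^2}{n(n^2-1)}
\end{align*}
```

The key observation is that $S_{xx}$ and $S_{yy}$ depend only on $n$, so the only thing that varies from sample to sample is $\sum d^2$.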

Test Your Understanding: Edexcel S3 June 2011 Q2

Bro Exam Tip: Use your calculator's STATS mode to calculate $r$ on your ranked data and check your answer. Guaranteed full marks every time!
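The same check can be mirrored in code. This is a minimal sketch using the (made-up) 11+ NVR and Avg AS data from earlier in the chapter; the helper names `ranks_descending`, `pmcc` and `spearman_from_differences` are hypothetical, and the ranking helper assumes there are no tied values.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def pmcc(xs, ys):
    """Product moment correlation coefficient via Sxx, Syy and Sxy."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / (sxx * syy) ** 0.5


def spearman_from_differences(rank_x, rank_y):
    """1 - 6*sum(d^2) / (n(n^2 - 1)); only valid when there are no ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


nvr = [119, 103, 110, 37]
avg_as = [287, 265, 137, 300]

rank_nvr = ranks_descending(nvr)
rank_as = ranks_descending(avg_as)

r_s_via_pmcc = pmcc(rank_nvr, rank_as)            # PMCC on the ranked data
r_s_via_formula = spearman_from_differences(rank_nvr, rank_as)

print(rank_nvr, rank_as)
print(round(r_s_via_pmcc, 4), round(r_s_via_formula, 4))
```

Both routes should agree whenever there are no tied ranks; if ties are present, only the PMCC-on-ranks route applies.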

Differences between $r$ and $r_s$ (Bro Exam Tip: This can be tested!)

PMCC ($r$) is calculated on the original data; Spearman's rank ($r_s$) is calculated on the ranked data.
Spearman's Rank: Makes no assumption about the original data: the original data need not be linear.
PMCC: We can only do a hypothesis test if the variables are (jointly) normally distributed. (We'll do hypothesis testing in a sec.)

Exercise 5A

Hypothesis Testing

What do you think would be a suitable null hypothesis when analysing the correlation of two variables?
The null hypothesis in general is when the data is random, i.e. in this case, that there is no linear correlation between them.

Now suppose the two variables were each normally distributed.

[Demo: scatter plot of English mark against Maths mark. File Ref: PMCC_Correlation_Model]

Questions from Demo

Given the points were randomly generated, what do we expect the correlation to be?
0: if the data was randomly generated and the variables were independent, there's no inherent connection between them.

Is it possible that for some randomly generated independent data, the correlation may be high?
Yes, just by chance they could show either positive or negative correlation.

! $\rho$ (Greek letter rho) is a population parameter which is the actual correlation between the two variables.
! $r$ is the observed correlation from a sample. This varies across samples.

We saw in the demo that when $\rho = 0$, $r$ jumped around symmetrically about 0. This forms an (incredibly complicated) sampling distribution for $r$.

[Figure: sampling distribution of $r$ on an axis from -1 to 1, centred on 0, with the upper 5% tail marked]

We might be interested in knowing the critical value above which the probability is 5%, i.e. the point at which any correlation seen is considered to be significant (we're assuming any correlation there is, is by chance).
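The idea of a sampling distribution for $r$ under $\rho = 0$ can be sketched by simulation. This is an illustration under stated assumptions rather than a reproduction of the demo: the sample size `n=10`, the number of trials, the seed and the helper names `pmcc` and `simulate_r` are all hypothetical choices, and the two variables are drawn as independent standard normals.

```python
import random


def pmcc(xs, ys):
    """Product moment correlation coefficient via Sxx, Syy and Sxy."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / (sxx * syy) ** 0.5


def simulate_r(n, trials, seed=0):
    """Observed r for many samples of n independent normal pairs (rho = 0)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        results.append(pmcc(xs, ys))
    return results


rs = sorted(simulate_r(n=10, trials=20000))
mean_r = sum(rs) / len(rs)

# Empirical one-tailed 5% critical value: 95% of simulated r values lie below it.
critical_5pct = rs[int(0.95 * len(rs))]

print(round(mean_r, 3), round(critical_5pct, 3))
```

The mean of the simulated values should sit close to 0, and the estimated critical value should land near the one-tailed 5% entry in the formula booklet table for the chosen sample size, up to simulation noise.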

Correlation Coefficient Table

The table is in the formula booklet (note $\rho = 0$, i.e. we're always assuming no correlation in S3).

In our demo our sample size was … and … Determine:
The critical region at which we have a significant positive correlation (significance level 5%).
The critical region at which we have a significant correlation (significance level 5%).
The critical region at which we have a significant correlation (significance level 1%).

Example Hypothesis Test

The product-moment correlation coefficient between 30 pairs of reactions is $r = $ …. Using a 0.05 significance level, test whether or not $\rho$ differs from 0.

Null/Alternative Hypotheses? $H_0: \rho = 0$, $H_1: \rho \neq 0$
Critical Region? $r < $ … and $r > $ …, therefore the value of $r$ is significant. Reject $H_0$ and accept $H_1$.
Conclusion? There is evidence of some correlation.

Test Your Understanding

The table shows the BMI (Body Mass Index) of a number of people along with their age.

Age    26    30    31      50    42
BMI    18    21    20.5    24    17

a) What assumption are we making about the data in order to carry out a hypothesis test on the Product Moment Correlation Coefficient?
b) Carry out a suitable hypothesis test at the 5% level that age and BMI are correlated.
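The test statistic for part (b) is the PMCC of the five (age, BMI) pairs. A minimal sketch of that calculation follows; the helper name `pmcc` is hypothetical, and the comparison against the critical value still has to be made using the formula booklet table for $n = 5$.

```python
def pmcc(xs, ys):
    """Product moment correlation coefficient via Sxx, Syy and Sxy."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / (sxx * syy) ** 0.5


age = [26, 30, 31, 50, 42]
bmi = [18, 21, 20.5, 24, 17]

r = pmcc(age, bmi)
print(round(r, 3))
```

With only five data points the critical value is large, so a moderate-looking positive $r$ can still fail to be significant.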

a) That age and BMI are normally distributed.
b) $r = 0.455$ (3 s.f.), which is not in the critical region, therefore do not reject $H_0$. There is insufficient evidence for correlation between age and BMI.

Hypothesis Testing with Spearman's Rank

Why do you think we might use a different table for hypothesis testing with Spearman's rank?

For the PMCC, the distribution of $r$ was produced by repeatedly sampling from two (jointly) normally distributed variables and taking the PMCC each time, i.e. the variables are assumed to be normally distributed. But with data for which we're calculating $r_s$, the data in each variable is always 1 to $n$!

Sampling distribution of $r_s$ when the sample size is say 2 or 3:

$n = 2$: Suppose we ordered the first set of data so that its ranks are 1, 2.
$n = 3$: Suppose we ordered the first set of data so that its ranks are 1, 2, 3.

In each case we can list the possible samples for the second set of ranks, along with the possible values of $r_s$.

Test Your Understanding: Edexcel S3 June 2011 Q2 (we found this earlier)
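The small-sample sampling distributions of $r_s$ for $n = 2$ and $n = 3$ can be enumerated directly, since the second set of ranks must be some permutation of 1 to $n$. The sketch below assumes every permutation is equally likely (i.e. no real correlation) and no tied ranks; `spearman` and `rs_distribution` are hypothetical helper names, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def spearman(rank_x, rank_y):
    """Exact r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))


def rs_distribution(n):
    """Map each possible r_s to its probability when all orderings are equally likely."""
    first = tuple(range(1, n + 1))  # first set of ranks fixed as 1, 2, ..., n
    counts = Counter(spearman(first, perm) for perm in permutations(first))
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


for n in (2, 3):
    dist = rs_distribution(n)
    print(n, {str(value): str(prob) for value, prob in dist.items()})
```

Because $r_s$ can only take a handful of discrete values for small $n$, its sampling distribution is very different from that of $r$, which is why a separate table of critical values is needed.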

YOU HAVE REACHED THE END OF MATHS. WELL DONE.