Chapter 4 Review: More About Relationships Between Two Variables
Group Members: Qianya Meng, Nikta Kheiri, Min Kim
1st period
12/14/11

The Big Idea
Transform the graph to achieve linearity.
Transform exponential functions to achieve linearity, then find an equation from the transformed data to use for extrapolation.
Transform power functions to achieve linearity, then find an equation from the transformed data to use for extrapolation.
Learn to use marginal and conditional distributions.
Recognize relationships between two variables.
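The two transformations named in the Big Idea can be sketched in a few lines of Python. This is only an illustration, not part of the chapter: the data are hypothetical (generated exactly from y = 3(2^x) and y = 5x^2), the function names are made up for this sketch, and base-10 logs are an arbitrary choice (natural logs work the same way).

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x, then undoing the log."""
    intercept, slope = least_squares(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope      # a, b

def fit_power(xs, ys):
    """Fit y = a * x**p by regressing log10(y) on log10(x)."""
    log_xs = [math.log10(x) for x in xs]
    log_ys = [math.log10(y) for y in ys]
    intercept, slope = least_squares(log_xs, log_ys)
    return 10 ** intercept, slope            # a, p

# Hypothetical data, chosen so the true models are known in advance.
xs = [1, 2, 3, 4, 5]
exp_ys = [3 * 2 ** x for x in xs]   # exponential: y = 3 * 2^x
pow_ys = [5 * x ** 2 for x in xs]   # power law:   y = 5 * x^2

a_exp, b_exp = fit_exponential(xs, exp_ys)
a_pow, p_pow = fit_power(xs, pow_ys)
print(f"exponential fit: y = {a_exp:.3f} * {b_exp:.3f}^x")
print(f"power fit:       y = {a_pow:.3f} * x^{p_pow:.3f}")
```

Because the hypothetical points lie exactly on each curve, the transformed data are exactly linear and the fits recover the original constants up to rounding. With real data, a residual plot of the transformed fit is still needed to judge the model.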

Vocabulary You Need to Know
Transforming (re-expressing) data: applying a function such as the logarithm or square root to a quantitative variable.
Log rules:
1) log_b(mn) = log_b(m) + log_b(n)
2) log_b(m/n) = log_b(m) - log_b(n)
3) log_b(m^n) = n log_b(m)
Linear growth: increases by a fixed amount in each equal time period.
Exponential growth model: log y = log a + (log b)x, so predicted y = ab^x
Power law model: log y = log a + p log x, so predicted y = ax^p
Two-way table: describes two categorical variables.
Marginal distributions: the distributions of the row variable alone and of the column variable alone, given by the row and column totals.
Conditional distributions: the distribution of the column variable given a value of the row variable, or of the row variable given a value of the column variable.
Simpson's paradox: an association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.
Causation: changes in x cause changes in y.
Common response: changes in both x and y are caused by changes in a lurking variable z.
Confounding: the effect (if any) of x on y is confounded with the effect of a lurking variable z.

Key Topics Covered in this Chapter
Modeling nonlinear data
Relations in categorical data
Establishing causation

Formulas You Should Know
Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
Power law model: log y = log a + p log x; predicted y = ax^p

Calculator Key Strokes
Exponential growth modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the common (or natural) logarithm of L2, then make a scatterplot of L3 versus L1.
4) Perform the least-squares regression on the transformed data and draw the line on the scatterplot.
5) Plot the residuals versus L1.
6) With the regression equation in Y1, define Y2 = 10^(Y1) if you used common logs, or Y2 = e^(Y1) if you used natural logs.

Power law modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the common (or natural) logarithm of L1 and L4 as the common (or natural) logarithm of L2.
4) Plot L4 versus L3.
5) Calculate the regression equation for the transformed data and store it in Y1.
6) Construct a residual plot.
7) Define Y2 as (10^a)(x^b) if you used common logs, or (e^a)(x^b) if you used natural logs, where a and b are the intercept and slope of the regression line in Y1.
8) Plot Y2 and the scatterplot of the original data together.
9) To make a prediction for the value x = k, evaluate Y2(k) on the home screen.

Helpful Hints
When the explanatory variable is years, transform the data to "years since" a base year so that the values are smaller and don't create overflow problems when you perform the inverse transformation.
If there is a clear explanatory/response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable.
Even when direct causation is present, it is rarely a complete explanation of an association between two variables.

Q1

Some college students collected data on the intensity of light at various depths in a lake. Here are their data:

Depth (m)   Light intensity (lumens)
5           168.00
6           120.42
7           86.31
8           61.87
9           44.34
10          31.78
11          22.78

a) Make a scatterplot suitable for predicting light intensity from depth. Describe the form of the relationship.
b) To verify that the decrease in light intensity follows an exponential model, calculate the ratio of light intensity at consecutive depths. Start with 120.42/168.00 = 0.717. What do you conclude?
c) Take the natural logarithm (ln) of the light intensity measurements and plot these values against the corresponding depths. Does this transformation achieve linearity?
d) Calculate the least-squares regression equation for the transformed data. Interpret the slope and y intercept of this equation in this setting.
e) Construct and interpret a residual plot.
f) Perform the inverse transformation to express light intensity as an exponential function of depth in the lake. Display a scatterplot of the original data with the exponential model superimposed. Is your exponential function a satisfactory model for the data?
g) Use your model to predict the light intensity at a depth of 22 meters. The actual light intensity reading at that depth was 0.58 lumens. Does this surprise you?

Answer Q1
A) The relationship is strong, negative, and curved.
B) The ratios are all 0.717, so an exponential model is appropriate.
C) Yes, it achieves linearity.
D) If x = depth and y = ln(light intensity), then predicted y = 6.7891 - 0.3330x. The intercept, 6.7891, provides an estimate for the average value of the natural log of the light intensity at the surface of the lake (depth 0). The slope, -0.3330, means that the natural log of the light intensity decreases on average by 0.3330 for each one-meter increase in depth.
E) The residual plot shows a fairly random scatter and relatively small residuals, so the linear model is appropriate for the transformed data.
F) If x = depth and y = light intensity, predicted y = (e^6.789)(e^(-0.333x)). It is a satisfactory model.
G) At 22 m, the predicted light intensity would be 0.584 lumens. No, this is not surprising: the prediction is very close to the actual reading.

Q2
Some high school physics students dropped a ball and measured its height at various points along its descent. Table 4.3 shows the time since release and the distance the ball had fallen.

Table 4.3
Time (s)   Distance (cm)
0.16       12.1
0.24       29.8
0.25       32.7
0.30       42.8
0.30       44.2
0.32       55.8
0.36       63.5
0.36       65.1
0.50       124.6
0.50       129.7
0.57       150.2
0.61       182.2
0.61       189.4
0.68       220.4
0.72       254.0
0.72       261.0
0.83       334.6
0.88       375.5
0.89       399.1

a) Make a scatterplot suitable for predicting distance fallen from time since release. Describe the direction, form, and strength of the relationship.
b) Perform an appropriate transformation to achieve linearity. Then find a least-squares regression model for the transformed data.
c) Comment on the quality of your model in (b) by referring to a residual plot and r^2.
d) Make a scatterplot of the points (time, square root of distance) to see if this transformation works. Then find a least-squares regression model for the transformed data.
e) Comment on the quality of your model in (d) by referring to a residual plot and r^2.
f) Use the two models you obtained in (b) and (d) to predict the distance that the object had fallen after 0.47 seconds. Which prediction do you think is closer to the actual value? Why?

Answer Q2
(a) The relationship is curved, strong, and positive.
(b) If x = time and y = distance, predicted y = 0.99 + 490.416x^2.
(c) r^2 = 0.9984 and the residual plot shows random scatter and fairly small residuals, so this looks like an appropriate model.
(d) Yes. Predicted square root of y = 0.1046 + 22.0428x.
(e) r^2 = 0.9986 and the residual plot shows no pattern, which suggests a good model.
(f) Using the model from (b): 109.32 cm. Using the model from (d): 109.51 cm.

Q3
Here are data from eight schools on smoking among students and among their parents.

                         Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke   1168                    1823                1380
Student smokes           188                     416                 400

a) How many students are described in the two-way table?
b) What percent of these students smoke?
c) Give the marginal distribution of parents' smoking behavior, both in counts and in percents.
d) Calculate three conditional distributions of students' smoking behavior: one for each of the three parental smoking categories. Describe the relationship between the smoking behaviors of students and their parents in a few sentences.

Answer Q3
A) 5375 students.
B) 18.7%.
C) Both parents smoke: 1780, 33.1%. One parent smokes: 2239, 41.7%. Neither parent smokes: 1356, 25.2%.
D) Student smokes, given both parents smoke: 400/(400+1380) = .2247.
Student doesn't smoke, given both parents smoke: 1380/(400+1380) = .7753.
Student smokes, given one parent smokes: 416/(416+1823) = .1858.
Student doesn't smoke, given one parent smokes: 1823/(416+1823) = .8142.
Student smokes, given neither parent smokes: 188/(188+1168) = .1386.
Student doesn't smoke, given neither parent smokes: 1168/(188+1168) = .8614.
Students who smoke are most likely to come from families where one or more of their parents smoke.

Q4
Whether a convicted murderer gets the death penalty seems to be influenced by the race of the victim. Here are data on 326 cases in which the defendant was convicted of murder:

            White defendant               Black defendant
            White victim   Black victim   White victim   Black victim
Death       19             0              11             6
Not         132            9              52             97

a) Use these data to make a two-way table of defendant's race vs. death penalty.
b) Show that Simpson's paradox holds: a higher percent of white defendants are sentenced to death overall, but for both black and white victims a higher percent of black defendants are sentenced to death.
c) Use the data to explain why the paradox holds in language that a judge could understand.

Answer Q4
A) White defendant: 19 yes, 141 no. Black defendant: 17 yes, 149 no.
B) Overall death penalty: 11.9% of white defendants, 10.2% of black defendants. For white victims, 12.6% of white defendants and 17.5% of black defendants; for black victims, 0% of white defendants and 5.8% of black defendants.
C) The death penalty is more likely when the victim was white (14%) rather than black (5.4%). Because most convicted killers are of the same race as their victims, whites are more often sentenced to death.

Q5
A study showed that women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed that exposure to chemicals used in production causes the miscarriages. Another possible explanation is that these workers spend most of their time standing up. Can we conclude that exposure to chemicals causes more miscarriages? Why or why not?

Answer Q5
No. The number of hours spent standing up at work is a confounding variable.

Q6
A study finds that high school students who take the SAT, enroll in an SAT coaching course, and then take the SAT a second time raise their SAT mathematics scores from a mean of 521 to a mean of 561. What factors other than taking the course might explain this improvement?

Answer Q6
Knowledge gained as a result of taking the SAT previously is a confounding variable.
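As a check on the percentages in Answer Q4, the Simpson's paradox reversal can be reproduced with a short Python sketch. The counts are the ones given in the Q4 table; the dictionary layout and the `death_rate` helper are just one hypothetical way to organize the calculation.

```python
# Counts from the Q4 table:
# (defendant race, victim race) -> (sentenced to death, not sentenced to death)
counts = {
    ("white", "white"): (19, 132),
    ("white", "black"): (0, 9),
    ("black", "white"): (11, 52),
    ("black", "black"): (6, 97),
}

def death_rate(defendant, victim=None):
    """Proportion sentenced to death for one defendant race,
    optionally restricted to cases with one victim race."""
    cells = [
        cell for (d, v), cell in counts.items()
        if d == defendant and (victim is None or v == victim)
    ]
    death = sum(yes for yes, _ in cells)
    total = sum(yes + no for yes, no in cells)
    return death / total

# Combined over victims: the marginal comparison by defendant race.
print(f"overall: white {death_rate('white'):.1%}, "
      f"black {death_rate('black'):.1%}")

# Within each victim race: the conditional comparisons.
for victim in ("white", "black"):
    w = death_rate("white", victim)
    b = death_rate("black", victim)
    print(f"{victim} victim: white defendant {w:.1%}, black defendant {b:.1%}")
```

The overall rate is higher for white defendants, while within each victim group the rate is higher for black defendants. The victim's race acts as the lurking variable that drives the reversal.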