Regression

AP Statistics guide to linear regression: least-squares regression, correlation, residuals, r-squared, inference for slope, and transformation of data.

# Regression — AP Statistics

Linear regression models the relationship between two quantitative variables. AP Statistics covers the least-squares regression line, correlation, residual analysis, and inference for the slope.

Key Concepts

Scatterplots

Describe a scatterplot by its direction (positive/negative), form (linear/nonlinear), strength (weak/moderate/strong), and any unusual features (such as outliers), in the context of the data.

Correlation ($r$)

$$r = \frac{1}{n-1}\sum\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

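  • $-1 \leq r \leq 1$; the sign gives the direction, and values near $\pm 1$ indicate a strong linear association.
  • Measures linear association only; a strong curved relationship can still have $r$ near 0.
  • $r$ is unitless and is unaffected by changes in the units of measurement.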
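The correlation formula is a sum of products of z-scores divided by $n - 1$. Below is a minimal Python sketch of it. The data and the function name `correlation_from_z` are hypothetical and only for illustration. The sketch also rescales $x$ to show that $r$ does not change when units change.

```python
from statistics import mean, stdev

def correlation_from_z(x, y):
    """r = (1/(n-1)) * sum of (z-score of x) * (z-score of y), using sample SDs."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation_from_z(x, y)
print(round(r, 4))  # 0.7746

# Converting x to different units (multiplying by 2.54) leaves r unchanged
x_rescaled = [2.54 * xi for xi in x]
print(round(correlation_from_z(x_rescaled, y), 4))  # 0.7746
```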

Least-Squares Regression Line (LSRL)

$\hat{y} = a + bx$, where $b = r \cdot \frac{s_y}{s_x}$. The line passes through $(\bar{x}, \bar{y})$, so the intercept is $a = \bar{y} - b\bar{x}$.
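Here is a short Python sketch of the LSRL formulas, run on hypothetical data. It computes the slope as $b = r \cdot s_y / s_x$ and gets the intercept by requiring the line to pass through $(\bar{x}, \bar{y})$. The function name `lsrl` is illustrative, not a library API.

```python
from statistics import mean, stdev

def lsrl(x, y):
    """Return (a, b) for y-hat = a + b*x, with b = r * s_y / s_x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar  # forces the line through (x-bar, y-bar)
    return a, b

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```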

Coefficient of Determination ($r^2$)

$r^2$ is the proportion of the variability in $y$ that is explained by the linear relationship with $x$.
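One way to read "proportion of variability explained" is $r^2 = 1 - \frac{SSE}{SST}$. In that expression, $SSE$ is the sum of squared residuals and $SST$ is the total sum of squares of $y$ about $\bar{y}$. The minimal Python check below uses hypothetical data. It confirms that this ratio equals the square of the correlation for a least-squares line.

```python
from statistics import mean

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
b = s_xy / s_xx          # least-squares slope
a = y_bar - b * x_bar    # least-squares intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left unexplained
sst = s_yy                                                   # total variation of y about y-bar
r = s_xy / (s_xx * s_yy) ** 0.5

print(round(1 - sse / sst, 4), round(r ** 2, 4))  # both 0.6
```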

Residuals

$$\text{residual} = y - \hat{y}$$

  • Residual plot should show random scatter (no pattern).
  • Patterns indicate the model is not appropriate.
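The Python sketch below computes residuals (observed minus predicted) for a hypothetical data set. A residual plot graphs these values against $x$. The line used here, $\hat{y} = 2.2 + 0.6x$, is the least-squares line for this data: $b = r \cdot s_y/s_x = 0.6$ and $a = \bar{y} - b\bar{x} = 2.2$. The residuals from a least-squares line always sum to zero, which the last line checks.

```python
# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # LSRL for this data: b = r * s_y / s_x, a = y-bar - b * x-bar

# residual = observed y - predicted y-hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for xi, e in zip(x, residuals):
    print(f"x={xi}  residual={e:+.1f}")

# Residuals from a least-squares line sum to (essentially) zero
print("sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
```

To check the fit, look at the signs of the residuals in order of $x$. A random mix of signs is what you want. A run of negatives, then positives, then negatives (or the reverse) is the U-shape that signals curvature.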

Influential Points

  • Outlier: large residual.
  • High leverage: extreme $x$-value.
  • Influential: removing it substantially changes the regression line.
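To see influence, compute the slope with and without a suspect point. The data below are made up so that the last point has an extreme $x$-value (high leverage) and lies off the trend. The helper `slope` is hypothetical. It uses $b = S_{xy}/S_{xx}$, which is algebraically equal to $r \cdot s_y/s_x$.

```python
def slope(x, y):
    """Least-squares slope b = S_xy / S_xx (equal to r * s_y / s_x)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical data: the last point (15, 2) has an extreme x-value
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 5, 4, 5, 2]

b_with = slope(x, y)
b_without = slope(x[:-1], y[:-1])  # drop the high-leverage point
print(f"slope with point: {b_with:.3f}, without: {b_without:.3f}")
```

Removing this single point changes the slope from negative to positive, so the point is influential.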

Inference for Slope

$$t = \frac{b - 0}{SE_b}, \quad df = n - 2$$

Confidence interval for $\beta$: $b \pm t^* \cdot SE_b$, where $t^*$ comes from the $t$ distribution with $df = n - 2$.

$H_0: \beta = 0$ (no linear relationship between $x$ and $y$).

Conditions (LINE):

  1. Linear relationship (check residual plot).
  2. Independence of observations.
  3. Normal distribution of residuals.
  4. Equal variance (constant spread in residual plot).
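Software normally reports $SE_b$. As a sketch of where it comes from, the Python function below uses $SE_b = \frac{s}{s_x\sqrt{n-1}}$, where $s = \sqrt{SSE/(n-2)}$ is the standard deviation of the residuals. The function name `slope_inference` and the data are hypothetical. The value $t^* = 3.182$ is a t-table lookup (95% confidence, $df = 3$), not something the code computes.

```python
import math

def slope_inference(x, y):
    """Return b, SE_b, t, and df for testing H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # standard deviation of the residuals
    se_b = s / math.sqrt(s_xx)     # same as s / (s_x * sqrt(n - 1))
    return b, se_b, (b - 0) / se_b, n - 2

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t, df = slope_inference(x, y)
t_star = 3.182  # t-table value for 95% confidence, df = 3 (looked up, not computed)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, t = {t:.3f}, df = {df}")
print(f"95% CI for beta: {b - t_star * se_b:.3f} to {b + t_star * se_b:.3f}")
```

These numbers only mean something if the LINE conditions hold. With $n = 5$, checking normality of the residuals is especially shaky.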

Transformations

If the relationship is nonlinear, transform one or both variables. For example, regress $\ln y$ on $x$ for an exponential model, or $\ln y$ on $\ln x$ for a power model. Then check the residual plot of the transformed model.
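Here is a minimal Python sketch of the exponential case. The hypothetical data follow $y = 3 \cdot 2^x$ exactly. Regressing $\ln y$ on $x$ gives $\ln \hat{y} = a + bx$, and back-transforming gives $\hat{y} = e^{a}\,(e^{b})^{x}$. The helper `fit_line` is illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

# Hypothetical exponential data: y = 3 * 2**x exactly
x = [0, 1, 2, 3, 4]
y = [3 * 2 ** xi for xi in x]

a, b = fit_line(x, [math.log(yi) for yi in y])  # regress ln(y) on x
# Back-transform: ln(y-hat) = a + b*x  =>  y-hat = e^a * (e^b)^x
print(f"y-hat = {math.exp(a):.3f} * {math.exp(b):.3f}^x")  # y-hat = 3.000 * 2.000^x
```

For a power model, the same helper applies with `math.log` taken of both `x` and `y`. This requires all values to be positive.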

Worked Example

Problem: Computer output gives $b = 2.5$, $SE_b = 0.8$, and $n = 20$. Test whether the slope is significantly different from zero.

Solution:

$H_0: \beta = 0$, $H_a: \beta \neq 0$, where $\beta$ is the slope of the population regression line.

$t = \frac{2.5 - 0}{0.8} = 3.125$, with $df = n - 2 = 18$.

Two-sided p-value $\approx 0.006$. Because $0.006 < \alpha = 0.05$, reject $H_0$. There is convincing evidence of a linear relationship between $x$ and $y$.
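In class, the p-value comes from a t table or a graphing calculator's t-distribution CDF. As a standard-library-only cross-check, the Python sketch below takes a different route. It integrates the Student's t density numerically with Simpson's rule. This numerical method is a substitute chosen for illustration, and the helper names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the pdf from 0 to |t|), via Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)

b, se_b, n = 2.5, 0.8, 20  # values from the computer output
t_stat = (b - 0) / se_b
df = n - 2
p = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
```

The result falls between 0.005 and 0.01, which agrees with the t-table bounds for $df = 18$ and with the $\approx 0.006$ above.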

Practice Questions

  1. If $r = 0.8$, what is $r^2$, and what does it mean?

     $r^2 = 0.64$. 64% of the variability in $y$ is explained by the linear relationship with $x$.

  2. A residual plot shows a U-shaped pattern. What does this indicate?

     The linear model is not appropriate; the relationship may be curved (try a quadratic or transformed model).

  3. The LSRL is $\hat{y} = 3 + 2x$. Predict $y$ when $x = 5$, and find the residual if $y = 14$.

     $\hat{y} = 3 + 2(5) = 13$. Residual $= 14 - 13 = 1$.
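The arithmetic for question 3 can be checked with a few lines of Python. The helper `predict` is a hypothetical name. It applies $\hat{y} = a + bx$, and the residual is observed minus predicted.

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x from an LSRL."""
    return a + b * x

a, b = 3, 2               # LSRL from question 3: y-hat = 3 + 2x
y_hat = predict(a, b, 5)
residual = 14 - y_hat     # residual = observed y - predicted y-hat
print(y_hat, residual)    # 13 1
```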


Summary

  • LSRL: $\hat{y} = a + bx$; $b = r(s_y/s_x)$; passes through $(\bar{x}, \bar{y})$.
  • $r$ measures linear association; $r^2$ gives the proportion of variability in $y$ explained by the line.
  • Residual plots check model fit; a pattern means a poor fit.
  • Inference for slope: $t = b/SE_b$ with $df = n - 2$; check the LINE conditions.
  • Transform data for nonlinear relationships.
