10 Multiple Linear Regression

(PSY206) Data Management and Analysis

Author

Md Rasel Biswas

Learning Objectives

After studying this lecture, students will be able to:

  • Explain the purpose of multiple regression
  • Distinguish between predictor and criterion variables
  • Interpret R, R², Adjusted R², coefficients, and significance tests
  • Perform multiple regression in SPSS
  • Check assumptions (normality, multicollinearity, homoscedasticity)
  • Report results in an academic format

10.1 What is Multiple Regression?

Multiple regression is a statistical technique used to predict one continuous dependent variable from two or more independent variables.

  • Dependent variable → Criterion (\(Y\))
  • Independent variables → Predictors (\(X_1, X_2, \ldots, X_k\))

It extends simple regression by allowing several predictors simultaneously, which improves prediction accuracy because real-world outcomes usually depend on multiple factors.

Why Not Run Many Simple Regressions?
Because predictors may be correlated with each other. Multiple regression controls for interrelationships among predictors, giving the unique effect of each variable.

The Regression Equation: \[ Y' = a + b_1X_1 + b_2X_2 + \cdots + b_kX_k \] Where:

  • \(Y'\) = predicted value of the criterion
  • \(a\) = intercept (the predicted value when all predictors equal zero)
  • \(b_1, \ldots, b_k\) = regression coefficients (slopes)

Each coefficient represents the expected change in Y when that predictor increases by 1 unit, holding other variables constant.
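Example (with made-up coefficients, for illustration only): suppose the fitted equation for two predictors were \(Y' = 20 + 3X_1 - 2X_2\). For a student with \(X_1 = 10\) study hours and \(X_2 = 4\) on an anxiety scale, the predicted score would be
\[ Y' = 20 + 3(10) - 2(4) = 42 \]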

10.2 Example Scenario

Research Question: What factors predict students’ exam performance?

Variables

  • Criterion (DV): Exam score
  • Predictors (IVs):
    • Study hours
    • Class attendance
    • Anxiety level
    • Previous GPA

We expect:

  • Positive effects → study hours, GPA
  • Negative effect → anxiety

10.3 Assumptions of Multiple Regression

Students must ALWAYS check these before interpreting results.

  1. Linearity: the relationship between each predictor and the DV should be linear

  2. Normality of residuals: the residuals should be approximately normally distributed

  3. Homoscedasticity: the residuals should have an equal spread across predicted values

  4. No multicollinearity: the predictors should not be highly correlated with one another

  5. Adequate sample size: as a rough rule, at least 10–15 cases per predictor (for the four-predictor example above, roughly 40–60 participants)

10.4 Key Statistics to Understand

  • R (Multiple Correlation): Correlation between observed and predicted values
  • R² (Coefficient of Determination): Proportion of variance in the criterion explained by the predictors. Example: R² = 0.60 → the predictors explain 60% of the variability
  • Adjusted R²: Corrects R² for sample size and number of predictors; the most reliable single measure of model fit (see the formulas after this list)
  • Regression Coefficients:
    • Unstandardized B: Change in DV for 1-unit change in predictor
    • Standardized Beta (β): Effect size in standard deviation units. Allows comparison of predictors’ importance
  • Significance Tests:
    • F-test → overall model significance
    • t-test → significance of individual predictors
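For reference, the standard formulas behind these fit statistics, with \(n\) = sample size and \(k\) = number of predictors, are:
\[ R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}, \qquad \text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} \]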

10.5 SPSS Practical Guide

Data Setup:
Each row → participant
Each column → variable

SPSS Steps

  1. Click Analyze → Regression → Linear
  2. Move dependent variable to Dependent box
  3. Move predictors to Independent(s) box
  4. Method → Enter
  5. Click Statistics
    • Estimates
    • Model fit
    • Collinearity diagnostics
  6. Click Plots
    • ZPRED → X
    • ZRESID → Y
    • Normal probability plot
  7. Click OK

These steps correspond to the procedure illustrated in the chapter’s SPSS walkthrough (see the regression dialog illustrations around pages 306–308).
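For students who also want to reproduce the analysis outside SPSS, here is a minimal Python sketch using pandas and statsmodels. The file name and column names (exam_score, study_hours, attendance, anxiety, gpa) are hypothetical placeholders; substitute the actual names from the downloaded dataset.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; adjust to match the downloaded dataset.
df = pd.read_csv("student_performance.csv")

y = df["exam_score"]                                     # criterion (DV)
X = df[["study_hours", "attendance", "anxiety", "gpa"]]  # predictors (IVs)
X = sm.add_constant(X)                                   # adds the intercept term

# Ordinary least squares with all predictors entered simultaneously,
# the equivalent of SPSS's "Enter" method.
results = sm.OLS(y, X).fit()
print(results.summary())  # R-squared, adjusted R-squared, F test, coefficients, t tests
```

The summary output contains the same quantities discussed in the next section: the model summary, the overall F test, and the coefficients table.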

Download the student performance dataset

10.6 Interpreting SPSS Output

Descriptive Statistics
Begin by examining the means and standard deviations to understand the distribution and scale of each variable. This helps identify unusual values and provides context for interpreting regression coefficients.

Correlation Matrix
Inspect correlations among predictors to detect potential multicollinearity. Moderate correlations are acceptable, but very high correlations (for example, above .80) may indicate redundancy among predictors.
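Continuing the hypothetical Python sketch above, the descriptive statistics and correlation matrix can each be obtained in one line:

```python
import pandas as pd

df = pd.read_csv("student_performance.csv")  # hypothetical file name, as above
cols = ["exam_score", "study_hours", "attendance", "anxiety", "gpa"]

# Means, standard deviations, minima and maxima for every variable
print(df[cols].describe())

# Pearson correlations; scan the predictor rows for values above about .80
print(df[cols].corr())
```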

Model Summary Table
Focus on three key statistics:

  • R: the correlation between observed and predicted values
  • R²: the proportion of variance in the outcome explained by the predictors
  • Adjusted R²: the preferred estimate of model fit because it adjusts for the number of predictors

Example interpretation:
“The predictors explained 65% of the variance in exam scores (Adjusted R² = .65), indicating a strong model fit.”

ANOVA Table
The ANOVA table tests whether the overall regression model significantly predicts the outcome variable.

  • If p < .05, the model provides a better fit than a model with no predictors.
  • This indicates that the predictors, taken together, significantly explain variation in the dependent variable.
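In formula terms, the F statistic reported in this table can be written in terms of \(R^2\), with \(k\) predictors and \(n\) cases:
\[ F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}, \qquad df = (k,\; n - k - 1) \]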

Coefficients Table
This table shows the contribution of each predictor while controlling for the others.

Interpret:

  • Unstandardized B → expected change in the outcome for a one-unit increase in the predictor
  • Standardized Beta (β) → relative importance of predictors
  • p-value → whether the predictor significantly contributes to the model

Example interpretation:
Study hours \((β = .45, p < .001)\) is the strongest predictor of exam scores, indicating that higher study time is associated with higher scores.
Anxiety \((β = −.30, p = .01)\) is a significant negative predictor, suggesting that higher anxiety is associated with lower performance.
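Two relationships are useful when reading this table: the standardized coefficient is the unstandardized one rescaled by the standard deviations of the predictor and the outcome, and each t value is simply the coefficient divided by its standard error:
\[ \beta_j = b_j \, \frac{s_{X_j}}{s_Y}, \qquad t_j = \frac{b_j}{SE(b_j)} \]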

Multicollinearity Diagnostics
Assess whether predictors are excessively correlated.

  • Tolerance values below .10 indicate concern
  • VIF values above 10 suggest problematic multicollinearity

If these thresholds are exceeded, consider removing or combining predictors.
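SPSS prints Tolerance and VIF automatically when Collinearity diagnostics is ticked. Note that Tolerance is simply 1/VIF (equivalently, \(1 - R_j^2\) for predictor \(j\) regressed on the other predictors). The sketch below shows an equivalent check in Python, again using the hypothetical column names from the earlier sketches.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("student_performance.csv")  # hypothetical file name
X = sm.add_constant(df[["study_hours", "attendance", "anxiety", "gpa"]])

# VIF and Tolerance for each predictor (index 0 is the constant, so skip it)
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```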

Residual Diagnostics
Evaluate whether regression assumptions are satisfied.

  • Normal P–P plot: points should lie close to the diagonal line, indicating normally distributed residuals
  • Residual vs predicted scatterplot: a random rectangular spread suggests linearity and homoscedasticity

A roughly even spread of residuals across predicted values indicates that model assumptions are reasonably met.
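The two diagnostic plots SPSS produces can also be generated in Python from the fitted model. This is an illustrative outline under the same hypothetical file and column names as before; matplotlib is used for the scatterplot.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

df = pd.read_csv("student_performance.csv")  # hypothetical file name
y = df["exam_score"]
X = sm.add_constant(df[["study_hours", "attendance", "anxiety", "gpa"]])
results = sm.OLS(y, X).fit()

# Standardized residuals vs predicted values: look for a random, even band around zero
std_resid = (results.resid - results.resid.mean()) / results.resid.std()
plt.scatter(results.fittedvalues, std_resid, alpha=0.6)
plt.axhline(0, linestyle="--", color="grey")
plt.xlabel("Predicted value")
plt.ylabel("Standardized residual")
plt.title("Residuals vs predicted")
plt.show()

# Normal P-P plot of the residuals: points should lie close to the diagonal
sm.ProbPlot(results.resid).ppplot(line="45")
plt.show()
```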

10.7 Writing Results (APA-Style)

A multiple regression analysis was conducted to examine whether study hours, attendance, anxiety, and GPA predicted exam scores.

The overall model was statistically significant,
\(F(4, 95) = 28.40, p < .001,\)
and explained 58% of the variance in exam scores (Adjusted \(R^2 = .58\)), indicating a substantial level of predictive accuracy.

Study hours \((β = .42, p < .001)\) and GPA \((β = .35, p < .001)\) were significant positive predictors, suggesting that students who studied more and had higher prior academic performance tended to achieve higher exam scores. Anxiety was a significant negative predictor \((β = −.21, p = .02)\), indicating that higher anxiety levels were associated with lower exam performance. Attendance did not significantly predict exam scores \((p = .18)\) when other variables were controlled.