Econometrics Lecture 7 - Nonlinear Regression II: Interactions

Why Do We Need Interaction Terms?

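Basic Concepts

Q: Why is it unrealistic to assume that a variable's effect is always constant?

A: Because the real world is far more complex!

Consider a few examples:
• Effect of a drug: the same drug works differently depending on age
• Value of education: the wage effect of education may differ by gender
• Effect of advertising: the effect of TV advertising varies with income level
• Class size: the effect of small classes may depend on the share of English learners

It is like measuring the "effect of exercise" without accounting for age or health. A workout that is good for someone in their 20s can be risky for someone in their 70s!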

The Core Idea of Interactions

In the basic linear regression:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$

In this model, the effect of $X_1$ ($\beta_1$) is constant regardless of the value of $X_2$.

But once we add an interaction term:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$$

Now the effect of $X_1$ depends on the value of $X_2$:

$$\frac{\partial Y}{\partial X_1} = \beta_1 + \beta_3 X_2$$
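
A minimal sketch of this formula in Python may help make it concrete. The coefficient values below are hypothetical, chosen only for illustration, and the function names are not from the lecture. The code compares the change in the fitted value from a one-unit increase in $X_1$ against $\beta_1 + \beta_3 X_2$ at several values of $X_2$.

```python
# Hypothetical coefficients, chosen only for illustration (not estimates from the lecture).
B0, B1, B2, B3 = 1.0, 2.0, -0.5, 0.3

def predict(x1, x2):
    """Fitted value from the interaction model."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def effect_of_x1(x2):
    """Analytical marginal effect: dY/dX1 = beta1 + beta3 * X2."""
    return B1 + B3 * x2

for x2 in (0.0, 5.0, 10.0):
    # Raising X1 by one unit (holding X2 fixed) changes Y by beta1 + beta3 * X2.
    unit_change = predict(4.0, x2) - predict(3.0, x2)
    print(f"X2 = {x2:4.1f}: unit change = {unit_change:.2f}, formula = {effect_of_x1(x2):.2f}")
```

Because the model is linear in $X_1$ for a fixed $X_2$, the unit change and the derivative coincide; only the value of $X_2$ moves the effect.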

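An intuitive example: the interaction of rain and umbrellas

Consider a model that predicts happiness:

Rain: happiness falls (you get wet)
Having an umbrella: happiness rises (you are prepared)
But the effect of the umbrella depends on the weather:
- Sunny day: the umbrella is just baggage (negative effect)
- Rainy day: the umbrella is a lifesaver (strongly positive effect)

This is exactly what an interaction effect is!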

Three Types of Interaction Terms

1. Binary × Binary Interactions

Interaction between two dummy variables

Example: the wage effect of Gender × Marriage

2. Binary × Continuous Interactions

Interaction between a dummy variable and a continuous variable

Example: High English Learners × Student-Teacher Ratio

3. Continuous × Continuous Interactions

Interaction between two continuous variables

Example: the wage effect of Education × Experience

Interaction Effects: When One Size Doesn't Fit All

Binary × Binary Interactions

Intermediate

Two Dummy Variables Interacting

Model specification:

$$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$$

Here $D_1$ and $D_2$ are binary (0 or 1) variables.

Key principle: consider all four possible groups!
$(D_1=0, D_2=0)$, $(D_1=0, D_2=1)$, $(D_1=1, D_2=0)$, $(D_1=1, D_2=1)$

EXAMPLE 1: Test Scores, Class Size, and English Learners

Let's define:

$HiSTR = 1$ if STR ≥ 20 (large class), 0 otherwise
$HiEL = 1$ if PctEL ≥ 10% (many English learners), 0 otherwise

Regression result:

$$\widehat{TestScore} = \underset{(1.4)}{664.1} - \underset{(2.3)}{18.2} \times HiEL - \underset{(1.9)}{1.9} \times HiSTR - \underset{(3.1)}{3.5} \times (HiSTR \times HiEL)$$

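Mean scores for the four groups:

	Low STR (small class)	High STR (large class)	Difference
Low EL	664.1 (Base)	664.1 - 1.9 = 662.2	-1.9
High EL	664.1 - 18.2 = 645.9	664.1 - 18.2 - 1.9 - 3.5 = 640.5	-5.4
Difference	-18.2	-21.7

How should we interpret this result?

The effect of reducing class size depends on the share of English learners!

• Low EL schools: effect of moving to a small class = 1.9 points
• High EL schools: effect of moving to a small class = 5.4 points

In other words, the estimated benefit of small classes is larger in schools with many English learners, although the interaction coefficient itself is not statistically significant at the 5% level ($t = -3.5/3.1 \approx -1.13$).
This is consistent with the intuition that small classes are more effective when many students need individual attention.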
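
The four group means can be rebuilt directly from the estimated coefficients. The following is a small sketch of that bookkeeping; the function and variable names are illustrative, and only the four coefficient values come from the regression in this example.

```python
# Coefficients from the TestScore regression in this example.
INTERCEPT, B_HIEL, B_HISTR, B_INTER = 664.1, -18.2, -1.9, -3.5

def predicted_score(hi_str, hi_el):
    """Predicted mean test score for one of the four (HiSTR, HiEL) groups."""
    return INTERCEPT + B_HIEL * hi_el + B_HISTR * hi_str + B_INTER * hi_str * hi_el

cells = {(s, e): predicted_score(s, e) for s in (0, 1) for e in (0, 1)}

# Effect of moving from a small to a large class, separately by HiEL.
effect_low_el = cells[(1, 0)] - cells[(0, 0)]
effect_high_el = cells[(1, 1)] - cells[(0, 1)]

print(f"Low EL:  {effect_low_el:+.1f}")
print(f"High EL: {effect_high_el:+.1f}")
print(f"Difference in effects: {effect_high_el - effect_low_el:+.1f}")
```

The last line reproduces the interaction coefficient: the gap between the two class-size effects is exactly the coefficient on HiSTR × HiEL.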

EXAMPLE 2: Gender and Marriage Wage Premium

Model:

$$\ln(Wage) = \beta_0 + \beta_1 Female + \beta_2 Married + \beta_3 (Female \times Married) + \dots$$

Estimation results:

lwage = 1.342 - 0.131*female + 0.272*married - 0.308*femxmar + ...
       (0.053) (0.065)        (0.065)         (0.082)

Wage differences by group:

Group	Expected log(wage)	Interpretation
Single Male (base)	1.342	Reference group
Single Female	1.342 - 0.131 = 1.211	13.1% lower than single males
Married Male	1.342 + 0.272 = 1.614	27.2% higher than single males
Married Female	1.342 - 0.131 + 0.272 - 0.308 = 1.175	16.7% lower than single males

Key findings:
• Marriage premium for males: +27.2%
• Marriage premium for females: +27.2% - 30.8% = -3.6%
• Marriage brings a wage premium for men but a wage penalty for women!
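
As a rough check on these numbers, the sketch below recomputes the marriage premium for each gender from the coefficients. The percentages quoted in the notes read log differences directly as percent changes; the helper `exact_percent` (a name introduced here, not from the lecture) applies the standard exact conversion $100 \times (e^{\Delta} - 1)$ for comparison.

```python
import math

# Coefficients from the log-wage regression in this example.
B_FEMALE, B_MARRIED, B_FEMXMAR = -0.131, 0.272, -0.308

def log_gap_vs_single_male(female, married):
    """Log-wage difference relative to the base group (single males)."""
    return B_FEMALE * female + B_MARRIED * married + B_FEMXMAR * female * married

def exact_percent(log_diff):
    """Exact percentage difference implied by a log difference."""
    return 100 * (math.exp(log_diff) - 1)

premium_male = log_gap_vs_single_male(0, 1) - log_gap_vs_single_male(0, 0)
premium_female = log_gap_vs_single_male(1, 1) - log_gap_vs_single_male(1, 0)

print(f"Marriage premium, males:   {premium_male:+.3f} log points ({exact_percent(premium_male):+.1f}%)")
print(f"Marriage premium, females: {premium_female:+.3f} log points ({exact_percent(premium_female):+.1f}%)")
```

The approximation is close for small log differences such as the female premium, and noticeably looser for larger ones such as the male premium.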

Interpreting Coefficients: The General Rule

For model: $Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 (D_1 \times D_2) + u$

Step 1: Write out all four cases:

$(D_1=0, D_2=0)$: $E(Y) = \beta_0$
$(D_1=1, D_2=0)$: $E(Y) = \beta_0 + \beta_1$
$(D_1=0, D_2=1)$: $E(Y) = \beta_0 + \beta_2$
$(D_1=1, D_2=1)$: $E(Y) = \beta_0 + \beta_1 + \beta_2 + \beta_3$

Step 2: Calculate differences:

Effect of $D_1$ when $D_2=0$: $\beta_1$
Effect of $D_1$ when $D_2=1$: $\beta_1 + \beta_3$
Difference in effects: $\beta_3$ (the interaction coefficient)
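
The two steps can also be run in reverse: given the four group means, the coefficients follow by differencing. The sketch below assumes the model contains only the two dummies and their interaction, so the four means pin down the four coefficients exactly; the function name is illustrative.

```python
def coefficients_from_cell_means(y00, y10, y01, y11):
    """Map the four group means E(Y | D1, D2) to (beta0, beta1, beta2, beta3).

    Arguments are named y{D1}{D2}, e.g. y10 is the mean for D1=1, D2=0.
    """
    beta0 = y00
    beta1 = y10 - y00
    beta2 = y01 - y00
    beta3 = (y11 - y01) - (y10 - y00)  # difference in differences
    return beta0, beta1, beta2, beta3

# Cell means from Example 1, taking D1 = HiSTR and D2 = HiEL.
b0, b1, b2, b3 = coefficients_from_cell_means(y00=664.1, y10=662.2, y01=645.9, y11=640.5)
print(round(b0, 1), round(b1, 1), round(b2, 1), round(b3, 1))
```

Reading $\beta_3$ as a difference in differences is often the quickest way to interpret a Binary × Binary interaction on an exam.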

Binary × Continuous Interactions

Intermediate

Different Slopes for Different Groups

Model specification:

$$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$$

This model produces two regression lines:

When $D=0$: $Y = \beta_0 + \beta_2 X$
When $D=1$: $Y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X$

Interpretation:
• $\beta_1$: difference in intercepts
• $\beta_3$: difference in slopes
• $\beta_2 + \beta_3$: slope when $D=1$

EXAMPLE: Test Scores and STR with HiEL Interaction

Model: $TestScore = \beta_0 + \beta_1 HiEL + \beta_2 STR + \beta_3 (STR \times HiEL) + u$

Results:

$$\widehat{TestScore} = \underset{(11.9)}{682.2} - \underset{(0.59)}{0.97} \times STR + \underset{(19.5)}{5.6} \times HiEL - \underset{(0.97)}{1.28} \times (STR \times HiEL)$$

Two separate regression lines:

For Low EL schools (HiEL = 0):

$\widehat{TestScore} = 682.2 - 0.97 \times STR$

For High EL schools (HiEL = 1):

$\widehat{TestScore} = 682.2 + 5.6 + (-0.97 - 1.28) \times STR$

$= 687.8 - 2.25 \times STR$

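What does this result mean?

The effect of reducing class size depends on the share of English learners!

• Low EL schools: a 1-unit decrease in STR → TestScore rises by 0.97 points
• High EL schools: a 1-unit decrease in STR → TestScore rises by 2.25 points

The estimated effect of smaller classes is more than twice as large in schools with many English learners!
This suggests that small classes are more effective the more students need individual instruction.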
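
The two lines can be recovered mechanically from the four coefficients. In the sketch below the function name and the choice of a two-student reduction in STR are assumptions made for illustration; the coefficient values are those of the regression in this example.

```python
# Coefficients from the estimated regression in this example.
B0, B_STR, B_HIEL, B_INTER = 682.2, -0.97, 5.6, -1.28

def line_for_group(hi_el):
    """Return (intercept, slope) of the TestScore-STR line for HiEL = 0 or 1."""
    return B0 + B_HIEL * hi_el, B_STR + B_INTER * hi_el

for hi_el in (0, 1):
    intercept, slope = line_for_group(hi_el)
    gain = -2 * slope  # predicted change in TestScore from cutting STR by 2
    print(f"HiEL={hi_el}: TestScore = {intercept:.1f} {slope:+.2f} * STR; "
          f"cutting STR by 2 -> {gain:+.2f} points")
```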

Hypothesis Testing:

Same slope? $H_0: \beta_3 = 0$
$t = -1.28/0.97 = -1.32$ → Fail to reject
Same intercept? $H_0: \beta_1 = 0$
$t = 5.6/19.5 = 0.29$ → Fail to reject
Same lines? $H_0: \beta_1 = \beta_3 = 0$
$F = 89.94$ (p-value < 0.001) → Reject!

Multicollinearity at work!
Neither individual t-test is significant, yet the joint F-test is.
This is because the two regressors being tested, HiEL and STR×HiEL, are highly correlated.
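
The individual t-statistics are simple ratios of coefficient to standard error. The sketch below recomputes them and compares against 1.96, the usual large-sample two-sided 5% critical value; the joint F-statistic cannot be recomputed this way because it also depends on the covariance between the two estimates.

```python
CRITICAL_5PCT = 1.96  # two-sided 5% critical value, large-sample normal approximation

def t_stat(coef, se):
    """t-statistic for H0: coefficient = 0."""
    return coef / se

# (coefficient, standard error) pairs from the regression in this example.
tests = {"STR x HiEL (same slope)": (-1.28, 0.97), "HiEL (same intercept)": (5.6, 19.5)}
for name, (coef, se) in tests.items():
    t = t_stat(coef, se)
    verdict = "reject" if abs(t) > CRITICAL_5PCT else "fail to reject"
    print(f"{name}: t = {t:.2f} -> {verdict} H0 at 5%")
```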

Three Possible Patterns

Three patterns that a Binary × Continuous specification can produce:

(a) Different intercepts, same slope

Model: $Y = \beta_0 + \beta_1 D + \beta_2 X$

Two parallel lines

(b) Different intercepts, different slopes

Model: $Y = \beta_0 + \beta_1 D + \beta_2 X + \beta_3 (D \times X)$

Two entirely different lines

(c) Same intercept, different slopes

Model: $Y = \beta_0 + \beta_2 X + \beta_3 (D \times X)$

Two lines that start from the same point

Binary × Continuous Interaction: Two Different Regression Lines

Continuous × Continuous Interactions

Advanced

When Effects Depend on Continuous Variables

Model specification:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$$

Now each marginal effect varies with the value of the other variable:

Effect of $X_1$:

$\frac{\partial Y}{\partial X_1} = \beta_1 + \beta_3 X_2$

Effect of $X_2$:

$\frac{\partial Y}{\partial X_2} = \beta_2 + \beta_3 X_1$

EXAMPLE 1: Test Scores, STR, and PctEL

Model: $TestScore = \beta_0 + \beta_1 STR + \beta_2 PctEL + \beta_3 (STR \times PctEL) + u$

Results:

$$\widehat{TestScore} = \underset{(11.8)}{686.3} - \underset{(0.59)}{1.12} \times STR - \underset{(0.37)}{0.67} \times PctEL + \underset{(0.019)}{0.0012} \times (STR \times PctEL)$$

Effect of class size at different PctEL levels:

$$\frac{\Delta \widehat{TestScore}}{\Delta STR} = -1.12 + 0.0012 \times PctEL$$

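PctEL	Effect of a 1-unit increase in STR	Interpretation
0%	-1.12	No English learners: moderate negative effect
10%	-1.12 + 0.0012(10) = -1.108	Slightly smaller negative effect
20%	-1.12 + 0.0012(20) = -1.096	Even smaller negative effect
93.3%	-1.12 + 0.0012(93.3) ≈ -1.008	Still close to the effect at 0%

The effect barely moves!
Even when the share of English learners is very high (93.3%), the estimated effect of class size is still about -1.01. The effect would reach zero only at PctEL = 1.12/0.0012 ≈ 933%, which is impossible for a percentage and far outside the range of the data.
The interaction coefficient is also statistically insignificant ($t = 0.0012/0.019 \approx 0.06$), so this specification gives little evidence that the class-size effect depends on PctEL.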
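
Evaluating the estimated marginal effect directly is a quick way to check the table arithmetic. The sketch below uses the two coefficients from the regression in this example; the function name and the zero-crossing calculation are illustrative additions.

```python
B_STR, B_INTER = -1.12, 0.0012  # coefficients on STR and STR x PctEL

def str_effect(pct_el):
    """Estimated change in TestScore per one-unit increase in STR at a given PctEL."""
    return B_STR + B_INTER * pct_el

for pct_el in (0, 10, 20, 93.3):
    print(f"PctEL = {pct_el:5.1f}%: effect = {str_effect(pct_el):.3f}")

# PctEL at which the estimated effect would be exactly zero.
zero_crossing = -B_STR / B_INTER
print(f"Effect equals zero at PctEL = {zero_crossing:.1f}%")
```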

EXAMPLE 2: Wages, Education, and Experience (Tenure)

Model: $Wage = \beta_0 + \beta_1 Educ + \beta_2 Tenure + \beta_3 (Educ \times Tenure) + ...$

Results:

wage = 0.318 + 0.404*educ - 0.147*tenure + 0.0237*educxten + ...
      (0.881)  (0.069)      (0.083)       (0.0074)

Return to education at different tenure levels:

$$\frac{\partial Wage}{\partial Educ} = 0.404 + 0.0237 \times Tenure$$

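How should we interpret this result?

The return to education increases with tenure!

For example, the effect of 4 additional years of education on wages:
• New hire (Tenure = 1): $[0.404 + 0.0237(1)] \times 4 = \$1.71$
• Fifth year (Tenure = 5): $[0.404 + 0.0237(5)] \times 4 = \$2.09$

This means education and experience are complementary:
• More educated workers learn more from experience
• More experienced workers make better use of their education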
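
The return to education at any tenure level follows from the same two coefficients. In the sketch below, the function name and the extra tenure value of 10 are assumptions added for illustration.

```python
B_EDUC, B_EDUCXTEN = 0.404, 0.0237  # coefficients from the wage regression in this example

def return_to_education(tenure):
    """Estimated wage gain (in dollars) per extra year of education at a given tenure."""
    return B_EDUC + B_EDUCXTEN * tenure

EXTRA_YEARS = 4
for tenure in (1, 5, 10):
    gain = return_to_education(tenure) * EXTRA_YEARS
    print(f"Tenure = {tenure:2d}: {EXTRA_YEARS} more years of education -> ${gain:.2f}")
```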

Joint significance tests:

$H_0$: educ = educxten = 0
$F = 42.28$, p-value = 0.0000 → Reject
$H_0$: tenure = educxten = 0
$F = 14.22$, p-value = 0.0000 → Reject
$H_0$: educ = tenure = educxten = 0
$F = 35.57$, p-value = 0.0000 → Reject

Interpreting Continuous × Continuous: After-Before Method

For model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u$

Before: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2)$

After: $Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \beta_3 [(X_1 + \Delta X_1) \times X_2]$

Subtract to get:

$\Delta Y = \beta_1 \Delta X_1 + \beta_3 X_2 \Delta X_1$

$\frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2$
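
The same After-Before steps extend to the case where both regressors change at once. The derivation below is a sketch that goes one step beyond the single-variable case; $\Delta X_2$ is notation introduced here for a change in $X_2$.

$$\begin{aligned}
\Delta Y &= \beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_3 \left[(X_1 + \Delta X_1)(X_2 + \Delta X_2) - X_1 X_2\right] \\
&= (\beta_1 + \beta_3 X_2)\,\Delta X_1 + (\beta_2 + \beta_3 X_1)\,\Delta X_2 + \beta_3\,\Delta X_1 \Delta X_2
\end{aligned}$$

Setting $\Delta X_2 = 0$ recovers $\Delta Y / \Delta X_1 = \beta_1 + \beta_3 X_2$.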

Multicollinearity in Interaction Models

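Advanced

Why does adding an interaction term create multicollinearity?

Think about it: $X$ and $X \times Z$ tend to be highly correlated!

For example:
• STR and STR×HiEL are correlated
• Female and Female×Married are correlated
• Educ and Educ×Tenure are correlated

It is like including both "height" and "height × weight" in the same regression.
A tall person is very likely to have a large "height × weight" as well!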

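Symptoms and Remedies for Multicollinearity

Symptoms:

Standard errors of individual coefficients increase sharply
Individual t-tests are insignificant but the joint F-test is significant
Coefficient estimates are unstable (sensitive to small changes in the data)

Example: STR and PctEL interaction

• t-stat on the STR coefficient = -1.90 (insignificant at the 5% level)

• t-stat on the STR×PctEL coefficient = 0.06 (insignificant)

• But joint F-test = 3.89 (p = 0.021) → significant!

Remedies:

Centering: use deviations from the mean
$(X - \bar{X}) \times (Z - \bar{Z})$ instead of $X \times Z$
Use joint tests: prefer F-tests to individual t-tests
Focus on economic meaning: the size and direction of effects rather than statistical significance alone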
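
A small simulation can show why centering helps. The data below are simulated under stated assumptions (independent normal draws loosely resembling STR and PctEL) and are not the lecture's dataset; `corr` is a helper written here for the purpose.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Simulated data (an assumption for illustration, not the lecture's dataset):
# x resembles a student-teacher ratio, z a percentage of English learners.
random.seed(0)
x = [random.gauss(20, 2) for _ in range(1000)]
z = [random.gauss(15, 10) for _ in range(1000)]

x_bar, z_bar = sum(x) / len(x), sum(z) / len(z)
raw_product = [xi * zi for xi, zi in zip(x, z)]
centered_product = [(xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)]

corr_raw = corr(z, raw_product)
corr_centered = corr(z, centered_product)
print(f"corr(Z, X*Z)               = {corr_raw:.3f}")
print(f"corr(Z, (X-Xbar)*(Z-Zbar)) = {corr_centered:.3f}")
```

Centering changes what the main-effect coefficients mean (they become effects evaluated at the sample means) but leaves the interaction coefficient and the fitted values unchanged.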

A Real Example of Multicollinearity: Wage Regression

A regression that includes Female and Female×Experience:

wage = ... + 0.590*educ + 0.057*exper - 0.878*female - 0.056*femxexp
           (0.064)      (0.016)       (0.352)       (0.018)

Now let's redefine the variable:

new = femxexp - female

wage = ... + 0.590*educ + 0.057*exper - 0.439*female - 0.056*new
           (0.064)      (0.016)       (0.050)       (0.018)

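What changed?
• Standard error on female: 0.352 → 0.050 (a large drop!)
• The t-statistic on female grows in absolute value (from about -2.5 to about -8.8), so the coefficient is more strongly significant
• But the model's predictive power is identical (it is the same model written in a different form)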

Real-World Applications

Applications

APPLICATION 1: Nonlinear Effects of Class Size

Research Questions:

Are there nonlinear effects of class size reduction?
Are there nonlinear interactions between PctEL and STR?

Full Model with Cubic STR and Interactions:

$$\begin{aligned}
TestScore = {} & \beta_0 + \beta_1 STR + \beta_2 STR^2 + \beta_3 STR^3 \\
& + \beta_4 HiEL + \beta_5 (HiEL \times STR) + \beta_6 (HiEL \times STR^2) + \beta_7 (HiEL \times STR^3) \\
& + \text{control variables} + u
\end{aligned}$$

Key Findings:

Test	F-statistic	p-value	Conclusion
All STR variables = 0	5.91	0.001	STR matters!
$STR^2$, $STR^3$ = 0	5.81	0.003	Nonlinearity exists
All interaction terms = 0	5.81	0.003	Interactions matter

Practical implications:
• The effect of class size reduction is not linear
• The effect depends on the share of English learners
• A one-size-fits-all approach is inappropriate for policy making

APPLICATION 2: Simultaneous Causality

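Does class size affect test scores, or is it the other way around?

It could be both! This is simultaneous causality.

Forward causality:
• Small classes → better instruction → higher test scores

Reverse causality:
• Low test scores → additional resources assigned → smaller classes

If this problem is not addressed, the OLS estimates are biased!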

Simultaneous Causality in Equations:

(a) Causal effect of X on Y: $Y_i = \beta_0 + \beta_1 X_i + u_i$

(b) Causal effect of Y on X: $X_i = \gamma_0 + \gamma_1 Y_i + v_i$

Problem: $corr(X_i, u_i) \neq 0$ because:

Large $u_i$ → Large $Y_i$
Large $Y_i$ → Large $X_i$ (if $\gamma_1 > 0$)
Therefore: $X_i$ and $u_i$ are correlated!
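
A short simulation can make the bias visible. The sketch below draws data that satisfy both equations (a) and (b) at once and then runs OLS of Y on X. All parameter values are hypothetical and chosen for illustration; `simulate` and `ols_slope` are helper names introduced here.

```python
import random

BETA0, BETA1 = 0.0, 1.0     # hypothetical causal effect of X on Y
GAMMA0, GAMMA1 = 0.0, 0.5   # hypothetical feedback from Y to X

def simulate(n, seed=0):
    """Draw (X, Y) pairs that satisfy both structural equations simultaneously."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations for X (the reduced form), then compute Y.
        x = (GAMMA0 + GAMMA1 * BETA0 + GAMMA1 * u + v) / (1 - GAMMA1 * BETA1)
        y = BETA0 + BETA1 * x + u
        xs.append(x)
        ys.append(y)
    return xs, ys

def ols_slope(xs, ys):
    """OLS slope from a regression of Y on X with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

xs, ys = simulate(20_000)
slope = ols_slope(xs, ys)
print(f"True beta1 = {BETA1}, OLS estimate = {slope:.3f}")
```

With these parameter values and unit-variance errors, the OLS slope converges to $\beta_1 + cov(X, u)/var(X) = 1 + 1/5 = 1.2$ rather than to the true value of 1, no matter how large the sample.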

Solutions:

Randomized experiments: assign X at random
Instrumental variables: covered in Chapter 12
Panel data methods: fixed effects can address part of the problem

Model Selection Guidelines

Which specification should you choose?

Economic theory:
• Does theory suggest a particular functional form?
• Diminishing returns? Complementarity?
Data exploration:
• Check for patterns with scatter plots
• Try a variety of specifications
Statistical tests:
• t-tests for individual terms
• F-tests for groups of terms
• Information criteria (AIC, BIC)
Economic significance:
• Is the size of the effect substantively important?
• Are the results reasonable?
Robustness checks:
• Are the results consistent across specifications?
• Out-of-sample prediction

Practice Problems for Exam 2

Practice

Problem 1 (Exam 2 Style)

The following regression is estimated for baseball players:

$$\begin{aligned}
\ln(\widehat{salary}) = {} & \underset{(2.18)}{10.34} - \underset{(0.125)}{0.198} \times black - \underset{(0.153)}{0.190} \times hispan + \underset{(0.0050)}{0.0125} \times (black \times percblck) \\
& + \underset{(0.0098)}{0.0201} \times (hispan \times perchisp) + \text{other factors}
\end{aligned}$$

where percblck is the percentage of the city's population that is black, and perchisp is the percentage Hispanic.

(a) How do you interpret the coefficient on black?
(b) What is the salary difference between black and white players in a city with 10% black population?
(c) At what black population percentage do black and white players earn the same?
(d) Test whether Hispanic players earn differently from white players.

Solution

(a) Interpretation of black coefficient:

The coefficient -0.198 represents the log salary difference between black and white players in a city with 0% black population. Since the interaction term is included, this coefficient alone doesn't tell the full story.

(b) Salary difference at 10% black population:

$\ln(salary)_{black} - \ln(salary)_{white} = -0.198 + 0.0125(10) = -0.198 + 0.125 = -0.073$

Black players earn approximately 7.3% less than white players in such cities.

(c) Equal salary point:

Set the difference to zero: $-0.198 + 0.0125 \times percblck = 0$

$percblck = 0.198 / 0.0125 = 15.84\%$

At 15.84% black population, there's no racial wage gap.
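
Parts (b) and (c) use the same linear expression for the gap, so they can be checked together. The sketch below uses the two coefficients from the problem; the function and variable names are illustrative.

```python
B_BLACK, B_BLACK_X_PERCBLCK = -0.198, 0.0125  # coefficients from the regression in Problem 1

def black_white_log_gap(percblck):
    """Estimated ln(salary) difference, black minus white, at a given percblck."""
    return B_BLACK + B_BLACK_X_PERCBLCK * percblck

gap_at_10 = black_white_log_gap(10)
break_even = -B_BLACK / B_BLACK_X_PERCBLCK
print(f"Gap at percblck = 10: {gap_at_10:.3f}")
print(f"Gap is zero at percblck = {break_even:.2f}")
```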

(d) Testing Hispanic wage difference:

This requires a joint test of $H_0: \beta_{hispan} = \beta_{hispan \times perchisp} = 0$

Individual t-tests may not be reliable due to multicollinearity between hispan and hispan×perchisp.

Need F-test for joint significance.

Problem 2

Consider the wage equation:
$\ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 female + \beta_4 (female \times educ) + u$

Estimation results:
$$\ln(\widehat{wage}) = \underset{(0.93)}{-2.27} + \underset{(0.071)}{0.626} \times educ + \underset{(0.010)}{0.026} \times exper - \underset{(1.436)}{0.060} \times female - \underset{(0.120)}{0.140} \times (female \times educ)$$

(a) Write separate wage equations for males and females.
(b) Calculate the return to education for males and females.
(c) Test whether education has the same effect for both genders at 5% level.
(d) For a female with 16 years of education, what's the wage penalty compared to a similar male?

Solution

(a) Separate equations:

Males (female = 0):

$\ln(wage) = -2.27 + 0.626 \times educ + 0.026 \times exper$

Females (female = 1):

$\ln(wage) = (-2.27 - 0.060) + (0.626 - 0.140) \times educ + 0.026 \times exper$

$= -2.33 + 0.486 \times educ + 0.026 \times exper$

(b) Returns to education:

Males: 0.626 log points (roughly 62.6%) per year of education
Females: 0.486 log points (roughly 48.6%) per year of education
Difference: 0.140 log points (about 14.0 percentage points)

(c) Test for equal education effects:

$H_0: \beta_4 = 0$ (female×educ coefficient = 0)

$t = -0.140 / 0.120 = -1.17$

$|t| = 1.17 < 1.96$ → Fail to reject at 5% level

We cannot conclude that education effects differ by gender.

(d) Wage penalty for female with 16 years education:

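$\ln(wage)_{female} - \ln(wage)_{male} = -0.060 - 0.140(16) = -0.060 - 2.24 = -2.30$

A log difference this large cannot be read directly as a percentage: it implies a wage about 90% lower ($e^{-2.30} - 1 \approx -0.90$), not 230% lower. The gap is still implausibly large, suggesting possible specification issues. Note also that the standard error on female (1.436) is very large relative to its coefficient.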

Problem 3 (Advanced)

A researcher estimates the effect of class size on test scores, allowing for interactions with both English learners and income:

$TestScore = \beta_0 + \beta_1 STR + \beta_2 HiEL + \beta_3 LowInc$
$\quad\quad\quad\quad + \beta_4 (STR \times HiEL) + \beta_5 (STR \times LowInc)$
$\quad\quad\quad\quad + \beta_6 (HiEL \times LowInc) + \beta_7 (STR \times HiEL \times LowInc) + u$

where HiEL = 1 if PctEL ≥ 10%, and LowInc = 1 if average income < \$15,000.

(a) How many different regression lines does this model allow?
(b) Write the marginal effect of STR for each group.
(c) If $\beta_7 > 0$, what does this mean economically?
(d) How would you test whether the STR effect is the same for all groups?

Solution

(a) Number of regression lines:

With two binary variables, we have $2 \times 2 = 4$ groups:

(HiEL = 0, LowInc = 0): High income, few English learners
(HiEL = 1, LowInc = 0): High income, many English learners
(HiEL = 0, LowInc = 1): Low income, few English learners
(HiEL = 1, LowInc = 1): Low income, many English learners

(b) Marginal effects of STR:

Group	Marginal Effect of STR
HiEL = 0, LowInc = 0	$\beta_1$
HiEL = 1, LowInc = 0	$\beta_1 + \beta_4$
HiEL = 0, LowInc = 1	$\beta_1 + \beta_5$
HiEL = 1, LowInc = 1	$\beta_1 + \beta_4 + \beta_5 + \beta_7$
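
The table can be generated by plugging each (HiEL, LowInc) combination into the derivative of the model with respect to STR. The coefficient values in the sketch below are hypothetical, chosen only to make the pattern visible; they are not estimates.

```python
from itertools import product

def str_slope(betas, hi_el, low_inc):
    """Marginal effect of STR for a (HiEL, LowInc) group.

    `betas` maps the coefficient index (1, 4, 5, 7) to its value.
    """
    return (betas[1] + betas[4] * hi_el + betas[5] * low_inc
            + betas[7] * hi_el * low_inc)

# Hypothetical coefficient values, for illustration only.
example_betas = {1: -1.0, 4: -0.5, 5: -0.3, 7: 0.2}

for hi_el, low_inc in product((0, 1), repeat=2):
    slope = str_slope(example_betas, hi_el, low_inc)
    print(f"HiEL={hi_el}, LowInc={low_inc}: dTestScore/dSTR = {slope:+.2f}")
```

Only the group with HiEL = 1 and LowInc = 1 picks up $\beta_7$, which shifts its slope away from the additive value $\beta_1 + \beta_4 + \beta_5$.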

(c) Economic meaning of $\beta_7 > 0$:

The triple interaction coefficient $\beta_7$ measures how far the STR slope in schools with both many English learners AND low income departs from what we'd expect from just adding the two separate interaction effects ($\beta_1 + \beta_4 + \beta_5$).

If $\beta_7 > 0$, the STR slope in these schools is higher (less negative) than the additive prediction. When larger classes lower test scores, this means schools facing both challenges gain less from smaller classes than the sum of the individual effects would suggest. A negative $\beta_7$ would instead indicate synergy between the two disadvantages.

(d) Testing equal STR effects:

$H_0: \beta_4 = \beta_5 = \beta_7 = 0$

This is a joint F-test with 3 restrictions. If we fail to reject, then STR has the same effect ($\beta_1$) for all groups.

Problem 4

A researcher estimates:
$ColGPA = \beta_0 + \beta_1 hsGPA + \beta_2 skipped + \beta_3 bfriend + \beta_4 (bfriend \times skipped) + u$

where bfriend = 1 if student has boyfriend/girlfriend, skipped = average classes skipped per week.

Individual t-tests show:
- bfriend: t = 1.2 (not significant)
- bfriend × skipped: t = -1.5 (not significant)

But the F-test for $H_0: \beta_3 = \beta_4 = 0$ gives F = 8.5 (p < 0.01).

(a) Explain this apparent contradiction.
(b) Should you conclude that having a boyfriend/girlfriend affects GPA?
(c) How would you interpret the interaction term if $\beta_4 < 0$?

Solution

(a) Explaining the contradiction:

This is a classic case of multicollinearity. The variables bfriend and bfriend×skipped are highly correlated because:

bfriend×skipped = 0 whenever bfriend = 0
bfriend×skipped = skipped whenever bfriend = 1

This correlation inflates standard errors, making individual t-tests less powerful. However, the joint F-test can still detect that together these variables explain significant variation in GPA.

(b) Conclusion about boyfriend/girlfriend effect:

Yes, based on the significant F-test, we should conclude that having a boyfriend/girlfriend affects GPA, but the effect depends on class attendance behavior. We cannot rely on individual t-tests due to multicollinearity.

(c) Interpretation of negative interaction ($\beta_4 < 0$):

If $\beta_4 < 0$, it means:

The negative effect of skipping class is worse for students with boyfriends/girlfriends
Or equivalently: Having a boyfriend/girlfriend is more harmful for students who skip classes frequently
This suggests that relationship distractions compound with poor attendance habits

Exam 2 Key Checklist: Interactions

1. Interpreting Interactions
• Binary × Binary: compare the 4 groups
• Binary × Continuous: 2 different regression lines
• Continuous × Continuous: conditional marginal effects
• Always use the After-Before method!

2. Hypothesis Testing
• Individual t-tests may fail due to multicollinearity
• Joint F-tests are more reliable
• Test both individual and joint hypotheses

3. Common Mistakes to Avoid
• Forgetting to include main effects
• Misinterpreting coefficients in the presence of interactions
• Relying only on t-tests when multicollinearity exists
• Not considering economic significance

4. Practical Tips
• Draw tables for binary interactions
• Calculate marginal effects at different values
• Check for multicollinearity
• Consider centering variables

Final preparation for Exam 2:
✓ Practice computing marginal effects when an interaction term is present
✓ Make a habit of organizing Binary × Binary results in a table
✓ Understand why joint tests matter under multicollinearity
✓ Be able to give an economic interpretation of real data
✓ Master the After-Before method
✓ Use a systematic approach to avoid calculation mistakes