We want to estimate the model \[y = \mathbf{x\beta}+u,\quad E(u|\mathbf{x})=0.\]
However, \(y\) is not observed for some observations.
Suppose the selection process is \[s = \mathcal{1}[\mathbf{z\gamma} + v \ge 0],\] where \(s=1\) if \(y\) is observed and \(s=0\) otherwise.
Then we can show that
\[E(y|\mathbf{z},s=1) = \mathbf{x}\beta + \rho\lambda(\mathbf{z\gamma}),\] where \(\lambda(\mathbf{z\gamma})\) is the inverse Mills ratio.
This result means we can estimate \(\beta\) consistently using only the selected sample, provided we include \(\lambda(\mathbf{z\gamma})\) as an additional regressor. In practice, we correct for sample selection in two steps:
Step 1: Using all observations, estimate a probit model of \(s_i\) on \(\mathbf{z}_i\) and obtain the estimates \(\mathbf{\hat{\gamma}}\). Compute the inverse Mills ratio \(\hat{\lambda}_i = \lambda(\mathbf{z_i\hat{\gamma}}) = \phi(\mathbf{z_i\hat{\gamma}})/\Phi(\mathbf{z_i\hat{\gamma}})\) for each \(i\).

Step 2: Using the selected sample, i.e. the observations for which \(s_i = 1\), run the regression of \(y_i\) on \(\mathbf{x}_i\) and \(\hat{\lambda}_i\). Obtain \(\mathbf{\hat{\beta}}\).
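As a minimal sketch of what these two steps look like in R, suppose we have a data frame df with outcome y, regressors x1 and x2, and selection covariates z1 and z2 (these names are placeholders, not from the exercise below):

# Step 1: probit of the selection indicator on z, using all observations
df$s <- as.numeric(!is.na(df$y))
probit <- glm(s ~ z1 + z2, family = binomial(link = "probit"), data = df)
df$imr <- dnorm(probit$linear.predictors) / pnorm(probit$linear.predictors)
# Step 2: OLS of y on x and the inverse Mills ratio, selected sample only
heckit_by_hand <- lm(y ~ x1 + x2 + imr, data = df[df$s == 1, ])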
As usual, load wooldridge and tidyverse to the session. If we want to do Heckit from scratch, we do not need any extra packages; we also load sjPlot only for its tab_model() regression tables below.
library(wooldridge)
library(tidyverse)
library(sjPlot)   # provides tab_model() used below
Use the mroz data for this exercise.

Question 1. Using the 428 women who were in the workforce, regress \(\log(wage)\) on \(educ\) using OLS. Include \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), and \(kidsge6\) as explanatory variables. Report your estimate on \(educ\) and its standard error.
Question 2. Now, estimate the return to education by Heckit, where all exogenous variables show up in the second-stage regression. In other words, the regression is \(\log (wage)\) on \(educ\), \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), \(kidsge6\), and \(\hat{\lambda}\). Compare the estimated return to education and its standard error to those from Question 1. (Hint: We can access the linear fit of the probit using [glmobject]$linear.predictors.)
Question 3. Using only the 428 observations for working women, regress \(\hat{\lambda}\) on \(educ\), \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), and \(kidsge6\). How big is the \(R^2\)? How does this help explain your findings from Question 2?
# Question 1: OLS using only the 428 working women (lm drops the rows with missing lwage)
q1 <- lm(data = mroz,
         lwage ~ 1 + educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6)
tab_model(q1, show.ci = FALSE, show.stat = TRUE, show.se = TRUE,
          show.p = FALSE, p.style = "stars")
Dependent variable: lwage

| Predictors | Estimates | std. Error | Statistic |
|---|---|---|---|
| (Intercept) | -0.36 | 0.32 | -1.12 |
| educ | 0.10 *** | 0.02 | 6.62 |
| exper | 0.04 ** | 0.01 | 3.04 |
| expersq | -0.00 | 0.00 | -1.86 |
| nwifeinc | 0.01 | 0.00 | 1.72 |
| age | -0.00 | 0.01 | -0.65 |
| kidslt6 | -0.06 | 0.09 | -0.63 |
| kidsge6 | -0.02 | 0.03 | -0.63 |
| Observations | 428 | | |
| R2 / R2 adjusted | 0.164 / 0.150 | | |
# Step 1: probit of the selection indicator on all exogenous variables
mroz$s <- as.numeric(!is.na(mroz$lwage))   # s = 1 if the wage is observed
step1 <- glm(data = mroz,
             s ~ educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6,
             family = binomial(link = "probit"))
# inverse Mills ratio evaluated at the fitted probit index z_i * gamma-hat
mroz$imr <- dnorm(step1$linear.predictors) / pnorm(step1$linear.predictors)
# Step 2: OLS on the selected sample, adding the inverse Mills ratio as a regressor
step2 <- lm(data = mroz[mroz$s == 1, ],
            lwage ~ 1 + educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6 + imr)
tab_model(step2, show.ci = FALSE, show.stat = TRUE, show.se = TRUE,
          show.p = FALSE, p.style = "stars")
Dependent variable: lwage

| Predictors | Estimates | std. Error | Statistic |
|---|---|---|---|
| (Intercept) | -0.56 | 0.46 | -1.22 |
| educ | 0.12 *** | 0.03 | 3.48 |
| exper | 0.06 | 0.03 | 1.77 |
| expersq | -0.00 | 0.00 | -1.65 |
| nwifeinc | 0.00 | 0.00 | 0.86 |
| age | -0.01 | 0.01 | -0.82 |
| kidslt6 | -0.19 | 0.23 | -0.81 |
| kidsge6 | -0.01 | 0.03 | -0.42 |
| imr | 0.29 | 0.47 | 0.62 |
| Observations | 428 | | |
| R2 / R2 adjusted | 0.165 / 0.149 | | |
After the sample selection correction, the estimated return to education becomes somewhat larger (about 0.12 versus 0.10), but its standard error is roughly twice as large.
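To see this side by side, we can pull the \(educ\) row out of each coefficient table (a small check added here for illustration, not part of the original exercise code):

# educ coefficient and standard error: OLS vs. two-step Heckit
rbind(OLS    = summary(q1)$coefficients["educ", c("Estimate", "Std. Error")],
      Heckit = summary(step2)$coefficients["educ", c("Estimate", "Std. Error")])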
# Question 3: how well do the second-stage regressors explain the inverse Mills ratio?
q3 <- lm(data = mroz[mroz$s == 1, ],
         imr ~ 1 + educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6)
summary(q3)$r.squared
## [1] 0.9620133
The inverse Mills ratio is almost a perfect linear combination of the other regressors (\(R^2 \approx 0.96\)), so the second-stage regression suffers from severe multicollinearity. That is why the standard errors in Question 2 are so much larger than in Question 1.
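This \(R^2\) maps directly into the variance inflation factor for \(\hat{\lambda}\) in the second stage; a quick back-of-the-envelope check (not part of the original exercise) is:

# implied variance inflation factor for imr: VIF = 1 / (1 - R^2)
1 / (1 - summary(q3)$r.squared)   # about 26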
Instead of doing Heckit by hand, we can also use the heckit() function from the sampleSelection package.
library(sampleSelection)
mroz.heckit <- mroz %>%
  heckit(selection = s ~ educ + exper + expersq + nwifeinc +
           age + kidslt6 + kidsge6,
         outcome = lwage ~ educ + exper + expersq + nwifeinc +
           age + kidslt6 + kidsge6,
         data = .)
summary(mroz.heckit)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## 2-step Heckman / heckit estimation
## 753 observations (325 censored and 428 observed)
## 19 free parameters (df = 735)
## Probit selection equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.270077 0.508593 0.531 0.59556
## educ 0.130905 0.025254 5.183 2.82e-07 ***
## exper 0.123348 0.018716 6.590 8.37e-11 ***
## expersq -0.001887 0.000600 -3.145 0.00173 **
## nwifeinc -0.012024 0.004840 -2.484 0.01320 *
## age -0.052853 0.008477 -6.235 7.63e-10 ***
## kidslt6 -0.868328 0.118522 -7.326 6.24e-13 ***
## kidsge6 0.036005 0.043477 0.828 0.40786
## Outcome equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5602852 0.4587672 -1.221 0.222370
## educ 0.1187171 0.0340507 3.486 0.000518 ***
## exper 0.0598358 0.0336730 1.777 0.075987 .
## expersq -0.0010523 0.0006381 -1.649 0.099566 .
## nwifeinc 0.0038434 0.0044919 0.856 0.392492
## age -0.0111580 0.0134792 -0.828 0.408054
## kidslt6 -0.1880451 0.2308275 -0.815 0.415533
## kidsge6 -0.0122255 0.0296063 -0.413 0.679775
## Multiple R-Squared:0.1649, Adjusted R-Squared:0.1489
## Error terms:
## Estimate Std. Error t value Pr(>|t|)
## invMillsRatio 0.2885 0.4636 0.622 0.534
## sigma 0.6896 NA NA NA
## rho 0.4183 NA NA NA
## --------------------------------------------
The outcome-equation estimates are the same as the ones we obtained from doing Heckit by hand, and the invMillsRatio coefficient matches our imr coefficient.
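As a quick sanity check, we can print the hand-rolled second-stage coefficients next to the packaged estimates; coef() on the fitted selection object should return the full stacked parameter vector (selection equation, outcome equation, and error terms), and its outcome block should agree with coef(step2) up to rounding:

# hand-rolled second stage vs. the packaged two-step estimator
round(coef(step2), 4)
round(coef(mroz.heckit), 4)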
There are several definitions of the inverse Mills ratio, for example the ones listed on this forum. However, in the context of Heckit, we need to use the following definition:
\[\hat{\lambda}_i = \lambda(\mathbf{z_i\hat{\gamma}}) = \phi(\mathbf{z_i\hat{\gamma}})/\Phi(\mathbf{z_i\hat{\gamma}})\]
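In R this is simply the standard normal density divided by the standard normal CDF, exactly what the by-hand code above computes (the helper name lambda_hat below is ours, for illustration only):

# inverse Mills ratio used in Heckit: phi(x) / Phi(x)
lambda_hat <- function(x) dnorm(x) / pnorm(x)
lambda_hat(0)   # dnorm(0) / 0.5, roughly 0.798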
Proof: Suppose our original model is \(y = X\beta + u\) and the selection process is \(s = \mathcal{1}[\mathbf{z\gamma} + v \ge 0]\). Then,
\[ \begin{aligned} E(y_i | s_i = 1, z_i) &= E[X_i \beta + u_i|s_i = 1, z_i]\\ &= E[X_i \beta + u_i| v_i \ge -\mathbf{z_i\gamma}] \\ &= X_i \beta + E[u_i| v_i \ge -\mathbf{z_i\gamma}] \end{aligned} \]
Let \(\epsilon_i\) be the residual from the population regression of \(u\) on \(v\), so that \(\epsilon_i = u_i - \rho v_i\), where \(\rho = Cov(u,v)/Var(v) = Cov(u,v)\) since the probit normalization sets \(Var(v)=1\). By construction \(E(\epsilon)= 0\) and \(Cov(v_i,\epsilon_i)=0\); under joint normality of \((u,v)\), \(\epsilon\) is in fact independent of \(v\), so \(E[\epsilon_i \mid v_i \ge -\mathbf{z_i\gamma}] = 0\).
Therefore,
\[ \begin{aligned} E[u_i| v_i \ge -\mathbf{z_i\gamma}] &= E[\epsilon_i +\rho v_i| v_i \ge -\mathbf{z_i\gamma}] \\ &= E[\epsilon_i | v_i \ge -\mathbf{z_i\gamma}] + E[\rho v_i| v_i \ge -\mathbf{z_i\gamma}]\\ &= 0 + \rho E[v_i| v_i \ge -\mathbf{z_i\gamma}] \\ &= \rho E[v_i| v_i \ge -\mathbf{z_i\gamma}] \\ &=\rho\frac{\phi(\mathbf{-z_i\gamma})}{1-\Phi(\mathbf{-z_i\gamma})} \\ &=\rho\frac{\phi(\mathbf{z_i\gamma})}{\Phi(\mathbf{z_i\gamma})} \end{aligned} \]
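The truncated-mean formula in the last two lines is easy to verify numerically. Below is a small simulation sketch (the variable names and the values of rho and the cutoff are made up for illustration): draw jointly normal \((u, v)\) with \(Var(v)=1\) and check that the mean of \(u\) over the draws with \(v \ge -c\) is close to \(\rho\,\phi(c)/\Phi(c)\).

set.seed(1)
n   <- 1e6
rho <- 0.5
c0  <- 0.3                        # plays the role of z_i * gamma
v   <- rnorm(n)                   # Var(v) = 1, as in the probit normalization
u   <- rho * v + rnorm(n)         # so the regression coefficient of u on v is rho
mean(u[v >= -c0])                 # simulated E[u | v >= -c]
rho * dnorm(c0) / pnorm(c0)       # rho * lambda(c), the Heckit correction term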