Review of Heckit

We want to estimate the following model: \[y = \mathbf{x\beta}+u,\ E(u|\mathbf{x})=0.\]

However, \(y\) is not observed for some observations.

Suppose the selection process follows \[s = \mathbb{1}[\mathbf{z\gamma} + v \ge 0],\] where \(s=1\) if \(y\) is observed and \(s=0\) otherwise.

Then we can show that

\[E(y|\mathbf{z},s=1) = \mathbf{x}\beta + \rho\lambda(\mathbf{z\gamma}),\] where \(\lambda(\mathbf{z\gamma})\) is the inverse Mills ratio.

This equation means we can estimate \(\beta\) using only the selected sample, provided we include \(\lambda(\mathbf{z\gamma})\) as an additional regressor. In other words, we can correct for sample selection in two steps (a simulation sketch follows the list):

  1. Using all observations, estimate a probit model of \(s_i\) on \(\mathbf{z}_i\) and obtain the estimates \(\mathbf{\hat{\gamma}}\). Compute the inverse Mills ratio \(\hat{\lambda}_i = \lambda(\mathbf{z_i\hat{\gamma}}) = \phi(\mathbf{z_i\hat{\gamma}})/\Phi(\mathbf{z_i\hat{\gamma}})\) for each \(i\).

  2. Using the selected sample, i.e. the observations for which \(s_i = 1\), run the regression of \(y_i\) on \(\mathbf{x}_i\) and \(\hat{\lambda}_i\). Obtain \(\mathbf{\hat{\beta}}\).
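Before turning to real data, here is a minimal simulation sketch of the procedure (not from the text; all variable names and parameter values below are made up for illustration). Naive OLS on the selected sample is biased, while adding the inverse Mills ratio recovers the true coefficients:

# Simulate the model above with correlated errors: Corr(u, v) = 0.5
set.seed(1)
n  <- 5000
x1 <- rnorm(n)                                  # regressor in the outcome equation
z1 <- rnorm(n)                                  # appears only in the selection equation
v  <- rnorm(n)                                  # selection error, Var(v) = 1
u  <- 0.5 * v + rnorm(n, sd = sqrt(1 - 0.5^2))  # outcome error, Var(u) = 1
y  <- 1 + 2 * x1 + u                            # true coefficients are (1, 2)
s  <- as.numeric(0.5 + x1 + z1 + v >= 0)        # selection indicator
y[s == 0] <- NA                                 # y unobserved when s = 0
# Step 1: probit of s on z, then the inverse Mills ratio
p1  <- glm(s ~ x1 + z1, family = binomial(link = "probit"))
imr <- dnorm(p1$linear.predictors) / pnorm(p1$linear.predictors)
# Step 2: OLS on the selected sample (lm() drops rows with NA in y)
coef(lm(y ~ x1))        # naive OLS: biased
coef(lm(y ~ x1 + imr))  # Heckit: intercept and slope close to (1, 2)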


Heckit by Hand

As usual, load wooldridge and tidyverse into the session. Doing Heckit from scratch requires no extra packages; we also load sjPlot for the tab_model() regression tables.

library(wooldridge)
library(tidyverse)
library(sjPlot)  # for tab_model()

Computer Exercise 17.7

  1. Use the mroz data for this exercise. Using the 428 women who were in the workforce, regress \(\log(wage)\) on \(educ\) by OLS, including \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), and \(kidsge6\) as additional explanatory variables. Report the estimated coefficient on \(educ\) and its standard error.

  2. Now, estimate the return to education by Heckit, where all exogenous variables appear in the second-stage regression. In other words, the regression is \(\log (wage)\) on \(educ\), \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), \(kidsge6\), and \(\hat{\lambda}\). Compare the estimated return to education and its standard error to those from Question 1. (Hint: we can access the linear fit of the probit via [glmobject]$linear.predictors.)

  3. Using only the 428 observations for working women, regress \(\hat{\lambda}\) on \(educ\), \(exper\), \(exper^2\), \(nwifeinc\), \(age\), \(kidslt6\), and \(kidsge6\). How big is the \(R^2\)? How does this help explain your findings from Question 2?

Question 1

q1 <- lm(data = mroz, 
         lwage ~ 1 + educ + 
           exper + expersq + nwifeinc + age + kidslt6 + kidsge6)
tab_model(q1, show.ci = FALSE, show.stat = TRUE, show.se = TRUE,
          show.p = FALSE, p.style = "stars")
                     lwage
Predictors           Estimates    std. Error   Statistic
(Intercept)          -0.36        0.32         -1.12
educ                  0.10 ***    0.02          6.62
exper                 0.04 **     0.01          3.04
expersq              -0.00        0.00         -1.86
nwifeinc              0.01        0.00          1.72
age                  -0.00        0.01         -0.65
kidslt6              -0.06        0.09         -0.63
kidsge6              -0.02        0.03         -0.63
Observations          428
R2 / R2 adjusted      0.164 / 0.150
* p<0.05   ** p<0.01   *** p<0.001

Question 2

# Step 1: selection indicator (1 if lwage is observed, i.e. the woman worked)
mroz$s <- as.numeric(!is.na(mroz$lwage))
step1 <- glm(data = mroz,
             s ~ educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6,
             family = binomial(link = "probit"))
# Inverse Mills ratio evaluated at the fitted probit index
mroz$imr <- dnorm(step1$linear.predictors) / pnorm(step1$linear.predictors)
# Step 2: OLS on the selected sample, with the IMR as an extra regressor
step2 <- lm(data = mroz[mroz$s == 1, ],
            lwage ~ 1 + educ +
              exper + expersq + nwifeinc + age + kidslt6 + kidsge6 + imr)
tab_model(step2, show.ci = FALSE, show.stat = TRUE, show.se = TRUE,
          show.p = FALSE, p.style = "stars")
                     lwage
Predictors           Estimates    std. Error   Statistic
(Intercept)          -0.56        0.46         -1.22
educ                  0.12 ***    0.03          3.48
exper                 0.06        0.03          1.77
expersq              -0.00        0.00         -1.65
nwifeinc              0.00        0.00          0.86
age                  -0.01        0.01         -0.82
kidslt6              -0.19        0.23         -0.81
kidsge6              -0.01        0.03         -0.42
imr                   0.29        0.47          0.62
Observations          428
R2 / R2 adjusted      0.165 / 0.149
* p<0.05   ** p<0.01   *** p<0.001

After the sample selection correction, the estimated return to education rises from 0.10 to 0.12, but its standard error roughly doubles.


Question 3

# Regress the IMR on the other second-stage regressors (selected sample only)
q3 <- lm(data = mroz[mroz$s == 1, ],
         imr ~ 1 + educ + exper + expersq + nwifeinc + age + kidslt6 + kidsge6)
summary(q3)$r.squared
## [1] 0.9620133

The \(R^2\) is about 0.96, so \(\hat{\lambda}\) is nearly a linear combination of the other regressors. This is because \(\mathbf{x}\) and \(\mathbf{z}\) contain exactly the same variables (there is no exclusion restriction), so \(\hat{\lambda}\) is identified only through the nonlinearity of the inverse Mills ratio. The resulting multicollinearity in the second-stage regression is what inflates the standard errors in Question 2.
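One way to quantify this is through variance inflation factors. A sketch, assuming the car package (not used elsewhere in these notes) is installed:

# VIFs for the second-stage regression; the VIF for imr is
# 1 / (1 - R^2) = 1 / (1 - 0.962), roughly 26
library(car)
vif(step2)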


Automatic Heckit

Instead of doing Heckit by hand, we can use the heckit() function from the sampleSelection package.

library(sampleSelection)
# mroz is piped into heckit()'s data argument; selection is the probit
# equation and outcome is the wage equation
mroz.heckit <- mroz %>% 
  heckit(selection = s ~ educ + exper + expersq + nwifeinc +
           age + kidslt6 + kidsge6,
         outcome = lwage ~ educ + exper + expersq + nwifeinc +
           age + kidslt6 + kidsge6)
summary(mroz.heckit)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## 2-step Heckman / heckit estimation
## 753 observations (325 censored and 428 observed)
## 19 free parameters (df = 735)
## Probit selection equation:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.270077   0.508593   0.531  0.59556    
## educ         0.130905   0.025254   5.183 2.82e-07 ***
## exper        0.123348   0.018716   6.590 8.37e-11 ***
## expersq     -0.001887   0.000600  -3.145  0.00173 ** 
## nwifeinc    -0.012024   0.004840  -2.484  0.01320 *  
## age         -0.052853   0.008477  -6.235 7.63e-10 ***
## kidslt6     -0.868328   0.118522  -7.326 6.24e-13 ***
## kidsge6      0.036005   0.043477   0.828  0.40786    
## Outcome equation:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.5602852  0.4587672  -1.221 0.222370    
## educ         0.1187171  0.0340507   3.486 0.000518 ***
## exper        0.0598358  0.0336730   1.777 0.075987 .  
## expersq     -0.0010523  0.0006381  -1.649 0.099566 .  
## nwifeinc     0.0038434  0.0044919   0.856 0.392492    
## age         -0.0111580  0.0134792  -0.828 0.408054    
## kidslt6     -0.1880451  0.2308275  -0.815 0.415533    
## kidsge6     -0.0122255  0.0296063  -0.413 0.679775    
## Multiple R-Squared:0.1649,   Adjusted R-Squared:0.1489
##    Error terms:
##               Estimate Std. Error t value Pr(>|t|)
## invMillsRatio   0.2885     0.4636   0.622    0.534
## sigma           0.6896         NA      NA       NA
## rho             0.4183         NA      NA       NA
## --------------------------------------------

The outcome-equation estimates are identical to the ones we obtained by hand; heckit() additionally reports the implied \(\hat{\sigma}\) and \(\hat{\rho}\).
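To verify this directly, print both coefficient vectors (a quick sketch; coef() on a selection object returns the probit, outcome, and error-term parameters in a single named vector):

coef(step2)        # by-hand second stage; the IMR coefficient is named imr
coef(mroz.heckit)  # packaged fit; the IMR coefficient is named invMillsRatio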


Issues with the Inverse Mills Ratio

There are several definitions of the inverse Mills ratio in circulation, for example the ones listed on this forum. In the context of Heckit, however, we need the following definition:

\[\hat{\lambda}_i = \lambda(\mathbf{z_i\hat{\gamma}}) = \phi(\mathbf{z_i\hat{\gamma}})/\Phi(\mathbf{z_i\hat{\gamma}})\]

Proof: Suppose our original model is \(y = X\beta + u\) and the selection process is \(s = \mathbb{1}[\mathbf{z\gamma} + v \ge 0]\), where \((u, v)\) is jointly normal, independent of \(\mathbf{z}\), with \(Var(v) = 1\). Then,

\[ \begin{aligned} E(y_i | s_i = 1, z_i) &= E[X_i \beta + u_i|s_i = 1, z_i]\\ &= E[X_i \beta + u_i| v_i \ge -\mathbf{z_i\gamma}] \\ &= X_i \beta + E[u_i| v_i \ge -\mathbf{z_i\gamma}] \end{aligned} \]

Let \(\epsilon_i\) be the residual from the population regression of \(u\) on \(v\). Since \(Var(v) = 1\), the regression coefficient is \(Cov(u, v) = \rho\), so \(\epsilon_i = u_i - \rho v_i\) with \(E(\epsilon_i) = 0\) and \(Cov(v_i, \epsilon_i) = 0\); by joint normality, \(\epsilon_i\) is independent of \(v_i\), so conditioning on \(v_i\) does not change its mean.

Therefore,

\[ \begin{aligned} E[u_i| v_i \ge -\mathbf{z_i\gamma}] &= E[\epsilon_i +\rho v_i| v_i \ge -\mathbf{z_i\gamma}] \\ &= E[\epsilon_i | v_i \ge -\mathbf{z_i\gamma}] + \rho E[v_i| v_i \ge -\mathbf{z_i\gamma}]\\ &= 0 + \rho E[v_i| v_i \ge -\mathbf{z_i\gamma}] \\ &=\rho\frac{\phi(-\mathbf{z_i\gamma})}{1-\Phi(-\mathbf{z_i\gamma})} \\ &=\rho\frac{\phi(\mathbf{z_i\gamma})}{\Phi(\mathbf{z_i\gamma})} = \rho\lambda(\mathbf{z_i\gamma}). \end{aligned} \]

The fourth equality is the mean of a standard normal truncated from below at \(-\mathbf{z_i\gamma}\), and the last step uses the symmetry of the normal distribution: \(\phi(-a) = \phi(a)\) and \(1 - \Phi(-a) = \Phi(a)\).
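A quick numerical check of these last two steps (a sketch; the value 0.7 is an arbitrary stand-in for \(\mathbf{z_i\gamma}\)):

# Monte Carlo check: E[v | v >= -a] equals dnorm(a)/pnorm(a)
set.seed(1)
v <- rnorm(1e6)               # draws from the standard normal
a <- 0.7
mean(v[v >= -a])              # simulated truncated mean
dnorm(-a) / (1 - pnorm(-a))   # fourth line of the derivation
dnorm(a) / pnorm(a)           # last line, identical by symmetry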