Review of RESET

Regression Specification Error Test (RESET) has been used to detect general functional form misspecifications.

Consider \[ y = \beta_0 + \beta_1x_1 + ... + \beta_k x_k + u \]

RESET involves three steps: First, estimate the original model. Obtain the fitted values. Second, estimate a new model with polynomials of the fitted values. \[ y = \beta_0 + \beta_1x_1 + ... + \beta_k x_k + \delta_1\hat{y}^2 + \delta_2\hat{y}^3+u \] Third, test the joint significance of \(\delta_1\) and \(\delta_2\).

For this exercise, load the following packages.

library(wooldridge)
library(tidyverse)
library(car)

Example 9.2 Housing Price Equation

Use hprice1 data from the Wooldridge package. Consider two models for housing prices. \[ price = \beta_0 + \beta_1 lotsize + \beta_2 sqrft + \beta_2 bdrms + u \]

\[ lprice = \beta_0 + \beta_1 llotsize + \beta_2 lsqrft + \beta_2 bdrms + u \]

We want to compare whether the level-level model or the log-log model is preferred.

hprice <- hprice1

# Step 1
levelmod <- lm(data = hprice,price ~ 1 + lotsize + sqrft + bdrms)
hprice$predsq <- levelmod$fitted.values^2
hprice$predcb <- levelmod$fitted.values^3

# Step 2
levelmod_reset <- lm(data = hprice,
                     price ~ 1 + lotsize + sqrft + bdrms + predsq +predcb)

# Step 3
linearHypothesis(levelmod_reset,c("predsq=0", "predcb=0"))
## Linear hypothesis test
## 
## Hypothesis:
## predsq = 0
## predcb = 0
## 
## Model 1: restricted model
## Model 2: price ~ 1 + lotsize + sqrft + bdrms + predsq + predcb
## 
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     84 300724                              
## 2     82 269984  2     30740 4.6682 0.01202 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Repeat the same exercise for the log model. For the log model, I write a simple function for RESET as follows. This function takes an lmobject as input. An advantage of writing a function is that we donโ€™t have to copy-paste all the codes. This point will be demonstrated below.

Now apply this function to the log model.

logmod <- lm(data = hprice,
             lprice ~ 1 + llotsize + lsqrft + bdrms)
myresettest(logmod)
## Linear hypothesis test
## 
## Hypothesis:
## xpredsq = 0
## xpredcb = 0
## 
## Model 1: restricted model
## Model 2: y ~ x
## 
##   Res.Df    RSS Df Sum of Sq     F  Pr(>F)  
## 1     84 2.8626                             
## 2     82 2.6940  2   0.16854 2.565 0.08308 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can also apply the function to the level model.

myresettest(levelmod)
## Linear hypothesis test
## 
## Hypothesis:
## xpredsq = 0
## xpredcb = 0
## 
## Model 1: restricted model
## Model 2: y ~ x
## 
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     84 300724                              
## 2     82 269984  2     30740 4.6682 0.01202 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the level-level model, since p-value is statistically significant at 5% significance level, we reject the null hypothesis that \(\delta_1 =0, \delta_2 = 0\). Thus, there is evidence for functional form misspecification.

On the other hand, we do not reject the null hypothesis of log-log model at 5% level (although we would at 10% level). This means the log-log model is preferred on the basis of RESET.

However, a drawback of RESET is that it provides no real direction on how to proceed if the model is rejected. Rejecting the level model does not immediately suggest that the log-log model is the next step.