The Regression Specification Error Test (RESET) is used to detect general functional form misspecification.
Consider \[ y = \beta_0 + \beta_1x_1 + ... + \beta_k x_k + u \]
RESET involves three steps: First, estimate the original model. Obtain the fitted values. Second, estimate a new model with polynomials of the fitted values. \[ y = \beta_0 + \beta_1x_1 + ... + \beta_k x_k + \delta_1\hat{y}^2 + \delta_2\hat{y}^3+u \] Third, test the joint significance of \(\delta_1\) and \(\delta_2\).
For this exercise, load the following packages.
library(wooldridge)
library(tidyverse)
library(car)
Use the hprice1 data from the wooldridge package. Consider two models for housing prices. \[
price = \beta_0 + \beta_1 lotsize + \beta_2 sqrft + \beta_3 bdrms + u
\]
\[ lprice = \beta_0 + \beta_1 llotsize + \beta_2 lsqrft + \beta_3 bdrms + u \]
We want to determine whether the level-level model or the log-log model is preferred.
hprice <- hprice1
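# Step 1: estimate the original model and store powers of the fitted values
levelmod <- lm(price ~ 1 + lotsize + sqrft + bdrms, data = hprice)
hprice$predsq <- levelmod$fitted.values^2
hprice$predcb <- levelmod$fitted.values^3
# Step 2: re-estimate the model with the squared and cubed fitted values added
levelmod_reset <- lm(price ~ 1 + lotsize + sqrft + bdrms + predsq + predcb,
                     data = hprice)
# Step 3: F test of the joint null that both added coefficients are zero
linearHypothesis(levelmod_reset, c("predsq = 0", "predcb = 0"))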
## Linear hypothesis test
##
## Hypothesis:
## predsq = 0
## predcb = 0
##
## Model 1: restricted model
## Model 2: price ~ 1 + lotsize + sqrft + bdrms + predsq + predcb
##
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1     84 300724
## 2     82 269984  2     30740 4.6682 0.01202 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
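The F statistic in the last row can be reproduced by hand from the two residual sums of squares, which may help clarify what the joint test is doing. The sketch below assumes the standard F formula for two restrictions and uses the rounded RSS values printed above, so it matches the reported numbers only up to rounding. The variable names are illustrative.

```r
# Values copied from the linearHypothesis output above (rounded as printed)
rss_restricted   <- 300724  # original model, 84 residual df
rss_unrestricted <- 269984  # model with squared and cubed fitted values
q     <- 2                  # number of restrictions (delta_1 = 0, delta_2 = 0)
df_ur <- 82                 # residual df of the unrestricted model

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / df_ur]
f_stat <- ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_ur)

# Upper-tail probability of the F(q, df_ur) distribution
p_value <- pf(f_stat, df1 = q, df2 = df_ur, lower.tail = FALSE)

round(f_stat, 3)   # about 4.668
round(p_value, 3)  # about 0.012
```

The numerator is the reduction in RSS per added term, and the denominator is the estimated error variance of the augmented model.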
Repeat the same exercise for the log model. For the log model, I wrap the three steps in a simple function, myresettest, which takes an lm object as input. An advantage of writing a function is that we don't have to copy and paste the code for each model. This point is demonstrated below.
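The definition of the function is not shown here. A minimal sketch that is consistent with the output printed below (coefficient names such as xpredsq and an auxiliary model y ~ x) might look like this. The body is an assumption, not the original code: it presumes that all regressors are numeric and that the car package is installed.

```r
# Hypothetical reconstruction of myresettest(); the original definition is not shown.
# Assumes numeric regressors and that the car package is installed.
myresettest <- function(mod) {
  y    <- mod$model[[1]]     # response: first column of the model frame
  pred <- mod$fitted.values  # fitted values from the original model
  # Regressor matrix: original regressors plus squared and cubed fitted values
  x <- cbind(as.matrix(mod$model[-1]), predsq = pred^2, predcb = pred^3)
  aux <- lm(y ~ x)           # coefficients are named "x" + column name
  # Joint F test that the two added terms have zero coefficients
  car::linearHypothesis(aux, c("xpredsq = 0", "xpredcb = 0"))
}
```

Because the function reads everything it needs from the fitted lm object, the same call works for any model whose regressors are stored as numeric columns.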
Now apply this function to the log model.
logmod <- lm(lprice ~ 1 + llotsize + lsqrft + bdrms, data = hprice)
myresettest(logmod)
## Linear hypothesis test
##
## Hypothesis:
## xpredsq = 0
## xpredcb = 0
##
## Model 1: restricted model
## Model 2: y ~ x
##
##   Res.Df    RSS Df Sum of Sq     F  Pr(>F)
## 1     84 2.8626
## 2     82 2.6940  2   0.16854 2.565 0.08308 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can also apply the function to the level model.
myresettest(levelmod)
## Linear hypothesis test
##
## Hypothesis:
## xpredsq = 0
## xpredcb = 0
##
## Model 1: restricted model
## Model 2: y ~ x
##
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1     84 300724
## 2     82 269984  2     30740 4.6682 0.01202 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
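As a cross-check, the same statistics can be obtained from a packaged implementation. The sketch below assumes the lmtest package is installed (it is not among the packages loaded above) and uses its resettest() function with squared and cubed fitted values. The models are refitted so that the snippet runs on its own.

```r
library(wooldridge)  # provides hprice1
library(lmtest)      # assumption: installed separately; provides resettest()

levelmod <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)
logmod   <- lm(lprice ~ llotsize + lsqrft + bdrms, data = hprice1)

# power = 2:3 adds the squared and cubed terms; type = "fitted" uses fitted values
reset_level <- resettest(levelmod, power = 2:3, type = "fitted")
reset_log   <- resettest(logmod,   power = 2:3, type = "fitted")

reset_level  # F statistic and p-value should agree with myresettest(levelmod)
reset_log    # likewise for the log-log model
```

Agreement between the hand-rolled function and the packaged test is a quick way to confirm that the auxiliary regression was set up correctly.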
For the level-level model, the p-value (0.012) is below 0.05, so we reject the null hypothesis that \(\delta_1 =0, \delta_2 = 0\) at the 5% significance level. Thus, there is evidence of functional form misspecification.
For the log-log model, on the other hand, the p-value (0.083) is above 0.05, so we do not reject the null hypothesis at the 5% level (although we would at the 10% level). On the basis of RESET, the log-log model is therefore preferred.
However, a drawback of RESET is that it provides no real direction on how to proceed if the model is rejected. Rejecting the level model does not immediately suggest that the log-log model is the next step.