library(plm)
library(lfe)
library(dplyr)
Load data from the plm
package.
data(Produc, package="plm")
Produc <- pdata.frame(Produc, index = c('state', 'year'), stringsAsFactors = FALSE)
Run a unit fixed effects model, obtaining a coefficient of
0.145
.
plm(pcap~ gsp, data= Produc , model = 'within',
index = c('state', 'year')) %>% summary %>% coefficients()
## Estimate Std. Error t-value Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
lfe
and lm
yield identical results.
# lfe
felm(pcap~ gsp|state|0|0, data = Produc) %>% summary %>% coefficients()
## Estimate Std. Error t value Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
# lsdv
lm(pcap~ gsp + as.factor(state), data= Produc) %>% summary %>% coefficients() %>%
as.data.frame %>% slice(2)
## Estimate Std. Error t value Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
However, if we sort the data differently, the coefficient from
plm
is completely different, at 0.386
, while
other methods behavior normally.
Produc <- Produc %>% arrange(year, state)
plm(pcap~ gsp, data= Produc , model = 'within',
index = c('state', 'year')) %>% summary %>% coefficients
## Estimate Std. Error t-value Pr(>|t|)
## gsp 0.3868028 0.003447077 112.2118 0
# lfe
felm(pcap~ gsp|state|0|0, data = Produc) %>% summary %>% coefficients()
## Estimate Std. Error t value Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
If we rerun pdata.frame()
, things become normal
again.
class(Produc) <- "data.frame" # drop "pdata.frame" class label
Produc <- pdata.frame(Produc, index = c('state', 'year'), stringsAsFactors = FALSE)
plm(pcap~ gsp, data= Produc , model = 'within',
index = c('state', 'year')) %>% summary %>% coefficients
## Estimate Std. Error t-value Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
The same issue appears with time fixed effects and twoway fixed
effects models in plm
.
plm(pcap~ gsp, data= Produc , model = 'within', effect = 'twoway',
index = c('state', 'year')) %>% summary %>% coefficients()
## Estimate Std. Error t-value Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39
# lfe
felm(pcap~ gsp|state + year|0|0, data = Produc) %>% summary %>% coefficients()
## Estimate Std. Error t value Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39
Produc <- Produc %>% arrange(year, state)
plm(pcap~ gsp, data= Produc, model = 'within', effect = 'twoway',
index = c('state', 'year') ) %>% summary %>% coefficients()
## Estimate Std. Error t-value Pr(>|t|)
## gsp 0.3870266 0.003463795 111.7348 0
# lfe
felm(pcap~ gsp|state + year|0|0, data = Produc) %>% summary %>% coefficients()
## Estimate Std. Error t value Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39
The above illustration does not necessarily indicate a bug in the
plm package. However, it could potentially result in
misleading outcomes if users unintentionally sort the data differently.
In my opinion, it would be ideal that plm()
can sort the
data internally to avoid such problems, or at the very least, notify
users that the data should be sorted appropriately (first unit, then
time).