library(plm)
library(lfe)
library(dplyr)

Benchmark

Load data from the plm package.

data(Produc, package="plm")
Produc <- pdata.frame(Produc, index = c('state', 'year'), stringsAsFactors = FALSE)

Run a unit fixed effects model, obtaining a coefficient of 0.145.

plm(pcap~ gsp,  data= Produc , model = 'within', 
    index = c('state', 'year')) %>% summary %>% coefficients()
##      Estimate  Std. Error  t-value     Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91

lfe and lm yield identical results.

# lfe
felm(pcap~ gsp|state|0|0, data = Produc) %>% summary %>% coefficients()
##      Estimate  Std. Error  t value     Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91
# lsdv
lm(pcap~ gsp + as.factor(state),  data= Produc) %>% summary  %>% coefficients() %>% 
  as.data.frame %>% slice(2)
##      Estimate  Std. Error  t value     Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91

When the “problem” appears

However, if we sort the data differently, the coefficient from plm is completely different, at 0.386, while other methods behavior normally.

Produc <- Produc %>% arrange(year, state)
plm(pcap~ gsp,  data= Produc , model = 'within', 
    index = c('state', 'year')) %>% summary %>% coefficients
##      Estimate  Std. Error  t-value Pr(>|t|)
## gsp 0.3868028 0.003447077 112.2118        0
# lfe
felm(pcap~ gsp|state|0|0, data = Produc) %>% summary %>% coefficients()
##      Estimate  Std. Error  t value     Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91

Fix the “problem”

If we rerun pdata.frame(), things become normal again.

class(Produc) <- "data.frame" # drop "pdata.frame" class label
Produc <- pdata.frame(Produc, index = c('state', 'year'), stringsAsFactors = FALSE)
plm(pcap~ gsp,  data= Produc , model = 'within', 
    index = c('state', 'year')) %>% summary %>% coefficients
##      Estimate  Std. Error  t-value     Pr(>|t|)
## gsp 0.1450668 0.006240198 23.24715 6.652442e-91

TWFE

The same issue appears with time fixed effects and twoway fixed effects models in plm.

plm(pcap~ gsp,  data= Produc , model = 'within',   effect = 'twoway',
     index = c('state', 'year')) %>% summary %>% coefficients()
##       Estimate  Std. Error t-value     Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39
# lfe
felm(pcap~ gsp|state + year|0|0, data = Produc) %>% summary %>% coefficients()
##       Estimate  Std. Error t value     Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39

When the data is sorted differently

Produc <- Produc %>% arrange(year, state)
plm(pcap~ gsp,  data= Produc, model = 'within',   effect = 'twoway',
     index = c('state', 'year')  ) %>% summary %>% coefficients()
##      Estimate  Std. Error  t-value Pr(>|t|)
## gsp 0.3870266 0.003463795 111.7348        0
# lfe
felm(pcap~ gsp|state + year|0|0, data = Produc) %>% summary %>% coefficients()
##       Estimate  Std. Error t value     Pr(>|t|)
## gsp 0.09222371 0.006612866 13.9461 1.770587e-39

Summary

The above illustration does not necessarily indicate a bug in the plm package. However, it could potentially result in misleading outcomes if users unintentionally sort the data differently. In my opinion, it would be ideal that plm() can sort the data internally to avoid such problems, or at the very least, notify users that the data should be sorted appropriately (first unit, then time).