By Justin Esarey
There’s a well-known bit of code for estimating Liang and Zeger (1986) type cluster robust standard errors for GLM models in R (see also Rogers 1993), but it doesn’t work exactly right off-the-shelf for multinomial models estimated in the mlogit package. In support of another project, I’ve modified it slightly to work with mlogit models. I thought it might be nice to provide this code to anyone interested in using it:
# load in a function to create clustered standard errors for mlogit models
# initial code by Mahmood Ara: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/
# slightly modified for mlogit models by Justin Esarey on 3/3/2015
cl.mlogit
function
(fm, cluster){
# fm: a fitted mlogit model
# cluster: a data vector with the cluster
# identity of each observation in fm
require
(sandwich, quietly =
TRUE
)
require
(lmtest, quietly =
TRUE
)
M
length
(
unique
(cluster))
N
length
(cluster)
K
length
(
coefficients
(fm))
# edit 6/22/2015: change dfc
# dfc
dfc
uj
apply
(
estfun
(fm),2,
function
(x)
tapply
(x, cluster, sum));
vcovCL
sandwich
(fm, meat.=
crossprod
(uj)/N)
coeftest
(fm, vcovCL) }
And here’s a little example you can test it on:
require
(VGAMdata)
data
(vtinpat)
vtinpat$hos.num
as.numeric
(vtinpat$hospital)
vtinpat$age
as.numeric
(vtinpat$age.group)
vtinpat.mlogit
mlogit.data
(vtinpat, choice =
"admit"
, shape =
"wide"
)
vt.mod
mlogit
(admit ~ 0 | age + sex, data = vtinpat.mlogit)
summary
(vt.mod)
cl.mlogit
(vt.mod, vtinpat$hos.num)
You can verify that this yields the same results as mlogit with cluster in Stata by writing the data file out with library(foreign) and read.dta, then running an identical model.
The reason for writing this code is to eventually use it as a part of another project:
http://cran.r-project.org/web/packages/clusterSEs/index.html
Pairs cluster bootstrapping seems to work much better when you use cluster-robust standard errors for the replicates, and until now I couldn’t estimate them in R (had to do it in Stata). With this code, I should be able to adapt the pairs cluster bootstrap procedure for mlogit models in R.
[Update 6/22/2015]: Further checking indicated that Stata actually uses a slightly different degrees of freedom correction for mlogit (and MLE models in general) than was initially included in this code. The commented (original) code includes the “regression-like” correction while the uncommented (updated) code includes the “asymptotic-like” correction referenced on pp. 1865-1866 of the Stata 13 [R] manual; see also p. 54 here. The answers with the updated dfc code are closer to Stata’s answers for mlogit models.
References
Liang, Kung-Yee and Scott L. Zeger. 1986. “Longitudinal data analysis using generalized linear models.” Biometrika 73(1):13-22.
Rogers, William. 1993. “Regression standard errors in clustered samples.” Stata Technical Bulletin 13: 19-23.