# Neal D. Goldstein, PhD, MBI

### Multilevel and longitudinal modeling approaches

#### Mixed effects versus GEE for the non-statistician

While much has been written comparing these two approaches from a statistical standpoint, there's limited practical discussion of the differences between them and what they mean for causal inference in epidemiology (probably the best discussion I've seen is this paper). As I recently needed to explain the difference to non-statisticians, I used this as an opportunity for a blog post comparing and contrasting the two modeling approaches when observations are correlated, thereby violating the independence assumption of regression: i.e., multilevel or longitudinal data.

From a practical standpoint, both modeling procedures (mixed effects using maximum likelihood estimation of coefficients and generalized estimating equations estimation of coefficients, hereafter referred to as mixed effects and GEE) allow predictive models to be constructed when multiple observations in a dataset are correlated with each other, as is common in longitudinal and multilevel work. In longitudinal data, the clustering unit is usually an individual with repeated outcome measures over time, and in multilevel data the clustering unit is usually a contextual area composed of individual units (like a neighborhood, physician office, hospital, country, etc.). Essentially, the two differ in the inference one wishes to make: cluster-specific effects (mixed effects) versus a single population-averaged effect (GEE).
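
To make that difference concrete for the schools/flu example discussed below, the two models might be sketched as follows. This assumes a random-intercept logistic model for the mixed effects approach; the notation is illustrative, with $Y_{ij}$ the flu status (1 = flu) of individual $i$ in neighborhood $j$ and $x_j$ the number of schools in that neighborhood.

$$
\text{Mixed effects:}\quad \operatorname{logit} \Pr(Y_{ij} = 1 \mid u_j) = \beta_0 + \beta_1 x_j + u_j, \qquad u_j \sim N(0, \sigma_u^2)
$$

$$
\text{GEE:}\quad \operatorname{logit} \Pr(Y_{ij} = 1) = \beta_0^{*} + \beta_1^{*} x_j, \qquad \operatorname{Corr}(Y_{ij}, Y_{i'j}) \text{ given by a working correlation matrix } R(\alpha)
$$

In the first model, $\beta_1$ is interpreted conditional on a neighborhood's random effect $u_j$, and $\sigma_u^2$ is the between-neighborhood variance that predictors can try to explain. In the second, $\beta_1^{*}$ is averaged over all neighborhoods and the correlation is only a nuisance. For a binary outcome with $\sigma_u^2 > 0$, $\beta_1^{*}$ is generally attenuated toward zero relative to $\beta_1$, so the two coefficients answer different questions.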

As a motivating example, consider the number of schools in a neighborhood as the measured variable and flu (yes/no at the individual level) as the outcome. Mixed effect modeling allows both fixed and random effects, while GEE modeling allows for fixed effects alone. A fixed effect is akin to a population effect: some measured variable is believed to have a single effect across the population. A random effect acknowledges that contextual factors may cause the effect of this measured variable to depart from its population value: that is, the random component allows context-specific effects. From a technical standpoint, a mixed effect model partitions the variance within and between neighborhoods, which allows the researcher to explain the area-level variance (between neighborhoods) by the predictors in the model, a useful feature since one of the goals of multilevel modeling is to identify sources of variance. The more variability you can explain, the better your model fits the data (and, hopefully, the better your predictors are at teasing out a causal connection). In a GEE model, the within-cluster correlation is in effect treated as a nuisance, handled through a working correlation structure and robust standard errors rather than modeled directly, meaning the researcher cannot describe changes in variability. Using the random effects approach allows the schools/flu relationship to have neighborhood-specific effects, while a GEE approach estimates this relationship as a single population-averaged effect across all neighborhoods. Personally, I struggle to see why there would be an a priori belief that the schools/flu relationship is global and cannot vary by neighborhood. So why does GEE exist, and what are its advantages? It has certain statistical properties that make it robust to misspecification of the within-cluster correlation structure, and this is a pretty important advantage (for a good discussion, see Hubbard et al.).
Yet GEE has some additional requirements, one of which is a sufficient number of clustering units (~50 individuals in repeated measures studies, or ~50 contextual units in multilevel studies) to produce reliable estimates. Further, it has been shown that a mixed effects approach can remain reliable in the face of assumption violations given sufficiently large samples (see Hox J, Multilevel Analysis: Techniques and Applications).
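
To illustrate what partitioning the variance within and between neighborhoods means, here is a minimal sketch using only the Python standard library. It assumes balanced clusters (the same number of individuals per neighborhood) and uses the classic one-way ANOVA (method of moments) estimator, not the maximum likelihood estimation a mixed effects model would actually use; the function name `variance_partition` and the flu data are hypothetical.

```python
from statistics import mean


def variance_partition(groups):
    """Split outcome variance into between- and within-cluster parts.

    Balanced one-way random-effects ANOVA (method-of-moments) estimator.
    `groups` maps a cluster id (e.g. a neighborhood) to a list of outcome
    values; every list must have the same length.
    Returns (between-cluster variance, within-cluster variance, ICC).
    """
    sizes = {len(ys) for ys in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal cluster sizes")
    n = sizes.pop()   # individuals per cluster
    k = len(groups)   # number of clusters
    if k < 2 or n < 2:
        raise ValueError("need at least 2 clusters with at least 2 individuals each")

    grand_mean = mean(y for ys in groups.values() for y in ys)
    cluster_means = {c: mean(ys) for c, ys in groups.items()}

    # Mean square between clusters: spread of cluster means around the grand mean.
    msb = n * sum((m - grand_mean) ** 2 for m in cluster_means.values()) / (k - 1)
    # Mean square within clusters: spread of individuals around their own cluster mean.
    msw = sum((y - cluster_means[c]) ** 2
              for c, ys in groups.items() for y in ys) / (k * (n - 1))

    var_between = max((msb - msw) / n, 0.0)  # truncate a negative estimate at zero
    var_within = msw
    total = var_between + var_within
    if total == 0:
        raise ValueError("outcome has no variance")
    icc = var_between / total  # share of total variance lying between clusters
    return var_between, var_within, icc


# Hypothetical flu status (1 = flu) for four individuals in each of four neighborhoods.
flu_by_neighborhood = {
    "A": [1, 1, 0, 1],
    "B": [0, 0, 0, 1],
    "C": [1, 0, 1, 1],
    "D": [0, 0, 0, 0],
}

between, within, icc = variance_partition(flu_by_neighborhood)
print(f"between = {between:.4f}, within = {within:.4f}, ICC = {icc:.3f}")
```

Here the ICC is on the observed 0/1 scale. In a mixed effects model, the analogous step is to estimate the between-neighborhood variance, add predictors such as number of schools, and see how much of that variance they account for.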

At the end of the day, I'll probably favor the mixed effects approach, as I'm more familiar with it and its underlying assumptions. I also want to describe the change in variance from introducing or removing parameters from the model, as advocated by Merlo et al. But the reader should make up his or her own mind, and for that I would recommend reviewing Hubbard et al. (specifically the table "Summary of Approaches for Mixed Models and GEE"), this series of tutorials by Merlo et al., and an interesting discussion on ResearchGate.
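
Merlo et al.'s tutorials describe area-level variance in multilevel logistic regression using measures such as the median odds ratio (MOR), an intraclass correlation on the latent log-odds scale, and the proportional change in variance (PCV) between models. Below is a minimal sketch of those calculations, assuming a random-intercept logistic model whose neighborhood-level variance has already been estimated. The function names and variance values are hypothetical and are not results from any real analysis.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z_75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(cluster_var):
    """Median odds ratio for a random-intercept logistic model.

    `cluster_var` is the estimated between-cluster (e.g. neighborhood) variance
    on the log-odds scale. MOR = exp(sqrt(2 * var) * Z_75); 1.0 means no
    between-cluster variation.
    """
    return math.exp(math.sqrt(2 * cluster_var) * Z_75)


def latent_icc(cluster_var):
    """ICC on the latent log-odds scale, taking the individual-level
    variance to be that of the standard logistic distribution, pi**2 / 3."""
    return cluster_var / (cluster_var + math.pi ** 2 / 3)


def proportional_change_in_variance(var_before, var_after):
    """Share of the between-cluster variance explained after adding predictors."""
    return (var_before - var_after) / var_before


# Hypothetical neighborhood-level variances, for illustration only.
empty_var = 0.5    # random-intercept model with no predictors
schools_var = 0.3  # same model after adding number of schools

for label, v in [("empty model", empty_var), ("+ schools", schools_var)]:
    print(f"{label}: MOR = {median_odds_ratio(v):.2f}, ICC = {latent_icc(v):.3f}")
print(f"PCV = {proportional_change_in_variance(empty_var, schools_var):.0%}")
```

A falling MOR and ICC, together with a positive PCV, is the kind of "change in variance" a GEE model cannot report, because it never estimates the between-neighborhood variance in the first place.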