Multilevel regression

This post is about what people usually mean when they say “hierarchical regression”. The term “hierarchical regression” is thrown around a lot, and it is one of those confusing terms in statistics that means different things to different people.

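They don’t usually mean what statisticians refer to as hierarchical regression, that is, entering blocks of predictors into an ordinary regression in successive steps and comparing the fits.

What they usually mean is hierarchical models, which statisticians also call multilevel models.

This is, I think, the more interesting of the two.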

  • Classic example: children within schools

  • To rephrase (the passage below is well phrased, but I need a slightly more mathematical explanation in fewer words): the classic example is data from children nested within schools. The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school. Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like total enrollment, private vs. public, and mean SES. Because multiple children are measured from the same school, their measurements are not independent. Hierarchical modeling takes that into account.

  • Another example: states within regions (538-like)

    • \(y\): Trump vote share
    • regions with regional features
    • states with state features
  • Fully pooled: using features from all levels on the individual observation “requires dramatically different ranges for the explanatory variables to produce reliable coefficients” (see here).

    • Countries do not show such differences in individual-level features
    • Country-level features do not vary strongly with the outcome variable (correlation -0.25)
    • individual level
      • age
      • gender
      • income quintile
      • rural/small town/suburban/city
    • district level
      • number of seats elected in the district (PR)
    • election level
      • number of parties
      • number of seats
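A slightly more mathematical version of the children-within-schools example, written as a sketch (the symbols are my own choice, not taken from any of the references): let \(y_{ij}\) be the math score of child \(i\) in school \(j\), \(x_{ij}\) a child-level predictor such as GPA, and \(z_j\) a school-level predictor such as mean SES. A varying-intercept model gives each school its own intercept \(\alpha_j\), which is itself modelled by a school-level regression:

\[
\begin{aligned}
y_{ij} &= \alpha_j + \beta x_{ij} + \epsilon_{ij}, & \epsilon_{ij} &\sim \mathcal{N}(0, \sigma_y^2),\\
\alpha_j &= \gamma_0 + \gamma_1 z_j + \nu_j, & \nu_j &\sim \mathcal{N}(0, \sigma_\alpha^2).
\end{aligned}
\]

Two children \(i \neq k\) in the same school share \(\nu_j\), so, given the predictors and assuming \(\nu_j\) and the \(\epsilon_{ij}\) are independent, \(\operatorname{Cov}(y_{ij}, y_{kj}) = \sigma_\alpha^2\) and the intra-class correlation is \(\sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_y^2)\). That is exactly the non-independence the model accounts for. The two extremes are the familiar alternatives: \(\sigma_\alpha = 0\) collapses to a single fully pooled regression, while letting \(\sigma_\alpha \to \infty\) amounts to estimating each school's intercept separately (no pooling).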

Copy about elections I don’t think I’ll use

An example a little closer to my interests is a vastly simplified version of the 538 model for American presidential elections. Traditional “fundamentals” models take several country-level features, like GDP growth, unemployment, social polarisation and the presence or absence of civil unrest, to predict the national popular vote. A multilevel regression, by contrast, can take advantage of the similarities between states to “smooth over” sparse polling in smaller states, potentially making a poll-based model more accurate, particularly in the United States, where many high-quality polls are conducted regularly.
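The “smoothing over” can be sketched in a few lines. This is a minimal illustration under strong simplifying assumptions, not 538's actual method: each state's polls are summarised by their average and their count, a single poll is assumed to scatter around the state's true value with a known standard deviation `sigma`, and true state values are assumed to scatter around a common mean `mu` with a known standard deviation `tau`. The partially pooled estimate is then a precision-weighted average of the state's own poll average and `mu`. The names `StatePolls` and `partial_pool`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StatePolls:
    name: str
    poll_mean: float  # average Trump share across the state's polls (0 to 1)
    n_polls: int      # number of polls, each treated as equally informative


def partial_pool(states, sigma, tau, mu=None):
    """Precision-weighted compromise between each state's poll average and mu.

    sigma: assumed standard deviation of a single poll around the state's true share
    tau:   assumed standard deviation of true state shares around the common mean mu
    Both are fixed by hand here; a multilevel model would estimate them from the data.
    """
    if mu is None:
        # Simplification: use the unweighted mean of the state averages as mu.
        mu = sum(s.poll_mean for s in states) / len(states)
    prior_precision = 1 / tau**2  # information carried by the group-level distribution
    estimates = {}
    for s in states:
        data_precision = s.n_polls / sigma**2  # information in the state's own polls
        estimates[s.name] = (data_precision * s.poll_mean + prior_precision * mu) / (
            data_precision + prior_precision
        )
    return estimates


# Hypothetical states: A is heavily polled, B barely polled.
states = [
    StatePolls("A", 0.40, 30),
    StatePolls("B", 0.55, 2),
    StatePolls("C", 0.48, 10),
]
pooled = partial_pool(states, sigma=0.05, tau=0.03)
for name, estimate in pooled.items():
    print(f"{name}: {estimate:.3f}")
```

State B, with only two polls, is pulled much further towards `mu` than state A with thirty. In a real multilevel model, `mu`, `sigma` and `tau` would be estimated from the data (for instance with PyMC or lme4) rather than fixed by hand.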

Formal setup

To drastically simplify the task of modelling an election, suppose we want to model the percentage \(y\) of registered voters in each of the fifty states who will vote for Donald Trump in the upcoming 2020 election. We have a short list of important features for each state:

  • average household income (\(\iota\))
  • % of white people (\(\omega\))
  • % of Black people (\(\lambda\))
  • % of Hispanic people (\(\eta\))
  • % of religious people (\(\rho\))
  • % of registered voters who voted for Donald Trump in 2016 (\(\tau\))

Then we can fit a multiple linear regression to estimate coefficients \(\beta_0, \beta_1, \dots, \beta_6\) such that for each state,

\[ y = \beta_0 + \beta_1\iota + \beta_2\omega + \beta_3\lambda + \beta_4\eta + \beta_5\rho + \beta_6\tau + \epsilon \]

where \(\epsilon\) is a normally distributed error term with mean zero. Of course, this relies on several assumptions about the data, among them that the errors for different states are independent and have constant variance.
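Fitting this regression is ordinary least squares. To keep the sketch self-contained, it solves the normal equations \(X^\top X \hat\beta = X^\top y\) using only the Python standard library; in practice `numpy.linalg.lstsq` or statsmodels' `OLS` would be the usual choice, and they are numerically more robust than the normal equations. The feature values and the `true_beta` coefficients below are synthetic and hypothetical, used only to check that the fit recovers known coefficients; `fit_ols` and `solve` are illustrative helper names.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features, y):
    """Return [beta_0, beta_1, ..., beta_k] minimising squared error, via X'X beta = X'y."""
    X = [[1.0] + list(row) for row in features]  # column of ones for the intercept beta_0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data for 50 "states": six features each, all rescaled to [0, 1]
# (income, % white, % Black, % Hispanic, % religious, 2016 Trump share).
rng = random.Random(2020)
true_beta = [0.05, -0.10, 0.30, -0.40, -0.20, 0.10, 0.80]  # hypothetical coefficients
features = [[rng.uniform(0, 1) for _ in range(6)] for _ in range(50)]
y = [
    true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)) + rng.gauss(0, 0.01)
    for row in features
]

beta_hat = fit_ols(features, y)
for i, (est, actual) in enumerate(zip(beta_hat, true_beta)):
    print(f"beta_{i}: estimated {est:+.3f}, true {actual:+.3f}")
```

With fifty states and seven coefficients there are plenty of degrees of freedom, so the estimates land close to the synthetic values; with real data the interesting question is whether the independence assumption holds.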

Problems with this example:

  • there isn’t any intuitive reason to expect the regions to have an impact like the physicians do in the example PDF
  • we aren’t using polls, so we are essentially back to the kind of “fundamentals” model that I am deriding
  • need to think a bit more here, read a few more examples

There is a common grouping of American states into four regions: the Northeast, the Midwest, the South and the West.
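One natural way to use this grouping, sketched here as an assumption rather than a settled design, is to let each region shift the baseline. Indexing states by \(s\) and writing \(r[s]\) for the region containing state \(s\), the regression above becomes a varying-intercept model:

\[ y_s = \beta_0 + \alpha_{r[s]} + \beta_1\iota_s + \beta_2\omega_s + \beta_3\lambda_s + \beta_4\eta_s + \beta_5\rho_s + \beta_6\tau_s + \epsilon_s, \qquad \alpha_r \sim \mathcal{N}(0, \sigma_\alpha^2) \]

for \(r\) ranging over the four regions. The region effects \(\alpha_r\) are partially pooled towards zero, by an amount controlled by \(\sigma_\alpha\). With only four regions, \(\sigma_\alpha\) itself will likely be poorly estimated, which is one more reason to doubt that regions are the right grouping for this example.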

Move more towards MRP (multilevel regression and poststratification), maybe across two posts

References

  • Blog post on GLMs

  • Page on hierarchical linear regression from Virginia U

  • Disambiguation of the two common uses

  • Wikipedia

  • 538’s model

  • Chapter from a book on multilevel models about their use in election forecasting

  • Example using elections from Andrew Gelman’s book

  • Nice interactive introduction to multilevel models

  • From Statistical Rethinking:

    Statistical models don’t contain turtles, but they do contain parameters. And parameters support inference. Upon what do parameters themselves stand? Sometimes, in some of the most powerful models, it’s parameters all the way down. What this means is that any particular parameter can be usefully regarded as a placeholder for a missing model. Given some model of how the parameter gets its value, it is simple enough to embed the new model inside the old one. This results in a model with multiple levels of uncertainty, each feeding into the next—a multilevel model.

    McElreath suggests here that “multilevel regression deserves to be the default form of regression”.