Multilevel regression
This post is about what people usually mean when they say “hierarchical regression”. The term is thrown around a lot, and it’s one of those confusing statistical terms that means different things to different people.
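They don’t usually mean what statisticians refer to as hierarchical regression (adding predictors to an ordinary regression in successive blocks and checking how much each block improves the fit).
What people usually mean is hierarchical models, or what statisticians call multilevel models.
This is, I think, the more interesting of the two.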

Classic example: children within schools

Rephrasing of (very well phrased below, but I need a slightly more mathematical explanation and fewer words). The classic example is data on children nested within schools. The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school. Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like total enrollment, private vs. public, and mean socioeconomic status (SES). Because multiple children are measured from the same school, their measurements are not independent. Hierarchical modeling takes that into account, for example by giving each school its own intercept and treating those intercepts as draws from a common distribution, so that what is learned about one school informs the estimates for the others.
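The pooling idea can be made concrete with a minimal sketch under simplifying assumptions. It uses a varying-intercept model in which each school’s mean score is estimated by partial pooling. That estimate is a precision-weighted average of the school’s own mean and the overall mean. The within-school standard deviation `sigma_y` and the between-school standard deviation `sigma_school` are assumed known here, whereas a real multilevel model estimates them from the data. The scores are simulated, and `partial_pool` is a hypothetical helper rather than any library’s API.

```python
import random
import statistics


def partial_pool(school_scores, sigma_y, sigma_school):
    """Partial-pooling estimate of each school's mean score.

    school_scores: dict mapping school name -> list of child scores.
    sigma_y: within-school standard deviation (assumed known here).
    sigma_school: between-school standard deviation (assumed known here).
    """
    # Complete pooling: one mean for all children, ignoring schools.
    # (Simplification: a full model would estimate the population mean jointly.)
    all_scores = [s for scores in school_scores.values() for s in scores]
    grand_mean = statistics.fmean(all_scores)

    estimates = {}
    for school, scores in school_scores.items():
        n = len(scores)
        school_mean = statistics.fmean(scores)  # no pooling: this school alone
        w_data = n / sigma_y**2          # precision of the school's own mean
        w_prior = 1 / sigma_school**2    # precision of the between-school distribution
        # Schools with few children are pulled harder towards the grand mean.
        estimates[school] = (w_data * school_mean + w_prior * grand_mean) / (w_data + w_prior)
    return grand_mean, estimates


# Simulated data: true school means vary around 70; schools differ in size.
random.seed(42)
schools = {}
for i, n_children in enumerate([3, 5, 10, 25, 50, 100]):
    true_mean = random.gauss(70, 5)
    schools[f"school_{i}"] = [random.gauss(true_mean, 10) for _ in range(n_children)]

grand, est = partial_pool(schools, sigma_y=10, sigma_school=5)
for name, scores in schools.items():
    print(f"{name}: n={len(scores):3d}  raw mean={statistics.fmean(scores):5.1f}  pooled={est[name]:5.1f}")
```

Each pooled estimate lands between the school’s own mean and the grand mean. The weight on the school’s own data grows with its number of children, so small schools borrow the most from the rest.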

Another example: states within regions (538-like)
- \(y\): Trump vote share
- regions, with regional features
- states, with state features

Fully pooled: putting features from all levels onto each individual observation “requires dramatically different ranges for the explanatory variables to produce reliable coefficients” (see here).
- Countries do not show such differences in individual-level features.
- Country-level features are only weakly correlated with the outcome variable (correlation of about 0.25).
- individual level:
  - age
  - gender
  - income quintile
  - rural/small town/suburban/city
- district level:
  - number of seats elected in the district (PR)
- election level:
  - number of parties
  - number of seats
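The phrase “using features from all levels on the individual observation” can be made concrete with a minimal sketch. It copies district- and election-level features onto each respondent’s row. All records and field names here are hypothetical. Every higher-level feature is constant within its group, so it takes at most as many distinct values as there are groups, however many respondents there are. That is one way to see why such a regression learns little about those coefficients unless the groups differ substantially on them.

```python
# Hypothetical survey data: each respondent belongs to a district, and each
# district belongs to an election. All values are invented for illustration.
respondents = [
    {"id": 1, "age": 34, "gender": "F", "income_quintile": 2, "area": "city", "district": "D1"},
    {"id": 2, "age": 61, "gender": "M", "income_quintile": 4, "area": "rural", "district": "D1"},
    {"id": 3, "age": 45, "gender": "F", "income_quintile": 3, "area": "suburban", "district": "D2"},
    {"id": 4, "age": 29, "gender": "M", "income_quintile": 1, "area": "small town", "district": "D3"},
    {"id": 5, "age": 52, "gender": "F", "income_quintile": 5, "area": "city", "district": "D3"},
]
districts = {
    "D1": {"district_seats": 5, "election": "E1"},
    "D2": {"district_seats": 3, "election": "E1"},
    "D3": {"district_seats": 7, "election": "E2"},
}
elections = {
    "E1": {"n_parties": 6, "n_seats": 150},
    "E2": {"n_parties": 4, "n_seats": 200},
}


def fully_pooled_rows(respondents, districts, elections):
    """Copy district- and election-level features onto every respondent's row."""
    rows = []
    for person in respondents:
        district = districts[person["district"]]
        election = elections[district["election"]]
        rows.append({**person, **district, **election})
    return rows


rows = fully_pooled_rows(respondents, districts, elections)
# Higher-level features repeat within groups: 5 rows, but only 2 distinct n_parties values.
print(len(rows), len({r["n_parties"] for r in rows}))
```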
Copy about elections I don’t think I’ll use
An example closer to my own interests is a vastly simplified version of the 538 model for American presidential elections. More traditional “fundamentals” models take several country-level features, like GDP growth, unemployment, social polarisation and the presence or absence of civil unrest, to predict the national popular vote. A multilevel regression, by contrast, can take advantage of the similarities between states to “smooth over” the low volume of polling in smaller states. This makes a poll-based model potentially more accurate, particularly in the United States, where lots of high-quality polls are conducted regularly.
Formal setup
To drastically simplify the task of modelling an election, suppose we want to model the percentage \(y\) of registered American voters in each of the fifty states who will vote for Donald Trump in the upcoming 2020 election. We have a short list of important features for each state:
- average household income (\(\iota\))
- % of white people (\(\omega\))
- % of Black people (\(\lambda\))
- % of Hispanic people (\(\eta\))
- % of religious people (\(\rho\))
- % of registered voters who voted for Donald Trump in 2016 (\(\tau\))
We can then fit a simple linear regression to obtain coefficients \(\beta_0, \beta_1, \dots, \beta_6\) such that, for each state,
\[ y = \beta_0 + \beta_1\iota + \beta_2\omega + \beta_3\lambda + \beta_4\eta + \beta_5\rho + \beta_6\tau + \epsilon \]
where \(\epsilon\) is a normally distributed error term with mean zero. This, of course, rests on several assumptions about the data, such as a linear relationship between the features and \(y\) and errors that are independent across states.
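The regression above can be fitted by ordinary least squares with NumPy, as this minimal sketch shows. The state-level data are simulated and the “true” coefficients are invented, since the point is only to show the mechanics. `fit_state_regression` is a hypothetical helper name. Every state gets the same coefficients, and nothing in this model lets states in the same region share information.

```python
import numpy as np


def fit_state_regression(X, y):
    """Ordinary least squares fit of y = b0 + X @ b + error.

    X: (n_states, n_features) array of state features (iota, omega, lambda, eta, rho, tau).
    y: (n_states,) array of Trump vote share.
    Returns the coefficient vector [b0, b1, ..., b6].
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for beta_0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated data for 50 states; all values are invented for illustration.
rng = np.random.default_rng(0)
n_states = 50
X = np.column_stack([
    rng.uniform(40, 90, n_states),   # iota: average household income ($000s)
    rng.uniform(40, 80, n_states),   # omega: % white
    rng.uniform(0, 10, n_states),    # lambda: % Black
    rng.uniform(0, 10, n_states),    # eta: % Hispanic
    rng.uniform(30, 80, n_states),   # rho: % religious
    rng.uniform(30, 70, n_states),   # tau: % Trump vote in 2016
])
true_beta = np.array([5.0, -0.05, 0.1, -0.2, -0.1, 0.05, 0.8])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 1.0, n_states)  # epsilon ~ N(0, 1)

beta_hat = fit_state_regression(X, y)
print(np.round(beta_hat, 3))
```

Using `np.linalg.lstsq` rather than explicitly inverting \(X^\top X\) is the numerically safer way to solve the least-squares problem.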
Problems with this example:
- there isn’t an obvious intuitive reason to expect regions to have an effect the way the physicians do in the example PDF
- we aren’t using polls, so we are essentially back to the kind of “fundamentals” model I am deriding
- I need to think a bit more here and read a few more examples
A common grouping, used by the US Census Bureau, divides the American states into four regions: the Northeast, the Midwest, the South and the West.
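One possible way for the four regions to enter the model is as a varying intercept, sketched here as an assumption rather than a settled design. Write \(s\) for a state and \(r[s]\) for the region it belongs to. Each region then gets its own offset \(\alpha_r\), and the offsets are themselves drawn from a common distribution whose spread \(\sigma_\alpha\) is estimated from the data:

\[ y_s = \beta_0 + \alpha_{r[s]} + \beta_1\iota_s + \beta_2\omega_s + \beta_3\lambda_s + \beta_4\eta_s + \beta_5\rho_s + \beta_6\tau_s + \epsilon_s, \qquad \alpha_r \sim \mathcal{N}(0, \sigma_\alpha^2), \qquad \epsilon_s \sim \mathcal{N}(0, \sigma_y^2) \]

As \(\sigma_\alpha \to 0\) this collapses back to the single regression above (complete pooling). As \(\sigma_\alpha\) grows very large, each region’s offset is estimated from its own states alone (no pooling).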
Move more towards MRP, multilevel regression and post-stratification (maybe two posts)
References

Blog post on GLMs

Page on hierarchical linear regression from Virginia U

Disambiguation of the two common uses

538’s model

Chapter from a book on multilevel models about their use in election forecasting

Example using elections from Andrew Gelman’s book

Nice interactive introduction to multilevel models

From Statistical Rethinking:
Statistical models don’t contain turtles, but they do contain parameters. And parameters support inference. Upon what do parameters themselves stand? Sometimes, in some of the most powerful models, it’s parameters all the way down. What this means is that any particular parameter can be usefully regarded as a placeholder for a missing model. Given some model of how the parameter gets its value, it is simple enough to embed the new model inside the old one. This results in a model with multiple levels of uncertainty, each feeding into the next—a multilevel model.
McElreath suggests here that “multilevel regression deserves to be the default form of regression”.