Bayesian statistics

A subject I want to formalise my understanding of a little more. Maybe I'll write a few things about it.

Notes on Bayesian Data Analysis by Gelman et al #

I. Fundamentals of Bayesian inference #

1. Probability and inference
A really nice (re)introduction to probability theory. It's one of those high-level introductions that gets across the essence of an idea, but it probably only works if you already have a basic understanding of the material; otherwise it's impossible to follow. I've probably been tripped up by resources like this a bunch in the past.
- Exchangeability: related to i.i.d. but slightly different (the joint distribution is unchanged when the indices are permuted; i.i.d. implies exchangeable, but not the other way round). I hadn't heard of it before.
- Lots of interesting stuff here about the philosophy of statistics, Bayesian vs. frequentist definitions of uncertainty and probability. Nice questions that again I hadn’t really given too much thought to.
- Great quote explaining why combining simple pieces to create complexity is better than writing down one complicated distribution:
  
  Useful probability models often express the distribution of observables conditionally or hierarchically rather than through more complicated unconditional distributions. For example, suppose \(y\) is the height of a university student selected at random. The marginal distribution \(p(y)\) is (essentially) a mixture of two approximately normal distributions centered around 160 and 175 centimeters. A more useful description of the distribution of \(y\) would be based on the joint distribution of height and sex: \(p(\textrm{male}) \approx p(\textrm{female}) \approx \frac12\), along with the conditional specifications that \(p(y\mid\textrm{female})\) and \(p(y\mid\textrm{male})\) are each approximately normal with means 160 and 175 cm, respectively. If the conditional variances are not too large, the marginal distribution of \(y\) is bimodal. In general, we prefer to model complexity with a hierarchical structure using additional variables rather than with complicated marginal distributions, even when the additional variables are unobserved or even unobservable; this theme underlies mixture models.
- Idea: As a nice blog post on introductory Bayesian statistics, reproduce the Bayes / Laplace billiard ball example using Monte Carlo methods to estimate the integrals. Maybe in Stan.
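
A rough sketch of what the billiard ball idea could look like, in plain Python rather than Stan. The setup here is my own assumption about how to phrase the example: a first ball lands at a uniform position \(p\) on a table of unit length, then \(n\) more balls are rolled and \(x\) of them land to its left. Simulating the whole process and keeping only the runs where \(x\) matches the observed count gives Monte Carlo draws from the posterior of \(p\), which can be compared with the analytic result \(\textrm{Beta}(x+1, n-x+1)\). The function names, the seed, and the values of `n_rolls` and `x_obs` are all made up for illustration.

```python
import random


def simulate_billiards(n_rolls, n_sims, seed=0):
    """Simulate the billiard table n_sims times.

    Returns a list of (p, x) pairs: p is the position of the first ball and
    x is how many of the n_rolls later balls landed to its left.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        p = rng.random()  # uniform prior on the first ball's position
        x = sum(rng.random() < p for _ in range(n_rolls))
        draws.append((p, x))
    return draws


def posterior_mean_mc(draws, x_obs):
    """Average p over the simulations whose count equals x_obs (rejection sampling)."""
    kept = [p for p, x in draws if x == x_obs]
    return sum(kept) / len(kept)


def marginal_mc(draws, x_obs):
    """Monte Carlo estimate of P(X = x_obs), i.e. the integral over p in [0, 1]
    of C(n, x) p^x (1 - p)^(n - x)."""
    return sum(x == x_obs for _, x in draws) / len(draws)


# Illustrative values only: 10 rolls, 3 of them to the left of the first ball.
n_rolls, x_obs = 10, 3
draws = simulate_billiards(n_rolls, n_sims=100_000)

mc_mean = posterior_mean_mc(draws, x_obs)
exact_mean = (x_obs + 1) / (n_rolls + 2)  # mean of Beta(x + 1, n - x + 1)

mc_marginal = marginal_mc(draws, x_obs)
exact_marginal = 1 / (n_rolls + 1)  # every count 0..n is equally likely a priori

print(f"posterior mean of p: MC {mc_mean:.3f} vs exact {exact_mean:.3f}")
print(f"P(X = {x_obs}):        MC {mc_marginal:.3f} vs exact {exact_marginal:.3f}")
```

Rejection sampling wastes most of the simulations, so it only works for small problems like this one, but it makes the point that the integrals can be estimated by counting rather than by calculus.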

2. Single-parameter models
This is a general feature of Bayesian inference: the posterior distribution is centered at a point that represents a compromise between the prior information and the data, and the compromise is controlled to a greater extent by the data as the sample size increases.
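
To make the "compromise" concrete for myself, here is a small sketch using the beta-binomial model. With a \(\textrm{Beta}(a, b)\) prior on \(\theta\) and \(y\) successes in \(n\) trials, the posterior is \(\textrm{Beta}(a + y, b + n - y)\), so the posterior mean \((a + y)/(a + b + n)\) is a weighted average of the prior mean \(a/(a+b)\) and the sample proportion \(y/n\), with weight \((a+b)/(a+b+n)\) on the prior. The particular prior and data below are assumptions picked for illustration, not numbers from the book.

```python
def posterior_mean(a, b, y, n):
    """Posterior mean of theta under a Beta(a, b) prior after y successes in n trials."""
    return (a + y) / (a + b + n)


def prior_weight(a, b, n):
    """Weight on the prior mean when the posterior mean is written as a weighted average."""
    return (a + b) / (a + b + n)


# Assumed for illustration: a Beta(2, 2) prior (mean 0.5) and data with 80% successes.
a, b = 2.0, 2.0
prior_mean = a / (a + b)

for n in (5, 50, 500):
    y = 4 * n // 5  # 80% successes at each sample size
    w = prior_weight(a, b, n)
    compromise = w * prior_mean + (1 - w) * (y / n)
    print(
        f"n={n:3d}  prior weight={w:.3f}  "
        f"posterior mean={posterior_mean(a, b, y, n):.3f}  "
        f"weighted average={compromise:.3f}"
    )
```

The last two columns agree at every sample size, and the weight on the prior shrinks towards zero as \(n\) grows, which is the sense in which the data increasingly control the compromise.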

Links and resources #

PyMC (formerly PyMC3) looks like the state of the art in Python for computational Bayesian statistics.
A great and free book by Andrew Gelman and others that gives a really solid treatment of the whole subject.
An interesting paper on automatic variational inference, implemented in Stan.
A nice worked example of beta-binomial conjugacy, which Chapter 2 of the above book covers so tersely that I struggled to follow it.
Pyro seems like an interesting Python package for Bayesian stuff.