As Richard McElreath says in his fantastic Statistics course, Frequentist statistics is more a framework to evaluate estimators than a framework for deriving them. Therefore, we can use frequentist tools to evaluate the posterior. In particular, what happens to the posterior as more and more data arrive from the same sampling distribution?
In this blogpost, I’ll follow chapter 4 of Bayesian Data Analysis and the material in week 9 of Aki Vehtari’s course to study the Posterior Distribution under the framework of Large Sample Theory.
Asymptotic Normality and Consistency
Suppose then that the true data distribution is $f(y)$. If the model is correctly specified, i.e., $f(y) = p(y \mid \theta_0)$ for some $\theta_0$, the posterior distribution concentrates around $\theta_0$ as the sample size grows: the posterior is consistent. If the model is misspecified, the posterior instead concentrates around the value of $\theta$ that minimizes the Kullback-Leibler divergence from the true distribution $f(y)$ to the likelihood $p(y \mid \theta)$.
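To make consistency concrete, here is a minimal sketch, assuming a conjugate Beta-Binomial model with a flat Beta(1, 1) prior and an arbitrarily chosen true value of 0.3: as $n$ grows, the posterior piles up on the true parameter.

```python
# Posterior consistency sketch: with y_i ~ Bernoulli(0.3) and a Beta(1, 1)
# prior, the posterior is Beta(1 + sum(y), 1 + n - sum(y)). Its mean
# approaches the true value and its sd shrinks toward zero as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
theta_true = 0.3

for n in [10, 100, 1000, 10000]:
    y = rng.binomial(1, theta_true, size=n)
    posterior = stats.beta(1 + y.sum(), 1 + n - y.sum())
    print(f"n={n:6d}  posterior mean={posterior.mean():.3f}  sd={posterior.std():.4f}")
```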
Indeed, asymptotic normality, in its turn, guarantees that the limiting distribution of the posterior can be approximated with a Gaussian centered at the mode $\hat{\theta}$:

$$p(\theta \mid y) \approx N\!\left(\hat{\theta},\ [I(\hat{\theta})]^{-1}\right),$$

where $I(\theta) = -\dfrac{d^2}{d\theta^2} \log p(\theta \mid y)$ is the observed information, evaluated at the mode.
Naturally, we can then approximate the posterior by locating its mode, i.e., the maximum of the posterior density, and estimating the curvature at that point. As more and more data arrive, the approximation becomes more and more precise. The sketch below illustrates the procedure.
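Here is a minimal sketch of this mode-plus-curvature (Laplace) approximation, where a Gamma(shape 2, rate 3) density stands in for an intractable posterior; this target is my own choice for illustration, picked because the answer is known in closed form.

```python
import numpy as np
from scipy import optimize, stats

# A Gamma(shape=2, rate=3) density stands in for an intractable posterior.
target = stats.gamma(a=2, scale=1/3)
log_post = target.logpdf

# 1. Find the mode: minimize the negative log posterior.
res = optimize.minimize_scalar(lambda t: -log_post(t), bounds=(1e-6, 10.0), method="bounded")
mode = res.x

# 2. Estimate the curvature of the log posterior at the mode (finite differences).
eps = 1e-4
curv = (log_post(mode + eps) - 2 * log_post(mode) + log_post(mode - eps)) / eps**2

# 3. Normal approximation: mean = mode, variance = -1 / curvature.
approx = stats.norm(loc=mode, scale=np.sqrt(-1.0 / curv))
print(f"mode = {mode:.4f}, approximate sd = {approx.std():.4f}")
```

For this target the mode is $1/3$ and the curvature of the log density there is $-9$, so the sketch should report a standard deviation of about $1/3$, which we can check against the exact density.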
Big Data and the Normal Approximation
If more and more data makes the normal approximation better and better, does that mean that Big Data will usher in an era where the posterior can be easily and reliably approximated with a Gaussian?
Not quite: as we gather more and more data, we also accumulate more and more questions that we can possibly ask. We then build more complex models, with more parameters, to try to answer these more complicated questions. In this scenario, as more and more data arrive, the posterior distribution will not converge to the Gaussian approximation in the expanding parameter space that reflects the increasing complexity of our model. The reason? The curse of dimensionality.
The more dimensions we have, ceteris paribus, the farther away the mode and the region around it drift from the typical set, due to the concentration of measure. Thus, a Gaussian approximation that concentrates most of its mass around the mode will be a poor substitute for the typical set, and therefore a poor approximation to the posterior distribution.
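A minimal sketch of this phenomenon, assuming the friendliest possible posterior, a standard multivariate normal: the density is maximized at the mode (the origin), yet virtually all posterior draws live on a thin shell at distance roughly $\sqrt{d}$ from it.

```python
# Concentration of measure: in a d-dimensional standard normal, the
# distance of a draw from the mode concentrates around sqrt(d) with
# roughly constant spread, so the mode becomes ever more atypical.
import numpy as np

rng = np.random.default_rng(42)

for d in [1, 10, 100, 1000]:
    draws = rng.standard_normal(size=(10_000, d))
    radii = np.linalg.norm(draws, axis=1)
    print(f"d={d:5d}  mean distance from mode={radii.mean():7.2f}  sd={radii.std():.2f}")
```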
Normal Approximation for the Marginals
Nevertheless, even if we cannot approximate the joint posterior distribution with a Gaussian, the normal approximation is not that faulty if we instead focus on the marginal posteriors. The reason? Determining the marginal distribution of a component of $\theta$ is equivalent to averaging the joint posterior over all the other components, and averaging over many components, much like the summation in the central limit theorem, pulls the resulting marginal toward normality.
This fact explains why we see so many approximately Gaussian marginals in practice, and why we can sometimes summarize them with a point estimate and a standard error.
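As a toy illustration (the particular joint is my own assumption, chosen for convenience), take a banana-shaped, Rosenbrock-style posterior: the joint is strongly non-Gaussian, yet the marginal of the first component is exactly normal.

```python
# A banana-shaped joint: theta1 ~ N(0, 1), theta2 | theta1 ~ N(theta1**2, 0.5**2).
# The joint is curved and far from Gaussian, but marginalizing out theta2
# leaves a theta1 marginal that is indistinguishable from a normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
theta1 = rng.standard_normal(100_000)
theta2 = rng.normal(loc=theta1**2, scale=0.5)

# The dependence is nonlinear (theta2 tracks theta1**2, not theta1),
# so no multivariate Gaussian can describe the joint...
print("corr(theta1,    theta2) =", np.corrcoef(theta1, theta2)[0, 1].round(3))
print("corr(theta1**2, theta2) =", np.corrcoef(theta1**2, theta2)[0, 1].round(3))

# ...yet the theta1 marginal shows no skew or excess kurtosis.
print("marginal skew =", stats.skew(theta1).round(3),
      " excess kurtosis =", stats.kurtosis(theta1).round(3))
```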
Unbiasedness and Hierarchical Models
Frequentist methods place great emphasis on unbiasedness: an estimator $\hat{\theta}$ is unbiased if its sampling expectation recovers the true parameter, $E[\hat{\theta} \mid \theta] = \theta$, whatever the value of $\theta$.
Indeed, if we have a family of parameters $\theta_1, \dots, \theta_J$ drawn from a common population distribution, as in a hierarchical model with $\theta_j \sim N(\mu, \tau^2)$ and observations $y_j \mid \theta_j \sim N(\theta_j, \sigma^2)$, the unbiased estimate of each $\theta_j$ is the raw observation $y_j$. But taken together, these unbiased estimates overestimate the variation among the $\theta_j$'s, whereas shrinking them toward the common mean, though biased for each individual parameter, yields better estimates of the ensemble.
Thus, this highlights the problem that it is often not possible to estimate several parameters at once in an even approximately unbiased manner: unbiased estimates of the individual parameters come at the cost of a large error for the ensemble, and the biased, shrunken hierarchical estimates beat them in overall accuracy.
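A minimal sketch of this trade-off, assuming known hyperparameters ($\mu = 0$, $\tau = 0.5$, $\sigma = 1$, values chosen arbitrarily): the unbiased estimates are the raw $y_j$'s, the shrunken ones are the posterior means, and the latter incur far less total squared error.

```python
# Unbiased vs. shrinkage estimates in a hierarchical model:
# theta_j ~ N(0, tau^2), y_j | theta_j ~ N(theta_j, 1).
# The posterior mean shrinks y_j by the factor tau^2 / (tau^2 + 1);
# it is biased for each theta_j but wins on total squared error.
import numpy as np

rng = np.random.default_rng(42)
J, tau = 1000, 0.5

theta = rng.normal(0, tau, size=J)
y = rng.normal(theta, 1)

unbiased = y
shrunken = (tau**2 / (tau**2 + 1)) * y  # posterior mean under known hyperparameters

print("total squared error, unbiased:", np.sum((unbiased - theta) ** 2).round(1))
print("total squared error, shrunken:", np.sum((shrunken - theta) ** 2).round(1))
```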