David Salazar

Fat vs Thin: does LLN work?

Statistical estimation is based…
Apr 17, 2020
5 min

Statistical Rethinking: Week 1

Week 1 goes as deep as possible into the intuition and mechanics of a very simple model. As always, McElreath proceeds with both clarity and erudition.
Apr 19, 2020
10 min

Spurious PCA under Thick Tails

PCA is a dimensionality reduction technique. It seeks to project the data onto a lower dimensional hyperplane such that as much of the original data variance as possible is preserved.…
Apr 27, 2020
4 min

Statistical Rethinking: Week 2

Week 2 got us started exploring linear regression from a Bayesian perspective. What I found most interesting was propagating uncertainty through the model.
Apr 28, 2020
5 min

Statistical Rethinking: Week 3

Week 3 gave the most interesting discussion of multiple regression. Why isn’t univariate regression enough? Multiple regression allows us to disentangle two types of mistakes:
May 3, 2020
12 min

What does it mean to fatten the tails?

First, let’s define what we mean by fatter tails.
May 9, 2020
5 min

Statistical Rethinking: Week 4

This week was a…
May 11, 2020
7 min

Statistical Rethinking: Week 5 -> Interactions

As Richard…
May 12, 2020
11 min

Standard Deviation and Fat Tails

In this post, I’ll continue to use Monte Carlo simulations to explore the ideas in Nassim Taleb’s latest book: Statistical Consequences of Fat Tails.…
May 13, 2020
10 min

Statistical Rethinking Week 5 -> HMC samples

After a quick tour around interactions, this week was a quick introduction to MCMC samplers and how they are the engine that powers current Bayesian…
May 15, 2020
9 min

Understanding the tail exponent

Power laws are ubiquitous in descriptions of fat tails, a topic that I’ve been trying to wrap my…
May 19, 2020
10 min

Statistical Rethinking: Week 6

The week was a whirlwind tour of:
May 20, 2020
16 min

Correlation is not Correlation

To the usual phrase of correlation is not causation, Nassim Taleb often answers: correlation is not correlation. First, just like the mean and PCA, the sample correlation coefficient has persistent small sample effects when variables from Extremistan are involved. These topics are…
May 22, 2020
13 min

Statistical Rethinking: Week 7

This week paid off. All the hard work of understanding link functions, HMC-flavored Monte Carlo, and GLMs allowed us to study more complex models. To keep…
May 24, 2020
16 min

R-squared and fat tails

…
May 26, 2020
6 min

Simulating into understanding Multilevel Models

Pooling is the process and shrinking is the result
May 28, 2020
10 min

Statistical Rethinking Week 8

This week was our first introduction to Multilevel models. Models where we…
May 29, 2020
11 min

Central Limit Theorem in Action

I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In it, we have seen how the Law of Large Numbers for different estimators simply does not work fast enough (in Extremistan) to be…
May 30, 2020
15 min

Understanding Pooling across Intercepts and Slopes

Statistical Rethinking is a fabulous course on Bayesian Statistics (and much more). By following simulations in the book, I recently tried to understand why pooling is the process and…
Jun 1, 2020
11 min

LLN for higher p Moments

I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In it, we have seen how the Law of Large Numbers for different estimators simply does not work fast enough (in Extremistan) to be…
Jun 2, 2020
5 min

Bayesian Instrumental Variable Regression

Statistical Rethinking is a fabulous course on Bayesian…
Jun 3, 2020
5 min

Statistical Rethinking: Week 9

Week 9 was all about fitting models with multivariate distributions in them. For example, a multivariate likelihood helps us use an instrumental variable to estimate the…
Jun 3, 2020
12 min

Statistical Rethinking Week 10

This is the final week of the best Statistics course out there. It showed the benefits of being ruthless with conditional…
Jun 9, 2020
6 min

Fisher-Tippett Theorem: a “CLT” for the sample maxima

For fat-tailed random variables, the statistical properties are determined by a few observations in the tail. In Nassim Taleb’s words, “the tail wags the dog”. Therefore, it is vital to study the distribution of these few observations. A logical question to ask, then, is: is there a limiting…
Jun 10, 2020
8 min

How to not get fooled by the “Empirical Distribution”

With fat-tailed random variables, as Nassim Taleb says, the tail wags the dog. That is, “the tails (the rare events) play a disproportionately…
Jun 11, 2020
9 min

When are GARCH (and friends) models warranted?

In this blogpost, following Nassim Taleb’s latest technical book, Statistical Consequences of Fat Tails, I’ll answer the question: when can we use GARCH (and friends) models? As an example, also following Taleb, I’ll check the resulting conditions with the…
Jun 14, 2020
5 min

Extreme Value Theory for Time Series

The Fisher-Tippett theorem (a type of CLT for the tail events) rests on the assumption that the observed…
Jun 17, 2020
12 min

Bayesian Data Analysis: Week 2

Bayesian Data Analysis (Gelman, Vehtari et al.) is equal parts a great introduction and THE reference for advanced Bayesian Statistics. Luckily, it’s …
Jun 22, 2020
4 min

Bayesian Data Analysis: Week 3 -> Fitting a Gaussian probability model

Bayesian Data Analysis (Gelman, Vehtari et al.) is equal parts a great introduction and THE reference for advanced Bayesian Statistics. Luckily, it’s …
Jun 24, 2020
2 min

Probability Calibration under fat-tails: useless

Probability calibration refers to…
Jun 24, 2020
2 min

Bayesian Data Analysis: Week 3-> Exercises

Bayesian Data Analysis (Gelman, Vehtari et al.) is equal parts a great introduction and THE reference for advanced Bayesian Statistics. Luckily, it’s …
Jun 25, 2020
7 min

Gini Index under Fat-Tails

I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In this blogpost, I’ll follow Taleb’s…
Jun 26, 2020
6 min

Bayesian Data Analysis: Week 4 -> Importance Sampling

Bayesian Data Analysis (Gelman, Vehtari et al.) is equal parts a great introduction and THE reference for advanced Bayesian Statistics. Luckily, it’s …
Jun 27, 2020
5 min

Bayesian Data Analysis: Week 5 -> Metropolis

Bayesian Data Analysis (Gelman, Vehtari et al.) is equal parts a great introduction and THE reference for advanced Bayesian Statistics. Luckily, it’s …
Jun 29, 2020
9 min

BDA Week 6: MCMC in High Dimensions, Hamiltonian Monte Carlo

In the…
Jul 2, 2020
12 min

Tail Risk of diseases in R

Pasquale Cirillo and Nassim Taleb published a short, interesting and important paper on the Tail Risk of contagious diseases. In short, the distribution of fatalities is strongly fat-tailed: thus rendering any forecast, whether it is pointwise or a…
Jul 5, 2020
14 min

BDA week 7: LOO and its diagnostics

Once Stan’s implementation of HMC has run its magic, we finally have samples from the posterior distribution \(\pi(\theta \mid y)\). We can then run posterior predictive checks and hopefully our samples look plausible under our posterior.…
Jul 8, 2020
4 min

BDA Week 8: Bayesian Decision Analysis

Many if not most statistical analyses are performed for the ultimate goal of decision making. Bayesian Statistics has the advantage of…
Jul 10, 2020
0 min

BDA Week 9: Large Sample Theory for the Posterior

As Richard McElreath says in his fantastic Statistics course, Frequentist statistics is more a framework to evaluate estimators than a framework for deriving them. Therefore, we can use frequentist tools to evaluate the…
Jul 13, 2020
3 min

Causality: Bayesian Networks and Probability Distributions

Stats people know that correlation coefficients do not imply causal effects. Yet, very often, partial correlation coefficients from regressions…
Jul 18, 2020
18 min

Causality: Invariance under Interventions

In the last post we saw how two causal models can yield the same testable implications and thus cannot be distinguished from data alone. That is, we cannot gain…
Jul 22, 2020
10 min

Causality: To adjust or not to adjust

In this blogpost, I’ll simulate data to show that conditioning on as many variables as possible is not a good idea. Sometimes, conditioning can de-confound an effect; other times, however, conditioning on a variable can create unnecessary…
Jul 25, 2020
12 min

Causality: The front-door criterion

In a past blogpost, I explored the backdoor criterion: a simple graphical algorithm with which we can define which variables we must include in our analysis in order to cancel out all the information coming from different causal relationships than the one…
Jul 30, 2020
8 min

Causality: Testing Identifiability

We’ve defined causal effects as an interventional distribution and posited two identification strategies to estimate them: the back-door and the front-door criteria. However, we cannot always use these…
Jul 31, 2020
13 min

Causality: Counterfactuals - Clash of Worlds

We’ve seen how the language of causality requires an exogenous intervention on the values of \(X\); so far…
Aug 10, 2020
7 min

Causality: Regret? Look at Effect of Treatment on the Treated

Regret about our actions stems from a counterfactual question: What if I had acted differently? Therefore, to answer such a question, we need a more elaborate language than the one we…
Aug 16, 2020
9 min

Causality: Probabilities of Causation

Questions of attribution are everywhere: i.e., did \(X=x\) cause \(Y=y\)? From…
Aug 20, 2020
6 min

Causality: Mediation Analysis

Kids are the prototypical question makers; they never stop asking questions. Just after you have answered a Why? question, they ask yet another Why? This is the problem of mediation…
Aug 26, 2020
12 min

Forecasting elections? Taleb says no

With US elections around the corner, news outlets are…
Sep 3, 2020
3 min

Introduction to Survival Analysis

Survival Analysis is fun, but not often taught in statistics courses. It’s just like regular statistics but with a few twists.
Nov 17, 2022
14 min

Notes from Simon Wood’s Generalized Additive Models

Simon Wood is a world expert on GAMs and creator of the fantastic mgcv package in R. His book on GAMs is a great introduction to the subject. I’ve been re-reading it and taking notes to be able to better teach this to my co-workers. This post is a summary of the…
Aug 12, 2023
8 min