Dilettanting Data Science
/
Recent content on Dilettanting Data Science
Forecasting elections? Taleb says no
/2020/09/03/forecasting-elections-taleb-says-no/
Thu, 03 Sep 2020 00:00:00 +0000
With US elections around the corner, news outlets are constantly deploying new models to try to predict who will win the next presidential election. Ever since Trump’s ‘surprising’ win in ’16, these (pre-election) poll-based models have come under scrutiny. Indeed, there is a huge amount of uncertainty that these models do not seem to capture well enough.
Taleb and Dhruv Madeka’s point is the following: when forecasting an uncertain election, one cannot invoke probabilistic thinking unless one imposes severe constraints on how the forecast can move up to election day.
Causality: Mediation Analysis
/2020/08/26/causality-mediation-analysis/
Wed, 26 Aug 2020 00:00:00 +0000
library(tidyverse)
library(ggdag)
extrafont::loadfonts(device="win")
theme_set(theme_dag(base_family = "Roboto Condensed"))
Motivation
Kids are the prototypical question makers; they never stop asking questions. Just after you have answered a Why? question, they ask yet another Why? This is the problem of mediation analysis: if you answer that X causes Y, how exactly does the causal mechanism work? Is the causal effect direct, or is it mediated through yet another variable M? Mediation analysis aims to disentangle the direct effect (which does not pass through the mediator) from the indirect effect (the part that passes through the mediator).
Causality: Probabilities of Causation
/2020/08/20/causality-probabilities-of-causation/
Thu, 20 Aug 2020 00:00:00 +0000
Why read this?
Questions of attribution are everywhere: did \(X=x\) cause \(Y=y\)? From legal battles to personal decision making, we are obsessed with them. Can we give a rigorous answer to the problem of attribution?
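The canonical quantity here is the probability of necessity (PN): the probability that the outcome would not have occurred absent the exposure, given that both actually occurred. As a hedged sketch in Pearl's standard notation (with \(x', y'\) denoting the alternative values of \(X\) and \(Y\)):

```latex
\text{PN} \;=\; P\big(Y_{x'} = y' \,\big|\, X = x,\; Y = y\big)
```

A counterfactual subscript and observed evidence appear in the same expression, which is exactly why this quantity needs more than interventional machinery.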
One way to attack the attribution problem is to reason as follows: if no alternative causal process that excludes \(X\) could have produced \(Y=y\), then \(X=x\) was necessary to produce the effect in question.
Causality: Regret? Look at Effect of Treatment on the Treated
/2020/08/16/causality-effect-of-treatment-on-the-treated/
Sun, 16 Aug 2020 00:00:00 +0000
Why read this?
Regret about our actions stems from a counterfactual question: What if I had acted differently? To answer such a question, we need a richer language than the one required for prediction or intervention questions. Why? Because we need to compare what happened with what would have happened had we acted differently. We need to compute the Effect of Treatment on the Treated (ETT).
Causality: Counterfactuals - Clash of Worlds
/2020/08/10/causality-counterfactuals-clash-of-worlds/
Mon, 10 Aug 2020 00:00:00 +0000
Motivation
We’ve seen how the language of causality requires an exogenous intervention on the values of \(X\); so far, we’ve studied interventions on the whole population, represented by the expression \(do(X)\). Nevertheless, with this language, plenty of interventions remain outside our realm: most notably, counterfactual expressions whose antecedent contradicts the observed behavior: there’s a clash between the observed world and the hypothetical world of interest.
Causality: Testing Identifiability
/2020/07/31/causality-testing-identifiability/
Fri, 31 Jul 2020 00:00:00 +0000
Motivation
We’ve defined causal effects as an interventional distribution and posited two identification strategies to estimate them: the back-door and the front-door criteria. However, we cannot always use these criteria; sometimes, we cannot measure the variables either of them requires.
More generally, given a causal model and some incomplete set of measurements, when is the causal effect of interest identifiable?
In this blog post, we will develop a graphical criterion to answer this question by exploiting the concept of c-components.
Causality: The front-door criterion
/2020/07/30/causality-the-front-door-criterion/
Thu, 30 Jul 2020 00:00:00 +0000
Motivation
In a past blogpost, I explored the backdoor criterion: a simple graphical algorithm that tells us which variables we must include in our analysis to block all the information flowing through causal relationships other than the one we are interested in. However, these variables are not always measured. What else can we do?
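For reference, when a mediator \(Z\) satisfies the front-door criterion relative to \((X, Y)\) (it intercepts all directed paths from \(X\) to \(Y\), the \(X \to Z\) link is unconfounded, and \(X\) blocks all back-door paths from \(Z\) to \(Y\)), the causal effect is identified by the front-door adjustment:

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```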
In this blogpost, I’ll explore the front-door criterion: (i) an intuitive proof of why it works; (ii) how to estimate it; (iii) its fundamental assumptions; and finally, (iv) an experiment with Monte Carlo samples.
Causality: To adjust or not to adjust
/2020/07/25/causality-to-adjust-or-not-to-adjust/
Sat, 25 Jul 2020 00:00:00 +0000
What is this blogpost about?
In this blogpost, I’ll simulate data to show why conditioning on as many variables as possible is not a good idea. Sometimes conditioning de-confounds an effect; other times, however, conditioning on a variable creates unnecessary confounding and biases the effect we are trying to estimate.
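A minimal sketch of the second failure mode (the posts use R; this Python sketch and its variable names are my own): x and y start out causally unrelated, but conditioning on their common effect c (a collider) manufactures a strong spurious association.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# x and y are causally unrelated
x = rng.normal(size=n)
y = rng.normal(size=n)
# c is a collider: a common *effect* of x and y
c = x + y + rng.normal(scale=0.1, size=n)

# Unconditional association: essentially zero
print(np.corrcoef(x, y)[0, 1])

# "Controlling" for c: regress x and y on c, then correlate the residuals
rx = x - np.polyval(np.polyfit(c, x, 1), c)
ry = y - np.polyval(np.polyfit(c, y, 1), c)
print(np.corrcoef(rx, ry)[0, 1])  # strongly negative: pure collider bias
```

Adding c to the regression did not "control" for anything: it opened a non-causal path between x and y.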
It all depends on our causal story: by applying the backdoor criterion to our causal graph, we can derive an unambiguous answer as to which variables we should use as controls in our statistical analysis.
Causality: Invariance under Interventions
/2020/07/22/causality-invariance-under-interventions/
Wed, 22 Jul 2020 00:00:00 +0000
In the last post, we saw how two causal models can yield the same testable implications and thus cannot be distinguished from data alone. That is, we cannot gain causal understanding from data alone. Does that mean we can never gain causal understanding? Far from it; it just means that we must bring a causal model.
Thus, causal effects cannot be estimated from the data alone, without a causal story.
Causality: Bayesian Networks and Probability Distributions
/2020/07/18/causality-bayesian-networks/
Sat, 18 Jul 2020 00:00:00 +0000
Motivation
Stats people know that correlation coefficients do not imply causal effects. Yet, very often, partial correlation coefficients from regressions with an ever-growing set of ‘control variables’ are unequivocally interpreted as a step in the right direction toward estimating a causal effect. Richard McElreath, in his fantastic stats course, aptly named this mistaken intuition Causal Salad: people toss a bunch of control variables into a regression and hope to get a causal effect out of it.
BDA Week 9: Large Sample Theory for the Posterior
/2020/07/13/bda-week-9-large-sample-theory-for-the-posterior/
Mon, 13 Jul 2020 00:00:00 +0000
As Richard McElreath says in his fantastic statistics course, frequentist statistics is more a framework for evaluating estimators than for deriving them. Therefore, we can use frequentist tools to evaluate the posterior. In particular, what happens to the posterior as more and more data arrive from the same sampling distribution?
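A quick numerical sketch of the large-sample behavior (a Bernstein–von Mises flavored check; the Bernoulli model, flat prior, and sizes are my choices, not the chapter's): the Beta posterior for a proportion gets closer and closer to its normal approximation as n grows.

```python
import numpy as np
from scipy import stats

def max_cdf_gap(n, p_hat=0.3):
    """Max distance between the Beta posterior CDF (flat prior,
    k = p_hat * n successes) and its normal approximation."""
    k = int(p_hat * n)
    post = stats.beta(1 + k, 1 + n - k)
    approx = stats.norm(post.mean(), post.std())
    grid = np.linspace(0.01, 0.99, 1_000)
    return np.max(np.abs(post.cdf(grid) - approx.cdf(grid)))

print(max_cdf_gap(50), max_cdf_gap(5_000))  # the gap shrinks with n
```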
In this blogpost, I’ll follow chapter 4 of Bayesian Data Analysis and the material in week 9 of Aki Vehtari’s course to study the posterior distribution under the framework of Large Sample Theory.
BDA Week 8: Bayesian Decision Analysis
/2020/07/10/bda-week-8-bayesian-decision-analysis/
Fri, 10 Jul 2020 00:00:00 +0000
Many, if not most, statistical analyses are performed with the ultimate goal of decision making. Bayesian statistics has the advantage of using probability directly to quantify the uncertainty around unobserved quantities of interest: whether those are parameters or predictions, we end up with a posterior distribution.
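Concretely (the posterior and payoffs below are toy assumptions, not from the book): given posterior draws for an unknown quantity, the Bayes-optimal action is the one that maximizes expected utility over those draws.

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(loc=1.0, scale=0.5, size=10_000)  # posterior draws

# Utility of each action d given theta (hypothetical payoffs)
actions = {
    "act": theta - 0.5,            # payoff grows with theta, minus a fixed cost
    "wait": np.zeros_like(theta),  # status quo: zero payoff
}

expected_utility = {d: u.mean() for d, u in actions.items()}
best = max(expected_utility, key=expected_utility.get)
print(best, expected_utility)  # "act" wins: E[theta] - 0.5 > 0
```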
Indeed, our posterior predictions combine with each possible action \(d\) we can take to create a distinct distribution of our utility, \(U(x) \mid d\).
BDA week 7: LOO and its diagnostics
/2020/07/08/bda-week-7-loo-and-its-diagnostics/
Wed, 08 Jul 2020 00:00:00 +0000
Once Stan’s implementation of HMC has run its magic, we finally have samples from the posterior distribution \(\pi(\theta \mid y)\). We can then run posterior predictive checks, and hopefully our samples look plausible under our posterior. Nevertheless, this is just an internal validation check: we expect more from our model. We expect it to hold under external validation: never-seen observations, once predicted, should also look plausible under our posterior.
Tail Risk of diseases in R
/2020/07/05/tail-risk-of-diseases-in-r/
Sun, 05 Jul 2020 00:00:00 +0000
The Data
The plots
Max-to-Sum ratio
Histogram
The Zipf plot
Mean Excess Plot
Fitting the tail
Wait a moment: infinite casualties?
Extreme Value theory on the dual observations
Maximum Likelihood estimate
Bayesian model
Simulating fake data
Fitting the model
Posterior predictive checks
Convergence diagnostics
Stressing the data
Measurement error
Influential observations
Conclusion
Pasquale Cirillo and Nassim Taleb published a short, interesting, and important paper on the tail risk of contagious diseases.
BDA Week 6: MCMC in High Dimensions, Hamiltonian Monte Carlo
/2020/07/02/bda-week-6-mcmc-in-high-dimensions-hamiltonian-monte-carlo/
Thu, 02 Jul 2020 00:00:00 +0000
Over the last couple of weeks, we’ve seen how the most difficult part of Bayesian statistics is computing the posterior distribution. In particular, last week we studied the Metropolis algorithm. In this blogpost, I’ll study why Metropolis does not scale well to high dimensions and give an intuitive explanation of our best alternative: Hamiltonian Monte Carlo (HMC).
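As a baseline for the comparison, here is a minimal random-walk Metropolis sampler targeting a standard normal (a Python sketch; step size and iteration count are illustrative, not from the post):

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with
    probability min(1, target(x') / target(x))."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.normal()
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject in log space
            x, lp = prop, lp_prop
        samples[i] = x  # on rejection, the current state is recorded again
    return samples

draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=50_000)
print(draws.mean(), draws.std())  # close to 0 and 1
```

In one dimension this works fine; the trouble starts when the random walk has to explore a high-dimensional typical set one diffusive step at a time.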
This blogpost is my personal digestion of the excellent content that Michael Betancourt has put out there to explain HMC.
Bayesian Data Analysis: Week 5 -> Metropolis
/2020/06/29/bayesian-data-analysis-week-5-metropolis/
Mon, 29 Jun 2020 00:00:00 +0000
Bayesian Data Analysis (Gelman, Vehtari, et al.) is equal parts a great introduction to and THE reference for advanced Bayesian statistics. Luckily, it’s freely available online. To make things even better for the online learner, Aki Vehtari (one of the authors) has a set of online lectures and homeworks that go through the basics of Bayesian Data Analysis.
So far in the course, we have seen how the main obstacle in the way of performing Bayesian statistics is the computation of the posterior.
Bayesian Data Analysis: Week 4 -> Importance Sampling
/2020/06/27/bayesian-data-analysis-week-4-importance-sampling/
Sat, 27 Jun 2020 00:00:00 +0000
In this blogpost, I’ll go over one of the main topics of Week 4: Importance Sampling. I’ll also solve a couple of the exercises for Chapter 10 of the book.
Gini Index under Fat-Tails
/2020/06/26/gini-index-under-fat-tails/
Fri, 26 Jun 2020 00:00:00 +0000
I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In this blogpost, I’ll follow Taleb’s exposition of the Gini index under fat tails in Chapter 13 of his book.
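A quick simulation of the headline result (a Python sketch; the tail exponent, sample sizes, and repetition count are my choices): for a Pareto with tail exponent \(\alpha\), the true Gini is \(1/(2\alpha - 1)\), and the plug-in estimate computed from the "empirical distribution" sits below it on average.

```python
import numpy as np

rng = np.random.default_rng(123)
alpha = 1.25                     # fat tail: mean exists, variance doesn't
true_gini = 1 / (2 * alpha - 1)  # Gini of a Pareto with tail exponent alpha

def empirical_gini(x):
    """Plug-in Gini index computed from the 'empirical distribution'."""
    x = np.sort(x)
    n = len(x)
    weights = 2 * np.arange(1, n + 1) - n - 1
    return np.sum(weights * x) / (n * np.sum(x))

# Average the plug-in estimate over many samples: it sits below the true value
estimates = [empirical_gini(rng.pareto(alpha, 1_000) + 1)  # Pareto, minimum 1
             for _ in range(200)]
print(np.mean(estimates), true_gini)
```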
Intuitively, if we use the “empirical distribution” to estimate the Gini index under fat tails, we underestimate the tail of the distribution and thus underestimate the Gini index. This is yet another example of how we fool ourselves when using the “empirical” distribution.
Bayesian Data Analysis: Week 3 -> Exercises
/2020/06/25/bayesian-data-analysis-week-3-exercises/
Thu, 25 Jun 2020 00:00:00 +0000
In this blogpost, I’ll go over a couple of the selected exercises for week 3: exercise number 2 and exercise number 3.
Bayesian Data Analysis: Week 3 -> Fitting a Gaussian probability model
/2020/06/24/bayesian-data-analysis-week-3-fitting-a-gaussian-probability-model/
Wed, 24 Jun 2020 00:00:00 +0000
Instead of going through the homeworks (for fear of ruining the fun for future students of Aki’s), I’ll go through some of the examples in the book as case studies.
Probability Calibration under fat-tails: useless
/2020/06/24/probability-calibration-under-fat-tails-useless/
Wed, 24 Jun 2020 00:00:00 +0000
Probability calibration refers to a way of evaluating forecasts: the forecast frequency of an event should correspond to the frequency with which the event happens in real life. Is this truly the mark of a good analysis? Under fat tails, Nassim Taleb, in his book, answers with a categorical NO!
Probability calibration in the real world
Probability calibration amounts, in the real world, to a binary payoff: a fixed sum is paid off if the event happens.
Bayesian Data Analysis: Week 2
/2020/06/22/bayesian-data-analysis-week-2/
Mon, 22 Jun 2020 00:00:00 +0000
In this series of blogposts, I’ll go over the homeworks that Aki has kindly made available online.
Extreme Value Theory for Time Series
/2020/06/17/extreme-value-theory-for-time-series/
Wed, 17 Jun 2020 00:00:00 +0000
The Fisher–Tippett theorem (a type of CLT for tail events) rests on the assumption that the observed values are independent and identically distributed. However, in any non-trivial example, a time series will reflect an underlying structure that creates dependence among the observations. Indeed, tail events tend to occur in clusters. Does this mean that we cannot use Extreme Value Theory (EVT) to model the maxima of a time series?
When are GARCH (and friends) models warranted?
/2020/06/14/when-are-garch-and-friends-models-warranted/
Sun, 14 Jun 2020 00:00:00 +0000
In this blogpost, following Nassim Taleb’s latest technical book, Statistical Consequences of Fat Tails, I’ll answer the question: when can we use GARCH (and friends) models? As an example, also following Taleb, I’ll check the resulting conditions on the S&P 500.
What are the obstacles?
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) family of models attempts to model a given time series by exploiting “volatility” clustering (i.e., for some periods volatility is consistently high, while for others it is consistently low).
How to not get fooled by the "Empirical Distribution"
/2020/06/11/how-to-not-get-fooled-by-the-empirical-distribution/
Thu, 11 Jun 2020 00:00:00 +0000
With fat-tailed random variables, as Nassim Taleb says, the tail wags the dog. That is, “the tails (the rare events) play a disproportionately large role in determining the properties”. Following the presentation given by Taleb in his latest technical book, Statistical Consequences of Fat Tails, I’ll show:
Why using the empirical distribution for estimating the moments of a fat-tailed random variable is a terrible idea.
A less “unreliable” alternative for estimating the moments.
Fisher Tippet Th: a "CLT" for the sample maxima
/2020/06/10/fisher-tippet-th-a-clt-for-the-sample-maxima/
Wed, 10 Jun 2020 00:00:00 +0000
For fat-tailed random variables, the statistical properties are determined by a few observations in the tail. In Nassim Taleb’s words, “the tail wags the dog”. Therefore, it is vital to study the distribution of these few observations. A logical question to ask, then, is: is there a limiting distribution for the sample maxima as the number of samples grows? This is precisely what the Fisher–Tippett theorem states: the limiting distribution of the (normalized) sample maxima is the Generalized Extreme Value (GEV) distribution.
Statistical Rethinking Week 10
/2020/06/09/statistical-rethinking-week-10/
Tue, 09 Jun 2020 00:00:00 +0000
This is the final week of the best statistics course out there. It showed the benefits of being ruthless with conditional probabilities: replace everything you don’t know with a distribution conditioned on what you do know. Bayes will do the rest. This holds for both measurement error and missing data.
1st problem
Consider the relationship between brain volume (brain) and body mass (body) in the data(Primates301). These values are presented as single values for each species.
Varying effects for continuous predictors -> GP regression
/2020/06/04/varying-effects-for-continuous-predictors/
Thu, 04 Jun 2020 00:00:00 +0000
Statistical Rethinking is a fabulous course on Bayesian statistics (and much more). Following its presentation, I’ll give a succinct, intuitive introduction to Gaussian Process (GP) regression as a method for extending the varying effects strategy to continuous predictors. This method is incredibly useful when assuming a linear functional relationship between a continuous predictor and the outcome variable is not enough to capture the variation in the data.
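The core object is the covariance kernel, which makes "nearby inputs get similar effects" precise. A minimal sketch with a squared-exponential kernel (the course works in R; this Python version and its hyperparameters eta and rho are illustrative):

```python
import numpy as np

def sq_exp_kernel(x, eta=1.0, rho=1.0):
    """Squared-exponential covariance: K_ij = eta^2 * exp(-rho^2 (x_i - x_j)^2).
    Nearby inputs get strongly correlated effects; distant ones, almost none."""
    d = x[:, None] - x[None, :]
    return eta**2 * np.exp(-(rho**2) * d**2)

x = np.linspace(0, 10, 50)
K = sq_exp_kernel(x)

# Draw correlated "varying effects" from the GP prior (jitter for stability)
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-6 * np.eye(len(x)))
print(f.shape)  # one smooth function evaluated at the 50 inputs
```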
First, I’ll begin by motivating the varying effects strategy.
Bayesian Instrumental Variable Regression
/2020/06/03/bayesian-instrumental-variable-regression/
Wed, 03 Jun 2020 00:00:00 +0000
Statistical Rethinking is a fabulous course on Bayesian statistics (and much more). In what follows, I’ll give a succinct presentation of Instrumental Variable regression in a Bayesian setting using simulated data.
I had already seen the traditional econometrics formulation, and yet I found Richard’s presentation both illuminating and fun. It’s a testament to his incredible achievement with this book.
The problem
Every instrumental variable setting starts as follows.
Statistical Rethinking: Week 9
/2020/06/03/statistical-rethinking-week-9/
Wed, 03 Jun 2020 00:00:00 +0000
Week 9 was all about fitting models with multivariate distributions in them. For example, a multivariate likelihood helps us use an instrumental variable to estimate the true causal effect of a predictor; a multivariate distribution can also serve as an adaptive prior for some of the predictors. In both cases, we found out that the benefit comes from modelling the resulting variance-covariance matrix. In the instrumental variable case, the resulting joint distribution of the residuals was the key to capturing the statistical information of the confounding variable.
LLN for higher p Moments
/2020/06/02/lln-for-higher-p-moments/
Tue, 02 Jun 2020 00:00:00 +0000
I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In it, we have seen how the Law of Large Numbers for different estimators simply does not work fast enough (in Extremistan) to be used in real life. For example, we have seen how the distributions of the sample mean, PCA, sample correlation, and \(R^2\) turn into pure noise when we are dealing with fat tails.
Understanding Pooling across Intercepts and Slopes
/2020/06/01/understanding-pooling-across-intercepts-and-slopes/
Mon, 01 Jun 2020 00:00:00 +0000
Statistical Rethinking is a fabulous course on Bayesian statistics (and much more). By following simulations in the book, I recently tried to understand why pooling is the process and shrinkage is the result. In this post, I’ll try to do the same for a model where we pool across intercepts and slopes. That is, we will posit a multivariate common distribution for both intercepts and slopes to impose adaptive regularization on our predictions.
Central Limit Theorem in Action
/2020/05/30/central-limit-theorem-in-action/
Sat, 30 May 2020 00:00:00 +0000
I have recently been exploring Nassim Taleb’s latest technical book: Statistical Consequences of Fat Tails. In it, we have seen how the Law of Large Numbers for different estimators simply does not work fast enough (in Extremistan) to be used in real life. For example, we have seen how the distributions of the sample mean, PCA, sample correlation, and \(R^2\) turn into pure noise when we are dealing with fat tails.
Statistical Rethinking Week 8
/2020/05/29/statistical-rethinking-week-8/
Fri, 29 May 2020 00:00:00 +0000
Statistical Rethinking Week 8
This week was our first introduction to multilevel models: models where we explicitly model a family of parameters as coming from a common distribution; with each sample, we simultaneously learn each parameter and the parameters of the common distribution. This process of sharing information is called pooling. The end result is shrinkage: each parameter gets pulled toward the estimated mean of the common distribution. I tried my best to understand this process and its result by simulating in this post.
Simulating into understanding Multilevel Models
/2020/05/28/simulating-into-understanding-multilevel-models/
Thu, 28 May 2020 00:00:00 +0000
Simulating into Understanding Multilevel Models
Pooling is the process and shrinking is the result
Pooling and shrinking are not easy concepts to understand. In the lectures, Richard, as always, does an excellent job of creating metaphors and examples to help us gain intuition about what multilevel models do. Multilevel models are mnesic models: models with memory.
Imagine a cluster of observations: it can be different classrooms in a school. Pooling means using the information from other classrooms to inform our estimates for each classroom.
R-squared and fat tails
/2020/05/26/r-squared-and-fat-tails/
Tue, 26 May 2020 00:00:00 +0000
R-squared and Fat-tails
This post continues to explore how common statistical methods are unreliable and dangerous when we are dealing with fat tails. So far, we have seen how the distributions of the sample mean, PCA, and sample correlation turn into pure noise when we are dealing with fat tails. In this post, I’ll show the same for \(R^2\) (i.e., the coefficient of determination). Remember, it is a random variable that we are estimating and therefore has its own distribution.
Statistical Rethinking: Week 7
/2020/05/24/statistical-rethinking-week-7/
Sun, 24 May 2020 00:00:00 +0000
Statistical Rethinking: Week 7
This week paid off. All the hard work of understanding link functions, HMC-flavored Monte Carlo, and GLMs allowed us to study more complex models. To keep using Richard’s metaphor: it allowed us to study monsters, models with different parts made out of different models. In particular, Zero-Inflated models and Ordered Categories.
Homework
In the Trolley data—data(Trolley)—we saw how education level (modeled as
an ordered category) is associated with responses.
Correlation is not Correlation
/2020/05/22/correlation-is-not-correlation/
Fri, 22 May 2020 00:00:00 +0000
Correlation is not correlation
Small sample effects with Sample Correlation
Sample Correlation in Mediocristan
Sample correlations from Extremistan
Misused Correlation
Quadrant’s Correlation
Berkson’s paradox
Simpson’s Paradox
Correlation under non linearities
Misunderstanding of Correlation: its signal is non linear
Entropy and Mutual Information
Conclusion
To the usual phrase “correlation is not causation”, Nassim Taleb often answers: correlation is not correlation.
Statistical Rethinking: Week 6
/2020/05/20/statistical-rethinking-week-6/
Wed, 20 May 2020 00:00:00 +0000
Statistical Rethinking: Week 6
Quick summary of the week
The week was a whirlwind tour of:
Maximum entropy and introduction to GLMs.
The problems that come when using link functions.
The perils of relative effects when studying binomial regression and how complicated it is to directly calculate probabilities with GLMs: all the parameters interact among themselves.
This week was an introduction to GLMs and the principle of Maximum Entropy.
Understanding the tail exponent
/2020/05/19/understanding-the-tail-exponent/
Tue, 19 May 2020 00:00:00 +0000
Understanding the tail exponent
Power laws are ubiquitous in describing fat tails, a topic that I’ve been trying to wrap my head around for the last couple of weeks. However, up until now, I haven’t had a visceral understanding of what exactly their main parameter, the tail exponent \(\alpha\), does. This blogpost is my attempt at gaining that understanding. To do so, I will be replicating some of the plots and derivations from two sources:
Statistical Rethinking Week 5 -> HMC samples
/2020/05/15/statistical-rethinking-week-5-hmc-samples/
Fri, 15 May 2020 00:00:00 +0000
Statistical Rethinking: Week 5
After a quick tour around interactions, this week was a quick introduction to MCMC samplers and how they are the engine that powers current Bayesian modelling. We looked at Metropolis, Gibbs, and finally HMC. Not only is HMC more efficient, but it also lets us know when it fails. Let’s tackle the homework with these new tools:
Homework 5
Problem Week 1
data("Wines2012")
wines <- Wines2012
wines %>% count(judge)
## # A tibble: 9 x 2
## judge n
## <fct> <int>
## 1 Daniele Meulder 20
## 2 Francis Schott 20
## 3 Jamal Rayyis 20
## 4 Jean-M Cardebat 20
## 5 John Foy 20
## 6 Linda Murphy 20
## 7 Olivier Gergaud 20
## 8 Robert Hodgson 20
## 9 Tyler Colman 20
We have 9 judges, and each of them gave 20 reviews.
Standard Deviation and Fat Tails
/2020/05/13/standard-deviation-and-fat-tails/
Wed, 13 May 2020 00:00:00 +0000
Statistical Consequences of Fat Tails
In this post, I’ll continue to explore, with Monte Carlo simulations, the ideas in Nassim Taleb’s latest book: Statistical Consequences of Fat Tails. In other posts, I have looked at the persistent small-sample effect that plagues mean estimates under fat tails as a consequence of the loooong pre-asymptotics of the Law of Large Numbers, and at how this in turn plagues other statistical techniques such as PCA.
Statistical Rethinking: Week 5 -> Interactions
/2020/05/12/statistical-rethinking-week-5-interactions/
Tue, 12 May 2020 00:00:00 +0000
Statistical Rethinking: Week 5 -> Interactions
As Richard says in class, interactions are easy to code but incredibly difficult to interpret. By going through the problems in Chapter 8, I hope to gain a bit of practice working with them.
Chapter 8 Problems
Problem 1
Let’s run the tulips model but this time with the bed variable. Given that this is a categorical variable, it will create a different intercept for each of the beds in the sample.
Statistical Rethinking: Week 4
/2020/05/11/statistical-rethinking-week-4/
Mon, 11 May 2020 00:00:00 +0000
Statistical Rethinking: Week 4
This week was a marathon of content. Richard introduced beautifully the trade-off between overfitting and underfitting and prescribed two complementary methods to help us navigate it:
Regularizing priors
Information criteria and Cross-Validation estimates of the risk of overfitting.
Regularizing priors reduce the risk of overfitting in any model by introducing skepticism into the priors, whereas information criteria and cross-validation help us estimate whether we have overfitted or not.
What does it mean to fatten the tails?
/2020/05/09/what-does-it-mean-to-fatten-the-tails/
Sat, 09 May 2020 00:00:00 +0000
What does it mean to fatten the tails?
First, let’s define what we mean by fatter tails.
What are fatter tails?
Intuitively, fat-tailed distributions are distributions whose PDFs decay to zero very slowly. So slowly that extreme values start gaining traction in determining the whole distribution. Thus, one distribution is fatter than another if its PDF takes longer to decay to zero.
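One standard way to make "fatter" precise is kurtosis. A sketch of the variance-perturbation trick (a Python sketch; the parameter a is illustrative): a 50/50 mixture of two normals with variances 1−a and 1+a keeps the overall variance at 1 but pushes kurtosis above the Gaussian's 3.

```python
def mixture_kurtosis(a):
    """Kurtosis of a 50/50 mixture of N(0, 1-a) and N(0, 1+a),
    where 1-a and 1+a are the two variances (overall variance stays 1).
    For a normal, E[x^4] = 3 * var^2, hence the factors of 3 below."""
    m2 = 0.5 * ((1 - a) + (1 + a))                    # E[x^2] = 1
    m4 = 0.5 * (3 * (1 - a) ** 2 + 3 * (1 + a) ** 2)  # E[x^4] = 3(1 + a^2)
    return m4 / m2**2

print(mixture_kurtosis(0.0))  # 3.0: plain Gaussian
print(mixture_kurtosis(0.5))  # 3.75: fatter tails, same variance
```

The mixture's kurtosis is 3(1 + a²): the larger the variance perturbation, the fatter the tails.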
Fattening the tails
Thus, if we want to fatten the tails, the intuitive response is to add more mass to the tails so that the PDF takes more time to decay.
Statistical Rethinking: Week 3
/2020/05/03/statistical-rethinking-week-3/
Sun, 03 May 2020 00:00:00 +0000
Statistical Rethinking: Week 3
Week 3 gave the most interesting discussion of multiple regression. Why isn’t univariate regression enough? Multiple regression allows us to disentangle two types of mistakes:
Spurious correlation between the predictor and the outcome variable.
A masking relationship between two explanatory variables.
It also started to introduce DAGs and how they are an incredible tool for thinking before fitting. Especially, it managed to convince me that the frequent strategy of tossing everything into a multiple regression and hoping for the best is a recipe for disaster.
Wittgenstein's Ruler: Fat or Thin?
/2020/04/30/wittgenstein-s-ruler-fat-or-thin/
Thu, 30 Apr 2020 00:00:00 +0000
Statistical Rethinking: Week 2
/2020/04/28/statistical-rethinking-week-2/
Tue, 28 Apr 2020 00:00:00 +0000
library(rethinking)
library(tidyverse)
library(ggridges)
extrafont::loadfonts(device="win")
set.seed(24)
data("Howell1")
precis(Howell1)
## mean sd 5.5% 94.5%
## height 138.2635963 27.6024476 81.108550 165.73500
## weight 35.6106176 14.7191782 9.360721 54.50289
## age 29.3443934 20.7468882 1.000000 66.13500
## male 0.4724265 0.4996986 0.000000 1.00000
## histogram
## height ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age    ▇▅▅▃▅▂▂▁▁
## male   ▇▁▁▁▁▁▁▁▁▇
Week 2
Week 2 got us started exploring linear regression from a Bayesian perspective. I found propagating uncertainty through the model the most interesting part.
Spurious PCA under Thick Tails
/2020/04/27/spurious-pca-under-thick-tails/
Mon, 27 Apr 2020 00:00:00 +0000/2020/04/27/spurious-pca-under-thick-tails/Spurious PCA under Thick Tails
PCA is a dimensionality reduction technique. It seeks to project the data onto a lower-dimensional hyperplane such that as much of the original variance as possible is preserved. The underlying idea is that the vectors spanning these lower-dimensional hyperplanes reflect a latent structure in the data. However, what happens when there is no structure at all?
In his most recently published technical book, Taleb examines this question under two different regimes: Mediocristan and Extremistan.
Pareto 80/20 and Maximum Likelihood
/2020/04/23/pareto-80-20-and-maximum-likelihood/
Thu, 23 Apr 2020 00:00:00 +0000/2020/04/23/pareto-80-20-and-maximum-likelihood/Statistical Rethinking: Week 1
/2020/04/19/statistical-rethinking-week-1/
Sun, 19 Apr 2020 00:00:00 +0000/2020/04/19/statistical-rethinking-week-1/Week 1
Week 1 tries to go as deep as possible into the intuition and the mechanics of a very simple model. As always, McElreath proceeds with both clarity and erudition.
Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.
# define grid
p_grid <- seq(from = 0, to = 1, length.out = 1000)
likelihood <- dbinom(8, size = 15, prob = p_grid)
# with a flat prior, the posterior is just the normalized likelihood
posterior <- likelihood / sum(likelihood)
Fat vs Thin: does LLN work?
/2020/04/17/fat-vs-thin-does-lln-work/
Fri, 17 Apr 2020 00:00:00 +0000/2020/04/17/fat-vs-thin-does-lln-work/Fat tails are a different beast
Statistical estimation is based on the LLN and the CLT. The CLT states that the sampling distribution of the mean will look like a normal; the LLN, that the variance of that normal will decrease as our sample size increases.
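A quick simulation contrasts the two regimes (all choices here are illustrative: the seed, the 100,000 draws, and a Pareto with tail exponent alpha = 1.2 as the fat-tailed example):

```python
import random

random.seed(7)
n = 100_000

# Thin tails: standard normal (mean 0, finite variance).
normal_draws = [random.gauss(0, 1) for _ in range(n)]

# Fat tails: Pareto with tail exponent alpha = 1.2 (finite mean
# alpha / (alpha - 1) = 6, infinite variance), via inverse transform.
alpha = 1.2
pareto_draws = [(1 - random.random()) ** (-1 / alpha) for _ in range(n)]

def running_mean(xs):
    out, total = [], 0.0
    for i, x in enumerate(xs, 1):
        total += x
        out.append(total / i)
    return out

norm_path = running_mean(normal_draws)
pareto_path = running_mean(pareto_draws)

# The normal running mean settles near 0 quickly; the Pareto running
# mean keeps jumping every time a huge draw lands.
print(norm_path[-1], pareto_path[-1])
```

With infinite variance the CLT's normal approximation no longer applies, and the LLN, while still valid, works so slowly that the sample mean remains unreliable at any realistic sample size.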
Or so says Nassim Nicholas Taleb in his recently published technical book, wherein he explains how common-practice statistical methodology breaks down under the Extremistan regime.
Coursera Machine Learning: Neural Networks, Representation
/2020/01/08/coursera-machine-learning-neural-networks-representation/
Wed, 08 Jan 2020 00:00:00 +0000/2020/01/08/coursera-machine-learning-neural-networks-representation/Coursera Machine Learning Logistic Regression and Regularization
/2020/01/01/coursera-machine-learning-logistic-regression/
Wed, 01 Jan 2020 00:00:00 +0000/2020/01/01/coursera-machine-learning-logistic-regression/Classification Problems
Linear Regression?
Logistic Regression
Decision Boundaries
Cost Function
Cross Entropy
A maximum likelihood derivation
Vectorised implementation
Multi Classification Problem
Regularization
Solutions to overfitting:
Cost function
Moving towards neural networks
Classification Problems
In all of these problems, the variable that we are trying to predict is a variable \(y\) that takes on one of two values, zero or one: spam or not spam, fraudulent or not fraudulent, malignant or benign.
Coursera Machine Learning: Introduction and Linear Regression
/2019/12/26/coursera-machine-learning-week-1/
Thu, 26 Dec 2019 00:00:00 +0000/2019/12/26/coursera-machine-learning-week-1/Why?
Week 1
Why Machine Learning?
What is Machine Learning
Supervised Learning
Regression Problems and Classification Problems
Math Setting
Example of ML algorithm and a loss function
Which ML Algorithm for which Loss Function?
Unsupervised Learning
Linear Regression
Loss Function
Gradient Descent
Gradient Descent Justification
Gradient Descent in Linear Regression
Implementing Linear Regression with Gradient Descent
Problems with linear regression
Sensitivity to Outliers
Multicollinearity
Heteroscedasticity
Why?
Referential Arrays in Python
/2019/12/19/referential-arrays-in-python/
Thu, 19 Dec 2019 00:00:00 +0000/2019/12/19/referential-arrays-in-python/Referential Array
Arrays in Python do not hold the objects themselves, but pointers to those objects. This can create problems with shallow copying. For example:
Each index of each list contains a pointer to an object.
first_list = [1, 2, 3]
second_list = [4, 5, 6]
third_list = [7, 8, 9]
print(first_list)
## [1, 2, 3]
What if we want to combine these lists?
first_list.extend(second_list)
third_list.append(second_list)
print(first_list)
## [1, 2, 3, 4, 5, 6]
print(third_list)
## [7, 8, 9, [4, 5, 6]]
Now, first_list holds pointers to the same integer objects as second_list, while third_list's last element is a pointer to second_list itself.
Refresher Big Oh Notation
/2019/12/19/refresher-big-oh-notation/
Thu, 19 Dec 2019 00:00:00 +0000/2019/12/19/refresher-big-oh-notation/Big Oh Notation
First, let's remember what problem we are actually trying to solve: comparing the runtimes of different algorithms. We do not compare them in wall-clock time, but in the number of some basic operations. And we care not about the runtime itself, but about how the runtime grows as the input size grows. Thus, we use asymptotic analysis.
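A tiny sketch (the `linear_scan` and `all_pairs` helpers are hypothetical examples, not from any library) makes the counting explicit: we count basic operations, not seconds, and watch how the counts grow with the input size:

```python
def linear_scan(xs, target):
    # O(n): at most one comparison per element.
    ops = 0
    for x in xs:
        ops += 1
        if x == target:
            break
    return ops

def all_pairs(xs):
    # O(n^2): one basic operation per pair of elements.
    ops = 0
    for i in range(len(xs)):
        for j in range(len(xs)):
            ops += 1
    return ops

# Growing the input 10x grows the linear count 10x
# but the quadratic count 100x.
for n in (10, 100, 1000):
    xs = list(range(n))
    print(n, linear_scan(xs, -1), all_pairs(xs))
```

Doubling the input doubles the linear count but quadruples the quadratic one, which is exactly the growth-rate comparison that asymptotic analysis formalizes.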
\(O(n)\) can thus be described as: the runtime grows linearly as the input size increases.
deeplearning.ai Specialization
/2019/12/18/deeplearning-ai-specialization/
Wed, 18 Dec 2019 00:00:00 +0000/2019/12/18/deeplearning-ai-specialization/deeplearning.ai
Over the next few days, I’ll go over (this time I am paying and thus have access to the exams :)) the deeplearning.ai Coursera Specialization. Here, I’ll gather my notes of the course for easy access:
Neural Networks and Deep Learning
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization.
Structuring Machine Learning Projects.
Convolutional Neural Networks.
Sequence Models.
Why?
I have already used DL in a couple of personal projects.
Neural Networks and Deep Learning
/2019/12/18/neural-networks-and-deep-learning/
Wed, 18 Dec 2019 00:00:00 +0000/2019/12/18/neural-networks-and-deep-learning/Week 1
Why Neural Networks?
Economic value from Neural Networks
Structured vs Unstructured
Why now? Scale, scale and scale
Universal Approximators
Week 2
Notation
Logistic Regression
A rational loss function
Arriving at cost function: Maximum Likelihood
Gradient Descent
Minimizing directional derivative
Computational Graphs and backprop
Example with logistic regression
Vectorisation
Broadcasting
Assignments: things to remember
Week 3
Hidden Layer
Notation
Activation function
Why bother with any activation function at all?
Understanding Backtracking
/2019/12/12/understanding-backtracking/
Thu, 12 Dec 2019 00:00:00 +0000/2019/12/12/understanding-backtracking/Recursion black magic
While working through some HackerRank problems, I came across the following exercise:
Find the number of ways that a given integer, \(X\), can be expressed as the sum of the \(N^{th}\) powers of unique, natural numbers.
However, I could not immediately come up with a compelling way to programmatically enumerate and examine all the possible sequences of numbers to find the required solution. Thus, it is as good a time as ever to refresh my knowledge of recursion and, specifically, backtracking.
Netflix Habits through data
/2019/10/15/netflix-habits-through-data/
Tue, 15 Oct 2019 00:00:00 +0000/2019/10/15/netflix-habits-through-data/Netflix Habits
In the past, I believe I have spent an inordinate amount of time watching series and movies on Netflix. To try to gauge how my habits have changed through time, I downloaded the data that Netflix makes available and, of course, used R to analyze it.
Tidy Tools
library(tidyverse)
library(tsibble)
Let’s have a first look:
Sadly, there's not that much information. However, let's try to gauge how many shows I have watched:
Qui mensis anni calidissimus est?
/2018/07/19/qui-mensis-anni-calidissimus-est/
Thu, 19 Jul 2018 00:00:00 +0000/2018/07/19/qui-mensis-anni-calidissimus-est/In chapter XIII of Lingua Latina, titled 'Annus et Menses' ('The Year and the Months'), Quintus asks Aemilia the following:
Qui mensis anni calidissimus est? ('Which month of the year is the hottest?')
I answer with data from New York:
library(tidyverse)
data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/us-weather-history/KNYC.csv")
data <- data %>% mutate(date = lubridate::ymd(date),
month = lubridate::month(date),
mensis = case_when(
month == 1 ~ "Ianuarius",
month == 2 ~ "Februarius",
month == 3 ~ "Martius",
month == 4 ~ "Aprilis",
month == 5 ~ "Maius",
month == 6 ~ "Iunius",
month == 7 ~ "Iulius",
month == 8 ~ "Augustus",
month == 9 ~ "September",
month == 10 ~ "October",
month == 11 ~ "November",
month == 12 ~ "December"
),
mensis = forcats::as_factor(mensis),
mensis = forcats::fct_reorder(mensis, month))
How to spell HODL?
/2018/07/17/how-spell-hodl/
Tue, 17 Jul 2018 00:00:00 +0000/2018/07/17/how-spell-hodl/The moody Mr. Market
Anybody who has even a dime in the stock market will eventually get dragged into following the daily (or even hourly) moves in the market. However, this is not only a stressful idea, but also a very ineffective one. Most days in the market won't even budge your final total return. In fact, the final market return is mostly determined by a handful of days alone.
Trees, Ensembles and beyond, XGBoost and LGBM
/2018/06/10/trees-ensembles-and-beyond/
Sun, 10 Jun 2018 00:00:00 +0000/2018/06/10/trees-ensembles-and-beyond/Why?
Set-up
Trees
Fitting them
Interpretation
Ensembles
Bagging
Bootstrapping
Random Forests
Conclusions for Bagging
Boosting
Directional Derivative
Gradient Boosting: Back to our problem
Conclusions for Boosting
Why?
lightgbm and xgboost appear in every single competition at Kaggle. Thus, these boosting techniques must be able to learn something that cannot easily be learned by intelligent bagging techniques like Random Forests. This is my attempt to understand why and how they can do that.
About
/about/
Thu, 07 Jun 2018 00:00:00 +0000/about/Hi! My name is David Salazar and I am dilettant'n my way into understanding Machine Learning and Data Science in general.
The Adam Smith Problem: Tidytext in R
/2018/06/07/the-adam-smith-problem-tidytext-in-r/
Thu, 07 Jun 2018 00:00:00 +0000/2018/06/07/the-adam-smith-problem-tidytext-in-r/Why?
This is a fun (for me) exercise to explore Text Mining with R and make sure I can follow along.
What is it?
Around the 19th century, some German scholars posited that the Adam Smith of the Wealth of Nations was too different from the Adam Smith of the Theory of Moral Sentiments, and thus concluded that he must have had a change of heart somewhere along his life, or that he was simply an incoherent man.
Is it CR7 or Messi?: Using the fastai toolkit
/2018/06/01/is-it-cr7-or-messi-using-the-fastai-toolkit/
Fri, 01 Jun 2018 00:00:00 +0000/2018/06/01/is-it-cr7-or-messi-using-the-fastai-toolkit/World Cup Mode: CR vs Messi
To try to solidify what I have learned from Deep Learning for Coders from fast.ai, I'll try to train a computer vision algorithm such that it can recognize whether it is Messi or Cristiano Ronaldo in the picture. Notebook in github or nbviewer.
Imports
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
# This file contains all the main external libs we'll use
from fastai.
Plotting Supply and Demand Curves with ggplot2
/2018/05/20/ggsupplydemand/
Sun, 20 May 2018 00:00:00 +0000/2018/05/20/ggsupplydemand/What is it?
ggsupplyDemand is an R package that makes it extremely easy to plot basic supply and demand curves using ggplot2.
library(ggsupplyDemand)
create_supply_and_demand() %>% shift_demand(outwards = TRUE) %>% plot_supply_and_demand(consumer_surplus = TRUE)
Why?
I needed to plot some basic supply and demand curves in R. Obviously, I thought of ggplot2. However, it is not that straightforward. The best resource I could find was this blogpost by Andrew Heiss. I compiled most of his functions, created a simple API, and put all the functions into a package.
Shiny e iCOLCAP
/2018/04/01/shiny-e-icolcap/
Sun, 01 Apr 2018 00:00:00 +0000/2018/04/01/shiny-e-icolcap/Tracking a disappointing investment
The Colombian market has been a disappointment over the last five years. To see just how disappointed I should be, I built this Shiny app. So that I don't forget how to build one in the future, here is a brief tutorial commenting on selected parts of the code:
ui: User Interface
In the ui we specify the layout of the inputs and outputs, even before we have created the latter.
ui <- fluidPage(
# Application title
titlePanel("Comparación de Mercado"),
# Sidebar with a date input
sidebarLayout( dateInput("fecha_inicial", "Fecha de inicio de comparación:", value = "2013-01-01")
# first argument: input$fecha_inicial, which is how we will refer to it in server.