# Statistical Rethinking: Week 9

Published

June 3, 2020

Week 9 was all about fitting models with multivariate distributions in them. For example, a multivariate likelihood helps us use an instrumental variable to estimate the true causal effect of a predictor. But also as an adaptive prior for some of the predictors. In both cases, we found out that the benefit comes from modelling the resulting var-cov matrix. In the instrumental variable case, the resulting joint distribution for the residuals was the key to capture the statistical information of the confounding variable. In the adaptive prior case, it helps understand the relationship between different parameter types.

# 1st question

Revisit the Bangladesh fertility data,data(bangladesh). Fit a model with both varying intercepts by district_id and varying slopes of urban (as a 0/1 indicator variable) by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? It might help to plot the varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plotting predicted proportion of women using contraception, in each district, with urban women on one axis and rural on the other, might also help.

data("bangladesh")

# Fix the district id
bangladesh %>%
mutate(district_id = as.integer( as.factor(district) ) ) -> bangladesh
glimpse(bangladesh)
Rows: 1,934
Columns: 7
$woman <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…$ district          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$use.contraception <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1…$ living.children   <int> 4, 1, 3, 4, 1, 1, 4, 4, 2, 4, 1, 1, 2, 4, 4, 4, 1, 4…
$age.centered <dbl> 18.4400, -5.5599, 1.4400, 8.4400, -13.5590, -11.5600…$ urban             <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$district_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… Let’s fit the varying effects models for each district to have its average contraception use its own the differential between urban and rural areas. data_varying <- list( contraception = bangladesh$use.contraception,
district_id = bangladesh$district_id, urban = bangladesh$urban
)

model_varying <- ulam(
alist(
contraception ~ binomial(1, p),
logit(p) <- alpha[district_id] + beta[district_id] * urban,

# adaptive priors
c(alpha, beta)[district_id] ~ multi_normal(c(a, b), Rho, sigma),

# hyper-priors
a ~ normal(-0.5, 1),
b ~ normal(0, 1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2)
),
data = data_varying,
chains = 4, cores = 4,
iter = 2000
)
Running MCMC with 4 parallel chains, with 1 thread(s) per chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1 finished in 25.4 seconds.
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 finished in 29.4 seconds.
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4 finished in 30.7 seconds.
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3 finished in 32.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 29.5 seconds.
Total execution time: 33.0 seconds.

Let’s check our chains’ health:

traceplot_ulam(model_varying)

The chains look healthy enough. They are:

1. They are stationary
2. They mix well across the parameter space.
3. Different chains converge to explore the same parameter space.

Let’s check the $$\hat{R}$$ values:

results <- precis(model_varying, depth = 3)
results %>%
data.frame() %>%
select(Rhat4) %>%
summary()
     Rhat4
Min.   :0.9991
1st Qu.:0.9995
Median :0.9999
Mean   :1.0001
3rd Qu.:1.0002
Max.   :1.0051
NA's   :2       

The $$\hat{R}$$ look OK, indicating that the Markov chains are in close agreement with each other. Let’s check the parameters:

precis(model_varying, depth = 2, pars = c("sigma", "a", "b"))
               mean         sd       5.5%      94.5%     n_eff    Rhat4
sigma[1]  0.5732478 0.09736082  0.4268930  0.7362033 1117.7376 1.000141
sigma[2]  0.7767925 0.20201103  0.4672885  1.1105892  503.5447 1.005126
a        -0.7055373 0.10293818 -0.8725771 -0.5428108 2976.0407 1.000136
b         0.6969825 0.16603119  0.4444045  0.9658206 2375.1974 1.000239

The contraceptive use is not that likely, thus the negative (in log-odds scale) average value in the adaptive prior for $$a$$. The positive value for $$b$$, on the other hand, indicates that the average distribution of slopes is positive. That is, women in urban areas are, on average, more likely to use contraception. Finally, the variances. Both indicate quite a bit of variation in the multivariate population for intercepts and slopes.

precis(model_varying, pars = "Rho", depth = 3)
               mean        sd      5.5%      94.5%    n_eff    Rhat4
Rho[1,1]  1.0000000 0.0000000  1.000000  1.0000000      NaN      NaN
Rho[1,2] -0.6585362 0.1612475 -0.868516 -0.3679604 710.9425 1.003216
Rho[2,1] -0.6585362 0.1612475 -0.868516 -0.3679604 710.9425 1.003216
Rho[2,2]  1.0000000 0.0000000  1.000000  1.0000000      NaN      NaN

There’s a negative correlation between the parameter types: i.e., for districts with higher contraceptive usage overall, the correlation informs us that we should predict a lower than average differential in the use of contraceptives between rural and urban areas.

We can follow Richard’s advice and plot both types of parameters for each district. We can even overlay the ellipses that determine the levels of the multivariate adaptive prior:

samples <- extract.samples(model_varying)

Mu_est <- c(mean(samples$a), mean(samples$b))
rho_est <- mean(samples$Rho[,1,2]) sa_est <- mean(samples$sigma[,1])
sb_est <- mean(samples$sigma[, 2]) cov_ab <- sa_est*sb_est*rho_est Sigma_est <- matrix(c(sa_est^2, cov_ab, cov_ab, sb_est^2), ncol = 2) contour_level <- function(level) { ellipse::ellipse(Sigma_est, centre = Mu_est, level = level) %>% data.frame() %>% mutate(level = level) } purrr::map(c(0.1, 0.3, 0.5, 0.8, 0.99), contour_level) %>% bind_rows() -> data_elipses data_elipses %>% ggplot(aes(x, y)) + geom_path(aes(group = level), linetype = 2) + geom_point(data = data.frame(x = Mu_est[1]), y = Mu_est[2], color = "red") Finally, we can plot the points: samples$alpha %>%
as_tibble() %>%
pivot_longer(everything(), names_to = "district_id_", names_prefix = "V", values_to = "alpha") %>%
bind_cols(samples$beta %>% as_tibble() %>% pivot_longer(everything(), names_to = "district_id", names_prefix = "V", values_to = "beta")) %>% group_by(district_id) %>% median_qi(alpha, beta) %>% select(district_id, alpha, beta) %>% ggplot(aes(alpha, beta)) + geom_point(alpha = 0.6) + geom_path(data = data_elipses, inherit.aes = F, mapping = aes(x, y, group = level), linetype = 2, color = "dodgerblue4") + geom_point(data = data.frame(x = Mu_est[1]), y = Mu_est[2], color = "red", inherit.aes = FALSE, mapping = aes(x, y)) + labs(title = "Negative correlation between intercepts and slopes per district", subtitle = "Districts with higher overall use have lower differentials between urban and rural", x = expression(alpha), y = expression(beta)) # 2nd question Now consider the predictor variables age.centered and living.children, also contained in data(bangladesh). Suppose that age influences contraceptive use (changing attitudes) and number of children (older people have had more time to have kids). Number of children may also directly influence contraceptive use. Draw a DAG that reflects these hypothetical relationships. Then build models needed to evaluate the DAG. You will need at least two models. Retain district and urban, as in Problem 1. What do you conclude about the causal influence of age and children? dag <- dagitty::dagitty(" dag { Age -> N_children Age -> contraception N_children -> contraception }") drawdag(dag) Conditional on this DAG, the total causal effect of Age on contraception is mediated (pipe) with Number of Children. Thus, to get the total effect we must not control by number of children. Let’s fit this model: data_varying <- list( contraception = bangladesh$use.contraception,
district_id = bangladesh$district_id, urban = bangladesh$urban,
age = bangladesh$age.centered, kids = bangladesh$living.children
)
model_only_age <- ulam(
alist(
contraception ~ dbinom(1, p),
logit(p) <- alpha[district_id] + beta[district_id] * urban + gamma*age,

# traditional priors
gamma ~ normal(0, 1),

# adaptive priors
c(alpha, beta)[district_id] ~ multi_normal(c(a, b), Rho, sigma),

# hyper-priors
a ~ normal(-0.5, 1),
b ~ normal(0, 1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2)
),
chains = 4, cores = 4,
data = data_varying,
iter = 2000
)
Running MCMC with 4 parallel chains, with 1 thread(s) per chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4 finished in 50.3 seconds.
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3 finished in 52.1 seconds.
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1 finished in 52.6 seconds.
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 finished in 54.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 52.4 seconds.
Total execution time: 54.7 seconds.

Let’s check our chains’ health:

traceplot_ulam(model_only_age)

The chains look healthy enough. They are:

1. They are stationary
2. They mix well across the parameter space.
3. Different chains converge to explore the same parameter space.

Let’s check the $$\hat{R}$$ values:

precis(model_only_age, depth = 3) %>%
data.frame() %>%
select(Rhat4) %>%
summary()
     Rhat4
Min.   :0.9991
1st Qu.:0.9996
Median :1.0000
Mean   :1.0002
3rd Qu.:1.0005
Max.   :1.0069
NA's   :2       

The $$\hat{R}$$ look OK, indicating that the Markov chains are in close agreement with each other. Let’s check the parameters:

precis(model_only_age, depth = 3, pars = c("a", "b", "gamma", "sigma", "Rho"))
                 mean          sd          5.5%       94.5%     n_eff     Rhat4
a        -0.711537292 0.102268885 -0.8787672400 -0.55198898 2905.2135 1.0005300
b         0.695464373 0.173304390  0.4243350000  0.97019381 2109.0982 1.0006393
gamma     0.009474581 0.005552167  0.0005521033  0.01849995 8484.1344 0.9992501
sigma[1]  0.584962697 0.100125045  0.4371533400  0.75428655 1260.7598 1.0016182
sigma[2]  0.802377308 0.196794320  0.5076205600  1.13047220  586.1314 1.0069074
Rho[1,1]  1.000000000 0.000000000  1.0000000000  1.00000000       NaN       NaN
Rho[1,2] -0.650102143 0.167089307 -0.8636559950 -0.33474679  864.0654 1.0037881
Rho[2,1] -0.650102143 0.167089307 -0.8636559950 -0.33474679  864.0654 1.0037881
Rho[2,2]  1.000000000 0.000000000  1.0000000000  1.00000000       NaN       NaN

The distribution of intercepts and slopes looks completely unchanged. For the $$\gamma$$, our estimated effect has much of its probability mass around zero and 0.02. Therefore, we conclude that the total causal effect of age on the use of contraception is small. For example, let’s take the woman from the first district and predict our expected probability that they use contraception, across both urban and rural areas, as function of age:

data.frame(data_varying) %>%
group_by(urban) %>%
data_grid(age, district_id = 1) %>%
add_fitted_draws(model_only_age) %>%
ggplot(aes(age, .value)) +
stat_lineribbon(fill = "dodgerblue4", alpha = 1/4) +
scale_fill_brewer(palette = "Greys") +
facet_wrap(~factor(urban, labels = c("Rural", "Urban"))) +
labs(title = "Predicted prob of using contraception as function of age",
subtitle = "Age has a positive small effect. No statistical adjustment by # of kids ",
y = "predicted prob")

Now for the model that takes into account the number of children each woman has:

model_age_kids <- ulam(
alist(
contraception ~ dbinom(1, p),
logit(p) <- alpha[district_id] + beta[district_id] * urban + gamma*age + delta*kids,

# traditional priors
gamma ~ normal(0, 1),
delta ~ normal(0, 1),

# adaptive priors
c(alpha, beta)[district_id] ~ multi_normal(c(a, b), Rho, sigma),

# hyper-priors
a ~ normal(-0.5, 1),
b ~ normal(0, 1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2)
),
chains = 4, cores = 4,
data = data_varying,
iter = 2000
)
Running MCMC with 4 parallel chains, with 1 thread(s) per chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4 finished in 72.7 seconds.
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1 finished in 73.5 seconds.
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 finished in 75.1 seconds.
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3 finished in 78.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 75.0 seconds.
Total execution time: 78.9 seconds.

Let’s look at our $$\hat{R}$$:

precis(model_age_kids, depth = 3) %>%
data.frame() %>%
select(Rhat4) %>%
summary()
     Rhat4
Min.   :0.9992
1st Qu.:0.9997
Median :1.0000
Mean   :1.0002
3rd Qu.:1.0005
Max.   :1.0041
NA's   :2       

The $$\hat{R}$$ look OK, indicating agreement between chains. Let’s check our posterior’s parameters:

precis(model_age_kids,  depth = 3, pars = c("a", "b", "gamma", "sigma", "Rho", "delta"))
                mean          sd        5.5%       94.5%     n_eff     Rhat4
a        -1.83023872 0.189105439 -2.12943000 -1.53019765  737.8404 1.0010395
b         0.74380971 0.170902543  0.47651030  1.01466715 1834.3857 1.0011111
gamma    -0.02977116 0.007806217 -0.04238032 -0.01738009 1240.2477 0.9999487
sigma[1]  0.60960022 0.103413091  0.45610425  0.77975195 1147.9766 1.0014546
sigma[2]  0.77524901 0.204278550  0.46024833  1.10468865  480.3989 1.0041223
Rho[1,1]  1.00000000 0.000000000  1.00000000  1.00000000       NaN       NaN
Rho[1,2] -0.63624957 0.172037893 -0.85805022 -0.31101017  569.2614 1.0039501
Rho[2,1] -0.63624957 0.172037893 -0.85805022 -0.31101017  569.2614 1.0039501
Rho[2,2]  1.00000000 0.000000000  1.00000000  1.00000000       NaN       NaN
delta     0.41420075 0.056842441  0.32393682  0.50519928  715.3344 1.0002757

Our population distribution for slopes and parameters has shifted: the average probability of using contraception, for a woman with 1 kids, is much lower. That can be explained as our parameters for the number of children, $$\delta$$, is clearly positive with an 87% compatibility interval between (0.33, 0.50) in the log-odds. Notice also that the effect of age has changed signs and it’s mass is around (-0.04, -0.02) in the log odds scale. That is, older women, adjusting by the number of children they have, are less likely to use contraception.

Let’s plot the effect of having children for the women of the district 20 of average age:

data.frame(data_varying) %>%
group_by(urban) %>%
data_grid(kids, district_id = 20, age = 0) %>%
add_fitted_draws(model_age_kids) %>%
ggplot(aes(kids, .value)) +
stat_lineribbon(fill = "dodgerblue4", alpha = 1/4) +
scale_fill_brewer(palette = "Greys") +
facet_wrap(~factor(urban, labels = c("Rural", "Urban"))) +
labs(title = "Predicted prob of using contraception as function of # of kids",
subtitle = "Women with more kids are more likely to use contraception")

Now, for age:

data.frame(data_varying) %>%
group_by(urban) %>%
data_grid(age, district_id = 1, kids = 1) %>%
add_fitted_draws(model_age_kids) %>%
ggplot(aes(age, .value)) +
stat_lineribbon(fill = "dodgerblue4", alpha = 1/4) +
scale_fill_brewer(palette = "Greys") +
facet_wrap(~factor(urban, labels = c("Rural", "Urban"))) +
labs(title = "Predicted prob of using contraception as function of age",
subtitle = "Age has a negative effect. Statistically adjusting by # of kids",
y = "predicted prob")

Going back to our DAG, our findings are in accordance with it. The total causal effect of age is less than the direct causal effect due to the pipe that goes through number of kids. That is, older women have lower probabilities to use contraception once we statistically adjust by the number of kids they have. However, older women also tend to have more children and the direct effect of having more children is to be less likely to use contraception. Therefore, the mixed signal that we get from the total effect.

# 3rd question

Modify any models from Problem 2 that contained that children variable and model the variable now as a monotonic ordered category, like education from the week we did ordered categories. Education in that example had 8 categories. Children here will have fewer (no one in the sample had 8 children). So modify the code appropriately. What do you conclude about the causal influence of each additional child on use of contraception?

Almost inadvertently, in our previous model we assumed that the additional effect of each kid in the log odds of using contraception was constant. By modelling as an ordered category, we let the data decide whether it should be so.

data_varying <- list(
contraception = bangladesh$use.contraception, district_id = bangladesh$district_id,
urban = bangladesh$urban, age = bangladesh$age.centered,
kids = as.integer(bangladesh\$living.children),
alpha = rep(2, 3)
)

model_age_kids_ord <- ulam(
alist(
contraception ~ dbinom(1, p),
logit(p) <- alp[district_id] + beta[district_id] * urban + gamma*age + bks*sum(delta_j[1:kids]),

# traditional priors
gamma ~ normal(0, 1),
bks ~ normal(0, 1),
# adaptive priors
c(alp, beta)[district_id] ~ multi_normal(c(a, b), Rho, sigma),

# hyper-priors
a ~ normal(-0.5, 1),
b ~ normal(0, 1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2),
vector[4]: delta_j <<- append_row(0, delta),
simplex[3]: delta ~ dirichlet(alpha)
),
chains = 4, cores = 4,
data = data_varying,
iter = 2000
)
Running MCMC with 4 parallel chains, with 1 thread(s) per chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup)
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup)
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup)
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup)
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup)
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling)
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling)
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2 finished in 97.2 seconds.
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1 finished in 107.4 seconds.
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3 finished in 107.9 seconds.
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4 finished in 113.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 106.5 seconds.
Total execution time: 113.8 seconds.

Let’s look at our $$\hat{R}$$:

precis(model_age_kids_ord, depth = 3) %>%
data.frame() %>%
select(Rhat4) %>%
summary()
     Rhat4
Min.   :0.9992
1st Qu.:0.9998
Median :1.0004
Mean   :1.0006
3rd Qu.:1.0010
Max.   :1.0081
NA's   :2       

The $$\hat{R}$$ values look OK, indicating that the chains are in close agreement with each other. Let’s check our parameters:

precis(model_age_kids_ord,  depth = 3, pars = c("a", "b", "gamma", "sigma", "Rho", "bks"))
                mean          sd        5.5%       94.5%     n_eff    Rhat4
a        -1.65414138 0.150164278 -1.89677880 -1.41186295 1077.8983 1.001821
b         0.75640661 0.166677397  0.49969545  1.02912495 1975.9601 1.000855
gamma    -0.02849433 0.007341003 -0.04022939 -0.01663006 2461.7104 1.000734
sigma[1]  0.60865051 0.103430477  0.45507541  0.78141514 1311.5317 1.002836
sigma[2]  0.77329681 0.212034191  0.44747856  1.11943225  483.0631 1.008150
Rho[1,1]  1.00000000 0.000000000  1.00000000  1.00000000       NaN      NaN
Rho[1,2] -0.64433118 0.169482917 -0.86035707 -0.33008590  860.8983 1.006834
Rho[2,1] -0.64433118 0.169482917 -0.86035707 -0.33008590  860.8983 1.006834
Rho[2,2]  1.00000000 0.000000000  1.00000000  1.00000000       NaN      NaN
bks       1.37803430 0.160614910  1.12189780  1.62950030 1038.0832 1.002649

The overall effect of the children variable, when a woman has 4 children, has the same sign and roughly the same magnitude as previous inferences. Let’s look at the effect splitted by the number of children:

precis(model_age_kids_ord, depth = 3, pars = "delta")
               mean         sd       5.5%     94.5%    n_eff     Rhat4
delta[1] 0.73293447 0.08141250 0.59898722 0.8600923 4670.253 0.9994883
delta[2] 0.16762106 0.07829638 0.05269849 0.2999460 5027.078 0.9994653
delta[3] 0.09944448 0.05476397 0.02512521 0.1978564 5644.457 0.9993787

Remember that these are percentages of the total effect. That is, around 73% of the total effect comes from having the second child. Therefore, we conclude that most of the effect that having children increases the chances of using contraception comes from having a second child.