# use groundhog to make code maximally reproducible
if (!require("groundhog", quietly = TRUE)) {
install.packages("groundhog")
}
library("groundhog")
# use groundhog to install and load packages
pkgs <- c("here", # Path management
"tidyverse", # ggplot, dplyr, %>%, and friends
"easystats", # bayestestR, performance, effectsize and friends
"psych", # Descriptive statistics and other tools
"brms", # Bayesian modeling through Stan
"tidybayes", # Integration of Bayesian models in tidy + ggplot workflow
"broom", # Convert model objects to data frames
"broom.mixed", # Convert brms model objects to data frames
"tinytable", # Lightweight package to create tables
"hrbrthemes", # Additional ggplot themes
"extrafont", # Additional fonts for plots etc
"ggdist", # Special geoms for posterior distributions
"ggrepel", # Automatically position labels
"gghalves", # Special half geoms
"patchwork" # Combine ggplot objects
)
groundhog.library(pkgs, "2024-03-01") # change to a recent date
Teaching Myself: Bayesian Linear Regression
Meet the data
I will use a large open data set described in Eisenberg et al. (2018; 2019), combined with data from Zmigrod et al. (2021). The data from Eisenberg and colleagues includes measures from a large number of questionnaires and cognitive tasks that tap into the domain of self-regulation. The data from Zmigrod and colleagues includes several measures of ideology.
The goal will be to find out whether self-report and task measures of self-control predict authoritarianism.
Variables of interest:
Self-reported self control: Brief Self-Control Scale
Behavioral self-control: Accuracy in a Go / No-Go task, a cognitive task that requires a response to a frequent Go stimulus, such as pressing the space bar when a red square is displayed, and the inhibition of this response when a No-Go stimulus is displayed.
Self-reported authoritarianism.
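The import step is not shown here; below is a minimal sketch, assuming both datasets were saved as CSV files. The file names are placeholders, not the actual names from the original repositories.
# hypothetical import step: file names are placeholders
data1 <- read_csv(here("data", "eisenberg_self_regulation.csv"))  # Eisenberg et al. measures
data2 <- read_csv(here("data", "zmigrod_ideology.csv"))           # Zmigrod et al. ideology measures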
data1 <- data1 %>%
  rename("SubjectID" = "...1")
data <- inner_join(data1, data2, by = "SubjectID")
# select variables of interest
d <- data %>%
  select(c(SubjectID, BSCSSCON, GNGDRPIM, Authoritarianism)) %>%
  rename(ID = SubjectID,
         bscs = BSCSSCON,
         gonogo = GNGDRPIM,
         aut = Authoritarianism) %>%
  mutate(bscs_z = scale(bscs),
         gonogo_z = scale(gonogo),
         aut_z = scale(aut))
describe(d)
## vars n mean sd median trimmed mad min max range skew kurtosis
## ID* 1 334 167.50 96.56 167.50 167.50 123.80 1.00 334.00 333.00 0.00 -1.21
## bscs 2 334 46.07 10.30 47.00 46.33 11.86 18.00 65.00 47.00 -0.22 -0.53
## gonogo 3 333 3.65 0.76 3.73 3.67 0.64 0.82 5.97 5.15 -0.11 1.18
## aut 4 334 4.43 2.94 4.00 4.25 2.97 0.00 12.00 12.00 0.55 -0.05
## bscs_z 5 334 0.00 1.00 0.09 0.02 1.15 -2.73 1.84 4.56 -0.22 -0.53
## gonogo_z 6 333 0.00 1.00 0.10 0.03 0.85 -3.74 3.07 6.81 -0.11 1.18
## aut_z 7 334 0.00 1.00 -0.15 -0.06 1.01 -1.51 2.58 4.08 0.55 -0.05
## se
## ID* 5.28
## bscs 0.56
## gonogo 0.04
## aut 0.16
## bscs_z 0.05
## gonogo_z 0.05
## aut_z 0.05
vars1_density <- d %>%
  select(c(bscs_z, gonogo_z, aut_z)) %>%
  pivot_longer(cols = c(bscs_z, gonogo_z, aut_z),
               names_to = "Variable", values_to = "Value") %>%
  ggplot(aes(x = Value, fill = factor(Variable))) +
  geom_density(alpha = 0.6) +  # alpha set outside aes(), so it is not mapped as an aesthetic
  labs(x = "Standardised values", y = NULL,
       title = "Distributions of predictor and outcome variables") +
  scale_fill_ipsum() +
  scale_color_ipsum() +
  guides(color = "none", alpha = "none") +
  theme_ipsum(plot_title_size = 14) +
  theme(legend.position = "bottom",
        legend.title = element_blank())
vars1_density
Have a first visual look at the associations between the variables.
ggplot(d, aes(x = bscs_z, y = aut_z)) +
  geom_point(size = 1) +
  geom_smooth(method = "lm") +
  labs(x = "Brief Self-Control Scale", y = "Authoritarianism",
       title = "Association between self-control and authoritarianism") +
  scale_fill_ipsum() +
  scale_color_ipsum() +
  guides(color = "none", alpha = "none") +
  theme_ipsum(plot_title_size = 12)
ggplot(d, aes(x = gonogo_z, y = aut_z)) +
  geom_point(size = 1) +
  geom_smooth(method = "lm") +
  labs(x = "Go / No-Go Accuracy", y = "Authoritarianism",
       title = "Association between Go / No-go accuracy and authoritarianism") +
  scale_fill_ipsum() +
  scale_color_ipsum() +
  guides(color = "none", alpha = "none") +
  theme_ipsum(plot_title_size = 12)
d %>%
  select(c(bscs_z, gonogo_z, aut_z)) %>%
  pairs.panels(.)
A first Bayesian linear regression: One continuous predictor
We already displayed regression lines. So far, however, all of them were fit using the frequentist lm() function. We want to change that and go Bayesian here. Let’s start by investigating the association between self-reported self-control and authoritarianism.
Selecting priors
I already scaled all variables of interest. This will make it a bit simpler to select priors. I will use weakly informative priors for our slope and intercept parameters (see https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations). The selected prior is \(N(0,1)\).
It is also recommended to conduct a prior sensitivity analysis. This can be done, for example, by refitting the model with a different prior and comparing the results. We will thus fit another model using \(N(0,2)\) as our prior. Let’s plot both of them to see what kind of an effect they may have on our parameters.
# simulate some data
x_values <- seq(-6, 6, length.out = 1000)
dists <- data.frame(x = x_values,
                    y1 = dnorm(x_values, mean = 0, sd = 1),
                    y2 = dnorm(x_values, mean = 0, sd = 2))
# colors
colors <- ipsum_pal()(2)
# plot for N(0,1)
p1 <- ggplot(dists, aes(x = x, y = y1)) +
  geom_area(aes(fill = "N(0,1)"), alpha = 0.8) +
  scale_fill_manual(values = c("N(0,1)" = colors[1])) +
  labs(title = "N(0,1)", y = "Density", x = "Value") +
  theme_ipsum() +
  theme(legend.position = "none")
# plot for N(0,2)
p2 <- ggplot(dists, aes(x = x, y = y2)) +
  geom_area(aes(fill = "N(0,2)"), alpha = 0.8) +
  scale_fill_manual(values = c("N(0,2)" = colors[2])) +
  labs(title = "N(0,2)", y = "Density", x = "Value") +
  theme_ipsum() +
  theme(legend.position = "none")
# combine plots
p_combined <- p1 + p2 + plot_layout(ncol = 2)
print(p_combined)
We can see that in the plot for \(N(0,1)\), more density is concentrated around 0. Since we expect our standardised parameter values to lie roughly between -1 and 1, this gives more weight to parameter values close to zero and less weight to values far from zero, compared to \(N(0,2)\), where more probability mass also sits on parameters that are a bit further from 0. If we knew that the association was large while our sample size was small, we could consider using informative priors.
Here we prefer \(N(0,1)\) - this prior also has a slight regularizing effect.
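To make the difference concrete, we can compute how much prior mass each choice places on standardised slopes between -1 and 1 (a quick sanity check, not part of the original analysis):
# prior mass on slopes in [-1, 1]
pnorm(1, sd = 1) - pnorm(-1, sd = 1)  # N(0,1): ~0.68
pnorm(1, sd = 2) - pnorm(-1, sd = 2)  # N(0,2): ~0.38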
Let’s check how our priors are set by default in brms.
get_prior(aut_z ~ bscs_z, data = d)
## prior class coef group resp dpar nlpar lb ub source
## (flat) b default
## (flat) b bscs_z (vectorized)
## student_t(3, -0.1, 2.5) Intercept default
## student_t(3, 0, 2.5) sigma 0 default
By default, brms sets non-informative flat priors for our slope parameters. We thus have to change this, and can do so directly when fitting the model.
Fitting and summarising the first model
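One note before fitting: the call below references bayes_seed, which is not defined in this excerpt. Any fixed integer works; a minimal sketch:
bayes_seed <- 1234  # hypothetical value; a fixed seed makes the sampling reproducible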
m1.1 <- brm(aut_z ~ bscs_z,
            data = d,
            prior = c(prior(normal(0, 1), class = "Intercept"),
                      prior(normal(0, 1), class = "b")),
            seed = bayes_seed,
            file = "02_fits/fit_m1.1")
Check the results. We can use either summary(m1.1) or print(m1.1) to do so.
summary(m1.1, prob = 0.95) # prob = 0.95 is the default
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: aut_z ~ bscs_z
## Data: d (Number of observations: 334)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.00 0.05 -0.11 0.11 1.00 4111 2956
## bscs_z 0.15 0.05 0.05 0.26 1.00 3702 2562
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.99 0.04 0.92 1.07 1.00 3605 3062
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
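As an aside, since broom.mixed is loaded, we could also pull the population-level estimates into a tidy data frame, for example:
# illustrative alternative: coefficients as a tidy data frame
tidy(m1.1, effects = "fixed", conf.int = TRUE)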
Use the tools from bayestestR to have another look at the posterior. We also define a region of practical equivalence (ROPE) of [-0.05, 0.05].
m1.1_posteriors <- describe_posterior(m1.1,
                                      centrality = "median",
                                      dispersion = FALSE,
                                      ci = 0.95,
                                      ci_method = "hdi",
                                      test = c("rope"),
                                      rope_range = c(-0.05, 0.05),
                                      rope_ci = 1)
m1.1_posteriors <- as.data.frame(m1.1_posteriors)
m1.1_posteriors
## Parameter Median CI CI_low CI_high ROPE_CI ROPE_low ROPE_high ROPE_Percentage
## 2 b_Intercept 0.000595 0.95 -0.1085 0.103 1 -0.05 0.05 0.6392
## 1 b_bscs_z 0.153896 0.95 0.0424 0.253 1 -0.05 0.05 0.0272
## Rhat ESS
## 2 1.000 4139
## 1 0.999 3690
Plot the conditional effects.
c_eff_m1.1 <- conditional_effects(m1.1)
c_eff_m1.1_plot <- plot(c_eff_m1.1,
points = T,
point_args = list(size = 1, alpha = 1/4, width = .05, height = .05, color = "black"),
plot = FALSE)[[1]] +
scale_color_ipsum() +
scale_fill_ipsum() +
theme_ipsum()
c_eff_m1.1_plot
Let’s use bayesplot functions to plot the posterior distributions of the parameters.
mcmc_plot(m1.1,
          type = "areas") +
  theme_ipsum()
We can also manually create a similar plot using draws from the posterior distribution. For simplicity, here we use gather_draws from the tidybayes package, which automatically returns a long data frame suitable for plotting with ggplot.
draws_m1.1 <- m1.1 %>%
  gather_draws(`b_.*`, regex = TRUE) %>%
  mutate(variable = case_when(.variable == "b_Intercept" ~ "Intercept",
                              .variable == "b_bscs_z" ~ "Self-Control",
                              TRUE ~ .variable))  # keep the original value if no match
ggplot(draws_m1.1, aes(x = .value, y = fct_rev(variable), fill = variable)) +
  # draw the ROPE once with annotate() rather than geom_rect(aes(...)),
  # which would draw one rectangle per row of the data
  annotate("rect", xmin = -0.05, xmax = 0.05, ymin = -Inf, ymax = Inf, fill = "#FFE3EB") +
  geom_vline(xintercept = 0) +
  stat_halfeye(slab_alpha = 0.8,
               .width = c(0.9, 0.95), point_interval = "median_hdi") +
  scale_fill_ipsum() +
  guides(fill = "none", slab_alpha = "none", alpha = "none") +
  labs(title = "Effect estimates", x = "Coefficient", y = "Variable",
       caption = "90% and 95% credible intervals shown in black. Area in rose indicates ROPE.") +
  theme_ipsum(plot_title_size = 14,
              axis_title_size = 12,
              axis_title_face = "bold")
Checking the model
Prior check
Let’s check if the prior that we used for the slope and intercept was informative or, as intended, only weakly informative. We can use functionality from bayestestR for this.
prior_summary(m1.1)
## prior class coef group resp dpar nlpar lb ub source
## normal(0, 1) b user
## normal(0, 1) b bscs_z (vectorized)
## normal(0, 1) Intercept user
## student_t(3, 0, 2.5) sigma 0 default
check_prior(m1.1)
## Parameter Prior_Quality
## 1 b_Intercept uninformative
## 2 b_bscs_z uninformative
Ok - as intended, our priors were only weakly informative: check_prior() labels a prior as “uninformative” when it does not noticeably influence the posterior.
Posterior predictive check
Let’s also check how well our model performs in generating data that is similar to our observed data. This is called a posterior predictive check (PPC).
pp_check(m1.1, ndraws = 200)
pp_check(m1.1, ndraws = 100, type = 'error_scatter_avg', alpha = .1)
Trace plot
Commonly, we look at the trace plots, i.e., the estimated values across each iteration for a chain. See this nice practical guide by Michael Clark for some details. We will follow his code in several steps below.
What should the trace plot look like? Ideally, it should look like stationary random noise, for example a series of draws from a normal distribution:
# qplot() is deprecated in recent ggplot2 versions, so we use ggplot() directly
ggplot(data.frame(x = 1:1000, y = rnorm(1000)), aes(x = x, y = y)) +
  geom_line() +
  theme_ipsum()
Now we can also plot the trace plot from our model, using nice functionality from bayesplot.
mcmc_plot(m1.1, type = 'trace') +
  theme_ipsum()
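Beyond eyeballing the traces, we can also inspect numeric convergence diagnostics, for example with the rhat() and neff_ratio() methods that brms provides (an illustrative addition):
# Rhat close to 1 and reasonably large effective-sample-size ratios indicate good mixing
rhat(m1.1)
neff_ratio(m1.1)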
Prior sensitivity checks
Ok, so we fit our first model. We also want to check whether the choice of a different prior would have a substantial effect on the results. We refit a similar model, this time with \(N(0,2)\) as the prior for the intercept and slope parameters.
m1.2 <- brm(aut_z ~ bscs_z,
            data = d,
            prior = c(prior(normal(0, 2), class = "Intercept"),
                      prior(normal(0, 2), class = "b")),
            seed = bayes_seed,
            file = "02_fits/fit_m1.2")
summary(m1.2, prob = 0.95) # prob = 0.95 is the default
m1.2_posteriors <- describe_posterior(m1.2,
                                      centrality = "median",
                                      dispersion = FALSE,
                                      ci = 0.95,
                                      ci_method = "hdi",
                                      test = c("rope"),
                                      rope_range = c(-0.05, 0.05),
                                      rope_ci = 1)
m1.2_posteriors <- as.data.frame(m1.2_posteriors)
# Add a new column 'model' to each dataframe
m1.1_posteriors <- m1.1_posteriors %>% mutate(model = "m1.1")
m1.2_posteriors <- m1.2_posteriors %>% mutate(model = "m1.2")
# Combine the dataframes
m1.1_m1.2_combined <- bind_rows(m1.1_posteriors, m1.2_posteriors)
# Print the combined dataframe
m1.1_m1.2_combined %>%
  select(Parameter, Median, CI_low, CI_high, model) %>%
  tt(.)
| Parameter   | Median   | CI_low  | CI_high | model |
|-------------|----------|---------|---------|-------|
| b_Intercept | 0.000595 | -0.1085 | 0.103   | m1.1  |
| b_bscs_z    | 0.153896 | 0.0424  | 0.253   | m1.1  |
| b_Intercept | 0.000932 | -0.1045 | 0.109   | m1.2  |
| b_bscs_z    | 0.152366 | 0.0470  | 0.265   | m1.2  |
We can see that, irrespective of our prior choice, both the medians of the posterior distributions and the 95% HDIs of the two models are very similar.
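To see this visually, one option (not part of the original analysis) is to overlay the posterior draws of the slope from both models:
# overlay the slope posteriors from both priors
bind_rows(m1.1 %>% gather_draws(b_bscs_z) %>% mutate(model = "m1.1: N(0,1)"),
          m1.2 %>% gather_draws(b_bscs_z) %>% mutate(model = "m1.2: N(0,2)")) %>%
  ggplot(aes(x = .value, fill = model)) +
  geom_density(alpha = 0.6) +
  labs(x = "Slope for bscs_z", y = NULL) +
  scale_fill_ipsum() +
  theme_ipsum() +
  theme(legend.position = "bottom", legend.title = element_blank())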
Takeaway
Using the Bayesian approach we get much more than just a point estimate for our slope parameter of interest - we get a posterior distribution of parameters that are plausible given our observed data.
An increase in self-control of 1 SD is associated with an increase of 0.04 to 0.26 SDs (median: 0.15) in authoritarianism.
The 95% HDI of our predictor of interest, self-control, does not include 0. Given our data, we can thus say with relatively high probability that the effect is different from 0 (95% is a common criterion; some authors argue for 89% or 90% CIs instead).
This also holds when we define and look at the ROPE, a region of practical equivalence. There is only minor overlap between the posterior distribution of our parameter and the ROPE (around 3%), indicating a high probability that our effect is practically relevant given our defined criteria.
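One further way to quantify this, not used above, is the probability of direction from bayestestR, i.e. the proportion of posterior draws that share the sign of the median:
# probability of direction for the model parameters
p_direction(m1.1)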
Resources
I am not a statistician, so please take all of what I wrote and coded here with a lot of caution. However, there are many great resources out there if you want to delve deeper into the topic - I am just getting started with this, but here are a few potentially helpful links.
Doing Bayesian Data Analysis in brms and the tidyverse: A brms and tidyverse version by Solomon Kurz of the classic book by Kruschke Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan.
Practical Bayes Part I and Practical Bayes Part II: Great resources for dealing with and avoiding common problems in Bayesian models by Michael Clark
Bayesian Basics: General overview of how to run Bayesian models in R
bayestestR: Vignettes on basic Bayesian models and their interpretation
Stan Wiki: Prior Choice: Go to reference for selecting priors
Course Handouts for Bayesian Data Analysis Class by Mark Lai
And there are of course many more resources; these are just a few to get started with.