Introduction

Could undergraduates with basic statistical knowledge benefit from applying Bayesian methods in their dissertations? In most cases the answer is probably not. Provided they have collected a reasonably informative data set, I would typically advise them just to concentrate on showing the patterns in their data clearly using good figures. Even though conventional p-values from null hypothesis significance tests (NHSTs) can be rather uninformative, reporting them usually does no harm. And although NHSTs might not really address the scientific questions being asked, nudging students to think more about confidence intervals can point them in the right direction.

However, there are situations when a student has worked very hard but has not managed to collect data that need much more than a simple t-test or correlation analysis. In these circumstances, encouraging them to think a little more deeply about the basis of inference might help to improve the scientific rigour of their analysis. This is particularly important if they are tempted to apply uninformative non-parametric tests instead of parametric analyses that allow them to estimate the size of differences rather than simply report a “significant” difference. Bayesian methods do rely on distributional assumptions; in fact, in many cases the assumptions are even more restrictive than those of frequentist methods. However, guidance can be provided to make the methods more flexible if students can at least start down the path themselves.

Today I discovered a very interesting project online. It seems to have been somewhat abandoned by its author in recent years, although he continues to produce very useful R packages and course material:

http://www.sumsar.net/blog/2014/01/bayesian-first-aid/

The rest of Rasmus Bååth’s blog posts are all well worth browsing through.

The Bayesian First Aid package is loaded on the BU server, which also has JAGS preinstalled, so it is ready to go.

# devtools::install_github("rasmusab/bayesian_first_aid")
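If working on a personal machine rather than the server, note that JAGS is a separate program rather than an R package, so it has to be installed system-wide first; the rjags package then links R to it. A minimal sketch of the one-time setup, assuming JAGS has already been installed from the JAGS project website:

# One-time setup away from the server (JAGS itself is installed
# outside R, from the JAGS project website):
install.packages(c("devtools", "rjags"))
devtools::install_github("rasmusab/bayesian_first_aid")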

T-test

Why would an undergraduate need a different t-test? There are a few reasons. The theoretical justification is that the Bayesian model fits a t distribution to the data, which makes the estimates genuinely more robust to outliers than the conventional test.

http://www.sumsar.net/blog/2014/02/bayesian-first-aid-one-sample-t-test/

The more pragmatic reason is that the Bayesian First Aid package produces some good-looking output that gives the student more to think about and discuss in depth.
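A small simulated illustration of the robustness claim (a sketch that jumps ahead to the bayes.t.test() call demonstrated below, and assumes the package and JAGS are available). One extreme value inflates the classical estimates, while the t model can downweight it by estimating a small nu:

library(BayesianFirstAid)
set.seed(42)
d <- c(rnorm(20, mean = 1), 15)  # twenty values near 1 plus one outlier
t.test(d)     # classical test: the outlier inflates the mean and its SE
fit <- bayes.t.test(d)
summary(fit)  # mu typically stays nearer 1; a small nu flags the heavy tail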

The nice element of the package is that if a student has learned to run a t-test in R, all that is needed to run the more sophisticated Bayesian version is to load the package and place “bayes.” in front of the function call.

So, if the base R t.test on a built-in data set uses this code

t.test(extra ~ group, data = sleep, paired = TRUE)
## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58

The Bayesian version is run like this:

library(BayesianFirstAid)
fit <- bayes.t.test(extra ~ group, data = sleep, paired = TRUE)

The package produces some nice graphical output.

plot(fit)
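The same fitted object supports a convergence check too; diagnostics() is one of the generics the package provides alongside plot(), summary() and model.code():

diagnostics(fit)  # MCMC trace plots and convergence diagnostics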

A summary of the posterior:

summary(fit)
##   Data
## group 1, n = 10
## group 2, n = 10
## 
##   Model parameters and generated quantities
## mu_diff: the mean pairwise difference between group 1 and group 2 
## sigma_diff: the scale of the pairwise difference, a consistent
##   estimate of SD when nu is large.
## nu: the degrees-of-freedom for the t distribution fitted to the pairwise difference
## eff_size: the effect size calculated as (mu_diff - 0) / sigma_diff
## diff_pred: predicted distribution for a new datapoint generated
##   as the pairwise difference between group 1 and group 2 
## 
##   Measures
##              mean     sd  HDIlo  HDIup %<comp %>comp
## mu_diff    -1.475  0.430 -2.366 -0.653  0.999  0.001
## sigma_diff  1.201  0.471  0.330  2.103  0.000  1.000
## nu         23.690 26.206  1.002 76.606  0.000  1.000
## eff_size   -1.416  0.682 -2.738 -0.294  0.999  0.001
## diff_pred  -1.480  2.004 -4.687  1.604  0.869  0.131
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
## 
##   Quantiles
##             q2.5%   q25% median   q75% q97.5%
## mu_diff    -2.392 -1.725 -1.455 -1.205 -0.678
## sigma_diff  0.442  0.891  1.149  1.443  2.309
## nu          1.408  5.526 14.418 32.522 95.752
## eff_size   -3.018 -1.717 -1.327 -0.984 -0.413
## diff_pred  -4.648 -2.302 -1.457 -0.640  1.666
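Because the full posterior is available, probability statements that the classical test cannot provide are one line away. A sketch, assuming the fitted object stores its coda samples in fit$mcmc_samples (an assumption about the package internals):

post <- as.matrix(fit$mcmc_samples)
# Probability that group 2 gives at least one extra hour of sleep
# (mu_diff is group 1 minus group 2, hence the negative threshold):
mean(post[, "mu_diff"] < -1)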

model.code() prints all the underlying JAGS code, which can be used and adjusted if required.

model.code(fit)
## ## Model code for Bayesian estimation supersedes the t test - paired samples ##
## require(rjags)
## 
## # Setting up the data
## x <- subset(sleep, as.factor(group) == "1", extra, drop = TRUE) 
## y <- subset(sleep, as.factor(group) == "2", extra, drop = TRUE) 
## pair_diff <- x - y
## comp_mu <-  0 
## 
## # The model string written in the JAGS language
## model_string <- "model {
##   for(i in 1:length(pair_diff)) {
##     pair_diff[i] ~ dt( mu_diff , tau_diff , nu )
##   }
##   diff_pred ~ dt( mu_diff , tau_diff , nu )
##   eff_size <- (mu_diff - comp_mu) / sigma_diff
##   
##   mu_diff ~ dnorm( mean_mu , precision_mu )
##   tau_diff <- 1/pow( sigma_diff , 2 )
##   sigma_diff ~ dunif( sigma_low , sigma_high )
##   # A trick to get an exponentially distributed prior on nu that starts at 1.
##   nu <- nuMinusOne + 1 
##   nuMinusOne ~ dexp(1/29)
## }"
## 
## # Setting parameters for the priors that in practice will result
## # in flat priors on mu and sigma.
## mean_mu = mean(pair_diff, trim=0.2)
## precision_mu = 1 / (mad(pair_diff)^2 * 1000000)
## sigma_low = mad(pair_diff) / 1000 
## sigma_high = mad(pair_diff) * 1000
## 
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list <- list(
##   mu_diff = mean(pair_diff, trim=0.2),
##   sigma_diff = mad(pair_diff),
##   nuMinusOne = 4)
## 
## data_list <- list(
##   pair_diff = pair_diff,
##   comp_mu = comp_mu,
##   mean_mu = mean_mu,
##   precision_mu = precision_mu,
##   sigma_low = sigma_low,
##   sigma_high = sigma_high)
## 
## # The parameters to monitor.
## params <- c("mu_diff", "sigma_diff", "nu", "eff_size", "diff_pred")
## 
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
##                     inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=10000)
## 
## # Inspecting the posterior
## plot(samples)
## summary(samples)
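As an example of such an adjustment, the constants that feed the priors are ordinary R variables, so switching the effectively flat prior on mu_diff to a weakly informative one is a two-line change before running the model (the values here are purely illustrative):

# A weakly informative prior on the mean difference:
mean_mu <- -1        # prior mean: one extra hour of sleep
precision_mu <- 1/4  # prior SD of 2 hours (JAGS precision = 1/SD^2)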

Poisson test

The same prefix trick works for comparing event rates. Here no_cases holds the event counts and no_years the corresponding observation periods for the two groups.

no_cases <- c(41, 15)
no_years <- c(28010, 19017)

bayes.poisson.test(no_cases, no_years)
## 
##  Bayesian First Aid poisson test - two sample
## 
## number of events: 41 and 15, time periods: 28010 and 19017
## 
##   Estimates [95% credible interval]
## Group 1 rate: 0.0015 [0.0011, 0.0020]
## Group 2 rate: 8e-04 [0.00044, 0.0012]
## Rate ratio (Group 1 rate / Group 2 rate):
##               1.8 [1.0, 3.3]
## 
## The event rate of group 1 is more than 1 times that of group 2 by a probability 
## of 0.984 and less than 1 times that of group 2 by a probability of 0.016 .
# Save the return value in order to inspect the model result further.
fit <- bayes.poisson.test(no_cases, no_years)
summary(fit)
##   Model parameters and generated quantities
## lambda[1]: the rate of the process of group 1.
## lambda[2]: the rate of the process of group 2.
## x_pred[1]: predicted event count of group 1 during 28010 periods.
## x_pred[2]: predicted event count of group 2 during 19017 periods.
## rate_diff: The difference lambda[1] - lambda[2].
## rate_ratio: The ratio lambda[1] / lambda[2].
## 
##   Measures
##                 mean       sd     HDIlo    HDIup   %<comp   %>comp
## lambda[1]   0.001490 0.000230 1.050e-03  0.00195 1.00e+00 6.67e-05
## lambda[2]   0.000816 0.000207 4.220e-04  0.00122 1.00e+00 6.67e-05
## x_pred[1]  41.629000 9.059000 2.300e+01 58.00000 6.67e-05 1.00e+00
## x_pred[2]  15.562000 5.608000 5.000e+00 26.00000 6.67e-05 1.00e+00
## rate_diff   0.000669 0.000312 5.160e-05  0.00127 1.00e+00 6.67e-05
## rate_ratio  1.857000       NA 1.027e+00  3.35900 1.70e-02 9.83e-01
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 1.
## For rate_ratio the mean, 'HDIlo' and 'HDIup' are calculated on the log transformed
## samples and then transformed back to the original scale.
## 
##   Quantiles
##                q2.5%     q25%   median     q75%   q97.5%
## lambda[1]  1.070e-03 1.33e-03 1.47e-03 1.63e-03  0.00197
## lambda[2]  4.590e-04 6.71e-04 7.97e-04 9.45e-04  0.00127
## x_pred[1]  2.600e+01 3.50e+01 4.10e+01 4.70e+01 61.00000
## x_pred[2]  6.000e+00 1.20e+01 1.50e+01 1.90e+01 28.00000
## rate_diff  5.950e-05 4.61e-04 6.68e-04 8.77e-04  0.00128
## rate_ratio 1.052e+00 1.51e+00 1.84e+00 2.26e+00  3.46900
plot(fit)
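The raw samples can be interrogated here too. Again assuming the coda samples are stored in fit$mcmc_samples, the probability that the group 1 rate is at least 50% higher than the group 2 rate is:

post <- as.matrix(fit$mcmc_samples)
mean(post[, "rate_ratio"] > 1.5)  # P(rate ratio exceeds 1.5)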

Correlation test

x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)

# First a classical correlation test:
cor.test(x, y)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 1.8411, df = 7, p-value = 0.1082
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1497426  0.8955795
## sample estimates:
##       cor 
## 0.5711816
# And here is the Bayesian First Aid alternative. Running bayes.cor.test
# on the ranks of the data gives a robust, Spearman-style version of the
# analysis rather than a Pearson one.

# Save the output into a variable for easy plotting and further inspection:
fit <- bayes.cor.test(rank(x), rank(y))
plot(fit)

summary(fit)
##   Data
## rank(x) and rank(y), n = 9
## 
##   Model parameters
## rho: the correlation between rank(x) and rank(y) 
## mu[1]: the mean of rank(x) 
## sigma[1]: the scale of rank(x) , a consistent
##   estimate of SD when nu is large.
## mu[2]: the mean of rank(y) 
## sigma[2]: the scale of rank(y) 
## nu: the degrees-of-freedom for the bivariate t distribution
## xy_pred[1]: the posterior predictive distribution of rank(x) 
## xy_pred[2]: the posterior predictive distribution of rank(y) 
## 
##   Measures
##              mean     sd  HDIlo  HDIup %<comp %>comp
## rho         0.474  0.279 -0.084  0.926  0.067  0.933
## mu[1]       5.014  1.191  2.662  7.410  0.001  0.999
## mu[2]       4.947  1.211  2.554  7.245  0.003  0.997
## sigma[1]    3.317  1.074  1.633  5.398  0.000  1.000
## sigma[2]    3.334  1.098  1.685  5.422  0.000  1.000
## nu         34.697 29.305  2.021 92.969  0.000  1.000
## xy_pred[1]  5.058  4.017 -2.735 13.111  0.081  0.919
## xy_pred[2]  5.006  4.087 -2.647 13.357  0.087  0.913
## 
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## '%<comp' and '%>comp' are the probabilities of the respective parameter being
## smaller or larger than 0.
## 
##   Quantiles
##             q2.5%   q25% median   q75%  q97.5%
## rho        -0.189  0.313  0.524  0.685   0.875
## mu[1]       2.663  4.286  5.016  5.734   7.410
## mu[2]       2.568  4.219  4.967  5.680   7.273
## sigma[1]    1.881  2.580  3.108  3.816   5.902
## sigma[2]    1.893  2.591  3.122  3.833   6.005
## nu          3.894 13.799 26.380 46.308 114.682
## xy_pred[1] -2.805  2.738  5.011  7.275  13.087
## xy_pred[2] -2.826  2.682  4.970  7.299  13.242
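For comparison with the posterior for rho above, the classical rank-based test is Spearman's, which returns a single coefficient and p-value rather than a full posterior:

cor.test(x, y, method = "spearman")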
model.code(fit)
## ## Model code for the Bayesian First Aid alternative to Pearson's correlation test. ##
## require(rjags)
## 
## # Setting up the data
## x <- rank(x) 
## y <- rank(y) 
## xy <- cbind(x, y)
## 
## # The model string written in the JAGS language
## model_string <- "model {
##   for(i in 1:n) {
##     xy[i,1:2] ~ dmt(mu[], prec[ , ], nu) 
##   }
## 
##   xy_pred[1:2] ~ dmt(mu[], prec[ , ], nu)
## 
##   # JAGS parameterizes the multivariate t using precision (inverse of variance) 
##   # rather than variance, therefore here inverting the covariance matrix.
##   prec[1:2,1:2] <- inverse(cov[,])
## 
##   # Constructing the covariance matrix
##   cov[1,1] <- sigma[1] * sigma[1]
##   cov[1,2] <- sigma[1] * sigma[2] * rho
##   cov[2,1] <- sigma[1] * sigma[2] * rho
##   cov[2,2] <- sigma[2] * sigma[2]
## 
##   # Priors  
##   rho ~ dunif(-1, 1)
##   sigma[1] ~ dunif(sigmaLow, sigmaHigh) 
##   sigma[2] ~ dunif(sigmaLow, sigmaHigh)
##   mu[1] ~ dnorm(mean_mu, precision_mu)
##   mu[2] ~ dnorm(mean_mu, precision_mu)
##   nu <- nuMinusOne+1
##   nuMinusOne ~ dexp(1/29)
## }"
## 
## # Initializing the data list and setting parameters for the priors
## # that in practice will result in flat priors on mu and sigma.
## data_list = list(
##   xy = xy, 
##   n = length(x),
##   mean_mu = mean(c(x, y), trim=0.2) ,
##   precision_mu = 1 / (max(mad(x), mad(y))^2 * 1000000),
##   sigmaLow = min(mad(x), mad(y)) / 1000 ,
##   sigmaHigh = max(mad(x), mad(y)) * 1000)
## 
## # Initializing parameters to sensible starting values helps the convergence
## # of the MCMC sampling. Here using robust estimates of the mean (trimmed)
## # and standard deviation (MAD).
## inits_list = list(mu=c(mean(x, trim=0.2), mean(y, trim=0.2)), rho=cor(x, y, method="spearman"), 
##                   sigma = c(mad(x), mad(y)), nuMinusOne = 5)
## 
## # The parameters to monitor.
## params <- c("rho", "mu", "sigma", "nu", "xy_pred")
##   
## # Running the model
## model <- jags.model(textConnection(model_string), data = data_list,
##                     inits = inits_list, n.chains = 3, n.adapt=1000)
## update(model, 500) # Burning some samples to the MCMC gods....
## samples <- coda.samples(model, params, n.iter=5000)
## 
## # Inspecting the posterior
## plot(samples)
## summary(samples)
Proportion test

Finally, a quick look at a different package. BayesFactor provides proportionBF() for testing a proportion against a point null; here the same data (1 success out of 20 trials, null p = 0.5) are analysed under two prior scales to show how sensitive the Bayes factor is to that choice.

library(BayesFactor)
# Same data, two prior scales (rscale) on the effect size:
b1 <- proportionBF(y = 1, N = 20, p = 0.5, rscale = 0.1)
b2 <- proportionBF(y = 1, N = 20, p = 0.5, rscale = 0.3)
# Plotting the concatenated objects compares the two Bayes factors:
plot(c(b1, b2))
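The numeric Bayes factors can be pulled out for reporting with extractBF():

extractBF(b1)$bf
extractBF(b2)$bf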