Introduction

There is a growing tendency for Bayesian and frequentist methods to be combined for applied data analysis. This may seem odd, as there are intrinsic philosophical differences between the two approaches taken to inference (Ellison 2004). However, in a pragmatic sense, as long as non informative priors are being used, Bayesian methods and frequentist methods produce results that can be effectively identical (Edwards 1996). Without informative priors confidence intervals and credible intervals amount to much the same thing. As Bayesian model fitting through wrappers to MCMC algorithms have become incorporated into R, users of some libraries for fitting mixed effects models may even be unaware that they have applied a Bayesian technique.

Writing bespoke models in JAGS or Stan can be difficult and there are pitfalls. Really complex models are best left to specialist statisticians. However there are times when specifying simple models can help to solve routine problems, in combination with other methods.

library(rjags)
library(tidyverse)
library(rjags)
library(ggmcmc)
library(polspline)
library(propagate)
library(multcomp)
library(DT)

The mussels data set

I have used a very simple data set for introductory classes on one way Anova. Although finding differences between measurements attributable to a factor such as site of origin does not allow direct causal inference, it is a commonly needed as a screening analysis of observational data by ecologists, It is therefore useful as a demonstration of the concepts.

Heterogeneity of variance and unbalanced sample sizes

Note the difference in sample sizes between sites and the differences in sample standard deviations.

d<-read.csv("https://tinyurl.com/aqm-data/mussels.csv")
d %>% group_by(Site) %>% summarise(n=n(),mean=round(mean(Lshell),1),sd=round(sd(Lshell),1)) %>% DT::datatable()

Boxplots of the data also suggest notable heterogeneity in variances between measurements taken from different sites, although the assumption of normality seems to be reasonable.

library(ggplot2)

theme_set(theme_bw())
g0 <- ggplot(d,aes(x=Site,y=Lshell))
g_box<-g0+ geom_boxplot()
g_box

Confidence intervals

Conventional confidence intervals based on the standard errors around the group means suggest that there are differences between sites. The conventional advice given to students is that if such confidence intervals do not overlap then an unpaired t-test between the two sites will be significant. If they do overlap, then a test may be needed to establish a significant difference.

g_mean<-g0+stat_summary(fun.y=mean,geom="point")
g_mean<-g_mean+stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g_mean

Conventional One way ANOVA and multiple comparisons

d$Site<-as.factor(as.numeric(d$Site)) ## Change to a site number for factto levels. This is for brevity in glht output
mod<-lm(data=d,Lshell~Site)

anova(lm(data=d,Lshell~Site))
## Analysis of Variance Table
## 
## Response: Lshell
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Site        5   5525    1105  6.1732 4.579e-05 ***
## Residuals 107  19153     179                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the introductory classes I explain the meaning of treatment contrasts and both the potential utility and the pitfalls involved in pair-wise comparisons between sites, making allowance for multiple comparisons using Tukey’s adjustment.

Treatment contrasts

The default treatment contrasts are more suitable in the context of an experiment with a reference level.

summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  100.769      2.624  38.405  < 2e-16 ***
## Site2          8.467      3.748   2.259   0.0259 *  
## Site3          6.037      5.409   1.116   0.2669    
## Site4         -3.619      5.409  -0.669   0.5049    
## Site5         18.697      3.925   4.763 6.02e-06 ***
## Site6          2.471      3.748   0.659   0.5111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

Multiple comparisons

plot(glht(mod, linfct = mcp(Site = "Tukey")))

summary(glht(mod, linfct = mcp(Site = "Tukey")))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = Lshell ~ Site, data = d)
## 
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)    
## 2 - 1 == 0    8.467      3.748   2.259  0.21191    
## 3 - 1 == 0    6.037      5.409   1.116  0.86857    
## 4 - 1 == 0   -3.619      5.409  -0.669  0.98415    
## 5 - 1 == 0   18.697      3.925   4.763  < 0.001 ***
## 6 - 1 == 0    2.471      3.748   0.659  0.98517    
## 3 - 2 == 0   -2.430      5.435  -0.447  0.99756    
## 4 - 2 == 0  -12.086      5.435  -2.224  0.22702    
## 5 - 2 == 0   10.231      3.960   2.583  0.10539    
## 6 - 2 == 0   -5.996      3.784  -1.584  0.59996    
## 4 - 3 == 0   -9.656      6.690  -1.443  0.69111    
## 5 - 3 == 0   12.660      5.559   2.278  0.20445    
## 6 - 3 == 0   -3.566      5.435  -0.656  0.98548    
## 5 - 4 == 0   22.317      5.559   4.015  0.00141 ** 
## 6 - 4 == 0    6.090      5.435   1.121  0.86660    
## 6 - 5 == 0  -16.227      3.960  -4.097  0.00105 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Sum to zero contrasts

I would advise students not to take too much notice of individual comparisons unless there is a good underlying motive. Site 5 does seem to be distinctly different to all the other sites.A sum contrast allows the means of each site apart from the last to be compared to the overall mean.

contrasts(d$Site) <- "contr.sum"

mod<-lm(data=d,Lshell~Site)
summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 106.1114     1.4384  73.773  < 2e-16 ***
## Site1        -5.3421     2.5804  -2.070   0.0408 *  
## Site2         3.1246     2.6158   1.195   0.2349    
## Site3         0.6949     4.1214   0.169   0.8664    
## Site4        -8.9614     4.1214  -2.174   0.0319 *  
## Site5        13.3553     2.7841   4.797 5.24e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

So that would be more or less job done in class. I would point out that in a real study we would look into the characteristics of site 5 in more detail, but could ignore the other differences.

Heterogenity of variance

More astute students who spot the heterogeneity of variance are also shown how to run code for one-way tests and White’s correction. This is not shown here. An alternative, more advanced, approach would be to abandon the ANOVA model completely and run bootstrapped comparisons between sites using resampling from the original data. All this would probably be overkill unless the study really deserved a thorough analysis for a write up.

Bayesian credible inference

Clearly, in a practical sense, all the useful contextual inference that can be extracted from this very small data set has been found. The most important violations of assumptions are not always visible in the numerical data and are much more likely to be attributable to faults in sampling design and lack of independence between measurements. However, if it was important to determine differences between sites it would do no harm to consider other approaches. there might just be some better interpretation.

Pooled variance model using JAGS

A conventional pooled variance ANOVA can be written in JAGS very simply.

data=list(y=d$Lshell,
          ind=as.numeric(d$Site),
          N=length(d$Lshell),
          p=length(levels(d$Site)),
          overall_mean=mean(d$Lshell))

pooled_var="
  model {
      #######  Likelihood
      for (i in 1:N) {                    # Loop through observations
        mu[i]<-Beta[ind[i]]               # The expected values are just the group means
        y[i] ~ dnorm(mu[i],tau)           # Values treated as from a single normal
       
      }
     ############## Uninformative priors
    for (j in 1:p) {
     Beta[j]~dnorm(0,0.0001)
   
     Effect[j]<-Beta[j]-overall_mean  ### Calculate difference from overall mean
     ################### Calculate pair wise differences 
      for (n in 1:(j-1)){
        Difbeta[n,j]<-Beta[n]-Beta[j]
      }
    }
    
    tau ~ dgamma(scale, rate) ## Prior for normal distribution precision.
    scale ~ dunif(0, 1)       ### Hyper parameters for tau.
    rate ~ dunif(0, 1)
    
  }
"

Notice that the pairwise comparisons and a measure of effect size as a difference between the group mean and the overall mean were written into the code. So if we just monitor these nodes we can produce simple caterpillar plots for the parameters of interest. A full analysis would include diagnostics for chain mixing etc, which can make Bayesian analysis more time consuming.

model=jags.model(textConnection(pooled_var),data=data)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 113
##    Unobserved stochastic nodes: 9
##    Total graph size: 275
## 
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("Difbeta","Effect"),n.iter=100000,thin=10)

ms <-ggs(output) 
mt<-filter(ms,grepl("Difbeta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

mt<-filter(ms,grepl("Effect",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

Notice that the highest posterior differences, which equate closely to 95% confidence intervals in this case, suggest more differences that do not include zero than the Tukey corrected comparisons. One reason for this is that corrections for multiple comparisons are not part of the philosophy taken to formal inference under a Bayesian framework. This could be criticised by a referee! In particular there seems to be a credible difference between sites 2 and 4, but site 4 has a lower sample size. Although we are now using Bayesin inference we have probably not finished.

Independent variances model

As there is clear heterogeneity of variance involved it can easily be taken into account by simply not pooling variances. This does make sense, although it implies that pairwise comparisons are all independent of an overall model.

## The model is similar to the pooled variance model with some modifications noted

ind_var="
  model {
      ### Likeihood
      for (i in 1:N) {                   ## Loop through observations
        mu[i]<-Beta[ind[i]]              ## Now set an independent tau for each group
        y[i] ~ dnorm(mu[i],tau[ind[i]])
       
      }
    
    for (j in 1:p) {
     Beta[j]~dnorm(0,0.0001)            ## Set up a prior for each grous
     tau[j] ~ dgamma(scale, rate)
      Effect[j]<-Beta[j]-overall_mean 
      for (n in 1:(j-1)){
        Difbeta[n,j]<-Beta[n]-Beta[j]
      }
    }
   
    scale ~ dunif(0, 1)
    rate ~ dunif(0, 1)
    
  }
"
model=jags.model(textConnection(ind_var),data=data)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 113
##    Unobserved stochastic nodes: 14
##    Total graph size: 280
## 
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("Difbeta","Effect"),n.iter=100000,thin=10)
ms <-ggs(output) 
mt<-filter(ms,grepl("Difbeta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

mt<-filter(ms,grepl("Effect",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

Notice now that the heterogeneity shown initially by the boxplots and independently calculated confidence intervals is now clearly apparent.

Random effects model

Finally we could try something a little bit more sophisticated. As there are six sites we have just enough replication to fit a simple multi level model. We could suggest that the variability between sites forms a normal distribution. This approach follows (Gelman et al. 2012). Mutliple comparisons are not needed under a Bayesian approach that includes shrinkage to the mean.

rand_mod="
  model {
      ### Likeihood
      for (i in 1:N) {                  ## Loop through observations
        mu[i]<-mu_r+Beta[ind[i]]        ## This time Beta is added to an overall mean
        y[i] ~ dnorm(mu[i],tau[ind[i]])  ## Set an independent tau for each group agan. A                                              pooled variance model would also work  here
       
      }
    
    for (j in 1:p) {
     Beta[j]~dnorm(0,tau_r)            ## A single tau represents the variance between group                                        #   means
     tau[j] ~ dgamma(scale, rate)
      for (n in 1:(j-1)){
        Difbeta[n,j]<-Beta[n]-Beta[j]
      }
    }
   
    scale ~ dunif(0, 1)
    rate ~ dunif(0, 1)
    tau_r ~ dgamma(scale,rate)
    sigma_r <- 1/sqrt(tau_r)
    mu_r ~ dnorm(0,0.00001)     ## Prior for the overall mean
   
  }"

model=jags.model(textConnection(rand_mod),data=data)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 113
##    Unobserved stochastic nodes: 16
##    Total graph size: 288
## 
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("sigma_r","mu_r","Difbeta","Beta"),n.iter=100000,thin=10)
ms <-ggs(output) 
mt<-filter(ms,grepl("Beta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

ms <-ggs(output) 
mt<-filter(ms,grepl("Difbeta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

mt<-filter(ms,grepl("Beta",Parameter))
ggplot(data=mt,aes(x=mt$value,col=Parameter,fill=Parameter)) + geom_density(alpha=0.2)

This is arguably the most appropriate analysis in this particular case. The Bayesian analysis has accounted for the unbalanced sample sizes and provided credible intervals for effect sizes based on the assumption that site means themselves form a normal distribution,. Now, only site 5 stands out as being of any interest at all.

Extensions and conclusion

In a practical sense, adding the Bayesian analysis has not led to much insight here. However it is surprisingly easy to include it. If inferences on effect sizes are contextually important is it worth looking at them from different perspectives in order to test the importance of the assumptions being used. If all analyses lead to exactly the same conclusions then inference is strengthened. In a more advanced setting some additional considerations of random effects and potentially informative prior knowledge can be added to the Bayesian models in a bespoke, contextually appropriate, manner.

References

Edwards, D., 1996. Comment: The first data analysis should be journalistic. ECOLOGICAL APPLICATIONS, 6 (4), 1090–1094.

Ellison, A. M., 2004. Bayesian inference in ecology. Ecology Letters, 7 (6), 509–520.

Gelman, A., Hill, J., and Yajima, M., 2012. Why we (usually) don’t have to worry about multiple comparisons. JOURNAL OF RESEARCH ON EDUCATIONAL EFFECTIVENESS, 5 (2), 189–211.