Chapter 19 One way ANOVA

The purpose of one way anova is

Test whether there is greater variability between groups than within groups
Quantify any differences found between group means

19.1 Grouped boxplots

Exploratory plots

g0<-ggplot(d,aes(x=Site,y=Lshell))
g0+geom_boxplot()

19.2 Histograms for each factor level

g0<-ggplot(d,aes(x=d$Lshell))
g1<-g0+geom_histogram(color="grey",binwidth = 5)

g1+facet_wrap(~Site) +xlab("Text for x label")

19.3 Confidence interval plot

g0<-ggplot(d,aes(x=Site,y=Lshell))
g1<-g0+stat_summary(fun.y=mean,geom="point")
g1<-g1 +stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g1 +xlab("Text for x label") + ylab("Text for y label")

19.4 Fitting ANOVA

mod<-lm(data=d,Lshell~Site)

19.4.1 Model anova

anova(mod)

## Analysis of Variance Table
## 
## Response: Lshell
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Site        5   5525    1105  6.1732 4.579e-05 ***
## Residuals 107  19153     179                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

19.4.2 Model summary

19.4.2.1 Treatment effects

summary(mod)

## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  100.769      2.624  38.405  < 2e-16 ***
## SiteSite_2     8.467      3.748   2.259   0.0259 *  
## SiteSite_3     6.037      5.409   1.116   0.2669    
## SiteSite_4    -3.619      5.409  -0.669   0.5049    
## SiteSite_5    18.697      3.925   4.763 6.02e-06 ***
## SiteSite_6     2.471      3.748   0.659   0.5111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

19.4.2.2 Change reference level

slevels<-levels(d$Site)
d$Site<-relevel(d$Site,"Site_5")
mod<-lm(data=d,Lshell~Site)
summary(mod)

## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  119.467      2.920  40.919  < 2e-16 ***
## SiteSite_1   -18.697      3.925  -4.763 6.02e-06 ***
## SiteSite_2   -10.231      3.960  -2.583 0.011135 *  
## SiteSite_3   -12.660      5.559  -2.278 0.024739 *  
## SiteSite_4   -22.317      5.559  -4.015 0.000111 ***
## SiteSite_6   -16.227      3.960  -4.097 8.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

d$Site <- factor(d$Site, levels=slevels)

19.4.2.3 Reverse levels

slevels<-levels(d$Site)
d$Site <- factor(d$Site, levels=rev(slevels))
mod<-lm(data=d,Lshell~Site)
summary(mod)

## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  103.240      2.676  38.583  < 2e-16 ***
## SiteSite_5    16.227      3.960   4.097 8.15e-05 ***
## SiteSite_4    -6.090      5.435  -1.121    0.265    
## SiteSite_3     3.566      5.435   0.656    0.513    
## SiteSite_2     5.996      3.784   1.584    0.116    
## SiteSite_1    -2.471      3.748  -0.659    0.511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

d$Site <- factor(d$Site, levels=slevels)

19.4.2.4 Sum contrasts

Sum contrasts compare the effects to the mean. Notice that the last level is missing due to the way the design matrix is formed. So it can be worth running sum contrasts twice, with the order reversed, to get all the contrasts.

options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)

## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 106.1114     1.4384  73.773  < 2e-16 ***
## Site1        -5.3421     2.5804  -2.070   0.0408 *  
## Site2         3.1246     2.6158   1.195   0.2349    
## Site3         0.6949     4.1214   0.169   0.8664    
## Site4        -8.9614     4.1214  -2.174   0.0319 *  
## Site5        13.3553     2.7841   4.797 5.24e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

options(contrasts = c("contr.treatment", "contr.poly"))

19.4.2.4.1 Reverse order

d$Site <- factor(d$Site, levels=rev(slevels))
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)

## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 106.1114     1.4384  73.773  < 2e-16 ***
## Site1        -2.8714     2.6158  -1.098   0.2748    
## Site2        13.3553     2.7841   4.797 5.24e-06 ***
## Site3        -8.9614     4.1214  -2.174   0.0319 *  
## Site4         0.6949     4.1214   0.169   0.8664    
## Site5         3.1246     2.6158   1.195   0.2349    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

options(contrasts = c("contr.treatment", "contr.poly"))
d$Site <- factor(d$Site, levels=slevels)

19.4.3 Tukey corrected pairwise comparisons

Use to find where signficant differences lie. This should confirm the pattern shown using the confidence interval plot.

mod<-aov(data=d,Lshell~Site)
TukeyHSD(mod)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lshell ~ Site, data = d)
## 
## $Site
##                     diff        lwr       upr     p adj
## Site_2-Site_1   8.466769  -2.408905 19.342443 0.2201442
## Site_3-Site_1   6.037019  -9.660664 21.734702 0.8737518
## Site_4-Site_1  -3.619231 -19.316914 12.078452 0.9849444
## Site_5-Site_1  18.697436   7.305950 30.088922 0.0000867
## Site_6-Site_1   2.470769  -8.404905 13.346443 0.9859123
## Site_3-Site_2  -2.429750 -18.201132 13.341632 0.9976925
## Site_4-Site_2 -12.086000 -27.857382  3.685382 0.2355928
## Site_5-Site_2  10.230667  -1.262165 21.723498 0.1103764
## Site_6-Site_2  -5.996000 -16.977781  4.985781 0.6105029
## Site_4-Site_3  -9.656250 -29.069479  9.756979 0.7004668
## Site_5-Site_3  12.660417  -3.470986 28.791819 0.2123990
## Site_6-Site_3  -3.566250 -19.337632 12.205132 0.9862071
## Site_5-Site_4  22.316667   6.185264 38.448069 0.0015143
## Site_6-Site_4   6.090000  -9.681382 21.861382 0.8718474
## Site_6-Site_5 -16.226667 -27.719498 -4.733835 0.0011239

Plot of results of Tukey HSD

plot(TukeyHSD(mod))

19.4.4 Anova with White’s correction

This will give you the overall Anova table if there is heterogeneity of variance.

library(sandwich)
library(car)
mod<-lm(Lshell~Site, data=d)
Anova(mod,white.adjust='hc3')

## Analysis of Deviance Table (Type II tests)
## 
## Response: Lshell
##            Df      F    Pr(>F)    
## Site        5 9.9682 7.541e-08 ***
## Residuals 107                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

19.4.5 Bayesian model with shrinkage

Specialist model. Probably the best for these particular data, but seek guidance. Don’t do this at home kids!

library(rjags)
library(ggmcmc)
rand_mod="
  model {
      ### Likeihood
      for (i in 1:N) {                  ## Loop through observations
        mu[i]<-mu_r+Beta[ind[i]]        ## Beta is added to an overall mean
        y[i] ~ dnorm(mu[i],tau[ind[i]])  ## Set an independent tau for each group agan. A                                              pooled variance model would also work  here
       
      }
    
    for (j in 1:p) {
     Beta[j]~dnorm(0,tau_r)            ## A single tau represents the variance between group                                        #   means
     tau[j] ~ dgamma(scale, rate)
      for (n in 1:(j-1)){
        Difbeta[n,j]<-Beta[n]-Beta[j]
      }
    }
   
    scale ~ dunif(0, 1)
    rate ~ dunif(0, 1)
    tau_r ~ dgamma(scale,rate)
    sigma_r <- 1/sqrt(tau_r)
    mu_r ~ dnorm(0,0.00001)     ## Prior for the overall mean
   
  }"

data=list(y=d$Lshell,
          ind=as.numeric(d$Site),
          N=length(d$Lshell),
          p=length(levels(d$Site)),
          overall_mean=mean(d$Lshell))
model=jags.model(textConnection(rand_mod),data=data)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 113
##    Unobserved stochastic nodes: 16
##    Total graph size: 288
## 
## Initializing model

update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("sigma_r","mu_r","Difbeta","Beta"),n.iter=100000,thin=10)
ms <-ggs(output) 
mt<-filter(ms,grepl("Beta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

summary(output)

## 
## Iterations = 2010:102000
## Thinning interval = 10 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean    SD Naive SE Time-series SE
## Beta[1]       -5.4651 4.329  0.04329        0.06864
## Beta[2]        1.9002 4.442  0.04442        0.07173
## Beta[3]       -0.0961 6.303  0.06303        0.06979
## Beta[4]       -5.0761 6.012  0.06012        0.07157
## Beta[5]       11.4599 4.265  0.04265        0.07125
## Beta[6]       -3.4625 4.200  0.04200        0.06830
## Difbeta[1,2]  -7.3653 3.595  0.03595        0.03595
## Difbeta[1,3]  -5.3690 6.477  0.06477        0.06640
## Difbeta[2,3]   1.9963 6.554  0.06554        0.06701
## Difbeta[1,4]  -0.3890 6.064  0.06064        0.06237
## Difbeta[2,4]   6.9763 6.327  0.06327        0.06434
## Difbeta[3,4]   4.9800 8.064  0.08064        0.08064
## Difbeta[1,5] -16.9250 3.306  0.03306        0.03306
## Difbeta[2,5]  -9.5596 3.350  0.03350        0.03350
## Difbeta[3,5] -11.5560 6.394  0.06394        0.06550
## Difbeta[4,5] -16.5360 6.293  0.06293        0.06474
## Difbeta[1,6]  -2.0026 3.087  0.03087        0.02939
## Difbeta[2,6]   5.3628 3.314  0.03314        0.03314
## Difbeta[3,6]   3.3664 6.327  0.06327        0.06497
## Difbeta[4,6]  -1.6136 6.018  0.06018        0.06187
## Difbeta[5,6]  14.9224 2.938  0.02938        0.02938
## mu_r         107.0018 3.929  0.03929        0.06598
## sigma_r        8.1920 3.620  0.03620        0.04058
## 
## 2. Quantiles for each variable:
## 
##                 2.5%      25%        50%       75%    97.5%
## Beta[1]      -14.310  -8.0732  -5.401344  -2.77934   3.0733
## Beta[2]       -6.688  -0.8489   1.858600   4.63003  10.9402
## Beta[3]      -13.202  -3.9252   0.003164   3.77948  12.3386
## Beta[4]      -17.474  -8.7326  -4.907342  -1.17014   6.2282
## Beta[5]        3.451   8.7367  11.287129  14.08022  20.2936
## Beta[6]      -12.010  -5.9775  -3.410657  -0.93567   4.9151
## Difbeta[1,2] -14.293  -9.7894  -7.353738  -5.00338  -0.1568
## Difbeta[1,3] -18.309  -9.5530  -5.328387  -1.20149   7.6621
## Difbeta[2,3] -10.737  -2.1270   1.878285   6.15791  15.1883
## Difbeta[1,4] -12.495  -4.3898  -0.366453   3.55944  11.7831
## Difbeta[2,4]  -5.137   2.7737   6.878559  11.08238  19.6276
## Difbeta[3,4] -10.926  -0.3007   4.882215  10.10134  21.1125
## Difbeta[1,5] -23.402 -19.1348 -16.955491 -14.71284 -10.3046
## Difbeta[2,5] -16.253 -11.7551  -9.545076  -7.34901  -2.9349
## Difbeta[3,5] -24.597 -15.5721 -11.335702  -7.36234   0.4388
## Difbeta[4,5] -29.225 -20.7013 -16.458230 -12.30726  -4.5097
## Difbeta[1,6]  -8.005  -4.0520  -2.003803   0.05552   4.0525
## Difbeta[2,6]  -1.217   3.1620   5.389310   7.57237  11.8316
## Difbeta[3,6]  -9.340  -0.5732   3.391249   7.40836  15.7616
## Difbeta[4,6] -13.568  -5.5946  -1.596956   2.38368  10.1275
## Difbeta[5,6]   8.969  13.0603  14.953756  16.83725  20.6540
## mu_r          98.992 104.6504 107.037428 109.35331 114.8330
## sigma_r        3.728   5.7625   7.422721   9.69408  17.2778