Chapter 5 One way ANOVA

The purpose of one way anova is

  1. Test whether there is greater variability between groups than within groups
  2. Quantify any differences found between group means

5.1 Grouped boxplots

Exploratory plots

g0<-ggplot(d,aes(x=Site,y=Lshell))
g0+geom_boxplot()

5.2 Histograms for each factor level

g0<-ggplot(d,aes(x=d$Lshell))
g1<-g0+geom_histogram(color="grey",binwidth = 5)

g1+facet_wrap(~Site) +xlab("Text for x label") 

5.3 Confidence interval plot

g0<-ggplot(d,aes(x=Site,y=Lshell))
g1<-g0+stat_summary(fun.y=mean,geom="point")
g1<-g1 +stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g1 +xlab("Text for x label") + ylab("Text for y label")

5.4 Fitting ANOVA

mod<-lm(data=d,Lshell~Site)

5.4.1 Model anova

anova(mod)
## Analysis of Variance Table
## 
## Response: Lshell
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Site        5   5525    1105  6.1732 4.579e-05 ***
## Residuals 107  19153     179                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.4.2 Model summary

5.4.2.1 Treatment effects

summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  100.769      2.624  38.405  < 2e-16 ***
## SiteSite_2     8.467      3.748   2.259   0.0259 *  
## SiteSite_3     6.037      5.409   1.116   0.2669    
## SiteSite_4    -3.619      5.409  -0.669   0.5049    
## SiteSite_5    18.697      3.925   4.763 6.02e-06 ***
## SiteSite_6     2.471      3.748   0.659   0.5111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05

5.4.2.2 Change reference level

slevels<-levels(d$Site)
d$Site<-relevel(d$Site,"Site_5")
mod<-lm(data=d,Lshell~Site)
summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  119.467      2.920  40.919  < 2e-16 ***
## SiteSite_1   -18.697      3.925  -4.763 6.02e-06 ***
## SiteSite_2   -10.231      3.960  -2.583 0.011135 *  
## SiteSite_3   -12.660      5.559  -2.278 0.024739 *  
## SiteSite_4   -22.317      5.559  -4.015 0.000111 ***
## SiteSite_6   -16.227      3.960  -4.097 8.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)

5.4.2.3 Reverse levels

slevels<-levels(d$Site)
d$Site <- factor(d$Site, levels=rev(slevels))
mod<-lm(data=d,Lshell~Site)
summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  103.240      2.676  38.583  < 2e-16 ***
## SiteSite_5    16.227      3.960   4.097 8.15e-05 ***
## SiteSite_4    -6.090      5.435  -1.121    0.265    
## SiteSite_3     3.566      5.435   0.656    0.513    
## SiteSite_2     5.996      3.784   1.584    0.116    
## SiteSite_1    -2.471      3.748  -0.659    0.511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)

5.4.2.4 Sum contrasts

Sum contrasts compare the effects to the mean. Notice that the last level is missing due to the way the design matrix is formed. So it can be worth running sum contrasts twice, with the order reversed, to get all the contrasts.

options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 106.1114     1.4384  73.773  < 2e-16 ***
## Site1        -5.3421     2.5804  -2.070   0.0408 *  
## Site2         3.1246     2.6158   1.195   0.2349    
## Site3         0.6949     4.1214   0.169   0.8664    
## Site4        -8.9614     4.1214  -2.174   0.0319 *  
## Site5        13.3553     2.7841   4.797 5.24e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
5.4.2.4.1 Reverse order
d$Site <- factor(d$Site, levels=rev(slevels))
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
## 
## Call:
## lm(formula = Lshell ~ Site, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.906  -8.340   1.031   9.231  30.550 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 106.1114     1.4384  73.773  < 2e-16 ***
## Site1        -2.8714     2.6158  -1.098   0.2748    
## Site2        13.3553     2.7841   4.797 5.24e-06 ***
## Site3        -8.9614     4.1214  -2.174   0.0319 *  
## Site4         0.6949     4.1214   0.169   0.8664    
## Site5         3.1246     2.6158   1.195   0.2349    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared:  0.2239, Adjusted R-squared:  0.1876 
## F-statistic: 6.173 on 5 and 107 DF,  p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
d$Site <- factor(d$Site, levels=slevels)

5.4.3 Tukey corrected pairwise comparisons

Use to find where signficant differences lie. This should confirm the pattern shown using the confidence interval plot.

mod<-aov(data=d,Lshell~Site)
TukeyHSD(mod)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lshell ~ Site, data = d)
## 
## $Site
##                     diff        lwr       upr     p adj
## Site_2-Site_1   8.466769  -2.408905 19.342443 0.2201442
## Site_3-Site_1   6.037019  -9.660664 21.734702 0.8737518
## Site_4-Site_1  -3.619231 -19.316914 12.078452 0.9849444
## Site_5-Site_1  18.697436   7.305950 30.088922 0.0000867
## Site_6-Site_1   2.470769  -8.404905 13.346443 0.9859123
## Site_3-Site_2  -2.429750 -18.201132 13.341632 0.9976925
## Site_4-Site_2 -12.086000 -27.857382  3.685382 0.2355928
## Site_5-Site_2  10.230667  -1.262165 21.723498 0.1103764
## Site_6-Site_2  -5.996000 -16.977781  4.985781 0.6105029
## Site_4-Site_3  -9.656250 -29.069479  9.756979 0.7004668
## Site_5-Site_3  12.660417  -3.470986 28.791819 0.2123990
## Site_6-Site_3  -3.566250 -19.337632 12.205132 0.9862071
## Site_5-Site_4  22.316667   6.185264 38.448069 0.0015143
## Site_6-Site_4   6.090000  -9.681382 21.861382 0.8718474
## Site_6-Site_5 -16.226667 -27.719498 -4.733835 0.0011239

Plot of results of Tukey HSD

plot(TukeyHSD(mod))

5.4.4 Anova with White’s correction

This will give you the overall Anova table if there is heterogeneity of variance.

library(sandwich)
library(car)
mod<-lm(Lshell~Site, data=d)
Anova(mod,white.adjust='hc3')
## Analysis of Deviance Table (Type II tests)
## 
## Response: Lshell
##            Df      F    Pr(>F)    
## Site        5 9.9682 7.541e-08 ***
## Residuals 107                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.4.5 Bayesian model with shrinkage

Specialist model. Probably the best for these particular data, but seek guidance. Don’t do this at home kids!

library(rjags)
library(ggmcmc)
rand_mod="
  model {
      ### Likeihood
      for (i in 1:N) {                  ## Loop through observations
        mu[i]<-mu_r+Beta[ind[i]]        ## Beta is added to an overall mean
        y[i] ~ dnorm(mu[i],tau[ind[i]])  ## Set an independent tau for each group agan. A                                              pooled variance model would also work  here
       
      }
    
    for (j in 1:p) {
     Beta[j]~dnorm(0,tau_r)            ## A single tau represents the variance between group                                        #   means
     tau[j] ~ dgamma(scale, rate)
      for (n in 1:(j-1)){
        Difbeta[n,j]<-Beta[n]-Beta[j]
      }
    }
   
    scale ~ dunif(0, 1)
    rate ~ dunif(0, 1)
    tau_r ~ dgamma(scale,rate)
    sigma_r <- 1/sqrt(tau_r)
    mu_r ~ dnorm(0,0.00001)     ## Prior for the overall mean
   
  }"

data=list(y=d$Lshell,
          ind=as.numeric(d$Site),
          N=length(d$Lshell),
          p=length(levels(d$Site)),
          overall_mean=mean(d$Lshell))
model=jags.model(textConnection(rand_mod),data=data)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 113
##    Unobserved stochastic nodes: 16
##    Total graph size: 288
## 
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("sigma_r","mu_r","Difbeta","Beta"),n.iter=100000,thin=10)
ms <-ggs(output) 
mt<-filter(ms,grepl("Beta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")

summary(output)
## 
## Iterations = 2010:102000
## Thinning interval = 10 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                   Mean    SD Naive SE Time-series SE
## Beta[1]       -5.47627 4.491  0.04491        0.07371
## Beta[2]        1.97178 4.612  0.04612        0.07840
## Beta[3]       -0.03195 6.285  0.06285        0.07712
## Beta[4]       -5.08636 6.230  0.06230        0.08266
## Beta[5]       11.48183 4.506  0.04506        0.07840
## Beta[6]       -3.39902 4.402  0.04402        0.07729
## Difbeta[1,2]  -7.44805 3.566  0.03566        0.03474
## Difbeta[1,3]  -5.44432 6.299  0.06299        0.06484
## Difbeta[2,3]   2.00373 6.455  0.06455        0.06691
## Difbeta[1,4]  -0.38991 6.145  0.06145        0.06622
## Difbeta[2,4]   7.05814 6.398  0.06398        0.07055
## Difbeta[3,4]   5.05441 7.937  0.07937        0.07937
## Difbeta[1,5] -16.95810 3.318  0.03318        0.03318
## Difbeta[2,5]  -9.51005 3.350  0.03350        0.03350
## Difbeta[3,5] -11.51378 6.326  0.06326        0.06645
## Difbeta[4,5] -16.56819 6.410  0.06410        0.07124
## Difbeta[1,6]  -2.07725 3.089  0.03089        0.03089
## Difbeta[2,6]   5.37080 3.324  0.03324        0.03324
## Difbeta[3,6]   3.36707 6.265  0.06265        0.06548
## Difbeta[4,6]  -1.68734 6.076  0.06076        0.06699
## Difbeta[5,6]  14.88085 3.013  0.03013        0.03013
## mu_r         106.93855 4.139  0.04139        0.07716
## sigma_r        8.23497 3.808  0.03808        0.04894
## 
## 2. Quantiles for each variable:
## 
##                 2.5%      25%        50%       75%    97.5%
## Beta[1]      -14.804  -8.0451  -5.385452  -2.72374   3.1254
## Beta[2]       -7.009  -0.8096   1.890166   4.66691  11.3211
## Beta[3]      -12.773  -3.8267  -0.008313   3.74431  12.8499
## Beta[4]      -18.336  -8.8018  -4.763532  -1.08290   6.3265
## Beta[5]        2.992   8.7313  11.305458  14.15075  20.8994
## Beta[6]      -12.445  -5.9517  -3.354256  -0.77537   5.2635
## Difbeta[1,2] -14.512  -9.8222  -7.400720  -5.03009  -0.6597
## Difbeta[1,3] -18.241  -9.4823  -5.485563  -1.38711   7.0552
## Difbeta[2,3] -10.707  -2.0920   1.813647   6.01591  15.1171
## Difbeta[1,4] -12.445  -4.3806  -0.416308   3.59752  11.9027
## Difbeta[2,4]  -5.313   2.8560   6.988941  11.20354  20.0753
## Difbeta[3,4] -10.225  -0.1554   4.846690  10.03297  21.5435
## Difbeta[1,5] -23.316 -19.1961 -17.011634 -14.78745 -10.2588
## Difbeta[2,5] -16.101 -11.7197  -9.530124  -7.26846  -2.9490
## Difbeta[3,5] -24.516 -15.4795 -11.376300  -7.35029   0.6536
## Difbeta[4,5] -29.315 -20.6985 -16.566974 -12.26756  -4.1735
## Difbeta[1,6]  -8.297  -4.0994  -2.068387  -0.04144   3.9230
## Difbeta[2,6]  -1.098   3.1156   5.318524   7.58192  11.9598
## Difbeta[3,6]  -9.292  -0.6254   3.413393   7.26087  15.9625
## Difbeta[4,6] -13.923  -5.6085  -1.633505   2.29052  10.3433
## Difbeta[5,6]   8.840  12.9346  14.969020  16.88517  20.7972
## mu_r          98.540 104.5482 106.977059 109.30635 115.1127
## sigma_r        3.708   5.7981   7.408521   9.63970  17.7680