Chapter 5 One way ANOVA
The purpose of one way anova is
- Test whether there is greater variability between groups than within groups
- Quantify any differences found between group means
5.1 Grouped boxplots
Exploratory plots
g0<-ggplot(d,aes(x=Site,y=Lshell))
g0+geom_boxplot()
5.2 Histograms for each factor level
g0<-ggplot(d,aes(x=d$Lshell))
g1<-g0+geom_histogram(color="grey",binwidth = 5)
g1+facet_wrap(~Site) +xlab("Text for x label")
5.3 Confidence interval plot
g0<-ggplot(d,aes(x=Site,y=Lshell))
g1<-g0+stat_summary(fun.y=mean,geom="point")
g1<-g1 +stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g1 +xlab("Text for x label") + ylab("Text for y label")
5.4 Fitting ANOVA
mod<-lm(data=d,Lshell~Site)
5.4.1 Model anova
anova(mod)
## Analysis of Variance Table
##
## Response: Lshell
## Df Sum Sq Mean Sq F value Pr(>F)
## Site 5 5525 1105 6.1732 4.579e-05 ***
## Residuals 107 19153 179
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.4.2 Model summary
5.4.2.1 Treatment effects
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 100.769 2.624 38.405 < 2e-16 ***
## SiteSite_2 8.467 3.748 2.259 0.0259 *
## SiteSite_3 6.037 5.409 1.116 0.2669
## SiteSite_4 -3.619 5.409 -0.669 0.5049
## SiteSite_5 18.697 3.925 4.763 6.02e-06 ***
## SiteSite_6 2.471 3.748 0.659 0.5111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
5.4.2.2 Change reference level
slevels<-levels(d$Site)
d$Site<-relevel(d$Site,"Site_5")
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 119.467 2.920 40.919 < 2e-16 ***
## SiteSite_1 -18.697 3.925 -4.763 6.02e-06 ***
## SiteSite_2 -10.231 3.960 -2.583 0.011135 *
## SiteSite_3 -12.660 5.559 -2.278 0.024739 *
## SiteSite_4 -22.317 5.559 -4.015 0.000111 ***
## SiteSite_6 -16.227 3.960 -4.097 8.15e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)
5.4.2.3 Reverse levels
slevels<-levels(d$Site)
d$Site <- factor(d$Site, levels=rev(slevels))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 103.240 2.676 38.583 < 2e-16 ***
## SiteSite_5 16.227 3.960 4.097 8.15e-05 ***
## SiteSite_4 -6.090 5.435 -1.121 0.265
## SiteSite_3 3.566 5.435 0.656 0.513
## SiteSite_2 5.996 3.784 1.584 0.116
## SiteSite_1 -2.471 3.748 -0.659 0.511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)
5.4.2.4 Sum contrasts
Sum contrasts compare the effects to the mean. Notice that the last level is missing due to the way the design matrix is formed. So it can be worth running sum contrasts twice, with the order reversed, to get all the contrasts.
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 106.1114 1.4384 73.773 < 2e-16 ***
## Site1 -5.3421 2.5804 -2.070 0.0408 *
## Site2 3.1246 2.6158 1.195 0.2349
## Site3 0.6949 4.1214 0.169 0.8664
## Site4 -8.9614 4.1214 -2.174 0.0319 *
## Site5 13.3553 2.7841 4.797 5.24e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
5.4.2.4.1 Reverse order
d$Site <- factor(d$Site, levels=rev(slevels))
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 106.1114 1.4384 73.773 < 2e-16 ***
## Site1 -2.8714 2.6158 -1.098 0.2748
## Site2 13.3553 2.7841 4.797 5.24e-06 ***
## Site3 -8.9614 4.1214 -2.174 0.0319 *
## Site4 0.6949 4.1214 0.169 0.8664
## Site5 3.1246 2.6158 1.195 0.2349
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
d$Site <- factor(d$Site, levels=slevels)
5.4.3 Tukey corrected pairwise comparisons
Use to find where signficant differences lie. This should confirm the pattern shown using the confidence interval plot.
mod<-aov(data=d,Lshell~Site)
TukeyHSD(mod)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lshell ~ Site, data = d)
##
## $Site
## diff lwr upr p adj
## Site_2-Site_1 8.466769 -2.408905 19.342443 0.2201442
## Site_3-Site_1 6.037019 -9.660664 21.734702 0.8737518
## Site_4-Site_1 -3.619231 -19.316914 12.078452 0.9849444
## Site_5-Site_1 18.697436 7.305950 30.088922 0.0000867
## Site_6-Site_1 2.470769 -8.404905 13.346443 0.9859123
## Site_3-Site_2 -2.429750 -18.201132 13.341632 0.9976925
## Site_4-Site_2 -12.086000 -27.857382 3.685382 0.2355928
## Site_5-Site_2 10.230667 -1.262165 21.723498 0.1103764
## Site_6-Site_2 -5.996000 -16.977781 4.985781 0.6105029
## Site_4-Site_3 -9.656250 -29.069479 9.756979 0.7004668
## Site_5-Site_3 12.660417 -3.470986 28.791819 0.2123990
## Site_6-Site_3 -3.566250 -19.337632 12.205132 0.9862071
## Site_5-Site_4 22.316667 6.185264 38.448069 0.0015143
## Site_6-Site_4 6.090000 -9.681382 21.861382 0.8718474
## Site_6-Site_5 -16.226667 -27.719498 -4.733835 0.0011239
Plot of results of Tukey HSD
plot(TukeyHSD(mod))
5.4.4 Anova with White’s correction
This will give you the overall Anova table if there is heterogeneity of variance.
library(sandwich)
library(car)
mod<-lm(Lshell~Site, data=d)
Anova(mod,white.adjust='hc3')
## Analysis of Deviance Table (Type II tests)
##
## Response: Lshell
## Df F Pr(>F)
## Site 5 9.9682 7.541e-08 ***
## Residuals 107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.4.5 Bayesian model with shrinkage
Specialist model. Probably the best for these particular data, but seek guidance. Don’t do this at home kids!
library(rjags)
library(ggmcmc)
rand_mod="
model {
### Likeihood
for (i in 1:N) { ## Loop through observations
mu[i]<-mu_r+Beta[ind[i]] ## Beta is added to an overall mean
y[i] ~ dnorm(mu[i],tau[ind[i]]) ## Set an independent tau for each group agan. A pooled variance model would also work here
}
for (j in 1:p) {
Beta[j]~dnorm(0,tau_r) ## A single tau represents the variance between group # means
tau[j] ~ dgamma(scale, rate)
for (n in 1:(j-1)){
Difbeta[n,j]<-Beta[n]-Beta[j]
}
}
scale ~ dunif(0, 1)
rate ~ dunif(0, 1)
tau_r ~ dgamma(scale,rate)
sigma_r <- 1/sqrt(tau_r)
mu_r ~ dnorm(0,0.00001) ## Prior for the overall mean
}"
data=list(y=d$Lshell,
ind=as.numeric(d$Site),
N=length(d$Lshell),
p=length(levels(d$Site)),
overall_mean=mean(d$Lshell))
model=jags.model(textConnection(rand_mod),data=data)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 113
## Unobserved stochastic nodes: 16
## Total graph size: 288
##
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("sigma_r","mu_r","Difbeta","Beta"),n.iter=100000,thin=10)
ms <-ggs(output)
mt<-filter(ms,grepl("Beta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")
summary(output)
##
## Iterations = 2010:102000
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## Beta[1] -5.47627 4.491 0.04491 0.07371
## Beta[2] 1.97178 4.612 0.04612 0.07840
## Beta[3] -0.03195 6.285 0.06285 0.07712
## Beta[4] -5.08636 6.230 0.06230 0.08266
## Beta[5] 11.48183 4.506 0.04506 0.07840
## Beta[6] -3.39902 4.402 0.04402 0.07729
## Difbeta[1,2] -7.44805 3.566 0.03566 0.03474
## Difbeta[1,3] -5.44432 6.299 0.06299 0.06484
## Difbeta[2,3] 2.00373 6.455 0.06455 0.06691
## Difbeta[1,4] -0.38991 6.145 0.06145 0.06622
## Difbeta[2,4] 7.05814 6.398 0.06398 0.07055
## Difbeta[3,4] 5.05441 7.937 0.07937 0.07937
## Difbeta[1,5] -16.95810 3.318 0.03318 0.03318
## Difbeta[2,5] -9.51005 3.350 0.03350 0.03350
## Difbeta[3,5] -11.51378 6.326 0.06326 0.06645
## Difbeta[4,5] -16.56819 6.410 0.06410 0.07124
## Difbeta[1,6] -2.07725 3.089 0.03089 0.03089
## Difbeta[2,6] 5.37080 3.324 0.03324 0.03324
## Difbeta[3,6] 3.36707 6.265 0.06265 0.06548
## Difbeta[4,6] -1.68734 6.076 0.06076 0.06699
## Difbeta[5,6] 14.88085 3.013 0.03013 0.03013
## mu_r 106.93855 4.139 0.04139 0.07716
## sigma_r 8.23497 3.808 0.03808 0.04894
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## Beta[1] -14.804 -8.0451 -5.385452 -2.72374 3.1254
## Beta[2] -7.009 -0.8096 1.890166 4.66691 11.3211
## Beta[3] -12.773 -3.8267 -0.008313 3.74431 12.8499
## Beta[4] -18.336 -8.8018 -4.763532 -1.08290 6.3265
## Beta[5] 2.992 8.7313 11.305458 14.15075 20.8994
## Beta[6] -12.445 -5.9517 -3.354256 -0.77537 5.2635
## Difbeta[1,2] -14.512 -9.8222 -7.400720 -5.03009 -0.6597
## Difbeta[1,3] -18.241 -9.4823 -5.485563 -1.38711 7.0552
## Difbeta[2,3] -10.707 -2.0920 1.813647 6.01591 15.1171
## Difbeta[1,4] -12.445 -4.3806 -0.416308 3.59752 11.9027
## Difbeta[2,4] -5.313 2.8560 6.988941 11.20354 20.0753
## Difbeta[3,4] -10.225 -0.1554 4.846690 10.03297 21.5435
## Difbeta[1,5] -23.316 -19.1961 -17.011634 -14.78745 -10.2588
## Difbeta[2,5] -16.101 -11.7197 -9.530124 -7.26846 -2.9490
## Difbeta[3,5] -24.516 -15.4795 -11.376300 -7.35029 0.6536
## Difbeta[4,5] -29.315 -20.6985 -16.566974 -12.26756 -4.1735
## Difbeta[1,6] -8.297 -4.0994 -2.068387 -0.04144 3.9230
## Difbeta[2,6] -1.098 3.1156 5.318524 7.58192 11.9598
## Difbeta[3,6] -9.292 -0.6254 3.413393 7.26087 15.9625
## Difbeta[4,6] -13.923 -5.6085 -1.633505 2.29052 10.3433
## Difbeta[5,6] 8.840 12.9346 14.969020 16.88517 20.7972
## mu_r 98.540 104.5482 106.977059 109.30635 115.1127
## sigma_r 3.708 5.7981 7.408521 9.63970 17.7680