Chapter 19 One way ANOVA
The purpose of one way anova is
- Test whether there is greater variability between groups than within groups
- Quantify any differences found between group means
19.1 Grouped boxplots
Exploratory plots
g0<-ggplot(d,aes(x=Site,y=Lshell))
g0+geom_boxplot()
19.2 Histograms for each factor level
g0<-ggplot(d,aes(x=d$Lshell))
g1<-g0+geom_histogram(color="grey",binwidth = 5)
g1+facet_wrap(~Site) +xlab("Text for x label")
19.3 Confidence interval plot
g0<-ggplot(d,aes(x=Site,y=Lshell))
g1<-g0+stat_summary(fun.y=mean,geom="point")
g1<-g1 +stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g1 +xlab("Text for x label") + ylab("Text for y label")
19.4 Fitting ANOVA
mod<-lm(data=d,Lshell~Site)
19.4.1 Model anova
anova(mod)
## Analysis of Variance Table
##
## Response: Lshell
## Df Sum Sq Mean Sq F value Pr(>F)
## Site 5 5525 1105 6.1732 4.579e-05 ***
## Residuals 107 19153 179
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
19.4.2 Model summary
19.4.2.1 Treatment effects
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 100.769 2.624 38.405 < 2e-16 ***
## SiteSite_2 8.467 3.748 2.259 0.0259 *
## SiteSite_3 6.037 5.409 1.116 0.2669
## SiteSite_4 -3.619 5.409 -0.669 0.5049
## SiteSite_5 18.697 3.925 4.763 6.02e-06 ***
## SiteSite_6 2.471 3.748 0.659 0.5111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
19.4.2.2 Change reference level
slevels<-levels(d$Site)
d$Site<-relevel(d$Site,"Site_5")
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 119.467 2.920 40.919 < 2e-16 ***
## SiteSite_1 -18.697 3.925 -4.763 6.02e-06 ***
## SiteSite_2 -10.231 3.960 -2.583 0.011135 *
## SiteSite_3 -12.660 5.559 -2.278 0.024739 *
## SiteSite_4 -22.317 5.559 -4.015 0.000111 ***
## SiteSite_6 -16.227 3.960 -4.097 8.15e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)
19.4.2.3 Reverse levels
slevels<-levels(d$Site)
d$Site <- factor(d$Site, levels=rev(slevels))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 103.240 2.676 38.583 < 2e-16 ***
## SiteSite_5 16.227 3.960 4.097 8.15e-05 ***
## SiteSite_4 -6.090 5.435 -1.121 0.265
## SiteSite_3 3.566 5.435 0.656 0.513
## SiteSite_2 5.996 3.784 1.584 0.116
## SiteSite_1 -2.471 3.748 -0.659 0.511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
d$Site <- factor(d$Site, levels=slevels)
19.4.2.4 Sum contrasts
Sum contrasts compare the effects to the mean. Notice that the last level is missing due to the way the design matrix is formed. So it can be worth running sum contrasts twice, with the order reversed, to get all the contrasts.
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 106.1114 1.4384 73.773 < 2e-16 ***
## Site1 -5.3421 2.5804 -2.070 0.0408 *
## Site2 3.1246 2.6158 1.195 0.2349
## Site3 0.6949 4.1214 0.169 0.8664
## Site4 -8.9614 4.1214 -2.174 0.0319 *
## Site5 13.3553 2.7841 4.797 5.24e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
19.4.2.4.1 Reverse order
d$Site <- factor(d$Site, levels=rev(slevels))
options(contrasts = c("contr.sum", "contr.poly"))
mod<-lm(data=d,Lshell~Site)
summary(mod)
##
## Call:
## lm(formula = Lshell ~ Site, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.906 -8.340 1.031 9.231 30.550
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 106.1114 1.4384 73.773 < 2e-16 ***
## Site1 -2.8714 2.6158 -1.098 0.2748
## Site2 13.3553 2.7841 4.797 5.24e-06 ***
## Site3 -8.9614 4.1214 -2.174 0.0319 *
## Site4 0.6949 4.1214 0.169 0.8664
## Site5 3.1246 2.6158 1.195 0.2349
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.38 on 107 degrees of freedom
## Multiple R-squared: 0.2239, Adjusted R-squared: 0.1876
## F-statistic: 6.173 on 5 and 107 DF, p-value: 4.579e-05
options(contrasts = c("contr.treatment", "contr.poly"))
d$Site <- factor(d$Site, levels=slevels)
19.4.3 Tukey corrected pairwise comparisons
Use to find where signficant differences lie. This should confirm the pattern shown using the confidence interval plot.
mod<-aov(data=d,Lshell~Site)
TukeyHSD(mod)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lshell ~ Site, data = d)
##
## $Site
## diff lwr upr p adj
## Site_2-Site_1 8.466769 -2.408905 19.342443 0.2201442
## Site_3-Site_1 6.037019 -9.660664 21.734702 0.8737518
## Site_4-Site_1 -3.619231 -19.316914 12.078452 0.9849444
## Site_5-Site_1 18.697436 7.305950 30.088922 0.0000867
## Site_6-Site_1 2.470769 -8.404905 13.346443 0.9859123
## Site_3-Site_2 -2.429750 -18.201132 13.341632 0.9976925
## Site_4-Site_2 -12.086000 -27.857382 3.685382 0.2355928
## Site_5-Site_2 10.230667 -1.262165 21.723498 0.1103764
## Site_6-Site_2 -5.996000 -16.977781 4.985781 0.6105029
## Site_4-Site_3 -9.656250 -29.069479 9.756979 0.7004668
## Site_5-Site_3 12.660417 -3.470986 28.791819 0.2123990
## Site_6-Site_3 -3.566250 -19.337632 12.205132 0.9862071
## Site_5-Site_4 22.316667 6.185264 38.448069 0.0015143
## Site_6-Site_4 6.090000 -9.681382 21.861382 0.8718474
## Site_6-Site_5 -16.226667 -27.719498 -4.733835 0.0011239
Plot of results of Tukey HSD
plot(TukeyHSD(mod))
19.4.4 Anova with White’s correction
This will give you the overall Anova table if there is heterogeneity of variance.
library(sandwich)
library(car)
mod<-lm(Lshell~Site, data=d)
Anova(mod,white.adjust='hc3')
## Analysis of Deviance Table (Type II tests)
##
## Response: Lshell
## Df F Pr(>F)
## Site 5 9.9682 7.541e-08 ***
## Residuals 107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
19.4.5 Bayesian model with shrinkage
Specialist model. Probably the best for these particular data, but seek guidance. Don’t do this at home kids!
library(rjags)
library(ggmcmc)
rand_mod="
model {
### Likeihood
for (i in 1:N) { ## Loop through observations
mu[i]<-mu_r+Beta[ind[i]] ## Beta is added to an overall mean
y[i] ~ dnorm(mu[i],tau[ind[i]]) ## Set an independent tau for each group agan. A pooled variance model would also work here
}
for (j in 1:p) {
Beta[j]~dnorm(0,tau_r) ## A single tau represents the variance between group # means
tau[j] ~ dgamma(scale, rate)
for (n in 1:(j-1)){
Difbeta[n,j]<-Beta[n]-Beta[j]
}
}
scale ~ dunif(0, 1)
rate ~ dunif(0, 1)
tau_r ~ dgamma(scale,rate)
sigma_r <- 1/sqrt(tau_r)
mu_r ~ dnorm(0,0.00001) ## Prior for the overall mean
}"
data=list(y=d$Lshell,
ind=as.numeric(d$Site),
N=length(d$Lshell),
p=length(levels(d$Site)),
overall_mean=mean(d$Lshell))
model=jags.model(textConnection(rand_mod),data=data)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 113
## Unobserved stochastic nodes: 16
## Total graph size: 288
##
## Initializing model
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("sigma_r","mu_r","Difbeta","Beta"),n.iter=100000,thin=10)
ms <-ggs(output)
mt<-filter(ms,grepl("Beta",Parameter))
ggs_caterpillar(mt) +geom_vline(xintercept = 0,col="red")
summary(output)
##
## Iterations = 2010:102000
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## Beta[1] -5.4651 4.329 0.04329 0.06864
## Beta[2] 1.9002 4.442 0.04442 0.07173
## Beta[3] -0.0961 6.303 0.06303 0.06979
## Beta[4] -5.0761 6.012 0.06012 0.07157
## Beta[5] 11.4599 4.265 0.04265 0.07125
## Beta[6] -3.4625 4.200 0.04200 0.06830
## Difbeta[1,2] -7.3653 3.595 0.03595 0.03595
## Difbeta[1,3] -5.3690 6.477 0.06477 0.06640
## Difbeta[2,3] 1.9963 6.554 0.06554 0.06701
## Difbeta[1,4] -0.3890 6.064 0.06064 0.06237
## Difbeta[2,4] 6.9763 6.327 0.06327 0.06434
## Difbeta[3,4] 4.9800 8.064 0.08064 0.08064
## Difbeta[1,5] -16.9250 3.306 0.03306 0.03306
## Difbeta[2,5] -9.5596 3.350 0.03350 0.03350
## Difbeta[3,5] -11.5560 6.394 0.06394 0.06550
## Difbeta[4,5] -16.5360 6.293 0.06293 0.06474
## Difbeta[1,6] -2.0026 3.087 0.03087 0.02939
## Difbeta[2,6] 5.3628 3.314 0.03314 0.03314
## Difbeta[3,6] 3.3664 6.327 0.06327 0.06497
## Difbeta[4,6] -1.6136 6.018 0.06018 0.06187
## Difbeta[5,6] 14.9224 2.938 0.02938 0.02938
## mu_r 107.0018 3.929 0.03929 0.06598
## sigma_r 8.1920 3.620 0.03620 0.04058
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## Beta[1] -14.310 -8.0732 -5.401344 -2.77934 3.0733
## Beta[2] -6.688 -0.8489 1.858600 4.63003 10.9402
## Beta[3] -13.202 -3.9252 0.003164 3.77948 12.3386
## Beta[4] -17.474 -8.7326 -4.907342 -1.17014 6.2282
## Beta[5] 3.451 8.7367 11.287129 14.08022 20.2936
## Beta[6] -12.010 -5.9775 -3.410657 -0.93567 4.9151
## Difbeta[1,2] -14.293 -9.7894 -7.353738 -5.00338 -0.1568
## Difbeta[1,3] -18.309 -9.5530 -5.328387 -1.20149 7.6621
## Difbeta[2,3] -10.737 -2.1270 1.878285 6.15791 15.1883
## Difbeta[1,4] -12.495 -4.3898 -0.366453 3.55944 11.7831
## Difbeta[2,4] -5.137 2.7737 6.878559 11.08238 19.6276
## Difbeta[3,4] -10.926 -0.3007 4.882215 10.10134 21.1125
## Difbeta[1,5] -23.402 -19.1348 -16.955491 -14.71284 -10.3046
## Difbeta[2,5] -16.253 -11.7551 -9.545076 -7.34901 -2.9349
## Difbeta[3,5] -24.597 -15.5721 -11.335702 -7.36234 0.4388
## Difbeta[4,5] -29.225 -20.7013 -16.458230 -12.30726 -4.5097
## Difbeta[1,6] -8.005 -4.0520 -2.003803 0.05552 4.0525
## Difbeta[2,6] -1.217 3.1620 5.389310 7.57237 11.8316
## Difbeta[3,6] -9.340 -0.5732 3.391249 7.40836 15.7616
## Difbeta[4,6] -13.568 -5.5946 -1.596956 2.38368 10.1275
## Difbeta[5,6] 8.969 13.0603 14.953756 16.83725 20.6540
## mu_r 98.992 104.6504 107.037428 109.35331 114.8330
## sigma_r 3.728 5.7625 7.422721 9.69408 17.2778