This document provides all the code chunks that may be useful in the context of the data analysis component of the assignment. The data set used to illustrate is the mussels data, that can be analysed using one way ANOVA and regression in the context of calibrating a relationship.
You should look through ALL the handouts provided on these techniques to understand the underlying theory. This “crib sheet” simply shows the most useful code.
Include this chunk at the top of you analysis to ensure that you have all the packages. It also includes the wrapper to add buttons to a data table if you want to use this. Remember that data tables can only be included in HTML documents.
library(ggplot2)
library(dplyr)
library(mgcv)
library(DT)
theme_set(theme_bw())
dt<-function(x) DT::datatable(x,
filter = "top",
extensions = c('Buttons'), options = list(
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel'), colReorder = TRUE
))
Use read.csv with your own data set
d<-read.csv("https://tinyurl.com/aqm-data/mussels.csv")
Checking the structure of the data.
str(d)
You can also look at your data by clicking on the dataframe in the Global Environment window in R Studio.
dt(d)
If you want to run an analysis for a single site (factor level) at a time, you can get a dataframe for just factor level.
s1<-subset(d,d$Site == "Site_1")
s1<-droplevels(s1)
Change the name of the variable to match a numerical variable in your own data set. The command removes NAs just in case you have them
summary(d$Lshell,na.rm=TRUE)
Mean, median, standard deviation and variance.
mean(d$Lshell, na.rm=TRUE)
median(d$Lshell, na.rm=TRUE)
sd(d$Lshell, na.rm=TRUE)
var(d$Lshell, na.rm=TRUE)
Useful for your own quick visualisation.
boxplot(d$Lshell)
Useful for your own quick visualisation.
hist(d$Lshell)
This uses ggplot. Change the bin width if you want to use this.
g0<-ggplot(d,aes(x=d$Lshell))
g0+geom_histogram(color="grey",binwidth = 5)
In this data set there are two numerical variables. So we can run a linear regresion.
g0<-ggplot(d,aes(x=Lshell,y=BTVolume))
g0+geom_point()
Type the text you want for the x and y axes to replace the variable names
g0<-ggplot(d,aes(x=Lshell,y=BTVolume))
g1<-g0+geom_point() + geom_smooth(method="lm")
g1 + xlab("Some text for the x asis") + ylab("Some text for the y axis")
Change the names of the variables in the first line.
mod<-lm(data= d, BTVolume~Lshell)
d$residuals<-residuals(mod)
summary(mod)
anova(mod)
confint(mod)
Look at the regression handout to understand these plots.
plot(mod,which=1)
plot(mod,which=2)
plot(mod,which=3)
plot(mod,which=4)
plot(mod,which=5)
Used if all else fails. Not needed with these data, but included for reference.
g0<-ggplot(d,aes(x=rank(Lshell),y=rank(BTVolume)))
g0+geom_point() + geom_smooth(method="lm")
cor.test(d$Lshell,d$BTVolume,method="spearman")
Only use if you suspect that the relationship is not well described by a straight line.
library(mgcv)
g0<-ggplot(d,aes(x=Lshell,y=BTVolume))
g1<-g0 + geom_point() + geom_smooth(method="gam", formula =y~s(x))
g1 + xlab("Some text for the x asis") + ylab("Some text for the y axis")
In this case the line is the same as the linear model. Get a summary using this code.
mod<-gam(data=d, BTVolume~s(Lshell))
summary(mod)
If you do use this model remember that its only needed if you can’t use linear regression. Report the ajusted R squared value, the estimated degrees of freedom and the p-value for the smooth term (not the intercept). You must include the figure in your report, as that is the only way to show the shape of the response.
The purpose of one way anova is
Exploratory plots
g0<-ggplot(d,aes(x=Site,y=Lshell))
g0+geom_boxplot()
g0<-ggplot(d,aes(x=d$Lshell))
g1<-g0+geom_histogram(color="grey",binwidth = 5)
g1+facet_wrap(~Site) +xlab("Text for x label")
g0<-ggplot(d,aes(x=Site,y=Lshell))
g1<-g0+stat_summary(fun.y=mean,geom="point")
g1<-g1 +stat_summary(fun.data=mean_cl_normal,geom="errorbar")
g1 +xlab("Text for x label") + ylab("Text for y label")
mod<-aov(data=d,Lshell~Site)
summary(mod)
Use to find where signficant differences lie. This should confirm the pattern shown using the confidence interval plot.
mod<-aov(data=d,Lshell~Site)
TukeyHSD(mod)
plot(TukeyHSD(mod))
This will give you the overall Anova table if there is heterogeneity of variance.
library(sandwich)
library(car)
mod<-lm(Lshell~Site, data=d)
Anova(mod,white.adjust='hc3')