When analysing time series it can be very important to take serial autocorrelation into account. However, if serial autocorrelation is not particularly strong, simple regression analysis can provide reliable insight into trends.
The following data are taken from the UK Meteorological Office. They represent monthly precipitation totals and mean monthly temperatures averaged over climate stations in England. Is there any evidence of trends?
library(ggplot2)
library(dplyr)
library(reshape2)
# Monthly temperature and precipitation series for England
Temp<-read.csv("https://tinyurl.com/aqm-data/Temp.csv")
Prec<-read.csv("https://tinyurl.com/aqm-data/Prec.csv")
library(DT)
datatable(Temp)
Notice that the data are in a “wide” format, with one column per month. This is often the way climate data are presented but, as you know by now, it is not a classic data frame format. We’ll see how to deal with this later.
datatable(Prec)
Choose one of the months for analysis. Start a new markdown document and carefully copy and paste the relevant chunks from this document into it. Now, using the code from the regression report, design a regression analysis to look at linear trends in temperature and precipitation over the last 100 years. A sketch of a possible starting point is shown below.
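As a minimal sketch for a single month, assuming the wide Temp data frame contains Year and JAN columns (check names(Temp) first):
jan_mod<-lm(JAN~Year,data=Temp) # linear trend in January temperature
summary(jan_mod)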
There are now many different ways in R to reshape data frames into a consistent format. The newer tidyr and dplyr packages are becoming increasingly popular. A simple way of stacking many columns into two (variable and value) is provided by the reshape2 package. The melt function takes the original data frame along with arguments defining the “id” variables and the “measurement” variables.
# Stack the twelve monthly columns into variable/value pairs, keeping Year as the id
Temp2<-melt(Temp[,3:15],id="Year")
Prec2<-melt(Prec[,3:15],id="Year")
str(Temp2)
## 'data.frame': 1236 obs. of 3 variables:
## $ Year : int 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 ...
## $ variable: Factor w/ 12 levels "JAN","FEB","MAR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 3.2 3.4 3.3 3.9 2.9 3.5 6.9 1.2 3.3 2.5 ...
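The same stacking can be done with tidyr; a sketch, assuming a version recent enough to provide pivot_longer:
library(tidyr)
# Stack every column except Year; equivalent to the melt call above
Temp2_tidy<-pivot_longer(Temp[,3:15],cols=-Year,names_to="variable",values_to="value")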
All the months can now be plotted on a single figure using ggplot2.
g0<-ggplot(Temp2,aes(x=Year,y=value))
g0+geom_point()+geom_smooth(method="lm")+facet_wrap("variable")
# Annual mean temperature and annual precipitation total
Temp2 %>% group_by(Year) %>% summarise(mean=mean(value)) -> TMean
Prec2 %>% group_by(Year) %>% summarise(sum=sum(value)) -> PTot
Now look at the trends for the yearly data. You may want to subset the data to look only at the last fifty years.
TMean67<-subset(TMean,TMean$Year>1967)
PTot67<-subset(PTot,PTot$Year>1967)
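A minimal sketch of a trend model for the recent annual data (the column names mean and Year come from the summarise call above), together with a quick check on the serial autocorrelation issue raised at the start:
recent_mod<-lm(mean~Year,data=TMean67) # linear trend in annual mean temperature since 1967
summary(recent_mod)
acf(residuals(recent_mod)) # weak residual autocorrelation would support using simple regression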
Load the mussels data and analyse them using linear regression.
mussels<-read.csv("https://tinyurl.com/aqm-data/mussels.csv")
Can you work out how to run the regression for each site?
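One approach is to split the data frame by site and fit a model to each piece. The variable names used below (Site, Lshell and BTVolume) are assumptions; check names(mussels) before running:
site_mods<-lapply(split(mussels,mussels$Site),function(df) lm(BTVolume~Lshell,data=df)) # one model per site
lapply(site_mods,summary)
Fitting a model is, however, only half the job. Anscombe’s famous quartet, a data set built into R, shows why the fit always needs checking.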
data(anscombe)
str(anscombe)
## 'data.frame': 11 obs. of 8 variables:
## $ x1: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x2: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x3: num 10 8 13 9 11 14 6 4 12 7 ...
## $ x4: num 8 8 8 8 8 8 8 19 8 8 ...
## $ y1: num 8.04 6.95 7.58 8.81 8.33 ...
## $ y2: num 9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 ...
## $ y3: num 7.46 6.77 12.74 7.11 7.81 ...
## $ y4: num 6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 ...
d<-anscombe
We can fit a separate model to each pair of variables: y1 against x1, y2 against x2, and so on.
mod1<-lm(y1~x1,data=d)
mod2<-lm(y2~x2,data=d)
mod3<-lm(y3~x3,data=d)
mod4<-lm(y4~x4,data=d)
The summaries of the models look very similar.
summary(mod1)
##
## Call:
## lm(formula = y1 ~ x1, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.92127 -0.45577 -0.04136 0.70941 1.83882
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0001 1.1247 2.667 0.02573 *
## x1 0.5001 0.1179 4.241 0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
## F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
summary(mod2)
##
## Call:
## lm(formula = y2 ~ x2, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9009 -0.7609 0.1291 0.9491 1.2691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.001 1.125 2.667 0.02576 *
## x2 0.500 0.118 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6662, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002179
summary(mod3)
##
## Call:
## lm(formula = y3 ~ x3, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1586 -0.6146 -0.2303 0.1540 3.2411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0025 1.1245 2.670 0.02562 *
## x3 0.4997 0.1179 4.239 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6292
## F-statistic: 17.97 on 1 and 9 DF, p-value: 0.002176
summary(mod4)
##
## Call:
## lm(formula = y4 ~ x4, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.751 -0.831 0.000 0.809 1.839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0017 1.1239 2.671 0.02559 *
## x4 0.4999 0.1178 4.243 0.00216 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared: 0.6667, Adjusted R-squared: 0.6297
## F-statistic: 18 on 1 and 9 DF, p-value: 0.002165
However, only one of these data sets is really suitable for linear regression. Run the appropriate diagnostics to find out which one.
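As a sketch, the standard diagnostic plots for the first model can be produced with base R; repeat for mod2 to mod4:
par(mfrow=c(2,2)) # arrange the four diagnostic plots in a grid
plot(mod1) # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow=c(1,1)) # reset the plotting layout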