Analysing the relationships between pine diameters, age and heights

The data

Last time we saw that students had measured pines at Arne in two distinct habitats. The main analysis that was carried out in all years focussed on the heatland site (Coombes heath) where the issue from a management perspective concerned unwanted regeneration.

Last year students also spent some time measuring pines in the “restoration” site. Here pines were regenerating after felling of a plantation of pines.

The data have all been added to the aqm package and can be loaded on the server with a line of R.

data(arne_pine_hts)
str(pine_hts)

## Classes 'sf' and 'data.frame':   256 obs. of  6 variables:
##  $ Site    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ WP      : num  1 1 1 2 2 2 3 3 3 4 ...
##  $ Age     : num  8 4 7 5 7 7 4 5 4 5 ...
##  $ DiMM    : num  26.77 6.39 14.41 35 25.65 ...
##  $ HeCM    : num  100 43 72 125 95 77 95 99 75 141 ...
##  $ geometry:sfc_POINT of length 256; first list element:  'XY' num  -2.04 50.69
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr  "Site" "WP" "Age" "DiMM" ...

This is a spatial object. The coordinates are already added to the data frome using the code we saw last week. This data frame differs from last weeks data as it concerns individual pine trees rather than densities within a quadrat.

Let’s see the data as a map.

mapview(pine_hts,zcol="Site")

That’s useful to see where the data came from. However we are now going to analyse the data as a non spatial object.

The data can be changed back to a typical data frame easily.

d<-data.frame(pine_hts)

The variables

str(d)

## 'data.frame':    256 obs. of  6 variables:
##  $ Site    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ WP      : num  1 1 1 2 2 2 3 3 3 4 ...
##  $ Age     : num  8 4 7 5 7 7 4 5 4 5 ...
##  $ DiMM    : num  26.77 6.39 14.41 35 25.65 ...
##  $ HeCM    : num  100 43 72 125 95 77 95 99 75 141 ...
##  $ geometry:sfc_POINT of length 256; first list element:  'XY' num  -2.04 50.69

The variables in the data frame are

Age - number of counted whorls)
Site - A factor wiht two levels
DiMM - Diameter at 10cm above ground level measured in millimeters
HeCM - Height measured in cm

Looking at the relationship between diameter and height

g0<- ggplot(d,aes(x=DiMM, y=HeCM)) 
g1 <- g0 + geom_point()
g1 + ylab("Height of pines (cm)") + xlab("Diameter of ines (mm)")

Exercise

Fit a regression line and a regression model to these data. The code is not shown in this handout. You must try to use code from previous examples.

## `geom_smooth()` using formula 'y ~ x'

## 
## Call:
## lm(formula = HeCM ~ DiMM, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -89.094 -13.230  -2.461  11.893 105.200 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   29.485      2.345   12.57   <2e-16 ***
## DiMM           3.165      0.108   29.30   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.75 on 254 degrees of freedom
## Multiple R-squared:  0.7716, Adjusted R-squared:  0.7707 
## F-statistic: 858.3 on 1 and 254 DF,  p-value: < 2.2e-16

What do you conclude?

Fitting a more flexible curve

When a regression does not produce a convicing fit to the data there are various more advanced analyses that can be used. This code fits a “gam” smoother.

g1 + geom_smooth(method = "gam", formula = y ~ s(x))

The gam fits the data rather better but does not result in a simple formula.

mod<-gam(data=d,HeCM~s(DiMM))
summary(mod)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## HeCM ~ s(DiMM)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   85.453      1.265   67.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(DiMM) 4.106  5.096 202.2  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.802   Deviance explained = 80.5%
## GCV = 417.93  Scale est. = 409.6     n = 256

However this can be useful if we want to extract residuals.

d$residuals<- residuals(mod)

What do the residuals mean in this context?

Exercise

Code not shown

Find out whether the conditions for pine growth differs between the two sites.

Exercise

Find out whether there is a significant difference between the heights of the pines aftercorrecting for diameters between the two sites using the appropriate statistical technique. (hint, one way ANOVA is equivalent to T test when there are only two levels).

What does this imply? Write this up into your mamnegement report for the assignment as a finding.

Raster data

The terrain analysis that we will build over the next couple of weeks is based on two raster layers derived from Lidar data.

We can load these layers and look at them using the following code. These layers are held on the server. If you were working independently you would probably first process the layers using a desktop GIS such as QGIS before uploading them to R in the form of GEOTIFF files that can be read in with a line of code. In this case this intial work has already been done for you so that you can concentrate on the meaning of the data.

The principles of terrain analysis are outlined here

http://r.bournemouth.ac.uk:82/books/quant_spat/_book/terrain-analysis.html

data("arne_pines")
arne_pines %>% filter(site=="Heath" ) -> arne_pines
data(arne_lidar)
arne_pines<-st_as_sf(arne_pines,coords = c("lon","lat"),crs=4326)
mapview(arne_pines)

contours<-rasterToContour(dtm, levels=seq(0,60,5))
qtm(dtm) + qtm(contours)

mapview(study_area,alpha.regions = 0,lwd=4)

pol<-as(study_area,"Spatial") ## Convert to the older spatial classes
dsm<-raster::crop(dsm,pol)
dtm<-raster::crop(dtm,pol)

contours<-rasterToContour(dtm, nlevels=10)
map2<- mapview(dsm,col.regions=terrain.colors(1000),legend=TRUE) + mapview(dtm,col.regions=terrain.colors(1000)) + mapview(contours) + mapview(arne_pines)
map2 %>% extras()

## Loading required package: leaflet.extras

## Loading required package: leaflet

Arne assignment 2020

Duncan Golicher

16/11/2020

Analysing the relationships between pine diameters, age and heights

The data

The variables

Looking at the relationship between diameter and height

Exercise

Fitting a more flexible curve

Exercise

Exercise

Raster data