In this handout we will see how to obtain information about the relationships between spatial layers using R. You have already tried this in QGIS. The information will be the same regardless of the softweare used. The advantage of conducting this step in R is that there is no need for intermediate files. The data obtained is immediately available for statistical analysis using the tools that you have already seen.
Although GIS work does have a strong visual component, i.e. making maps, looking at maps and presenting maps to the audience for your analysis, a lot of GIS work does not produce new maps. When running some forms of spatial analysis it can be helpful to quickly visualise the results in order to check that they are as expected (this can be done easily using mapview). However much of the time all that happens is that new data representing the relationships between objects is added to the data you already have. These results are then analysed in a non spatial manner.
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning = FALSE,echo = TRUE)
library(tidyverse)
library(mapview)
library(sf)
library(tmap)
library(RColorBrewer)
library(elevatr)
library(raster)
library(units)
library(leaflet.extras)
birds<- st_read("Bird-data-v1.shp")
## Reading layer `Bird-data-v1' from data source
## `/home/rstudio/webpages/books/Bird_mapping_assignment/Bird-data-v1.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 16732 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 378500 ymin: 75500.02 xmax: 435500 ymax: 113000.1
## Projected CRS: OSGB 1936 / British National Grid
habitat<-st_read("Land-cover-data-v1.shp")
## Reading layer `Land-cover-data-v1' from data source
## `/home/rstudio/webpages/books/Bird_mapping_assignment/Land-cover-data-v1.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 378387 ymin: 75197.4 xmax: 435753 ymax: 113266
## Projected CRS: OSGB 1936 / British National Grid
Note that we may as well use a very low resolution DEM here as the bird data is spaced 1km apart.
dem<-elevatr::get_elev_raster(birds, z=6)
We could use the coastline layer in the data folder. However this is very detailed and there is no real need to find very precise measurements given the scale of the bird data. So I will take a general map of the outline of the UK and use that. It needs to be cast to a linestring and projected to British National Grid.
library(rnaturalearth)
coast<-st_as_sf(rnaturalearthhires::countries10)
filter(coast, NAME=="United Kingdom") -> coast
coast <- st_cast(coast,"MULTILINESTRING")
coast<-st_transform(coast,27700)
So now we want to know some relationships between these spatial layers and the bird layer. This requires some simple GIS operations.
The dem is a raster layer so the operation involves extracting the elevation at each of the bird observation points.
birds$elevation<-raster::extract(dem,birds)
Finding the distance to the coastline is also easy. We just ask R for the distances between the two Geometries. The only detail to be aware of is that the results are given using a format for units. This is good while more calculations are being done as it keeps precision of the calculations. However in our case its best to drop the units and round the results.
library(units)
birds$coast<-st_distance(birds, coast)
## Drop the units class and convert to km
birds$coast<-round(drop_units(birds$coast/1000),1)
This is also a simple task in R. The st_join operation joins the two tables based on intersecting geometries.
birds %>% st_join(habitat) -> birds
Once we have the information we can drop the geometries and we are left with a simple long data frame
birds %>% st_drop_geometry() -> btable
To see what the btable looks like we can do this.
aqm::dt(btable)
This a simple “data wrangling” task in R when you know how to use dplyr (which you don’t yet)
btable %>% group_by(LandCover,Species) %>% summarise(count=n()) -> bcounts
Look at the bcounts table and see what it now contains. Most of the simple GIS operations are now done.
aqm::dt(bcounts)
Although we now have the numbers of observations of birds per land use class, the distances of the observations from the coast and their elevation above sea level there are still some elements to add.
We previously calculated the areas of each land class using GIS in R. Sp we can quickly do that again
habitat$Area<-round(drop_units(set_units(st_area(habitat), km^2)),2)
habitat %>% st_drop_geometry() -> htable
The absolute areas might not be what we are most interested in. We could want the percentage of the total area in each land class.
htable %>% mutate(Percent_area=round(100*Area/sum(Area),1)) ->htable
Now we can add the areas to the bird data table. So now each habitat type is matched to its total area and the percentage of the total area that this habitat type makes up.
bcounts %>% group_by(Species) %>% mutate (Percent_count=round(100*count/sum(count),1)) ->bcounts
merge(htable, bcounts) -> bcounts
As the areas of each land use class differ, the number of counts for each species would be expected to differ in proportion to the area. So we can calculate the expected counts and also the difference between observed and expected.
bcounts %>% group_by(Species) %>% mutate(Expected_count= round(sum(count)*Percent_area/100)) %>% mutate(Difference=count-Expected_count) ->bcounts
aqm::dt(bcounts)
OK. So we now have most of the GIS work completed. We are left with some non spatial tables that can answer some questions about the data. The next step is to think carefully about how these data can be used and how to present the results to the audience for your analysis.
If you want to start a new markdown which has all the objects that were loaded or formed in this one ready for further analysis you can do this.
save(list=ls(),file="Handout4.rda")
Start a new markdown document in the usual way. Load all the libraries that you need as usual (the workspace does not include the libraries). Make sure the workspace is clean (use the sweep tool in the top right to clear it out)
Load the data from the previous workspace
load("Handout4.rda")
You will now see all the same objects that were present in your workspace after running this handout. You can contine working with them
Remember that you have some different tables that may be used in different ways.
The btable contains a list of the observations with the name of the species and the characteristics of the site where they were observed in terms of elevation, distance from the coast and landuse.
The bcounts table has the counts of the number of observations of each species in each of the land use classes. It also has the percentages of the observations of each species found in the land use class and the percent of the total area represented by this land use class. I have also calculated the expected numbers in terms of counts and percentages.
Each of these tables may be useful in different ways and you might think about other ways of working with the data. You can ask me for help if you are unsure of how to (or if you can) do something
If you have two counts, one consisting of an observed number and the other of an expected number you might remember a simple statistical “test” known as the chi squared test.
The code below runs a simple test to see if a single observed value differs “significantly” from a single expected value. Clearly when there are only a small number of observations the observed difference may be “due to chance”.
Observed<-10
Expected<-15
d<-data.frame(Observed,Expected)
chisq.test(d)
##
## Chi-squared test for given probabilities
##
## data: d
## X-squared = 1, df = 1, p-value = 0.3173