In the last class we looked at some simple raster processing and added vegetation height layers to our leaflet map. We also calculated slope and aspect. Now we’ll look at how vector data can be manipulated using some common geoprocessing operations such as intersections, unions and differences.
## R packages needed
library(rpostgis)
library(RPostgreSQL)
library(sp)
library(raster)
library(rgeos)
library(leaflet)
load("spatial_class2.rob")
The map we saved last time is in memory, so it can be displayed again just by typing its name.
arne_map
conn <- dbConnect("PostgreSQL", host = "postgis", dbname = "bu" ,
user = "bu_user", password = "bu_user")
We can download all the vector lines for paths, roads etc. from OSM within a 5 km radius of the centre of the site with another query that looks for all the lines within that radius of the centre of Arne in the database.
lat <- 50.69
lon <- -2.05
query<-sprintf("select osm_id, way geom FROM dorset_line
WHERE ST_DWITHIN(way::geography, ST_MAKEPOINT(%s,%s)::geography,5000) ",lon,lat)
paths<-pgGetGeom(conn, query=query)
This is the result
plot(paths,axes =T)
Our query is rather similar to one we used in the first class. Again, I do not expect you to write any spatial SQL yourselves; it takes time and practice. It is also best to try all non-routine queries by running them in QGIS or PgAdmin first to make sure that they work as planned. I will, however, explain the query.
select osm_id, way geom FROM dorset_line WHERE ST_DWITHIN(way::geography, ST_MAKEPOINT(-2.05,50.69)::geography,5000)
This gets all the lines within 5 km of the centre of Arne by making a point from the coordinates provided and casting it to the geography type. The geography type allows extremely precise calculations to be carried out on global data without projection, and is very useful to know about if you work on large global projects, as it works the same way regardless of location. The ST_DWithin function then finds the lines within the given radius in metres.
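To get a feel for what the geography calculation is doing, here is a pure-R sketch of the great-circle (haversine) distance. It assumes a spherical Earth, whereas PostGIS geography uses a spheroid, so the numbers will differ slightly; the function name is my own, for illustration only.

```r
# Haversine great-circle distance in metres (spherical approximation;
# PostGIS geography uses a spheroid, so results differ slightly)
haversine <- function(lon1, lat1, lon2, lat2, R = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}
# A point 0.01 degrees of latitude north of the centre of Arne
# is a little over a kilometre away
haversine(-2.05, 50.69, -2.05, 50.70)
```

ST_DWithin is effectively asking whether this sort of distance is below the threshold, but does so efficiently using the spatial index rather than testing every line.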
Notice that the OSM data are held in EPSG 4326, while the Lidar and Natural England data were in EPSG 27700 (British National Grid). There is a problem here, as some of the lines extend beyond the boundary. We’ll sort this out using an intersection.
Provided we have two layers in the same CRS, we can use functions in the R package rgeos to carry out many common vector geoprocessing operations. In the case of the paths layer, the query returned many geometries that extended beyond the limits of the SSSIs. In general terms, running operations at the desktop level, i.e. using rgeos, is a good way to work for a single site, as it is quick and simple. In contrast, if many operations were to be run over multiple polygons covering a very large area, we would design a full spatial query and run it on the database itself.
Checking the CRS shows that they are the same, but the actual notation used in the proj4string differs, as one was derived from a transform in R and the other from postgis.
paths@proj4string
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
arne4326@proj4string
## CRS arguments:
## +init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0
This can happen quite often with data processed using different software, which may use slightly different notations for the same CRS. It would throw an unnecessary error. So, provided we are convinced that the two CRSs really are the same, we can simply set one to the other so that they match perfectly.
paths@proj4string<-arne4326@proj4string
library(rgeos)
library(raster)
paths<-gIntersection(paths, arne4326, byid = T)
plot(paths,axes=T)
plot(arne4326,add=T)
Another way of running an intersection would be to take advantage of the very powerful geoprocessing capabilities of PostGIS itself. As mentioned, this would be the way to work if, for example, we wanted to calculate the total length of paths within all the SSSIs in the country, or a detailed breakdown of the proportion of each SSSI affected by access through paths.
The best possible practice when designing a large institutional database would be to store all the data in the same CRS. In this case we have two distinct CRSs. As the spatial index that speeds up extraction of the paths is in EPSG 4326, the fast way to run the intersection is to build a temporary table for Arne, cross this with the paths, then remove it.
In PostGIS we can get the paths within the Arne SSSI polygons by selecting the way geometry that intersects the Arne geometry using st_intersects(way, geom) and selecting from the two tables. We then take the intersection of the two geometries as the new geom. In this case we can get mixed geometries, as some of the results are lines and others are multilines, so we turn them all into multilines with st_multi. This is rather complex, and writing spatial SQL queries requires some experience. However, it is very efficient, as there is no need for any intermediate steps.
query<-"create table arnepol as select st_transform(geom,4326) geom from sssi_units_england where sssi_name = 'Arne';"
dbSendQuery(conn,query)
query<-"select osm_id,st_multi(st_intersection(geom,way)) as geom from
dorset_line paths, arnepol
where st_intersects(way,geom);"
paths2<-pgGetGeom(conn, query=query)
query<-"drop table arnepol"
dbSendQuery(conn,query)
This does look quite complicated, and it can be quite tricky to get SQL queries exactly right. In this particular case I got it wrong myself: I first tried transforming the geometry for Arne within the main query itself and found that the query ran very slowly. The transform was being executed every time a test for the intersection was made, which slowed down what should have been a quick intersection search using a spatially indexed table. The solution was to build the temporary table and then drop it. One of the tasks for a GIS professional working on very large databases can be finding ways to optimise the speed at which queries run, particularly if the same operation is used frequently. The advantage of using queries is that they can be reused easily in other circumstances, as we will see when we develop a map of Hengistbury Head.
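One way to make that reuse explicit is to wrap the SQL in a small R function that takes the site name as an argument. This is just a sketch using the table and column names from the query above; the function name is my own, and note that pasting values into SQL with sprintf does no escaping, so it is only safe with trusted, hard-coded site names.

```r
# Hypothetical helper: build the temporary-table query for any SSSI.
# Only safe with trusted site names (sprintf does no SQL escaping).
site_table_query <- function(site_name) {
  sprintf("create table sitepol as
           select st_transform(geom,4326) geom
           from sssi_units_england
           where sssi_name = '%s';", site_name)
}
q <- site_table_query("Arne")
cat(q)
```

The same function could then build the query for Hengistbury Head, or any other named site, without rewriting the SQL.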
plot(paths2,col="red",axes=T)
plot(arne4326,add=T)
OK, we now have the paths in EPSG 4326. However, if we want to buffer around them we need a projected layer. Again, not a problem. We’ve clipped the layers now, so we’ll do the transform in R rather than on the database.
path_proj <- spTransform(paths, CRS("+init=epsg:27700"))
pathbuf50<-gBuffer(path_proj,width=50)
plot(pathbuf50,axes=T,col="grey")
As you can see, we now have a single polygon that contains all the area within 50 m of a path. If we only wanted to use major paths or paved paths, this would be possible by first subsetting the data using knowledge of the attributes of each path.
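For example, OSM lines usually carry a highway tag. The toy data frame below stands in for the attribute table of such a layer and shows the idea of subsetting by attribute before buffering (the tag values here are made up for illustration):

```r
# Toy stand-in for the attribute table of an OSM lines layer
d <- data.frame(osm_id = 1:4,
                highway = c("footway", "track", "residential", "path"))
# Keep only the path-like categories
major <- d[d$highway %in% c("footway", "track", "path"), ]
nrow(major)
```

With a real SpatialLinesDataFrame the same bracket-subsetting syntax selects the matching lines together with their geometries.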
The Arne SSSI layer consists of one polygon for each site unit. If we want to combine them to make a single polygon we use a union operation.
bounds<-gUnaryUnion(arne)
plot(bounds,axes=T,col="grey")
We can do things with this such as calculating the total area. If we ask for the area by id of the original polygon we get a breakdown.
areas<-gArea(arne,byid=TRUE)/10000
areas
## 1 2 3 4 5 6
## 16.183116 9.426226 1.946019 1.130019 72.937481 22.195856
## 7 8 9 10 11 12
## 2.661927 15.851205 98.062962 115.196846 65.119602 0.823472
## 13 14 15 16 17 18
## 13.062478 3.812886 20.065937 29.650465 22.827212 46.652044
The sum of the areas of each polygon should match the area of the boundary polygon.
gArea(bounds)/10000
## [1] 557.6057
sum(areas)
## [1] 557.6058
away_from_paths<-gDifference(bounds,pathbuf50)
plot(away_from_paths,axes=T,col="grey")
Now we have two geometries. One represents all the area within 50 m of the paths (note that this buffer has not been clipped, so it extends slightly beyond the SSSI boundary). The other represents the area of the SSSIs that lies more than 50 m from a path. These are common GIS operations that are explained in many textbooks on geoprocessing.
Let’s find out the areas
near<-gArea(pathbuf50)/10000
near
## [1] 290.1163
far<-gArea(away_from_paths)/10000
far
## [1] 296.7847
The total areas are actually quite similar.
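Note, though, that the two figures do not sum exactly to the total SSSI area, because the buffer was not clipped to the boundary before its area was measured. A quick check with the numbers above (in hectares) makes this clear:

```r
# Areas in hectares taken from the output above
near  <- 290.1163   # 50 m path buffer (unclipped)
far   <- 296.7847   # SSSI area more than 50 m from a path
total <- 557.6057   # total SSSI area
# The excess is the part of the buffer lying outside the SSSI boundary
(near + far) - total
```

If we needed the within-SSSI buffer area exactly, we could intersect the buffer with the boundary polygon first and measure that instead.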
The exercise on the field trip involves sampling quadrats at random points. In order to avoid disturbance and unnecessary trampling through the vegetation, we may decide to choose points within the 50 m buffer.
randpoints <- spsample(pathbuf50, n=100, type='random')
plot(pathbuf50,col="grey")
plot(randpoints,add=TRUE,pch=21,bg=2)
We can’t see the exact points very well on this figure, but we will fix this by adding them to the zoomable leaflet map later on. If you carry out this exercise in ArcGIS you can visualise them through the interface.
A spatial overlay allows us to add the values of the raster layers at these random points to a data frame. We might use these values as covariates in some statistical analysis of our data later.
randpoints$hts<-extract(height10x10,randpoints)
randpoints$slope<-extract(slope,randpoints)
randpoints$aspect<-extract(aspect,randpoints)
randpoints$elev<-extract(dtm,randpoints)
head(randpoints@data)
## hts slope aspect elev
## 1 4.5790000 12.080378 236.94105 8.477
## 2 0.2368334 1.912620 88.06935 14.470
## 3 8.4766800 5.658996 117.13072 23.550
## 4 4.0616817 11.840335 356.47905 16.612
## 5 2.1540833 2.273485 307.19462 1.537
## 6 0.1129999 2.788891 36.44350 7.898
We can back-transform anything if we need the points in lat-long. We have used British National Grid up to now, as we want all the units used for the buffer and other vector layers kept in metres, not degrees.
hts_near_path<-unlist(extract(height10x10,pathbuf50))
hts_away_path<-unlist(extract(height10x10,away_from_paths))
summary(hts_near_path)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.1000 0.1687 0.4546 1.6677 2.3208 19.2756 1794
summary(hts_away_path)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.1000 0.1659 0.3306 1.5060 1.6377 21.8954 1635
We can see that the distribution of vegetation heights in our raster layer for the two areas is in fact rather similar. This may be useful to know in order to justify the assumption that the accessible areas are fairly representative of the area as a whole. Of course, other unmeasured effects such as trampling and disturbance may in fact differ.
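If we wanted a formal check rather than just eyeballing the summaries, base R's two-sample Kolmogorov-Smirnov test compares whole distributions. The sketch below uses simulated heights in place of the real extracted vectors (which contain NAs that would need removing with na.omit first); the means are taken from the summaries above.

```r
# Simulated stand-ins for the two height vectors (roughly exponential,
# with means matching the summaries above)
set.seed(42)
near_sim <- rexp(500, rate = 1 / 1.67)
far_sim  <- rexp(500, rate = 1 / 1.51)
# Two-sample KS test of whether the distributions differ
ks <- ks.test(near_sim, far_sim)
ks$statistic
ks$p.value
```

On the real data this would be ks.test(na.omit(hts_near_path), na.omit(hts_away_path)); a large p-value would support the assumption that the accessible area is representative.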
We are now going to add our points as markers on the map with popup labels to show the values that we extracted from our layers.
The vector layers and points are currently held as spatial data frames in R in EPSG 27700 (i.e. British National Grid). We will need latitudes and longitudes to add them to the map. This is easy using spTransform. We can then turn the points into a conventional data table with x and y columns.
pathbuf50_4326 <- spTransform(pathbuf50, CRS("+init=epsg:4326"))
pnts<-spTransform(randpoints, CRS("+init=epsg:4326"))
pnts<-data.frame(pnts)
head(pnts)
## hts slope aspect elev x y optional
## 1 4.5790000 12.080378 236.94105 8.477 -2.062319 50.68596 TRUE
## 2 0.2368334 1.912620 88.06935 14.470 -2.038817 50.68819 TRUE
## 3 8.4766800 5.658996 117.13072 23.550 -2.057491 50.68678 TRUE
## 4 4.0616817 11.840335 356.47905 16.612 -2.073610 50.67572 TRUE
## 5 2.1540833 2.273485 307.19462 1.537 -2.028353 50.69498 TRUE
## 6 0.1129999 2.788891 36.44350 7.898 -2.037717 50.68302 TRUE
In order to look at the values we extracted for each point, we may want to prepare some text that will be shown when the marker heads are clicked in leaflet.
## Round the numbers
hts<-round(pnts$hts,2)
slps<-round(pnts$slope,0)
elevs<-round(pnts$elev,1)
aspect<-round(pnts$aspect,0)
### Build some text with line breaks (<br/>)
pnts$pop<-as.character(sprintf("Vegetation height = %s m
<br/> Slope = %s degrees
<br/> Elevation = %s masl
<br/> Aspect = %s degrees",hts,slps,elevs,aspect))
Now we can rebuild the leaflet map to show all this. Try clicking on the heads of the markers to see the data we extracted at each point.
arne_map<-arne_map %>%
addPolylines(data=paths2,group="Paths",color="Red") %>%
addMarkers(data=pnts,lat=pnts$y,lng=pnts$x, popup = ~pop, group="Points") %>%
removeLayersControl() %>%
addLayersControl(baseGroups = c("OSM","Satellite"),overlayGroups = c("Arne SSSIs","Height","Height 10m","DTM","contours","Paths","Points"))
arne_map
dbDisconnect(conn)
## [1] TRUE
save(list=ls(),file="spatial_class3.rob")