Thames_analysis

Reproducing table 3.1

d2<-read.csv("data/LondonGatewayDataFINAL_2017.CSV")
d2$Date<-as.Date(d2$Date, "%d/%m/%Y")

## Give the sites consistent names 

d2$Site<-as.character(d2$Site)
d2$Site[grepl("Mucking",d2$Site)]<-"NM" ##All the Muckings are called NM as they are North Mucking in the historical data

d2$Site[grepl("SiteA",d2$Site)]<-"SW"
d2$Site[grepl("SiteX",d2$Site)]<-"SF"
d2$Site<-as.factor(d2$Site)

This table contains the data for historic winter peak counts for the 19 target species listed in Clause 10.5.4.

The question that arises is how were the historic winter counts calculated?

Taking the maximum monthly value recorded for each species

d2 %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks

historical_peaks<-as.data.frame( historical_peaks)

historical_peaks %>% filter(Year< 2004) %>% spread(Year,Peak) %>% dt()

OK. So this is exactly the same as table 3.1. So this must be how Leo calculated the peak abundances of the species. We’ll come on to the assemblage

Table 3.2

Just filtering out the key species and comparing to the sum DOES NOT equate to table 3.2. The total assemblage number being reported is not the sum of all the peak abundances.

historical_peaks %>% group_by(Year) %>% summarise(Sum=sum(Peak)) -> assemblage
assemblage<-data.frame(Year=assemblage$Year,Species="Assemblage",Peak=assemblage$Sum)

hp <-rbind(historical_peaks,assemblage)

key_species<-c("Avocet","Black-tailed godwit","Dunlin","Assemblage")
hp %>% filter(Species %in% key_species) %>% spread(Year,Peak) %>% dt()

Trying taking the month with peak counts as the assemblage

d2$Year_Month<- paste(d2$Winter, d2$Month)

f<-function(x)x[1]

d2 %>% group_by(Year_Month,Winter,Month) %>% summarise(sum=sum(Number)) %>% group_by(Winter) %>% summarise(Year_Month=f(Year_Month),Assemblage=max(sum)) -> assemblage

dt(assemblage)

Now the numbers coincide. So Leo in fact used a hybrid method. The peak counts for the individual species are the peak observations for any month over the winter. However the assemblage count is the total for the month with the peak observation of all the species. This was NOT what I was doing. I was taking the species counts themselves from the month with the peak count.

The problem with this is that if the proportional contribution of all the species to the total assemblage were to be added together it would be over 100%. This was not obvious, because in table 3.2 the proportional contribution was limited to three species. I am not sure what the best solution to this would be.

Notice that this explains why the stacked bar charts do not lead to the same pattern. In fact tacked bar charts cannot be used to show these data, as they don’t stack up to the peak assemblage.

Solution

The only clean solution to this would seem to be to produce two data sets. One using the recorded values for the species during the peak assemblage month as the values. This way they will stack to the assemblage value. The other sums the peak abundances that Leo used. In this case the assemblage value will differ from that included in last year’s report.

Table one: Species abundances in peak month

d2$Year_Month<- paste(d2$Winter, d2$Month)

## Find maximum

f<-function(x)x[1]

d2 %>% group_by(Year_Month,Winter,Month) %>% summarise(sum=sum(Number)) %>% group_by(Winter) %>% summarise(Year_Month=f(Year_Month),Peak=max(sum)) -> peak_months

d2 %>% filter(Year_Month %in% peak_months$Year_Month) %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks

historical_peaks<-as.data.frame( historical_peaks)

Adding 2017 data

d3<-read.csv("data/df_2018.csv")
d3 %>% group_by(month) %>% summarise(sum(value))

## # A tibble: 6 x 2
##   month    `sum(value)`
##   <fct>           <int>
## 1 December        20478
## 2 February        19211
## 3 January         19695
## 4 March           17125
## 5 November        19029
## 6 October         11339

## So peak month was December. T

I am pretty much convinced that the Dunlin counts were the maximums, not the low water counts, in past surveys. There is no evidence of this in the report, but as the dunline counts are generally high, and using this years methodology that divides the counts by tide leads to an odd reduction in the counts if this is not accounted for I’ll run with this assumption.

d3 %>% filter(month=="December") %>% group_by(Species) %>% summarise(Peak=max(value)) -> d4
d4<-data.frame(Year=2017,d4)
d4<-rbind(historical_peaks,d4)
## Spread to make an Excel friendly table
d4 %>% spread(Year,Peak,fill=0) %>% dt()

write.csv(d4,file="data/table1.csv",row.names=FALSE)

Table 2

In this alternative case I will use the method Leo did to obtain the peak species counts, which leads to the sum of the counts becoming the assemblage total. The hybrid version between table one and table two was used in the original report.

d2 %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks

historical_peaks<-as.data.frame( historical_peaks)

d3 %>% group_by(Species) %>% summarise(Peak=max(value)) -> d4
d4<-data.frame(Year=2017,d4)
d4<-rbind(historical_peaks,d4)

d4 %>% spread(Year,Peak,fill=0) %>% dt()

write.csv(d4,file="data/table2.csv",row.names=FALSE)

Thames_analysis_2018

Duncan Golicher

1/4/2019