d2<-read.csv("data/LondonGatewayDataFINAL_2017.CSV")
d2$Date<-as.Date(d2$Date, "%d/%m/%Y")
## Give the sites consistent names
d2$Site<-as.character(d2$Site)
d2$Site[grepl("Mucking",d2$Site)]<-"NM" ##All the Muckings are called NM as they are North Mucking in the historical data
d2$Site[grepl("SiteA",d2$Site)]<-"SW"
d2$Site[grepl("SiteX",d2$Site)]<-"SF"
d2$Site<-as.factor(d2$Site)
This table contains the data for historic winter peak counts for the 19 target species listed in Clause 10.5.4.
The question that arises is how were the historic winter counts calculated?
d2 %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks
historical_peaks<-as.data.frame( historical_peaks)
historical_peaks %>% filter(Year< 2004) %>% spread(Year,Peak) %>% dt()
OK. So this is exactly the same as table 3.1. So this must be how Leo calculated the peak abundances of the species. We’ll come on to the assemblage
Just filtering out the key species and comparing to the sum DOES NOT equate to table 3.2. The total assemblage number being reported is not the sum of all the peak abundances.
historical_peaks %>% group_by(Year) %>% summarise(Sum=sum(Peak)) -> assemblage
assemblage<-data.frame(Year=assemblage$Year,Species="Assemblage",Peak=assemblage$Sum)
hp <-rbind(historical_peaks,assemblage)
key_species<-c("Avocet","Black-tailed godwit","Dunlin","Assemblage")
hp %>% filter(Species %in% key_species) %>% spread(Year,Peak) %>% dt()
d2$Year_Month<- paste(d2$Winter, d2$Month)
f<-function(x)x[1]
d2 %>% group_by(Year_Month,Winter,Month) %>% summarise(sum=sum(Number)) %>% group_by(Winter) %>% summarise(Year_Month=f(Year_Month),Assemblage=max(sum)) -> assemblage
dt(assemblage)
Now the numbers coincide. So Leo in fact used a hybrid method. The peak counts for the individual species are the peak observations for any month over the winter. However the assemblage count is the total for the month with the peak observation of all the species. This was NOT what I was doing. I was taking the species counts themselves from the month with the peak count.
The problem with this is that if the proportional contribution of all the species to the total assemblage were to be added together it would be over 100%. This was not obvious, because in table 3.2 the proportional contribution was limited to three species. I am not sure what the best solution to this would be.
Notice that this explains why the stacked bar charts do not lead to the same pattern. In fact tacked bar charts cannot be used to show these data, as they don’t stack up to the peak assemblage.
The only clean solution to this would seem to be to produce two data sets. One using the recorded values for the species during the peak assemblage month as the values. This way they will stack to the assemblage value. The other sums the peak abundances that Leo used. In this case the assemblage value will differ from that included in last year’s report.
d2$Year_Month<- paste(d2$Winter, d2$Month)
## Find maximum
f<-function(x)x[1]
d2 %>% group_by(Year_Month,Winter,Month) %>% summarise(sum=sum(Number)) %>% group_by(Winter) %>% summarise(Year_Month=f(Year_Month),Peak=max(sum)) -> peak_months
d2 %>% filter(Year_Month %in% peak_months$Year_Month) %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks
historical_peaks<-as.data.frame( historical_peaks)
d3<-read.csv("data/df_2018.csv")
d3 %>% group_by(month) %>% summarise(sum(value))
## # A tibble: 6 x 2
## month `sum(value)`
## <fct> <int>
## 1 December 20478
## 2 February 19211
## 3 January 19695
## 4 March 17125
## 5 November 19029
## 6 October 11339
## So peak month was December. T
I am pretty much convinced that the Dunlin counts were the maximums, not the low water counts, in past surveys. There is no evidence of this in the report, but as the dunline counts are generally high, and using this years methodology that divides the counts by tide leads to an odd reduction in the counts if this is not accounted for I’ll run with this assumption.
d3 %>% filter(month=="December") %>% group_by(Species) %>% summarise(Peak=max(value)) -> d4
d4<-data.frame(Year=2017,d4)
d4<-rbind(historical_peaks,d4)
## Spread to make an Excel friendly table
d4 %>% spread(Year,Peak,fill=0) %>% dt()
write.csv(d4,file="data/table1.csv",row.names=FALSE)
In this alternative case I will use the method Leo did to obtain the peak species counts, which leads to the sum of the counts becoming the assemblage total. The hybrid version between table one and table two was used in the original report.
d2 %>% group_by(Winter,Month,Species) %>% summarise(msum=sum(Number)) %>% group_by(Year=Winter,Species) %>% summarise(Peak=max(msum)) -> historical_peaks
historical_peaks<-as.data.frame( historical_peaks)
d3 %>% group_by(Species) %>% summarise(Peak=max(value)) -> d4
d4<-data.frame(Year=2017,d4)
d4<-rbind(historical_peaks,d4)
d4 %>% spread(Year,Peak,fill=0) %>% dt()
write.csv(d4,file="data/table2.csv",row.names=FALSE)