Thames_analysis

Loading data

Combine the data with this years.

d2<-read.csv("data/LondonGatewayDataFINAL_2017.CSV")
d2$Date<-as.Date(d2$Date, "%d/%m/%Y")

## Give the sites consistent names 

d2$Site<-as.character(d2$Site)
d2$Site[grepl("Mucking",d2$Site)]<-"NM" ##All the Muckings are called NM as they are North Mucking in the historical data

d2$Site[grepl("SiteA",d2$Site)]<-"SW"
d2$Site[grepl("SiteX",d2$Site)]<-"SF"
d2$Site<-as.factor(d2$Site)
d2$Year<-d2$Winter
d2$month<-d2$Month
d2$Month<-month.name[d2$Month]

This table contains the data for historic winter peak counts for the 19 target species listed in Clause 10.5.4.

Adding 2017 data

I am pretty much convinced that the Dunlin counts were the maximums, not the low water counts, in past surveys. There is no evidence of this in the report, but as the dunline counts are generally high, and using this years methodology that divides the counts by tide leads to an odd reduction in the counts if this is not accounted for I’ll run with this assumption.

d3<-read.csv("data/df_2018.csv")

## Take maximum of paired tide readings (not just the low, as Dunlins)
d3 %>% group_by(Site,Species,month) %>% summarise(value=max(value)) -> d3
data1<-data.frame(Site=d3$Site,Species=d3$Species,Year=2017,Month=d3$month,value=d3$value)

data2<-data.frame(Site=d2$Site,Species=d2$Species,Year=d2$Year,Month=d2$Month,value=d2$Number)

data<-rbind(data1,data2)

Reproducing table 3.1

The question that arises is how were the historic winter maximums actually calculated? Does this reprodice the data in the report.

Taking the maximum monthly value recorded for each species

data %>% group_by(Year,Month,Species) %>% summarise(msum=sum(value)) %>% group_by(Year,Species) %>% summarise(Peak=max(msum)) -> peaks

peaks<-as.data.frame(peaks)

## Excel friendly spread out table

peaks %>% filter(Year< 2004) %>% spread(Year,Peak) %>% dt()

OK. So this is exactly the same as table 3.1. So this must be how Leo calculated the peak abundances of the species. Now we’ll come on to the assemblage

Table 3.2

Just filtering out the key species and comparing to the sum from table 1 DOES NOT equate to the data in table 3.2 of the report. The total assemblage number being reported is not the sum of all the peak abundances.

peaks %>% group_by(Year) %>% summarise(Sum=sum(Peak)) -> assemblage

assemblage<-data.frame(Year=assemblage$Year,Species="Assemblage",Peak=assemblage$Sum)

hp <-rbind(peaks,assemblage)

key_species<-c("Avocet","Black-tailed godwit","Dunlin","Assemblage")
hp %>% filter(Species %in% key_species) %>% spread(Year,Peak) %>% dt()

Taking the month with peak counts as the assemblage

Let’s try the other approach. Find the month with the maximum total assemblage.

data %>% group_by(Year,Month) %>% summarise(sum=sum(value)) %>% group_by(Year) %>% summarise(Month=Month[which(sum==max(sum))],Assemblage=max(sum)) -> peak_month

dt(peak_month)

Now the peak numbers coincide again. So Leo in fact used a hybrid method. The peak counts for the individual species are the peak observations for any month over the winter. However the assemblage count is the total for the month with the peak observation of all the species. This was NOT what I was doing. I was taking the species counts themselves from the month with the peak count.

This now reproduces Leo’s table 3.2 exactly (apart from the lower section in which propoptions are calculated)

assemblage<-data.frame(Year=peak_month$Year,Species="Assemblage",Peak=peak_month$Assemblage)

hp <-rbind(peaks,assemblage)

key_species<-c("Avocet","Black-tailed godwit","Dunlin","Assemblage")
hp %>% filter(Species %in% key_species) %>% spread(Year,Peak) %>% dt()

The problem with this is that if the proportional contribution of all the species to the total assemblage were to be added together it would be over 100%. This was not obvious, because in table 3.2 the proportional contribution was limited to three species. I am not sure what the best solution to this might be as the original report mixed it up.

Notice that this explains why the stacked bar charts do not lead to the same pattern. In actual fact stacked bar charts cannot be used to show these data in this form, as the numbers don’t actually stack up to the peak assemblage.

Solution?

The only clean solution to this would seem to be to produce two data sets. One using the recorded values for the species during the peak assemblage month as the values. This way they will stack to the assemblage value. The other sums the peak abundances that Leo used. In this case the assemblage value will differ from that included in last year’s report.

Table one: Species abundances in peak assemblage month

data$Year_Month<-paste(data$Year,data$Month)
peak_year_months<-paste(peak_month$Year,peak_month$Month)

data %>% filter(Year_Month %in% peak_year_months) %>%
group_by(Year,Species) %>% summarise(Sum=sum(value)) -> peak_species
  
peak_species %>% spread(Year,Sum,fill=0) %>% dt()

Checking that sums add up

peak_species %>% group_by(Year) %>% summarise(Sum=sum(Sum))

## # A tibble: 14 x 2
##     Year   Sum
##    <dbl> <dbl>
##  1  1999  8606
##  2  2000  8395
##  3  2001  4190
##  4  2002 10263
##  5  2007  8635
##  6  2008  6869
##  7  2010  2554
##  8  2011  4115
##  9  2012  8666
## 10  2013  8213
## 11  2014  7410
## 12  2015  3338
## 13  2016 10106
## 14  2017 14377

So, a stacked boxplot based on table one would work, but would be different to the results in the previous year’s report.

write.csv(peak_species,file="data/table1.csv",row.names = FALSE)

Table 2

In the alternative case I will use the method thar Leo did to obtain the peak species counts, which leads to the sum of the counts becoming the assemblage total. The hybrid version between table one and table two was used in the original report.

data %>% group_by(Year,Month,Species) %>% summarise(msum=sum(value)) %>% group_by(Year,Species) %>% summarise(Sum=max(msum)) -> species_peaks



species_peaks %>% spread(Year,Sum,fill=0) %>% dt()

write.csv(species_peaks,file="data/table2.csv",row.names=FALSE)

Now the totals will not be the same as the previous report, but they are consistent with the methodology of taling the peak abundance in any one month as a measure.

species_peaks %>% group_by(Year) %>% summarise(Sum=sum(Sum))

## # A tibble: 14 x 2
##     Year   Sum
##    <dbl> <dbl>
##  1  1999  9284
##  2  2000  8628
##  3  2001  5812
##  4  2002 10766
##  5  2007 10355
##  6  2008  8887
##  7  2010  2554
##  8  2011  4994
##  9  2012 13718
## 10  2013 11455
## 11  2014 10983
## 12  2015  4267
## 13  2016 15231
## 14  2017 21112

Conclusion

The “hybrid”" analysis presented in the previous report is potentially rather confusing. This is may be the result of the monitoring and data analysis being tied to protocols that were established before consideration of their relationship with the avaailable data. However it might be better to base analyses on either table one or table two as formed from the raw data.

Thames_analysis_2018

Duncan Golicher

1/4/2019