Peak assemblage

Methodology

The peak assemblage table was derived by applying a consistent algorithm to the pooled data. The abundances of each of the 19 species were summed over all sites for each month to produce an assemblage count for each winter month. For the dunlin observations in 2017/18 the maximum count, rather than the low water count, was used, as this was considered most likely to match the method used for the data obtained from previous years' surveys. For each winter season the month with the maximum assemblage value was identified, and the abundance of each species recorded in that month was taken as a measure of its individual contribution to the peak assemblage. This methodology is internally consistent, but it differs from the data tables produced in the 2017 report. In the previous analysis the peak species abundances referred to the maximum observed count across all the months of each season. A proportional contribution could not be calculated under that approach, as the sum of the species counts exceeded the total for the peak assemblage.
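The algorithm above can be sketched as follows. The counts here are invented for illustration only (the real analysis pools the survey data over all sites before this step):

```python
# Sketch of the peak-assemblage algorithm: counts[month][species] holds the
# abundance already summed over all sites for one winter season.
# All numbers are hypothetical, not the survey data.
counts = {
    "Nov": {"dunlin": 3000, "avocet": 200, "black-tailed godwit": 400},
    "Dec": {"dunlin": 5200, "avocet": 350, "black-tailed godwit": 600},
    "Jan": {"dunlin": 4100, "avocet": 300, "black-tailed godwit": 550},
}

# Assemblage total for each month.
assemblage = {month: sum(spp.values()) for month, spp in counts.items()}

# The month with the maximum assemblage total defines the peak.
peak_month = max(assemblage, key=assemblage.get)

# Each species' count in that month is its contribution to the peak
# assemblage, so the proportional contributions sum to exactly 1.
peak_counts = counts[peak_month]
proportions = {sp: n / assemblage[peak_month] for sp, n in peak_counts.items()}
```

Because every species contribution is taken from the same month, the proportions are well defined, which is the property the previous methodology lacked.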

Data

Counts data

Table of percent contribution to assemblage

Peak assemblage values

Barcharts

London Gateway

Top ten species

Excluding Dunlin

Top three species (key species)

The top three most abundant species are Dunlin, Black-tailed godwit and Avocet.

Thames estuary

Top ten species

Excluding Dunlin

Top three species (key species)

The top three most abundant species are Dunlin, Black-tailed godwit and Avocet.

Mean differences

Methodology

As there is no evidence of consistent trends, the variability between years can be treated as a set of independent observations. This does not imply that there is no underlying relationship between the total population of birds in consecutive years, simply that random variability due to movements of flocks and changes in the observability of the birds, in combination with stochastic population fluctuations, adequately explains the observed variability.

In this case the period becomes a factor with two levels (pre and post works). The data can be visualised as boxplots, and as means with confidence intervals calculated from the variability around the mean values.
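The tests reported below were run in R as Welch two-sample t-tests (the equivalent of t.test(Sum ~ Period)). As a check on the form of the test, the Welch statistic and its degrees of freedom can be computed directly; the yearly totals here are hypothetical, not the survey counts:

```python
# Welch two-sample t statistic computed from first principles.
# The data values are made up for illustration.
from statistics import mean, variance

pre = [6500, 7200, 6900, 7400, 7100, 7300, 6800]   # hypothetical yearly peaks
post = [9800, 10200, 11000, 10400, 10900, 10300]

m1, m2 = mean(pre), mean(post)
v1, v2 = variance(pre), variance(post)   # sample variances
n1, n2 = len(pre), len(post)

# Squared standard error of the difference; Welch does NOT pool variances.
se2 = v1 / n1 + v2 / n2
t_stat = (m1 - m2) / se2 ** 0.5

# Welch-Satterthwaite degrees of freedom (non-integer in general, which is
# why the R output shows values such as df = 5.4785).
df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
```

The non-integer degrees of freedom in the R output are a signature of this variance-weighted correction.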

Aggregation

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -1.2712, df = 5.4785, p-value = 0.255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9982.316  3260.602
## sample estimates:
##  mean in group Pre mean in group Post 
##           7073.143          10434.000

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works (p = 0.25).

Dunlin

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -0.39343, df = 9.2777, p-value = 0.7029
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4920.765  3457.051
## sample estimates:
##  mean in group Pre mean in group Post 
##           4913.143           5645.000

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works (p = 0.7).

Black tailed godwit

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -0.67993, df = 9.2893, p-value = 0.5131
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -993.3234  532.5234
## sample estimates:
##  mean in group Pre mean in group Post 
##              559.0              789.4

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works (p = 0.51).

Avocet

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -1.2736, df = 7.3299, p-value = 0.2417
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1815.0436   536.7578
## sample estimates:
##  mean in group Pre mean in group Post 
##           483.8571          1123.0000

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works (p = 0.24).

Bayesian t-test

Methodology

There is an issue regarding the interpretation of the result of such a test. Under null hypothesis significance testing (NHST) it is not possible to accept the null hypothesis of no difference; NHST simply fails to reject the null. The p-value represents the probability of obtaining the data, or data more extreme, given that the null hypothesis is in fact true. However, the precise point null hypothesis of exactly no difference between assemblage counts before and after the works is not a credible one: there must be some differences. The issue is whether any differences fall within acceptable bounds. NHST is therefore problematic when the decision rule involves weighing the evidence in favour of the null hypothesis, as is the case for the decision rule provided in the report.

"The initial target against which the success of the mitigation and compensation will be assessed shall be that the sites in combination support an assemblage of wintering waterfowl at low tide comprising, on a 5-year mean peak basis, at least 7900 birds made up of, in particular, avocet, dunlin and black-tailed godwit in similar proportions to those supported by North Mucking during the winters of 1999/2000 to 2002/2003 (considered in the context of the wider population trends)."

The alternative to NHST is to adopt a Bayesian approach to inference. Under this approach the 5-year means for peak abundances are not considered fixed quantities but are themselves treated as random variables with distributions. Bayes' formula provides a formal mechanism for assigning probabilities to unknown quantities of interest. In this case the difference between \(\mu_1\) and \(\mu_2\) (the mean peak assemblages before and after the works) is the unknown quantity.

Bayes' theorem states:

\(p(\theta | D) = \frac{p(D|\theta)\, p(\theta)} {p(D)}\), where \(\theta=(\mu_1, \mu_2, \sigma_1, \sigma_2, \nu)\).

So the posterior credibility of a combination of values for \((\mu_1, \mu_2, \sigma_1, \sigma_2, \nu)\) is the likelihood of that combination times its prior credibility, divided by the constant \(p(D)\). When the data are assumed to be independently sampled, the likelihood is the product across all the data values of the probability density of a t distribution. The prior is the product of the five independent parameter distributions. The constant \(p(D)\) is the marginal likelihood, obtained by integrating the product of the likelihood and the prior over the entire parameter space. This integral is difficult to compute analytically, a difficulty that limited the application of Bayesian methods before computational solutions using simulation became available. This limitation no longer exists: it is now computationally straightforward to fit the full Bayesian model using tools supplied through R (Plummer 2018). This allows the full posterior distributions of the parameter values to be obtained, leading to a richer and more informative analysis (Kruschke 2013).

Provided that uninformative prior probabilities for the parameters are used, applying Bayes' theorem in the context of a t-test will provide credible intervals for the differences between means (Edwards 1996). Although the estimates may be numerically very similar to those derived from the confidence intervals of a traditional t-test, the interpretation of the result now maps directly onto the required decision rule.

The traditional t-test gave a best estimate of 484 for the pre-works mean and 1123 for the post-works mean, a point estimate of 639 for the difference between the means.

The original target value of 7900 for the overall assemblage was derived from low water count data for the four winter periods 1999/2000 to 2002/2003. The pre-works data used in the t-test included some additional observations. These observations are helpful in establishing the range of variability for inference, so they have been included; if desired, they could be excluded and the analysis re-run without them.

Bayesian model fitting allows a formal evaluation of a decision rule based on the concept of the region of practical equivalence (ROPE). This is a region around the null value of no difference which encloses those values of the parameter deemed not importantly different from the null value for the practical purposes of the study.

In this case any increase in assemblage numbers, even if not statistically significant, is of no practical importance in evaluating whether Clause 10.5.4 has been met, so the ROPE can extend to the right almost indefinitely. The choice of a left-hand boundary for the ROPE has to be considered through careful evaluation of the available data. The target value was originally set at around 8000 birds, which is around 1000 higher than the first estimate of the pre-works mean. It therefore seems reasonable to set a ROPE lying between -1000 and 1000.

The Bayesian t-test is then run using the BEST package in R (Kruschke and Meredith 2018). The model used completely non-informative vague priors for the parameters of interest in order to avoid subjectivity. The resulting simulation provided the full posterior distribution for the difference between the two means.
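The post-processing of such a fit can be sketched as follows. The real analysis uses the MCMC draws produced by BEST in R; here synthetic normal draws stand in for those samples, so all numbers are purely illustrative:

```python
# Summarising posterior draws of the two means: the difference is itself a
# distribution, and the decision rule is the fraction of that distribution
# falling inside the ROPE of [-1000, 1000].
# The draws below are synthetic stand-ins for real MCMC samples.
import numpy as np

rng = np.random.default_rng(1)
mu_pre = rng.normal(480, 300, 10_000)    # hypothetical posterior draws
mu_post = rng.normal(1120, 400, 10_000)

diff = mu_post - mu_pre                  # posterior of the difference

# Central 95% credible interval for the difference between means.
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])

# Proportion of the posterior lying inside the ROPE.
in_rope = np.mean((diff > -1000) & (diff < 1000))
```

Unlike a confidence interval, `in_rope` is a direct probability statement about the parameter, which is what the decision rule requires.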

Aggregation

The figure shows the full posterior distribution for the difference between the two means, which is treated as a random variable following a t distribution. The interpretation of the figure in terms of a decision rule for the Bayesian t-test is clear when the ROPE is superimposed on the distribution. Around 80% of the posterior distribution for the difference between the two means lies within the ROPE. Although this implies a 20% chance that the value could still lie outside the ROPE, the analysis is based on limited data: intuitively, it would be impossible to decide with certainty that the criterion had been met from so few data points. The analysis formalises the strength of the currently available evidence. As more data become available, the probability that the criterion is met would be expected to rise, as Bayesian analysis allows posterior distributions to be updated with additional data.

Dunlin

The analysis shows that around 90% of the posterior distribution falls within the ROPE, providing strong evidence that the works have had no practical impact on overall dunlin numbers.

Black tailed godwit

In this case a small part (around 2%) of the ROPE actually falls below the posterior 95% highest density interval for the difference between the two means. The practical equivalence criterion is met to a very high degree of certainty, given the additional evidence that black-tailed godwit numbers have significantly increased over the period from 1998.

Avocet

The practical equivalence criterion is again met to a very high degree of certainty, given the additional evidence that avocet numbers have significantly increased over the period from 1998.

Differences in proportional abundance

Methodology

The target goal was also stated in terms of proportional abundance: "at least 7900 birds made up of, in particular, avocet, dunlin and black-tailed godwit in similar proportions to those supported by North Mucking during the winters of 1999/2000 to 2002/2003".

Inspection of the stacked bar charts and the raw data shows that the proportion of dunlin in the assemblage was higher between 1999 and 2002 than at present. As dunlin are small, common waders, a decrease in their proportional contribution would be interpreted as a positive effect rather than a negative one.

Changes in proportional abundance of dunlin.

Trend analysis for proportional abundance of dunlin using beta regression

In order to establish the significance of the change, generalised linear modelling based on the beta distribution provides the most robust approach, as proportions cannot be modelled with normally distributed errors. The betareg package in R allows this (Grün et al. 2012).
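The model itself was fitted in R with betareg (output below). This sketch only shows how the fitted mean model is read back on the proportion scale: with a logit link, the expected proportion in a given year is the inverse logit of the linear predictor. The coefficients are taken from the betareg output shown below; everything else is illustrative.

```python
# Reading back a beta regression mean model with a logit link.
# Coefficients come from the R betareg output for the yearly trend.
import math

intercept = 191.27253   # (Intercept), logit scale
slope = -0.09503        # Year coefficient, logit scale

def expected_proportion(year):
    """Inverse logit of the linear predictor for a given calendar year."""
    eta = intercept + slope * year
    return 1.0 / (1.0 + math.exp(-eta))

# The negative Year coefficient implies a declining expected proportion of
# dunlin in the assemblage over the study period.
p_1999 = expected_proportion(1999)
p_2018 = expected_proportion(2018)
```

The logit link guarantees the fitted proportions stay strictly between 0 and 1, which is why a normal-errors linear model is unsuitable here.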

Yearly trend
## 
## Call:
## betareg(formula = Proportion ~ Year, data = dunlin)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -1.5951 -1.0382  0.4062  0.5733  2.0217 
## 
## Coefficients (mean model with logit link):
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) 191.27253   76.45111   2.502   0.0124 *
## Year         -0.09503    0.03804  -2.498   0.0125 *
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)    4.449      1.615   2.756  0.00586 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 3.865 on 3 Df
## Pseudo R-squared: 0.4152
## Number of iterations: 2748 (BFGS) + 8 (Fisher scoring)

Beta regression provides evidence of a statistically significant (p = 0.012) reduction in the proportional contribution of dunlin to the species assemblage between 1999 and the present. However, the trend occurred prior to the works commencing.

Changes in species diversity

Although the criteria used to evaluate the impact of the works aimed to ensure a comparable mix of species abundances, the decline in relative abundance of dunlin and the increase in the relative contribution of other species may have increased species diversity. This is generally considered to be a positive outcome for conservation.

In order to evaluate changes in species diversity, a commonly used diversity index was calculated for the assemblage. Shannon's index is based on the proportional contributions of each species to the assemblage.

\(H=-\sum_{i=1}^{N} p_i \ln(p_i)\)

Where \(p_i\) is the proportional abundance of each species in an assemblage consisting of N species.

Shannon's index was calculated by transforming the table of counts into a matrix and applying the diversity function in the R package vegan (Oksanen et al. 2018).
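The calculation performed by vegan's diversity function can be sketched directly from the formula above; the counts below are illustrative only:

```python
# Shannon diversity H = -sum(p_i * ln(p_i)) over the proportional
# abundances of the species in the assemblage.
import math

def shannon(counts):
    """Shannon index from a list of species counts (zero counts skipped)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

# Even proportions maximise H: four equally abundant species give ln(4).
h_even = shannon([100, 100, 100, 100])

# A dunlin-dominated assemblage (hypothetical counts) has lower diversity,
# so a falling dunlin share tends to raise H.
h_skewed = shannon([5000, 300, 200, 100])
```

This makes the link to the dunlin result explicit: as the proportional dominance of one species falls, the index rises toward its maximum of \(\ln(N)\).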

Changes in mean Shannon’s index

Peak abundances

Methodology

The peak abundances table was derived by applying a consistent algorithm to the pooled data. The abundances of each of the 19 species were summed over all sites for each month. For the dunlin observations in 2017/18 the maximum count, rather than the low water count, was used, as this was considered most likely to match the method used for the data obtained from previous years' surveys. The maximum abundance of each species in any month within the winter season was taken as its peak abundance. The peak assemblage in this case was calculated as the sum of the peak abundances over all species. This produces a rather higher estimate of the peak assemblage than the previous methodology. In both cases the proportional abundances of the species are calculated in relation to this sum. The two methodologies have been included in order to assess the sensitivity of the conclusions to the method used to calculate the peak assemblage. The substantive conclusions are unaltered and are not sensitive to the choice of methodology, although some of the quantitative results differ slightly.
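The alternative algorithm can be sketched as follows (hypothetical counts again). Because each species' peak may fall in a different month, the sum of per-species peaks is at least as large as any single-month assemblage total, which is why this methodology yields the higher estimate:

```python
# Sketch of the per-species-peak methodology: counts[month][species] holds
# the abundance summed over all sites. All numbers are hypothetical.
counts = {
    "Nov": {"dunlin": 3000, "avocet": 200, "black-tailed godwit": 400},
    "Dec": {"dunlin": 5200, "avocet": 350, "black-tailed godwit": 600},
    "Jan": {"dunlin": 4100, "avocet": 500, "black-tailed godwit": 700},
}

species = counts["Nov"].keys()

# Per-species peak across months (peaks may fall in different months).
peak_abundance = {sp: max(m[sp] for m in counts.values()) for sp in species}

# Peak assemblage under this methodology: sum of the per-species peaks.
peak_assemblage = sum(peak_abundance.values())

# Proportional contributions are taken relative to this sum.
proportions = {sp: n / peak_assemblage for sp, n in peak_abundance.items()}
```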

Data

Counts data

Table of percent contribution to assemblage

Peak assemblage values

Barcharts

London Gateway

Top ten species

Excluding Dunlin

Top three species (key species)

The top three most abundant species are Dunlin, Black-tailed godwit and Avocet.

Thames estuary

All species

Excluding Dunlin

Top three species (key species)

The top three most abundant species are Dunlin, Black-tailed godwit and Avocet.

Mean differences

Methodology

As there is no evidence of consistent trends, the variability between years can be treated as a set of independent observations. This does not imply that there is no underlying relationship between the total population of birds in consecutive years, simply that random variability due to movements of flocks and changes in the observability of the birds, in combination with stochastic population fluctuations, adequately explains the observed variability.

In this case the period becomes a factor with two levels (pre and post works). The data can be visualised as boxplots, and as means with confidence intervals calculated from the variability around the mean values.

Aggregation

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -2.0741, df = 6.6863, p-value = 0.07862
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12819.7940    899.8416
## sample estimates:
##  mean in group Pre mean in group Post 
##           8040.857          14000.833

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works at the 5% level (p = 0.08).

Dunlin

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -0.20799, df = 10.996, p-value = 0.839
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4044.499  3346.118
## sample estimates:
##  mean in group Pre mean in group Post 
##           5009.143           5358.333

The t-test provides no evidence of a significant difference between the mean peak assemblages pre and post works (p = 0.84).

Black tailed godwit

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -3.2978, df = 6.9184, p-value = 0.01338
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3840.0651  -628.3635
## sample estimates:
##  mean in group Pre mean in group Post 
##           634.2857          2868.5000

In this case the t-test does provide evidence of a significant difference between the mean peak counts pre and post works (p = 0.01), with higher black-tailed godwit numbers in the post-works period.

Avocet

Boxplots and confidence intervals

T-test

## 
##  Welch Two Sample t-test
## 
## data:  Sum by Period
## t = -1.8297, df = 10.996, p-value = 0.09451
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1908.8066   175.8543
## sample estimates:
##  mean in group Pre mean in group Post 
##           801.8571          1668.3333

The t-test provides no evidence of a significant difference between the mean peak avocet counts pre and post works at the 5% level (p = 0.09).

Bayesian t-test

Methodology

There is an issue regarding the interpretation of the result of such a test. Under null hypothesis significance testing (NHST) it is not possible to accept the null hypothesis of no difference; NHST simply fails to reject the null. The p-value represents the probability of obtaining the data, or data more extreme, given that the null hypothesis is in fact true. However, the precise point null hypothesis of exactly no difference between assemblage counts before and after the works is not a credible one: there must be some differences. The issue is whether any differences fall within acceptable bounds. NHST is therefore problematic when the decision rule involves weighing the evidence in favour of the null hypothesis, as is the case for the decision rule provided in the report.

"The initial target against which the success of the mitigation and compensation will be assessed shall be that the sites in combination support an assemblage of wintering waterfowl at low tide comprising, on a 5-year mean peak basis, at least 7900 birds made up of, in particular, avocet, dunlin and black-tailed godwit in similar proportions to those supported by North Mucking during the winters of 1999/2000 to 2002/2003 (considered in the context of the wider population trends)."

The alternative to NHST is to adopt a Bayesian approach to inference. Under this approach the 5-year means for peak abundances are not considered fixed quantities but are themselves treated as random variables with distributions. Bayes' formula provides a formal mechanism for assigning probabilities to unknown quantities of interest. In this case the difference between \(\mu_1\) and \(\mu_2\) (the mean peak abundances before and after the works) is the unknown quantity.

Bayes' theorem states:

\(p(\theta | D) = \frac{p(D|\theta)\, p(\theta)} {p(D)}\), where \(\theta=(\mu_1, \mu_2, \sigma_1, \sigma_2, \nu)\).

So the posterior credibility of a combination of values for \((\mu_1, \mu_2, \sigma_1, \sigma_2, \nu)\) is the likelihood of that combination times its prior credibility, divided by the constant \(p(D)\). When the data are assumed to be independently sampled, the likelihood is the product across all the data values of the probability density of a t distribution. The prior is the product of the five independent parameter distributions. The constant \(p(D)\) is the marginal likelihood, obtained by integrating the product of the likelihood and the prior over the entire parameter space. This integral is difficult to compute analytically, a difficulty that limited the application of Bayesian methods before computational solutions using simulation became available. This limitation no longer exists: it is now computationally straightforward to fit the full Bayesian model using tools supplied through R (Plummer 2018). This allows the full posterior distributions of the parameter values to be obtained, leading to a richer and more informative analysis (Kruschke 2013).

Provided that uninformative prior probabilities for the parameters are used, applying Bayes' theorem in the context of a t-test will provide credible intervals for the differences between means (Edwards 1996). Although the estimates may be numerically very similar to those derived from the confidence intervals of a traditional t-test, the interpretation of the result now maps directly onto the required decision rule.

The traditional t-test gave a best estimate of 802 for the pre-works mean and 1668 for the post-works mean, a point estimate of 866 for the difference between the means.

The original target value of 7900 for the overall assemblage was derived from low water count data for the four winter periods 1999/2000 to 2002/2003. The pre-works data used in the t-test included some additional observations. These observations are helpful in establishing the range of variability for inference, so they have been included; if desired, they could be excluded and the analysis re-run without them.

Bayesian model fitting allows a formal evaluation of a decision rule based on the concept of the region of practical equivalence (ROPE). This is a region around the null value of no difference which encloses those values of the parameter deemed not importantly different from the null value for the practical purposes of the study.

In this case any increase in assemblage numbers, even if not statistically significant, is of no practical importance in evaluating whether Clause 10.5.4 has been met, so the ROPE can extend to the right almost indefinitely. The choice of a left-hand boundary for the ROPE has to be considered through careful evaluation of the available data. The target value was originally set at around 8000 birds, which is around 1000 higher than the first estimate of the pre-works mean. It therefore seems reasonable to set a ROPE lying between -1000 and 1000.

The Bayesian t-test is then run using the BEST package in R (Kruschke and Meredith 2018). The model used completely non-informative vague priors for the parameters of interest in order to avoid subjectivity. The resulting simulation provided the full posterior distribution for the difference between the two means.

Aggregation

The figure shows the full posterior distribution for the difference between the two means, which is treated as a random variable following a t distribution. The interpretation of the figure in terms of a decision rule for the Bayesian t-test is clear when the ROPE is superimposed on the distribution. Around 80% of the posterior distribution for the difference between the two means lies within the ROPE. Although this implies a 20% chance that the value could still lie outside the ROPE, the analysis is based on limited data: intuitively, it would be impossible to decide with certainty that the criterion had been met from so few data points. The analysis formalises the strength of the currently available evidence. As more data become available, the probability that the criterion is met would be expected to rise, as Bayesian analysis allows posterior distributions to be updated with additional data.

Dunlin

The analysis shows that around 90% of the posterior distribution falls within the ROPE, providing strong evidence that the works have had no practical impact on overall dunlin numbers.

Black tailed godwit

In this case a small part (around 2%) of the ROPE actually falls below the posterior 95% highest density interval for the difference between the two means. The practical equivalence criterion is met to a very high degree of certainty, given the additional evidence that black-tailed godwit numbers have significantly increased over the period from 1998.

Avocet

The practical equivalence criterion is again met to a very high degree of certainty, given the additional evidence that avocet numbers have significantly increased over the period from 1998.

Differences in proportional abundance

Explanation

The target goal was also stated in terms of proportional abundance: "at least 7900 birds made up of, in particular, avocet, dunlin and black-tailed godwit in similar proportions to those supported by North Mucking during the winters of 1999/2000 to 2002/2003".

Inspection of the stacked bar charts and the raw data shows that the proportion of dunlin in the assemblage was higher between 1999 and 2002 than at present. As dunlin are small, common waders, a decrease in their proportional contribution would be interpreted as a positive effect rather than a negative one.

Changes in proportional abundance of dunlin.

Trend analysis for proportional abundance of dunlin using beta regression

In order to establish the significance of the change, generalised linear modelling based on the beta distribution provides the most robust approach, as proportions cannot be modelled with normally distributed errors. The betareg package in R allows this (Grün et al. 2012).

Yearly trend
## 
## Call:
## betareg(formula = Proportion ~ Year, data = dunlin)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.1982 -0.6117  0.3324  1.0148  1.4687 
## 
## Coefficients (mean model with logit link):
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 219.59870   60.18978   3.648 0.000264 ***
## Year         -0.10931    0.02996  -3.649 0.000263 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)    8.223      3.075   2.674  0.00749 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  6.36 on 3 Df
## Pseudo R-squared: 0.5438
## Number of iterations: 2976 (BFGS) + 8 (Fisher scoring)

Beta regression provides evidence of a statistically significant (p < 0.001) reduction in the proportional contribution of dunlin to the species assemblage between 1999 and the present. However, the trend occurred prior to the works commencing.

Changes in species diversity

Although the criteria used to evaluate the impact of the works aimed to ensure a comparable mix of species abundances, the decline in relative abundance of dunlin and the increase in the relative contribution of other species may have increased species diversity. This is generally considered to be a positive outcome for conservation.

In order to evaluate changes in species diversity, a commonly used diversity index was calculated for the assemblage. Shannon's index is based on the proportional contributions of each species to the assemblage.

\(H=-\sum_{i=1}^{N} p_i \ln(p_i)\)

Where \(p_i\) is the proportional abundance of each species in an assemblage consisting of N species.

Shannon's index was calculated by transforming the table of counts into a matrix and applying the diversity function in the R package vegan (Oksanen et al. 2018).

Changes in mean Shannon’s index

References

Edwards, D. 1996. Comment: The first data analysis should be journalistic. ECOLOGICAL APPLICATIONS 6:1090–1094.

Grün, B., I. Kosmidis, and A. Zeileis. 2012. Extended beta regression in R: Shaken, stirred, mixed, and partitioned. Journal of Statistical Software 48:1–25.

Kruschke, J. K. 2013. Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General 142:573–588.

Kruschke, J. K., and M. Meredith. 2018. BEST: Bayesian estimation supersedes the t-test.

Oksanen, J., F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, R. B. O’Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, E. Szoecs, and H. Wagner. 2018. vegan: Community ecology package.

Plummer, M. 2018. rjags: Bayesian graphical models using MCMC.