The ultimate goal of applied predictive ecology is to provide guidance for the management of populations of species affected by changes in their environment. Many human activities alter the amount and availability of food sources. The speed of food assimilation through foraging determines energy balances and may ultimately be a key determinant of population size through its impact on survival and breeding success. The functional response curve represents the relationship between food density and feeding rate (Abrams 1982; Triplet et al. 1999; Yates et al. 2000; Stillman and Simmons 2006). Parameters derived from field measurements of functional responses can be used in a variety of population-level models in order to make predictions (Stillman 2008). However, field observations frequently provide an incomplete and potentially biased data set for model parametrisation. Behavioural determinants of foraging rates are only partially observable and may be conditional on the circumstances under which observations are made. Overly precise estimates of functional response curves that do not account for all the uncertainty involved in model fitting may lead to advice that risks species loss on one hand, or overly precautionary measures on the other.
The original data are fully described and were first analysed by Stillman et al. (2008). The stated aims of the paper were to …
Although the paper successfully met some of these aims, the parametrisation of the models proved challenging using the conventional techniques available at the time of publication. In particular, the analysis using traditional statistical significance as a criterion failed to distinguish between models with very different interpretations. Thus the paper did not fully meet aim 4.
The aim of the following analysis is to suggest an improved method for parametrising functional response curves using a modern Bayesian approach. This is compared with an alternative approach using quantile regression. Both methods aim to provide more robust and flexible means of estimating key parameters of functional response equations from observational data that challenge conventional model fitting.
The data in the paper were collected by observing the feeding behaviour of corn buntings presented with seven densities of wheat seed on experimental platforms. The densities chosen aimed to represent those commonly found in stubble fields. However, seeds were likely to be more visible on feeding platforms than they would be in the natural system.
The observations on the selected birds, which were videotaped, consisted of the observed feeding rate (seeds consumed s⁻¹), the handling time, the proportion of time spent vigilant, and the time spent searching over the time taken to consume five seeds.
The observations on bird behaviour therefore allowed the researchers to partition feeding into distinct activities, including vigilance during feeding. However, to fulfil aim 4 of the paper, equations were fitted to the data that were derived from assumptions regarding the way activities are partitioned.
The starting point for all research into functional responses is Holling's (1959) disc equation.
\(R=\frac{aD}{1+aDH}\)
Where \(R\) = feeding rate (seeds s⁻¹), \(a\) = search rate, \(D\) = seed density and \(H\) = handling time per seed.
Holling's model describes a rectangular hyperbola with a peak feeding rate of 1/H as the asymptotic value. Under this model pure handling time cannot be directly observed and must be inferred from the data. However, as the curve can effectively flatten out within the range of observations, the estimate of H may be very close to the values found in observational studies.
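As a quick illustration, the disc equation can be plotted against its asymptote using hypothetical parameter values:

a<-40                                          # hypothetical search rate (cm² s⁻¹)
H<-2                                           # hypothetical handling time (s per seed)
D<-seq(0,0.4,length.out=100)                   # seed density (seeds per cm²)
R<-a*D/(1+a*D*H)                               # predicted feeding rate
plot(D,R,type="l",xlab="Seed density",ylab="Feeding rate")
abline(h=1/H,lty=2)                            # asymptotic peak rate 1/H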
An alternative model was developed in the paper, which includes a term for vigilance behaviour. Vigilance and/or competitive squabbling for food affects birds' feeding rates mainly at high densities of food. The model therefore partitions the handling time.
\(R=\frac{(1-v)aD}{1+aDH}\)
Where v = the proportion of time spent vigilant. This is model 2 in the paper and will be referred to here as the vigilance model. Three other models were also developed, but these were based on assumptions regarding conditional behaviour switches and so did not lead to a single equation. They will not be considered here.
library(rjags)
library(tidyverse)
library(ggmcmc)
library(polspline)
library(propagate)
d<-read.csv("/home/aqm/course/data/buntings.csv")
The densities in the original paper were expressed in seeds per m². These have been converted to seeds per cm² for ease of interpretation, so the search rate here is expressed in cm² s⁻¹.
d$densitycm2<-d$density/10000
xlabel <-expression("Density of seeds per cm"^{2})
ylabel<-"Feeding rate seeds per second"
theme_set(theme_bw())
g0<-ggplot(d,aes(x=densitycm2,y=rate)) + geom_point() + xlab(xlabel) + ylab(ylabel)
fig1<-g0 + stat_summary(aes(y = rate,group=1), fun.y=mean, colour="red", geom="line",group=1)
fig1 <- fig1+stat_summary(aes(y = rate,group=1),fun.data=mean_cl_normal,geom="errorbar", group=1,col="red")
fig1
Figure 1 demonstrates the challenges involved in fitting a model to the data. The independent variable actually consists of a discrete ordered factor, and the variability around the means is notable.
There is evidence of a monotonic increase in feeding rate with density: a Spearman's rank correlation is highly significant (rho = 0.44, p < 0.001). However, the feeding rate shows slight evidence of a decrease at very high seed densities, although a simple t-test between the two highest factor levels is not significant at the 0.05 level.
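For reference, these tests can be reproduced along the following lines (a sketch assuming the data frame and column names used above):

cor.test(d$rate, d$densitycm2, method="spearman")       # monotonic association
top2<-sort(unique(d$density), decreasing=TRUE)[1:2]     # two highest density levels
t.test(rate~factor(density), data=subset(d, density %in% top2))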
In Stillman's paper the lines of best fit to Holling's disc equation, and to all the alternative equations, were established by minimising the least squares deviation. This finds the best fit through the centre of the cloud of points for each treatment level, as shown in figure 1. In order to fit the line using non-linear least squares optimisation some reasonable starting values for the parameters have to be supplied. However, these starting values only affect the final fit through aiding effective convergence to a solution; in other respects they do not influence the estimated parameters.
HDmod<-nls(rate~a*densitycm2/(1+a*densitycm2*H),data = d,start = list(a =1,H=2))
A summary of the model gives the estimated parameters and their standard errors.
summary(HDmod)
##
## Formula: rate ~ a * densitycm2/(1 + a * densitycm2 * H)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 39.6636 8.6245 4.599 9.22e-06 ***
## H 1.8419 0.0638 28.868 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1393 on 144 degrees of freedom
##
## Number of iterations to convergence: 10
## Achieved convergence tolerance: 2.286e-06
Likelihood profiling leads to confidence intervals for the parameters of the Holling model.
confint(HDmod)
## 2.5% 97.5%
## a 25.930863 66.949394
## H 1.713496 1.976979
It would be invalid to simply use the standard errors for the parameters to produce confidence bands for the whole model. In order to plot the line of best fit with confidence bands the variability needs to be propagated through the model by simulation. This can be achieved using the propagate package in R (Spiess 2018).
require(propagate)
newdata<-data.frame(densitycm2=seq(0,0.4,length=100))
pred_model <- predictNLS(HDmod, newdata=newdata,nsim = 10000)
conf_model <- pred_model$summary
newdata<-data.frame(newdata,conf_model)
fig2<-g0+geom_smooth(method="nls",formula=y~a*x/(1+a*x*H),method.args=list(start = c(a = 1,H=2)), se=FALSE,col="black") + geom_line(data=newdata,aes(x=densitycm2,y=Prop.2.5.),col="blue") + geom_line(data=newdata,aes(x=densitycm2,y=Prop.97.5.),col="blue")
fig2
There is no direct means of fitting the vigilance model using least squares optimisation, as the shape of Holling's model and the shape of the vigilance model are identical (a numerical demonstration follows below). The vigilance parameter therefore cannot be identified, although it can be included in the model as a fixed offset term. This was the approach taken in the paper. The authors stated that "While both fitted models generated the same relationship between feeding rate and seed density the goodness-of-fit of model 2 (vigilance) was slightly poorer as it contained one extra parameter….The non-linear regression could not calculate best-fitting values for the proportion of time spent vigilant, implying that the observed functional response was not dependent upon vigilance time.".
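The identifiability problem can be seen algebraically: any parameter set (a, H, v) traces exactly the same curve as a plain Holling model with a' = (1 - v)a and H' = H/(1 - v). A quick numerical check with illustrative values:

a<-40; H<-1.4; v<-0.25                        # illustrative parameter values
a2<-(1-v)*a; H2<-H/(1-v)                      # equivalent plain Holling parameters
D<-seq(0,0.4,length.out=50)
all.equal((1-v)*a*D/(1+a*D*H), a2*D/(1+a2*D*H2))   # TRUE: curves are identical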
This is clearly an unsatisfactory result. Vigilance time was actually measured as a component of the study. The traditional approach of discriminating between models on goodness of fit was therefore leading the authors towards an ecologically implausible conclusion.
An inspection of either figure 1 or figure 2 provides an explanation for the difficulty that arose. Model fitting has assumed that the variability around the fit is the consequence of random error. In statistical terms the two equations being used are:
\(R=\frac{aD}{1+aDH} + \epsilon\) where \(\epsilon \sim N(0,\sigma^{2})\)
\(R=\frac{(1-v)aD}{1+aDH} + \epsilon\) where \(\epsilon \sim N(0,\sigma^{2})\)
There are serious problems in accepting the assumption of independent, identically distributed normal errors in the case of functional responses. From a purely statistical viewpoint, symmetrically distributed errors lead to predictions of negative feeding rates at low densities, so fitting by least squares is at best only appropriate at high food densities. From an ecological viewpoint, 1/handling time is a measure of the peak rate of food consumption. The asymptotic feeding rate of a functional response fitted through the centre of the point clouds actually falls below the peak consumption measured in the field. Either the measurements are erroneous and attributable to observer error, or the model is wrong. In this case vigilance was explicitly measured as a component of the behaviour during feeding. Therefore Holling's equation should be fitted to the top of the point cloud (i.e. birds feeding without spending time on vigilance) rather than the centre.
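The statistical point about low densities can be sketched numerically from the fitted nls model: using the estimated parameters and residual standard error, the normal-error assumption implies an appreciable probability of an impossible negative feeding rate at a low (illustrative) density.

a_hat<-coef(HDmod)["a"]
H_hat<-coef(HDmod)["H"]
mu_low<-a_hat*0.005/(1+a_hat*0.005*H_hat)       # expected rate at an illustrative low density
pnorm(0, mean=mu_low, sd=summary(HDmod)$sigma)  # implied P(feeding rate < 0)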
Before considering how a Bayesian model can handle this issue, one practical, though less elegant, solution would be to use non-linear quantile regression. This allows relationships to be inferred from the edges of scatter plots (Scharf et al. 1998).
The quantreg package in R (Koenker 2018) can be used within the framework provided by non-linear least squares optimisation to find the best-fitting model to the 10% and 90% quantiles of the cloud of points. This effectively assumes that around 10% of the error is attributable to observer error and the rest can be considered a form of "process error", i.e. natural variability in the process under observation. The approach shares some of the statistical weaknesses of ordinary least squares regression in this particular case, but may lead to a more satisfactory result from an ecological standpoint.
library(quantreg)
QuantMod90<-nlrq(rate~a*densitycm2/(1+a*densitycm2*H),data = d,start = list(a =1,H=2),tau=0.9)
QuantMod10<-nlrq(rate~a*densitycm2/(1+a*densitycm2*H),data = d,start = list(a =1,H=2),tau=0.1)
Q90 <- predict(QuantMod90, newdata = newdata)
Q10 <- predict(QuantMod10, newdata = newdata)
newdata<-data.frame(newdata,Q90,Q10)
fig3<-fig2 + geom_line(data=newdata,aes(x=densitycm2,y=Q90),col="red")
fig3<-fig3 + geom_line(data=newdata,aes(x=densitycm2,y=Q10),col="red")
fig3
If handling time limits the maximum rate at which seed can be consumed, then the estimate based on the upper 10% of the data should be closer to the true handling time. The summaries of these two models should therefore allow vigilance to be estimated without prior knowledge.
summary(QuantMod90)
##
## Call: nlrq(formula = rate ~ a * densitycm2/(1 + a * densitycm2 * H),
## data = d, start = list(a = 1, H = 2), tau = 0.9, control = list(
## maxiter = 100, k = 2, InitialStepSize = 1, big = 1e+20,
## eps = 1e-07, beta = 0.97), trace = FALSE)
##
## tau: [1] 0.9
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## a 81.90787 11.38520 7.19424 0.00000
## H 1.39180 0.04490 30.99527 0.00000
summary(QuantMod10)
##
## Call: nlrq(formula = rate ~ a * densitycm2/(1 + a * densitycm2 * H),
## data = d, start = list(a = 1, H = 2), tau = 0.1, control = list(
## maxiter = 100, k = 2, InitialStepSize = 1, big = 1e+20,
## eps = 1e-07, beta = 0.97), trace = FALSE)
##
## tau: [1] 0.1
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## a 13.33371 6.46769 2.06159 0.04105
## H 2.75864 0.29180 9.45389 0.00000
The upper limit (pure handling time) is estimated as 1.39 (se = 0.045).
The estimate from the lower quantile, which may include time spent vigilant, is 2.76 (se = 0.29).
The proportion of time spent vigilant could now be estimated as the difference between the two handling time estimates divided by the larger value. As the uncertainty around these values has also been provided by the quantile regression in the form of standard errors, a Monte Carlo procedure can be used to find confidence intervals for vigilance by simulating from the distributions and taking the percentiles of the result.
## Handling time estimates and standard errors taken from the quantile regression summaries above
quantile((rnorm(10000,2.76,0.29)-rnorm(10000,1.39,0.045))/rnorm(10000,2.76,0.29),c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 0.2249023 0.4550318 0.7295255
This is within the range of the measured values.
Quantile regression therefore produces a straightforward, ecologically plausible description of the data in this specific context. Variability in feeding rate is attributed to a combination of the intrinsic Holling functional response and time spent vigilant, with an arbitrary allowance for observer error. However, the solution lacks statistical elegance. The model fit still uses least squares, which can be inappropriate at low densities. Quantifying uncertainty in the distribution of vigilance time requires a bespoke Monte Carlo simulation. The approach works within this specific context but would be challenging to extend to other data sets.
A fully Bayesian approach to model fitting involves adopting a different perspective on the problem. Bayesian analysis aims to provide credible intervals for unknown parameters by combining prior distributions for the parameters with the data. The posterior distribution of the parameters thus takes into account the influence of both the data and any prior knowledge used when fitting the model. To prevent accusations of subjectivity in the choice of priors (e.g. Edwards 1996), many contemporary Bayesian analyses choose deliberately uninformative priors. In these cases there may be little practical difference between a Bayesian credible interval and a frequentist confidence interval, apart from interpretation. However, mildly informative priors derived from documented evidence (additional data) avoid accusations of including subjective prior beliefs in the modelling process. Appropriate empirical data on vigilance time exist in this case.
Bayesian model fitting using computer-intensive Markov chain Monte Carlo (MCMC) methods is extremely flexible, allowing a wide range of distributions to be chosen for both the likelihood and the priors. Given this flexibility, model building involves stating and justifying all sources of uncertainty explicitly. The influence of the data on the model can be investigated by comparing the prior distributions of the parameters with the posterior distributions that result from combining priors and data.
In this case, model 2 (the vigilance model) is the only model that needs to be considered: if vigilance is effectively zero the model reduces to Holling's model. However, a wide range of parametrisations of the model may be obtained by varying the priors.
The bespoke Bayesian model has been programmed in JAGS using the following assumptions: feeding rates are strictly positive, so the likelihood is a gamma distribution; the priors for a and H are gamma distributions specified through a prior mean and variance; and the prior for v is uniform between a lower bound and 1.
The model itself is quite simple to define in the JAGS language. Running the Gibbs sampler and analysing the output is, however, more demanding than fitting a model using standard techniques in R.
vig_mod<-"
model{
#Likelihood
for( i in 1:n)
{
y[i] ~ dgamma( ((mu[i]^2)/sig), (mu[i]/sig)) ## Reparameterised gamma: mean mu[i], variance sig
mu[i]<- (1-v)*a*x[i]/(1+a*x[i]*H) ## The non-linear (vigilance) model to be fitted
}
#priors
# The user must provide prior estimates of the expected values of a (a_mu)
# and H (H_mu), together with variances representing the uncertainty around
# these values (a_sig and H_sig).
a~dgamma( ((a_mu^2)/a_sig), (a_mu/a_sig))
H~dgamma( ((H_mu^2)/H_sig), (H_mu/H_sig))
v ~ dunif(vmin,1)
## Variance hyperparameter for the gamma likelihood.
sig~dunif(0,10)
}"
a_mu<-20    # prior mean for the search rate a
a_sig<-10   # prior variance for a
H_mu<-2     # prior mean for the handling time H
H_sig<-2    # prior variance for H
v<-0        # lower bound (vmin) of the uniform prior on vigilance
# n = 1 passes only the first observation to the sampler, so the resulting
# draws are effectively samples from the prior distributions
prior_dat<-list(
n=1,
x=d$densitycm2,
y=d$rate,
a_mu=a_mu,
a_sig=a_sig,
H_mu=H_mu,
H_sig=H_sig,
vmin=v)
dat<-list(
n=length(d$densitycm2),
x=d$densitycm2,
y=d$rate,
a_mu=a_mu,
a_sig=a_sig,
H_mu=H_mu,
H_sig=H_sig,
vmin=v)
prior_mod <- jags.model(textConnection(vig_mod), data = prior_dat, n.chains = 1)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 4
## Total graph size: 324
##
## Initializing model
update(prior_mod, 10000)
prior_mod_sim <- coda.samples(model = prior_mod, variable.names = c("a","H","v"), n.iter = 10000)
mod <- jags.model(textConnection(vig_mod), data = dat, n.chains = 1)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 146
## Unobserved stochastic nodes: 4
## Total graph size: 801
##
## Initializing model
update(mod, 10000)
mod_sim <- coda.samples(model = mod, variable.names = c("a","H","v"), n.iter = 10000)
plot(prior_mod_sim)
ms <-ggs(prior_mod_sim)
ms %>% spread(Parameter, value) -> ms
x<-seq(0,0.4,length=100)
## Credible band for the fitted curve calculated from the MCMC draws
## (the vigilance term is not applied here)
pred<-function(x=x,mt=ms){
pred<-mt$a*x/(1+mt$a*x*mt$H)
return(list(quantile(pred,c(0.025,0.5,0.975))))
}
pred_mod<-data.frame(x=x,do.call("rbind",sapply(x,pred)))
names(pred_mod)<-c("x","lwr","mid","upr")
fig5<-g0+geom_point()+ geom_line(data=pred_mod,aes(x=x,y=lwr),col="blue") + geom_line(data=pred_mod,aes(x=x,y=upr),col="red") + geom_line(data=pred_mod,aes(x=x,y=mid),col="black")
fig5
summary(mod_sim)
##
## Iterations = 11001:21000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## H 1.64553 0.12007 0.0012007 0.006125
## a 27.36605 3.12982 0.0312982 0.051560
## v 0.06214 0.05648 0.0005648 0.003131
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## H 1.360894 1.57999 1.66623 1.73073 1.8291
## a 21.709034 25.14207 27.26747 29.35042 34.1506
## v 0.001576 0.01867 0.04555 0.09008 0.2053
plot(mod_sim)
ms <-ggs(mod_sim)
ms %>% spread(Parameter, value) -> ms
x<-seq(0,0.4,length=100)
## As above: credible band calculated from the posterior draws
pred<-function(x=x,mt=ms){
pred<-mt$a*x/(1+mt$a*x*mt$H)
return(list(quantile(pred,c(0.025,0.5,0.975))))
}
pred_mod<-data.frame(x=x,do.call("rbind",sapply(x,pred)))
names(pred_mod)<-c("x","lwr","mid","upr")
fig7<-g0+geom_point()+ geom_line(data=pred_mod,aes(x=x,y=lwr),col="blue") + geom_line(data=pred_mod,aes(x=x,y=upr),col="red") + geom_line(data=pred_mod,aes(x=x,y=mid),col="black")
fig7
The key message from figure 7 is that some prior assumption of a non-zero vigilance term is needed if the estimated handling time is to be set at a realistically low level (i.e. a potentially higher predicted feeding rate at the asymptote). The Bayesian model, like the more conventional least squares fit, still assigns unexplained variability to an error around the mean values. In this case the error follows a gamma distribution, which leads to asymmetric credible bands. However, a key element of the system being modelled needs to be included through a deliberately informative prior. The advantage of Bayesian model fitting over quantile regression is that this is made explicit, and uncertainty regarding the vigilance term is also returned through the posterior.
a_mu<-20
a_sig<-10
H_mu<-2
H_sig<-2
v<-0.2      # informative lower bound for the vigilance prior
dat$vmin<-v
mod <- jags.model(textConnection(vig_mod), data = dat, n.chains = 1)
update(mod, 10000)
mod_sim <- coda.samples(model = mod, variable.names = c("a","H","v"), n.iter = 10000)
summary(mod_sim)
##
## Iterations = 11001:21000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## H 1.3131 0.07786 0.0007786 0.003463
## a 29.2936 3.10480 0.0310480 0.051632
## v 0.2378 0.03420 0.0003420 0.001539
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## H 1.140 1.2664 1.3227 1.3673 1.4418
## a 23.646 27.1700 29.1155 31.2783 35.7241
## v 0.201 0.2113 0.2276 0.2547 0.3256
The aim of a Bayesian analysis with informative priors is to provide a justifiable model based on the assumptions made. All models are wrong, but some are useful. There is therefore no "best model" to be chosen on the grounds of fit to the data if the data themselves do not capture all aspects of the system. That is the case here, as vigilance is not explicitly included in the data used for estimating the parameters. In order to avoid accusations of subjectivity, a Bayesian analysis should include some modifications to the priors in order to investigate sensitivity to the assumptions. In this case major modifications can be made to the prior distributions of either a or H without substantially changing the eventual model. However, as would be expected, the model is highly sensitive to the assumptions regarding vigilance.
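Such a sensitivity check can be scripted by refitting the model over a small grid of prior settings and comparing the posterior summaries. A minimal sketch reusing vig_mod and dat from above (the grid of vigilance bounds is illustrative):

sens<-lapply(c(0, 0.1, 0.2), function(vmin){   # illustrative grid of vigilance bounds
  dat$vmin<-vmin
  m<-jags.model(textConnection(vig_mod), data=dat, n.chains=1, quiet=TRUE)
  update(m, 5000)
  s<-coda.samples(m, variable.names=c("a","H","v"), n.iter=5000)
  apply(as.matrix(s), 2, median)               # posterior medians for each setting
})
do.call(rbind, sens)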
The code can be converted into a shiny app, with sliders used to set the parameters, so that a range of options can be quickly investigated and visualised; a minimal sketch follows the link below.
http://r.bournemouth.ac.uk:3838/jags_functional_response/
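A minimal sketch of such an app (the layout is hypothetical, and it simply redraws the deterministic vigilance curve for slider-chosen parameter values rather than re-running the MCMC):

library(shiny)
ui<-fluidPage(
  sliderInput("a","Search rate a",min=1,max=100,value=30),
  sliderInput("H","Handling time H",min=0.5,max=4,value=1.5),
  sliderInput("v","Proportion vigilant v",min=0,max=0.9,value=0.2),
  plotOutput("curve")
)
server<-function(input,output){
  output$curve<-renderPlot({
    D<-seq(0,0.4,length.out=100)
    R<-(1-input$v)*input$a*D/(1+input$a*D*input$H)   # vigilance functional response
    plot(D,R,type="l",xlab="Seed density (seeds per cm2)",ylab="Feeding rate (seeds per s)")
  })
}
shinyApp(ui,server)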
Abrams, P. A., 1982. Functional responses of optimal foragers. Am Nat, 120 (3), 382–390.
Edwards, D., 1996. Comment: The first data analysis should be journalistic. Ecological Applications, 6 (4), 1090–1094.
Holling, C. S., 1959. Some characteristics of simple types of predation and parasitism. The Canadian Entomologist [online], 91 (7), 385–398. Available from: http://www.journals.cambridge.org/abstract_S0008347X00072692.
Koenker, R., 2018. Quantreg: Quantile regression [online]. Available from: https://CRAN.R-project.org/package=quantreg.
Scharf, F. S., Juanes, F., and Sutherland, M., 1998. Inferring ecological relationships from the edges of scatter diagrams. Ecology, 79 (2), 448–460.
Spiess, A.-N., 2018. Propagate: Propagation of uncertainty [online]. Available from: https://CRAN.R-project.org/package=propagate.
Stillman, R. A., 2008. MORPH - an individual-based model to predict the effect of environmental change on foraging animal populations. Ecological Modelling [online], 216 (3-4), 265–276. Available from: http://linkinghub.elsevier.com/retrieve/pii/S0304380008002020.
Stillman, R. A. and Simmons, V. L., 2006. Predicting the functional response of a farmland bird. Functional Ecology [online], 20 (4), 723–730. Available from: http://doi.wiley.com/10.1111/j.1365-2435.2006.01155.x.
Stillman, R. A., Smart, S. L., and Norris, K. J., 2008. Measuring the functional responses of farmland birds: An example for a declining seed-feeding bunting. Journal of Animal Ecology, 77 (4), 687–695.
Triplet, P., Stillman, R. A., and Goss-Custard, J. D., 1999. Prey abundance and the strength of interference in a foraging shorebird. Journal of Animal Ecology, 68 (2), 254–265.
Yates, M. G., Stillman, R. A., and Goss-Custard, J. D., 2000. Contrasting interference functions and foraging dispersion in two species of shorebird (Charadrii). Journal of Animal Ecology, 69 (2), 314–322.