Bayesian credible intervals for diversity indices

Background

The ideas behind the code below occurred to me way back in 2004. At the time I was researching Mexican forest biodiversity. My PhD had focussed on individual based models of forest dynamics. Although I needed to find parameters for IBMs using statistical analysis I had not really based the work as a whole on inferential statistics. At one point I had used some Bayesian networks within my PhD as a tool for understanding probabilistic decision making. So the basics of Bayes theorem were familiar.

When I started collaborating with other researchers I began to think more about the meaning of statistical significance. I realised that I had been looking at research problems from a rather Bayesian perspective. I didn’t fully understand all the nuances and differences between frequentist and Bayesian inference . I just found myself attaching more importance to confidence intervals than p-values. Specifically, I couldn’t realy see the point of most null hypothesis testing (Johnson 1999). Arguments made by the likes of Ellison (2004) made a lot of sense to me. So I tried hard to get my head around some Bayesian theory. As I didn’t have a mathematical background, a great deal of this was impenetrable to me. However Monte Carlo Markov Chains (MCMC) did make a lot of sense. Monte Carlo techniques are used all the time by modellers in order to include uncertainty regarding model parameters. Many of the aspects involved in MCMC Bayesian model fitting were already reasonably familiar. So, I experimented a bit with MCMC. At the time WInBugs was the main tool for model fitting, which was time consuming and rather awkward to use from R, It’s now easy using JAGS (Plummer 2018)

The issue of unobserved species in a sample is a constant cause of concern when analysing available data on tropical species diversity. A project I was working on had stated in the methodology provided to the funding body that it would use results from a program called EstimateS (Colwell and Coddington (1994)) to reduce bias when studying biodiversity patterns. This program used a series of “non-parametric” estimators of species richness to estimate the “true” species number from a sample. As the term “non-parametric” does not necessarily imply assumption free inference, I was rather concerned by this, particularly as the program was not doing anything useful. I looked into the assumptions of some of these estimators. I found that most of them were based on a uniform underlying relative abundance distribution. This made no sense to me. Uniform abundances would never be found in any ecological community (O’Hara {2005}). The estimators were all just adding an arbitrary number to observed species richness based on the number of observed singletons. At the same time Hubbell (2001) had presented an extremely influential book that (to cut a very long story short) had posited a semi-mechanistic explanation for the fact that relative species abundances seemed to almost ubiquitously follow a ·J shaped" log series. Hubbell based his model on neutral theory. The log series was first developed by Fisher (Fisher et al. 1943). Fisher’s original paper derived the log series as a distribution that explicitly applied to a sample, but not to the underlying population. It seemed odd to me that Hubbell could use the same distribution in the context of a meta-community. A much more reasonable model for the underlying species pool had been developed by Preston (1962).

So, while reading all this literature I noticed that, from a Bayesian perspective, the two models were actually based on the same thing. Fisher had used a Gamma distribution as the prior for a set of poisson distributions for the sampled count of species abundances. Preston didn’t take into account any stochasticity in the sampling, so he had used an arbitrary “veil line” for unobserved species. However if he had taken stochastic sampling into account then a joint distribution with an underlying log-normal for mean species abundances and poisson sampling for the observed values would not be distinguishable from a model with a gamma. Researchers such as Dewdney (1998) trying to evaluate whether a log-normal or log-series are the “best fit” to the data seemed to be chasing a red herring . Preston treated small samples as partial representations of the population without considering stochastic variability. SO, in essence both of these great mathematical ecologists were right. They were simply looking at the same effect at different scales. Hence the title of my paper (Golicher et al. 2006).

Before the paper was published I had already changed my focus towards spatial aspects of patterns of tropical tree diversity. Estimating species richness without a spatial context seemed a bit futile. I had first intended to write up some of my thoughts as a brief paper that just pointed out the similarities between the models. As I had limited mathematics developing a formal proof was not possible. In the end I just presented a Bayesian analysis of a simple data set as a practical example of how Bayesian techniques could be applied. I then forgot all about it. The argument that confidence intervals and credible intervals were effectively the same thing convinced me that it was all not worth arguing about (Edwards 1996). At that time fitting models using WinBugs was very time consuming and involved a great deal of rigmarole regarding convergence criteria. Edwards arguments against the use of informative priors did have some merit. Most of the users of Bayesian techniques avoided them. MCMC seemed to be mainly used to fit complex hierarchical mixed effects models that didn’t use prior information. Computer scientists were working on making this simpler and this didn’t seem to be something I could engage with.

However, by chance, last week I found the original WinBugs and R code I had submitted as supporting material for the article on figshare. I realised that the method could in fact still prove useful in some practical circumstances. In order to demonstrate the advantages of MCMC I had used an subsidiary benefit of Bayesian inference using MCMC. When calculating joint posterior distributions for the parameters of interest it is also possible to calculate distributions for any quantities that depend on them. What I had provided in the paper was a means of calculating credible intervals for the differences in diversity indices between vectors of species abundances from pairs of sites. In the big scheme of things this is of minor interest, as a full difference matrix for a region would expand to include all possible combinations. However many researchers actually work on quite modest data sets and struggle to justify differences in diversity indices between sites.

The application

Say you visit a site and produce a vector of abundances of each species observed. In the paper the example used was butterflies (Fisher based his paper on moths). The method makes most sense for a mobile species where precise position and timing of an observation is not relevant. You will end up with a very simple set of data for each site, which could be a transect in the case of butterflies. Logically the strength of inference regarding whether there is a genuine difference between one site (or transect) and any another depends on the number of observations made, which in this case is simply the total number of individuals identified for each species. If (say) you find just one individual of some species this may well be a chance observation from a poisson distribution with a range of low mean abundances. Next time you return you may not see the species again, if the true mean abundance is very low. The mean abundances are taken from either a right skewed gamma distribution or from a log-normal. The observations are “over dispersed” as there is extra poisson variability. This is in essence very simular to fitting a negative binomial. So, then set a vague non-informative prior on the underlying distribution and use each observation from each species in the calculation of the likelihood. Combine these using MCMC to find the joint posterior distributions. At the same time calculate some of the commonly used diversity indices (Shannon’s, Simpson’s etc) based on the simulated draws.

Jags model code

library(rjags)
library(ggmcmc)

gamma_model <-"
model{
for (i in 1:plots)
{
  for (j in 1 : spp)                 #Loop through the species
  {
  muSp[i,j] ~ dgamma(Alph[i], Beta[i])  #muSp is the mean abundance
  lambda[i,j] <- muSp[i,j]   #lambda is the poisson parameter
    x[i,j] ~ dpois(lambda[i,j] )    #x's are data points
    p[i,j]<-lambda[i,j]/tot[i]         #p's are proportional posterior abundances 
    H1[i,j]<--p[i,j]*log(p[i,j])      # A step in calculating Shannon's index
    S1[i,j]<-p[i,j]*p[i,j]            # A step in calculating Simpson's index    
    }
######################################################################
Shan[i]<-sum(H1[i, ])                                   # Shannon's index
Simp[i]<-sum(S1[i, ])                  # Simpson's index
InvS[i]<-1/Simp[i]                     # The inverse of Simpson's index
Hill[i]<-(InvS[i]-1)/(exp(Shan[i])-1)
PieJ[i]<-Shan[i]/log(spp)
Equi[i]<-exp(Shan[i])/spp
tot[i]<-sum(lambda[i, ])    # A step in calculating proportional abundances           
##################################################################
#######Priors
Alph[i] ~ dexp(1)
Beta[i] ~ dgamma(0.1, 1.0)
Shap[i]<-Alph[i]
Scal[i]<-1/Beta[i]
Mean[i]<-Shap[i]*Scal[i]
Sigm[i]<- sqrt(Shap[i]*Scal[i]*Scal[i])
########################################################
for (n in 1:(i-1)){
DifAl[n,i]<-Alph[n]-Alph[i] ## Calculate all the differences between pairs of sites
DifSi[n,i]<-Sigm[n]-Sigm[i]
DifSh[n,i]<-Shan[n]-Shan[i]
DifIS[n,i]<-InvS[n]-InvS[i]
DifHi[n,i]<-Hill[n]-Hill[i]
DifPJ[n,i]<-PieJ[n]-PieJ[i]
DifEq[n,i]<-Equi[n]-Equi[i]
 for (j in 1 : spp)
{
 a1[n,i,j]<-abs(lambda[n,j]-lambda[i,j])
 ra1[n,i,j]<-abs(lambda[n,j]/tot[n]-lambda[i,j]/tot[i])
 e1[n,i,j]<-pow((lambda[n,j]-lambda[i,j]),2)
}
CityBlock[n,i]<-sum(a1[n,i, ])  ## Calculate all the differences between pairs of sites
Euclidean[n,i]<-sqrt(sum(e1[n,i, ]))
Sorenson[n,i]<-sum(a1[n,i, ])/(tot[i]+tot[n])
RelSorenson[n,i]<-0.5*sum(ra1[n,i,])
Jacard[n,i]<-2*sum(a1[n,i, ])/(tot[i]+tot[n]+sum(a1[n,i, ]))

}
}
}"


log_normal_model<-"model{
for (i in 1:plots)
{
  for (j in 1 : spp)                 #Loop through the species
  {
  muSp[i,j] ~ dnorm(muS[i], tauS[i])  #muSp is the mean of the logarithmn of abundance
  log(lambda[i,j]) <- muSp[i,j]   #lambda is the poisson parameter
    x[i,j] ~ dpois(lambda[i,j] )    #x's are data points
    p[i,j]<-lambda[i,j]/tot[i]         #p's are proportional posterior abundances 
    H1[i,j]<--p[i,j]*log(p[i,j])      # A step in calculating Shannon's index
    S1[i,j]<-p[i,j]*p[i,j]            # A step in calculating Simpson's index    
    }
######################################################################
#Some example diversity indices
Shan[i]<-sum(H1[i, ])                                   # Shannon's index
Simp[i]<-sum(S1[i, ])                  # Simpson's index
InvS[i]<-1/Simp[i]                     # The inverse of Simpson's index
Hill[i]<-(InvS[i]-1)/(exp(Shan[i])-1)
PieJ[i]<-Shan[i]/log(spp)
Equi[i]<-exp(Shan[i])/spp
tot[i]<-sum(lambda[i, ])    # A step in calculating proportional abundances           
##################################################################
#Priors
    muS[i]~dnorm(0, 0.001)
  tauS[i]~ dgamma(1,0.01)
  Sigm[i]<-1/sqrt(tauS[i])
#####################################3
for (n in 1:(i-1)){
DifSi[n,i]<-Sigm[n]-Sigm[i] ## Calculate all the differences between pairs of sites
DifSh[n,i]<-Shan[n]-Shan[i]
DifIS[n,i]<-InvS[n]-InvS[i]
DifHi[n,i]<-Hill[n]-Hill[i]
DifPJ[n,i]<-PieJ[n]-PieJ[i]
DifEq[n,i]<-Equi[n]-Equi[i]
 for (j in 1 : spp)
{
 a1[n,i,j]<-abs(lambda[n,j]-lambda[i,j])
 ra1[n,i,j]<-abs(lambda[n,j]/tot[n]-lambda[i,j]/tot[i])
 e1[n,i,j]<-pow((lambda[n,j]-lambda[i,j]),2)
}
City[n,i]<-sum(a1[n,i, ]) ## Calculate all the differences between pairs of sites
Eucl[n,i]<-sqrt(sum(e1[n,i, ]))
Sore[n,i]<-sum(a1[n,i, ])/(tot[i]+tot[n])
RelS[n,i]<-0.5*sum(ra1[n,i,])
Jaca[n,i]<-2*sum(a1[n,i, ])/(tot[i]+tot[n]+sum(a1[n,i, ]))

}
}
}"


## Save the model files locally if you want, or call them from the code using textconnection()

writeLines(gamma_model,con="gamma_model.txt")
writeLines(log_normal_model,con="log_normal_model.txt")

Example

The example data set used in the paper can be downloaded here. Note that the data is transposed before use.

x<-read.table("testbugs.txt",header=T)
x<-as.matrix(x)
dt(data.frame(x))

x<-t(x) ## Transpose for the modelling to place species as column names and sites as row names

Reproducible example

To make a very easily reproducible example I’ll use some sites containen in the BCI site by species data matrix in the vegan package. This represents sub plots in the permanent plot monitored by Hubbel et al. at the Smithsonian Barro Colorado site.

library(vegan)

data(BCI)

x<-BCI[1:5,] ## Take just the first five subplots

dt(data.frame(t(x))) ## Transpose to show the sites as column headers and species as rows.

Form a data list for the model

 dataList = list(
    spp=ncol(x),    ## The species count is the number of columns
    plots=nrow(x),  ## The plots count is the number of rows
    x=x             ## x is the matrix of numbers with zeros for unobserved species
)

Run the model

For the simple practical purpose of looking at differences in diversity between sites the models are each as good as the other. It doesn’t really matter which you use. The Log-normal is nicer in the sense that it provides a measure of the standard deviation of Preston’s distribution. The veil line is not used, as species with zero abundances are provided with finite mean abundances through the prior.

adaptSteps = 500              # Number of steps to "tune" the samplers.
burnInSteps = 1000            # Number of steps to "burn-in" the samplers.
nChains = 3                   # Number of chains to run.
numSavedSteps=50000           # Total number of steps in chains to save.
thinSteps=1                   # Number of steps to "thin" (1=keep every step).
nIter = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
ln_model = jags.model( textConnection(log_normal_model), data=dataList , # inits=initsList ,
                        n.chains=nChains , n.adapt=adaptSteps )

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1125
##    Unobserved stochastic nodes: 1135
##    Total graph size: 27498
## 
## Initializing model

Choose the parameters to monitor

I’ll use almost all of them.

parameters<<-c("Sigm","Shan","Simp","InvS","Hill","PieJ","Equi","DifSi","DifSh","DifIS","DifHi","DifPJ","DifEq")
 cat( "Burning in the MCMC chain...\n" )

## Burning in the MCMC chain...

update(ln_model , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )

## Sampling final MCMC chain...

codaSamples = coda.samples( ln_model , variable.names=parameters ,
                            n.iter=nIter , thin=thinSteps )

The result

The result is a long set of credible intervals for all the parameters monitored. So the analysis has produced a rich set of results from some rather modest data. This could be quite useful in the context of a student’s dissertation.

summary(codaSamples)

## 
## Iterations = 1501:18167
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 16667 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean       SD  Naive SE Time-series SE
## DifEq[1,2]  0.0502915 0.025186 1.126e-04      3.156e-04
## DifEq[1,3]  0.0594706 0.024719 1.105e-04      2.921e-04
## DifEq[2,3]  0.0091791 0.023142 1.035e-04      2.674e-04
## DifEq[1,4]  0.0191409 0.025635 1.146e-04      3.060e-04
## DifEq[2,4] -0.0311506 0.023667 1.058e-04      2.773e-04
## DifEq[3,4] -0.0403298 0.023679 1.059e-04      2.638e-04
## DifEq[1,5]  0.0179101 0.026217 1.172e-04      3.091e-04
## DifEq[2,5] -0.0323814 0.024638 1.102e-04      2.858e-04
## DifEq[3,5] -0.0415606 0.024426 1.092e-04      2.740e-04
## DifEq[4,5] -0.0012308 0.024967 1.117e-04      2.800e-04
## DifHi[1,2]  0.0305509 0.032385 1.448e-04      2.965e-04
## DifHi[1,3]  0.0713242 0.030705 1.373e-04      2.756e-04
## DifHi[2,3]  0.0407733 0.033181 1.484e-04      3.130e-04
## DifHi[1,4]  0.0345226 0.032025 1.432e-04      2.702e-04
## DifHi[2,4]  0.0039717 0.034302 1.534e-04      3.031e-04
## DifHi[3,4] -0.0368016 0.032572 1.457e-04      2.898e-04
## DifHi[1,5]  0.1006660 0.035075 1.569e-04      2.925e-04
## DifHi[2,5]  0.0701151 0.036858 1.648e-04      3.085e-04
## DifHi[3,5]  0.0293418 0.035179 1.573e-04      2.873e-04
## DifHi[4,5]  0.0661434 0.036440 1.630e-04      2.880e-04
## DifIS[1,2]  8.5239347 3.904838 1.746e-02      3.114e-02
## DifIS[1,3] 11.8967917 3.744688 1.675e-02      2.897e-02
## DifIS[2,3]  3.3728570 3.443126 1.540e-02      2.468e-02
## DifIS[1,4]  4.7556548 4.071043 1.821e-02      3.150e-02
## DifIS[2,4] -3.7682799 3.774688 1.688e-02      2.729e-02
## DifIS[3,4] -7.1411369 3.622468 1.620e-02      2.505e-02
## DifIS[1,5]  8.7576894 4.189277 1.873e-02      3.088e-02
## DifIS[2,5]  0.2337547 3.910957 1.749e-02      2.847e-02
## DifIS[3,5] -3.1391023 3.755967 1.680e-02      2.598e-02
## DifIS[4,5]  4.0020346 4.050725 1.812e-02      2.763e-02
## DifPJ[1,2]  0.0335791 0.016726 7.480e-05      2.087e-04
## DifPJ[1,3]  0.0404331 0.016719 7.477e-05      1.952e-04
## DifPJ[2,3]  0.0068540 0.017220 7.701e-05      1.988e-04
## DifPJ[1,4]  0.0120358 0.016093 7.197e-05      1.913e-04
## DifPJ[2,4] -0.0215432 0.016334 7.305e-05      1.920e-04
## DifPJ[3,4] -0.0283972 0.016663 7.452e-05      1.863e-04
## DifPJ[1,5]  0.0112777 0.016459 7.360e-05      1.939e-04
## DifPJ[2,5] -0.0223013 0.016909 7.562e-05      1.972e-04
## DifPJ[3,5] -0.0291553 0.017075 7.636e-05      1.916e-04
## DifPJ[4,5] -0.0007581 0.016206 7.248e-05      1.814e-04
## DifSh[1,2]  0.1818676 0.090587 4.051e-04      1.131e-03
## DifSh[1,3]  0.2189895 0.090551 4.050e-04      1.057e-03
## DifSh[2,3]  0.0371219 0.093268 4.171e-04      1.077e-03
## DifSh[1,4]  0.0651874 0.087162 3.898e-04      1.036e-03
## DifSh[2,4] -0.1166802 0.088469 3.956e-04      1.040e-03
## DifSh[3,4] -0.1538022 0.090250 4.036e-04      1.009e-03
## DifSh[1,5]  0.0610813 0.089141 3.986e-04      1.050e-03
## DifSh[2,5] -0.1207863 0.091583 4.096e-04      1.068e-03
## DifSh[3,5] -0.1579082 0.092479 4.136e-04      1.038e-03
## DifSh[4,5] -0.0041060 0.087775 3.925e-04      9.825e-04
## DifSi[1,2] -0.1846524 0.278996 1.248e-03      4.784e-03
## DifSi[1,3] -0.0641303 0.263583 1.179e-03      4.313e-03
## DifSi[2,3]  0.1205221 0.285046 1.275e-03      4.791e-03
## DifSi[1,4] -0.0588024 0.266164 1.190e-03      4.368e-03
## DifSi[2,4]  0.1258501 0.278252 1.244e-03      4.555e-03
## DifSi[3,4]  0.0053279 0.270992 1.212e-03      4.427e-03
## DifSi[1,5]  0.1071386 0.250919 1.122e-03      4.077e-03
## DifSi[2,5]  0.2917910 0.268629 1.201e-03      4.418e-03
## DifSi[3,5]  0.1712689 0.254982 1.140e-03      4.023e-03
## DifSi[4,5]  0.1659410 0.252642 1.130e-03      4.064e-03
## Equi[1]     0.3027451 0.018856 8.433e-05      2.397e-04
## Equi[2]     0.2524536 0.016528 7.392e-05      1.977e-04
## Equi[3]     0.2432744 0.016229 7.258e-05      1.820e-04
## Equi[4]     0.2836042 0.017071 7.634e-05      1.891e-04
## Equi[5]     0.2848350 0.018174 8.127e-05      2.010e-04
## Hill[1]     0.6026601 0.021507 9.618e-05      1.924e-04
## Hill[2]     0.5721093 0.024425 1.092e-04      2.282e-04
## Hill[3]     0.5313359 0.022084 9.876e-05      2.035e-04
## Hill[4]     0.5681376 0.023814 1.065e-04      1.924e-04
## Hill[5]     0.5019942 0.027664 1.237e-04      2.028e-04
## InvS[1]    41.4511326 2.961324 1.324e-02      2.431e-02
## InvS[2]    32.9271979 2.553554 1.142e-02      1.911e-02
## InvS[3]    29.5543409 2.299825 1.029e-02      1.602e-02
## InvS[4]    36.6954778 2.777385 1.242e-02      2.003e-02
## InvS[5]    32.6934432 2.957403 1.323e-02      1.995e-02
## PieJ[1]     0.7790298 0.011474 5.131e-05      1.456e-04
## PieJ[2]     0.7454507 0.012066 5.396e-05      1.445e-04
## PieJ[3]     0.7385967 0.012308 5.504e-05      1.381e-04
## PieJ[4]     0.7669939 0.011105 4.966e-05      1.229e-04
## PieJ[5]     0.7677521 0.011775 5.266e-05      1.301e-04
## Shan[1]     4.2193036 0.062145 2.779e-04      7.887e-04
## Shan[2]     4.0374360 0.065351 2.923e-04      7.828e-04
## Shan[3]     4.0003141 0.066661 2.981e-04      7.479e-04
## Shan[4]     4.1541162 0.060145 2.690e-04      6.656e-04
## Shan[5]     4.1582222 0.063777 2.852e-04      7.046e-04
## Sigm[1]     1.9708374 0.186385 8.335e-04      3.113e-03
## Sigm[2]     2.1554898 0.208168 9.309e-04      3.564e-03
## Sigm[3]     2.0349677 0.192408 8.605e-04      3.179e-03
## Sigm[4]     2.0296398 0.187356 8.379e-04      2.972e-03
## Sigm[5]     1.8636988 0.168069 7.516e-04      2.595e-03
## Simp[1]     0.0242484 0.001739 7.777e-06      1.415e-05
## Simp[2]     0.0305539 0.002386 1.067e-05      1.776e-05
## Simp[3]     0.0340418 0.002661 1.190e-05      1.852e-05
## Simp[4]     0.0274088 0.002093 9.359e-06      1.513e-05
## Simp[5]     0.0308410 0.002830 1.266e-05      1.908e-05
## 
## 2. Quantiles for each variable:
## 
##                  2.5%        25%        50%        75%     97.5%
## DifEq[1,2]  1.116e-03  0.0333543  0.0500376  0.0669983  0.100393
## DifEq[1,3]  1.146e-02  0.0427067  0.0593305  0.0760669  0.108466
## DifEq[2,3] -3.610e-02 -0.0063394  0.0091388  0.0247536  0.054721
## DifEq[1,4] -3.066e-02  0.0020708  0.0190185  0.0359694  0.070203
## DifEq[2,4] -7.746e-02 -0.0471039 -0.0312517 -0.0151918  0.015405
## DifEq[3,4] -8.687e-02 -0.0562112 -0.0402450 -0.0244078  0.006204
## DifEq[1,5] -3.330e-02  0.0002452  0.0176569  0.0355495  0.069920
## DifEq[2,5] -8.042e-02 -0.0489015 -0.0324681 -0.0158859  0.015911
## DifEq[3,5] -8.957e-02 -0.0580079 -0.0414312 -0.0250997  0.006218
## DifEq[4,5] -5.053e-02 -0.0181045 -0.0010872  0.0153852  0.047647
## DifHi[1,2] -3.271e-02  0.0088569  0.0304430  0.0523574  0.094575
## DifHi[1,3]  1.071e-02  0.0508380  0.0713994  0.0920939  0.130984
## DifHi[2,3] -2.436e-02  0.0184050  0.0409460  0.0630775  0.105686
## DifHi[1,4] -2.774e-02  0.0127832  0.0344654  0.0561590  0.097068
## DifHi[2,4] -6.330e-02 -0.0192986  0.0041115  0.0272329  0.071015
## DifHi[3,4] -1.000e-01 -0.0589256 -0.0369741 -0.0146995  0.026958
## DifHi[1,5]  3.303e-02  0.0768453  0.1002657  0.1243291  0.170381
## DifHi[2,5] -2.352e-03  0.0453309  0.0700335  0.0949838  0.142723
## DifHi[3,5] -3.837e-02  0.0053670  0.0290678  0.0529531  0.098962
## DifHi[4,5] -5.136e-03  0.0416906  0.0657459  0.0904698  0.138416
## DifIS[1,2]  9.329e-01  5.8883325  8.4967270 11.1350168 16.229712
## DifIS[1,3]  4.649e+00  9.3746051 11.8653305 14.4096221 19.303907
## DifIS[2,3] -3.424e+00  1.0662499  3.3758740  5.6810995 10.152861
## DifIS[1,4] -3.214e+00  2.0408932  4.7285565  7.4700686 12.850163
## DifIS[2,4] -1.120e+01 -6.3018690 -3.7511423 -1.2246414  3.655434
## DifIS[3,4] -1.425e+01 -9.5749801 -7.1484137 -4.7244483 -0.074157
## DifIS[1,5]  5.364e-01  5.9269310  8.7510048 11.5609252 17.007051
## DifIS[2,5] -7.447e+00 -2.3959181  0.2481307  2.8589651  7.900231
## DifIS[3,5] -1.057e+01 -5.6647357 -3.1191763 -0.5866808  4.191437
## DifIS[4,5] -3.993e+00  1.2920772  4.0121445  6.7305949 11.909792
## DifPJ[1,2]  7.456e-04  0.0223207  0.0334774  0.0447505  0.066599
## DifPJ[1,3]  7.835e-03  0.0291103  0.0403691  0.0517234  0.073428
## DifPJ[2,3] -2.693e-02 -0.0047328  0.0068275  0.0184883  0.040776
## DifPJ[1,4] -1.927e-02  0.0013090  0.0120075  0.0227156  0.043864
## DifPJ[2,4] -5.348e-02 -0.0325832 -0.0216090 -0.0104848  0.010537
## DifPJ[3,4] -6.098e-02 -0.0396242 -0.0283676 -0.0171635  0.004360
## DifPJ[1,5] -2.087e-02  0.0001499  0.0111552  0.0223806  0.043779
## DifPJ[2,5] -5.511e-02 -0.0336968 -0.0223961 -0.0109496  0.010965
## DifPJ[3,5] -6.256e-02 -0.0406770 -0.0291633 -0.0176473  0.004412
## DifPJ[4,5] -3.251e-02 -0.0117413 -0.0007081  0.0100487  0.031093
## DifSh[1,2]  4.038e-03  0.1208909  0.1813167  0.2423733  0.360708
## DifSh[1,3]  4.244e-02  0.1576644  0.2186430  0.2801391  0.397692
## DifSh[2,3] -1.459e-01 -0.0256334  0.0369785  0.1001343  0.220846
## DifSh[1,4] -1.043e-01  0.0070895  0.0650337  0.1230300  0.237573
## DifSh[2,4] -2.897e-01 -0.1764742 -0.1170363 -0.0567870  0.057071
## DifSh[3,4] -3.303e-01 -0.2146089 -0.1536417 -0.0929593  0.023614
## DifSh[1,5] -1.131e-01  0.0008119  0.0604177  0.1212155  0.237110
## DifSh[2,5] -2.985e-01 -0.1825053 -0.1212996 -0.0593041  0.059386
## DifSh[3,5] -3.388e-01 -0.2203109 -0.1579511 -0.0955797  0.023894
## DifSh[4,5] -1.761e-01 -0.0635923 -0.0038349  0.0544249  0.168404
## DifSi[1,2] -7.438e-01 -0.3681631 -0.1820442  0.0008717  0.354828
## DifSi[1,3] -5.831e-01 -0.2405644 -0.0628456  0.1130511  0.454621
## DifSi[2,3] -4.343e-01 -0.0709604  0.1180215  0.3100173  0.683676
## DifSi[1,4] -5.807e-01 -0.2364646 -0.0591815  0.1194751  0.468435
## DifSi[2,4] -4.150e-01 -0.0600813  0.1241965  0.3115888  0.674725
## DifSi[3,4] -5.241e-01 -0.1747984  0.0013342  0.1850798  0.544979
## DifSi[1,5] -3.804e-01 -0.0623753  0.1056338  0.2727197  0.604893
## DifSi[2,5] -2.259e-01  0.1104923  0.2883187  0.4711856  0.828618
## DifSi[3,5] -3.178e-01  0.0001822  0.1684415  0.3374207  0.686983
## DifSi[4,5] -3.294e-01 -0.0015813  0.1634758  0.3332026  0.670889
## Equi[1]     2.679e-01  0.2897537  0.3019994  0.3150645  0.341628
## Equi[2]     2.219e-01  0.2410261  0.2518418  0.2631952  0.286525
## Equi[3]     2.130e-01  0.2320366  0.2428369  0.2539532  0.276723
## Equi[4]     2.516e-01  0.2719010  0.2830638  0.2948052  0.318474
## Equi[5]     2.507e-01  0.2722998  0.2842544  0.2967419  0.321954
## Hill[1]     5.591e-01  0.5884987  0.6031166  0.6173543  0.643765
## Hill[2]     5.229e-01  0.5560015  0.5725782  0.5887222  0.619006
## Hill[3]     4.875e-01  0.5166151  0.5313872  0.5462623  0.574346
## Hill[4]     5.205e-01  0.5522916  0.5686662  0.5844020  0.613609
## Hill[5]     4.459e-01  0.4836053  0.5026574  0.5210189  0.554095
## InvS[1]     3.587e+01 39.4140425 41.3773663 43.3956802 47.491426
## InvS[2]     2.810e+01 31.1772912 32.8584526 34.5950218 38.143974
## InvS[3]     2.522e+01 27.9759643 29.4805869 31.0572036 34.272497
## InvS[4]     3.143e+01 34.8054194 36.6356436 38.5309645 42.291451
## InvS[5]     2.705e+01 30.6798921 32.6275048 34.6552341 38.667895
## PieJ[1]     7.568e-01  0.7712886  0.7789313  0.7867510  0.801697
## PieJ[2]     7.220e-01  0.7372925  0.7453972  0.7535387  0.769220
## PieJ[3]     7.145e-01  0.7302745  0.7386745  0.7469387  0.762793
## PieJ[4]     7.452e-01  0.7595470  0.7669757  0.7744797  0.788738
## PieJ[5]     7.445e-01  0.7598177  0.7677507  0.7756887  0.790745
## Shan[1]     4.099e+00  4.1773765  4.2187702  4.2611225  4.342069
## Shan[2]     3.910e+00  3.9932502  4.0371462  4.0812413  4.166172
## Shan[3]     3.870e+00  3.9552401  4.0007353  4.0454950  4.131363
## Shan[4]     4.036e+00  4.1137831  4.1540176  4.1946600  4.271885
## Shan[5]     4.033e+00  4.1152490  4.1582148  4.2012080  4.282754
## Sigm[1]     1.638e+00  1.8419425  1.9595456  2.0890201  2.369621
## Sigm[2]     1.785e+00  2.0098178  2.1423000  2.2868885  2.599621
## Sigm[3]     1.692e+00  1.9015192  2.0225293  2.1553534  2.447770
## Sigm[4]     1.695e+00  1.9000938  2.0176957  2.1473812  2.430728
## Sigm[5]     1.563e+00  1.7454781  1.8538616  1.9704446  2.218597
## Simp[1]     2.106e-02  0.0230438  0.0241678  0.0253717  0.027877
## Simp[2]     2.622e-02  0.0289059  0.0304336  0.0320746  0.035584
## Simp[3]     2.918e-02  0.0321986  0.0339206  0.0357450  0.039646
## Simp[4]     2.365e-02  0.0259532  0.0272958  0.0287312  0.031821
## Simp[5]     2.586e-02  0.0288557  0.0306490  0.0325946  0.036963

Visualisation

A full analysis would look at some of the diagnostics in more detail, but the chains are generally well behaved and well mixed. So there

d<-ggs(codaSamples)

## Warning: `tbl_df()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

d %>% filter(grepl("Sigm",Parameter)) ->Sigma
d %>% filter(grepl("Shan",Parameter)) ->Shan
d %>% filter(grepl("DifSh",Parameter)) ->DifSh

Underlying distribution

A caterpillar plt shows that in this case there is little difference between the underlying distributions inferred from the samples taken from the five plots. That makes good sense, as the underlying species pool at BCI is in fact the same.

ggs_caterpillar(Sigma)

Shannon’s

When looking at the diversity indices calculated for the samples we see rather greater differences.

ggs_caterpillar(Shan)

Pair-wise comparisons of Shannon’s index between plots

Because the code calulated pair wise differences, we can (Bingo!) now make some statements about differences in observed diversity between the plots as measured by Shannon’s index. Differences in the credible intervals that don’t include zero may be considered as “significant”. Plot 1 does appear to have the highest species diversity according to this particular index, and plot 3 has the lowest. So, there are now some potentially interpretable differences to be looked at. Other diversity indices can be used in the same way. Equitability is often informative. So a rather simple data set can be enriched through combining the data with a prior probability derived through theoretical considerations. Which could often prove useful in the setting of a student’s dissertation.

ggs_caterpillar(DifSh) + geom_vline(xintercept=0,col="red")

References

Colwell, R. K., and J. A. Coddington. 1994. Estimating Terrestrial Biodiversity through Extrapolation. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 345:101–118.

Dewdney, A. K. 1998. A General Theory of the Sampling Process with Applications to the “Veil Line”. Theoretical Population Biology 54:294–302.

Edwards, D. 1996. Comment: The first data analysis should be journalistic. ECOLOGICAL APPLICATIONS 6:1090–1094.

Ellison, A. M. 2004. Bayesian inference in ecology. Ecology Letters 7:509–520.

Fisher, R. A., A. S. Corbet, and C. B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12:42–58.

Golicher, D. J., R. B. O’Hara, L. Ruíz-Montoya, and L. Cayuela. 2006. Lifting a veil on diversity: a Bayesian approach to fitting relative-abundance models. Ecological applications : a publication of the Ecological Society of America 16:202–12.

Hubbell, S. 2001. The unified neutral theory of biodiversity and biogeography. Princeton Univ.

Johnson, D. H. 1999. The Insignificance of Statistical Significance Testing. The Journal of Wildlife Management 63:763.

O’Hara, R. {2005}. Species richness estimators: how many species can dance on the head of a pin? JOURNAL OF ANIMAL ECOLOGY 74:375–386.

Plummer, M. 2018. Rjags: Bayesian graphical models using mcmc.

Preston, F. 1962. The Canonical Distribution of Commonness and Rarity : Part I Author ( s ): F . W . Preston Published by : Ecological Society of America Stable URL : http://www.jstor.org/stable/1931976. Ecology 43:185–215.