Chapter 20 Analysis of multi-species data

One of the commonest tasks that quantitative ecologists face is the analysis of site by species matrices. Such data are routinely produced as a result of many types of ecological survey. Because multiple species are involved such data is always challenging to analyse. There are a lot of questions that can be addressed with multispecies data. Many of these involve analysis of species diversity in some respect.

Species diversity is classically regarded as consisting of three components.

\(gamma=alpha+beta\)

Alpha diversity is measured at the scale of some sampling unit (i.e. plot or quadrat). Gamma diversity is the overall diversity over the whole area that is being measured. Beta diversity is the difference between the two. Or is it? The issues involved in analysing diversity are extremely complex and could form the basis of a whole taught unit.

For the moment let’s accept this simplified version of the issue for the pragmatic purpose of finding some methods that will allow us to work with a sites by species matrix.

20.1 Working with the sites by species matrix

One approach to working with the species by sites matrix would be to produce a measure that could be used as a response variable in some other model. In this class we will look at some simple measures of species diversity. The next class will apply some more complex methods in order to look at species composition. However you first need to become familiar with the format of the matrix in order to work with the data comfortably.

20.1.1 BCI data

Let’s look at some realistically complex data from a classic study of tropical diversity. The Smithsonian institute has a permanent sample plot of 50 hectares of tropical forest located on the Island of Barro Colorado in Panama.

The data set consists of a grid of completely censused plot of 100m x 100m within which all trees over 5cm in diameter were counted and identified to species.

library(rgdal)
x<- rep(seq(625754, 626654, by=100),each=5)
y<- rep(seq(1011569,  1011969, by=100),len=50) 
coords<-data.frame(x,y)
dd<-data.frame(coords,id=1:100)
coordinates(dd)<-~x+y
proj4string(dd)<-CRS("+init=epsg:32617")
dd<-spTransform(dd, CRS("+init=epsg:4326"))
library(mapview)
m<-mapview(dd)
library(leaflet.extras)
m@map %>% addFullscreenControl()
library(vegan)
data(BCI)

You can look at this matrix using

str(BCI)
## 'data.frame':    50 obs. of  225 variables:
##  $ Abarema.macradenia              : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ Vachellia.melanoceras           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acalypha.diversifolia           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acalypha.macrostachya           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Adelia.triloba                  : int  0 0 0 3 1 0 0 0 5 0 ...
##  $ Aegiphila.panamensis            : int  0 0 0 0 1 0 1 0 0 1 ...
##  $ Alchornea.costaricensis         : int  2 1 2 18 3 2 0 2 2 2 ...
##  $ Alchornea.latifolia             : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ Alibertia.edulis                : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Allophylus.psilospermus         : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ Alseis.blackiana                : int  25 26 18 23 16 14 18 14 16 14 ...
##  $ Amaioua.corymbosa               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Anacardium.excelsum             : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Andira.inermis                  : int  0 0 0 0 1 1 0 0 1 0 ...
##  $ Annona.spraguei                 : int  1 0 1 0 0 0 0 1 1 0 ...
##  $ Apeiba.glabra                   : int  13 12 6 3 4 10 5 4 5 5 ...
##  $ Apeiba.tibourbou                : int  2 0 1 1 0 0 0 1 0 0 ...
##  $ Aspidosperma.desmanthum         : int  0 0 0 1 1 1 0 0 0 1 ...
##  $ Astrocaryum.standleyanum        : int  0 2 1 5 6 2 2 0 2 1 ...
##  $ Astronium.graveolens            : int  6 0 1 3 0 1 2 2 0 0 ...
##  $ Attalea.butyracea               : int  0 1 0 0 0 1 1 0 0 0 ...
##  $ Banara.guianensis               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Beilschmiedia.pendula           : int  4 5 7 5 8 6 5 9 11 14 ...
##  $ Brosimum.alicastrum             : int  5 2 4 3 2 2 6 4 3 6 ...
##  $ Brosimum.guianense              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Calophyllum.longifolium         : int  0 2 0 2 1 2 2 2 2 0 ...
##  $ Casearia.aculeata               : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Casearia.arborea                : int  1 1 3 2 4 1 2 3 9 7 ...
##  $ Casearia.commersoniana          : int  0 0 1 0 1 0 0 0 1 0 ...
##  $ Casearia.guianensis             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casearia.sylvestris             : int  2 1 0 0 0 3 1 0 1 1 ...
##  $ Cassipourea.guianensis          : int  2 0 1 1 3 4 4 0 2 1 ...
##  $ Cavanillesia.platanifolia       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cecropia.insignis               : int  12 5 7 17 21 4 0 7 2 16 ...
##  $ Cecropia.obtusifolia            : int  0 0 0 0 1 0 0 2 0 2 ...
##  $ Cedrela.odorata                 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ceiba.pentandra                 : int  0 1 1 0 1 0 0 1 0 1 ...
##  $ Celtis.schippii                 : int  0 0 0 2 2 0 1 0 0 0 ...
##  $ Cespedesia.spathulata           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chamguava.schippii              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chimarrhis.parviflora           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Maclura.tinctoria               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chrysochlamys.eclipes           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chrysophyllum.argenteum         : int  4 1 2 2 6 2 3 2 4 2 ...
##  $ Chrysophyllum.cainito           : int  0 0 0 0 0 0 1 0 0 0 ...
##  $ Coccoloba.coronata              : int  0 0 0 1 2 0 0 1 2 1 ...
##  $ Coccoloba.manzinellensis        : int  0 0 0 0 0 0 0 2 0 0 ...
##  $ Colubrina.glandulosa            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cordia.alliodora                : int  2 3 3 7 1 1 2 0 0 2 ...
##  $ Cordia.bicolor                  : int  12 14 35 23 13 7 5 10 7 13 ...
##  $ Cordia.lasiocalyx               : int  8 6 6 11 7 6 6 3 0 4 ...
##  $ Coussarea.curvigemma            : int  0 0 0 1 0 2 1 0 1 1 ...
##  $ Croton.billbergianus            : int  2 2 0 11 6 0 0 4 2 0 ...
##  $ Cupania.cinerea                 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cupania.latifolia               : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ Cupania.rufescens               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cupania.seemannii               : int  2 2 1 0 3 0 1 2 2 0 ...
##  $ Dendropanax.arboreus            : int  0 3 6 0 5 2 1 6 1 3 ...
##  $ Desmopsis.panamensis            : int  0 0 4 0 0 0 0 0 0 1 ...
##  $ Diospyros.artanthifolia         : int  1 1 1 1 0 0 0 0 0 1 ...
##  $ Dipteryx.oleifera               : int  1 1 3 0 0 0 0 2 1 2 ...
##  $ Drypetes.standleyi              : int  2 1 2 0 0 0 0 0 0 0 ...
##  $ Elaeis.oleifera                 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Enterolobium.schomburgkii       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Erythrina.costaricensis         : int  0 0 0 0 0 3 0 0 1 0 ...
##  $ Erythroxylum.macrophyllum       : int  0 1 0 0 0 0 0 1 1 1 ...
##  $ Eugenia.florida                 : int  0 1 0 7 2 0 0 1 1 3 ...
##  $ Eugenia.galalonensis            : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Eugenia.nesiotica               : int  0 0 1 0 0 0 5 4 3 0 ...
##  $ Eugenia.oerstediana             : int  3 2 5 1 5 2 2 3 3 3 ...
##  $ Faramea.occidentalis            : int  14 36 39 39 22 16 38 41 33 42 ...
##  $ Ficus.colubrinae                : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Ficus.costaricana               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ficus.insipida                  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ficus.maxima                    : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ Ficus.obtusifolia               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ficus.popenoei                  : int  0 0 0 0 0 0 1 0 0 0 ...
##  $ Ficus.tonduzii                  : int  0 0 1 2 1 0 0 0 0 0 ...
##  $ Ficus.trigonata                 : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Ficus.yoponensis                : int  1 0 0 0 0 1 1 0 0 0 ...
##  $ Garcinia.intermedia             : int  0 1 1 3 2 1 2 2 1 0 ...
##  $ Garcinia.madruno                : int  4 0 0 0 1 0 0 0 0 1 ...
##  $ Genipa.americana                : int  0 0 1 0 0 0 1 0 1 1 ...
##  $ Guapira.myrtiflora              : int  3 1 0 1 1 7 3 1 1 1 ...
##  $ Guarea.fuzzy                    : int  1 1 0 1 3 0 0 2 0 3 ...
##  $ Guarea.grandifolia              : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Guarea.guidonia                 : int  2 6 2 5 3 4 4 0 1 5 ...
##  $ Guatteria.dumetorum             : int  6 16 6 3 9 7 8 6 2 2 ...
##  $ Guazuma.ulmifolia               : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ Guettarda.foliacea              : int  1 5 1 2 1 0 0 4 1 3 ...
##  $ Gustavia.superba                : int  10 5 0 1 3 1 8 4 4 4 ...
##  $ Hampea.appendiculata            : int  0 0 1 0 0 0 0 0 2 1 ...
##  $ Hasseltia.floribunda            : int  5 9 4 11 9 2 7 6 3 4 ...
##  $ Heisteria.acuminata             : int  0 0 0 0 1 1 0 0 0 0 ...
##  $ Heisteria.concinna              : int  4 5 4 6 4 8 2 5 1 5 ...
##  $ Hirtella.americana              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Hirtella.triandra               : int  21 14 5 4 6 6 7 14 8 7 ...
##  $ Hura.crepitans                  : int  0 0 0 0 0 2 1 1 0 0 ...
##  $ Hieronyma.alchorneoides         : int  0 2 0 0 0 0 0 0 1 0 ...
##   [list output truncated]
##  - attr(*, "original.names")= chr  "Abarema.macradenium" "Acacia.melanoceras" "Acalypha.diversifolia" "Acalypha.macrostachya" ...
dim(BCI)
## [1]  50 225

The researchers therefore found 225 species in the 50 plots. The area is quite homogeneous, but there is a very high diversity of tree species. There are over 3000 species of tree in Panama.

Whether 225 is considered as a measure of gamma diversity (for the 50 plots) or alpha diversity (for 50 ha plots within an area with a higher gamma diversity clearly depends on the scale at which diversity is being observed and studies. There are complex issues here.

20.1.2 Reshaping the site by species matrix

Note that there are many zeros in the site by species matrix. As we have seen before, this is not strictly a “raw” data format, as it can be derived from a more compact table format. Here is how to interchange the two formats using the tidyverse

library(tidyverse)
# Make a data frame with Site ID as one of the columns
bci<-data.frame(Site=1:50,BCI)
#Melt the data frame using the Site ID as the identifier.
bci<-pivot_longer(bci,cols=-1)
# Remove zeros
bci<-bci[bci$value>0,]
head(bci)
## # A tibble: 6 x 3
##    Site name                    value
##   <int> <chr>                   <int>
## 1     1 Alchornea.costaricensis     2
## 2     1 Alseis.blackiana           25
## 3     1 Annona.spraguei             1
## 4     1 Apeiba.glabra              13
## 5     1 Apeiba.tibourbou            2
## 6     1 Astronium.graveolens        6

To reshape this long format back into a sites by species matrix form, with a column for the site name you can run this line of code.

bci2<-pivot_wider(bci,id=1, values_fill=0)

Check the format. You may often have to remove some of the columns in order to work with a matrix that only consists of species abundence. In this case

bcimat<-as.matrix(bci2[,-1])

Let’s look at how many species are found in each plot. We can produce a vector of species richness using the specnumber function.

S<-specnumber(BCI)
summary(S)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   77.00   86.00   91.00   90.78   94.00  109.00
hist(S,col="grey")

So, species number could perhaps be used as a response variable if we had some explanatory variables for each sub plot. We will come back to this issue later.

20.1.3 Working with apply

Many of the methods for working with quadrat type data work directly on the broad, site by species matrix.

Although it is usually possible to use data frame objects in R for matrix based operations there is a difference between a data frame and a true matrix. A matrix can only hold numbers, while a data frame can hold any type of values. The “apply” command will work on any matrix, but would produce odd results on some data frames.

When you ask R to apply a function to a matrix it is like writing a function at the end of a set of columns or rows on an Excel spreadsheet and then dragging them across. This is only going to work if all the columns or rows contain the same sort of values. If we want to apply a function (say sum) to rows of a matrix (say BCI) we use a 1 to refer to rows and write apply(BCI,1,sum). If we want to apply the function to the columns we write apply(BCI,2,sum) So, for example to find the abundances of the species. We can sum over all the subplots to produce a vector of abundances.

abun<-apply(BCI,2,sum)

A few abundant species contribute a relatively large number of individuals to the total.

head(sort(abun,dec=T))
##   Faramea.occidentalis  Trichilia.tuberculata       Alseis.blackiana 
##                   1717                   1681                    983 
##      Oenocarpus.mapora       Poulsenia.armata Quararibea.asterolepis 
##                    788                    755                    724

20.1.4 Working with the long format data frame

It can be more convenient to use dplyr to produce tables. To do this we need to work with the long data. When the matrix was put into the “long” database form by the reshape operation species name became the variable and the count of individual trees the value. So this code will produce a table.

library(dplyr)
library(DT)
bci %>% group_by(name) %>% summarise(occur=n(),abun=sum(value)) -> abunds
datatable(abunds)

We can calculate the relative abundances of these top ten species as percentages using R

round(100*head(sort(abun,dec=T))/sum(abun),2)
##   Faramea.occidentalis  Trichilia.tuberculata       Alseis.blackiana 
##                   8.00                   7.83                   4.58 
##      Oenocarpus.mapora       Poulsenia.armata Quararibea.asterolepis 
##                   3.67                   3.52                   3.37

So even though there are 225 species recorded, 20% of the observations are from only three species.

summary(abun)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   25.00   95.36   82.00 1717.00

The median abundance is 25. 50% of the species have 25 or fewer individuals. This is a very typical pattern in diversity data, whether in the tropics or elsewhere. Most individuals belong to a handful of common species. However, most species have few individuals. To put it in a delberately perverse way common species turn out to be rare and rare species are common!

This has generated a huge literature based on attempts to describe this pattern mathematically and find underlying reasons for it. If we plot a histogram of abundances we can see the pattern.

hist(abun,col="grey")

Taking logarithms

hist(log10(abun),col="grey")

This may explain why there is a difference between the number of species found in each subplot and the total number of species. It is simply chance. As there are so many rare species you would not expect eachsubplot to contain all the species. We can look at the pattern ofaccumulation of species that we obtain if we aggregate plots randomly into larger units using the specaccum function.

AcCurve<-specaccum(BCI,"random")
plot(AcCurve,ci.type="poly", col="red", lwd=2, ci.lty=0, ci.col="grey")
boxplot(AcCurve, col="blue", pch="+",add=T)

specpool(BCI)
##     Species     chao chao.se  jack1 jack1.se    jack2     boot  boot.se  n
## All     225 236.3732 6.54361 245.58 5.650522 247.8722 235.6862 3.468888 50

So species richness accumulates in a non linear manner. This has been modelled as a hyperbolic function in order to estimate the asymptotic(maximum) species richness. Fitting a Lomolino curve actually produces an estimate of the maximum number of species.

## Fit Lomolino model to the exact accumulation
AcCurve<-specaccum(BCI,"exact")
mod1 <- fitspecaccum(AcCurve, "lomolino")
coef(mod1)
##       Asym       xmid      slope 
## 258.440682   2.442061   1.858694
plot(AcCurve)
## Add Lomolino model using argument 'add'
plot(mod1, add = TRUE, col=2, lwd=2)

There are a range of “non parametric” estimators of species richness that claim to do the same thing.

pool<-apply(BCI,2,sum)
estimateR(pool)
##      S.obs    S.chao1   se.chao1      S.ACE     se.ACE 
## 225.000000 237.214286   7.434824 238.217659   7.118588

However if we look at the accumulation on a log scale we may be less convinced that it is flattening off at all.

plot(AcCurve, ci.type="poly", col="red", lwd=2, ci.lty=0, ci.col="grey",log="x")

The practical implications are that when planning field work you may quickly observe at least half the species that you are going to find in any given area. However rare species will continue to add to the list, and there may be no real limit to this process.

Because the accumulation is more or less linear on a logarithmic scale there are clearly diminishing returns. If, say, six more species are added by doubling the number of sites, four times the number of sites will be needed to add six more, eight more for the next six and so on.

In this case you should notice that you expect to find around half the total number of species if you draw two sites at random from the data set and count the numbers of species. This may be useful when designing future studies.

The dune meadows data set found in the vegan package uses quadrat cover classes, so the numbers do not represent counts. The default settings will be OK for this sort of data.

data(dune)
AcCurve<-specaccum(dune)
plot(AcCurve)

20.2 Resampling individuals

Some of the methods for resampling species accumulation curves (eg. Colemans, and rarefaction) are based on individuals. We can try an alternative form of resampling at the level of each plot.

resamp2 <-
function(X, rep=1,plot=FALSE) {
 require(vegan)
  Samp <- function(size, Indivs) {  length(table(sample(Indivs, size,
          replace=F)))  }
   N<-X[X>0]
   TotSp <- length(N)
   TotInds<-sum(N)#### Accumulation curve up to the maximum number of
   #individuals
   Sp <- 1:TotSp
   Inds <- rep(1:TotSp, N)
   Size <- rep(floor(TotInds*0.25):TotInds, rep) ###Note its(TotInds*0.25):TotInds
   sp.count <- sapply(Size, Samp, Inds)
   if(plot)plot(Size,sp.count,xlab="Number of individuals",ylab="Number of species",log="x")
   Sp.Ac1<-lm(sp.count~log(Size))
   Sp.Ac2<-lm(sp.count~log2(Size))
   fa<-fisher.alpha(N)
   res<-list(Logslope=Sp.Ac1$coefficients[2],Log2slope=Sp.Ac2$coefficients[2],Alpha=fa)
   return(res)
}
resamp2(BCI[1,],plot=TRUE)

## $Logslope
## log(Size) 
##  30.63284 
## 
## $Log2slope
## log2(Size) 
##   21.23306 
## 
## $Alpha
## [1] 35.67297
ans<-apply(BCI,1,resamp2)
res<-do.call("rbind",ans)
datatable(res) %>% formatRound(columns=c(1:3), digits=1)

Once more, this pattern has some useful practical implications and some deeper theoretical ones. By looking at the slope after log transforming to base 2 we obtain an estimate of the number of new species which would be added each time we double the number of individuals counted.

20.2.1 Simpson’s and Shannon’s indices

Ok, we have seen that species abundance distributions tend to be very highly skewed, with a few common species and many more rare species. The counts of individuals in any one sampling unit will reflect this. Intuitively there is a difference between a sampling unit that contains 10 individuals of species A, 10 of species B and 10 of species C when it is compared to a sampling unit that has 28 of species A and only one individual of each of the other species.

There is a very long tradition in Ecology of measuring diversity using indices that combine measures of both species richness and equitability. There are two widely used indices of diversity, and around 30 that are less well known.

20.2.1.1 Simpson’s index

The simplest diversity index to understand is Simpson’s index.

Let’s take the example of an equitable community with ten individuals of five species. The proportional abundance for each species is 0.2.

eq<-c(10,10,10,10,10)
prop<-eq/sum(eq)
prop
## [1] 0.2 0.2 0.2 0.2 0.2

Now imagine we draw an individual at random from the community. It could be of any one of the five species with an equal probability of 0.2. Let’s say that it is species A. If we replace the individual and draw another, what is the probability that it will also be of species A? The answer must also be 0.2. So the probability of drawing two individuals of species A is 0.2 x 0.2 = 0.2² = 0.04.

The same applies to all the other species.

So the overall probability of drawing two individuals of the same species is given by the sum of p². We can call this D. If we subtract D from 1 we have the probability of drawing two individuals at random that are of different species. Alternatively we can find the reciprocal of D which represents the number of equally abundant species that would provide the probability obtained.

\(D=\sum p^{2}\)

\(simp=1-D\)

\(invsimp=\frac{1}{D}\)

sprop<-prop^2
sprop
## [1] 0.04 0.04 0.04 0.04 0.04
D<-sum(sprop)
1-D
## [1] 0.8
D
## [1] 0.2

Now what do we find for a much less equitable community?

uneq<-c(100,10,2,2,1)
prop<-uneq/sum(uneq)
prop
## [1] 0.869565217 0.086956522 0.017391304 0.017391304 0.008695652

In this case we are much more likely to draw an individual of the common species first. We are then very likely to draw another. So the probability of getting two individuals of the same species is much higher.

sprop<-prop^2
sprop
## [1] 7.561437e-01 7.561437e-03 3.024575e-04 3.024575e-04 7.561437e-05
D<-sum(sprop)
1-D
## [1] 0.2356144
1/D
## [1] 1.30824

The probability of obtaining two individuals of the same species is higher, as we are likely to draw two individuals of the very common species. Thus the ``effective’’ number of species as measured by the inverse of Simpson’s index is lower.

Simpson’s index is influenced by both the number of species and the equitability of the distribution.

We can obtain a measure of pure equitability that takes a value between zero and 1 by dividing the inverse of simpsons’s by the number of species.

Simp<-diversity(BCI,"simp")
Simp
##         1         2         3         4         5         6         7         8 
## 0.9746293 0.9683393 0.9646078 0.9716117 0.9678267 0.9627557 0.9672014 0.9671998 
##         9        10        11        12        13        14        15        16 
## 0.9534257 0.9663808 0.9658398 0.9550599 0.9692075 0.9718626 0.9709057 0.9686598 
##        17        18        19        20        21        22        23        24 
## 0.9545126 0.9676685 0.9655820 0.9748589 0.9686058 0.9548316 0.9723529 0.9694268 
##        25        26        27        28        29        30        31        32 
## 0.9726152 0.9709567 0.9669962 0.9499296 0.9481041 0.9602659 0.9635807 0.9565267 
##        33        34        35        36        37        38        39        40 
## 0.9586946 0.9607876 0.7983976 0.9648567 0.9565015 0.9365144 0.9360204 0.9137131 
##        41        42        43        44        45        46        47        48 
## 0.9731442 0.9731849 0.9569632 0.9578733 0.9528853 0.9646728 0.9672083 0.9676412 
##        49        50 
## 0.9609552 0.9679784
InvSimp<-diversity(BCI,"invsimp")
InvSimp
##         1         2         3         4         5         6         7         8 
## 39.415554 31.584877 28.254778 35.225771 31.081658 26.849731 30.489077 30.487609 
##         9        10        11        12        13        14        15        16 
## 21.471056 29.744868 29.273803 22.251827 32.475442 35.539830 34.371014 31.907937 
##        17        18        19        20        21        22        23        24 
## 21.984098 30.929617 29.054548 39.775448 31.853042 22.139382 36.170213 32.708387 
##        25        26        27        28        29        30        31        32 
## 36.516636 34.431303 30.299530 19.971863 19.269343 25.167317 27.457940 23.002620 
##        33        34        35        36        37        38        39        40 
## 24.209883 25.502106  4.960258 28.454909 22.989309 15.751596 15.629977 11.589250 
##        41        42        43        44        45        46        47        48 
## 37.235945 37.292428 23.235938 23.737903 21.224806 28.306797 30.495526 30.903463 
##        49        50 
## 25.611603 31.228916
Es<-InvSimp/specnumber(BCI)
Es
##          1          2          3          4          5          6          7 
## 0.42382316 0.37601044 0.31394198 0.37474225 0.30773918 0.31587919 0.37181801 
##          8          9         10         11         12         13         14 
## 0.34645010 0.23856729 0.31643477 0.33648049 0.26490271 0.34919830 0.36265132 
##         15         16         17         18         19         20         21 
## 0.36958080 0.34309609 0.23638815 0.34752379 0.26655549 0.39775448 0.32174790 
##         22         23         24         25         26         27         28 
## 0.24328991 0.36535568 0.34429881 0.34777748 0.37836597 0.30605585 0.23496309 
##         29         30         31         32         33         34         35 
## 0.22406212 0.25945688 0.35659662 0.26139341 0.28151027 0.27719680 0.05976214 
##         36         37         38         39         40         41         42 
## 0.30929249 0.26124214 0.19209264 0.18607116 0.14486563 0.36505828 0.42864860 
##         43         44         45         46         47         48         49 
## 0.27018532 0.29306053 0.26203464 0.32914881 0.29897574 0.33959850 0.28144618 
##         50 
## 0.33579479

20.2.1.2 Shannon’s index

Shannon’s index (sometimes called the Shannon-Wienner index) is a commonly used diversity index that is rather similar in practice to Simpson’s index. Unfortunately very few users of the index have an intuitive idea of it’s meaning, as it is based on some rather obscure information theory.

\(H=-\sum p.log(p)\)

The values that H can take tend to vary between 0.8 and 4. Values above 2.5 are high, but there is no simple rule. Fortunately Shannon’s index can easily be converted into an index of equitability (or evenness) in a rather similar way to Simpson’s by dividing it by the maximum value that it could take, which in this case is log(S). The index downweights rare species rather more than does Simpson’s index.

The two indices basically measure the same thing. Simpson’s is much simpler to understand and can be used by default. Shannon’s is used by tradition as it was thought to be the better measure for many years. Shannon’s is therefore always worth calculating and reporting for comparative purposes.

Simpson’s and Shannon’s indices are always highly correlated. An empirical correlation between species richness per se and evenness also tends to occur, but this is not automatic.

H<-diversity(BCI,"shannon")
Eh<-H/log(specnumber(BCI))
div<-data.frame(S,Simp,InvSimp,H,Es,Eh)
source("https://tinyurl.com/aqm-data/QEScript")
Xpairs(div)

The main reason that quantitative ecologists calculte diversity indices in most cases is to relate them to environmental variables. The two components of diversity are species richness and equitability (evenness). Either or both of these can take be modelled as functions of environmental variables. A significant reduction in evenness can, in appropriate circumstances, be taken as a symptom of environmental degradation. Low evenness scores can be the result of a community being dominated by a few species that are tolerant to, or favoured by, anthropogenic pollution or disturbance. However scores must be placed in context. The size of the site (quadrat or soil core) can have a great influence on diversity indices and this must always be taken into account.

20.2.2 Exercises

20.2.2.1 Mexican trees

The first exercise involves data taken from three forest types in Mexico. The abundances are counts of individual trees. What differences are there in species diversity between the forest types?

#mexveg<- read.csv("http://tinyurl.com/QEcol2013/mexveg.csv")
#mexmat<-mexveg[,-c(1,2)]