Convex hulls have been suggested as a means of estimating a species' extent of occurrence for red listing purposes. However, convex hulls tend to suffer from artefacts due to outlying observations. They may also include large areas of completely unsuitable habitat, including water.
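The sensitivity of a convex hull to outlying observations can be illustrated with a short sketch. The Python below is an illustration only, not the code used to produce this report. It assumes planar coordinates (for example kilometres in a projected system), and the function names `convex_hull` and `polygon_area` are hypothetical. The hull is built with Andrew's monotone chain algorithm and its area is measured with the shoelace formula.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula (planar units squared)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


# Toy data: a unit square of records, then the same records plus one outlier.
core = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(polygon_area(convex_hull(core)))               # 1.0
print(polygon_area(convex_hull(core + [(5, 0.5)])))  # 3.0
```

In this toy example a single outlying record triples the estimated area.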
A concave hull is likely to be a much better way to measure the species' absolute range size. This model can be fitted to the data points by extending a buffer (100 km) around them in order to unify all those falling within the range, and then dissolving the buffer in order to remove it. This leaves polygons that fit the shape of the distribution and do not include non-terrestrial areas.
The analysis that has been run here only uses points within the Mesoamerican region. If the figure below shows points extending beyond this region, then the species' niche will only be partly typified. In these cases the alternative analyses run at the continental scale should be consulted.
Background points for modelling have been extracted from the terrestrial area within a 100 km buffer around the convex hull.
The figure below shows the standard Walter and Lieth climate diagram for the mean values of precipitation and temperature extracted from the species presence points.
The Walter and Lieth diagram assumes that the growing season occurs when rainfall is over 100 mm.
A more refined method is to extract the values from a bucket model that keeps track of input to the soil profile through precipitation and reductions in soil moisture through evapotranspiration over the course of the year. This can be compared to changes in NDVI at the collection points.
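The idea of a bucket model can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the model used for this report: the function name `bucket_model`, the storage capacity of 150 mm, the monthly time step and the rule that actual evapotranspiration is scaled by the relative wetness of the soil are all hypothetical choices.

```python
def bucket_model(precip, pet, capacity=150.0, spin_up_years=3):
    """Monthly soil water content (mm) from a single-layer bucket.

    precip, pet: twelve monthly totals (mm) of precipitation and potential
    evapotranspiration. capacity is an assumed maximum storage (mm).
    """
    swc = capacity  # start with a full bucket (assumption)
    series = []
    for _ in range(spin_up_years):  # repeat the year so the start value matters less
        series = []
        for p, e in zip(precip, pet):
            water = swc + p                       # add the month's rainfall
            wetness = min(water / capacity, 1.0)  # relative wetness, capped at one
            aet = min(e * wetness, water)         # losses slow down as the soil dries
            swc = min(water - aet, capacity)      # any excess leaves as runoff or drainage
            series.append(swc)
    return series
```

Dividing the returned series by `capacity` gives the proportion of available soil moisture for each month.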
In both cases the data are normalised to take values between zero and one. Soil moisture may fall below its maximum values without having an effect on NDVI.
An index of seasonality has been calculated as the percentage reduction in the value at the lowest point of the curve.
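As a hedged illustration of the index, the sketch below assumes that each curve is normalised by dividing by its maximum, so that the peak equals one, and that the reduction is measured relative to that peak. Both the scaling and the function names are assumptions made for the example. The series is assumed to be non-negative.

```python
def normalise(values):
    """Scale a non-negative series by its maximum so the peak equals one."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("series must have a positive maximum")
    return [v / peak for v in values]


def seasonality_index(values):
    """Percentage reduction from the peak at the lowest point of the curve."""
    return 100.0 * (1.0 - min(normalise(values)))


# Hypothetical monthly soil moisture (mm) with a pronounced dry season.
print(seasonality_index([100, 50, 25, 100]))  # 75.0
```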
In some cases median NDVI will remain fairly constant, even when the balance model shows that soil water content (SWC) is lowered for part of the year. Provided that SWC remains above 50% of its maximum level, the vegetation should not experience a great deal of hydric stress. NDVI values can be highly variable as they are affected by land use. The general trend should nevertheless be comparable to that shown by modelled soil moisture.
The table below shows the range of values measured for key climatic variables at the collection points. A wide range of values may suggest some erroneous points. Distribution modelling is not sensitive to the presence of a few outliers, but the results may be distorted if the data contain many erroneous points.
Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|---|
Longitude | -113.00 | -109.00 | -99.40 | -103.00 | -97.20 | -92.60 |
Latitude | 16.40 | 18.20 | 20.90 | 21.80 | 24.90 | 30.10 |
Elevation | -4.00 | 122.00 | 663.00 | 773.00 | 1250.00 | 2540.00 |
Annual precipitation | 46.00 | 363.00 | 489.00 | 527.00 | 633.00 | 2190.00 |
Mean annual temperature | 14.40 | 20.90 | 23.10 | 22.30 | 24.50 | 26.70 |
Annual temperature range | 14.20 | 22.10 | 23.30 | 24.10 | 25.80 | 36.90 |
Total annual actual evapotranspiration | 45.60 | 363.00 | 491.00 | 508.00 | 633.00 | 1310.00 |
Minimum proportion of available soil moisture | 0.01 | 0.08 | 0.13 | 0.14 | 0.17 | 0.47 |
It is easy to fit convincing models that apparently predict observed species distributions closely using machine learning algorithms such as RandomForest or Maxent. However, on closer inspection the response surfaces that are being used for prediction are often lacking in biological realism. This occurs due to overfitting of models to data derived from a partial exploration of a species' abiotic niche. This effect may be attributable to insufficient data leading to observed disjunction in the species' range when in reality the species is found across a wider area. In other cases the species' distribution may be limited by barriers to dispersal or stochastic effects that led to aggregation in some areas. Deforestation may also have removed habitat from the centre of the species' range. Any of these effects can be spotted as multimodality of the niche space.
The following diagrams show kernel densities on two synthetic climate axes (total annual rainfall and mean annual temperature). If there are signs of multimodality this may indicate that the species has not fully explored its climate niche, or that there are disjunct populations with differing characteristics. The method will not show clear results for species with few collection points.
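A rough way of checking a single climate axis for multimodality is sketched below. It is an illustration rather than the method behind the diagrams: it assumes a Gaussian kernel, a bandwidth from the normal reference rule (1.06 times the standard deviation times n to the power of -1/5) and a simple count of local maxima on an evenly spaced grid. The function names are hypothetical.

```python
import math
import statistics


def gaussian_kde(sample, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid value."""
    n = len(sample)
    if bandwidth is None:
        # Normal reference rule; an assumed default, not the report's setting.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
        for x in grid
    ]


def count_modes(density):
    """Count interior local maxima of a density evaluated on a grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    )


# Hypothetical annual rainfall values (mm) from two distinct climate groups.
rain = [400.0 + 10 * i for i in range(21)] + [1800.0 + 10 * i for i in range(21)]
grid = [float(x) for x in range(0, 2401, 10)]
print(count_modes(gaussian_kde(rain, grid)))
```

Because the bandwidth grows with the spread of the sample, two groups need to be well separated before a second mode appears, which is in line with the caution about smoothing.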
The same analysis can be run to look at spatial clustering. The kernel densities are smoothed, so they will only suggest multimodality if the points are very highly clustered.
The significance of any spatial clusters can be checked using the silhouette width method. The width is calculated for values of k between 2 and 5. If any of the widths are higher than 0.52, the analysis will produce a diagram showing the clusters.
## [1] "The collections form 2 spatial clusters"
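The silhouette check can be sketched as follows. This is a minimal illustration under several assumptions: the clustering algorithm is a plain k-means with deterministic farthest-point seeding (the report does not state which algorithm produced the clusters), distances are Euclidean on the raw coordinates, and the k with the greatest mean width is reported when it exceeds 0.52. All function names are hypothetical.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute zero."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_cluster_count(points, threshold=0.52, k_values=range(2, 6)):
    """Return (k, widths); k is None when no width exceeds the threshold."""
    widths = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    k = max(widths, key=widths.get)
    return (k if widths[k] > threshold else None), widths
```

Note that longitude and latitude are treated here as planar coordinates, which is only a rough approximation over large areas.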
If there is evidence that the points fall into at least 2 groups, but fewer than 6, we can look at whether there are significant differences in variability between and within groups in the climatic conditions at the sites using ANOSIM. This is a sensitive test, as would be MANOVA, so there will often be significant differences. They should only be interpreted as important if R is much larger than 0.3.
##
## Call:
## anosim(dat = dis, grouping = fit$cluster, permutations = 100)
## Dissimilarity: euclidean
##
## ANOSIM statistic R: 0.867
## Significance: 0.0099
##
## Based on 100 permutations
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0126 0.0194 0.0228 0.0343
##
## Dissimilarity ranks between and within classes:
## 0% 25% 50% 75% 100% N
## Between 10518 27311 33874 40038 46360 22554
## 1 85 6406 10962 17404 37319 7875
## 2 85 5627 12455 18742 45773 15931
## Analysis of Variance Table
##
## Response: vars$mtemp
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(fit$cluster) 1 348 348 65.5 1.4e-14 ***
## Residuals 303 1608 5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
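For readers unfamiliar with the ANOSIM statistic reported above, the sketch below shows how R and its permutation significance can be computed. It is an illustration only, not the implementation that produced the output: it assumes Euclidean dissimilarities, average ranks for ties, and a significance of (number of permuted R values at least as large as the observed one + 1) / (permutations + 1). The function names and the fixed random seed are hypothetical. At least two groups are needed, and at least one group must hold two or more sites.

```python
import itertools
import math
import random


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim(points, groups, permutations=100, seed=1):
    """Return (R, significance) for climate vectors grouped by cluster."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    ranks = average_ranks([math.dist(points[i], points[j]) for i, j in pairs])

    def r_statistic(labels):
        within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
        between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
        # Difference in mean rank, scaled so that R lies between -1 and 1.
        return (sum(between) / len(between) - sum(within) / len(within)) / (
            len(pairs) / 2
        )

    observed = r_statistic(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between sites and groups
        if r_statistic(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With 100 permutations the smallest attainable significance is 1/101, which is approximately the 0.0099 shown in the output.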
A simple model uses mean temperature, temperature range and annual precipitation. The data are binary responses, so a GAM of the binomial family would normally be used. However, as there is no interest in evaluating the statistical significance of a model fitted to pseudo-absence data (the sample size is arbitrary), a Gaussian model is used in order to simplify interpretation. Tests show that predictive output from the two models is usually indistinguishable.
The output should consist of monotonic or unimodal responses. If these are not observed it indicates potential problems with the model.
One of the reasons for high AUC values in the literature is the use of random subsets of data taken from within the species' known range. Model evaluation using this method always erroneously suggests good discrimination due to spatial autocorrelation. This effect is not removed by trying to control for spatial autocorrelation by taking fewer points. All available points should be used when fitting models. The remedy is to use more rigorous tests to evaluate model performance.
The default colouring in R may de-emphasise differences in some cases.
Ideally, truly independent data should be used for model evaluation. One simple way of testing a model without independent data is to split the values spatially. A model using only the eastern side of a species' range is used to predict the western side, and vice versa. This usually reveals more weaknesses in the model's predictive ability. AUC values over 0.8 show that the model is very useful as a tool for prediction. Values between 0.6 and 0.8 suggest that the model is using climatic variables to narrow the species' range to some degree. If values are below 0.6 then it may be better to use a purely spatial model to estimate the species' distribution.
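The evaluation logic described above can be sketched as follows. This is a hedged illustration: the AUC is computed as the proportion of presence and background pairs in which the presence point receives the higher score (ties count half), and the split is made at the median longitude, which is an assumption because the report does not state where the range is divided. Model fitting itself is left out, and all function names are hypothetical.

```python
import statistics


def auc(presence_scores, background_scores):
    """Chance that a random presence outscores a random background point."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def split_east_west(records, lon_index=0):
    """Split records at the median longitude (assumed dividing line)."""
    cut = statistics.median(r[lon_index] for r in records)
    west = [r for r in records if r[lon_index] < cut]
    east = [r for r in records if r[lon_index] >= cut]
    return west, east


def interpret_auc(value):
    """Apply the thresholds given in the text."""
    if value > 0.8:
        return "very useful as a tool for prediction"
    if value >= 0.6:
        return "climate narrows the range to some degree"
    return "consider a purely spatial model"


# Hypothetical scores predicted for the western half by a model fitted in the east.
score = auc([0.9, 0.3], [0.5, 0.1])
print(score, interpret_auc(score))  # 0.75 climate narrows the range to some degree
```

In practice the model would be fitted on one half returned by `split_east_west`, scored on the other half, and the process repeated in the opposite direction.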
A GAM using temperature range, mean temperature and the annual soil moisture dynamic as input may be more reliable, as the “bucket model” of soil moisture changes should represent patterns of hydric stress throughout the year. Temperature effects can be simplified to mean temperature and the range.
ROC analysis using random data split.
In most cases the more complex machine learning algorithms show better discrimination than simpler models when tested against a random subset of the data. This is to be expected given that they are designed to do this job well. However, ecological theory suggests that they are unlikely to be reliable predictors of unseen areas of a species' range due to overfitting to a spatially defined subset of autocorrelated data. In most cases, therefore, the predictions produced are either no better than those from simpler models or worse. In order to predict a species' distribution, spatial elements should normally be taken into account explicitly.
The diagnostic analysis is likely to have revealed some issues regarding the reliability of the model. These issues must be taken into account when applying model results to issues of conservation concern. The model is best regarded as a guide that may indicate the potential climatic limits to a species' distribution. The realised distribution may be determined by other factors, including biotic interactions and limitations to movement.