Chapter 1 Introduction

In this unit you will learn to apply a range of statistical methods that are more advanced than those typically taught in an introductory course in data analysis. However, the unit aims to provide more than a simple set of recipes for applying methods. Advances in computing power have resulted in an explosion of new methods for analysing ecological data. It would be impossible to design a course that includes examples of all the methods that might be relevant to the analysis of a data set forming the subject of a master’s dissertation. Data analysis does not simply involve applying statistical methods. Good data management and effective data manipulation are just as important as good analysis. So an important element of the course will focus on understanding the nature of data and the ways data can be manipulated.

1.1 The new statistics

The underlying philosophy of the course is based on a contemporary concept of good statistical practice. This has sometimes been called “the new statistics” (Hobbs and Hilborn 2006).

The aim of the new statistics is to evaluate the relative strength of evidence in data for hypotheses represented as models. Traditionally, models used by ecologists for statistical inference have been limited to a relatively small set of linear forms. The functional forms and definitions of parameters in these models were chosen for statistical reasons; that is, they were not constructed to explicitly symbolize biological states and processes. Consequently, composing models to represent hypotheses has traditionally played a relatively minor role in developing testable statements by most ecological researchers. Statistical models were used to represent verbal hypotheses, and little thought was applied to the model building that ultimately supported inference.

The new statistics require deliberate, thoughtful specification of models to represent competing ecological hypotheses.
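To make this concrete, two competing hypotheses can each be written down as an explicit model and the relative support for each compared directly. Below is a minimal sketch in R, with an invented data frame `df` and invented variable names, using AIC as one common measure of relative evidence (the new statistics literature discusses several such measures):

```r
# Two competing hypotheses about species richness, expressed as models.
# The data frame 'df' and its variables are invented for illustration.
m1 <- lm(richness ~ rainfall, data = df)            # H1: richness responds to rainfall only
m2 <- lm(richness ~ rainfall + grazing, data = df)  # H2: grazing intensity also matters

# Compare the relative support for the two models, rather than testing a null.
AIC(m1, m2)
```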

1.2 Statistical models vs statistical tests

“New statistics” has developed not just as a response to advances in computing power, although that has played an important role. A fundamental element of contemporary thinking on the use of statistics is the avoidance, or at least the downplaying, of null hypothesis significance **tests** (NHST). This can be confusing for students who have taken introductory courses in statistics in which the words “test” and “p-value” were constantly emphasised. Many students (and many researchers) may be unaware that the whole basis of null hypothesis testing was in fact controversial from the outset. Many prominent statisticians have argued against NHST (Nickerson 2000). Some of the criticisms that can be found in Nickerson’s excellent review have been severe. For example: “Null hypothesis testing provides researchers with no incentive to specify either their own research hypotheses or competing hypotheses…. it is surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students” (Gigerenzer 1998). The biggest problem with NHST is that it does not actually test a hypothesis of genuine interest. Some of the other arguments against NHST made by statisticians are quite technical in nature, although they are well worth trying to understand. A reasonably non-technical review is provided by Goodman (2008), in which twelve common misconceptions are laid out clearly. A very influential and highly cited paper that restates many of the statisticians’ criticisms of NHST in the context of ecology is Johnson (1999). I won’t repeat all the arguments made by Johnson here; the paper is well worth reading.

1.3 Bayesian vs frequentist approaches to inference

It is my belief (and the word belief is important in this context) that all students should be aware that a vigorous debate arose between two schools of thought regarding the nature of statistical inference at the time that many of the conventional, textbook statistical tests were being developed. It is fascinating to read the words of R. A. Fisher in “The Design of Experiments”, first published in 1935, where he stated that “I shall not assume the truth of Bayes’ axiom”. What is most remarkable about that statement is that Bayes’ axiom is a simple mathematical consequence of applying the rules of probability; there is no controversy whatsoever regarding its mathematical validity (the theorem is stated for reference at the end of this section). Fisher was aware that implementing what was then known as “inverse probability” would be mathematically infeasible for the sort of problems he was working on at the time. The integrals needed to calculate marginal likelihoods are horrendously complex even in very simple cases involving continuous distributions, and are totally intractable for most designs. However, his main objection to the use of Bayes’ theorem was that it appeared to him to introduce an element of subjectivity, and thus did not allow researchers to reach clear conclusions. It is unfortunate that this led to what some historians of scientific thought have described as a “holy war” between the two schools. The unfortunate consequence was that efforts to remove subjective prior beliefs from analyses led to the conventional analytical device of NHST, in which all prior information is ignored. This was never intended by Fisher himself; it occurred almost by accident, as a result of the development of NHST by Neyman and Pearson. Fisher originally intended p-values as indications that further research was worth conducting, not as decision rules regarding the truth of either a null hypothesis or any alternative hypothesis. However, for a while this was overlooked in statistics courses that emphasised the logic and mathematical basis of NHST without placing the null in the context of any field of study.

Students told to state a null hypothesis of the form \(H_0\): “There is no difference in species richness between heathland and grassland” would be quite correct in pointing out its illogical and unhelpful nature. We already know a priori that there must be some difference, so why would we ever test this? The answer is that we clearly should not. In an applied setting we would aim to develop a much richer and more informative analysis of the difference between the two vegetation types, conditional on the data available. There is no need to place subjective prior probabilities on the size of the difference in order to do this, but to ignore all relevant knowledge regarding the two systems would clearly be quite wrong. Finding evidence against \(H_0\) is only worth stating as an objective of a study if \(H_0\) is inherently credible. This can be reasonable in the context of a well planned experiment, but in all other cases there will be much more information to be obtained from the data when testing \(H_0\) is not considered to be an objective of the analysis.
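For reference, Bayes’ theorem (the “axiom” Fisher declined to assume) follows in one step from the definition of conditional probability. For a hypothesis \(H\) and data \(D\):

\[
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
\]

The computational difficulty Fisher was aware of lies in the denominator, the marginal likelihood. For a model with a continuous parameter \(\theta\) and prior \(p(\theta)\) it requires the integral

\[
P(D) = \int P(D \mid \theta)\, p(\theta)\, d\theta
\]

which has no closed form for most realistic models, and only became routinely computable with the arrival of modern simulation methods.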

1.4 Using p-values with discretion

In the quantitative and spatial analysis unit I introduced a pragmatic approach towards NHST that will continue to be used on this unit. This approach emphasises confidence intervals as the most important result of applying any technique based on frequentist inference. It accepts the use of p-values associated with null hypothesis tests, but only as the “statistic of last resort”. The numerical values of confidence intervals match those produced by Bayesian methods with non-informative priors in many cases, so confidence intervals can informally be interpreted as measures of uncertainty regarding the parameters of interest, even if formally they are not. Under this approach, the most important result of any regression analysis is the slope of the line, together with its confidence interval. The next most valuable output is the coefficient of determination (R squared) as a measure of the signal to scatter ratio. If all else fails, a p-value can be reported, with due discretion and care over the wording of the interpretation, as a “test” of any detectable association between variables conditional on the data obtained. Although power analysis is highly advisable when planning any study, in order to assess whether the level of replication is likely to be sufficient to achieve statistical significance, when adopting this pragmatic approach you should not set out with the initial intention of “testing” a null hypothesis. Instead you should aim to estimate effect sizes. The most important element of any study is the scientific question that is of interest. Many questions can only be answered by estimating the size of effects and/or the shape and form of any response. Simply detecting a difference, an association or a correlation does not usually answer the question fully. We should, and usually can, do much better than that. This implies that non-parametric methods will not be taught on this course, although it is still worth being aware of them as potential fall-back methods of last resort when all else fails. Non-parametric tests are easy to run and understand in R; the code to run them is provided in this crib sheet for reference, and the sketch below illustrates the order of priorities.
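A minimal sketch of this order of priorities in R. The data frame `df` and its variables `richness` and `rainfall` are invented for illustration; the point is the order in which the results are read, not the particular model:

```r
# Hypothetical example: species richness as a function of rainfall.
mod <- lm(richness ~ rainfall, data = df)

# 1. The headline result: the estimated slope and its 95% confidence interval.
coef(mod)["rainfall"]
confint(mod, "rainfall", level = 0.95)

# 2. The signal to scatter ratio: the coefficient of determination.
summary(mod)$r.squared

# 3. Only as a last resort: the p-value for the null of no association.
summary(mod)$coefficients["rainfall", "Pr(>|t|)"]

# A non-parametric fall-back if all else fails, e.g. a rank correlation test.
cor.test(df$richness, df$rainfall, method = "spearman")
```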

Always follow the advice of your supervisor with regard to the presentation of p-values. Some supervisors will insist on p-values being reported for all analyses. As they are included in the output of all inferential methods, it is simple enough to include them in a report, even when they are not particularly informative. Just make sure that confidence intervals are also provided, and used when answering questions regarding effect sizes and meaningful comparisons.

1.5 Common pitfalls

Some of the advice provided in the literature regarding the validity of statistical tests when assumptions are not met is potentially misleading. In non-experimental studies the assumptions of almost any inferential method are rarely met in full, so technically almost all statistical methods might be considered “invalid”. In reality it is not the violation of an assumption that matters; it is the degree to which an assumption is violated, and the influence that any violation may have on the validity of the substantive findings, that is really important. If an analysis produces a p-value of < 0.0001 using a method that involves minor violations of assumptions, almost any correction will still lead to rejection of the null hypothesis, so a conclusion based only on NHST will not change at all. On the other hand, confidence intervals may have to be broadened when the violations are corrected for, so a modification will take place. This is a powerful additional argument against emphasising NHST alone (naked p-values) that is not made clear in all the papers cited here.

Many students have learned to test “the data” for normality. This is poor advice, for many reasons, not least of which is that students often test the wrong element of the data. The assumption of normality in regression analysis applies to the residuals, not to the raw data (Zuur, Ieno, and Elphick 2010). Rigorous diagnostics are a very important part of building statistical models, but these are best conducted through careful inspection of the patterns shown in the residuals rather than through automated statistical tests. A statistical test of normality does not actually test the data anyway; in reality, it tests whether the data could have been drawn from an underlying normal distribution. The only way to “test” the data themselves is to look at them carefully. As suggested by Steel et al. (2013), plot the data early and often. When looking at observational data that were not derived from a conventional experimental design, you should adopt an incremental approach to choosing an analysis. There is nothing to prevent you from using the “wrong” models as initial tools for understanding the data and (hopefully) finally choosing a better, more appropriate model as the analysis progresses.
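A minimal sketch of residual diagnostics in R, reusing the hypothetical model `mod` and data frame `df` from the earlier sketch. The built-in `plot` method for fitted models produces the standard diagnostic plots; if a formal normality test is wanted at all, it should be applied to the residuals:

```r
# Diagnostics belong on the residuals of the fitted model, not the raw response.
par(mfrow = c(2, 2))
plot(mod)  # residuals vs fitted, normal Q-Q, scale-location, leverage

# If a formal test is insisted upon, apply it to the residuals, not "the data".
shapiro.test(residuals(mod))

# And always plot the data themselves, early and often (Steel et al. 2013).
plot(richness ~ rainfall, data = df)
```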

References

Hobbs, N. T., and R. Hilborn. 2006. “Alternatives to Statistical Hypothesis Testing in Ecology: A Guide to Self Teaching.” Ecological Applications 16 (1): 5–19. doi:10.1890/04-0645.

Nickerson, Raymond S. 2000. “Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy.” Psychological Methods 5 (2): 241–301. doi:10.1037//1082-989X.5.2.241.

Gigerenzer, G. 1998. “Surrogates for Theories.” Theory & Psychology 8 (2): 195–204. doi:10.1177/0959354398082006.

Goodman, Steven. 2008. “A Dirty Dozen: Twelve P-Value Misconceptions.” Seminars in Hematology 45 (3): 135–40. doi:10.1053/j.seminhematol.2008.04.003.

Johnson, Douglas H. 1999. “The Insignificance of Statistical Significance Testing.” The Journal of Wildlife Management 63 (3): 763. doi:10.2307/3802789.

Zuur, Alain F., Elena N. Ieno, and Chris S. Elphick. 2010. “A Protocol for Data Exploration to Avoid Common Statistical Problems.” Methods in Ecology and Evolution 1 (1): 3–14. doi:10.1111/j.2041-210X.2009.00001.x.

Steel, E. Ashley, Maureen C. Kennedy, Patrick G. Cunningham, and John S. Stanovick. 2013. “Applied Statistics in Ecology: Common Pitfalls and Simple Solutions.” Ecosphere 4 (9): 1–13. doi:10.1890/ES13-00160.1.