Basic statistical concepts. Refresher using a simple example

Introduction

It may have been a while since you last learned about statistics. Statistical software now makes running the calculations themselves very easy. Although R may seem rather challenging at first, it is much faster to use R than to carry out statistics “by hand” or with non-dedicated software such as Excel. However, the problem with using R for statistics is that many students concentrate on learning the software rather than on remembering the concepts. In this class we will refresh the most basic ideas in statistics without using any software. We will then see how R can make life easier.

Getting some data

Let’s look at the reaction times on a simple test for the students in the class. Click on the link below. Run the test. After a “burn in” period to get used to the idea, record your own time in ten trials.

Action: Get some personal data

Click on the link: https://www.humanbenchmark.com/tests/reactiontime/
Try your reactions three times as a “burn in”.
Record ten trials and write down the times on a piece of paper.
Calculate your mean reaction time using a calculator, a piece of paper or any other method. Hint: the mean is the sum of all the times divided by the number of trials.
Write your own result on the whiteboard.
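Once you have done the calculation by hand, it can be checked in R. This is a minimal sketch: the ten times below are hypothetical values in milliseconds, not real results, so replace them with your own.

```r
# Hypothetical reaction times (ms) from ten trials; replace with your own
my_times <- c(312, 287, 295, 340, 301, 279, 318, 296, 305, 290)

# The mean "by hand": the sum of all the times divided by the number of trials
my_mean <- sum(my_times) / length(my_times)
my_mean          # 302.3 for these hypothetical times

# The built-in function gives the same answer
mean(my_times)
```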

OK, we now have some data from the class. Now we will go a step further.

Action: Summarise the class data

Find the maximum.
Find the minimum.
Find the median reaction time. To do this, place the results in descending order and find the number in the middle of the list. If there is an even number of results, take the average of the two middle numbers (the two numbers at an equal distance from the top and the bottom).
Find the mean reaction time for the class.
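The same four summaries can be sketched in R. The class results below are hypothetical (one mean reaction time per student, in milliseconds) and should be replaced with the numbers on the board. The median is worked out by hand, following the steps in the list, and then checked against the built-in `median()`.

```r
# Hypothetical class results: one mean reaction time (ms) per student
class_means <- c(302, 275, 341, 298, 320, 288, 310, 265)

max(class_means)   # the maximum
min(class_means)   # the minimum

# The median "by hand": sort in descending order and take the middle value,
# or the average of the two middle values when the number of results is even
sorted <- sort(class_means, decreasing = TRUE)
n <- length(sorted)
if (n %% 2 == 1) {
  median_by_hand <- sorted[(n + 1) / 2]
} else {
  median_by_hand <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
}
median_by_hand
median(class_means)  # the built-in function agrees

mean(class_means)    # the mean for the class
```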

These are basic descriptive statistics.

Inferential statistics

This part of the exercise is more subtle. We will spend a lot of time during the classes discussing the nature of statistical inference and the importance of testing assumptions. For the moment, let’s just do the maths.

Action: Calculate the standard deviation “by hand”

Subtract the mean reaction time from each observation to produce a table of differences from the mean. Some will be positive, others negative.
Try adding all these differences together. What do you get? Why?
Calculate the square of each difference.
Add together all the squared differences to find the sum of the squares.
Divide the sum of the squares by n - 1 (where n is the number of observations).
Take the square root of this number.
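Written as a single formula, the steps above give the sample standard deviation s. Here x_1, ..., x_n are the n observations and x̄ is their mean; the notation is introduced for this summary and is not used elsewhere in these notes.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```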

Just out of interest (you don’t have to follow this yet), the following R code does this operation “by hand”:

data <- c(440, 340, 350, 456, 470)  # Five observations of reaction time (ms)

x <- data - mean(data)             # Subtract the mean from each observation
x <- x^2                           # Square the differences
x <- sum(x) / (length(data) - 1)   # Divide the sum of squares by n - 1
sqrt(x)                            # Take the square root

## [1] 61.45893

And again, just out of interest, statistical software does all this (and a lot more) with a simple function.

sd(data)

## [1] 61.45893

Calculate the standard error of the mean

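The standard error of the mean is the standard deviation divided by the square root of the sample size. An approximate 95% confidence interval for the mean runs from two standard errors below the mean to two standard errors above it. (The factor of two is a rounded value; for small samples, such as ten trials, the correct multiplier is somewhat larger.)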

Action: Calculate the standard error and 95% confidence interval for the mean

Divide the standard deviation by the square root of the sample size.
Multiply this by two.
Provide a range by adding this number to the mean and subtracting it from the mean.
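The three steps above can be sketched in R, reusing the five reaction times from the standard deviation example. This is a minimal sketch of the rough “mean plus or minus two standard errors” rule described in the text, not a full treatment of confidence intervals.

```r
# The same five reaction times (ms) used in the standard deviation example
data <- c(440, 340, 350, 456, 470)

n <- length(data)
se <- sd(data) / sqrt(n)      # standard error: SD divided by the square root of n

# Approximate 95% confidence interval: the mean plus or minus two standard errors
lower <- mean(data) - 2 * se
upper <- mean(data) + 2 * se
c(lower = lower, upper = upper)
```

For a sample this small the multiplier of 2 is only a rough approximation; `qt(0.975, df = n - 1)` returns the multiplier from the t distribution, which is larger than 2 here.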

Test whether the statistics “work”

Action: Rinse and repeat.

Conduct the experiment again. This time just calculate the mean.
Does the new mean fall inside the confidence interval you calculated?
Discuss the results.
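The “rinse and repeat” test can also be simulated many times over. This is a hypothetical sketch: it assumes that reaction times are normally distributed with a mean of 300 ms and a standard deviation of 40 ms (invented values, not class data), builds the approximate interval from a first set of ten trials, and checks whether the mean of a second set of ten trials falls inside it.

```r
# Assumption: reaction times ~ Normal(mean = 300 ms, sd = 40 ms); values are hypothetical
set.seed(1)
n_trials <- 10
n_repeats <- 1000
inside <- logical(n_repeats)

for (i in seq_len(n_repeats)) {
  first <- rnorm(n_trials, mean = 300, sd = 40)       # first experiment
  se <- sd(first) / sqrt(n_trials)                     # its standard error
  lower <- mean(first) - 2 * se                        # approximate 95% interval
  upper <- mean(first) + 2 * se
  second_mean <- mean(rnorm(n_trials, mean = 300, sd = 40))  # repeat: mean only
  inside[i] <- second_mean >= lower && second_mean <= upper
}

mean(inside)  # proportion of repeats whose mean fell inside the first interval
```

The proportion printed is worth comparing with 95% as part of the discussion.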

Conclusion

There is a lot to discuss here. We will look at the nature of inferential statistics in more detail as we go on. Practise calculating standard deviations and standard errors “by hand” in order to get a feel for the process. Once you understand the calculations, you can leave the rest to the computer. However, you will have to understand the assumptions involved in calculating a standard error in order to apply statistics correctly. We will go through this carefully later.