Chapter 14 Sleep data analysis
library(tidyverse)
library(aqm)
library(plotly)
14.1 Read in the data
data(sleep)
<-sleep d
If you read in the data using read_csv you would find that the Species and Diet variables are considered to be of the class “character.” This behaviour differs from the older read.csv function.
For most analyses it is preferable to set the variables to be factors. The code below does this.
$Diet<- as.factor(d$Diet)
d$Species<- as.factor(d$Species) d
Now we already know that these variables need log transforming.
$LogBrainWt <- log10(d$BrainWt)
d$LogBodyWt <- log10(d$BodyWt) d
14.2 Using ggplot for visualising regression
<- ggplot(d, aes(x=LogBodyWt, d$LogBrainWt, label=Species)) # set up the aesthetics
g0 <- g0 + geom_point() + geom_smooth(method="lm") # Add the geometries
g1 g1
We can label the points, but this is going to look very messy.
+ geom_label() g1
14.3 Using plotly for interactive data exploration
Although this cannot be used in a publication, interactive data exploration using plotly is the easiest way to solve this problem when you are exploring the data yourself.
ggplotly(g1)
14.4 Fitting the regression
We can fit the regression in the usual manner.
<- lm(data=d, LogBrainWt~LogBodyWt) mod
Note the warning about not being interested in elements such as p-values. Although we can ask R for a summary of the model we have no interest in the significance. The \(R^2\) may be of some limited interest. However the real reason for fitting the model is to obtain the residuals in this case.
summary(mod)
##
## Call:
## lm(formula = LogBrainWt ~ LogBodyWt, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74953 -0.22133 -0.02063 0.18531 0.82617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.92489 0.04670 19.80 <2e-16 ***
## LogBodyWt 0.76407 0.03153 24.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3168 on 51 degrees of freedom
## Multiple R-squared: 0.9201, Adjusted R-squared: 0.9185
## F-statistic: 587.2 on 1 and 51 DF, p-value: < 2.2e-16
14.5 Obtaining the residuals
The residuals can be used in additional analyses as an indicator of brain weight after controlling for body weight. Note that they remain on a logarithmic scale.
$brain_index <- residuals(mod) d
14.6 Some tricks to plot the residuals
The fct_reorder function that is part of the tidyverse can be handy for reordering a factor.
$Species <- fct_reorder(d$Species, d$brain_index, min) d
Now we can get a rather “busy” but useful visualisation of the residuals. Notice the use of geom_col to produce columns. I.e. we get a barplot. The bars can be coloured by diet for added effect. The default colours are not very nice and can be changed quite easily, but I will leave it there for now.
<-ggplot(d, aes(x=Species,y=brain_index, label=Species, fill=Diet)) + geom_col() + coord_flip()
g1 g1