Chapter 3 Data types and objects
All objects in R have a given type. You already know most of them, as these types are also used in mathematics: integers, floating point numbers (or floats), matrices, etc. are all objects you are already familiar with. But R has other, maybe lesser known data types (that you can find in a lot of other programming languages) that you need to become familiar with. But first, we need to learn how to assign a value to a variable. This can be done in two ways, with <- or with =.
There is almost no difference between these two approaches. The distinction only matters, and <- is only required, in very specific situations that you will very likely never be confronted with.
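As a minimal sketch of the two assignment forms (the variable names a and b and the value 3 are placeholders, not taken from the text):

```r
# Assignment with the arrow operator
a <- 3

# Assignment with the equals sign
b = 3

# Both variables now hold the same value
identical(a, b)
## [1] TRUE
```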
Another thing you must know before going further is that you can convert from one type to another using functions that start with as., such as as.character(), as.numeric(), as.logical(), etc. For example, as.character(1) converts the number 1 to the character (or string) “1”.
There are also is.character(), is.numeric() and so on, which test whether the object is of the required class. These functions exist for each object type, and are very useful. Make sure you remember them!
3.1 The numeric class
To define single numbers, you can do the following:
The class() function allows you to check the class of an object:
## [1] "numeric"
Decimals are defined with the character .:
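The steps of this section can be sketched as follows. The values 3 and 3.14 are arbitrary examples, not necessarily the ones used in the original listing:

```r
# Define a single number and check its class
a <- 3
class(a)
## [1] "numeric"

# A decimal number is written with a dot and has the same class
a <- 3.14
class(a)
## [1] "numeric"
```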
3.2 The character class
Use " " to define characters (called strings in other programming languages):
## [1] "character"
A very nice package to work with characters is {stringr}, which is also part of the {tidyverse}.
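A short sketch of defining a string and checking its class; the text of the string is made up for illustration:

```r
# Anything between double quotes is a character object
a <- "this is a string"
class(a)
## [1] "character"

# A number between quotes is also a character, not a numeric
is.character("3")
## [1] TRUE
```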
3.3 The factor class
Factors look like characters, but are very different. They are the representation of categorical variables. A {tidyverse} package to work with factors is {forcats}. You would rarely use factor variables outside of datasets, so for now, it is enough to know that this class exists. We are going to manipulate factor variables in Chapter 5.
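To make the difference between characters and factors a little more concrete, here is a small sketch using base R's factor() function. The variable names and the categories are hypothetical:

```r
# A character vector of categories (made-up values)
sizes <- c("small", "large", "small", "medium")

# Convert it to a factor
sizes_factor <- factor(sizes)
class(sizes_factor)
## [1] "factor"

# Unlike a character vector, a factor stores a fixed set of levels
levels(sizes_factor)
## [1] "large"  "medium" "small"
```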
3.4 The Date class
Dates also look like characters, but are very different too:
## [1] "2019-03-19"
## [1] "Date"
Manipulating dates and time can be tricky, but thankfully there’s a {tidyverse} package for that, called {lubridate}. We are going to go over this package in Chapter 5.
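The output shown in this section can be reproduced with base R's as.Date(). The variable name my_date is an assumption, and so is the exact input format:

```r
# Parse a string into a Date; as.Date() understands "YYYY/MM/DD" and "YYYY-MM-DD"
my_date <- as.Date("2019/03/19")
my_date
## [1] "2019-03-19"
class(my_date)
## [1] "Date"

# Dates support arithmetic, which plain strings do not
my_date + 30
## [1] "2019-04-18"
```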
3.5 Vectors and matrices
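You can create a vector in different ways. But first of all, it is important to understand that a vector in most programming languages is nothing more than an ordered collection of things. These things can be numbers (either integers or floats) or strings, for example. In R, all the elements of a vector must share the same type; collections that mix types or contain other collections are lists (see section 3.7).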
3.5.1 The c() function
A very important function that allows you to build a vector is c(). For example, a <- c(1, 2, 3, 4, 5) creates a vector with elements 1, 2, 3, 4, 5. If you check its class with class(a):
## [1] "numeric"
This can be confusing: you were probably expecting a to be of class vector or something similar. This is not the case if you use c() to create the vector, because c() doesn’t build a vector in the mathematical sense (a matrix with a single row or a single column), but rather a plain sequence of numbers, which takes the class of its elements.
Checking its dimension with dim(a):
## NULL
returns NULL because such a sequence has a length but no dimension attribute. If you want to create a vector in the mathematical sense, with a number of rows and columns, you need to use cbind() or rbind().
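The behaviour described in this section can be checked directly. This sketch uses only base R functions:

```r
# Build a vector with c()
a <- c(1, 2, 3, 4, 5)

# Its class is the class of its elements
class(a)
## [1] "numeric"

# It has a length, but no dimension attribute
length(a)
## [1] 5
dim(a)
## NULL

# R does consider it a vector
is.vector(a)
## [1] TRUE
```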
3.5.2 cbind() and rbind()
You can create a vector with explicit dimensions (which R stores as a matrix) with cbind():
Check its class now:
## [1] "matrix"
This is exactly what we expected. Let’s check its dimension:
## [1] 1 5
This returns the dimension of a using the LICO notation (number of LInes first, then the number of COlumns).
It is also possible to bind vectors together to create a matrix.
Now let’s put vectors a and b into a matrix called matrix_c using rbind(). rbind() works the same way as cbind() but glues the vectors together by rows and not by columns.
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
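Here is a sketch that reproduces the output of this section. The definitions of a and b are inferred from the printed results, so treat them as assumptions. Note that in R 4.0.0 and later, class() of a matrix returns both "matrix" and "array", which is why the sketch uses is.matrix() instead:

```r
# Bind five numbers as columns: a matrix with 1 row and 5 columns
a <- cbind(1, 2, 3, 4, 5)
is.matrix(a)
## [1] TRUE
dim(a)
## [1] 1 5

# A second row vector
b <- cbind(6, 7, 8, 9, 10)

# Stack the two row vectors on top of each other
matrix_c <- rbind(a, b)
matrix_c
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
```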
3.5.3 The matrix class
R also has support for matrices. For example, you can create a matrix of dimension (5,5) filled with 0’s with the matrix() function:
If you want to create the following matrix:
\[ B = \left( \begin{array}{ccc} 2 & 4 & 3 \\ 1 & 5 & 7 \end{array} \right) \]
you would do it like this:
The option byrow = TRUE means that the matrix is filled row by row. Inside a function call, arguments are always set with =, not with <-.
You can access individual elements of matrix_a by giving the row and column indices between square brackets, for example matrix_a[2, 3]:
## [1] 0
and R returns its value, 0. We can assign a new value to this element if we want. Try assigning 7 to it with matrix_a[2, 3] <- 7,
and now take a look at matrix_a again.
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    7    0    0
## [3,]    0    0    0    0    0
## [4,]    0    0    0    0    0
## [5,]    0    0    0    0    0
Recall our vector b:
To access its third element, you can simply write:
## [1] 8
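The matrix operations of this section, put together in one sketch. The calls are reconstructed from the surrounding text and output, so the exact arguments are assumptions:

```r
# A 5 x 5 matrix filled with zeros
matrix_a <- matrix(0, nrow = 5, ncol = 5)

# The 2 x 3 matrix B from the text, filled row by row
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, byrow = TRUE)

# Read, then overwrite, the element in row 2, column 3
matrix_a[2, 3]
## [1] 0
matrix_a[2, 3] <- 7

# Third element of b, defined as a row vector as in the previous section
b <- cbind(6, 7, 8, 9, 10)
b[3]
## [1] 8
```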
3.6 The logical class
This class is the result of logical comparisons, for example, if you type:
## [1] TRUE
R returns TRUE. If we save this in a variable k:
and check k’s class:
## [1] "logical"
R returns logical. In other programming languages, logicals are often called bools.
A logical variable can only have two values, either TRUE or FALSE (plus NA, when the value is missing).
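A sketch of the steps above. The comparison 4 > 3 is an arbitrary example, not necessarily the one from the original listing:

```r
# A comparison evaluates to a logical value
4 > 3
## [1] TRUE

# Save the result in a variable and check its class
k <- 4 > 3
class(k)
## [1] "logical"

# Logical values can be negated with !
!k
## [1] FALSE
```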
3.7 The list class
The list class is a very flexible class, and thus, very useful. You can put anything inside a list, such as numbers:
or vectors constructed with c():
You can also put objects of different classes in the same list:
and of course create lists of lists:
To check the contents of a list, you can use the structure function str():
## List of 3
## $ :List of 2
## ..$ : num 3
## ..$ : num 2
## $ :List of 2
## ..$ : num [1:2] 1 2
## ..$ : num [1:2] 3 4
## $ :List of 3
## ..$ : num 3
## ..$ : num [1:2] 1 2
## ..$ : chr "lists are amazing!"
or you can use RStudio’s Environment pane:
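You can also create named lists:
and you can access the elements in two ways, by position with double square brackets:
## [1] 2
or, for named lists, by name with the $ operator. Note that $ returns NULL when the list has no element with the requested name, which happens for instance when the names were given with <- instead of = inside list():
## NULL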
Lists are used extensively because they are so flexible. You can build lists of datasets and apply functions to all the datasets at once, build lists of models, lists of plots, etc… In the later chapters we are going to learn all about them. Actually, I use lists very often, but never vectors or matrices. Lists are much more flexible and in R, datasets behave like lists.
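The lists discussed in this section can be sketched as follows. The names list1 to list4 and my_lists are assumptions; the contents of the first three lists are taken from the str() output above, and the contents of list4 are made up:

```r
# Lists of numbers, of vectors, and of mixed classes
list1 <- list(3, 2)
list2 <- list(c(1, 2), c(3, 4))
list3 <- list(3, c(1, 2), "lists are amazing!")

# A list of lists, inspected with str()
my_lists <- list(list1, list2, list3)
str(my_lists)

# A named list: names are given with =, not <-
list4 <- list(a = 2, b = 8, c = "this is a named list")
list4[[1]]
## [1] 2
list4$c
## [1] "this is a named list"

# Apply a function to every element of a list at once
sapply(list2, sum)
## [1] 3 7
```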
3.8 The data.frame and tibble classes
In the next chapter we are going to learn how to import datasets into R. Once you import data, the resulting object is either a data.frame or a tibble depending on which package you used to import the data. tibbles extend data.frames so if you know about data.frame objects already, working with tibbles will be very easy. tibbles have a better print() method, and some other niceties. If you want to know more, I go into more detail in my other book but for our purposes, there’s not much you need to know about data.frame and tibble objects, apart from the fact that they are the representation of a dataset when loaded into R.
However, I want to stress that these objects are central to R and are thus very important. There are different ways to print a data.frame or a tibble if you wish to inspect it. You can use View(my_data) to show the my_data data.frame in the View pane of RStudio:
You can also use the str() function:
And if you need to access an individual column, you can use the $ sign, same as for a list:
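A sketch of these inspection tools. Since my_data is not defined in the text, the built-in mtcars dataset stands in for it here:

```r
# mtcars ships with R and stands in for my_data (an assumption)
my_data <- mtcars
class(my_data)
## [1] "data.frame"

# Compact overview of the columns and their types
str(my_data)

# Access a single column with $, same as for a list
head(my_data$mpg)
## [1] 21.0 21.0 22.8 21.4 18.7 18.1
```

View(my_data) is left out of the sketch because it only makes sense in an interactive session.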
3.9 Formulas
We will learn more about formulas later, but because they are important objects, it is useful to know about them early on. A formula is defined in the following way:
## [1] "formula"
Formula objects are defined using the ~ symbol. Formulas are useful to define statistical models, for example for a linear regression:
or also to define anonymous functions, but more on this later.
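A minimal sketch of both kinds of formula. The names my_formula and reg_formula are hypothetical, and the regression formula mpg ~ hp is borrowed from the model shown in the next section:

```r
# A one-sided formula
my_formula <- ~x
class(my_formula)
## [1] "formula"

# A two-sided formula, as used for a linear regression;
# the variables are only looked up when the model is fitted
reg_formula <- mpg ~ hp
all.vars(reg_formula)
## [1] "mpg" "hp"
```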
3.10 Models
A statistical model is an object like any other in R:
## [1] "lm"
my_model is an object of class lm. You can apply different functions to a model object:
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7121 -2.1122 -0.8854  1.5819  8.2360
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
This class will be explored in later chapters.
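The model above can be reproduced with the call printed in the summary output, lm(formula = mpg ~ hp, data = mtcars). The use of coef() and is.list() is an addition for illustration:

```r
# Fit the linear regression shown in the output above
my_model <- lm(mpg ~ hp, data = mtcars)
class(my_model)
## [1] "lm"

# Functions that work on model objects
summary(my_model)
coef(my_model)

# Under the hood, a model object is stored as a list
is.list(my_model)
## [1] TRUE
```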
3.11 The is.*() and as.*() functions
The is.*() and as.*() functions are very powerful, and this is the right moment to come back to them. is.*() functions test the class of an object:
## [1] FALSE
## [1] FALSE
as.*() functions convert from one type to another:
## [1] "7"
## [1] 23.12
but only if it makes sense:
## Warning: NAs introduced by coercion
## [1] NA
Keep these in mind, because they are going to be very useful. The {purrr} package introduces similar functions, is_*() and as_*(). We will explore them in Chapter 9.
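The inputs used in this section are not shown, so the calls below are guesses chosen to match the printed output; treat them as a sketch rather than the original listing:

```r
# is.*() functions return TRUE or FALSE
is.numeric("7")
## [1] FALSE
is.character(7)
## [1] FALSE

# as.*() functions convert between types
as.character(7)
## [1] "7"
as.numeric("23.12")
## [1] 23.12

# A string that is not a number cannot be converted: the result is NA,
# and R emits the warning "NAs introduced by coercion"
not_a_number <- as.numeric("lol")
is.na(not_a_number)
## [1] TRUE
```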
3.12 Exercises
Exercise 1
Try to create the following vector:
\[a = (6,3,8,9)\]
and add it to this other vector:
\[b = (9,1,3,5)\]
and save the result to a new variable called result.
Exercise 2
Using a and b from before, try to get their dot product.
Try with a * b in the R console. What happened?
Try to find the right function to get the dot product. Don’t hesitate to google the answer!
Exercise 3
How can you create a matrix of dimension (30,30) filled with 2’s by only using the function matrix()?
Exercise 4
Save your first name in a variable a and your surname in a variable b. What does the paste() function do when you call it on a and b? Look at the help for paste() with ?paste or using the Help pane in RStudio. What does the optional argument sep do?
Exercise 5
Define the following variables: a <- 8, b <- 3, c <- 19. What do the following lines check? What do they return?
Exercise 6
Define the following matrix:
\[ \text{matrix\_a} = \left( \begin{array}{ccc} 9 & 4 & 12 \\ 5 & 0 & 7 \\ 2 & 6 & 8 \\ 9 & 2 & 9 \end{array} \right) \]
- What does matrix_a >= 5 do?
- What does matrix_a[ , 2] do?
- Can you find which function gives you the transpose of this matrix?
Exercise 7
Solve the following system of equations using the solve() function:
\[ \left( \begin{array}{cccc} 9 & 4 & 12 & 2 \\ 5 & 0 & 7 & 9 \\ 2 & 6 & 8 & 0 \\ 9 & 2 & 9 & 11 \end{array} \right) \times \left( \begin{array}{c} x \\ y \\ z \\ t \end{array} \right) = \left( \begin{array}{c} 7 \\ 18 \\ 1 \\ 0 \end{array} \right) \]
Exercise 8
Load the mtcars data (mtcars is included in R, so you only need to use the data() function to load the data):
If you run class(mtcars), you get “data.frame”. Try now with typeof(mtcars). The answer is now “list”! This is because the class of an object is an attribute of that object, which can even be assigned by the user:
## [1] "don't do this"
The type of an object is R’s internal type of that object, which cannot be manipulated by the user. It is always useful to know the type of an object (not just its class). For example, in the particular case of data frames, because the type of a data frame is a list, you can use all that you learned about lists to manipulate data frames! Recall that $ allowed you to select an element of a list, for instance:
## [1] 1
Because data frames are nothing but fancy lists, you can access columns the same way:
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
## [29] 15.8 19.7 15.0 21.4
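The difference between class and type can be sketched as follows. The names cars_copy and my_list are hypothetical, and the class is overwritten on a copy so that mtcars itself stays usable:

```r
data(mtcars)

# The class is an attribute; the type is R's internal storage type
class(mtcars)
## [1] "data.frame"
typeof(mtcars)
## [1] "list"

# Overwriting the class changes the attribute, not the type
cars_copy <- mtcars
class(cars_copy) <- "don't do this"
class(cars_copy)
## [1] "don't do this"
typeof(cars_copy)
## [1] "list"

# $ selects an element of a list, and a column of a data frame
my_list <- list(one = 1, two = 2, three = 3)
my_list$one
## [1] 1
head(mtcars$mpg, 3)
## [1] 21.0 21.0 22.8
```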