RStudio is a complete environment for working with the R language. Once you are used to it, you will find that it makes working with R far more productive than using the plain R console. However, some of the concepts involved in using RStudio may be new. RStudio provides an interface for working with R code, rather than an interface for running analyses directly.
The RStudio server version runs directly through any web browser. There is no need to install any software on your laptop, PC or tablet.
Access to the server is through the following URL. This works both on and off campus.
http://r.bournemouth.ac.uk:8789/
The RStudio server is an integrated platform for doing the following …
Advanced features can be used without any programming skills through sharing scripts. However, you do need to become familiar with some new concepts in order to use the server. The RStudio server is ideal for collaborative work. You have your own permanent space on the server for saving your own work and building up a portfolio of useful analyses. Only one person can be logged in at any one time under your username. However, I can log into your user space at any time in order to help correct any errors and to give you advice on the analysis.
Once you are logged in you will see three sections of the interface by default. This will change to four sections when you begin using scripts in the interface.
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/rstudio.gif")
Look carefully at the interface and learn to recognise the sections.
A key concept to understand when using the server is that your home directory on the server is like a directory (folder) on your PC. It is rather like the university H drive. However it is all “encapsulated” on the server, which is also running R. So it is distinct from your H drive and it is not directly linked. In order to move data files and scripts into your home directory you must upload them. You will see buttons labelled New Folder, Upload, Delete, Rename and More. If you click on the More button you will also find an option to Export your files. The upload and the export buttons are frequently used to move files onto the server and to directly move files off the server. It is very important to be aware of this concept. Files saved on the server will always be available for use later. In contrast active analyses that take place in the server memory, as opposed to the server’s hard disk space, will be temporary and will be lost between sessions.
You can use RStudio without opening projects. However, projects make organising your work much simpler. A project is a set of instructions to restore the server to the same state that it was in when you closed the project. So if you are analysing a range of data sets you can use one project per data set to keep your work organised.
To form a new project and add a new folder
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_1.gif")
The course material generally uses data that is already placed on the server. However, in order to conduct your own data analyses you will need to upload data files to your folders. The easiest way to ensure that the data and the data analysis are kept together is to upload the data into the same folder you opened for the analysis project. That way there will be no need to specify a path when the data are read into R.
Although R can read data from many different formats, the data files that you upload must be in some form of conventional format. The easiest approach is to save each table as a single comma-separated values (.csv) file. The first line should contain short variable names with no spaces. The variable definitions should be kept separately and referred to when writing figure labels and captions, but not used in the column headers.
Data files are added to the project using the upload button in the files pane (bottom right). If you want to upload multiple files at once (e.g. shapefiles) you should first compress them into a zip file. The zip file will expand when uploaded.
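As a rough illustration of the layout described above, the sketch below writes a small CSV file and reads it back. The column names and values are invented purely for the example, and a temporary file is used so that nothing in your project folder is touched.

```r
# Hypothetical table: the column names and values are made up for illustration.
# The first line holds short variable names with no spaces.
csv_lines <- c(
  "site,depth_cm,species_count",
  "A,10,4",
  "B,25,7",
  "C,40,2"
)

# Write to a temporary file so the example does not alter the project folder.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# read.csv() treats the first line as the column names by default.
example_data <- read.csv(csv_path)

# Check that the names and column types were read as intended.
names(example_data)
str(example_data)
```

A longer definition such as "depth of the sample in centimetres" would be kept in a separate notes file and used for axis labels and captions, not as a column header.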
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_2.gif")
This course will concentrate on the use of markdown documents as a way of running R code. There are many advantages of using markdown.
The steps are shown in the animated GIF below.
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_3.gif")
Now try pressing the “knit” button on the top right pane. You will see the default demonstration document that was produced as a template “knit” into a simple data report. This is not yet using your data of course.
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_4.gif")
The steps above will always produce the default “demo” markdown document. Every time you start a new markdown document, RStudio will start off with this one. You should look carefully at the logic of the demo document. It consists of “chunks” of R code that produce output in the form of tables and figures embedded in text. The R code automatically produces output and adds it to the document after knitting. So if you have R code available that will run an analysis that you are interested in, you don’t have to remember any other steps in order to run it. Simply ensure that the data being added to the analysis are appropriate for the type of analysis being run and you can obtain the same results with your own data. This will be the way R is used in this course.
if(knitr::is_html_output()) {
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_5.gif")
}
If you are new to R you may be tempted to look around for a button on the RStudio interface to “load” the data file. You won’t find one. Although there is a way to load data interactively you really must not do this.
Always make sure that you include a line of code that loads the data at the start of your R script. Do not load data using the import data feature when building a markdown document. It will not work, as the data will not be loaded when you compile.
In this case the working directory that R is using coincides with the project directory. So there is no need to include the path to the data file. This line will read the file into R and assign the data to a data frame called “d”.
# Read sleep.csv from the working (project) directory into a data frame called d
d <- read.csv("sleep.csv")
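If the line above fails with a "cannot open file" error, the usual cause is that the file is not in the working directory. The following sketch, which assumes only that the file is called sleep.csv as in the text, shows one way to check before reading.

```r
# Show the directory R is currently working in.
# Inside a project this should be the project folder.
getwd()

# List any CSV files in the working directory to confirm the upload worked.
list.files(pattern = "\\.csv$")

# Read the file only if it is present, then inspect its structure.
if (file.exists("sleep.csv")) {
  d <- read.csv("sleep.csv")
  str(d)          # column names and types
  print(head(d))  # first six rows
} else {
  message("sleep.csv was not found in ", getwd())
}
```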
To form a code chunk click on the button on the interface labelled “Insert”. Alternatively the keyboard shortcut Ctrl+Alt+I will work. Then type the code very carefully into the chunk. Make sure that the code sits within the body of the chunk and that you do not disturb the lines of backticks that separate the chunk from the rest of the document.
You should type the line into a single block of code that loads the data. Some types of data, such as GIS layers or SPSS data files require packages to be loaded first. You should include a code chunk to load these packages first if you need them.
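A minimal sketch of such a package-loading chunk is given below. It assumes that the sf package (for GIS layers) and the haven package (for SPSS files) are the ones you need and that they are installed on the server. The file names sites.shp and survey.sav are hypothetical.

```r
# Assumption: sf and haven are the packages required; check they are installed.
packages_needed <- c("sf", "haven")
installed <- vapply(packages_needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Load each package only if it is available on this machine.
if (installed[["sf"]]) library(sf)
if (installed[["haven"]]) library(haven)

# Hypothetical file names: replace with the files you uploaded to the project.
if (installed[["sf"]] && file.exists("sites.shp")) {
  sites <- st_read("sites.shp")
}
if (installed[["haven"]] && file.exists("survey.sav")) {
  survey <- read_sav("survey.sav")
}
```

In your own document you would normally just write the library() lines at the top. The checks are included here only so that the sketch runs without error when a package or file is missing.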
When conducting an analysis with just a single data frame I often call the data object “d”. This is just for brevity. As there are no other data objects present there is no ambiguity. If you load several data frames you should give them more informative names.
if(knitr::is_html_output()) {
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_6.gif")
}
When you have finished typing the line of code to load the data, click on the run button of the chunk. You will see a data object appear in the environment pane in the top right corner of the RStudio server. If you click on this object it will open as a spreadsheet-like table in the main panel.
if(knitr::is_html_output()) {
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_7.gif")
}
Once you have the data loaded you can begin to build up an analysis. You should use a separate chunk for each step and write some text between each chunk that explains what you are doing. Code from the course documents and crib sheets will form the basis for most of your analyses. It is very important to run all the code in the right order. Code chunks often depend on actions that are taken previously. For example in fig @ref(fig:rstudio10) the animated gif shows that two code chunks have been added after the data were loaded. The first produces a new variable which is the log transformed body weight. The second inspects the relationship between mean time spent sleeping and the log transformed variable. If the code were not run in order the last chunk would not work. The downward facing button on a code chunk runs all the chunks above it. You should use this frequently in order to check that everything is in the right order.
if(knitr::is_html_output()) {
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_8.gif")
}
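The dependence between chunks described above can be sketched as follows. The column names BodyWt and TotalSleep and the values are assumptions made for illustration. They are not taken from the course data, and the natural logarithm is used here although the course code may use a different base.

```r
# Made-up stand-in for the data frame that read.csv("sleep.csv") would return.
# Column names and values are hypothetical.
d <- data.frame(
  BodyWt = c(0.02, 1.5, 60, 2500),
  TotalSleep = c(14.5, 12.0, 8.0, 3.5)
)

# Chunk 1: add a new variable holding the log transformed body weight.
d$logBodyWt <- log(d$BodyWt)

# Chunk 2: inspect the relationship between sleep and log body weight.
# This fails with an "object not found" error if chunk 1 has not been run.
plot(TotalSleep ~ logBodyWt, data = d,
     xlab = "Log body weight", ylab = "Total sleep")
```

Running the second chunk first would fail because logBodyWt does not yet exist, which is why the order of the chunks matters.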
Once you have written all the code needed for your analysis and tested it by stepping through each chunk in the correct order you can compile your report into a document. This whole book has been written and compiled in this way. The idea of using markdown is to ensure that all the code to produce an analysis is reproducible and that the results of the analysis are annotated with comments that explain them both as a reminder to yourself and potentially as a report read by others.
if(knitr::is_html_output()) {
knitr::include_graphics("/home/rstudio/webpages/books/AQM_book/images/ss_9.gif")
}
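Compiling can also be triggered from code. The sketch below assumes that the rmarkdown package is installed and uses a hypothetical file name, my_analysis.Rmd, which you would replace with the name of your own document.

```r
# Hypothetical file name: replace with your own markdown document.
rmd_file <- "my_analysis.Rmd"

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  # render() runs the chunks from top to bottom and returns the output path.
  output_file <- rmarkdown::render(rmd_file)
  message("Report written to ", output_file)
} else {
  message("Either rmarkdown is not installed or ", rmd_file, " was not found.")
}
```

When you press the knit button the document is compiled in a separate R session, so any data loaded only through the interface will not be available. This is why the line of code that reads the data must be part of the document itself.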
knitr::opts_chunk$set(echo = TRUE)