Chapter 2 Using the RStudio server
2.1 Introduction
RStudio is a complete environment for working with the R language. Once you have got used to it you will find that it makes working with R far more productive than using the R console version. However some of the concepts involved in using RStudio may be new. RStudio provides an interface for working with R code, rather than an interface for running analyses directly.
There are a lot of new concepts to get used to, so you may have to work throguuh the videos in this class several times in order to get used to the RStudio way of working.
At the end of the class you will have learned
- How to log into the RStudio server.
- How to navigate around the interface.
- How to upload data files from your PC onto the server in order to analyse them in R
- How to organise your work in the form of projects
- How to start a new analysis in the form of an R markdown document.
2.2 Getting started with the RStudio server
The RStudio server version runs directly through any web browser. There is no need to install any software on your laptop, PC or tablet
Access to the server is through the following URL. This works both on and off campus.
http://r.bournemouth.ac.uk:8788/
(Note that this is an updated version. In previous years the port 8789 was used, and the videos made last year refer to this port number.)
2.2.1 Log into the RStudio server
- Click on the URL http://r.bournemouth.ac.uk:8788/ in a browser. Use Firefox or Chrome.
- You will see a log in page.
- Log in using the username and the password you have been provided
2.3 RStudio server concepts
The RStudio server is an integrated platform for doing the following …
- Saving and sharing data files
- Running analyses
- Compiling reports
- Connecting to data stores
- Sharing analyses with others.
Advanced features can be used without any programming skills through sharing scripts. However you do need to become familiar with some new concepts in order to use the server. The RStudio server is ideal for collaborative work. You have your own permanent space on the server for saving your own work and building up a portfolio of useful analyses. Only one person can be logged in at any one time under your username. However I can always log into your user space at any time in order to help correct any errors and to give you advice on the analysis.
2.4 Finding your way around the interface
Once you are logged in you will see three sections of the interface by default. This will change to four sections when you begin using scripts in the interface.
2.4.1 Logging on
Look carefully at the interface and learn to recognise the sections.
- The RConsole. This is showing up on the left hand side when you first log in. The console can be used for running R code interactively. There is a tab showing up labelled “terminal” as well. You won’t use this, as it is for more advanced programming.
- The environment, history and connections pane is at the top right of the screen. The environment tab is the one that is most used. This tab will show the data that is in the active workspace in R. The concept will only become clear after beginning to use R.
- The files, plots, packages, help and viewer tab at the bottom right. The files tab is the most important to understand at this stage. There will be no files in your home directory yet, nor will there be any folders.
2.5 Next steps
The next steps involve moving around the rstudio interface and trying to understand what you are looking at. The best advice at this point is not to try pointing and clicking on many of the menu options. This is because the way we work in R is quite different to the way you may have been used to running analyses in programs such as SPSS. Watch this video to hear some tips on what to do, and just as importantly what not to do.
2.7 Shortcuts
Although you don’t need to remember them, it can he handy to navigate the interface using shortcuts. This video shows how to zoom to different panes within the interface using the keyboard.
2.8 More on shortcuts
This is an optional video showing a little trick that can help you learn the keyboard shortcuts. Again, you don’t need to remember this to use the interface and you can skip this video if you wish.
2.9 Simple example of how to use the R console
Now that you have seen some of the basics of the rstudio environment it is time to try running some simple R code. You can do this by typing code into the console.
Of course, at this stage you don’t know how to write R code! However if you follow the instructions in this video carefully you will see R in action. This code will produce a simple histogram with simulated data.
2.10 Moving data between the Rstudio server and your PC
A key concept to understand when using the server is that your home directory on the server is like a directory (folder) on your PC. In one sense it is rather like the university H drive. However the data, the instructions for processing the data and the software (R and R packages) are all “encapsulated” on the server. So this is different from your H drive. You cannot run analyses using the standard university server. You can run analyses within the RStudio server.
In order to move data files and scripts into your home directory within the server you must upload them. You will see buttons labelled New Folder, Upload, Delete, Rename and More. If you click on the More button you will also find an option to Export your files. The upload and the export buttons are frequently used to move files onto the server and to directly move files off the server. It is very important to be aware of this concept. Files saved on the server will always be available for use later. In contrast active analyses that take place in the server memory, as opposed to the server’s hard disk space, will be temporary and will be lost between sessions.
To see the way data is moved between the server and your PC or laptop watch this video. Note that the command that you type into the console in order to add the example data set is aqm::add_file()
2.11 Loading data into R
It is important to draw a clear distinction between uploading data onto the RStudio server and actually working with data in R. Uploading a data file simply involves moving the file from one place to another. Loading data into R involves turning the data held in the data file into data held in a format R can work with.
2.11.1 The wrong way to load data into R!
The video below shows the technique that you might stumble across yourself if you explore the Rstudio interface. This is not recommended practice I simply include the example in order to dissuade users from following it.
2.12 Using projects in RStudio
You can use Rstudio without opening projects. However, projects make organising your work much simpler. A project is a set of instructions to restore the server to the same state that it was in when you closed the project. So if you are analysing a range of data sets you can use one project per data set to keep your work organised.
To form a new project and add a new folder
- Click on the file menu at the top left of the interface.
- Go to New Project
- You will see a window with three options to create a project. Choose the first option labelled New Directory
- The next window will show a range of advanced options. Ignore them and just select New Project
- You will now see a window with a prompt for the Directory name (and some other options). In this example the project will contain data on sleep in mammals, so “sleep” could be used as the directory name.
- Click create project
- Look at the files pane in the bottom right corner. You will now see that after Home there is the word sleep. You can also see a file called sleep.Rproj in the folder.
- Click on home. You can see a folder called sleep in your home directory. So .. you have created a new project and placed the project file within the folder.
- Now you can upload the data that you are going to work on.
Data files are added to the project using the upload button in the files pane (bottom right). If you want to upload multiple files at once (e.g shapefiles) you should first compress them into a zip file. The zip file will expand when uploaded. Although R can read data from many different formats, the data files that you upload must be in some form of conventional format. The easiest format to use is to save each table as a single comma separated variable (.csv) file. The first line should contain short variable names with no spaces. The variable definitions should be kept separately and referred to when writing figure labels and captions, but not used in the column headers.
So you can use Excel spreadsheets to store your data. However be careful to keep the raw data in a simple R readable format. If you are used to using Excel and wish to combine the use of Excel and R then you can copy the raw data on the first sheet over to a second sheet for analysis using Excel. Do not add anything else, such as figures and compiled statistics, to the sheet of raw data in Excel.
The video below takes you through the steps involved in starting a project and loading a CSV file.
2.13 Starting a markdown analysis
This course will concentrate on the use of markdown documents as a way of running R code. There are many advantages of using markdown.
- Embedded code can be either revealed to other users to show how the results were obtained or hidden to simply produce a report with embedded figures and statistics.
- Annotation of the results of an analysis can be embedded around the results to explain the key results.
- Very limited knowledge of the R language and syntax is necessary to adapt markdown documents in order to analyse your own data.
- With a little more knowledge and experience of R complex methods can be applied by altering markdown found on-line.
In order to produce a markdown document you need to follow these steps. RStudio “helpfully” produces a default template document for you to edit. This can seem rather confusing the first time you see it.
- Go to file on the top menu bar
- Choose “New file”
- Choose “R Markdown”
- You will now see a window in which you can type in a name for the title of your analysis. By default the name is “untitled.” Change that to some title that makes sense for the analysis you are going to run. It is easy to change the title later.
- You will now see an untitled markdown document added to the top pane in RStudio. Rather confusingly it is still untitled, even though you’ve just typed a title! The reason for this is that the title you typed is used as the first line of the data report, but so far you still have not saved the file as a named document.
- Click on save to save the report. Now give the file itself a name.
Now try pressing the “knit” button on the top right pane. You will see the default demonstration document that was produced as a template “knit” into a simple data report. This is not yet using your data of course.
The steps above will always produce the default “demo” markdown document. Every time you start a new markdown Rstudio will start off with this one.
You should take a look at the logic of the demo document carefully. It consists of “chunks” of R code that produce output in the form of tables and figures embedded in text. The R code automatically produces output and adds it to the document after knitting. So if you have R code available that will run an analysis that you are interested in you don’t have to remember any other steps in order to run it. Simply ensure that the data that is being added to the analysis is appropriate for the type of analysis being run and you can obtain the same results with your own data. This will be the way R is used in this course.
2.14 Showing the data within the markdown document.
When data has been loaded into R it is often convenient to provide the data set to others. If the data set is very large you would send them the original data file, together with the R script that loads the data and runs the analysis, In the case of small data sets it is convenient to embed the data itself within the report. In this way you only need to provide a single file in order to share both the data and the analysis.