You can use Rstudio without opening projects. However, projects make organising your work much simpler. A project is a set of instructions to restore the server to the same state that it was in when you closed the project. So if you are analysing a range of data sets you can use one project per data set to keep your work organised.
To form a new project and add a new folder
The course material generally uses data that is already placed on the server. However in order in order to conduct your own data analyses you will need to upload data files to your folders. The easiest way to ensure that the data and the data analysis are kept unified is to upload the data into the same folder you opened for the analysis project. That way there will be no need to specify a path when the data are read into R.
Although R can read data from many different formats, the data files that you upload must be in some form of conventional format. The easiest format to use is to save each table as a single comma separated variable (.csv) file. The first line should contain short variable names with no spaces. The variable definitions should be kept separately and referred to when writing figure labels and captions, but not used in the column headers.
Data files are added to the project using the upload button in the files pane (bottom right). If you want to upload multiple files at once (e.g shapefiles) you should first compress them into a zip file. The zip file will expand when uploaded.
To add the sleep.csv file into your project folder for this taster session run the following line in the rconsole.
library(aqm)
add_file()
This course will concentrate on the use of markdown documents as a way of running R code. There are many advantages of using markdown.
The steps are shown in the animated gif below
Now try pressing the “knit” button on the top right pane. You will see the default demonstration document that was produced as a template “knit” into a simple data report. This is not yet using your data of course.
The steps above will always produce the default “demo” markdown document. Every time you start a new markdown Rstudio will start off with this one. You should take a look at the logic of the demo document carefully. It consists of “chunks” of R code that produce output in the form of tables and figures embedded in text. The R code automatically produces output and adds it to the document after knitting. So if you have R code available that will run an analysis that you are interested in you don’t have to remember any other steps in order to run it. Simply ensure that the data that is being added to the analysis is appropriate for the type of analysis being run and you can obtain the same results with your own data. This will be the way R is used in this course.
If you are new to R you may be tempted to look around for a button on the RStudio interface to “load” the data file. You won’t find one. Although there is a way to load data interactively you really must not do this.
Always make sure that you include a line of code that loads the data at the start of your R script. Do not load data using the import data feature when building a markdown document. It will not work, as the data will not be loaded when you compile.
In this case the working directory that R is using coincides with the project directory. So there is no need to include the path to the data file. This line will read the file into R and assign the data to a data frame called “d”.
d<-read.csv("sleep.csv")
To form a code chunk click on the button on the interface labelled “Insert”. Alternatively the keyboard short-cut control alt I will work. Then type the code very carefully into the chunk. Make sure that the code sits within the body of the chunk and that you do not disturb the dashes that separate the chunk from the rest of the document.
You should type the line into a single block of code that loads the data. Some types of data, such as GIS layers or SPSS data files require packages to be loaded first. You should include a code chunk to load these packages first if you need them.
When conducting an analysis with just a single data frame I often call the data object “d”. This is just for brevity. As there are no other data objects present there is no ambiguity. If you load several data frames you should give them more informative names.
When you have finished typing the line of code to load the data, click on the run button of the chunk. You will see a data object appear in the environment pane in the top right corner of the RStudio server. If you click on this object it will open as a spreadsheet like table in the main panel.
Once you have the data loaded you can begin to build up an analysis. You should use a separate chunk for each step and write some text between each chunk that explains what you are doing. Code from the course documents and crib sheets will form the basis for most of your analyses. It is very important to run all the code in the right order. Code chunks often depend on actions that are taken previously. For example in fig @ref(fig:rstudio10) the animated gif shows that two code chunks have been added after the data were loaded. The first produces a new variable which is the log transformed body weight. The second inspects the relationship between mean time spent sleeping and the log transformed variable. If the code were not run in order the last chunk would not work. The downward facing button on a code chunk runs all the chunks above it. You should use this frequently in order to check that everything is in the right order.
Once you have written all the code needed for your analysis and tested it by stepping through each chunk in the correct order you can compile your report into a document. This whole book has been written and compiled in this way. The idea of using markdown is to ensure that all the code to produce an analysis is reproducible and that the results of the analysis are annotated with comments that explain them both as a reminder to yourself and potentially as a report read by others.