Introduction to R

What is R?

  • Language for statistical computing
  • Open source
  • Cross platform
  • Free!
  • Extended by packages, many written by leading academics
  • Thousands of R packages available

Why is R so popular?

  • R will do almost anything
  • Can run code in many other programs
  • Uses modern graphics packages
  • Excellent spatial capability
  • Modern methods for genetics, phylogenetics, time series analysis, etc.
  • Respected by academics

R versus SPSS

  • SPSS (Statistical Package for Social Science) is a traditional statistics program
  • Uses a batch processing approach by default, i.e. click buttons to set up a process and receive all the results. It can also be scripted.
  • R has no click-button GUI by default, although several are available.
  • R uses scripting by default.
  • Many R packages are specifically designed for environmental data analysis

Advantages of scripting

  • A script is a transparent and reproducible set of instructions for running an analysis.
  • Can be reused and adapted.
  • If you have a script that works for one data set, it will probably work for another, provided the data are in the same format.
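
As a minimal sketch of this kind of reuse, the lines below wrap a short analysis in a function and run it on two data sets with the same layout. The function name summarise_heights, the column names site and height, and all the values are hypothetical, chosen only for illustration.

```r
# Hypothetical example: the same analysis applied to two data sets
# that share one format (columns 'site' and 'height').
summarise_heights <- function(d) {
  # Mean height for each site
  aggregate(height ~ site, data = d, FUN = mean)
}

# Two made-up data sets with identical column names and types
d1 <- data.frame(site = c("A", "A", "B", "B"), height = c(1.2, 1.4, 2.0, 2.2))
d2 <- data.frame(site = c("X", "Y", "Y"), height = c(3.0, 4.0, 5.0))

res1 <- summarise_heights(d1)
res2 <- summarise_heights(d2)
print(res1)
print(res2)
```

The function never refers to a particular data set, so it works on any data frame that has these two columns.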

Types of scripts

  • Scripts range from simple, single lines of code, to complex lists of instructions
  • So-called “wrapper” functions often simplify scripting
  • Default settings can often be used to get acceptable results very simply, particularly when using wrappers.
  • More advanced bespoke scripting can produce exactly what you need
  • The 80:20 rule: you can get 80% of what you need with 20% of the effort. Obtaining the extra 20% to get it all just right takes most of the time.
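
A wrapper with default settings might look something like the sketch below. The function quick_look is hypothetical and not part of any package; it simply bundles two base R calls, hist() and summary(), behind one name.

```r
# Hypothetical wrapper: one call produces a histogram and a summary.
quick_look <- function(x, breaks = "Sturges", main = "Histogram of x") {
  # Default arguments give an acceptable plot with no extra effort;
  # they can be overridden when more control is needed.
  hist(x, breaks = breaks, main = main)
  summary(x)
}

set.seed(1)                         # fixed seed so the sketch is repeatable
x <- rnorm(100, mean = 10, sd = 2)  # simulated data

s  <- quick_look(x)                                         # defaults only
s2 <- quick_look(x, breaks = 20, main = "Simulated data")   # bespoke settings
print(s)
```

The first call relies entirely on the defaults. The second shows the extra effort needed to control the details.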

RStudio and Markdown

  • RStudio is an interface to R to make writing scripts easier
  • It is not a push-button GUI
  • RStudio helps by completing lines and offering options automatically
  • Markdown allows R code to be combined with text to form a document.

Example

This document itself has been written using markdown. Two lines of R code produce a simple histogram of simulated data.

x <- rnorm(100, mean = 10, sd = 2)

The variable x now contains 100 random numbers.

Example

The vector is plotted as a histogram.

hist(x)

[Figure: histogram of x]

Code chunks

Anything inside a code chunk is R code. An analysis is built up step by step. The previous chunk produced a set of random numbers in a vector called x. We can ask R about these numbers.

mean(x)
[1] 10.08172
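
Further questions can be asked of the same vector in the same way. The sketch below recreates x so that it runs on its own. Because the numbers are random, the values printed will differ from run to run and from the mean shown above.

```r
x <- rnorm(100, mean = 10, sd = 2)  # recreate the simulated data

length(x)    # how many numbers are in the vector
sd(x)        # standard deviation
range(x)     # smallest and largest values
summary(x)   # minimum, quartiles, mean and maximum
```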

The RStudio server

You can install R and RStudio on your own laptop very easily by following the install instructions available online. It is a good idea to do this. However, some of the spatial functions would not be available. You would also have to install the extra packages for this course yourself, and some of them may differ between operating systems. The server provides a common platform with a single installation and all packages ready for use.

Accessing the server

You can access the server on campus or off campus from any browser, including those running on a tablet or mobile phone. Trying to program R through a tiny phone screen is not recommended!

Your user name

I have set up user names for the course on the server. They all begin with msc_ (msc underscore) followed by your first name as registered on the grade centre. There are also two generic users (msc_student1 and msc_student2). Only one login is allowed per user name at any one time. If you log into an account that is in use, you will log out the previous user, so always use your own account.

Your user name

Check the next slide for your user name and use it to log in. The default password is

msc123 (all lower case)

If you want to prevent others from logging into your account with the default password, I will show you how to change it.

Your user name

 [1] "msc_adrianus"  "msc_data"      "msc_elisabeth" "msc_hannah"   
 [5] "msc_jasmine"   "msc_jessica"   "msc_katie"     "msc_kayleigh" 
 [9] "msc_kristian"  "msc_marisa"    "msc_martin"    "msc_natalie"  
[13] "msc_rebecca"   "msc_remi"      "msc_ryan"      "msc_sarah"    
[17] "msc_scripts"   "msc_student1"  "msc_student2"  "msc_viki"     

Your user name

msc_scripts and msc_data are special users. You should not log in to these accounts. msc_student1 and msc_student2 are for guest users.

Server usage

You can upload and download files from the server very easily. You should regard the server as the base platform for conducting analyses, but it may be used in combination with other software, such as QGIS, that you run locally on your laptop or PC. Nothing is “locked in” to the server. Just move the files off it and onto your hard drive whenever necessary. However, scripts that load data held on the server will only run on the server itself.

Tips for learning R

R has a steep learning curve. In other words, it is hard to start writing your own scripts. So at the start, don't even try. Just learn to run the prebuilt scripts. As time goes on you will gain the confidence to start changing some of the lines and adapting them.

Tips for learning R

Google is your friend! All R users work with one browser tab open for coding and another couple of tabs open showing help files and googled examples. No one can possibly remember the syntax for the millions of functions in R. The syntax prompts and auto-completion in RStudio help, with experience.
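
Alongside web searches, R has its own help system that can be queried from the console. A short sketch using standard base R functions, with rnorm and mean chosen here only as examples:

```r
?rnorm          # open the help page for a function
args(rnorm)     # show a function's arguments and their defaults
example(mean)   # run the examples from a function's help page
```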

Tips for learning R

Don't be scared of code that you don't “understand”. Concentrate on the output from the code. Learn why you would want to run an analysis rather than how you run it, to begin with. Use the examples throughout.

Tips for learning R

Learn to understand data formats. Think about the way R uses data, rather than the way you may have worked previously in Excel. Look carefully at the data as you run each line of code and make sure you can produce data that has an identical form.
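
One way to look carefully at data is to inspect its structure after each step. The sketch below uses iris, a small data set that ships with R, rather than any of the course data.

```r
data(iris)     # built-in example data frame

str(iris)      # column names, types and first few values
head(iris)     # first six rows
dim(iris)      # number of rows and columns
summary(iris)  # per-column summaries
```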

Tips for learning R

Learn what a data frame consists of and how this relates to the way you may collect your own data. In the classes we will be loading data into R and analysing it using quite simple lines of code. Provided your data are in the right format for loading into R, all the lines of code will work for your own data. If you put data into an unconventional format that you have decided upon yourself, without any reference to the norms and conventions, you won't be able to use R to analyse your data without seeking help. DATA FORMATTING IS A KEY SKILL.
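
As a minimal sketch of the conventional layout, the data frame below has one row per observation and one column per variable. The column names and values are invented for illustration.

```r
# Hypothetical field data: one row per measurement, one column per variable
d <- data.frame(
  site    = c("A", "A", "B", "B"),
  species = c("oak", "birch", "oak", "birch"),
  count   = c(12, 7, 5, 9)
)

str(d)                                        # check the column types
aggregate(count ~ site, data = d, FUN = sum)  # total count per site

# Saved as CSV, the same layout can be read straight back into R
f <- tempfile(fileext = ".csv")
write.csv(d, f, row.names = FALSE)
d2 <- read.csv(f)
```

A spreadsheet laid out the same way, with a single header row and no merged cells or embedded totals, can usually be exported to CSV and loaded with read.csv().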