1.1 Help, what’s a script?

A script is a list of instructions. It is just a text file and no special software is required to view one. An example R script is shown in Figure 1.1.

Don’t panic! The only thing you need to understand at this point is that what you’re looking at is a list of instructions written in the R language.

You should also notice that some parts of the script look like normal English. These are the lines that start with a # and they are called “comments”. We can (and should) include these comments in everything we do. They are notes on what we were doing, both for colleagues and for our future selves.

FIGURE 1.1: An example R script from RStudio.

Lines that do not start with a # are R code. This is where the number crunching really happens. We will cover the details of this R code in the next few chapters; the purpose of this chapter is to describe some of the terminology, as well as the interface and tools we use.
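To make the distinction between comments and code concrete, here is a minimal sketch of what a short script might contain. The variable names and the numbers are made up for illustration and are not taken from Figure 1.1.

```r
# Hypothetical example: heights of five patients in centimetres
heights_cm <- c(162, 175, 181, 158, 169)

# Calculate the average height
mean_height <- mean(heights_cm)

# Convert the average from centimetres to metres
mean_height_m <- mean_height / 100

# Print the result
mean_height_m
```

Every line beginning with a # is ignored by R and is there purely for the human reader; the remaining lines are the instructions R carries out.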

For the impatient:

  • We interface with R using RStudio
  • We use the tidyverse packages that are a substantial extension to base R functionality (we repeat: extension, not replacement)

Even though R is a language, don’t think that after reading this book you should be able to open a blank file and just start typing in R code like an evil computer genius from a movie. This is not what real-world programming looks like.

Firstly, you should be copy-pasting and adapting existing R code examples - whether from this book or later from your own previous work. Rewriting everything from scratch is not efficient. Yes, you will understand and eventually remember a lot of it. But spending time memorising very specific things that can easily be looked up and copied is simply not necessary.

Secondly, R is an interactive language, meaning that we “run” R code line by line and get immediate feedback. We would never write a whole script without trying everything out as we go along.
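As a rough sketch of what working interactively looks like, the lines below are meant to be run one at a time, checking the result of each before moving on. The `ages` values are invented for illustration.

```r
# Create a small vector of (made-up) ages
ages <- c(34, 51, 27, 45)

# Run this line on its own to look at what we just created
ages

# Try out the next step and check the result straight away
ages_sorted <- sort(ages)
ages_sorted
```

If a line does not do what we expected, we find out immediately and can fix it before writing the next one.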

Thirdly, do not worry about making mistakes. Celebrate them! The whole point of R and reproducibility is that manipulations are not applied directly to a dataset but to a copy of it, and that everything is in a script. So if we do make a wrong move (e.g. accidentally overwrite or remove some data), we can always reload the data, rerun the steps that worked well, and continue figuring out where we went wrong at the end. And since all of these steps are written down in a script, R will redo everything with a single push of a button. You do not have to redo anything; that’s what R is for.
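The idea of recovering from a mistake can be sketched as follows. This is only an illustration: the data frame `raw_data` and its columns are hypothetical, and in practice the data would usually be read in from a file rather than typed into the script.

```r
# Hypothetical dataset; normally this would be loaded from a file
raw_data <- data.frame(id = 1:4, weight_kg = c(70, 82, 65, 90))

# Work on a copy, leaving the original untouched
working_data <- raw_data

# A wrong move: we accidentally overwrite a whole column
working_data$weight_kg <- NA

# No harm done: start again from the original and redo the step correctly
working_data <- raw_data
working_data$weight_g <- working_data$weight_kg * 1000
```

Because the mistake only ever touched `working_data`, rerunning the script from the top gives us a clean starting point every time.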