3.6 Common arithmetic functions - sum(), mean(), median(), etc.

Statistics is an R strength, so if there is an arithmetic function you can think of, it probably exists in R.

The most common ones are:

  • sum()
  • mean()
  • median()
  • min(), max()
  • sd() - standard deviation
  • IQR() - interquartile range
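As a quick illustration, the functions listed above can all be applied to the same numeric vector. This is a minimal sketch: the vector `x` and its values are made up for this example and are not from any dataset used in the book.

```r
# A small, made-up numeric vector with no missing values
x <- c(2, 4, 4, 5, 10)

sum(x)
## [1] 25
mean(x)
## [1] 5
median(x)
## [1] 4
min(x)
## [1] 2
max(x)
## [1] 10
sd(x)    # sample standard deviation (divides by n - 1)
## [1] 3
IQR(x)   # 75th percentile minus 25th percentile (default quantile method)
## [1] 1
```

Note that the mean (5) is pulled above the median (4) by the single large value, which is why it is worth looking at both.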

An important thing to remember relates to missing data: if any of your values is NA (not available; missing), these functions will return NA. Either deal with your missing values beforehand (recommended) or add the na.rm = TRUE argument to any of these functions to ask R to ignore missing values. More discussion and examples around missing data can be found in Chapters 2 and 11.

mynumbers <- c(1, 2, NA)
sum(mynumbers)
## [1] NA
sum(mynumbers, na.rm = TRUE)
## [1] 3

Overall, R’s unwillingness to implicitly average over observations with missing values should be considered helpful, not an unnecessary pain. If you don’t know exactly where your missing values are, you might end up comparing averages that were calculated from different subsets of observations than you intended. So na.rm = TRUE is fine to use when quickly exploring and cleaning data, or if you’ve already investigated the missing values and are convinced the existing ones are representative. But it is rightfully not the default, so get used to typing na.rm = TRUE when using these functions.
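One way to see why this matters is to count the missing values before summarising. The sketch below uses two made-up vectors, `group_a` and `group_b` (hypothetical names and values chosen only for illustration), and the base R function is.na() to show how na.rm = TRUE can quietly base a mean on very few observations.

```r
# Made-up measurements for two hypothetical groups
group_a <- c(10, 12, NA, NA, NA)
group_b <- c(10, 12, 30, 32, 34)

# Count the missing values in each group before summarising
sum(is.na(group_a))
## [1] 3
sum(is.na(group_b))
## [1] 0

# With na.rm = TRUE, group_a's mean is based on only 2 of its 5 values
mean(group_a, na.rm = TRUE)
## [1] 11
mean(group_b, na.rm = TRUE)
## [1] 23.6
```

Here the two means look very different, but part of that difference may simply reflect which values happen to be missing from group_a. Checking the counts first makes that visible.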