3.6 Common arithmetic functions - sum(), mean(), median(), etc.

Statistics is what R does, so if there is a statistical function you can think of, it almost certainly exists in R, either in base R or in an add-on package.

The most common ones are:

  • sum()
  • mean()
  • median()
  • min(), max()
  • sd() - standard deviation
  • IQR() - inter-quartile range
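As a quick illustration, here is a minimal sketch applying each of these functions to a small vector. The vector `ages` and its values are hypothetical, made up only for this example. Note that `IQR()` uses R's default quantile definition (`type = 7`), so other software may give slightly different interquartile ranges for small samples.

```r
# Hypothetical example data: five ages, no missing values
ages <- c(23, 31, 45, 52, 67)

sum(ages)     # 218
mean(ages)    # 43.6
median(ages)  # 45
min(ages)     # 23
max(ages)     # 67
sd(ages)      # sample standard deviation, about 17.34
IQR(ages)     # 52 - 31 = 21 (75th minus 25th percentile, type 7 quantiles)
```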

The important thing to remember about all of these is that if any of the values is NA (not available, i.e., missing), these functions will return NA. Either deal with your missing values beforehand (recommended) or add the na.rm = TRUE argument to any of the above functions to ask R to ignore missing values. More discussion and examples around missing data can be found in Chapters 2 and 14.
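A minimal sketch of this behaviour, using a hypothetical vector `mynumbers` with one missing value. The last lines show one simple way of checking for missing values before summarising, in line with the recommendation to deal with them beforehand.

```r
# Hypothetical vector containing one missing value
mynumbers <- c(1, 2, NA)

sum(mynumbers)                 # NA: a single missing value makes the result NA
sum(mynumbers, na.rm = TRUE)   # 3: NA is dropped before summing
mean(mynumbers, na.rm = TRUE)  # 1.5: mean of the two non-missing values

# Check how many values are missing before deciding how to handle them
sum(is.na(mynumbers))          # 1
mynumbers[!is.na(mynumbers)]   # the non-missing values: 1 2
```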


Overall, R’s refusal to silently average over observations with missing values should be considered helpful, not an unnecessary pain. If you don’t know exactly where your missing values are or how many there are, you might end up comparing the averages of very different groups (for example, if the values are not missing at random or the sample size is small). So na.rm = TRUE is fine to use when quickly exploring and cleaning data, or when you have already investigated the missing values and are convinced the observed values are representative. But it is rightfully not the default, so get used to typing na.rm = TRUE explicitly whenever you do want missing values ignored.
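To see how na.rm = TRUE can hide a problem, consider a minimal sketch with hypothetical data: two groups, where most of group B's values are missing. The vectors `group` and `value` are made up for illustration, and base R's `tapply()` is used here simply as one way to summarise by group.

```r
# Hypothetical data: group B has three of its four values missing
group <- c("A", "A", "A", "A", "B", "B", "B", "B")
value <- c(10, 12, 11, 13, NA, NA, NA, 30)

# First, count the missing values in each group
tapply(is.na(value), group, sum)           # A: 0, B: 3

# Means that ignore NAs: group B's "average" rests on a single observation
tapply(value, group, mean, na.rm = TRUE)   # A: 11.5, B: 30
```

Without the first check, the two means look equally trustworthy, even though group B's mean is based on one value out of four.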