6.7 Compare the means of more than two groups

It may be that our question is set around a hypothesis involving more than two groups. For example, we may be interested in comparing life expectancy across 3 continents such as the Americas, Europe and Asia.

6.7.2 ANOVA

Analysis of variance is a collection of statistical tests which can be used to test the difference in means between two or more groups.

In base R, aov() fits the model and summary() prints an ANOVA table which includes an F-test. This so-called omnibus test tells you whether there is any difference among the means of the included groups; it does not tell you which groups differ. Again, it is important to plot carefully and be clear what question you are asking.
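A table like the one shown here can be produced along the following lines. This is a minimal sketch rather than the exact code behind the output: it assumes the data come from the gapminder package, restricted to 2007 and the three continents named above, and the object names aov_data and fit are chosen for illustration.

```r
library(dplyr)
library(gapminder)

# Assumption: gapminder data, year 2007, three continents only
aov_data <- gapminder %>%
  filter(year == 2007) %>%
  filter(continent %in% c("Americas", "Europe", "Asia"))

# One-way ANOVA: life expectancy explained by continent
fit <- aov(lifeExp ~ continent, data = aov_data)

# Print the ANOVA table, including the omnibus F-test
summary(fit)
```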

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## continent    2  755.6   377.8   11.63 3.42e-05 ***
## Residuals   85 2760.3    32.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
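It can help to see how the columns of this table relate to one another. The sketch below recomputes the mean squares, the F value and the p-value from the rounded sums of squares and degrees of freedom printed above; the variable names are illustrative only, and small discrepancies from rounding are to be expected.

```r
# Values copied from the ANOVA table above (already rounded)
ss_continent <- 755.6
ss_residual  <- 2760.3
df_continent <- 2
df_residual  <- 85

# Mean square = sum of squares / degrees of freedom
ms_continent <- ss_continent / df_continent
ms_residual  <- ss_residual / df_residual

# F value = between-group mean square / residual mean square
f_value <- ms_continent / ms_residual

# p-value = upper tail of the F distribution with (2, 85) degrees of freedom
p_value <- pf(f_value, df_continent, df_residual, lower.tail = FALSE)

f_value
p_value
```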

Given the very small p-value, we can conclude that the mean life expectancy differs between at least two of the continents included. This does not mean that all included groups (in this case the 3 continents) are significantly different from each other. As above, the output can be neatened up using the tidy() function from the broom package.
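A tidied table can be obtained by piping the model straight into tidy(). As before, this is a sketch that assumes the gapminder package as the data source; tidy_fit is an illustrative name.

```r
library(dplyr)
library(gapminder)
library(broom)

# Assumption: gapminder data, year 2007, three continents only
tidy_fit <- gapminder %>%
  filter(year == 2007) %>%
  filter(continent %in% c("Americas", "Europe", "Asia")) %>%
  aov(lifeExp ~ continent, data = .) %>%  # `.` passes the piped data to aov()
  tidy()                                  # convert the ANOVA table to a tibble

tidy_fit
```

Because the result is an ordinary tibble, individual values such as the p-value can be pulled out with the usual dplyr verbs.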

## # A tibble: 2 x 6
##   term         df sumsq meansq statistic    p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>      <dbl>
## 1 continent     2  756.  378.       11.6  0.0000342
## 2 Residuals    85 2760.   32.5      NA   NA

6.7.3 Assumptions

As with the normality assumption of the t-test (for example, Sections 6.4.1 and 6.4.2), there are assumptions of the ANOVA model. These assumptions are shared with linear regression and are covered in the next chapter, as linear regression lends itself well to illustrating and explaining these concepts. Suffice to say that diagnostic plots can be produced to check that the assumptions are fulfilled. The ggfortify package includes a function called autoplot() that can be used to quickly create diagnostic plots, including the Q-Q plot that we showed before:
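A minimal sketch of producing the diagnostic plots is given here. It assumes the ANOVA model is fitted to the gapminder data for 2007 and the three continents, as in the rest of this section; the object names are illustrative.

```r
library(dplyr)
library(gapminder)
library(ggfortify)

# Assumption: gapminder data, year 2007, three continents only
aov_data <- gapminder %>%
  filter(year == 2007) %>%
  filter(continent %in% c("Americas", "Europe", "Asia"))

fit <- aov(lifeExp ~ continent, data = aov_data)

# An aov object also has class "lm", so autoplot() draws the
# standard set of regression diagnostic plots for it
diag_plots <- autoplot(fit)
diag_plots
```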

FIGURE 6.8: Diagnostic plots: ANOVA model of life expectancy by continent for 2007