6.6 Compare the mean of one group: one-sample t-tests

We can use a t-test to determine whether the mean of a distribution differs from a specific value. For instance, we can test whether the mean life expectancy on each continent in 2007 was significantly different from 77 years. The code below also shows how to run several tests (one per continent) within a single pipeline.

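library(dplyr)   # filter(), group_by(), do()
library(broom)   # tidy()

gapdata %>% 
  filter(year == 2007) %>%          # 2007 only
  group_by(continent) %>%           # split by continent
  do(                               # run the code below once per continent
    t.test(.$lifeExp, mu = 77) %>%  # compare mean to 77 years 
      tidy()                        # tidy the test result into a tibble
  )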
## # A tibble: 5 x 9
## # Groups:   continent [5]
##   continent estimate statistic  p.value parameter conf.low conf.high method
##   <fct>        <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr> 
## 1 Africa        54.8    -16.6  3.15e-22        51     52.1      57.5 One S…
## 2 Americas      73.6     -3.82 8.32e- 4        24     71.8      75.4 One S…
## 3 Asia          70.7     -4.52 7.88e- 5        32     67.9      73.6 One S…
## 4 Europe        77.6      1.19 2.43e- 1        29     76.5      78.8 One S…
## 5 Oceania       80.7      7.22 8.77e- 2         1     74.2      87.3 One S…
## # … with 1 more variable: alternative <chr>
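The `do()` call above runs `t.test()` once per continent. As a minimal sketch of the same idea without dplyr (recent dplyr documentation marks `do()` as superseded), base R's `split()` and `lapply()` can run one test per group. The `toy` data frame and its values are hypothetical stand-ins for the 2007 `gapdata` rows, used only so the example runs on its own.

```r
# Hypothetical toy data standing in for gapdata filtered to 2007
toy <- data.frame(
  continent = rep(c("A", "B"), each = 5),
  lifeExp   = c(70, 72, 75, 71, 73, 78, 79, 77, 80, 76)
)

# Split lifeExp by continent and run one t-test per group against mu = 77
results <- lapply(split(toy$lifeExp, toy$continent), t.test, mu = 77)

# Collect the key numbers from each htest object into one data frame
summary_table <- data.frame(
  continent = names(results),
  estimate  = sapply(results, function(r) unname(r$estimate)),
  statistic = sapply(results, function(r) unname(r$statistic)),
  p.value   = sapply(results, function(r) r$p.value),
  conf.low  = sapply(results, function(r) r$conf.int[1]),
  conf.high = sapply(results, function(r) r$conf.int[2]),
  row.names = NULL
)
print(summary_table)
```

In practice, `broom::tidy()` does the collecting step for you, as in the pipeline above.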

The mean life expectancy for Europe and Oceania does not differ significantly from 77 years, while for the other three continents it does. In particular, look at the confidence intervals in the conf.low and conf.high columns and whether they include or exclude 77: the intervals for Europe (76.5 to 78.8) and Oceania (74.2 to 87.3) include 77, while those for the other continents exclude it. Oceania’s confidence interval is especially wide because the dataset includes only two countries from that continent. Therefore, we can’t conclude that Oceania’s mean life expectancy is the same as 77; we can only say that we don’t have enough observations, so the estimate is too uncertain to tell. It doesn’t make sense to report the result of a statistical test (whether the p-value is significant or not) without also assessing the confidence interval.
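To make the link between the p-value and the confidence interval concrete, here is a minimal sketch that computes a one-sample t-test by hand with the standard formulas, t = (mean − mu) / (sd / √n) on n − 1 degrees of freedom, and checks it against `t.test()`. The sample values in `x` are hypothetical. The last two lines compare the 95% t multiplier for df = 1 (two observations, as for Oceania) with df = 29 (thirty observations, as for Europe), which is why a two-country interval is so wide.

```r
# Hypothetical sample of life expectancies (years)
x  <- c(76.1, 78.4, 77.9, 75.2, 79.0, 76.8)
mu <- 77                                        # reference value under H0

n  <- length(x)
df <- n - 1                                     # degrees of freedom
se <- sd(x) / sqrt(n)                           # standard error of the mean
t_stat  <- (mean(x) - mu) / se                  # one-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df)             # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

res <- t.test(x, mu = mu)                       # built-in version for comparison
print(c(t = t_stat, p = p_value, low = ci[1], high = ci[2]))

# The 95% interval excludes mu exactly when the two-sided p-value is < 0.05
excludes_mu <- mu < ci[1] || mu > ci[2]

# t multipliers for a 95% interval: tiny samples need a much larger one
qt(0.975, df = 1)    # about 12.71 (two observations)
qt(0.975, df = 29)   # about 2.05 (thirty observations)
```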

6.6.1 Interchangeability of t-tests

Furthermore, remember how we calculated the table of differences in the paired t-test section? We can use these differences for each pair of observations (each country’s life expectancy in 2002 and 2007) to run a simple one-sample t-test instead:

# note that we're using dlifeExp, i.e. the per-country
# differences calculated in the paired t-test section
t.test(paired_table$dlifeExp, mu = 0)
## 
##  One Sample t-test
## 
## data:  paired_table$dlifeExp
## t = 14.338, df = 32, p-value = 1.758e-15
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.282271 1.706941
## sample estimates:
## mean of x 
##  1.494606

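Notice how this result is identical to that of the paired t-test: the same t statistic, degrees of freedom, p-value and confidence interval. This is because a paired t-test is simply a one-sample t-test on the within-pair differences, testing whether their mean is 0.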
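A minimal sketch with hypothetical paired values (not taken from `gapdata`) can confirm this equivalence: `t.test()` with `paired = TRUE` and a one-sample `t.test()` on the differences return the same statistic, degrees of freedom, p-value and confidence interval.

```r
# Hypothetical paired measurements: the same five units observed twice
before <- c(70.2, 65.1, 72.8, 68.4, 74.0)
after  <- c(71.5, 66.9, 73.9, 70.2, 75.1)

paired   <- t.test(after, before, paired = TRUE)  # paired t-test
one_samp <- t.test(after - before, mu = 0)        # one-sample test on differences

# Side-by-side comparison of the key results
rbind(
  paired   = c(t = unname(paired$statistic),   df = unname(paired$parameter),
               p = paired$p.value),
  one_samp = c(t = unname(one_samp$statistic), df = unname(one_samp$parameter),
               p = one_samp$p.value)
)
```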