4.6 Bar plots

There are two geoms for making bar plots: geom_col() and geom_bar(). In short, geom_col() plots values from your data directly (you define both the x and y values), whereas geom_bar() takes only x values; the height of each bar (y) is the number of observations in each subgroup of x. geom_bar() is basically a histogram for a categorical variable.

geom_col():

  • requires two variables aes(x = , y = )
  • x is categorical, y is continuous (numeric)

Let’s plot the life expectancies in 2007 in these three countries:
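A minimal sketch of this plot follows. Two things here are assumptions rather than something stated in this section: that gapminder2007 is the gapminder dataset filtered to year 2007, and that the three countries are the United Kingdom, France and Germany.

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

# Assumption: these are the three countries being compared
three_countries <- gapminder2007 %>%
  filter(country %in% c("United Kingdom", "France", "Germany"))

# geom_col() needs both x (categorical) and y (continuous)
p_col <- three_countries %>%
  ggplot(aes(x = country, y = lifeExp)) +
  geom_col()

p_col
```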

This gives us Figure 4.8:1. We have also created another cheeky one using the same code but changing the scale of the y axis to be more dramatic (Figure 4.8:2).

FIGURE 4.8: Bar plots using geom_col(): (1) using the code example, (2) same plot but with + coord_cartesian(ylim=c(79, 81)) to manipulate the scale into something a lot more dramatic.
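The zoomed version in panel (2) could be sketched like this. As before, the definition of gapminder2007 and the choice of countries are assumptions; only the coord_cartesian() call is taken from the caption.

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

# Assumption: United Kingdom, France and Germany are the three countries
p_zoom <- gapminder2007 %>%
  filter(country %in% c("United Kingdom", "France", "Germany")) %>%
  ggplot(aes(x = country, y = lifeExp)) +
  geom_col() +
  # Zoom the y axis to 79-81 years; the bars no longer start at zero
  coord_cartesian(ylim = c(79, 81))

p_zoom
```

coord_cartesian() only zooms the view, so all three bars are still drawn. The differences look much bigger than they are, which is exactly why a truncated axis on a bar plot should be used with care.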

geom_bar():

  • requires a single variable aes(x = )
  • this x should be a categorical variable
  • geom_bar() then counts up the number of observations (rows) for this variable and plots them as bars.

Our gapminder2007 tibble has a row for each country (see end of Section 4.1 to remind yourself). Therefore, if we use the count() function on the continent variable, we are counting up the number of countries on each continent (in this dataset; see footnote 1):
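The call behind the output below is most likely along these lines (gapminder2007 is assumed to be the gapminder dataset filtered to year 2007):

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

# One row per country, so counting rows per continent counts countries
continent_counts <- gapminder2007 %>%
  count(continent)

continent_counts
```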

## # A tibble: 5 x 2
##   continent     n
##   <fct>     <int>
## 1 Africa       52
## 2 Americas     25
## 3 Asia         33
## 4 Europe       30
## 5 Oceania       2

So geom_bar() basically runs the count() function and plots the result (see how the bars in Figure 4.9 are the same height as the values from count(continent)).

FIGURE 4.9: geom_bar() counts up the number of observations for each group. (1) gapminder2007 %>% ggplot(aes(x = continent)) + geom_bar(), (2) same + a little bit of magic to reveal the underlying data.

The first barplot in Figure 4.9 is produced with just this:
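This is the code quoted in the Figure 4.9 caption, written out as a runnable sketch (the definition of gapminder2007 is an assumption based on its name):

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

# Only x is supplied; geom_bar() counts the rows in each continent
p_bar <- gapminder2007 %>%
  ggplot(aes(x = continent)) +
  geom_bar()

p_bar
```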

Whereas on the second one, we’ve asked geom_bar() to reveal the components (countries) in a colourful way:
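A sketch of what that could look like, pieced together from the description that follows (colour = country inside aes(), fill = NA inside geom_bar(), and the legend removed). The definition of gapminder2007 is again an assumption.

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

p_outline <- gapminder2007 %>%
  # colour is mapped to a variable, so it goes inside aes()
  ggplot(aes(x = continent, colour = country)) +
  # fill is a constant, so it goes inside geom_bar(), outside aes()
  geom_bar(fill = NA) +
  # hide the legend, which would otherwise list every country
  theme(legend.position = "none")

p_outline
```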

We have added theme(legend.position = "none") to remove the legend, as it would list all 142 countries and is not very informative in this case. We’re only including the colours for a bit of fun.

We’re also removing the fill by setting it to NA (fill = NA). Note how we defined colour = country inside the aes() (as it’s a variable), but we put the fill inside geom_bar() as a constant. This was explained in more detail in steps (3) and (4) of the ggplot anatomy section (Section 4.2).

4.6.1 colour vs fill

Figure 4.9 also reveals the difference between a colour and a fill. Colour is the border around a geom, whereas fill is inside it. Both can either be set based on a variable in your dataset (this means colour = or fill = needs to be inside the aes() function), or they could be set to a fixed colour.

R has an amazing knowledge of colour. In addition to knowing what is “white”, “yellow”, “red”, “green” etc. (meaning we can simply do geom_bar(fill = "green")) it also knows what “aquamarine”, “blanchedalmond”, “coral”, “deeppink”, “lavender”, “deepskyblue” look like (amongst many many others, search the internet for “R colours” for a full list).

We can also use HEX colour codes, for example, geom_bar(fill = "#FF0099") is a very pretty pink.
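As a quick illustration, base R's colours() function returns the built-in colour names, and both a named colour and a HEX code can be passed to fill as constants. The plots below reuse the continent bar plot; gapminder2007 is assumed to be the 2007 subset of gapminder.

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

# A few of the colour names R knows about
head(colours())
"blanchedalmond" %in% colours()

# A named colour as a fixed fill (outside aes())
p_named <- gapminder2007 %>%
  ggplot(aes(x = continent)) +
  geom_bar(fill = "deepskyblue")

# The same plot with a HEX colour code
p_hex <- gapminder2007 %>%
  ggplot(aes(x = continent)) +
  geom_bar(fill = "#FF0099")

p_named
p_hex
```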

4.6.2 Proportions

Whether using geom_bar() or geom_col(), we can use fill to display proportions within bars. Furthermore, it is sometimes useful to set the x value to a constant, so that everything is plotted together in one bar rather than separated by a variable. Here we are using aes(x = "Global", fill = continent). Note that “Global” could be any word: since it’s quoted, ggplot() won’t go looking for it in the dataset:
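A minimal sketch using the aes() call quoted above (gapminder2007 is assumed to be the 2007 subset of gapminder):

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

p_global <- gapminder2007 %>%
  # "Global" is a quoted constant, so every row lands in a single bar;
  # fill = continent splits that bar into one segment per continent
  ggplot(aes(x = "Global", fill = continent)) +
  geom_bar()

p_global
```

The result is a single stacked bar whose total height is the number of countries in the dataset, with each coloured segment showing one continent's share.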

There are more examples of bar plots in Chapter 8.

4.6.3 Exercise

Create Figure 4.10 of life expectancies in European countries (year 2007).

FIGURE 4.10: Barplot Exercise. Life expectancies in European countries in year 2007 from the Gapminder dataset.

Hints:

  • If geom_bar() doesn’t work, try geom_col(), or vice versa.
  • Use coord_flip() to make the bars horizontal (it flips the x and y axes).
  • x = country plots the country bars in alphabetical order. Use x = fct_reorder(country, lifeExp), still inside the aes(), to order the bars by their lifeExp values. Or try one of the other variables (pop, gdpPercap) as the second argument to fct_reorder().
  • When using fill = NA, you also need to include a colour; we’re using colour = "deepskyblue" inside the geom_col().
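One possible way to combine these hints is sketched below. Have a go yourself before reading it. Figure 4.10 is not reproduced here, so any styling beyond what the hints mention is left out, and gapminder2007 is assumed to be the 2007 subset of gapminder.

```r
library(tidyverse)
library(gapminder)

# Assumption: gapminder2007 is the 2007 subset of the gapminder dataset
gapminder2007 <- gapminder %>%
  filter(year == 2007)

p_europe <- gapminder2007 %>%
  filter(continent == "Europe") %>%
  # order the country bars by life expectancy rather than alphabetically
  ggplot(aes(x = fct_reorder(country, lifeExp), y = lifeExp)) +
  # outlined bars: no fill, so a border colour is needed
  geom_col(colour = "deepskyblue", fill = NA) +
  # horizontal bars
  coord_flip()

p_europe
```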

  1. The number of countries in this dataset is 142, whereas the United Nations has 193 member states.