3.7 select() columns
The select() function can be used to choose, rename, or reorder the columns of a tibble.
For the following select() examples, let’s create a new tibble called gbd_2rows by taking the first 2 rows of gbd_full (just to keep the printouts short):
## # A tibble: 2 x 5
##   cause                  year sex    income       deaths_millions
##   <chr>                 <dbl> <chr>  <chr>                  <dbl>
## 1 Communicable diseases  1990 Female High                   0.21
## 2 Communicable diseases  1990 Female Upper-Middle           1.150
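The code that produced this printout is not shown here. To follow along without gbd_full, a minimal sketch, assuming the dplyr and tibble packages are installed, is to type gbd_2rows in by hand from the printed values. This stand-in is hypothetical, not the original code; with the real data loaded, taking the first two rows could be done with slice(1:2), shown as a comment.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows, typed in from the printed output
# (gbd_full itself is not loaded in this sketch)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# With the full dataset loaded, the first 2 rows could be taken with:
# gbd_2rows <- gbd_full %>% slice(1:2)
gbd_2rows
```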
Let’s select() two of these columns:
## # A tibble: 2 x 2
##   cause                 deaths_millions
##   <chr>                           <dbl>
## 1 Communicable diseases           0.21
## 2 Communicable diseases           1.150
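A sketch of a call that gives this result, assuming dplyr is loaded and using a hand-typed stand-in for gbd_2rows (hypothetical; the original chunk is not shown): the columns to keep are simply listed inside select().

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# Keep only the named columns, in the order they are listed
two_cols <- gbd_2rows %>% select(cause, deaths_millions)
two_cols
```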
We can also use select() to rename the columns we are choosing:
## # A tibble: 2 x 2
##   cause                 deaths
##   <chr>                  <dbl>
## 1 Communicable diseases  0.21
## 2 Communicable diseases  1.150
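A sketch of the renaming form, again on a hypothetical hand-typed gbd_2rows: inside select(), writing `new_name = old_name` both keeps the column and gives it the new name.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# new_name = old_name: keep deaths_millions but call it deaths
renamed <- gbd_2rows %>% select(cause, deaths = deaths_millions)
renamed
```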
The function rename() is similar to select(), but it keeps all variables, whereas select() keeps only the ones we name:
## # A tibble: 2 x 5
##   cause                  year sex    income       deaths
##   <chr>                 <dbl> <chr>  <chr>         <dbl>
## 1 Communicable diseases  1990 Female High          0.21
## 2 Communicable diseases  1990 Female Upper-Middle  1.150
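A sketch with rename(), on the same hypothetical stand-in: the `new_name = old_name` syntax is the same as in select(), but every other column is kept in its original position.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# Only deaths_millions changes name; all five columns are kept
all_cols <- gbd_2rows %>% rename(deaths = deaths_millions)
all_cols
```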
select() can also be used to reorder the columns in your tibble. Column order does not matter for the analysis itself: the functions we showed you above, as well as plotting, refer to columns by name, not by their position in the tibble. Reordering is still useful for organising your tibble for easier viewing.
So if we use select() like this:
## # A tibble: 2 x 5
##    year sex    income       cause                 deaths_millions
##   <dbl> <chr>  <chr>        <chr>                           <dbl>
## 1  1990 Female High         Communicable diseases           0.21
## 2  1990 Female Upper-Middle Communicable diseases           1.150
The columns are reordered.
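A sketch of a reordering call consistent with this output, using the hypothetical stand-in: listing every column in the desired order makes select() return them in that order.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# Columns come back in the order they are listed here
reordered <- gbd_2rows %>% select(year, sex, income, cause, deaths_millions)
reordered
```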
If you want to move specific column(s) to the front of the tibble, do:
## # A tibble: 2 x 5
##    year sex    cause                 income       deaths_millions
##   <dbl> <chr>  <chr>                 <chr>                  <dbl>
## 1  1990 Female Communicable diseases High                   0.21
## 2  1990 Female Communicable diseases Upper-Middle           1.150
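One call that produces this output, sketched on the hypothetical stand-in: name the columns to move first, then add everything(), which picks up all the remaining columns in their original order without duplicating the ones already named. Assuming dplyr 1.0.0 or later, relocate() does the same job and is shown for comparison.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# year, sex and cause go to the front; everything() adds the rest
front <- gbd_2rows %>% select(year, sex, cause, everything())
front

# Assumption: dplyr >= 1.0.0, where relocate() moves columns to the front
front_relocated <- gbd_2rows %>% relocate(year, sex, cause)
```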
And this is where the true power of select() starts to come out.
In addition to listing the columns explicitly (e.g., mydata %>% select(year, cause...)), there are several special functions that can be used inside select(). These special functions are called select helpers, and the first one we used, in the previous example, is everything(), which selects all the columns not already named.
The most common select helpers are starts_with(), ends_with(), contains(), and matches(). There are several others that may be useful to you, so press F1 on select() for a full list, or search the web for more examples.
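A sketch of three of these helpers on the hypothetical stand-in: ends_with() and contains() take a plain string to look for in the column names, while matches() takes a regular expression. The search strings ("millions", "ea", and the pattern) are illustrative choices, not from the original text.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# Names ending in "millions": deaths_millions
ends_sel <- gbd_2rows %>% select(ends_with("millions"))

# Names containing "ea" anywhere: year, deaths_millions
contains_sel <- gbd_2rows %>% select(contains("ea"))

# Names matching a regular expression: sex, income
matches_sel <- gbd_2rows %>% select(matches("^(sex|income)$"))
```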
Let’s say you can’t remember whether the deaths column was called deaths_millions, just deaths, or deaths_mil, or maybe there are other columns that include the word “deaths” that you want to select():
## # A tibble: 2 x 1
##   deaths_millions
##             <dbl>
## 1           0.21
## 2           1.150
Note that “deaths” needs to be quoted inside starts_with(): it is a string to search for, not the actual name of a column/variable.
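A sketch of the quoting point on the hypothetical stand-in: with quotes, starts_with("deaths") finds deaths_millions; without quotes, R tries to find an object called deaths, which does not exist here, so the call fails. try() is used only so the sketch keeps running after the error.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for gbd_2rows (values copied from the printout)
gbd_2rows <- tribble(
  ~cause,                  ~year, ~sex,     ~income,        ~deaths_millions,
  "Communicable diseases", 1990,  "Female", "High",         0.21,
  "Communicable diseases", 1990,  "Female", "Upper-Middle", 1.15
)

# Quoted: "deaths" is the prefix to search for in the column names
deaths_cols <- gbd_2rows %>% select(starts_with("deaths"))
deaths_cols

# Unquoted: R looks for an object named deaths, which is not defined,
# so select() raises an error (caught here by try())
unquoted <- try(gbd_2rows %>% select(starts_with(deaths)), silent = TRUE)
inherits(unquoted, "try-error")
```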