Binary file added images/split-apply-combine.png
72 changes: 41 additions & 31 deletions sessions/split-apply-combine.qmd
Expand Up @@ -19,10 +19,10 @@ tidied_post_meal_data <- post_meal_data |>
```

The split-apply-combine method is a powerful technique for performing
data analysis. It involves splitting a dataset into groups based on
some categorical variable(s), performing some analysis to each group,
and then combining the results back together into a single dataset. In
this session, we'll explore how to use this technique in R using the
data analysis. It involves splitting a dataset into groups based on some
categorical variable(s), performing some analysis to each group, and
then combining the results back together into a single dataset. In this
session, we'll explore how to use this technique in R using the
`group_by()` and `summarise()` functions from the `{dplyr}` package.

## Learning objectives
Expand All @@ -41,6 +41,12 @@ analysis tasks that can be approached using the
method, which involves splitting the data into groups, applying some
analysis to each group, and then combining the results together.

![A diagram showing how a data frame is split up, an action is applied
to the splits that outputs a new result, and the results are combined
back together. Taken from [Software Carpentry's R for Reproducible
Scientific
Analysis](https://unlhcc.github.io/r-novice-gapminder/16-plyr/).](/images/split-apply-combine.png)
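
To make the three steps concrete, here is a minimal sketch that does each
one by hand in base R. The `toy` data frame and its values are made up
for illustration and are not the course dataset; `group_by()` and
`summarise()` do the same job in a single pipeline.

```r
# Toy data, invented purely for illustration (not the course dataset).
toy <- data.frame(
  Group = c("A", "A", "B", "B"),
  Age = c(30, 40, 50, 60)
)

# Split: one data frame per level of Group.
pieces <- split(toy, toy$Group)

# Apply: compute the mean age within each piece.
results <- lapply(pieces, function(piece) {
  data.frame(Group = piece$Group[1], mean_age = mean(piece$Age))
})

# Combine: stack the per-group results back into one data frame.
combined <- do.call(rbind, results)
combined
```

The result has one row per group, which is the same shape that
`summarise()` returns after `group_by()`.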

As we briefly showed in @tbl-wrangling-verbs, `{dplyr}` has some
wrangling verbs that include one to `summarise()` the data. If you want
to do a split-apply-combine analysis to, for example, find the max
Expand Down Expand Up @@ -142,17 +148,18 @@ run it.

Run this code chunk with {{< var keybind.run-code >}} to see the output.
Cool! Since we don't need the dataset grouped anymore, it's good
practice to end the grouping with `ungroup()`.
practice to end the grouping with `ungroup()` or add `.groups = "drop"`
inside the `summarise()` function.

```{r ungroup-data}
#| filename: "docs/learning.qmd"
tidied_post_meal_data |>
group_by(Group) |>
summarise(
mean_age = mean(Age),
mean_bmi = mean(BMI)
) |>
ungroup()
mean_bmi = mean(BMI),
.groups = "drop"
)
```

Run this code chunk with {{< var keybind.run-code >}} to see the output.
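
The difference between the two approaches is easier to see with more than
one grouping variable. The sketch below uses a made-up `toy` tibble (the
values are hypothetical) and `group_vars()` from `{dplyr}` to check which
groupings remain. By default, `summarise()` only drops the last grouping
variable, so with a single grouping variable the result already comes
back ungrouped, while with two it stays grouped by the first.

```r
library(dplyr)

# Toy data, invented for illustration (not the course dataset).
toy <- tibble(
  Group = c("A", "A", "B", "B"),
  age_group = c("young", "old", "young", "old"),
  BMI = c(22, 27, 24, 30)
)

# Default behaviour: only the last grouping variable (age_group) is
# dropped, so the result is still grouped by Group.
still_grouped <- toy |>
  group_by(Group, age_group) |>
  summarise(mean_bmi = mean(BMI))

group_vars(still_grouped)

# Option 1: remove the remaining grouping afterwards.
via_ungroup <- still_grouped |>
  ungroup()

# Option 2: drop all grouping inside summarise().
via_drop <- toy |>
  group_by(Group, age_group) |>
  summarise(mean_bmi = mean(BMI), .groups = "drop")

# Both results have no grouping variables left.
group_vars(via_ungroup)
group_vars(via_drop)
```

Either option gives the same ungrouped table; `.groups = "drop"` just
saves one step in the pipeline.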
Expand All @@ -179,9 +186,9 @@ tidied_post_meal_data |>
group_by(Group) |>
summarise(
mean_age = mean(Age),
mean_bmi = mean(BMI)
mean_bmi = mean(BMI),
.groups = "drop"
) |>
ungroup() |>
knitr::kable()
```

Expand All @@ -203,9 +210,9 @@ tidied_post_meal_data |>
group_by(Group) |>
summarise(
mean_age = round(mean(Age), 1),
mean_bmi = round(mean(BMI), 1)
mean_bmi = round(mean(BMI), 1),
.groups = "drop"
) |>
ungroup() |>
knitr::kable()
```

Expand All @@ -222,9 +229,9 @@ tidied_post_meal_data |>
group_by(Group) |>
summarise(
"Mean Age (yrs)" = round(mean(Age), 1),
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1)
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1),
.groups = "drop"
) |>
ungroup() |>
knitr::kable()
```

Expand All @@ -234,17 +241,17 @@ To add the caption, there are two ways to do it: as an argument to
works similarly to the one for figures. We can also reference the table
if we include a label with `#| label: tbl-`.
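
As a small sketch of setting a caption from R code rather than through a
chunk option, `knitr::kable()` has a `caption` argument. The
`toy_summary` data frame and the caption text are made up for
illustration. This sketch does not cover cross-referencing, which still
relies on the chunk label.

```r
library(knitr)

# Toy summary table, invented for illustration (not the course dataset).
toy_summary <- data.frame(
  Group = c("A", "B"),
  mean_bmi = c(24.5, 27.0)
)

# Pass the caption directly to kable().
captioned <- kable(toy_summary, caption = "Mean BMI for each group.")
captioned
```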

````{.markdown filename="docs/learning.qmd"}
```` {.markdown filename="docs/learning.qmd"}
```{{r}}
#| label: tbl-mean-age-bmi
#| tbl-cap: "Mean values of Age and BMI for each group."
tidied_post_meal_data |>
group_by(Group) |>
summarise(
"Mean Age (yrs)" = round(mean(Age), 1),
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1)
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1),
.groups = "drop"
) |>
ungroup() |>
knitr::kable()
```
````
Expand All @@ -257,9 +264,9 @@ tidied_post_meal_data |>
group_by(Group) |>
summarise(
"Mean Age (yrs)" = round(mean(Age), 1),
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1)
"Mean BMI (kg/m^2^)" = round(mean(BMI), 1),
.groups = "drop"
) |>
ungroup() |>
knitr::kable()
```

Expand All @@ -281,7 +288,7 @@ file, create a new header called `## Bigger table`. Then copy the code
template below and paste it below the new header. Then, using this
template, complete each of the items below.

````{.markdown filename="docs/learning.qmd"}
```` {.markdown filename="docs/learning.qmd"}
```{{r}}
#| label: tbl-___
#| tbl-cap: "___"
Expand Down Expand Up @@ -333,22 +340,24 @@ See @tbl-___, very nice table! :D
columns `"AUC c-Peptide"`, `"AUC Glucose"`, and `"AUC Insulin"`
(note the `""` around the names).

5. `rename()` the columns to be more human readable. Rename `age_group`
5. Add `.groups = "drop"` inside the `summarise()` function to ungroup
the data.

6. `rename()` the columns to be more human readable. Rename `age_group`
to `"Age group"` and `Group` to `"Family history"`. Remember,
renaming follows the `new = old` format.

6. Next, `ungroup()` the data before sending it to `knitr::kable()` to
create the table.
7. Next, send the data to `knitr::kable()` to create the table.

7. In the text below the code chunk, reference the table with
8. In the text below the code chunk, reference the table with
`@tbl-summary-table`.

8. Run `{styler}` on the document with {{< var keybind.styler >}}.
9. Run `{styler}` on the document with {{< var keybind.styler >}}.

9. Render the document to HTML with {{< var keybind.render >}} to see
10. Render the document to HTML with {{< var keybind.render >}} to see
what the table looks like.

10. End the exercise by adding and committing to the Git history with
11. End the exercise by adding and committing to the Git history with
{{< var keybind.git >}}, then push to GitHub.
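
As a reminder of the `new = old` format mentioned in the tasks above,
here is a minimal sketch of `rename()` on a made-up tibble. The column
names `id` and `wt` are hypothetical and are not part of the exercise
data.

```r
library(dplyr)

# Toy data with short column names, invented for illustration.
toy <- tibble(
  id = c(1, 2),
  wt = c(70.5, 82.1)
)

# The new name goes on the left, the existing name on the right.
# New names containing spaces need quotes around them.
renamed <- toy |>
  rename(
    "Participant ID" = id,
    "Weight (kg)" = wt
  )

names(renamed)
```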

```{r solution-create-pretty-table}
Expand All @@ -371,15 +380,16 @@ post_meal_data |>
summarise(
"AUC c-Peptide" = round(median(auc_cp), 1),
"AUC Glucose" = round(median(auc_pg), 1),
"AUC Insulin" = round(median(auc_ins), 1)
) |>
"AUC Insulin" = round(median(auc_ins), 1),
# Task 5.
.groups = "drop"
) |>
# Task 6.
rename(
"Age group" = age_group,
"Family history" = Group
) |>
# Task 6.
ungroup() |>
# Task 7.
knitr::kable()

# See @tbl-summary-table, very nice table! :D
Expand Down