---
title: "Learn R Part II - Extra Exercises"
format: html
editor: visual
---

To save you time, I've completed tasks 1 and 2 for you. Make sure to run those code chunks before moving on to task 3.

#### 1. Load tidyverse

```{r}
# load packages
library(tidyverse)
```

#### 2. Load two data sets using `read_csv()`

The data sets for these exercises include:

1.  `se_colleges`: admissions information for 4-year colleges and universities in the U.S. southeast in fall 2023 (<https://nces.ed.gov/ipeds/use-the-data>).

2.  `us_pop`: total U.S. population by race/ethnicity and state in 2020 (<https://data.census.gov/table/DECENNIALDHC2020.P9?g=010XX00US>).

```{r}
# import data
se_colleges <- read_csv("data/admissions_data.csv")
us_pop <- read_csv("data/census_race_state.csv")
```

#### 3. Create new variables

Using `mutate()`, create two new variables in `se_colleges` (make sure to store a new version of `se_colleges` so your variables get saved):

1.  `accept_rate`: `n_admitted` divided by `n_applied`

2.  `selective`: whether a college is "more selective" or "less selective". A college is "more selective" if its `accept_rate` is **less than** 0.5; otherwise it is "less selective" (Hint: use `if_else()`)

```{r}

```
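
If you get stuck, the pattern can be sketched on a small made-up table. `toy_colleges` and its values are invented for illustration; only the column names `n_applied`, `n_admitted`, `accept_rate`, and `selective` come from the task. Note that a rate of exactly 0.5 is not "less than" 0.5, so it falls in the "less selective" group.

```r
library(dplyr)

# Toy data, invented for illustration (not the real admissions file)
toy_colleges <- tibble(
  college    = c("A", "B", "C"),
  n_applied  = c(1000, 2000, 500),
  n_admitted = c(300, 1500, 250)
)

# Reassign the result so the new columns are stored
toy_colleges <- toy_colleges |>
  mutate(
    accept_rate = n_admitted / n_applied,
    selective   = if_else(accept_rate < 0.5, "more selective", "less selective")
  )

toy_colleges
# accept_rate is 0.30, 0.75, 0.50, so only college A is "more selective"
```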

#### 4. Count observations by group (by collapsing)

How many "more selective" colleges are in the U.S. southeast?

```{r}

```
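
One possible approach, sketched on invented data: `count()` collapses the table to one row per group and stores the group size in a column named `n`. The `toy_colleges` table below is hypothetical.

```r
library(dplyr)

# Toy data, invented for illustration
toy_colleges <- tibble(
  college   = c("A", "B", "C", "D", "E"),
  selective = c("more selective", "less selective", "less selective",
                "more selective", "less selective")
)

# One row per value of `selective`, with the number of colleges in `n`
selective_counts <- toy_colleges |>
  count(selective)

selective_counts
# "less selective" has n = 3 and "more selective" has n = 2
```

Writing `group_by(selective) |> summarise(n = n())` should give the same counts.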

How many "more selective" HBCUs (Historically Black Colleges and Universities) are in the U.S. southeast? (Hint: group by two variables. An `hbcu` value of 1 indicates a college is an HBCU)

```{r}

```
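
Grouping by two variables gives one row per combination of their values. A minimal sketch on invented data follows; the `hbcu` coding (1 = HBCU) is from the hint above, while `toy_colleges` and the column name `n_colleges` are made up.

```r
library(dplyr)

# Toy data, invented for illustration
toy_colleges <- tibble(
  hbcu      = c(1, 0, 1, 1, 0),
  selective = c("more selective", "more selective", "less selective",
                "more selective", "less selective")
)

# One row per hbcu/selective combination
hbcu_counts <- toy_colleges |>
  group_by(hbcu, selective) |>
  summarise(n_colleges = n(), .groups = "drop")

# Keep only the row for "more selective" HBCUs
hbcu_counts |>
  filter(hbcu == 1, selective == "more selective")
# n_colleges is 2 for this toy table
```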

#### 5. Other calculations by group (by collapsing)

Using `group_by()` and `summarise()`, calculate two aggregations by `control` (remember to use `na.rm = TRUE`):

1.  `total_applied`: Sum of applicants

2.  `avg_accept`: Average (mean) of `accept_rate`

Which `control` category had the greatest number of applicants? What about the highest average acceptance rate?

```{r}

```
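
As a rough sketch of the pattern, here is the same aggregation on invented data. The `control` labels "Public" and "Private" are assumptions for illustration and may not match the categories in the real file. Without `na.rm = TRUE`, any group containing a missing value would return `NA`.

```r
library(dplyr)

# Toy data, invented for illustration; note the missing values
toy_colleges <- tibble(
  control     = c("Public", "Public", "Private", "Private"),
  n_applied   = c(1200, NA, 400, 600),
  accept_rate = c(0.6, 0.8, NA, 0.3)
)

# Collapse to one row per control category
by_control <- toy_colleges |>
  group_by(control) |>
  summarise(
    total_applied = sum(n_applied, na.rm = TRUE),
    avg_accept    = mean(accept_rate, na.rm = TRUE)
  )

by_control
# Private: total_applied = 1000, avg_accept = 0.3
# Public:  total_applied = 1200, avg_accept = 0.7
```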

#### 6. Other calculations by group (without collapsing)

Which three colleges account for the highest percentage of total enrollment in their state?

Hints:

-   Use `group_by()` and `mutate()`

-   Within `mutate()`, first calculate total state enrollment (the sum of `n_enrolled`), then calculate each college's percentage of that state total

-   Sort by your percentage variable (in descending order) to determine the highest percentage colleges (you may also want to select just a subset of columns to view your results)

```{r}

```
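
The hints above can be sketched as follows on invented data. The column names `college` and `state`, and the new names `state_enrolled` and `pct_of_state`, are assumptions; check the real names in `se_colleges` before adapting this. Because `mutate()` is used instead of `summarise()`, every college keeps its own row.

```r
library(dplyr)

# Toy data, invented for illustration
toy_colleges <- tibble(
  college    = c("A", "B", "C", "D", "E"),
  state      = c("NC", "NC", "NC", "SC", "SC"),
  n_enrolled = c(500, 300, 200, 900, 100)
)

state_share <- toy_colleges |>
  group_by(state) |>
  mutate(
    state_enrolled = sum(n_enrolled, na.rm = TRUE),      # same value for every row in a state
    pct_of_state   = n_enrolled / state_enrolled * 100   # each college's share of that total
  ) |>
  ungroup() |>
  arrange(desc(pct_of_state)) |>
  select(college, state, n_enrolled, pct_of_state)

# Top three colleges by share of state enrollment
slice_head(state_share, n = 3)
# D (90%), A (50%), B (30%)
```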

#### 7. Tidying data

Pivot the `us_pop` data frame into a long/tidy format.

```{r}

```
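
A minimal sketch of the pivot, assuming a wide table with one identifier column and one column per race/ethnicity group. The table `toy_pop`, its column names, and the new names `race_ethnicity` and `population` are all invented; the real `us_pop` columns may differ.

```r
library(dplyr)
library(tidyr)

# Toy wide data, invented for illustration: one column per group
toy_pop <- tibble(
  state              = c("NC", "SC"),
  hispanic_or_latino = c(100, 50),
  white_alone        = c(600, 300),
  black_alone        = c(200, 150)
)

# Stack every column except `state` into name/value pairs
toy_pop_long <- toy_pop |>
  pivot_longer(
    cols      = -state,
    names_to  = "race_ethnicity",
    values_to = "population"
  )

toy_pop_long
# 6 rows: one per state and group combination
```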

Using your long/tidy data frame, determine how many Hispanic or Latino individuals live in the United States (Hint: use `group_by()` and `summarise()`).

```{r}

```
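
Once the data are long, a national total is a grouped sum over the states. The sketch below uses an invented long table; the names `race_ethnicity`, `population`, and `total_population` are assumptions and should be replaced with whatever you chose when pivoting.

```r
library(dplyr)

# Toy long data, invented for illustration
toy_pop_long <- tibble(
  state          = c("NC", "NC", "SC", "SC"),
  race_ethnicity = c("hispanic_or_latino", "white_alone",
                     "hispanic_or_latino", "white_alone"),
  population     = c(100, 600, 50, 300)
)

# Sum across states within each group
pop_by_group <- toy_pop_long |>
  group_by(race_ethnicity) |>
  summarise(total_population = sum(population, na.rm = TRUE))

pop_by_group
# hispanic_or_latino: 150, white_alone: 900
```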