-
Notifications
You must be signed in to change notification settings - Fork 0
/
10-next.qmd
884 lines (693 loc) · 36.6 KB
/
10-next.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
# Next Steps {#sec-next .incomplete-chapter}
## Intended Learning Outcomes {#sec-ilo-next - .ilo}
- [ ] Create and customise advanced types of plots
- [ ] Structure data in report, presentation, and dashboard formats
- [ ] Be aware of the ways to continue developing computational reproducibility skills
## Visualisation {#sec-custom-viz}
### Set-up {#setup-custom-viz}
1. Open your `reprores` project
1. Create a new quarto file called `10-viz.qmd`
1. Update the YAML header
1. Replace the setup chunk with the one below:
```{r}
#| echo: fenced
#| label: setup
#| include: false
# packages needed for this chapter section
library(tidyverse) # for data wrangling
library(ggthemes) # for themes
library(patchwork) # for combining plots
library(plotly) # for interactive plots
# devtools::install_github("hrbrmstr/waffle")
library(waffle) # for waffle plots
library(ggbump) # for bump plots
library(treemap) # for treemap plots
library(ggwordcloud) # for word clouds
library(tidytext) # for manipulating text for word clouds
library(gganimate) # for animated plots
#install.packages("rnaturalearthhires", repos = "http://packages.ropensci.org", type = "source")
library(sf) # for mapping geoms
library(rnaturalearth) # for map data
library(rnaturalearthdata) # extra mapping data
theme_set(theme_light())
```
Download the [ggplot2 cheat sheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf).
### Defaults
The code below creates two plots using the default (light) theme and palettes. First, load the data and set `issue_category` to a factor so you can control the order of the categories.
```{r, message=FALSE}
# update column specification
ct <- cols(issue_category = col_factor(
levels = c("tech", "returns", "sales", "other")
))
# load data
survey_data <- read_csv(file = "data/survey_data.csv",
col_types = ct)
```
Next, create a bar plot of number of calls by issue category.
```{r, message=FALSE}
# create bar plot
bar <- ggplot(data = survey_data,
mapping = aes(x = issue_category,
fill = issue_category)) +
geom_bar(show.legend = FALSE) +
labs(x = "Issue Category",
y = "Count",
title = "Calls by Issue Category")
```
And create a scatterplot of wait time by call time, distinguished by issue category.
```{r, message=FALSE}
#create scatterplot
point <- ggplot(data = survey_data,
mapping = aes(x = wait_time,
y = call_time,
color = issue_category)) +
geom_point(alpha = 0.5) +
geom_smooth(method = lm, formula = y~x) +
labs(x = "Wait Time",
y = "Call Time",
color = "Issue Category",
title = "Wait Time by Call Time")
```
Finally, combine the two plots using the `+` from `r pkg("patchwork")` to see the default styles for these plots.
```{r, fig.cap="Default plot styles."}
bar + point
```
::: {.try data-latex=""}
Try changing the theme using built-in themes or customising the colours or linetypes with scale_* functions. See Appendix\ \@ref(plotstyle) for details.
:::
### Annotations
It's often useful to add annotations to a plot, for example, to highlight an important part of the plot or add labels. The `annotate()` function creates a specific geom at x- and y-coordinates you specify.
#### Text annotations
Add a text annotation by setting the `geom` argument to "text" or "label" and adding a `label`. Labels have padding and a background, while text is just text.
* Backslash-n `\n` in the label text controls where the line breaks are. Try removing or changing the position of these to see what happens.
* `x` and `y` control the coordinates of the label. You will likely have to play around with these values to get them right.
* The argument `hjust` is the horizontal justification of text, and `vjust` is the vertical justification. The default values are 0.5, where the text is centred on the x and y coordinates. 0 will justify to the left and bottom, while 1 justifies to the right and top.
* You can change the `angle` of text, but not labels.
```{r annotate-text, fig.cap = "An example of annotation text and label."}
bar +
# add left-justified text to the second bar
annotate(geom = "text",
label = "Our goal is to\nreduce this\ncategory",
x = 1.65, y = 150,
hjust = 0, vjust = 1,
color = "white", fontface = "bold",
angle = 45) +
# add a centred label to the third bar
annotate(geom = "label",
label = "Our goal is\nto increase this\ncategory",
x = 3, y = 75,
hjust = 0.5, vjust = 1,
color = " darkturquoise", fontface = "bold")
```
::: {.try data-latex=""}
See if you can work out how to make the figure below, starting with the following:
```{r, eval = FALSE}
tibble(x = c(0, 0, 1, 1),
y = c(0, 1, 0, 1)) |>
ggplot(aes(x, y)) +
geom_point(alpha = 0.25, size = 4, color = "red")
```
Hint: you will need to add 1 label annotation and 8 separate text annotations.
```{r, echo = FALSE}
tibble(x = c(0, 0, 1, 1),
y = c(0, 1, 0, 1)) |>
ggplot(aes(x, y)) +
geom_point(alpha = 0.25, size = 4, color = "red") +
annotate("label", label = "In the\nmiddle",
x = 0.5, y = 0.5,
fill = "dodgerblue", color = "white",
label.padding = unit(1, "lines"),
label.r = unit(1.5, "lines")) +
annotate("text", label = "Bottom\nLeft",
x = 0, y = 0, hjust = 0, vjust = 0) +
annotate("text", label = "Top\nLeft",
x = 0, y = 1, hjust = 0, vjust = 1) +
annotate("text", label = "Bottom\nRight",
x = 1, y = 0, hjust = 1, vjust = 0) +
annotate("text", label = "Top\nRight",
x = 1, y = 1, hjust = 1, vjust = 1) +
annotate("text", label = "45 degrees",
x = 0, y = 0.5, hjust = 0, angle = 45) +
annotate("text", label = "90 degrees",
x = 0.25, y = 0.5, angle = 90) +
annotate("text", label = "270 degrees",
x = 0.75, y = 0.5, angle = 270)+
annotate("text", label = "-45 degrees",
x = 1, y = 0.5, hjust = 1, angle = -45)
```
```{r, eval = FALSE, webex.hide = TRUE}
tibble(x = c(0, 0, 1, 1),
y = c(0, 1, 0, 1)) |>
ggplot(aes(x, y)) +
geom_point(alpha = 0.25, size = 4, color = "red") +
annotate("label", label = "In the\nmiddle",
x = 0.5, y = 0.5,
fill = "dodgerblue", color = "white",
label.padding = unit(1, "lines"),
label.r = unit(1.5, "lines")) +
annotate("text", label = "Bottom\nLeft",
x = 0, y = 0, hjust = 0, vjust = 0) +
annotate("text", label = "Top\nLeft",
x = 0, y = 1, hjust = 0, vjust = 1) +
annotate("text", label = "Bottom\nRight",
x = 1, y = 0, hjust = 1, vjust = 0) +
annotate("text", label = "Top\nRight",
x = 1, y = 1, hjust = 1, vjust = 1) +
annotate("text", label = "45 degrees",
x = 0, y = 0.5, hjust = 0, angle = 45) +
annotate("text", label = "90 degrees",
x = 0.25, y = 0.5, angle = 90) +
annotate("text", label = "270 degrees",
x = 0.75, y = 0.5, angle = 270)+
annotate("text", label = "-45 degrees",
x = 1, y = 0.5, hjust = 1, angle = -45)
```
:::
#### Other annotations
You can add other geoms to highlight parts of a plot. The example below adds a rectangle around a group of points, a text label, a straight arrow from the label to the rectangle, and a curved arrow from the label to an individual point.
```{r annotation-other, fig.cap="Example of annotatins with the rect, text, segment, and curve geoms."}
point +
# add a rectangle surrounding long call times
annotate(geom = "rect",
xmin = 100, xmax = 275,
ymin = 140, ymax = 180,
fill = "transparent", color = "red") +
# add a text label
annotate("text",
x = 260, y = 120,
label = "outliers") +
# add an line with an arrow from the text to the box
annotate(geom = "segment",
x = 240, y = 120,
xend = 200, yend = 135,
arrow = arrow(length = unit(0.5, "lines"))) +
# add a curved line with an arrow
# from the text to a wait time outlier
annotate(geom = "curve",
x = 280, y = 120,
xend = 320, yend = 45,
curvature = -0.5,
arrow = arrow(length = unit(0.5, "lines")))
```
See the `r pkg("ggforce", "https://ggforce.data-imaginist.com/")` package for more sophisticated options, such as highlighting a group of points with an ellipse.
### Other Plots {#sec-other-plots}
#### Interactive Plots
The `r pkg("plotly")` package can be used to make interactive graphs. Assign your ggplot to a variable and then use the function `ggplotly()` on the plot object. Note that interactive plots only work in HTML files, not PDFs or Word files.
```{r plotly, message = FALSE, fig.cap="Interactive graph using plotly"}
ggplotly(point)
```
::: {.info data-latex=""}
Hover over the data points above and click on the legend items.
:::
#### Waffle Plots
In Chapter\ \@ref(ggplot), we mentioned that pie charts are such a poor way to visualise proportions that we refused to even show you how to make one. Waffle plots are a delicious alternative.
::: {.warning data-latex=""}
Use `r pkg("waffle")` by [hrbrmstr on GitHub](https://github.com/hrbrmstr/waffle/) using the `install_github()` function below, rather than the one on CRAN you get from using `install.packages()`.
```{r, eval = FALSE}
devtools::install_github("hrbrmstr/waffle")
```
:::
By default, `geom_waffle()` represents each observation with a tile and splits these across 10 rows. You can play about with the `n_rows` argument to determine what works best for your data.
```{r, fig.cap = "Waffle plot."}
survey_data |>
count(issue_category) |>
ggplot(aes(fill = issue_category, values = n)) +
geom_waffle(
n_rows = 23, # try setting this to 10 (the default)
size = 0.33, # line size
make_proportional = FALSE, # use raw values
colour = "white", # line colour
flip = FALSE, # bottom-top or left-right
radius = grid::unit(0.1, "npc") # set to 0.5 for circles
) +
theme_enhance_waffle() + # gets rid of axes
scale_fill_colorblind(name = "Issue Category")
```
The waffle plot can also be used to display the counts as proportions To achieve these, set `n_rows = 10` and `make_proportional = TRUE`. Now, rather than each tile representing one observation, each tile represents 1% of the data.
```{r, fig.cap = "Proportional waffle plot."}
survey_data |>
count(issue_category) |>
ggplot(aes(fill = issue_category, values = n)) +
geom_waffle(
n_rows = 10,
size = 0.33,
make_proportional = TRUE, # compute proportions
colour = "white",
flip = FALSE,
radius = grid::unit(0.1, "npc")
) +
theme_enhance_waffle() +
scale_fill_colorblind(name = "Issue Category")
```
#### Treemap
Treemap plots are another way to visualise proportions. Like the waffle plots, you need to count the data by category first. You can use any [brewer palette](https://www.datanovia.com/en/blog/the-a-z-of-rcolorbrewer-palette/) for the fill.
```{r, fig.cap = "Treemap plot."}
survey_data |>
count(issue_category) |>
treemap(
index = "issue_category", # column with number of rectangles
vSize = "n", # column with size of rectangle
title = "",
palette = "BuPu",
inflate.labels = TRUE # expand labels to size of rectangle
)
```
You can also represent multiple categories with treemaps
```{r, fig.cap = "Treemap with two variables"}
survey_data |>
count(issue_category, employee_id) |>
arrange(employee_id) |>
treemap(
# use c() to specify two variables
index = c("employee_id", "issue_category"),
vSize = "n",
title = "",
palette = "Dark2",
# set different label sizes for each type of label
fontsize.labels = c(30, 10),
# set different alignments for two label types
align.labels = list(c("left", "top"), c("center", "center"))
)
```
#### Bump Plots
Bump plots are very useful for visualising how rankings change over time. So first, we need to get some ranking data. Let's start with a more typical raw data table, containing an identifying column of `person` and three columns for their scores each week
```{r}
# make a small dataset of scores for 3 people over 3 weeks
score_data <- tribble(
~person, ~week_1, ~week_2, ~week_3,
"Abeni", 80, 75, 90,
"Beth", 75, 85, 75,
"Carmen", 60, 70, 80
)
```
Now we make the table long, group by week, and use the `rank()` function to find the rank of each person's score each week. Use `n() - rank(score) + 1` to reverse the ranks so that the highest score gets rank 1. We also need to make the `week` variable a number.
```{r}
# calculate ranks
rank_data <- score_data |>
pivot_longer(cols = -person,
names_to = "week",
values_to = "score") |>
group_by(week) |>
mutate(rank = n() - rank(score) + 1) |>
ungroup() |>
arrange(week, rank) |>
mutate(week = str_replace(week, "week_", "") |> as.integer())
rank_data
```
A typical mapping for a bump plot puts the time variable in the x-axis, the rank variable on the y-axis, and sets colour to the identifying variable.
```{r basic-bump, fig.width = 8, fig.height=3, fig.cap = "Basic bump plot"}
ggplot(data = rank_data,
mapping = aes(x = week,
y = rank,
colour = person)) +
ggbump::geom_bump()
```
We can make this more attractive by customising the axes and adding text labels. Try running each line of this code to see how it builds up.
* Add `label = person` to the mapping so we can add in text labels.
* Increase the size of the lines with the `size` argument to `geom_bump()`
* We don't need labels for weeks 1.5 and 2.5, so change the x-axis `breaks`
* The `expand` argument for the two scale_ functions expands the plot area so we can fit text labels to the right.
* It makes more sense to have first place at the top, so reverse the order of the y-axis with `scale_y_reverse()` and fix the breaks and expansion.
* Add text labels with `geom_text()`, but just for week 3, so set `data = filter(rank_data, week == 3)` for this geom.
* Set `x = 3.05` to move the text labels just to the right of week 3, and set `hjust = 0` to right-justify the text labels (the default is `hjust = 0.5`, which would center them on 3.05).
* Remove the legend and grid lines. Increase the x-axis text size.
```{r bump-example, fig.width = 8, fig.height=3, fig.cap = "Bump plot with added features."}
ggplot(data = rank_data,
mapping = aes(x = week,
y = rank,
colour = person,
label = person)) +
ggbump::geom_bump(size = 10) +
scale_x_continuous(name = "",
breaks = 1:3,
labels = c("Week 1", "Week 2", "Week 3"),
expand = expansion(c(.05, .2))) +
scale_y_reverse(name = "Ranking",
breaks = 1:3,
expand = expansion(.2)) +
geom_text(data = filter(rank_data, week == 3),
color = "black", x = 3.05, hjust = 0) +
theme(legend.position = "none",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 12))
```
#### Word Clouds
Word clouds are a common way to summarise text data. First, download <a href="https://psyteachr.github.io/reprores-v4/data/amazon_alexa.csv" download>amazon_alexa.csv</a> into your data folder and then load it into an object. This dataset contains text reviews as well as the 1-5 rating from customers who bought an Alexa device on Amazon.
```{r}
# https://www.kaggle.com/sid321axn/amazon-alexa-reviews
# extracted from Amazon by Manu Siddhartha & Anurag Bhatt
alexa <- rio::import("data/amazon_alexa.csv")
```
We can use this data to look at how the words used differ depending on the rating given. To make the text data easy to work with, the function `tidytext::unnest_tokens()` splits the words in the `input` column into individual words in a new `output` column. `unnnest_tokens()` is particularly helpful because it also does things like removes punctuation and transforms all words to lower case to make it easier to work with. Compare `words` and `alexa` to see how they map on to each other.
```{r}
words <- alexa |>
unnest_tokens(output = "word", input = "verified_reviews")
```
We can then add another line of code using a pipe that counts how many instances of each word there is by rating to give us the most popular words.
```{r}
words <- alexa |>
unnest_tokens(output = "word", input = "verified_reviews") |>
count(word, rating, sort = TRUE)
```
```{r echo = FALSE}
head(words)
```
The problem is that the most common words are all function words rather than content words, which makes sense because these words have the highest word frequency in natural language.
Helpfully, `tidytext` contains a list of common "stop words", i.e., words that you want to ignore, that are stored in an object named `stop_words`. It is also very useful to define a list of custom stop words based upon the unique properties of your data (it can sometimes take a few attempts to identify what's appropriate for your dataset). This dataset contains a lot of numbers that aren't informative, and it also contains "https" from website links, so we'll get rid of both with a custom stop list.
Once you have defined your stop words, you can then use `anti_join()` to remove any `word` that is present in the stop word list.
To get the top 25 words, we then group by rating and use `dplyr::slice_max()`, ordered by the column `n`.
```{r}
custom_stop <- tibble(word = c(0:9, "https", 34))
words <- alexa |>
unnest_tokens(output = "word", input = "verified_reviews") |>
count(word, rating) |>
anti_join(stop_words, by = "word") |>
anti_join(custom_stop, by = "word") |>
group_by(rating) |>
slice_max(order_by = n, n = 25, with_ties = FALSE) |>
ungroup()
```
First, let's make a word cloud for customers who gave a 1-star rating:
* Filter retains only the data for 1-star ratings.
* `label` comes from the `word` column and is the data to plot (i.e., the words).
* `colour` makes the words red (you could also set this to `word` to give each word a different colour or `n` to vary colour continuously by frequency).
* `size` makes the size of the word proportional to `n`, the number of times the word appeared.
* `r hl(ggwordcloud::geom_text_wordcloud_area())` is the word cloud geom.
* `r hl(ggwordcloud::scale_size_area())` controls how big the word cloud is (this usually takes some trial-and-error).
```{r}
rating1 <- filter(words, rating == 1) |>
ggplot(aes(label = word, colour = "red", size = n)) +
geom_text_wordcloud_area() +
scale_size_area(max_size = 10) +
ggtitle("Rating = 1") +
theme_minimal(base_size = 14)
rating1
```
We can now do the same but for 5-star ratings and paste the plots together with `patchwork` (word clouds don't play well with facets).
```{r, fig.height = 3, fig.cap="Word cloud."}
rating5 <- filter(words, rating == 5) |>
ggplot(aes(label = word, size = n)) +
geom_text_wordcloud_area(colour = "darkolivegreen3") +
scale_size_area(max_size = 12) +
ggtitle("Rating = 5") +
theme_minimal(base_size = 14)
rating1 + rating5
```
::: {.warning data-latex=""}
It's worth highlighting that whilst word clouds are very common, they're really the equivalent of pie charts for text data because we're not very good at making accurate comparisons based on size. You might be able to see what's the most popular word, but can you accurately determine the 2nd, 3rd, 4th or 5th most popular word based on the clouds alone? There's also the issue that just because it's text data doesn't make it a qualitative analysis and just because something is said a lot doesn't mean it's useful or important. But, this argument is outwith the scope of this book, even if it is a recurring part of Emily's life thanks to her qualitative wife.
:::
#### Maps
Working with maps can be tricky. The `r pkg("sf")` package provides functions that work with `r pkg("ggplot2")`, such as `geom_sf()`. The `r pkg("rnaturalearth")` package (and associated data packages that you may be prompted to download) provide high-quality mapping coordinates.
* `ne_countries()` returns world country polygons (i.e., a world map). We specify the object should be returned as a "simple feature" class `sf` so that it will work with `geom_sf()`. If you would like a deep dive on simple feature objects, check out a [vignette](https://r-spatial.github.io/sf/articles/sf1.html) from the `r pkg("sf")` package.
* It's worth checking out what the object `ne_countries()` returns to see just how much information is available.
* Try changing the values and colours below to get a sense of how the code works.
```{r map-world, fig.width = 7, fig.height = 3.4}
# get the world map coordinates
world_sf <- ne_countries(returnclass = "sf", scale = "medium")
# plot them on a light blue background
ggplot() +
geom_sf(data = world_sf, size = 0.3) +
theme(panel.background = element_rect(fill = "lightskyblue2"))
```
You can combine multiple countries using `bind_rows()` and visualise them with different colours for each country.
```{r map-islands, fig.width = 6, fig.height = 6, fig.cap="Map coloured by country."}
# get and bind country data
uk_sf <- ne_states(country = "united kingdom", returnclass = "sf")
ireland_sf <- ne_states(country = "ireland", returnclass = "sf")
islands <- bind_rows(uk_sf, ireland_sf) |>
filter(!is.na(geonunit))
# set colours
country_colours <- c("Scotland" = "#0962BA",
"Wales" = "#00AC48",
"England" = "#FF0000",
"Northern Ireland" = "#FFCD2C",
"Ireland" = "#F77613")
ggplot() +
geom_sf(data = islands,
mapping = aes(fill = geonunit),
colour = NA,
alpha = 0.75) +
coord_sf(crs = sf::st_crs(4326),
xlim = c(-10.7, 2.1),
ylim = c(49.7, 61)) +
scale_fill_manual(name = "Country",
values = country_colours)
```
You can join <a href="https://psyteachr.github.io/reprores-v4/data/scottish_population.csv" download>Scottish population data</a> to the map table to visualise data on the map using colours or labels.
```{r}
# load map data
scotland_sf <- ne_states(geounit = "Scotland",
returnclass = "sf")
# load population data from
# https://www.indexmundi.com/facts/united-kingdom/quick-facts/scotland/population
scotpop <- read_csv("data/scottish_population.csv",
show_col_types = FALSE)
# join data and fix typo in the map
scotmap_pop <- scotland_sf |>
mutate(name = ifelse(name == "North Ayshire",
yes = "North Ayrshire",
no = name)) |>
left_join(scotpop, by = "name") |>
select(name, population, geometry)
```
::: {.warning data-latex=""}
There is a typo in the data from `r pkg("rnaturalearth")`, so you need to change "North Ayshire" to "North Ayrshire" before you join the population data.
:::
* Setting the fill to population in `geom_sf()` gives each region a colour based on its population.
* The colours are customised with `scale_fill_viridis_c()`. The breaks of the fill scale are set to increments of 100K (1e5 in scientific notation) and the scale is set to span 0 to 600K.
* `paste0()` creates the labels by taking the numbers 0 through 6 and adding "00 k" to them.
* Finally, the position of the legend is moved into the sea using `legend.position()`.
```{r map-scotland, fig.width = 5, fig.height = 7, fig.cap="Map coloured by population."}
# plot
ggplot() +
geom_sf(data = scotmap_pop,
mapping = aes(fill = population),
color = "white",
size = .1) +
coord_sf(xlim = c(-8, 0), ylim = c(54, 61)) +
scale_fill_viridis_c(name = "Population",
breaks = seq(from = 0, to = 6e5, by = 1e5),
limits = c(0, 6e5),
labels = paste0(0:6, "00 K")) +
theme(legend.position = c(0.16, 0.84))
```
#### Animated Plots
Animated plots are a great way to add a wow factor to your reports, but they can be complex to make, distracting, and not very accessible, so use them sparingly and only for data visualisation where the animation really adds something. The package `r pkg("gganimate", "https://gganimate.com/")` has many functions for animating ggplots.
Here, we'll load some population data from the United Nations. <a href="data/WPP2019_POP_F01_1_TOTAL_POPULATION_BOTH_SEXES.xlsx" download>Download the file</a> into your data folder and open it in Excel first to see what it looks like. The code below gets the data from the first tab, filters it to just the 6 world regions, makes the data long, and makes sure the `year` column is numeric and the `pop` column shows population in whole numbers (the original data is in 1000s).
```{r}
# load and process data
worldpop <- readxl::read_excel("data/WPP2019_POP_F01_1_TOTAL_POPULATION_BOTH_SEXES.xlsx", skip = 16) |>
filter(Type == "Region") |>
select(region = 3, `1950`:`1992`) |>
pivot_longer(cols = -region,
names_to = "year",
values_to = "pop") |>
mutate(year = as.integer(year),
pop = round(1000 * as.numeric(pop)))
```
Let's make an animated plot showing how the population in each region changes with year. First, make a static plot. Filter the data to the most recent year so you can see what a single frame of the animation will look like.
```{r}
worldpop |>
filter(year == 1992) |>
ggplot(aes(x = region, y = pop, fill = region)) +
geom_col(show.legend = FALSE) +
scale_fill_viridis_d() +
scale_x_discrete(name = "",
guide = guide_axis(n.dodge=2))+
scale_y_continuous(name = "Population",
breaks = seq(0, 3e9, 1e9),
labels = paste0(0:3, "B")) +
ggtitle('Year: 1992')
```
To convert this to an animated plot that shows the data from multiple years:
* Remove the filter and add `transition_time(year)`.
* Use the `{}` syntax to include the `frame_time` in the title.
* Use `anim_save()` to save the animation to a GIF file and set this code chunk to `eval = FALSE` because creating an animation takes a long time and you don't want to have to run it every time you knit your report.
```{r, eval = FALSE}
anim <- worldpop |>
ggplot(aes(x = region, y = pop, fill = region)) +
geom_col(show.legend = FALSE) +
scale_fill_viridis_d() +
scale_x_discrete(name = "",
guide = guide_axis(n.dodge=2))+
scale_y_continuous(name = "Population",
breaks = seq(0, 3e9, 1e9),
labels = paste0(0:3, "B")) +
ggtitle('Year: {frame_time}') +
transition_time(year)
dir.create("images", FALSE) # creates an images directory if needed
anim_save(filename = "images/gganim-demo.gif",
animation = anim,
width = 8, height = 5, units = "in", res = 150)
```
You can show your animated gif in an html report (animations don't work in Word or a PDF) using `include_graphics()`, or include the GIF in a dynamic document like PowerPoint.
```{r anim-demo, fig.cap="Animated gif."}
knitr::include_graphics("images/gganim-demo.gif")
```
::: {.warning data-latex=""}
There are actually not many plots that are really improved by animating them. The plot below gives the same information at a single glance.
```{r anim-alternative, echo = FALSE}
worldpop |>
ggplot(aes(x = year, y = pop, color = region)) +
geom_line(size = 2) +
geom_text(aes(label = region),
# only label last year
data = filter(worldpop, year == max(year)),
hjust = 0, # right-justify
nudge_x = 1, # move x-position 1 to the right
# nudge Europe up a little
nudge_y = c(0, 0, 1e8, 0, 0, 0)
) +
scale_x_continuous(name = "",
breaks = seq(1950, 1995, 5),
expand = expansion(add = c(1, 25))) +
scale_y_continuous(name = "Population (in billions)",
breaks = seq(0, 3e9, 1e9),
labels = paste0(0:3, "B"),
limits = c(0, 3.5e9)) +
scale_color_viridis_d() +
ggtitle("Population growth from 1950 to 1995") +
theme(legend.position = "none")
```
:::
### Resources {#resources-custom}
There are so many more options for data visualisation in R than we have time to cover here. The following resources will get you started on your journey to informative, intuitive visualisations.
* [The R Graph Gallery](http://www.r-graph-gallery.com/) (this is really useful)
* [Look at Data](https://socviz.co/lookatdata.html) from [Data Vizualization for Social Science](http://socviz.co/)
* [Graphs](http://www.cookbook-r.com/Graphs) in *Cookbook for R*
* [Top 50 ggplot2 Visualizations](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html)
* [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) by Winston Chang
* [ggplot extensions](https://exts.ggplot2.tidyverse.org/)
* [plotly](https://plot.ly/ggplot2/) for creating interactive graphs
* [Drawing Beautiful Maps Programmatically](https://r-spatial.org/r/2018/10/25/ggplot2-sf.html)
* [gganimate](https://gganimate.com/)
## Reports {#sec-custom-reports}
### Set-up {#setup-custom-reports}
1. Close the file `10-viz.qmd`
1. Create a new quarto file called `11-reports.qmd`
1. Update the YAML header
1. Replace the setup chunk with the one below:
```{r}
#| echo: fenced
#| label: setup
#| include: false
# packages needed for this chapter section
library(tidyverse) # data wrangling functions
library(bookdown) # for chaptered reports
library(flexdashboard) # for dashboards
library(DT) # for interactive tables
```
### Interactive tables
One way to make your reports more exciting is to use interactive tables. The `DT::datatable()` function displays a table with some extra interactive elements to allow readers to search and reorder the data, as well as controlling the number of rows shown at once. This can be especially helpful. This only works with HTML output types. The [DT website](https://rstudio.github.io/DT/) has extensive tutorials, but we'll cover the basics here.
```{r}
library(DT)
scotpop <- read_csv("data/scottish_population.csv",
show_col_types = FALSE)
datatable(data = scotpop)
```
You can customise the display, such as changing column names, adding a caption, moving the location of the filter boxes, removing row names, applying [classes](https://datatables.net/manual/styling/classes) to change table appearance, and applying [advanced options](https://datatables.net/reference/option/).
```{r}
# https://datatables.net/reference/option/
my_options <- list(
pageLength = 5, # how many rows are displayed
lengthChange = FALSE, # whether pageLength can change
info = TRUE, # text with the total number of rows
paging = TRUE, # if FALSE, the whole table displays
ordering = FALSE, # whether you can reorder columns
searching = FALSE # whether you can search the table
)
datatable(
data = scotpop,
colnames = c("County", "Population"),
caption = "The population of Scottish counties.",
filter = "none", # "none", "bottom" or "top"
rownames = FALSE, # removes the number at the left
class = "cell-border hover stripe", # default is "display"
options = my_options
)
```
::: {.try data-latex=""}
Create an interactive table like the one below from the `diamonds` dataset of diamonds where the `table` value is greater than 65 (the whole table is *much* too large to display with an interactive table). Show 20 items by default and remove the search box, but leave in the filter and other default options.
```{r, echo = FALSE}
my_options <- list(
pageLength = 20, # how many rows are displayed
searching = FALSE # whether you can search the table
)
diamonds |>
filter(table > 65) |>
select(-table, -(x:z)) |>
DT::datatable(
caption = "All diamonds with table > 65.",
options = my_options
)
```
```{r, webex.hide = TRUE, eval = FALSE}
my_options <- list(
pageLength = 20, # how many rows are displayed
searching = FALSE # whether you can search the table
)
diamonds |>
filter(table > 65) |>
select(-table, -(x:z)) |>
DT::datatable(
caption = "All diamonds with table > 65.",
options = my_options
)
```
:::
### Other formats
You can create more than just reports with R Markdown. You can also create presentations, interactive dashboards, books, websites, and web applications.
#### Presentations
You can choose a presentation template when you create a new R Markdown document. We'll use ioslides for this example, but the other formats work similarly.
```{r img-ioslides-template, echo=FALSE, fig.cap="Ioslides RMarkdown template."}
knitr::include_graphics("images/present/new-ioslides.png")
```
The main differences between this and the Rmd files you've been working with until now are that the `output` type in the `r glossary("YAML")` header is `ioslides_presentation` instead of `html_document` and this format requires a specific title structure. Each slide starts with a level-2 header.
The template provides you with examples of text, bullet point, code, and plot slides. You can knit this template to create an `r glossary("HTML")` document with your presentation. It often looks odd in the RStudio built-in browser, so click the button to open it in a web browser. You can use the space bar or arrow keys to advance slides.
The code below shows how to load some packages and display text, a table, and a plot. You can see the [HTML output here](demos/ioslides.html).
`r hide()`
```{embed, file = "demos/ioslides.Rmd"}
# readLines("demos/ioslides.Rmd") |>
# paste(collapse = "\n") |>
# paste0("<code style='font-size: smaller;'><pre>\n", ., "\n</pre></code>") |>
# cat()
```
`r unhide()`
#### Dashboards
Dashboards are a way to display text, tables, and plots with dynamic formatting. After you install `r pkg("flexdashboard")`, you can choose a flexdashboard template when you create a new R Markdown document.
```{r img-flx-template, echo=FALSE, fig.cap="Flexdashboard RMarkdown template."}
knitr::include_graphics("images/present/flexdashboard-template.png")
```
The code below shows how to load some packages, display two tables in a tabset, and display two plots in a column. You can see the [HTML output here](demos/flexdashboard.html).
`r hide()`
```{embed, file = "demos/flexdashboard.Rmd"}
# readLines("demos/flexdashboard.Rmd") |>
# paste(collapse = "\n") |>
# paste0("<code style='font-size: smaller;'><pre>\n", ., "\n</pre></code>") |>
# cat()
```
`r unhide()`
Change the size of your web browser to see how the boxes, tables and figures change.
The best way to figure out how to format a dashboard is trial and error, but you can also look at some [sample layouts](https://pkgs.rstudio.com/flexdashboard/articles/layouts.html).
#### Books
You can create online books with `r pkg("bookdown")`. In fact, the book you're reading was created using bookdown. After you download the package, start a new project and choose "Book project using bookdown" from the list of project templates.
```{r img-bookdown-template, echo=FALSE, fig.cap="Bookdown project template."}
knitr::include_graphics("images/present/bookdown.png")
```
Each chapter is written in a separate .Rmd file and the general book settings can be changed in the `_bookdown.yml` and `_output.yml` files.
#### Websites
You can create a simple website the same way you create any R Markdown document. Choose "Simple R Markdown Website" from the project templates to get started. See Appendix\ \@ref(webpages) for a step-by-step tutorial.
For more complex, blog-style websites, you can investigate [`r pkg("blogdown")`](https://bookdown.org/yihui/blogdown/). After you install this package, you will also be able to create template blogdown projects to get you started.
#### Shiny
To get truly interactive, you can take your R coding to the next level and learn Shiny. Shiny apps let your R code react to user input. You can do things like [make a word cloud](https://shiny.psy.gla.ac.uk/debruine/wordcloud/), [search a google spreadsheet](https://shiny.psy.gla.ac.uk/debruine/seen/), or [conduct a survey](https://shiny.psy.gla.ac.uk/debruine/question/).
This is well outside the scope of this class, but the skills you've learned here provide a good start. The free book [Building Web Apps with R Shiny](https://debruine.github.io/shinyintro/) by one of the authors of this book can get you started creating shiny apps.
### Resources {#sec-resources-report -}
* [RStudio Formats](https://rmarkdown.rstudio.com/formats.html)
* [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook)
* [DT](https://rstudio.github.io/DT/)
* [Flexdashboard](https://pkgs.rstudio.com/flexdashboard/)
* [Bookdown](https://bookdown.org/yihui/bookdown/)
* [Blogdown](https://bookdown.org/yihui/blogdown/)
* [Shiny](https://shiny.rstudio.com/)
* [Building Web Apps with R Shiny](https://debruine.github.io/shinyintro/)