---
title: 'Practise R basics: data and functions'
output:
  html_document:
    theme: united
    toc: yes
---
# Completing your assignment in RMarkdown
For this week's exercise, which accompanies the R_basics_2_data_and_functions tutorial, you will be using RMarkdown. The file that you are currently reading is an RMarkdown file (.Rmd). This is a type of file that combines two things:
* **Regular text**, such as the text you are reading now. RMarkdown is a [markup language](https://en.wikipedia.org/wiki/Markup_language), which means that we type certain instructions for how the text should be presented. For example, the hash sign (`#`) before "Completing your assignment in RMarkdown" indicates that this is a level 1 header, and the asterisk in front of this paragraph indicates that this is a bullet point. The advantage is that you can quickly type a document without worrying about layout, and afterwards create a pretty document (e.g., in html, docx or pdf) in which you can still specify the layout you want (e.g., determining the size of a level 1 header).
* **Code blocks**, in which you can type R code. If you *Knit* the file (we'll show how in a minute), this code will also be evaluated, and both the code and results will be printed in the document.
Together, these components make it easy to create reports. Since we are only storing the text in a markup language and the analysis in R syntax, we can easily update the document, or redo an analysis with new data. To use RMarkdown, you will first have to install the `rmarkdown` package, and it is recommended to get the latest version of `knitr`. To do so, run the following commands (in your console, in an R script, or directly from within the following code block):
```{r, eval=FALSE}
install.packages('rmarkdown')
install.packages('knitr')
```
You can do many things with RMarkdown, but for this assignment we'll stick to the basics. Below you will find templates for completing your assignments. For each assignment, you will have to enter your code in the designated code blocks. Eventually, you will *Knit* your file to a .html or .docx file. We'll start with an example.
# Example of assignment template
For each assignment you'll get a codeblock, which looks like this:
```{r}
x = 5
x
```
The R code that you put into the codeblock will automatically be executed when you *Knit* the document. Any line of code that produces output in your console, such as the single `x` at the end of the above codeblock, will also produce output in your document. If you have installed RMarkdown as instructed above, you will see a *Knit* button at the top of this window. Press this button now to *Knit* the current .Rmd file. You can also click on the dropdown button just right of *Knit* to select an output format.
When working on your assignments, it is recommended to *Knit* regularly to see if your code works. If your code produces errors, the document won't be generated, and it's easier to fix a mistake immediately than to trace back where it went wrong (though the error messages will point you in the right direction).
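To illustrate which lines end up as output in the knitted document, here is a minimal sketch. The names `a` and `b` are made up for this illustration and are not part of the assignments.

```r
a = 10       # assignment only: nothing is printed
a            # a bare name prints its value: [1] 10
b = a * 2    # again an assignment, so no output
print(b)     # print() also produces output: [1] 20
```

Only the second and fourth lines would show a result, both in the console and in the knitted file.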
# Assignment 1: Data types
### 1.A
Complete the steps in the bulletpoints in the codeblock directly below.
* Assign the number `50` to the name `x`
* Assign the number `30` to the name `y`
* Add `x` to `y`
```{r}
```
### 1.B
The following line of code assigns the number `100` as a character value to the name `x`. Add code in which you:
* Transform `x` to a numerical value
* Multiply `x` by 2
```{r}
x = "100"
```
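As a hint for this kind of type conversion, the sketch below uses a hypothetical character value `price` (not part of the assignment) and the base R functions `class()` and `as.numeric()`.

```r
price = "19.5"                        # hypothetical number stored as text
class(price)                          # "character"
price_num = as.numeric(price)         # convert the text to a number
class(price_num)                      # "numeric"
price_num + 0.5                       # 20
suppressWarnings(as.numeric("abc"))   # NA: text that is not a number cannot be converted
```

Arithmetic on the original character value would fail with an error, which is why the conversion has to come first.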
### 1.C
You are given a character vector `name`. Now:
* Transform `name` to a `factor` type
* Show the levels/labels of `name`
```{r}
name = c('Alice','Alice','Alice','Bob','Bob','Carol')
```
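The following sketch shows how a factor behaves, using a made-up vector `colour` rather than the `name` vector from the exercise.

```r
colour = c('red', 'blue', 'red', 'green')   # hypothetical character vector
colour_f = factor(colour)                   # convert to a factor
class(colour_f)                             # "factor"
levels(colour_f)                            # "blue" "green" "red" (sorted alphabetically by default)
table(colour_f)                             # number of occurrences per level
```

Note that each distinct value becomes one level, however often it occurs in the vector.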
### 1.D
You are given a numeric vector `age` that contains the ages of 10 people, which conveniently count up from 15 to 24. Use comparisons to create logical vectors that show:
* People older than 18
* People younger than 21
* People older than 18 and younger than 21
* People younger than 18 or older than 21
```{r}
age = c(15,16,17,18,19,20,21,22,23,24)
age >= 20 ## example
```
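Comparisons can be combined with `&` (and) and `|` (or). A small sketch with a hypothetical vector `temp` and arbitrary thresholds:

```r
temp = c(12, 18, 25, 31)   # hypothetical temperatures
temp > 15                  # FALSE  TRUE  TRUE  TRUE
temp > 15 & temp < 30      # FALSE  TRUE  TRUE FALSE  (both conditions hold)
temp < 15 | temp > 30      #  TRUE FALSE FALSE  TRUE  (at least one condition holds)
```

Each comparison is applied element by element, so the result always has the same length as `temp`.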
### 1.E
Fill in the format argument in the following `strptime()` functions to parse the date properly.
```{r}
strptime('1961-12-24', format='%Y-%m-%d') ## example
strptime('25 12, 1961', format='')
strptime('1961-12-26T19:00:00', format='')
```
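The format string has to mirror the input exactly, including separators. The sketch below parses a made-up date string (not one of the exercise strings) using a few common conversion codes. See `?strptime` for the full list.

```r
## %d = day, %m = month as a number, %Y = 4-digit year, %H:%M = hours and minutes
d = strptime('31/01/2020 08:30', format='%d/%m/%Y %H:%M')
d                                                  # 2020-01-31 08:30:00, in your local time zone
is.na(strptime('2020-01-31', format='%d/%m/%Y'))   # TRUE: the format does not match the input
```

A format that does not match yields `NA` instead of an error, so it is worth checking the result.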
### 1.F
Fill in the format in the following `strftime()` functions to tell:
* The year
* The year and day
* The time
```{r}
x = strptime('2012/01/01 20:15:00', format='%Y/%m/%d %H:%M:%S')
strftime(x, format='')
strftime(x, format='')
strftime(x, format='')
```
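`strftime()` uses the same conversion codes as `strptime()`, but in the other direction: from a date-time to text. A minimal sketch with a hypothetical date `d`, deliberately picking other components than the exercise asks for:

```r
d = strptime('2020-07-04 09:05:00', format='%Y-%m-%d %H:%M:%S')
strftime(d, format='%m')       # "07"     (month)
strftime(d, format='%d/%m')    # "04/07"  (day and month, separated by a slash)
strftime(d, format='%H:%M')    # "09:05"  (hours and minutes)
```

Any text in the format that is not a conversion code, such as the slash or the colon, is copied to the output as is.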
# Assignment 2: Data structures
### 2.A
Given the character vector `x`, use selection to return the following subsets.
* All elements in the positions three to twenty
* All elements except for those in the positions three to twenty
* The elements in reversed order
```{r}
x = letters ## built-in vector of alphabet letters
x[1:5] ## example
```
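Three selection techniques are useful here: a range of positions, negative positions, and `rev()`. The sketch uses a short made-up vector `v` and different positions than the exercise.

```r
v = c('a', 'b', 'c', 'd', 'e', 'f')   # hypothetical vector
v[2:4]       # "b" "c" "d"              (positions two to four)
v[-(2:4)]    # "a" "e" "f"              (negative positions are dropped)
rev(v)       # "f" "e" "d" "c" "b" "a"  (reversed order)
```

The parentheses in `-(2:4)` matter: `-2:4` would be read as the sequence from -2 to 4, which mixes negative and positive positions and gives an error.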
### 2.B
You are given a numeric vector `x`, and are shown how to calculate the mean and standard deviation.
* Calculate the [Z-score](https://en.wikipedia.org/wiki/Standard_score) for all values of x.
```{r}
x = c(20,10,35,23,27,16,29,35,27,35,25,16,5,12,34,16,25,34,17,37,24,29)
mean(x)
sd(x)
```
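Arithmetic in R is vectorized: an operation with a single number is applied to every element of a vector. The sketch below shows this on a tiny made-up vector `v`, chosen so that the mean is 4 and the standard deviation is 2.

```r
v = c(2, 4, 6)              # hypothetical values: mean(v) is 4, sd(v) is 2
v - mean(v)                 # -2  0  2  (the mean is subtracted from every element)
z = (v - mean(v)) / sd(v)   # standardize in one step
z                           # -1  0  1
```

After standardizing, the values have a mean of 0 and a standard deviation of 1.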
### 2.C
R has some built-in data.frames for testing, such as the famous [iris](https://en.wikipedia.org/wiki/Iris_flower_data_set) data, about the length and width of the sepals and petals of three species of Iris. Use this data.frame, named `iris`, and:
* Show the rows for which the sepal length is smaller than 4.5
* Show the rows for which the petals are at least 8 times longer than they are wide
* Show the sepal width column for the species "setosa" and "versicolor"
* Find the setosa flowers with a sepal length of at most 4.4, and rename the species to "sad setosa"
```{r}
head(iris) ## shows the top 6 rows of the iris data.frame
# tip for renaming: changing factors is a hassle. Changing them to character-vectors using "as.character()" is an easy way out
iris$Species = as.character(iris$Species) ## ignore factors (easier)
```
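The general pattern for this exercise is `data[rows, columns]`, where the rows are chosen with a logical vector. The sketch below uses a small made-up data.frame `d` (an assumption for illustration, not the iris data).

```r
## hypothetical data.frame, not part of the assignment
d = data.frame(name  = c('ann', 'ben', 'cas', 'dee'),
               score = c(4, 8, 6, 9),
               group = c('x', 'y', 'x', 'z'),
               stringsAsFactors = FALSE)
d[d$score > 5, ]                                  # all columns, rows with a score above 5
d$score[d$group %in% c('x', 'z')]                 # one column, for two groups: 4 6 9
d$group[d$group == 'x' & d$score < 5] = 'low x'   # overwrite the values of a selection
d$group                                           # "low x" "y" "x" "z"
```

The last assignment only works this smoothly because `group` is a character vector. With a factor, a new label would first have to be added as a level.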
# Assignment 3: Functions
### 3.A
Here we create two vectors (V1 and V2). Now, use the `data.frame()` function to:
* Create a data.frame using the V1 and V2 vectors as columns, where V1 is named "age" and V2 is named "gender"
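* Create the same data.frame, but this time use arguments to prevent R from converting strings (in the "gender" variable) to factors. Note that since R 4.0.0 strings are no longer converted to factors by default, so in recent versions this argument only makes the choice explicit.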
```{r}
V1 = floor(rnorm(10, mean = 21, sd = 3)) ## random age
V2 = sample(c('Male','Female'), 10, replace=TRUE) ## random gender
```
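Arguments of `data.frame()` of the form `name = vector` set the column names, and `stringsAsFactors` controls how strings are stored. A sketch with two made-up vectors, `size` and `city`:

```r
size = c(1.2, 3.4, 2.8)            # hypothetical numeric vector
city = c('Oslo', 'Lima', 'Oslo')   # hypothetical character vector
d_chr = data.frame(area = size, town = city, stringsAsFactors = FALSE)
d_fct = data.frame(area = size, town = city, stringsAsFactors = TRUE)
names(d_chr)         # "area" "town"
class(d_chr$town)    # "character"
class(d_fct$town)    # "factor"
```

Setting the argument explicitly gives the same result in every R version, whatever the default is.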
### 3.B
This time, you will be using a new function, so we recommend looking up the documentation. Given the iris data, use the `cor.test()` function to calculate:
* The correlation between Petal.Length and Petal.Width, using the default settings.
* The correlation between Petal.Length and Petal.Width, but this time using the "spearman" correlation method.
```{r}
head(iris) ## shows top 6 rows, for reference
## reminder: use iris$... to get a column vector
```
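`cor.test()` takes two numeric vectors and returns a list-like result whose parts can be accessed with `$`. The sketch below assumes the built-in `mtcars` data instead of iris, and uses the "kendall" method to show how the `method` argument is passed.

```r
## mtcars is another built-in data.frame, used here instead of iris
res = cor.test(mtcars$mpg, mtcars$wt)   # Pearson correlation by default
res$estimate                            # the correlation coefficient (negative here)
res$p.value                             # the p-value of the test
## exact = FALSE avoids a warning about ties in the data
res_k = cor.test(mtcars$mpg, mtcars$wt, method = 'kendall', exact = FALSE)
res_k$estimate                          # Kendall's tau
```

Printing `res` itself gives the full summary of the test. `?cor.test` lists the available methods.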
### 3.C
Use the `paste()` function to:
* Paste the vectors `x` and `y` together, separated by a dot (1.a, 2.b, etc.)
* Collapse the elements in y, separated by
In addition, look at the documentation of the `paste()` function and explain (answer below the codeblock) what the difference is between `paste` and `paste0`.
```{r}
x = 1:5
y = c('a','b','c','d','e')
```
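The `sep` and `collapse` arguments of `paste()` do different things: `sep` goes between the vectors that are pasted together, while `collapse` merges the elements of the result into a single string. A sketch with made-up vectors `id` and `code` and arbitrary separators:

```r
id = 1:3                       # hypothetical vectors
code = c('x', 'y', 'z')
paste('item', id)              # "item 1" "item 2" "item 3"  (default sep is a space)
paste(id, code, sep = '_')     # "1_x" "2_y" "3_z"
paste(code, collapse = '-')    # "x-y-z"  (one single string)
```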