<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Control flow</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Write conditional statements with <code>if</code> and <code>else</code>.</li>
<li>Write and understand <code>for</code> loops.</li>
</ul>
</div>
</section>
<p>Often when we’re coding we want to control the flow of our actions. This can be done by setting actions to occur only if a condition or a set of conditions is met. Alternatively, we can set an action to occur a particular number of times.</p>
<p>There are several ways to control flow in R. For conditional statements, the most commonly used constructs are:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># if</span>
if (condition is true) {
  perform action
}
<span class="co"># if ... else</span>
if (condition is true) {
  perform action
} else { <span class="co"># that is, if the condition is false,</span>
  perform alternative action
}</code></pre></div>
<p>Say, for example, that we want R to print a message if a variable <code>x</code> is greater than or equal to 10:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># sample a random number from a Poisson distribution</span>
<span class="co"># with a mean (lambda) of 8</span>
x <-<span class="st"> </span><span class="kw">rpois</span>(<span class="dv">1</span>, <span class="dt">lambda=</span><span class="dv">8</span>)
if (x >=<span class="st"> </span><span class="dv">10</span>) {
  <span class="kw">print</span>(<span class="st">"x is greater than or equal to 10"</span>)
}
x</code></pre></div>
<pre class="output"><code>[1] 8
</code></pre>
<p>Note that you may not get the same output as your neighbour, because you may be sampling different random numbers from the same distribution.</p>
<p>Let’s set a seed so that we all generate the same ‘pseudo-random’ number, and then print more information:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">10</span>)
x <-<span class="st"> </span><span class="kw">rpois</span>(<span class="dv">1</span>, <span class="dt">lambda=</span><span class="dv">8</span>)
if (x >=<span class="st"> </span><span class="dv">10</span>) {
  <span class="kw">print</span>(<span class="st">"x is greater than or equal to 10"</span>)
} else if (x ><span class="st"> </span><span class="dv">5</span>) {
  <span class="kw">print</span>(<span class="st">"x is greater than 5"</span>)
} else {
  <span class="kw">print</span>(<span class="st">"x is less than or equal to 5"</span>)
}</code></pre></div>
<pre class="output"><code>[1] "x is greater than 5"
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-pseudo-random-numbers"><span class="glyphicon glyphicon-pushpin"></span>Tip: pseudo-random numbers</h2>
</div>
<div class="panel-body">
<p>In the above case, the function <code>rpois</code> generates a random number following a Poisson distribution with a mean (<code>lambda</code>) of 8. The function <code>set.seed</code> fixes the starting state of R’s random number generator, so that, given the same R version and generator settings, every machine produces the exact same ‘pseudo-random’ numbers (<a href="http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation">more about pseudo-random numbers</a>). So if we <code>set.seed(10)</code>, we see that <code>x</code> takes the value 8. You should get the exact same number.</p>
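<p>A quick way to see what <code>set.seed</code> does is to reset the seed and draw again: the same seed replays the same sequence of draws. A minimal sketch (the names <code>first_draw</code> and <code>second_draw</code> are only for illustration):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># draw three values after setting the seed ...
set.seed(10)
first_draw <- rpois(3, lambda = 8)

# ... then reset the same seed and draw again
set.seed(10)
second_draw <- rpois(3, lambda = 8)

# the generator restarted from the same state, so the draws match
identical(first_draw, second_draw)   # TRUE</code></pre></div>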
</div>
</aside>
<p><strong>Important:</strong> when R evaluates the condition inside <code>if</code> statements, it is looking for a logical element, i.e., <code>TRUE</code> or <code>FALSE</code>. This can cause some headaches for beginners. For example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">4</span> ==<span class="st"> </span><span class="dv">3</span>
if (x) {
  <span class="st">"4 equals 3"</span>
}</code></pre></div>
<p>As we can see, the message was not printed because the vector <code>x</code> is <code>FALSE</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">4</span> ==<span class="st"> </span><span class="dv">3</span>
x</code></pre></div>
<pre class="output"><code>[1] FALSE
</code></pre>
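<p>If you also want a message when the condition is <code>FALSE</code>, add an <code>else</code> branch. The sketch below stores the message in a variable named <code>result</code> (an illustrative name), and also shows that a missing value (<code>NA</code>) is not accepted as a condition; <code>tryCatch</code> is used here only so the error does not stop the script:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- 4 == 3       # a comparison returns a logical value
is.logical(x)     # TRUE

if (x) {
  result <- "4 equals 3"
} else {
  result <- "4 does not equal 3"
}
result            # "4 does not equal 3"

# NA is neither TRUE nor FALSE, so if() stops with an error;
# tryCatch() catches that error and returns a message instead
na_check <- tryCatch(
  if (NA) "ran" else "did not run",
  error = function(e) "error: condition is NA"
)
na_check          # "error: condition is NA"</code></pre></div>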
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-1"><span class="glyphicon glyphicon-pencil"></span>Challenge 1</h2>
</div>
<div class="panel-body">
<p>Use an <code>if</code> statement to print a suitable message reporting whether there are any records from 2002 in the <code>gapminder</code> dataset. Now do the same for 2012.</p>
</div>
</section>
<p>Did anyone get a warning message like this?</p>
<pre class="error"><code>Warning in if (gapminder$year == 2012) {: the condition has length > 1 and
only the first element will be used
</code></pre>
<p>If your condition evaluates to a vector with more than one logical element, R versions before 4.2.0 still run the <code>if</code> statement but use only the first element to decide, giving the warning shown above; from R 4.2.0 onward this is an error. Either way, you need to make sure your condition has length 1.</p>
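<p>To reproduce the problem on a small scale, the sketch below uses a hypothetical vector <code>years</code> as a stand-in for <code>gapminder$year</code>. Because whether you see a warning or an error depends on your R version, it uses <code>tryCatch</code> to report which one occurred:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">years <- c(1952, 1957, 2002, 2007)   # hypothetical stand-in for gapminder$year
condition <- years == 2002           # one logical value per element
length(condition)                    # 4, not 1

# report whether if() warned (older R) or failed (R 4.2.0 and later)
outcome <- tryCatch(
  if (condition) "ran" else "did not run",
  warning = function(w) "warning (R before 4.2.0)",
  error = function(e) "error (R 4.2.0 and later)"
)
outcome</code></pre></div>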
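<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-any-and-all"><span class="glyphicon glyphicon-pushpin"></span>Tip: <code>any</code> and <code>all</code></h2>
</div>
<div class="panel-body">
<p>The <code>any</code> function returns <code>TRUE</code> if at least one <code>TRUE</code> value is found within a vector, and <code>FALSE</code> otherwise. It can be used in a similar way to the <code>%in%</code> operator: for example, <code>any(gapminder$year == 2002)</code> and <code>2002 %in% gapminder$year</code> both test whether 2002 appears in the <code>year</code> column. The function <code>all</code>, as the name suggests, returns <code>TRUE</code> only if all values in the vector are <code>TRUE</code>.</p>
</div>
</aside>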
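<p>A small sketch of <code>any</code> and <code>all</code>, again using a hypothetical <code>years</code> vector rather than the real <code>gapminder</code> column. Each call collapses a logical vector to a single value, which is what <code>if</code> needs:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">years <- c(1952, 1957, 2002, 2007)   # hypothetical stand-in for gapminder$year

any(years == 2002)   # TRUE: at least one element equals 2002
any(years == 2012)   # FALSE: no element equals 2012
all(years > 1950)    # TRUE: every element is after 1950
2002 %in% years      # TRUE: same question as any(years == 2002)

# a length-1 condition, so if() behaves as expected
if (any(years == 2012)) {
  message_2012 <- "Records from 2012 found"
} else {
  message_2012 <- "No records from 2012"
}
message_2012</code></pre></div>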
<h2 id="repeating-operations">Repeating operations</h2>
<p>If you want to perform the same operation on each value in a set, and the order of iteration matters, a <code>for</code> loop will do the job. We saw <code>for</code> loops in the shell lessons earlier. This is the most flexible of looping operations, but therefore also the hardest to use correctly. Avoid using <code>for</code> loops unless the order of iteration is important, that is, unless the calculation at each iteration depends on the results of previous iterations.</p>
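<p>As an illustration of a calculation where each iteration depends on earlier ones, here is a sketch that fills in the first ten Fibonacci numbers: each value is the sum of the two before it, so the values must be computed in order. The general <code>for</code> syntax is described next; the names <code>n</code> and <code>fib</code> are only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <- 10
fib <- numeric(n)   # create the result vector up front, filled with zeros
fib[1] <- 1
fib[2] <- 1
for (i in 3:n) {
  # each new value depends on the two values computed before it
  fib[i] <- fib[i - 1] + fib[i - 2]
}
fib   # 1 1 2 3 5 8 13 21 34 55</code></pre></div>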
<p>The basic structure of a <code>for</code> loop is:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for(iterator in set of values){
do a thing
}</code></pre></div>
<p>For example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for(i in <span class="dv">1</span>:<span class="dv">10</span>){
  <span class="kw">print</span>(i)
}</code></pre></div>
<pre class="output"><code>[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
</code></pre>
<p>The <code>1:10</code> bit creates a vector on the fly; you can iterate over any other vector as well.</p>
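<p>For example, the sketch below loops directly over a character vector, then uses <code>seq_along</code> to loop over positions instead of values. <code>seq_along(x)</code> returns <code>1, 2, ..., length(x)</code>, and an empty sequence when <code>x</code> is empty, whereas <code>1:length(x)</code> would give <code>1 0</code> for an empty vector. The vector <code>continents</code> is a hypothetical example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">continents <- c("Africa", "Americas", "Asia")   # hypothetical example vector

# loop over the values themselves
for (continent in continents) {
  print(continent)
}

# loop over positions, useful when you also need the index
for (i in seq_along(continents)) {
  print(paste(i, continents[i]))
}</code></pre></div>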
<p>We can nest one <code>for</code> loop inside another to iterate over every combination of two sets of values.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for (i in <span class="dv">1</span>:<span class="dv">5</span>){
  for(j in <span class="kw">c</span>(<span class="st">'a'</span>, <span class="st">'b'</span>, <span class="st">'c'</span>, <span class="st">'d'</span>, <span class="st">'e'</span>)){
    <span class="kw">print</span>(<span class="kw">paste</span>(i,j))
  }
}</code></pre></div>
<pre class="output"><code>[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"
</code></pre>
<p>Rather than printing the results, we could write the loop output to a new object.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">output_vector <-<span class="st"> </span><span class="kw">c</span>()
for (i in <span class="dv">1</span>:<span class="dv">5</span>){
  for(j in <span class="kw">c</span>(<span class="st">'a'</span>, <span class="st">'b'</span>, <span class="st">'c'</span>, <span class="st">'d'</span>, <span class="st">'e'</span>)){
    temp_output <-<span class="st"> </span><span class="kw">paste</span>(i, j)
    output_vector <-<span class="st"> </span><span class="kw">c</span>(output_vector, temp_output)
  }
}
output_vector</code></pre></div>
<pre class="output"><code> [1] "1 a" "1 b" "1 c" "1 d" "1 e" "2 a" "2 b" "2 c" "2 d" "2 e" "3 a"
[12] "3 b" "3 c" "3 d" "3 e" "4 a" "4 b" "4 c" "4 d" "4 e" "5 a" "5 b"
[23] "5 c" "5 d" "5 e"
</code></pre>
<p>This approach can be useful, but ‘growing your results’ (building the result object incrementally) is computationally inefficient, so avoid it when you are iterating through a lot of values.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-dont-grow-your-results"><span class="glyphicon glyphicon-pushpin"></span>Tip: don’t grow your results</h2>
</div>
<div class="panel-body">
<p>One of the biggest things that trips up novices and experienced R users alike is building a results object (vector, list, matrix, data frame) as your <code>for</code> loop progresses. Each time the object grows, R typically has to copy it into a new, larger block of memory, so your calculations can very quickly slow to a crawl. It’s much better to define an empty results object of the appropriate dimensions beforehand. So if you know the end result will be stored in a matrix like the one below, create an empty matrix with 5 rows and 5 columns, then at each iteration store the results in the appropriate location.</p>
</div>
</aside>
<p>A better way is to define your (empty) output object before filling in the values. For this example, it looks more involved, but is still more efficient.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">output_matrix <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dt">nrow=</span><span class="dv">5</span>, <span class="dt">ncol=</span><span class="dv">5</span>)
j_vector <-<span class="st"> </span><span class="kw">c</span>(<span class="st">'a'</span>, <span class="st">'b'</span>, <span class="st">'c'</span>, <span class="st">'d'</span>, <span class="st">'e'</span>)
for (i in <span class="dv">1</span>:<span class="dv">5</span>){
  for(j in <span class="dv">1</span>:<span class="dv">5</span>){
    temp_j_value <-<span class="st"> </span>j_vector[j]
    temp_output <-<span class="st"> </span><span class="kw">paste</span>(i, temp_j_value)
    output_matrix[i, j] <-<span class="st"> </span>temp_output
  }
}
output_vector2 <-<span class="st"> </span><span class="kw">as.vector</span>(output_matrix)
output_vector2</code></pre></div>
<pre class="output"><code> [1] "1 a" "2 a" "3 a" "4 a" "5 a" "1 b" "2 b" "3 b" "4 b" "5 b" "1 c"
[12] "2 c" "3 c" "4 c" "5 c" "1 d" "2 d" "3 d" "4 d" "5 d" "1 e" "2 e"
[23] "3 e" "4 e" "5 e"
</code></pre>
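<p>The same idea works with a plain vector. In this minimal sketch (the names <code>n</code> and <code>squares</code> are illustrative), <code>numeric(n)</code> creates a vector of <code>n</code> zeros up front, and the loop only overwrites existing elements instead of enlarging the vector on every iteration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <- 10
squares <- numeric(n)   # allocate the full result vector once
for (i in seq_len(n)) {
  squares[i] <- i^2     # fill in element i; the length never changes
}
squares   # 1 4 9 16 25 36 49 64 81 100</code></pre></div>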
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-while-loops"><span class="glyphicon glyphicon-pushpin"></span>Tip: While loops</h2>
</div>
<div class="panel-body">
<p>Sometimes you will find yourself needing to repeat an operation until a certain condition is met. You can do this with a <code>while</code> loop.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">while(this condition is true){
do a thing
}</code></pre></div>
<p>As an example, here’s a while loop that generates random numbers from a uniform distribution (the <code>runif</code> function) between 0 and 1 until it gets one that’s less than 0.1.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">z <-<span class="st"> </span><span class="dv">1</span>
while(z ><span class="st"> </span><span class="fl">0.1</span>){
  z <-<span class="st"> </span><span class="kw">runif</span>(<span class="dv">1</span>)
  <span class="kw">print</span>(z)
}</code></pre></div>
<p><code>while</code> loops will not always be appropriate. You have to be particularly careful that you don’t end up in an infinite loop because your condition is never met.</p>
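<p>One common safeguard is to add a counter with an upper limit to the condition, so the loop stops even if the main condition is never met. In this sketch (the names and the limit of 1000 are arbitrary choices), the loop halves a number until it drops below 1:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">value <- 100
steps <- 0
max_steps <- 1000   # safety cap so the loop cannot run forever
while (value >= 1 && steps < max_steps) {
  value <- value / 2   # halve the value on each pass
  steps <- steps + 1   # count how many passes have run
}
steps   # 7
value   # 0.78125</code></pre></div>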
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-2"><span class="glyphicon glyphicon-pencil"></span>Challenge 2</h2>
</div>
<div class="panel-body">
<p>Compare the objects <code>output_vector</code> and <code>output_vector2</code>. Are they the same? If not, why not? How would you change the last block of code to make <code>output_vector2</code> the same as <code>output_vector</code>?</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-3"><span class="glyphicon glyphicon-pencil"></span>Challenge 3</h2>
</div>
<div class="panel-body">
<p>Write a script that loops through the <code>gapminder</code> data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-4"><span class="glyphicon glyphicon-pencil"></span>Challenge 4</h2>
</div>
<div class="panel-body">
<p>Modify the script from Challenge 3 to also loop over each country. This time, print out whether the life expectancy is smaller than 50, between 50 and 70, or greater than 70.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-5---advanced"><span class="glyphicon glyphicon-pencil"></span>Challenge 5 - Advanced</h2>
</div>
<div class="panel-body">
<p>Write a script that loops over each country in the <code>gapminder</code> dataset, tests whether the country starts with a ‘B’, and graphs life expectancy against time as a line graph if the mean life expectancy is under 50 years.</p>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>