-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathCreating_functions.Rmd
396 lines (280 loc) · 10.9 KB
/
Creating_functions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
# (PART) Advanced topics {-}
# Functional programming: making and using your own functions
Why on earth would you create your own function?
It can be useful to make your own function
## Set up
We will use the `tidyverse` and `purrr` in this chapter.
```{r setup}
library(tidyverse)
library(purrr)
```
```{r options, include = FALSE}
options(scipen = 999)
```
## Defining simple functions
Like your data, a function is an `object` that is defined.
Let's say you wanted to take whatever value you had and add one to it.
We could define a function called `add_one` to do this:
```{r simple_fun1}
add_one <- function(x) {
x + 1
}
```
There are four elements to what we just did:
1. Created a new function using the `function()` function.
1. Defined one argument (input) to the new function: `x`
1. Gave the function instructions to 'take `x` and add one': `x + 1`.
1. Assigned this function to the object `add_one`.
Now that you've done all that, you can call the function like you would any other:
```{r simple_fun2}
add_one(1)
add_one(2)
add_one(100)
```
You can define functions can take more than one argument:
```{r simple_fun3}
add_together_plus_one <- function(x, y, z) {
x + y + z + 1
}
add_together_plus_one(1, 2, 3)
```
As the above takes three arguments, and there are no defaults provided,
we'll get an error if we fotget one:
```{r simple_fun4, error=TRUE}
add_together_plus_one(1, 2)
```
So you can also provide _default_ values to your function when you define it:
```{r simple_fun5}
make_power <- function(x, n = 2) {
x^n
}
```
If you don't provide a value for `n` when you call the `make_power` function,
it will default to `2`:
```{r power1}
make_power(10)
```
And you can override that by providing your own value:
```{r power2}
make_power(10, 4)
```
We can also assign the result of this function (`10000` above) to
an object of its own:
```{r power3}
my_power_number <- make_power(10, 4)
```
For this reason, and a few others described in a [section to follow](#functions-returned), a function
can only return one _thing_ (a number, or a vector, or dataset, or list, and so on).
## Using conditional statements and categorical arguments {#functions-conditionals}
Sometimes you will want your function to behave differently under different
circumstances. For example, you might want to do one thing if your input is a
<b style='font-size:bigger'>REALLY BIG</b> number, and another if it's <span style='font-size:smaller'>very small</span>.
Conditional `if` statements -- which evaluate an expression and proceed _if_ `TRUE` -- can be useful for these occasions.
The function below takes one argument -- `x` -- and transforms it depending on how large it is:
```{r cond1}
make_smaller <- function(x) {
if (x > 10) {
return_number <- x - 10
}
if (x <= 10) {
return_number <- x - 5
}
return_number
}
```
If the input number is greater than 10, the `make_smaller` function will take
`10` off it; if it's 10 or less, it will just take `5` off:
```{r cond2}
make_smaller(7)
make_smaller(13)
```
Making use of the `return()` function could make this clearer.
If `x` is greater than `10`, the `make_smaller` function will now subtract `10`
and immediately return the value, ignoring everything else below it:
```{r cond3}
make_smaller <- function(x) {
if (x > 10) {
return(x - 10)
}
if (x <= 10) {
return(x - 5)
}
}
make_smaller(7)
make_smaller(13)
```
These conditional statements can be used for input options to your functions.
Let's say you had a vector of ages of people in your office, `office_ages`:
```{r cond4}
office_ages <- c(-6, 12, 21, 36, 56, 67, 200)
office_ages
```
To summarise your office by age, you might want a function that would round each age to the nearest `10`. You could make a function that rounds a number to the nearest `10`,
using the `round()` function with `digits` set to `-1` (i.e. round to the nearest `10`):
```{r cond5}
make_age10 <- function(age) {
round(age, digits = -1)
}
make_age10(office_ages)
```
Perfect. But some of those ages look implausible, and you might also want your function to validate them, by, say, capping ages to be between zero and `100`. You could let `validate_ages` be an argument, defaulting to `TRUE`, and _if it is `TRUE`_, then you could perform the validation:
```{r cond6}
make_age10 <- function(age,
validate_ages = TRUE) {
# First, validate ages IF ASKED FOR
if (validate_ages) {
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
```
Now, if `validate_ages == TRUE` (the default), the numbers over `100` will be replaced with `100`, and those less than `0` with `0`:
```{r cond7}
make_age10(office_ages)
```
And you can turn that behaviour off by setting `validate_ages` to `FALSE`:
```{r cond8}
make_age10(office_ages, validate_ages = FALSE)
```
However, we might not trust an age entry of `-6` or `200` at all, and might want to give the user the option to remove them rather than assume they are `0` or `100`. We could provide that option in one of two ways.
The first is to add another argument, shown below.
```{r cond9}
make_age10 <- function(age,
validate_ages = TRUE,
remove_implausible = FALSE) {
# First, validate ages
if (validate_ages) {
if (remove_implausible) {
# Replace implausible ages with NAs
age <- if_else(age > 100 | age < 0, NA_real_, age)
}
if (!remove_implausible) {
# Replace implausible ages with their nearest plausible age
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
```
This is fine! By default, the function still works like it did previously:
```{r cond10}
make_age10(office_ages, validate_ages = TRUE)
```
And if the user wants to validate ages **and** they want to remove the
implausible ages, then they will be replaced with `NA`:^[Note that when _creating_
an `NA` value, like we've done above, we have to specify what _type_ of `NA`
value it is: a number `NA_real_`, an integer `NA_integer`, or a character
`NA_character_`.]
```{r cond11}
make_age10(office_ages, validate_ages = TRUE, remove_implausible = TRUE)
```
But those nested conditional statements -- which are often
needed! -- can be a bit of a headache.
There is another way we can do this, which might be neater in this instance.
The `validate_ages` argument could take a character string that tells it which
validation method to use. We could use `"remove"` and `"adjust"` to indicate the
validation methods:
```{r cond12}
make_age10 <- function(age,
validate_ages = "remove") {
# First, validate ages
if (validate_ages == "remove") {
age <- if_else(age > 100 | age < 0, NA_real_, age)
}
if (validate_ages == "adjust") {
# Replace implausible ages with their nearest plausible age
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
```
Now we can use the `validate_ages` argument by itself to get the results we want:
```{r cond13}
make_age10(office_ages, validate_ages = "adjust")
make_age10(office_ages, validate_ages = "remove")
```
## What is 'returned' from a function? {#functions-returned}
A function can do lots of things in the background.
For example, you might want to take a vector, square every number,
and then add all those numbers up:
```{r return1}
sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
# then return the summed object
summed
}
```
Running that function on a vector of numbers \(1, 2, 3, ..., 10\) (created with `1:10`) does what we want:
```{r return2}
sum_squares(1:10)
```
But look in your **Environment** window. The two objects that were created in the
function -- `added` and `summed` -- aren't there! They are instead calculated,
stored in the background, and removed when the function is finished.
A function only returns **one thing**; everything else that is created in it
is discarded.^[Unless they are explicitly stored in your environment, but this is not recommended.]
This behaviour keeps your environment clean and tidy, but it can cause some frustration when you're getting started.
That **one thing** that is returned is -- by default -- the last thing printed in
the function. _this is a bad explanation that I need to make better_
For `sum_squares` above, we defined two objects and then passed the `summed` to
the end of the function. If we omitted the last step, the function wouldn't return _anything_:
```{r return3}
empty_sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
}
empty_sum_squares(1:10)
```
The `empty_sum_squares` function took the `1:10` vector, then added one, then
summed the resulting numbers. But it didn't _return_ anything.
It just assigned values to the `added` and `summed` objects, then the function
finished and those objects vanished.
The `return()` function can help you make this behaviour clear. Using `return()`
will stop your function in its tracks and pass the object out of the function.
We can use it in the `sum_squares` function:
```{r return4}
sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
# then return the summed object
return(summed) # function stops here!
}
```
Ensuring your function _returns_ the object you want in the form you want is the second step in writing your own functions.
## Using functions on datasets within `mutate()`
Recall that when you are adding (or changing) a variable in a dataset with `mutate`, the length of the output must be either the length of the dataset or one (which is then repeated). Anything else will throw an error:
```{r, error=TRUE}
office_df <- tibble(office_ages)
office_df
# A single element will be repeated:
office_df %>%
mutate(fave_colour = "#F68B33")
# A vector of length of the dataset is fine:
office_df %>%
mutate(fave_number = c(10, 4, 1, 4, 0, 99, 100))
# BUT a vector of a different length will cause an error:
office_df %>%
mutate(fave_lunch = c("pasta salad", "a variety"))
```
In [the section above](#functions-conditionals), we created the `make_age10` function and applied it to our little vector of ages. The function took a vector and returned a vector of the same length:
```{r same_length}
office_ages %>% length()
make_age10(office_ages) %>% length()
```
## Using the `purrr` family of functions
_Section to be finished_; see https://github.com/grattan/R_at_Grattan/issues/59
## Using functions for visualisations
## Sharing your useful functions with Grattan and the world