-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
1373 lines (959 loc) · 43.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>R Package Development for Research</title>
<meta charset="utf-8" />
<meta name="author" content="Valentin Lucet" />
<meta name="date" content="2021-08-16" />
<script src="index_files/header-attrs/header-attrs.js"></script>
<link href="index_files/tile-view/tile-view.css" rel="stylesheet" />
<script src="index_files/tile-view/tile-view.js"></script>
<script src="index_files/clipboard/clipboard.min.js"></script>
<link href="index_files/shareon/shareon.min.css" rel="stylesheet" />
<script src="index_files/shareon/shareon.min.js"></script>
<link href="index_files/xaringanExtra-shareagain/shareagain.css" rel="stylesheet" />
<script src="index_files/xaringanExtra-shareagain/shareagain.js"></script>
<script src="index_files/js-cookie/js.cookie.js"></script>
<script src="index_files/peerjs/peerjs.min.js"></script>
<script src="index_files/tiny.toast/toast.min.js"></script>
<link href="index_files/xaringanExtra-broadcast/broadcast.css" rel="stylesheet" />
<script src="index_files/xaringanExtra-broadcast/broadcast.js"></script>
<script src="index_files/fabric/fabric.min.js"></script>
<link href="index_files/xaringanExtra-scribble/scribble.css" rel="stylesheet" />
<script src="index_files/xaringanExtra-scribble/scribble.js"></script>
<script>document.addEventListener('DOMContentLoaded', function() { window.xeScribble = new Scribble({"pen_color":["#FF0000"],"pen_size":3,"eraser_size":30,"palette":[]}) })</script>
<link href="index_files/animate.css/animate.xaringan.css" rel="stylesheet" />
<link href="index_files/panelset/panelset.css" rel="stylesheet" />
<script src="index_files/panelset/panelset.js"></script>
<link href="index_files/xaringanExtra-clipboard/xaringanExtra-clipboard.css" rel="stylesheet" />
<script src="index_files/xaringanExtra-clipboard/xaringanExtra-clipboard.js"></script>
<script>window.xaringanExtraClipboard(null, {"button":"Copy Code","success":"Copied!","error":"Press Ctrl+C to Copy"})</script>
<script src="index_files/xaringanExtra_fit-screen/fit-screen.js"></script>
<script src="index_files/xaringanExtra-webcam/webcam.js"></script>
<script id="xaringanExtra-webcam-options" type="application/json">{"width":"200","height":"200","margin":"1em"}</script>
<link rel="stylesheet" href="xaringan-themer.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: top, left, title-slide
# R Package Development for Research
## <em>An Introduction</em>
### Valentin Lucet
### 2021-08-16
---
<!-- -->
## About me
<img align="right" width="210" height="210" alt="a picture of Val" src="images/val.jpeg">
* Val Lucet (he / him)
* __RSE__ at __Concordia University__ (lab of Dr. Eric Pedersen)
* __Data Scientist__ at __Environment Canada (ECCC)__ (Landscape Science Unit)
* Have been coding in R for ~ 7-8 years
* Maintaining 4 packages (2 on CRAN), contributed to 3 other
* B.Sc. (Hons) & M.Sc. McGill University (Biology)
* Website: [___vlucet.github.io___](https://vlucet.github.io/)
---
name: plan 1
## Workshop: Part 1 (now -> 10:30)
<!-- 9:30 - 9:35 : let people trickle in -->
<!-- 9:35 - 9:40 : personal introduction -->
1. __Research software in Eco-Evo__ (__~ 20 minutes__) <!-- 9:40 - 10:00 -->
* What is research software?
* To Package or not to Package, and alternatives to packaging
2. __From "spaghetti code" to "packaged" code__ (__~ 20 minutes__) <!-- 10:00 - 10:20 -->
* Tips and tricks to modularize your research code.
* How to write clean code.
=> __Break: 10 minutes__ <!-- 10:20 - 10:30 -->
---
name: plan 2
## Workshop: Parts 2 & 3 (10:30 -> 12:30)
3. __A package step by step with `usethis` and `devtools`__ (__~ 50 minutes__) <!-- 10:30 - 11:20 -->
* Let's create an example package step by step.
=> (__Break: 10 minutes__) <!-- 11:20 - 11:30 -->
4. __Package testing, maintenance and distribution: best practices__ (__~ 50 minutes__) <!-- 11:30 - 12:20 -->
* How to test, maintain, and distribute your package.
__Q & A__: If time allows, __~ 10 minutes__ (or on the conference app at any time)
---
name: prereqs
## Before we get started...
For today, it is recommendeded that you are already pretty well acquainted with __R__ and __Rstudio__. But, if at least you...
* Have written a few __functions with multiple arguments__ in R
* Know about "__project oriented workflow__" (for instance you use RStudio projects)
...then you have the bare minimum we need for today!
Knowledge of Git/Github is also welcome as I won't have the time to explain version control software.
```r
# Run this in R while we start.
pkgs <- c("devtools", "roxygen2", "usethis", "testthat",
"covr", "lubridate", "dplyr")
install.packages(pkgs)
```
---
name: refs
## On not reinventing the wheel...
There exists a __plethora of great resources on R package making__.
I have tried to __synthesize many freely available resources__.
_A full list of resources is included at the end of this workshop._
.center[
![Comic, 2 characters. The first character shows a picture of a wheel to a befuddled second character and says "I have a great idea!"](https://miro.medium.com/max/240/0*FiszegEhSoYnfqO9.png)]
---
class: inverse center middle
## Research software in Eco-Evo
---
## What is research software?
__Research software__ can be defined as any computer-based application that directly or indirectly supports users in a research task<sup>1</sup>.
For example:
- Any software packages used to conduct research
- Any script that employ these packages
- Any code or program used to manage and process data
- Any program that runs on data collection hardware
- Camera traps
- Temperature sensors
- etc.
Can include:
- Open source software
- Proprietary software (Matlab, ArcGIS, etc...)
.footnote[[1] [IGI Global](https://www.igi-global.com/dictionary/knowledge-visualization-for-research-design/69111)]
---
## Who writes research software?
__Research software__ is written by __research software engineers__
_A Research Software Engineer (RSE) combines professional software engineering expertise with an intimate understanding of research._<sup>1</sup>
A wider definition would also include, independently of whether they have received any formal training in software engineering:
- Any __HQP__ (Highly Trained Personnel) who writes code in a research context (grad student, post-doc, etc.)
- Research assistant, lab managers, undergraduate students involved in producing code
The RSE hat is often shared and worn by different people in a lab, and increasingly researchers are required to acquire RSE skills.
.footnote[[1] [RSE Society](https://society-rse.org/about/)]
---
## RSE & code "openness"
__Software & Code literacy__, i.e. the ability to write, test and deliver reproducible code, is an essential skill to solve the reproducibility crisis in Eco-Evo.
<a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000763"> <img src="images/low_avail.png" align="left" width="400" alt='Low availability of code in ecology: A call for urgent action (screenshot of a research paper)'/> </a>
<a href="https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(15)00290-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0169534715002906%3Fshowall%3Dtrue"> <img src="images/elevating.png" width="250" alt='Elevating the status of code in ecology (screenshot of a research paper)'/> </a>
<a href="https://www.nature.com/articles/467753a"> <img src="images/publish_your_code.png" align="center" width="500" alt='Publish your computer code, it is good enough'/> </a>
---
## Open source software
<img src="https://imgs.xkcd.com/comics/dependency_2x.png" align="right" width="350" alt='XKCD comic, where "all modern digital infrastructure" is represented as a jenga tower whose stability completely relies on "a project some random person has been thanklessly maintaining since 2003"'/>
_Open source software is software with source code that anyone can inspect, modify, and enhance._<sup>1</sup>
Like most research, research in eco-evo __could not take place__ without OSS (think of the R language and all the add-on packages).
Good RSE is therefore guided by OSS principles, paradigms, and tools.
.footnote[[1] [opensource.com](https://opensource.com/resources/what-open-source)]
---
## FAIR Software<sup>1</sup>
1. __Findable__
- rich metadata + unique, persistent, identifier.
2. __Accessible__
- metadata is both machine AND human readable.
- deposited in trusted community approved repository.
3. __Interoperable__
- uses community accepted standards and platforms.
4. __Reusable__
- clear licence and documentation.
.footnote[[1] [library carpentry](https://librarycarpentry.org/Top-10-FAIR/2018/12/01/research-software/)]
---
## Is software a scientific contribution?
Research software is often __not recognized__ as a scientific contribution in its own right. Yet, research code:
- Represents a very significant part of a researcher's time.
- Advances our knowledge of the "how", just like research protocols.
- Should be easily citable to improve methods tractability.
Recognizing RS as a scientific contribution is not a vanity for RS developers: __code should under better scrutiny, and should be submitted to peer review__.
Who is reviewing code?
- Journals & reviewers (sometimes)
- [ROpenSci software peer review](https://ropensci.org/).
- [JOSS](https://joss.theoj.org/) and other software journals.
---
class: inverse center middle
# To Package or not to Package?
---
## Should your package exist?<sup>1</sup>
1. Will this piece of software really be useful to someone else than you (or to you in the future)?
2. Is your idea really new, or is there already a package out there performing the exact same task?
3. If there is one, would your implementation bring something to the market, e.g. an user-friendlier implementation? (and is this improvement worth introducing a divide in user base)
4. If there is already a similar package, should you work on your own or collaborate with the author(s) of the original package?
5. As a researcher, are you ready to commit to maintaining the package (fix bugs, provide continuous support) after release/publication. (re: contributing instead)
.footnote[[1] [M. Salmon's blog post](https://masalmon.eu/2017/12/11/goodrpackages/)]
---
## Private packages
_"Packaging doesn't have to be about sharing the code, it can just be to save yourself time."_<sup>1</sup>
For example, consider packaging scripts that:
- You have re-used more than once
- You have shared with students and/or collaborators
- Produces a specific analysis or figure that is often re-used within your lab
Consider starting a __lab package project__ where students are encouraged to contribute functions and document them.
.footnote[[1] [H. Parker's blog post](https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/)]
---
## Research compendiums
I want to mention __research compendiums__ as an alternative to writing and releasing a full package.
Compendiums are actually simplified R packages (with or without documented functions) with an extra `analysis/` folder and does not require you to go through all the steps of packaging.
_The goal of a research compendium is to provide a standard and easily recognizable way for organizing the digital materials of a project to enable others to inspect, reproduce, and extend the research._<sup>1</sup> (re: OSS principles)
<a href="https://github.com/benmarwick/rrtools"> <img src="images/rrtools.png" align="center" alt='rrtools: Tools for Writing Reproducible Research in R'/> </a>
.footnote[[1] [R for Reproducible Research course](https://annakrystalli.me/rrresearch/10_compendium.html)]
---
## What should a package contain?
Two principles<sup>1</sup> :
1. Make your package compatible with the workflow of your users.
2. Do not get too ambitious.
I would add, consider principles of the [Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy):
1. Small building blocks (small functions)
2. "Do One Thing and Do It Well"
.footnote[[1] [M. Salmon's blog post](https://masalmon.eu/2017/12/11/goodrpackages/)]
---
## How to name your package
__Naming things is hard__. but here are a few pieces of advice<sup>1</sup> :
1. Use __lower cases__, this way your user doesn’t need to remember where the capital letters are and full capital letter name looks like screaming.
2. Use the R package `available`, to check whether the name is valid (no special characters, etc.), whether CRAN already hosts a package with the same name and searches the web for unintended meanings of the name.
3. If your package revolves around a __central function__ or interfaces with __another program__, consider using these as the name.
.footnote[[1] [M. Salmon's blog post](https://masalmon.eu/2017/12/11/goodrpackages/)]
---
class: inverse center middle
## From "spaghetti code" to "packaged" code
---
## Messy code
<!-- Messy code is not necessarily bad code, but one could argue that it is "unfinished code", no one took the time to clean it up. Why? talk about how this part of the research process might not be valorized as much. -->
Messy code is ubiquitous in research, BUT:
.center[
__Messy code ≠ Bad code__
]
In fact, this is closer to the truth:
.center[
__Messy code = Unfinished code__
]
Cleaning up research code should be considered an integral part of the research process. It is often the first step in any packaging endeavor.
Three aspects of code "cleanliness":
1. The way __it is organized__ (files)
2. The way __it looks__ (spaces, number of lines, etc.)
2. The way __it reads__ (measured in "WTFs per minute"<sup>1</sup> as you read it)
.footnote[[1] [Uncle Bob's Clean Code Lectures](https://www.youtube.com/watch?v=7EmboKQH8lM)]
---
## How to write clean (R) code
<a href="https://imgs.xkcd.com/comics/good_code.png"> <img src="https://imgs.xkcd.com/comics/good_code.png" align="right" width=300 alt="XKCD flow chart on how to write good code"/> </a>
- Use an __IDE__ (RStudio, VSCode, etc.).
- Learn to follow a style guide, for example [the tidyverse](https://style.tidyverse.org/) or [Google's](https://google.github.io/styleguide/Rguide.html).
- Name things well (re: naming is hard). Don't be afraid to use long, informative function and variable names.
- Write __functional__ code blocks, i.e. code that employs reusable functions, each function accomplishing one thing (recall the Unix's Do One Thing Well).
- Write useful comments: do not comment everything in the code, just the parts that actually need explaining.
---
## How to write clean (R) code (ct'd)
- Use tools to identify and fix style issues in code. The packages [`lintr`](https://github.com/jimhester/lintr) and [`goodpractice`](https://github.com/MangoTheCat/goodpractice) are a good way to start.
- Engage in code review:
- The best way to know if your code is clean is to have someone else take a look at it.
- Consider setting up code review sessions between students and/or collaborators. GitHub makes it easy to get started with reviewing code.
---
## Packaging your messy code
A simple recipe for packaging research scripts into a __proto-package__:
1. Identify __modules__. Modules are pieces of codes that stands on their own and that either:
- Achieve a specific task.
- Represent a discrete step in the workflow.
2. Establish the __expected behavior__ of each module (e.g. write tests)
2. "Refactor" modules into __functions__
3. __Rewrite__ your script to employ these functions.
4. Repeat the process __within__ each function.
5. Consider your code _modularized_ when each function Only Does One Thing (yes, this is subjective). Sometimes, using a max number of lines per function can be useful.
---
## Object-Oriented Programming
OOP is an approach to writing code where information is organized into __objects__.
In R, using objects will allow you to make the most of functions like `plot()` or `print()`. Pretty much everything you manipulate in R are in fact objects for which methods (i.e. _versions of a function written to operate on a given object_) have been written for these specific objects.
If your package requires you to organize and manipulate information with specific constraints, it might be relevant to use OOP design.
To learn more about OOP I would suggest reading:
- [Mastering Software Development in R](https://bookdown.org/rdpeng/RProgDA/object-oriented-programming.html)
- [Advanced R Book](https://adv-r.hadley.nz/oo.html)
---
class: inverse center middle
## A package step by step with {usethis} and {devtools}
---
## Get to know your R installation<sup>1</sup>
Packages live in your __library__
```r
R.home()
```
```
## [1] "/usr/lib/R"
```
```r
list.files(R.home())
```
```
## [1] "bin" "COPYING" "etc" "lib" "library" "modules" "site-library"
## [8] "SVN-REVISION"
```
.footnote[[1] [forwards workshops](https://librarycarpentry.org/Top-10-FAIR/2018/12/01/research-software/)]
---
## A note on package states
Packages have different __states__ at different stages of development.
<a href="https://r-pkgs.org/"> <img src="https://github.com/hadley/r-pkgs/blob/master/diagrams/installation.png?raw=true" align="center" alt='package states workflow'/> </a>
---
## The 4 pillars of R package dev
<!-- Describe the 4 packages -->
1. `devtools`
- The R developer swiss army knife: install packages from github, check and test packages, build documentation, etc...
2. `usethis`
- Automates many steps in the development workflow
3. `roxygen2`
- The backend of devtools's documentation routine. Generates html documentation from a specific markdowm format.
4. `testthat`
- The backend of devtools's testing routine. Functions to help with writing test and finding errors in the code.
---
## Let's make a simple R package
We are going to code a small package called `carbon` which will retrieve the data from the Mauna Loa observatory on carbon concentration in the atmosphere.
We will package 2 functions: (1) to retrieve the data, and (2) tells you the carbon concentration around a given date.
Therefore, the process we are going to go through touches on 2 aspects of RSE:
1. Obtaining and parsing data from an online source
2. Manipulating data
---
## Let's make a simple R package
Let's take a look at the main function:
```r
library(carbon)
ppm_from_date("1996-10-29")
```
```
## [1] 360.13
```
Let's look at
1. The base script I wrote to achieve this.
2. The functions I wrote to modularize the analysis.
3. The packaging process.
---
## The `carbon` base script
Step 1: getting the data
```r
library(lubridate)
library(dplyr)
weekly_data_url <-
"https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_weekly_mlo.txt"
co2_data <- read.table(weekly_data_url, skip = 49)
names(co2_data) <- c("year", "month", "day", "year_decimal",
"co2_ppm", "nb_days", "1_year_ago",
"10_years_ago", "increase_since_1980")
co2_data_clean <- co2_data %>%
mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
select(-year, -month, -day) %>%
relocate(date)
```
---
## The `carbon` base script
Step 2: getting the closest ppm from a given date
```r
birthday <- ymd("1996-10-29")
ppm_from_date <- co2_data_clean %>%
mutate(diff = abs(date - birthday)) %>%
arrange(diff) %>%
slice(1) %>%
pull(co2_ppm)
```
---
## Translating code into functions
We now translate each step into a function. The function `get_weekly_data` reads in the data...
```r
get_weekly_data <- function(clean = TRUE) {
weekly_data_url <-
"https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_weekly_mlo.txt"
co2_data <- read.table(weekly_data_url, skip = 49)
names(co2_data) <- c("year", "month", "day", "year_decimal",
"co2_ppm", "nb_days", "1_year_ago",
"10_years_ago", "increase_since_1980")
if (clean) {
co2_data <- co2_data %>%
dplyr::mutate(date = lubridate::ymd(paste(year, month, day,
sep = "-"))) %>%
dplyr::select(-year, -month, -day) %>%
dplyr::relocate(date)
}
return(co2_data)
}
```
---
## Translating code into functions
...and `ppm_from_date` takes in a date, calls `get_weekly_data` and figures out the closest ppm from a given date.
```r
ppm_from_date <- function(date) {
date_formatted <- lubridate::ymd(date)
co2_data_clean <- get_weekly_data(clean = TRUE)
ppm <- co2_data_clean %>%
dplyr::mutate(diff = abs(date - date_formatted)) %>%
dplyr::arrange(diff) %>%
dplyr::slice(1) %>%
dplyr::pull(co2_ppm)
return(ppm)
}
```
---
## A note about dependencies
Packages can rely on other packages (and sometimes other software) as long as those dependencies are correctly specified (we will come back to that).
For now, it is important to note that the functions we just wrote are always explicit about using other functions by using the double dot notations to access functions that are contained in other packages.
```r
dplyr::mutate
```
---
## Starting a new package with {usethis}
<!-- Show how to start a package and get the basics done -->
Open Rstudio and make sure that no projects are activated.
```r
usethis::create_package("~/carbon", open = TRUE)
```
```
✓ Creating '/home/vlucet/carbon/'
✓ Setting active project to '/home/vlucet/carbon'
✓ Creating 'R/'
✓ Writing 'DESCRIPTION'
Package: carbon
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
* First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
pick a license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
```
---
## The Package anatomy
<!-- The anatomy of a package -->
Let's take a look at the pacakge anatomy...
.center[
![screenshot of a package folder](images/anatomy.png)
]
---
## Editing the DESCRIPTION file
```r
Package: carbon
Title: Get the CO2 ppm Value Closest to a Given Date
Version: 0.0.0.9000
Authors@R:
person(given = "Valentin",
family = "Lucet",
role = c("aut", "cre"),
email = "valentin.lucet@gmail.com")
Description: Retrieves weekly CO2 ppm from the Mauna Loa
observatory data hub, and provides a simple function that
retrieves the ppm value closest to a given date.
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
```
---
## Adding dependencies
All dependency packages that we accessed earlier with `::` need to be added to the DESCRIPTION file:
```r
usethis::use_package("lubridate")
usethis::use_package("dplyr")
```
Which adds a new section to DESCRIPTION:
```r
Imports:
lubridate,
dplyr
```
---
## Adding dependencies : %>% !
We make use of the `magrittr` pipe in our code, therefore we need to make sure our package specifies this dependency correctly.
```r
usethis::use_pipe()
```
`usethis` tells us what it did:
```r
✓ Adding 'magrittr' to Imports field in DESCRIPTION
✓ Writing 'R/utils-pipe.R'
```
---
## Adding code
It is good practice to organize code in __R/__ under different files.
Usually each file is a different function or set of functions that work together. For simplicity's sake, let's put our functions in different files.
We can use `use_r` to create code files.
```r
usethis::use_r("get_weekly_data")
usethis::use_r("ppm_from_date")
```
---
## Copy and paste code
.panelset[
.panel[.panel-name[`get_weekly_data`]
```r
get_weekly_data <- function(clean = TRUE) {
weekly_data_url <-
"https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_weekly_mlo.txt"
co2_data <- read.table(weekly_data_url, skip = 49)
names(co2_data) <- c("year", "month", "day", "year_decimal",
"co2_ppm", "nb_days", "1_year_ago",
"10_years_ago", "increase_since_1980")
if (clean) {
co2_data <- co2_data %>%
dplyr::mutate(date = lubridate::ymd(paste(year, month, day,
sep = "-"))) %>%
dplyr::select(-year, -month, -day) %>%
dplyr::relocate(date)
}
return(co2_data)
}
```
]
.panel[.panel-name[`ppm_from_date`]
```r
ppm_from_date <- function(date) {
date_formatted <- lubridate::ymd(date)
co2_data_clean <- get_weekly_data(clean = TRUE)
ppm <- co2_data_clean %>%
dplyr::mutate(diff = abs(date - date_formatted)) %>%
dplyr::arrange(diff) %>%
dplyr::slice(1) %>%
dplyr::pull(co2_ppm)
return(ppm)
}
```
]
]
---
## Exporting functions
Once packaged, not all functions might be available to the user. This is by design: you might want to write helper functions and sub routines that your users have no use for. You need to explicitly decide which functions to "export" for the user.
To do so we need to add an "export tag" to the function file. We are using the `roxygen` documentation framework, which means our function file must look like that:
```r
#' @export
ppm_from_date <- function(date) {
...
}
```
---
## Exporting functions
We now need to run `roxygen` to confirm the function export.
```r
devtools::document()
```
This will edit the package NAMESPACE file which now looks like:
```
# Generated by roxygen2: do not edit by hand
export(ppm_from_date)
```
The function `ppm_from_date` is now set up for exportation.
---
## Documenting functions
```
#' Get the CO2 ppm value closest to a given date
#'
#' Given a date, retrieves the atmospheric CO2 concentration value
#' the closest to that date, using the data from the Mauna Loa
#' observatory.
#'
#' @param date A character string of the format "yyyy-mm-dd"
#'
#' @return
#' A floating point value, the closest data point to the input date
#'
#' @references
#' NOAA's [Global Monitoring Laboratory data hub](https://gml.noaa.gov/ccgg/trends/data.html).
#'
#' @examples
#' ppm_from_date("1978-06-05")
#'
#' @export
ppm_from_date <- function(date) {
```
---
## Documenting functions
```r
devtools::document()
?ppm_from_date
```
.center[
![screenshot of the function documentation](images/doc.png)
]
---
## The documentation workflow
<a href="https://fr.slideshare.net/DataFestTbilisi/r-package-development-create-package-documentation-isabella-gollini"> <img src="https://image.slidesharecdn.com/rpackagedevelopmentcreatepackagedocumentation-isabellagollini-181126113617/95/r-package-development-create-package-documentation-isabella-gollini-12-638.jpg?cb=1543232899" align="center" width="700" alt='Initial documentation workflow'/> </a>
---
## Trying out functions
While you develop the package, it is important to know about `devtools::load_all()`. This will make ALL functions in the package available to you to test things interactively.
You can also try out non-exported functions that way:
```r
devtools::load_all()
get_weekly_data(clean = FALSE)
```
---
## The development workflow
<a href="https://forwards.github.io/workshops/package-dev-modules/"> <img src="https://forwards.github.io/workshops/package-dev-modules/slides/03-your-first-package/images/dev_cycle_before_testing.png" align="center" width="700" alt='Initial development workflow'/> </a>
---
## Building and installing
We are now ready to build and install the package. You can either:
- Run `devtools::install()`
- Use `CTRL+SHIFT+B` (or on mac `CMD+SHIFT+B`)
---
## Choosing a license
To choose a license, let's head to [choosealicense.com](https://choosealicense.com/).
We are going to go with GPL-3.
```r
usethis::use_gpl3_license()
```
```
✓ Setting active project to '/home/vlucet/carbon'
✓ Setting License field in DESCRIPTION to 'GPL (>= 3)'
✓ Writing 'LICENSE.md'
✓ Adding '^LICENSE\\.md$' to '.Rbuildignore'
```
---
## Adding a readme
```r
usethis::use_readme_rmd()
```
- Adds a readme.
- Creates commit hook so that changes to readme cant be commited before its re-knitted.
```
✓ Writing 'README.Rmd'
✓ Adding '^README\\.Rmd$' to '.Rbuildignore'
• Modify 'README.Rmd'
```
---
## Checking your package
R has a built-in system for checking that packages are properly written. You can run it with
```r
devtools::check()
```
The results of the check:
```r
── R CMD check results ──────────────── carbon 0.0.0.9000 ────
Duration: 8.4s
> checking R code for possible problems ... NOTE
get_weekly_data: no visible global function definition for ‘read.table’
get_weekly_data: no visible binding for global variable ‘year’
get_weekly_data: no visible binding for global variable ‘month’
get_weekly_data: no visible binding for global variable ‘day’
ppm_from_date: no visible binding for global variable ‘co2_ppm’
Undefined global functions or variables:
co2_ppm day month read.table year
Consider adding
importFrom("utils", "read.table")
to your NAMESPACE file.
```
---
## The development workflow
In summary:
1. Modify code => `load_all()` => try out code
2. Modify documentation => `document()` (=> `install()`) => verify doc
4. `install()` + `check()`
5. Repeat!
---
class: inverse center middle
## Package testing, maintenance and distribution: best practices
<!-- Include licensing in there -->
---
## Using GitHub
Using version control and depositing your package on GitHub is very important: it allows anyone (as long as the repository is public) to install your package
```r
devtools::install_github("vlucet/carbon")
```
---
## Testing: the basics
Tests served 2 main purposes:
- Document the expected behavior of functions
- A way to detect bugs as you add features
How big should a test be? Principles of __unit testing__ dictate that a test should check for the __behavior of a single function for a single combination of inputs__.
This is debated and sometimes difficult to achieve (for example with packages that replicate a complex workflow).
The smaller / simpler your functions, the easier it is to write tests for them. It is common that testing code equals or exceed the amount of tested code!
---
## Testing: an example
We use usethis to set up the `testthat` framework.
```r
usethis::use_testthat()
```
And then to create a test file. It is good practice to name each test file after the function or functionality it is meant to test.
```r
usethis::use_test("ppm_from_date")
```
---
## Testing: an example
The more specific you can be, the better the test
```r
test_that("returned ppm value is correct",{
expect_equal(ppm_from_date("1996-10-29"), 360.13)
})
```
Now lets run the test:
```r
devtools::test() # OR CMD/CTRL+SHIFT+T
```
---
## Test-Driven Development
One way to develop software is called __test-driven development__.
_In a nutshell:_ Write the test first and then write the code necessary for the test to pass.
```r
test_that("get_last_ppm returns the latest ppm", {
...
})
```