% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
%
\documentclass[
]{book}
\usepackage{amsmath,amssymb}
\usepackage{lmodern}
\usepackage{ifxetex,ifluatex}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
pdftitle={A Participatory Plant Breeding Toolkit in R and RStudio},
pdfauthor={By Sergio Castro and Matteo Petitti},
hidelinks,
pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\BuiltInTok}[1]{#1}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
\newcommand{\ExtensionTok}[1]{#1}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ImportTok}[1]{#1}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\NormalTok}[1]{#1}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\RegionMarkerTok}[1]{#1}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
\usepackage{booktabs}
\usepackage{titling}
\pretitle{\begin{center} \includegraphics[width=1in,height=1in]{Cover.jpg}\LARGE\\}
\ifluatex
\usepackage{selnolig} % disable illegal ligatures
\fi
\usepackage[]{natbib}
\bibliographystyle{apalike}
\title{A Participatory Plant Breeding Toolkit in R and RStudio}
\author{By Sergio Castro and Matteo Petitti}
\date{Last Update: 2021-10-19}
\begin{document}
\maketitle
{
\setcounter{tocdepth}{1}
\tableofcontents
}
\includegraphics{Cover.jpg}
\hypertarget{introduction}{%
\chapter{Introduction}\label{introduction}}
\includegraphics{rsrstrip.png}
Participatory Plant Breeding (PPB) is becoming increasingly popular around the world as an approach that allows both important gains in selection efficacy and greater involvement of farmers in science and decision making \citep{Ceccareli2020}.
However, the technical and practical aspects of PPB programs can be very challenging. One of the biggest areas of difficulty is data processing and analysis, which may require specific skill sets that can only be acquired through practice. The literature on this subject is growing, and excellent comprehensive manuals have been written, most notably that of \citet{Tech-Manual}.
We do not intend to repeat or replace these valuable resources. Rather, we aim to offer a brief, concise and up-to-date toolkit that should be useful to participatory plant breeders throughout their work. We offer this toolkit specifically within the framework of R and RStudio, an open-source programming language and its Integrated Development Environment (IDE) \citep{Rstudio}.
We propose R as a very powerful tool for PPB for three main reasons:
\begin{itemize}
\item
\textbf{Open Source. }\\
Anyone can download and use R for free. What is more, anyone can contribute to it and expand what can be done with the language. In fact, R is constantly being extended by users around the world.
\item
\textbf{Multifunctional. }\\
Through R and RStudio, you can integrate most data-related needs in PPB, such as data management, statistical analysis and data visualization.
\item
\textbf{Reproducible. }\\
After developing your designs and analyses, you can share your methods and results with the community. Then, virtually anyone with access to your data and your code can review every step you took and (hopefully) arrive at the same result.
\end{itemize}
In fact, a very useful R package has been released specifically for Participatory Plant Breeding \citep{ppbstatsbook}. We are writing this book to enrich the resources already available and to gather information on techniques and methods that might otherwise be scattered.
Furthermore, we want to stress that this is intended to be an ever-growing manual, fed by the comments and experiences of those who find this resource useful.
\includegraphics{rsrstrip.png}
\hypertarget{first-steps-in-r}{%
\chapter{First Steps in R}\label{first-steps-in-r}}
\includegraphics{rsrstrip.png}
If you already know the basics of R, you can skip this chapter and go straight to the next \protect\hyperlink{data-wrangling-and-summarizing}{chapter}!
\hypertarget{getting-started}{%
\section{Getting started}\label{getting-started}}
If, however, you are not used to R and RStudio, many online resources are available for mastering the language and its procedures; we suggest some of them at the end of this chapter. This first chapter covers only the most basic elements of R that will be useful later on for the objectives of this PPB Toolkit.
At this point, if you do not have them already, we suggest that you download the R programming language through this \href{https://www.r-project.org/}{link} and RStudio, its associated Integrated Development Environment (IDE), through this \href{https://www.rstudio.com/products/rstudio/download/}{link}. Alternatively, you can try the cloud version, \href{https://rstudio.cloud/}{RStudio Cloud}, but keep in mind that most explanations in this toolkit assume a locally installed version.
\includegraphics{rsrstrip.png}
\textbf{The RStudio Screen}
If at this point you have downloaded and opened RStudio, you should be looking at something similar to the image in Figure 1 (this image is provisional). To facilitate communication between us and to navigate the RStudio screen properly later on, a few clarifications are needed.
\includegraphics{Rstudio screen 2.png}
\textbf{Figure 1. The RStudio Screen. 1: The Text Editor. 2: The Environment. 3: The Plots. 4: The Console. }
The RStudio screen is normally divided into 4 large panels. Different tools can be assigned to these panels, and their positions can be rearranged. Here, however, we describe the most common uses of the panels in their standard positions.
The \textbf{text editor (1)} usually appears in the upper left quadrant. This is where most of the typing happens, and what you type can then be saved as a script.
The \textbf{console (2)} is in the lower left quadrant. Every line of code that is run passes through here, and it is also where results are usually shown. When an error occurs, the error message also appears here. Other tools can be shown in this quadrant if the user chooses, such as the Terminal and R Markdown (which will not be covered in this toolkit).
The \textbf{upper right quadrant (3)} usually shows the objects that are present in the R environment. We will see later what objects are and how to use them.
The \textbf{lower right quadrant (4)} is one of the most versatile spaces. It can be used to view plots, browse files and get help.
\includegraphics{rsrstrip.png}
\textbf{Basic operations.}
R can be thought of simply as a calculator (but a very powerful one). Start by typing ``1+1'' \textbf{in the console}, and then press Enter.
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{1}\SpecialCharTok{+}\DecValTok{1}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
The first block shows what we typed, and the second shows the result. The {[}1{]} label indicates that the first value on that line is element number 1 of the result; here, the only element is a 2. Later on, we will see operations that produce many values spread over several lines, and these bracketed numbers help you keep track of the position of each value.
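To see these index labels in action, the short sketch below creates a longer sequence with the base R function seq(), which is not otherwise introduced in this chapter. How R splits the printed result across lines depends on the width of your console, so the bracketed numbers you see may differ from someone else's.

\begin{Shaded}
\begin{Highlighting}[]
# seq() builds a regular sequence: from 10 to 300 in steps of 10
seq(10, 300, by = 10)
# Each printed line starts with the position, in brackets,
# of its first value; length() counts all the values
length(seq(10, 300, by = 10))   # 30
\end{Highlighting}
\end{Shaded}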
Now try typing ``1+1'' in the text editor. Once it is typed, you can click the ``Run'' button at the upper right of the editor to see the same result. Alternatively (and faster), you can press Ctrl+Enter (Cmd+Enter on a Mac) while the cursor is on that line.
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{1}\SpecialCharTok{+}\DecValTok{1}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
You should get the same results whether you run code directly in the console or from the text editor. However, the text editor is more flexible, because you can save all your code as a script and rerun it later.
\includegraphics{rsrstrip.png}
\textbf{Basic Arithmetic }
Let's explore a few other arithmetic operations. For example, to multiply 6 by 9, type ``6*9'' in the text editor and press Ctrl+Enter:
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{6}\SpecialCharTok{*}\DecValTok{9}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 54
\end{verbatim}
Or maybe you want to divide 3 by 2.
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{3}\SpecialCharTok{/}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1.5
\end{verbatim}
Perhaps you are interested in calculating the square root of 2.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{sqrt}\NormalTok{(}\DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1.414214
\end{verbatim}
Or in calculating 10 to the power of 2.
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{10}\SpecialCharTok{\^{}}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 100
\end{verbatim}
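When several operations appear in the same line, R follows the usual mathematical order: powers first, then multiplication and division, then addition and subtraction, with parentheses overriding this order. The sketch below illustrates this, together with two further base R operators: \%/\% for integer division and \%\% for the remainder (modulo). The comments show the value each line returns.

\begin{Shaded}
\begin{Highlighting}[]
2 + 3 * 4      # 14: multiplication before addition
(2 + 3) * 4    # 20: parentheses are evaluated first
2 * 3^2        # 18: powers before multiplication
17 %/% 5       # 3: integer division (5 fits 3 whole times in 17)
17 %% 5        # 2: remainder of that division (modulo)
\end{Highlighting}
\end{Shaded}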
\includegraphics{rsrstrip.png}
\textbf{Basic functions}
Functions are the building blocks of R. Each function contains the operations that we want to execute. A function name is always followed by parentheses, and the values placed between the parentheses are called ``arguments''. You can learn more about any function by typing the ? symbol followed by the name of the function (for example, ?mean).
Three simple examples of useful functions:
\begin{verbatim}
Concatenate = c()
Mean = mean()
Standard deviation = sd()
\end{verbatim}
How do we use them?
\begin{itemize}
\tightlist
\item
Use c() to combine a set of elements (in this case, numbers) into a vector.
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{,}\DecValTok{3}\NormalTok{,}\DecValTok{4}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1 2 3 4
\end{verbatim}
Use mean() to calculate the mean of the four elements inside the c() function.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{mean}\NormalTok{(}\FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{,}\DecValTok{3}\NormalTok{,}\DecValTok{4}\NormalTok{))}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2.5
\end{verbatim}
You can do the same with sd() to calculate the standard deviation.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{sd}\NormalTok{(}\FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{,}\DecValTok{3}\NormalTok{,}\DecValTok{4}\NormalTok{))}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1.290994
\end{verbatim}
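Arguments can also be given by name, which makes code easier to read. The sketch below uses two real arguments of base R functions: digits in round() and na.rm in mean(). The second one is especially relevant for field data, because a single missing value (NA) makes mean() return NA unless you ask R to remove missing values first. The comments show what R returns.

\begin{Shaded}
\begin{Highlighting}[]
# ?round opens the help page, where all its arguments are listed
round(3.14159, digits = 2)            # 3.14

# A missing value (NA) propagates through the calculation...
mean(c(1, 2, NA, 4))                  # NA
# ...unless na.rm = TRUE removes missing values first
mean(c(1, 2, NA, 4), na.rm = TRUE)    # 2.333333
\end{Highlighting}
\end{Shaded}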
\includegraphics{rsrstrip.png}
\textbf{Objects}
The term ``object'' refers to most of the information stored in your R session. An object can be a number, a character string, a data frame, a plot, or other things. You can see your objects in the upper right panel of RStudio, in the Environment tab.
You can create objects by using the assignment operators \texttt{=} or \texttt{\textless{}-}, although the latter is generally preferred. For example, create an object called x containing the numbers 1, 2 and 3.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{x }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{,}\DecValTok{3}\NormalTok{)}
\NormalTok{x}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1 2 3
\end{verbatim}
Or create an object called myvector containing all the integers from 1 to 5. The ``:'' operator generates every integer from the first number to the second one.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{myvector }\OtherTok{\textless{}{-}} \DecValTok{1}\SpecialCharTok{:}\DecValTok{5}
\NormalTok{myvector}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1 2 3 4 5
\end{verbatim}
Now you should see your two newly created objects, called ``x'' and ``myvector'', in the Environment panel at the upper right.
Two important notes regarding objects are:
\begin{itemize}
\item
When you create an object, its content is not printed automatically in the console. To actually see the object you created, you have to ``call it'' by writing its name in the text editor and pressing Ctrl+Enter, or by writing it in the console and simply pressing Enter.
\item
Pay attention to spelling and to upper and lower case! MyVector is not the same as myvector (typos are the most common errors for R beginners).
\end{itemize}
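Besides c() and the : operator, two base R functions are convenient for building vectors: seq(), for regular sequences, and rep(), for repeated values, which can be handy when listing varieties and replicates. Arithmetic on a vector is applied to every element at once. The object names and values below are purely illustrative.

\begin{Shaded}
\begin{Highlighting}[]
# seq(): from 20 to 60 in steps of 10 (illustrative values)
steps <- seq(from = 20, to = 60, by = 10)
steps                      # 20 30 40 50 60

# rep(): each variety name repeated twice, e.g. for two replicates
varieties <- rep(c("Var. A", "Var. B", "Var. C"), each = 2)
varieties

# Arithmetic applies element by element
x <- c(1, 2, 3)
x * 10                     # 10 20 30
x + c(100, 200, 300)       # 101 202 303
\end{Highlighting}
\end{Shaded}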
\includegraphics{rsrstrip.png}
\textbf{Saving your script.}
You can save your script to work on it later; that way, you also keep a proper record of the analysis you did. It is very useful to create a folder on your computer for every project you do in R. We invite you to create a folder in your Documents and call it ``R PPB toolkit'' or any other name you prefer.
Once the folder is created, you can save the script you are working on by clicking:
\begin{itemize}
\tightlist
\item
File \textgreater{} Save As \textgreater{} Choose your folder and save your script as ``myscript.R''.
\end{itemize}
(It is important to end the file name with ``.R'', which indicates that the file is an R script.)
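Once a script is saved, you can rerun all of it at once with the base R function source(). In the sketch below, a two-line script is written to a temporary folder (tempdir()) only so that the example runs on any computer; the file name myscript.R follows the example above, and in practice you would simply call source() on your own saved script.

\begin{Shaded}
\begin{Highlighting}[]
# Write a tiny two-line script to a temporary file (illustration only)
script_path <- file.path(tempdir(), "myscript.R")
writeLines(c("x <- c(1, 2, 3)", "total <- sum(x)"), script_path)

# source() runs every line of the script, in order
source(script_path)
total                      # 6
\end{Highlighting}
\end{Shaded}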
\includegraphics{rsrstrip.png}
\hypertarget{preparing-and-reading-your-data}{%
\section{Preparing and reading your data}\label{preparing-and-reading-your-data}}
\textbf{Setting up your Working Directory. }
The Working Directory (WD) is a very important concept: it is simply the folder in which you are working. By default, R looks in this folder when reading files and saves the outputs of your work there.
The most important things to do regarding your WD are:
\textbf{1. Knowing which folder is your current WD, by using getwd() }
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{getwd}\NormalTok{() }
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "C:/Users/Usuario1/Documents/Github/PPB-Toolkit"
\end{verbatim}
*This is the working directory on our computer; yours will inevitably be named differently. In this case, our directory (``PPB-Toolkit'') is inside a folder called ``Github'', which is located in ``Documents''.
\textbf{2. Fixing your WD by writing setwd(``Folder Location'') }
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{setwd}\NormalTok{(}\StringTok{"C:/Users/Usuario1/Documents/New{-}PPB{-}Toolkit"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
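In this example, the WD is set to a folder inside Documents. This step is especially tricky, because paths are written differently on Windows and Mac (for example, ``C:/Users/...'' on Windows versus ``/Users/...'' on a Mac). In R, paths should be written with forward slashes (/), even on Windows, because a single backslash has a special meaning inside R strings. Make sure you type the location on your computer precisely. Alternatively, you can follow step 3.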
\textbf{3. Fixing your WD by clicking. }
You can also set your WD by clicking on RStudio.
\begin{itemize}
\tightlist
\item
Session \textgreater{} Set Working Directory \textgreater{} Choose Directory \textgreater{} Manually locate your folder.
\end{itemize}
This reduces the chances of typos and errors, but it can become tedious if you have to do it every time you open RStudio.
*After doing this, you will see in the console that a line of code was written automatically, using the setwd() function. A nice trick is to copy this line into your script, so that you are sure the path is typed correctly.
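Paths can also be built without typing the separators by hand. The sketch below uses file.path(), which joins folder names with ``/'', and shows that setwd() returns the previous working directory, so you can go back to it afterwards. It uses a temporary folder (tempdir()) so that it runs on any computer; the folder name ``R PPB toolkit'' follows the suggestion above, and you would replace the whole path with your own project folder.

\begin{Shaded}
\begin{Highlighting}[]
# Build the path to a project folder inside a temporary directory
project_dir <- file.path(tempdir(), "R PPB toolkit")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the WD and invisibly returns the previous one
previous_wd <- setwd(project_dir)
current_wd <- getwd()
basename(current_wd)       # "R PPB toolkit"

# Return to the original working directory
setwd(previous_wd)
\end{Highlighting}
\end{Shaded}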
\includegraphics{rsrstrip.png}
\textbf{Preparing your data.}
After evaluations in the field or in the laboratory, data are usually stored in a spreadsheet format in Excel, Google Sheets or similar programs. In those files, data should be kept as tidy as possible to facilitate further analysis. The fundamentals of tidy data can be found \href{https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html}{here}.
In this toolkit we will work mostly with data from a tomato PPB project developed in Italy (data from other sources may be added in the future). The data correspond to a Multi-Location Trial (MLT) carried out in 2020 at 4 different locations. The material evaluated came from a composite cross of 4 different landraces and was subjected to either natural selection or farmer selection. To simplify this toolkit, the varieties were simply named from A to N, as there were 14 different genotypes. The data also include variables such as the mean of the farmers' evaluations, the yield at first harvest, the total yield, the mean fruit weight and the percentage of marketable yield (as defined by the farmers).
\includegraphics{tomatodata.png}
\textbf{Figure 2. Snapshot of the spreadsheet containing the data from the tomato Multi-Location Trial as part of a PPB program. }
Moreover, some general advice can be given about the format of the data to import:
\begin{itemize}
\item
For column names, it is convenient to use short but descriptive names and to avoid spaces, which prevents problems in the analysis. That is why, for example, column 7 in Figure 2 reads ``farmers\_eval'' instead of ``Farmer's Evaluation''.
\item
If your data frame has missing data, always use the same missing-value indicator. It can be ``*'' or ``NA'' (the code R recognizes by default), but, for now, avoid leaving blank cells for missing data.
\end{itemize}
\includegraphics{rsrstrip.png}
\textbf{Uploading the data.}
Data can be imported into RStudio directly from a spreadsheet in Excel format. However, the simplest way is to convert your file into a ``comma-separated values'' (.csv) file, a lighter format that is easier for most programs to read. To convert your spreadsheet into this format, click ``Save As'' and choose the option ``CSV (comma delimited)''. You can check that it worked by opening the file in a plain-text editor such as Notepad: each row should now be a single line of text, with the values separated by commas.
You can \textbf{find and download the data} in this \href{https://drive.google.com/drive/folders/1y2NPXd9lYZcM51NMJEgqa-Iax8qZYs3O?usp=sharing}{Google Drive Folder}. Once your data file is in .csv format and saved in the folder you chose as your Working Directory, you can import it as follows:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata }\OtherTok{\textless{}{-}} \FunctionTok{read.csv}\NormalTok{(}\StringTok{"tomatoMLT2020.csv"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
In this case, we knew that there was a file called ``tomatoMLT2020.csv'' in our working directory. With this single line, we read the data and, at the same time, assign it to an object called ``mydata''.
If it does not work, the most likely reasons are that i) the data file is not in the right directory, or ii) you have not set your working directory properly. A simple way to check is to ask R to list all the files in your working directory:
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{list.files}\NormalTok{(}\FunctionTok{getwd}\NormalTok{())}
\end{Highlighting}
\end{Shaded}
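You can also ask R directly whether a specific file is where you expect it, with file.exists(). In the sketch below, a small made-up data frame is first written to a CSV file with write.csv() (in a temporary folder, so that it runs anywhere) and then checked and read back; with your own data, you would simply pass the file name, for example ``tomatoMLT2020.csv'', to file.exists().

\begin{Shaded}
\begin{Highlighting}[]
# A small made-up data frame, saved as CSV in a temporary folder
toy <- data.frame(variety = c("Var. A", "Var. B"),
                  yield = c(789.25, 497.75))
toy_file <- file.path(tempdir(), "toy_data.csv")
write.csv(toy, toy_file, row.names = FALSE)

# Check that the file exists before trying to read it
file.exists(toy_file)                        # TRUE
"toy_data.csv" %in% list.files(tempdir())    # TRUE

back <- read.csv(toy_file)
str(back)
\end{Highlighting}
\end{Shaded}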
Another way to import the data is by clicking:
\begin{itemize}
\tightlist
\item
Import Dataset \textgreater{} From Text (base) \textgreater{} Manually choose and select your file.
\end{itemize}
In that case, you will see that the code required to perform the operation is written automatically in the console. A nice trick is to copy that code into your script, to save it for later occasions.
Of course, you can also import files in Excel format. You could, for example, click on:
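Whichever way you import a CSV file, two details of the base R import functions are worth knowing. If your file codes missing values with a symbol such as ``*'', you can tell read.csv() to treat it as missing through its na.strings argument. Also, in regional settings that use the comma as the decimal separator (as in Italy), Excel may save ``CSV'' files with semicolons between values; these can be read with read.csv2(). The sketch below reads two tiny made-up files typed directly into the code through the text argument, so that it runs without any file on disk.

\begin{Shaded}
\begin{Highlighting}[]
# Made-up CSV content in which a missing yield is coded as "*"
csv_text <- "variety,yield
Var. A,789.25
Var. B,*
Var. C,NA"
toy <- read.csv(text = csv_text, na.strings = c("NA", "*"))
str(toy)
sum(is.na(toy$yield))      # 2

# Made-up semicolon-separated content with a decimal comma
csv2_text <- "variety;yield
Var. A;789,25"
toy2 <- read.csv2(text = csv2_text)
toy2$yield                 # 789.25
\end{Highlighting}
\end{Shaded}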
\begin{itemize}
\tightlist
\item
Import Dataset \textgreater{} From Excel \textgreater{} Manually choose and select your file.
\end{itemize}
Or you can use the corresponding function, \textbf{read\_excel()}. However, it requires the ``readxl'' package, which has to be installed once and then loaded with library() in each session (we will talk about packages soon!).
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# install.packages("readxl")  \# run once, if readxl is not installed yet}
\FunctionTok{library}\NormalTok{(readxl)}
\NormalTok{mydata }\OtherTok{\textless{}{-}} \FunctionTok{read\_excel}\NormalTok{(}\StringTok{"tomatoMLT2020.xlsx"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
More information about importing data into R can be found \href{http://www.sthda.com/english/wiki/importing-data-into-r}{here}.
\textbf{Checking your data }
It is generally convenient to check your data once you have imported it. This way, you make sure that you chose the right file and that no information was lost or distorted in the import process. Once the data are imported, you can ask R questions about them, for example:
What is the structure of my data?
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{str}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## 'data.frame': 112 obs. of 11 variables:
## $ location : chr "Molise" "Molise" "Molise" "Molise" ...
## $ plot : int 1 2 3 4 5 6 7 8 9 10 ...
## $ rep : int 1 1 2 2 1 1 2 2 1 1 ...
## $ row : int 1 2 3 4 1 2 3 4 1 2 ...
## $ col : int 1 1 1 1 2 2 2 2 3 3 ...
## $ variety : chr "Var. A" "Var. D" "Var. I" "Var. N" ...
## $ farmers_eval : num 3.24 2.71 3.35 3.24 3.24 3.06 2.88 3.24 2.94 2.88 ...
## $ yield : num 789 498 822 846 1083 ...
## $ yield_first : num 11 0 24.9 46.8 28.2 ...
## $ perc_mark_yield : num 57.5 55.8 76 55.3 81.2 ...
## $ mean_fruit_weight: num 17.43 7.37 35.54 14.19 68.85 ...
\end{verbatim}
This function is particularly useful, as it shows the data type of each column. Types include character, numeric, integer (whole numbers) and factor, among others. In this case, for example, location and variety are read as characters, while they should be factors. We will soon see how to change this.
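As a preview, the conversion can be done with the base R function as.factor(). The sketch below applies it to a small made-up data frame with the same kind of columns as mydata; the location names Loc1 and Loc2 and the yield values are hypothetical.

\begin{Shaded}
\begin{Highlighting}[]
toy <- data.frame(location = c("Loc1", "Loc1", "Loc2", "Loc2"),
                  variety  = c("Var. A", "Var. D", "Var. A", "Var. D"),
                  yield    = c(800, 500, 820, 850))

# Convert the grouping columns from character to factor
toy$location <- as.factor(toy$location)
toy$variety  <- as.factor(toy$variety)
str(toy)
levels(toy$variety)        # "Var. A" "Var. D"
\end{Highlighting}
\end{Shaded}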
We can also ask more specific questions: how many columns do I have? How many rows? What are the names of my variables?
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{ncol}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 11
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{nrow}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 112
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{names}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "location" "plot" "rep"
## [4] "row" "col" "variety"
## [7] "farmers_eval" "yield" "yield_first"
## [10] "perc_mark_yield" "mean_fruit_weight"
\end{verbatim}
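A few other base R functions answer similar questions: dim() returns the number of rows and columns together, while head() and tail() show only the first or last rows (six by default), which is often enough to check an import. The data frame below is made up for illustration.

\begin{Shaded}
\begin{Highlighting}[]
toy <- data.frame(plot = 1:8,
                  yield = seq(500, 850, by = 50))
dim(toy)                   # 8 2
head(toy)                  # first 6 rows
tail(toy, n = 3)           # last 3 rows
\end{Highlighting}
\end{Shaded}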
Of course, you can always choose to see your data in spreadsheet format.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{View}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
The result does not appear in this book, but in RStudio a new tab will open showing your data in spreadsheet format.
Bottom line: we advise you to always check your data after importing it. In particular, check the data type of each variable, as a wrong type can cause problems during analysis and manipulation.
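One way to follow this advice in a single step is sketched below with base R: sapply() applies class() to every column, and colSums(is.na()) counts the missing values in each column. The small data frame, with one deliberately missing yield, is made up for illustration.

\begin{Shaded}
\begin{Highlighting}[]
toy <- data.frame(variety = c("Var. A", "Var. B", "Var. C"),
                  yield   = c(789.25, NA, 821.90),
                  rep     = c(1L, 1L, 2L))
# Data type of every column
sapply(toy, class)
# Number of missing values in every column
colSums(is.na(toy))
# Quick summary of each column
summary(toy)
\end{Highlighting}
\end{Shaded}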
\includegraphics{rsrstrip.png}
\hypertarget{basic-data-manipulation}{%
\section{Basic data manipulation}\label{basic-data-manipulation}}
\textbf{Accessing your data }
Before performing statistical analyses, it is useful to understand how to ``call'' a specific column or row in R, so that you can easily access your data. Here are two basic examples.
\begin{itemize}
\tightlist
\item
To access a specific column, you can use the data frame name, followed by the ``\$'' operator and the column name. For example, to see every data point in the yield column, type:
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata}\SpecialCharTok{$}\NormalTok{yield}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 789.25 497.75 821.90 846.50 1083.23 750.49 719.00 1163.00 854.72
## [10] 766.25 791.25 1023.18 1004.11 906.84 830.79 953.52 906.25 831.25
## [19] 698.76 977.25 671.00 999.75 1055.65 653.16 814.89 1016.23 1223.02
## [28] 785.00 3907.75 3679.00 2618.95 3115.58 2474.74 3199.25 3695.00 3617.00
## [37] 3607.63 3550.26 3619.80 3447.75 2670.28 2660.75 3262.11 1587.50 3215.00
## [46] 2671.75 2789.00 2646.00 2715.30 1321.75 3449.00 3362.63 3315.25 2300.00
## [55] 2754.25 3028.42 424.47 482.25 469.47 623.06 415.83 509.75 478.24
## [64] 515.60 315.00 173.00 1094.72 799.06 253.95 944.75 190.26 376.18
## [73] 440.79 441.30 345.53 478.68 711.75 221.75 158.25 719.25 275.00
## [82] 418.50 440.50 261.32 1264.85 918.07 1110.38 1171.68 893.31 1168.29
## [91] 1038.53 1150.23 1277.07 821.88 805.21 1195.68 655.47 695.64 988.79
## [100] 1162.33 1556.57 876.80 868.72 1135.47 1582.41 761.53 549.38 840.05
## [109] 1405.12 540.47 774.07 602.94
\end{verbatim}
Thus, this gives every one of the 112 observations on plot yield, in the order in which they appear in the data frame.
\begin{itemize}
\tightlist
\item
Alternatively, you can use the form data{[}row, column{]}. For example, to see only the value in the third row and third column (here, the \texttt{rep} column):
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata[}\DecValTok{3}\NormalTok{,}\DecValTok{3}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
The same idea can be used to see only the first row, with all columns; leaving the column index empty selects every column.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata[}\DecValTok{1}\NormalTok{,]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## location plot rep row col variety farmers_eval yield yield_first
## 1 Molise 1 1 1 1 Var. A 3.24 789.25 11
## perc_mark_yield mean_fruit_weight
## 1 57.46 17.43
\end{verbatim}
It can also be useful to see rows 1, 2 and 3 of a single column, here the fifth one (\texttt{col}).
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata[}\FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{,}\DecValTok{3}\NormalTok{),}\DecValTok{5}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1 1 1
\end{verbatim}
You can also refer to columns by their names. Here, we select rows 1 to 10 of the farmers' evaluations (\texttt{farmers\_eval}):
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata[}\DecValTok{1}\SpecialCharTok{:}\DecValTok{10}\NormalTok{, }\StringTok{"farmers\_eval"}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 3.24 2.71 3.35 3.24 3.24 3.06 2.88 3.24 2.94 2.88
\end{verbatim}
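To see that these forms are interchangeable, here is a minimal sketch on a small hypothetical data frame (\texttt{toy}, with made-up values and only two of the columns of \texttt{mydata}). Selecting a column with \texttt{\$}, by its position, or by its name inside square brackets returns the same vector.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Hypothetical toy data frame; the values are made up for illustration}
\NormalTok{toy \textless{}{-} data.frame(plot = 1:4,}
\NormalTok{                  farmers\_eval = c(3.2, 2.7, 3.4, 2.9))}
\CommentTok{\# Three equivalent ways of extracting the farmers\_eval column}
\NormalTok{by\_dollar   \textless{}{-} toy$farmers\_eval}
\NormalTok{by\_position \textless{}{-} toy[, 2]}
\NormalTok{by\_name     \textless{}{-} toy[, "farmers\_eval"]}
\NormalTok{identical(by\_dollar, by\_position) \&\& identical(by\_dollar, by\_name)}
\CommentTok{\# Rows 2 and 3 of one named column}
\NormalTok{toy[2:3, "farmers\_eval"]}
\end{Highlighting}
\end{Shaded}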
\includegraphics{rsrstrip.png}
\textbf{Vector arithmetic}
Just as you can do arithmetic with single numbers (as seen before), you can do it with vectors (series of numbers). Also, anything you write after a hash sign (\#) in your script or console is treated as a comment and ignored by R, so you can use it to explain what you are doing. For example:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Define a v vector}
\NormalTok{v }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{10}\NormalTok{,}\DecValTok{20}\NormalTok{, }\DecValTok{30}\NormalTok{, }\DecValTok{40}\NormalTok{, }\DecValTok{50}\NormalTok{)}
\CommentTok{\# Define a w vector}
\NormalTok{w }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{, }\DecValTok{2}\NormalTok{, }\DecValTok{3}\NormalTok{, }\DecValTok{4}\NormalTok{, }\DecValTok{5}\NormalTok{)}
\CommentTok{\# Add them up to create a t vector}
\NormalTok{t }\OtherTok{\textless{}{-}}\NormalTok{ v }\SpecialCharTok{+}\NormalTok{ w}
\CommentTok{\# t is the sum of v and w}
\NormalTok{t}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 11 22 33 44 55
\end{verbatim}
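Arithmetic on vectors works element by element: the first element of \texttt{v} is combined with the first element of \texttt{w}, the second with the second, and so on. When one operand is a single number, R reuses (``recycles'') it for every element. A short sketch, which redefines \texttt{v} and \texttt{w} so that it runs on its own:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{v \textless{}{-} c(10, 20, 30, 40, 50)}
\NormalTok{w \textless{}{-} c(1, 2, 3, 4, 5)}
\CommentTok{\# Element-wise product: 10*1, 20*2, 30*3, ...}
\NormalTok{v * w}
\CommentTok{\# A single number is recycled over every element}
\NormalTok{v / 10}
\NormalTok{w + 100}
\end{Highlighting}
\end{Shaded}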
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\textbf{Create a new column. }
Using vector arithmetic, we can create new columns in our data frame from calculations on the existing columns. For example, our data set contains the total yield (\texttt{yield}) and the percentage of marketable yield (\texttt{perc\_mark\_yield}). A simple way to obtain the marketable yield is to multiply the total yield by the percentage of marketable yield; note that, because \texttt{perc\_mark\_yield} is a percentage, this product is 100 times the marketable yield. We will store the result in a new vector simply called ``newvector'':
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{newvector }\OtherTok{\textless{}{-}}\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{yield }\SpecialCharTok{*}\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{perc\_mark\_yield}
\end{Highlighting}
\end{Shaded}
Once you have created the vector, you can add it to the data frame as a new column. You could also assign the calculation directly to the new column; we split it into two steps here to make it easier to follow.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata}\SpecialCharTok{$}\NormalTok{mark\_yield }\OtherTok{\textless{}{-}}\NormalTok{ newvector}
\end{Highlighting}
\end{Shaded}
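The new column can also be created in a single step, without the intermediate vector. A minimal sketch on a small hypothetical data frame with made-up values; here we also divide by 100, so that \texttt{mark\_yield} is expressed in the same units as \texttt{yield}.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Hypothetical toy data frame; the values are made up for illustration}
\NormalTok{toy \textless{}{-} data.frame(yield = c(800, 500, 1000),}
\NormalTok{                  perc\_mark\_yield = c(50, 80, 25))}
\CommentTok{\# Create the new column directly; /100 turns the percentage into a proportion}
\NormalTok{toy$mark\_yield \textless{}{-} toy$yield * toy$perc\_mark\_yield / 100}
\NormalTok{toy}
\end{Highlighting}
\end{Shaded}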
\includegraphics{rsrstrip.png}
\hypertarget{basic-statistics-and-plots}{%
\section{Basic Statistics and Plots}\label{basic-statistics-and-plots}}
\textbf{Minimal statistics}
Now that we know how to call a specific column in the data frame, we can compute some basic statistics on these variables.
\begin{itemize}
\tightlist
\item
Estimate the mean of a variable.
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{mean}\NormalTok{(mydata}\SpecialCharTok{$}\NormalTok{yield)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1334.865
\end{verbatim}
\begin{itemize}
\tightlist
\item
Estimate the median.
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{median}\NormalTok{(mydata}\SpecialCharTok{$}\NormalTok{yield)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 906.545
\end{verbatim}
\begin{itemize}
\tightlist
\item
Estimate the standard deviation.
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{sd}\NormalTok{(mydata}\SpecialCharTok{$}\NormalTok{yield)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1055.797
\end{verbatim}
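To see what these functions compute, here is a small worked example on a made-up vector with one extreme value. The mean is pulled up by that value while the median is not, a pattern consistent with the mean yield above (1334.865) being much larger than the median (906.545), since some plots yield over 3000. \texttt{sd()} uses the sample formula, dividing by $n-1$: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Made-up values, with one extreme observation}
\NormalTok{x \textless{}{-} c(1, 2, 3, 4, 100)}
\NormalTok{mean(x)    }\CommentTok{\# (1 + 2 + 3 + 4 + 100) / 5 = 22}
\NormalTok{median(x)  }\CommentTok{\# middle value once sorted = 3}
\CommentTok{\# Sample standard deviation computed by hand, dividing by n - 1}
\NormalTok{sqrt(sum((x - mean(x))\^{}2) / (length(x) - 1))}
\NormalTok{sd(x)      }\CommentTok{\# same value as the line above}
\end{Highlighting}
\end{Shaded}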
\includegraphics{rsrstrip.png}
\textbf{A simple plot.}
Using the function plot(), we can create simple exploratory graphs, which will appear in the lower right panel of RStudio.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{plot}\NormalTok{( }\AttributeTok{x=}\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{farmers\_eval, }\CommentTok{\# My x axis}
\AttributeTok{y=}\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{yield, }\CommentTok{\# My y axis}
\AttributeTok{xlab=} \StringTok{"Farmer\textquotesingle{}s Evaluation"}\NormalTok{, }\CommentTok{\# My x label}
\AttributeTok{ylab=} \StringTok{"Yield per plant (kg)"}\NormalTok{ ) }\CommentTok{\# My y label}
\end{Highlighting}
\end{Shaded}
\includegraphics{PPB-Toolkit-for-R-and-R-Studio_files/figure-latex/unnamed-chunk-33-1.pdf}
\textbf{Another simple plot.}
We could, for example, plot the yield data for each location. But first, let's check that the data types are correct.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{str}\NormalTok{(mydata)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## 'data.frame': 112 obs. of 12 variables:
## $ location : chr "Molise" "Molise" "Molise" "Molise" ...
## $ plot : int 1 2 3 4 5 6 7 8 9 10 ...
## $ rep : int 1 1 2 2 1 1 2 2 1 1 ...
## $ row : int 1 2 3 4 1 2 3 4 1 2 ...
## $ col : int 1 1 1 1 2 2 2 2 3 3 ...
## $ variety : chr "Var. A" "Var. D" "Var. I" "Var. N" ...
## $ farmers_eval : num 3.24 2.71 3.35 3.24 3.24 3.06 2.88 3.24 2.94 2.88 ...
## $ yield : num 789 498 822 846 1083 ...
## $ yield_first : num 11 0 24.9 46.8 28.2 ...
## $ perc_mark_yield : num 57.5 55.8 76 55.3 81.2 ...
## $ mean_fruit_weight: num 17.43 7.37 35.54 14.19 68.85 ...
## $ mark_yield : num 45350 27750 62464 46828 88002 ...
\end{verbatim}
The location variable is of type character, which means it is just a string of text. We would rather have it as a factor, so that all observations from the same location are grouped together in plots. The function \textbf{as.factor()} converts a variable into a factor:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{mydata}\SpecialCharTok{$}\NormalTok{location }\OtherTok{\textless{}{-}} \FunctionTok{as.factor}\NormalTok{(mydata}\SpecialCharTok{$}\NormalTok{location)}
\end{Highlighting}
\end{Shaded}
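A quick way to confirm that the conversion worked is to inspect the levels of the new factor. A minimal sketch on a made-up character vector (the location names come from the text; the number of entries is arbitrary):
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{loc \textless{}{-} c("Molise", "Rotonda", "Molise", "Rotonda", "Molise")}
\NormalTok{loc\_f \textless{}{-} as.factor(loc)}
\CommentTok{\# Unique groups, sorted alphabetically by default}
\NormalTok{levels(loc\_f)}
\CommentTok{\# Number of observations in each group}
\NormalTok{table(loc\_f)}
\end{Highlighting}
\end{Shaded}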
Now we can make the plot. Notice that, because we plot a factor variable (location) against a quantitative variable (yield), R automatically draws a \href{https://en.wikipedia.org/wiki/Box_plot}{box plot}. This is useful because it shows not only the median values, but also the dispersion of the data, through the size of the box and its whiskers.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{plot}\NormalTok{( }\AttributeTok{x=}\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{location, }
\AttributeTok{y=}\NormalTok{mydata}\SpecialCharTok{$}\NormalTok{yield,}
\AttributeTok{ylab=} \StringTok{"Yield per plant (kg)"}\NormalTok{,}
\AttributeTok{xlab=} \StringTok{"Location"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\includegraphics{PPB-Toolkit-for-R-and-R-Studio_files/figure-latex/unnamed-chunk-36-1.pdf}
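The thick line inside each box is the median of that group. As a minimal sketch, the same per-group medians can be computed with \texttt{tapply()}, shown here on a small hypothetical data frame with made-up yields; on the real data the equivalent call would be \texttt{tapply(mydata\$yield, mydata\$location, median)}. The formula form \texttt{boxplot(yield \textasciitilde{} location, data = toy)} draws the same kind of plot as the \texttt{plot()} call above.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Hypothetical toy data frame; the values are made up for illustration}
\NormalTok{toy \textless{}{-} data.frame(}
\NormalTok{  location = factor(c("Molise", "Molise", "Molise", "Rotonda", "Rotonda", "Rotonda")),}
\NormalTok{  yield    = c(700, 800, 950, 3000, 3500, 3600)}
\NormalTok{)}
\CommentTok{\# Median yield per location: the middle line of each box}
\NormalTok{tapply(toy$yield, toy$location, median)}
\end{Highlighting}
\end{Shaded}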
\includegraphics{rsrstrip.png}
\hypertarget{packages-in-r.}{%
\section{Packages in R.}\label{packages-in-r.}}
Packages are what keep the R community growing, as virtually anyone can create and publish a package for R. A package can be loosely defined as a collection of functions that serve a particular purpose or field of study. Some of them are very useful for handling and analyzing data from varietal trials and PPB programs, and we will see many examples later. Downloading and installing them is easy, and you have two main options.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
The simplest is to click on Tools \textgreater{} Install Packages and type the package name. This works as long as the package is on CRAN, the official R repository where developers publish packages.
\item
To make the installation part of your code (so that it can be repeated later), you can also write:
\end{enumerate}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{install.packages}\NormalTok{(}\StringTok{"yourpackagename"}\NormalTok{)}
\CommentTok{\#Don\textquotesingle{}t try this! It\textquotesingle{}s only an example!}
\end{Highlighting}
\end{Shaded}
Once installed, a package has to be loaded with library() before you can use it, and this must be repeated every time you start a new R session.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{library}\NormalTok{(yourpackagename)}
\CommentTok{\#Again, don\textquotesingle{}t try this.}
\end{Highlighting}
\end{Shaded}
We will use and install several packages in the following chapters.
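A common pattern, shown here as a sketch, is to install a package only if it is not already present, so that a script can be rerun without reinstalling anything. \texttt{dplyr} is used as the example because it appears in the next chapter; the CRAN mirror URL passed to \texttt{repos} is just one possible choice (RStudio usually has a default mirror already set).
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{pkg \textless{}{-} "dplyr"}
\CommentTok{\# requireNamespace() returns FALSE, without an error, if the package is missing}
\NormalTok{if (!requireNamespace(pkg, quietly = TRUE))}
\NormalTok{  install.packages(pkg, repos = "https://cloud.r-project.org")}
\CommentTok{\# Load the package; character.only = TRUE lets library() take a string}
\NormalTok{library(pkg, character.only = TRUE)}
\end{Highlighting}
\end{Shaded}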
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\hypertarget{final-considerations}{%
\section{Final considerations}\label{final-considerations}}
\textbf{Ask questions!}
It is very hard to know (and to remember!) how everything is done in R. Like any language, the only way to become fluent is to practice often. There will always be things you want to do that are not covered in this short manual. For those cases, curiosity is a great advantage, and here is some brief advice on how to proceed.
Examples:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
  When you want to better understand a function and its arguments, type ?functionname (replacing ``functionname'' with the name of your function). This opens the function's help page in the Help tab (by default, in the lower right panel of RStudio), which explains in detail what the function does and how to use it.
\item
  When you want to do something in particular but are not sure how to do it in R, a great approach is simply to search Google for ``How to \_\_\_\_\_ in R''. This can get you out of many difficulties; the tricky part is learning how to ask the right question.
\item
  When you run a piece of code and receive an error (usually displayed in red in the console), read the message and try to understand what it means. Sometimes these messages are hard to interpret, and a good approach is to copy and paste them into Google to see who has run into the same problem before.
\end{enumerate}
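A few ways to get help from within R itself, sketched with \texttt{mean()} as the example function. The \texttt{tryCatch()} call is included only to show how an error message can be captured as text, for instance to paste it into a search engine; \texttt{log("a")} is just a deliberately wrong call.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{?mean           }\CommentTok{\# open the help page of mean()}
\NormalTok{help("mean")    }\CommentTok{\# same, written as a regular function call}
\NormalTok{args(mean)      }\CommentTok{\# quick look at the arguments only}
\CommentTok{\# Capture an error message as text instead of stopping the script}
\NormalTok{msg \textless{}{-} tryCatch(log("a"), error = function(e) conditionMessage(e))}
\NormalTok{msg}
\end{Highlighting}
\end{Shaded}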
\textbf{To learn more}
These are just some sources to learn more about R:
\begin{itemize}
\item
A good starting point is the \href{https://rc2e.com/}{R Cookbook}.
\item
This site offers many tutorials on R and statistics: \href{http://www.sthda.com/english/}{STHDA}
\item
This question-and-answer site is where people ask many of the questions you are likely to have: \href{https://stackoverflow.com/}{Stack Overflow}
\end{itemize}
\hypertarget{data-wrangling-and-summarizing}{%
\chapter{Data Wrangling and Summarizing}\label{data-wrangling-and-summarizing}}
\includegraphics{rsrstrip.png}
Data wrangling is one of the most important activities you can do in R. It allows you to systematically take one or more data frames and put them into the format that works best for your analysis. In the last chapter, we already saw some of the most basic elements of data wrangling (such as creating a new column from two others). Now we will see a brief summary of what can be done with the functions that come with base R, as well as with two specific packages: \textbf{dplyr}, a package specifically designed for that purpose, and \textbf{metan}, a package designed for analyzing multi-environment trials.
\hypertarget{basic-r}{%
\section{Basic R}\label{basic-r}}
\textbf{Subsetting your data. }
Suppose you only want to work with part of your data frame, for example, only the data for one location.
There are many ways to do this; a rather simple one is the function subset().
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{soloRotonda }\OtherTok{\textless{}{-}} \FunctionTok{subset}\NormalTok{(mydata, mydata}\SpecialCharTok{$}\NormalTok{location }\SpecialCharTok{==} \StringTok{"Rotonda"}\NormalTok{)}
\FunctionTok{nrow}\NormalTok{(soloRotonda)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 28
\end{verbatim}
Notice how we created a new data frame containing only the data from Rotonda. Checking its number of rows, we see it has 28, since there were 28 plots per location.
Notice also how we used the ``=='' operator to write a logical expression stating that the location had to be equal to ``Rotonda''. A single ``='' would not work here, because it is used for assignment (or for naming arguments), not for comparison.
In R, the operators used to build logical expressions are:
\begin{itemize}
\tightlist
\item
  \texttt{==} equal to
\item
  \texttt{!=} not equal to
\item
  \texttt{\textless{}} less than, \texttt{\textgreater{}} greater than
\item
  \texttt{\textless{}=} less than or equal to
\item
  \texttt{\textgreater{}=} greater than or equal to
\item
  \texttt{\&} when we want one condition AND another
\item
  \texttt{\textbar{}} when we want one condition OR another
\end{itemize}
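Each of these operators returns \texttt{TRUE} or \texttt{FALSE} for every element of a vector, and this is what \texttt{subset()} uses to decide which rows to keep. A short sketch on a made-up vector, with the result of each line shown as a comment:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{x \textless{}{-} c(1, 5, 10)}
\NormalTok{x == 5            }\CommentTok{\# FALSE TRUE FALSE}
\NormalTok{x != 5            }\CommentTok{\# TRUE FALSE TRUE}
\NormalTok{x \textgreater{}= 5            }\CommentTok{\# FALSE TRUE TRUE}
\NormalTok{x \textgreater{} 1 \& x \textless{} 10     }\CommentTok{\# FALSE TRUE FALSE}
\NormalTok{x == 1 | x == 10  }\CommentTok{\# TRUE FALSE TRUE}
\end{Highlighting}
\end{Shaded}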
For example, we can select only the data for Rotonda, and only for varieties A and B. If we check how many rows we have now, we see that, as expected, there are only four: two for Var. A and two for Var. B.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{Var.AB\_Rotonda }\OtherTok{\textless{}{-}} \FunctionTok{subset}\NormalTok{(mydata, }
\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{location }\SpecialCharTok{==} \StringTok{"Rotonda"}
\SpecialCharTok{\&} \CommentTok{\#And}
\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{variety }\SpecialCharTok{==} \StringTok{"Var. A"}
\SpecialCharTok{|} \CommentTok{\#Or}
\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{location }\SpecialCharTok{==} \StringTok{"Rotonda"}
\SpecialCharTok{\&} \CommentTok{\#And}
\NormalTok{ mydata}\SpecialCharTok{$}\NormalTok{variety }\SpecialCharTok{==} \StringTok{"Var. B"}\NormalTok{ )}
\FunctionTok{nrow}\NormalTok{(Var.AB\_Rotonda)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 4
\end{verbatim}
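The condition above works because \texttt{\&} is evaluated before \texttt{|} in R, so each ``location AND variety'' pair is grouped automatically. A shorter alternative, sketched here on a small hypothetical data frame with made-up rows, uses the \texttt{\%in\%} operator, which tests whether each value belongs to a set. Inside \texttt{subset()}, columns can also be named directly, without the \texttt{mydata\$} prefix.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# Hypothetical toy data frame; the rows are made up for illustration}
\NormalTok{toy \textless{}{-} data.frame(}
\NormalTok{  location = c("Rotonda", "Rotonda", "Rotonda", "Molise", "Molise"),}
\NormalTok{  variety  = c("Var. A", "Var. B", "Var. C", "Var. A", "Var. B")}
\NormalTok{)}
\CommentTok{\# Rotonda AND (variety is Var. A OR Var. B)}
\NormalTok{ab\_rotonda \textless{}{-} subset(toy, location == "Rotonda" \& variety \%in\% c("Var. A", "Var. B"))}
\NormalTok{nrow(ab\_rotonda)}
\end{Highlighting}
\end{Shaded}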