A collection of all resources and packages about R that I find interesting. Please cite the repository if using its resources and let me know of packages that you love that I (most probably!) have missed!
DataCamp. Tutorial. A fantastic collection of tutorials on R, including introduction to the basics as well as more advanced topics. David Robinson of DataCamp posts screencasts of data analysis using tidyverse on YouTube every Tuesday as part of #TidyTuesday that are super instructional here!
Quick-R. Tutorial. A fantastic website with anything one would need to start analyzing data using R.
Swirl. Tutorial. An R package that teaches you how to use R within R.
R for Data Science. Book. Written by Garrett Grolemund and Hadley Wickham - do I need to say anything else? This is a book written in RMarkdown using bookdown and contains all you need to know for basic data science using the tidyverse!
Foundations of Statistics with R. Book. A nicely done book teaching how to produce basic statistical analyses using R. I really like the integration of text with code and their use of modern data analysis packages (tidyverse).
STAT 547 @ UBC: Class Meeting Guide 2019/20. Book. A fantastic book teaching statistics through R, created to accompany the famous STAT 545 course at University of British Columbia. Please also review a similar book by the instructor and TAs of this class on Data wrangling, exploration, and analysis with R.
Introduction to Econometrics with R. Book. A fantastic book illustrating how to use econometric approaches to analyse data using R. It illustrates how to use R to appropriately run and interpret multivariable regressions, instrumental variables, time series regression, average treatment effects, etc.
R for Health Data Science. Book. A great introduction to using R for clinical data science. The two authors are very experienced in teaching R and have their own course.
R Programming. Course. It's considered one of the leading if not the leading online courses on R. I've not taken it myself, but I've heard good things about Roger Peng's courses. If you don't want the final certificate, it's free.
Pipes. Tutorial. A fantastic and comprehensive tutorial on how to use pipes in R - these are now becoming standard in R syntax.
magrittr. Package. Tutorial. The standard package for the commonest pipes used in R - very useful to go through as it introduces many super-useful operators. I especially like "%<>%", which uses and updates an object at the same time.
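As a small illustration of the compound assignment pipe mentioned above, here is a minimal sketch. It assumes the magrittr package is installed; the vector `x` is made up for the example.

```r
library(magrittr)

x <- c(9, 16, 25)

# Ordinary pipe: the result is returned, but x itself is unchanged
x %>% sqrt()

# Compound assignment pipe: pipes x into sqrt() and assigns the result back to x
x %<>% sqrt()
x
```

After the last call, `x` holds the square roots rather than the original values, which is equivalent to writing `x <- x %>% sqrt()`.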
Hadley Wickham's Style Guide. Tutorial. Guidelines. This and the style guide by Google (referenced in the website) are the most standard coding styles in R.
reticulate. Package. Tutorial. A great way to code in Python within R. As of RStudio 1.2, reticulate can be used to start a Python environment, combine R and Python within the same R Markdown and produce plots! A great way of combining working within both languages.
head. Cool syntax. You can use negative indexes to remove elements from the end. For example, head(x, -1) returns all elements of object x apart from the last one.
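A quick sketch of the negative-index behaviour in base R. The vector is made up for illustration, and the `tail()` and data frame lines are additions showing that the same idea works from the other end and on rows.

```r
x <- c(10, 20, 30, 40, 50)

# Drop the last element
all_but_last <- head(x, -1)

# The mirror image: tail() with a negative n drops elements from the start
all_but_first <- tail(x, -1)

# On a data frame, the negative n applies to rows (here: drop the last two rows)
df_trimmed <- head(mtcars, -2)
```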
Text mining with R. Book. A great GitHub resource with code for cutting-edge text mining with R.
datapasta. Package. Tutorial. This is a fantastic package that translates copied data in your clipboard into R input and your R output into copied data in your clipboard. It is amazing how it simply works and effortlessly knows how to handle your data. You can also assign its functions to a keyboard shortcut, which makes this a super easy way to import/export data. I love how it also quotes unquoted strings within a vector. Very highly recommended!
Data from clipboard in Windows. Tutorial. In the R distribution for Windows, there is a readClipboard function, which can be used to read data from the clipboard; tables can be read using read.table(file = "clipboard", ...). Having said that, the package datapasta seems like a better solution.
vroom. Package. Tutorial. Lightning-fast import of .csv files (10x faster than data.table). It does this by only accessing the part of the file required for the action needed.
dplyr. Package. Tutorial. By far the most popular package for data manipulation in R. Part of the tidyverse.
data.table. Package. Tutorial. Probably the second most popular package for data manipulation in R. This package is much faster than dplyr and thus preferred when using large datasets. It uses a somewhat unique syntax.
zeallot. Package. A clever way to unpack data from objects such as data frames or lists into newly defined vectors using a new operator: %<-%.
janitor. Package. Tutorial. An easy way to identify duplicates across multiple columns (exact or fuzzy match). This is often surprisingly hard to do in base R.
Databases using R. Tutorial. A fantastic guide on how to use R to analyse large datasets, using resources like Spark, SQL and others. It also illustrates how packages like dplyr can be combined with SQL to analyse large datasets. A must for big data management.
broom. Package. Tutorial. A great package to summarize fitted models as meaningful data frames.
stargazer. Package. Tutorial. A great package to automatically produce summary tables in HTML or LaTeX ready for publication.
skimr. Package. Tutorial. Produce pretty summary tables where the last column represents a histogram of the data. It can output directly into R and as an HTML table.
summarytools. Package. Tutorial. Produce pretty summary tables and contingency tables.
papeR. Package. Tutorial. A package to produce summary statistics quickly. I particularly like the ease of choosing which type of variables to summarize, how to group those variables and how to produce pretty LaTeX/HTML output tables without having to rely on other packages, such as ktable.
Moving averages. Tutorial. A great tutorial on how to calculate moving averages in R and how to use some of the power of Base R by utilising time series (ts) objects or packages designed for the purpose of analysing time-series, such as zoo.
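For a flavour of what a moving average looks like in base R, here is a minimal sketch using `stats::filter()`. This is one common approach and not necessarily the one used in the linked tutorial; the data and the window size `k` are made up.

```r
x <- c(2, 4, 6, 8, 10, 12)
k <- 3  # window size (an arbitrary choice for illustration)

# Centred moving average: each value is the mean of itself and its neighbours
ma_centred <- stats::filter(x, rep(1 / k, k), sides = 2)

# Trailing moving average: each value is the mean of itself and the k - 1 previous values
ma_trailing <- stats::filter(x, rep(1 / k, k), sides = 1)
```

Both results are `ts` objects, with `NA` wherever the window runs off the edge of the series.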
R Graph Gallery. Vignette. Educational. Another superb website created by Yan Holtz. It is a gallery of all sorts of different types of graphs created using different specifications. I like how it gives suggestions at the end indicating that if you are interested in this type of plot, you may also be interested in this other type of plot. I derive lots of benefit by browsing through this website even for simple plots for ideas of how to make them look nicer. An unbelievable resource and a must in the armamentarium of anyone working with data.
Data-to-Viz. Vignette. Educational. This is a beautiful and SUPERB website created by Yan Holtz including all sorts of different plots you might imagine alongside the required code to produce them and, most importantly, ways to avoid caveats/mistakes usually done when using such plots. An unbelievable resource and a must in the armamentarium of anyone working with data.
Top 50 ggplot2 Visualizations - The Master List (With Full R Code). Vignette. A great resource of 50 very well done and super useful visualizations to get inspiration and code from when creating your own. Definitely one of the websites I check out regularly to remind myself of code.
ggplot2. Package. Tutorial. By far the most popular package for data visualization in R. Look for visualization packages starting with "gg" - these contain custom functions to produce plots using ggplot2. Some of the attributes in ggplot2 may be tricky to use: go to the official ggplot2 wiki here for some really good explanations of these.
GGally. Package. Tutorial. Mind-blowingly good extension to ggplot2 to summarize data (try replacing graphics::pairs with GGally::ggpairs!), model coefficients, survival data, maps, networks and create beautiful grids. Astonished.
ggrepel. Package. Vignette. This is an amazing package for adding text and labels to your plot. It replaces geom_text() with geom_text_repel() and geom_label() with geom_label_repel() to add a huge amount of super necessary functionality, such as lines connecting the label to the point, making sure labels close to each other do not overlap and controlling all sorts of different aspects of fitting text/labels with ease. A fantastic package that simply works and should really replace the functions currently offered by ggplot2.
ggsignif. Package. Tutorial. An extension to ggplot2 that can be used to add details about hypothesis testing (e.g. asterisks) to your plot.
ggpubr. A ggplot2 wrapper that helps create good looking figures ready for publication, including text on the plots and star indications for significance.
ggridges. Package. Vignette. A package for easily building good-looking ridgeplots (plots in which the distribution of different parameters is plotted within the same plot). A useful package - some of these can be done well using other packages as well, such as dotwhisker and bayesplot.
ggeffects. Package. Tutorial. A package to easily produce pretty ggplots of predicted value distributions across model specifications.
bbplot. Package. A ggplot2 wrapper that helps create good looking figures in the style of BBC - apparently now BBC uses this package to produce figures.
corrplot. Package. Tutorial. A package to easily produce beautiful and intuitive correlation plots.
dotwhisker. Package. Tutorial. A fantastic package to easily produce ggplots of coefficients from models fit and compare.
sunburstR. Package. Tutorial. A package that makes it easy to produce beautiful and interactive sunburst plots in R.
Best R packages for interactive plots. Overview. An overview of packages for pretty and interactive plots in R.
Pretty survival plots. Tutorial. A guide on how to use ggplot2 to produce pretty survival curves.
Complex barplots 1. Tutorial. A good example of not-so-straightforward bar plots with ggplot2. Download the script from the top of the page.
Complex barplots 2. Tutorial. A guide on StackOverflow on how to build rather inverted bar plots using Base R. Quite a useful guide that illustrates multiple useful functions necessary to build such plots.
Wordclouds. Tutorial. A tutorial on how to create wordclouds in R.
alluvial. Package. Tutorial. A package for building pretty alluvial diagrams (flow diagrams) using ggplot2.
circlize. Package. Book tutorial. A package to build beautiful circular plots, including circular flow (alluvial) diagrams. Great guide.
Forest plots from scratch. Tutorial. Code I copied from Charles DiMaggio's website to create a pretty forest plot from scratch.
survminer. Package. Tutorial. The best package I have found for visualization of survival curves with data tables and for visualization of Cox model evaluations. Better than GGally in this respect.
Color Palettes. Tutorial. Functions. This is the help page on the color palettes offered by R, with no need to use external packages. These are pretty good and allow use of good combinations of multiple colors easily and succinctly.
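As a rough sketch of the built-in palettes, the functions below ship with base R (grDevices). `hcl.colors()` assumes R 3.6.0 or later, and the choice of five colors and of the "Viridis" palette is arbitrary.

```r
# Five colors from a classic built-in palette
warm <- heat.colors(5)

# Five colors from one of the HCL-based palettes; see hcl.pals() for all names
cool <- hcl.colors(5, palette = "Viridis")

# Both are plain character vectors of hex codes
warm
cool
```

Either vector can be passed directly to the `col` argument of base plotting functions, for example `barplot(rep(1, 5), col = cool)`.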
viridis. Package. Tutorial. An introduction to using viridis as a package or within ggplot2 to create plots with beautifully matching colors.
Visualization of missingness. Package. Tutorial. A good example of how to easily visualize missingness across a dataset.
Survival function in Cox regression (Calculating survival probability per person at time (t) from Cox PH). Tutorial. Fitting a Cox regression in R is straightforward, but it is not immediately apparent how to obtain the survival function. This is a great answer on Stack Overflow with guidance on how to do so.
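One way to get per-person survival probabilities from a Cox model is via `survfit()` in the survival package. This is a minimal sketch and not necessarily the approach taken in the linked answer; it uses the `lung` dataset bundled with survival, and the covariate values and the 365-day time point are made up for illustration.

```r
library(survival)

# Fit a Cox proportional hazards model
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Two hypothetical individuals to predict for
newdata <- data.frame(age = c(50, 70), sex = c(1, 2))

# Predicted survival curves, one per row of newdata
sf <- survfit(fit, newdata = newdata)

# Survival probability for each individual at t = 365 days
s365 <- summary(sf, times = 365)$surv
s365
```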
rbokeh. Package. Tutorial. Bokeh is a visualization library that provides a flexible and powerful declarative framework for creating web-based plots. Bokeh renders plots using HTML canvas and provides many mechanisms for interactivity. Bokeh has interfaces in Python, Scala, Julia, and now R.
r2d3. Package. Tutorial. A library that provides an R interface to the great D3.js library for beautiful and interactive visualizations. This was created by the RStudio crew.
gt. Package. Tutorial. An attempt to create a grammar for tables. Still at an early stage and requires some work, but a terribly necessary package to eventually appear.
DT. Package. Vignette. An R interface for the powerful JavaScript library DataTables. Looks like a powerful alternative to kableExtra, but the latter is still my go to as of the time of writing.
Many packages for tabulating. Great packages for tabulation as reported by gt: knitr, kableExtra, formattable, DT, pander, huxtable, reactable, flextable, pixiedust, tangram, ztable, condformat, stargazer, xtable.
Iso. Package. Tutorial. A great package for fitting isotonic regression, which helps with calibration of your models after fitting them - a very important and very underused technique.
Imputation and IV regression for missing data. Tutorial. Imputation and IV regression for missing data in R - it illustrates how to use packages mice, AER and tidyverse for this purpose.
lme4. The standard and fastest package for fitting mixed-effects models. Here is a great tutorial on how to go about fitting a mixed-effects model from Tufts University. Here is another great tutorial by Berkeley, which combines this with discussing Bayesian fitting for hierarchical models using rstanarm. When not fitting a generalized linear model, I use the lmerTest package, which makes testing the random-effects, comparing models and getting p-values a lot easier.
Contrasts in linear models. Tutorial. A tutorial on how to use the "contrast" parameter when fitting linear models in R. This refers to the contrasts applied between levels of a factor.
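To make the idea concrete, here is a minimal base R sketch of how the choice of contrasts changes what the coefficients mean. It is not taken from the linked tutorial, the data are made up, and it assumes the default `options("contrasts")` are in place.

```r
f <- factor(c("a", "b", "c", "a", "b", "c"))
y <- c(1, 2, 4, 1.2, 2.1, 3.9)

# Default for unordered factors: treatment contrasts (each level vs. the first)
contrasts(f)
fit_treat <- lm(y ~ f)

# Sum-to-zero contrasts: coefficients are deviations from the mean of group means
fit_sum <- lm(y ~ f, contrasts = list(f = "contr.sum"))

coef(fit_treat)  # intercept is the mean of level "a"
coef(fit_sum)    # intercept is the average of the three group means
```

The fitted values are identical in both models; only the interpretation of the coefficients differs.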
iBreakDown. Package. Overview. A package to help explain prediction models - I have not tried it yet, but it appears promising.
MASS. Package. Documentation. An indispensable package containing some very useful functions complementary to Base R (e.g. the ability to fit negative binomial regression). I also particularly like the fitdistr function, which uses maximum likelihood to identify the parameters of the assumed underlying distribution.
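A short sketch of `fitdistr()` in action, using simulated data so the true parameters are known. The gamma distribution, the sample size and the seed are arbitrary choices for illustration.

```r
library(MASS)

set.seed(1)
x <- rgamma(500, shape = 2, rate = 0.5)

# Maximum likelihood fit of a gamma distribution to the data
fit <- fitdistr(x, densfun = "gamma")

fit$estimate  # estimated shape and rate
fit$sd        # their standard errors
```

The estimates should land close to the values used to simulate the data, with the standard errors indicating how close to expect.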
mfp. Package. Tutorial. This package makes it easy to fit fractional polynomials. I have no knowledge on the subject as of the time of writing and have not used the package. Instead of polynomial models, which proceed in powers of x, fractional polynomials proceed in fractional powers of x (e.g. 2, 1/2, etc.). This is a great tutorial on advanced implementations of splines, GAM and fractional polynomials.
Experimental Design CRAN Task View. Collection. A CRAN collection of many packages with specialised functions for designing, running and analysing data from experiments, including clinical trials.
H2O. Package. Tutorial. A fantastic package for speeding up the process of fitting machine learning models. This is a tutorial illustrating how to use the H2O package. Here is a tutorial on how to use GBM with H2O.
healthcareai. Package. A package designed to make it easy to automatically fit many machine learning methods, automatically tune them and get a model. I have not explored this package, but it is open source and the machine learning methods being used appear good. I do not like that it handles data automatically without much insight by the researcher (e.g. regarding missingness), but I need to try it out before making any other comments.
mlr. Package. Tutorial. A great tutorial on how to use the mlr framework to fit machine learning models in R - it allows for fitting multiple methods and for tuning parameters easily. Slower than H2O, but allows for a lot more methods.
caret. Package. Tutorial/Book. Caret is one of the most famous frameworks for machine learning in R. It bears a very strong resemblance to mlr and it makes applying and analyzing the results of many machine learning algorithms easy and standardized. This book is a fantastic introduction to all that caret can do. This is a great tutorial on Kaggle on how to compare multiple classification models using caret.
SuperLearner. Package. Tutorial. A very well-done package, the primary advantage of which is its ability to automatically ensemble all fitted models into a new model. It offers all important learning algorithms, makes it easy to add your own algorithms, tune the algorithms or select features. Highly recommended!
Fundamentals of Bayesian Data Analysis with R. Tutorial. A tutorial of the fundamentals by DataCamp.
rstan. Package. Tutorial. A fantastic wrapper around the programming language Stan, which was specifically created to ease Hamiltonian Monte Carlo sampling methods for fitting Bayesian models. Mandatory for anyone interested in Bayesian data analysis. A great tutorial on how to fit mixture models using rstan.
rstan examples. Examples. A super useful resource of example models fitted using RStan. This contains some of the most commonly used models when using RStan and has definitely saved me a lot of time in learning RStan and using it. Note that some of the functions being used in these examples are dated, but I am sure RStan will let you know of the correct function to use.
rstanarm. Package. Vignette. This package truly makes Bayesian statistics accessible to the wider community of data scientists. It redefines the basic R functions for fitting models to fit Bayesian models using the Stan backend. Very highly recommended for your definitive answer to working with Bayesian statistics. This is a fantastic tutorial on how to use rstanarm to fit hierarchical models (it also goes through the lme4 package)! This is a great tutorial on graphical posterior predictive checks! A tutorial on how to fit a beta regression using rstanarm.
tidybayes. Package. Tutorial. A great package that can take objects created using rstanarm or stan and provide the functions required to manipulate and analyze them using the principles of tidyverse.
projpred. Package. Vignette. This package performs variable selection for fitting models using the rstanarm package. This is super useful in any kind of predictive data analysis exercise. It uses Stan as a backend.
loo. Package. Examples. A fantastic package built by the RStan team to compare between Bayesian models. Works seamlessly with RStan and necessary for assessment of Bayesian models.
bayesplot. Package. Examples. A fantastic package built by the RStan team to analyze the MCMC runs of your sampling and your posterior distributions. It creates super pretty plots using ggplot2 and provides the definitive solution to a tricky aspect of using Bayesian statistics. Another amazing tutorial on how to plot MCMC plots with bayesplot.
Towards a principled Bayesian workflow. Examples. As per the official examples provided by RStan developers, this provides examples of fitting mixture models with RStan.
RJAGS. A super useful package for Bayesian modelling within R. This is a link to a tutorial on how to use it by the great as always DataCamp.
diagis. Package. Tutorial. "diagis is an R package containing functions relating weighted samples obtained for example from importance sampling. The main motivation for developing diagis was to enable easy computation of summary statistics and diagnostics of the weighted MCMC runs provided by bssm package (Helske and Vihola 2016; Vihola, Helske, and Franks 2017) for Bayesian state space modelling."
Bayesian Cognitive Modeling - A practical course. "This site is dedicated to the book “Bayesian Cognitive Modeling: A Practical Course”, published by Cambridge University Press. This book forms the basis for a week-long course that we teach in Amsterdam, during the summer. We’d love to see you there. Follow this link for next year’s course."
estimatr. Benchmark. Package. A great package for a basic econometric approach to causal inference. As the benchmark shows, it works faster than using a combination of base R and the sandwich or other estimators. The same group has other very interesting packages as well (e.g. fabricatr and randomizr).
causalToolbox. Package. Tutorial. A great package to easily fit an honest random forest, fit a BART, estimate the average treatment effect and fit confidence intervals.
dmlmt. Package. Tutorial. A package that makes using double machine learning easy. It implements work by Chernozhukov.
grf. Package. Tutorial. A package to fit generalized random forests by Susan Athey and Stefan Wager.
coin. Package. The most popular package for all sorts of permutation tests.
MatchIt. Package. Tutorial. An implementation of matching algorithms by some of the most prominent names in the field - indispensable for things such as propensity-score matching.
Implementation of G-Computation. Tutorial. A great paper illustrating how to implement G-computation in R for causal inference.
metafor. My favorite package for meta-analyses. It also comes with amazing tutorials illustrating how functions within metafor can be used to replicate seminal papers in the field.
forestplot. Package. Tutorial. Create forest plots that look better (marginally better) than the ones produced automatically by rmeta or metafor. Still not as good as this could be.
Bayesian meta-analysis with R. Tutorial. A fantastic tutorial on how to run a fixed-effect and random-effects Bayesian meta-analysis from scratch.
Doing Meta-Analysis in R - A hands-on guide. A great bookdown book on the basics of creating a meta-analysis using R. It also contains great code for meta-regression, network meta-analyses and Bayesian meta-analyses.
R Markdown: The Definitive Guide. Book. Written by Yihui Xie, the creator of knitr, this is the definitive guide to using R Markdown. Very well written and contains the answer to 95% of the questions I usually have in using R Markdown. A treasure trove for finding features of R Markdown you never thought existed.
A few tips for R Markdown. Vignette. This is for slightly more advanced users of R Markdown and it goes through some tricks that you may want to learn (e.g. how to reference within the document, how to use interactive tables, etc.). I primarily like this because it provides a lead towards starting to create your own customized HTML templates!
R Markdown Templates. Collection. A collection of some of the most important packages of templates for R Markdown (both pdf and HTML). This is probably the most beautiful R Markdown document that I have seen, but I do not know which template they are using.
R Markdown Theme Gallery. Collection. A collection of some of the most important packages and templates for R Markdown HTML. This is where I found most of the packages that I use for my HTML themes.
R Script to HTML. Tutorial. A fantastic tutorial on rmarkdown and how to use it to introduce custom CSS stylesheets and most importantly how to render R Script files into HTML! The latter is fantastically useful and would have saved me tons of time if I had known about it earlier. Do this by using the spin() function of the knitr package. This can also be done by using "File -> Compile Report" from the R Script, which will call rmarkdown::render(), which calls spin().
prettydoc. Package. Vignette. Super good looking HTML templates for R Markdown output. I particularly like the hpstr (my current default) and architect themes.
rmdformats. Package. Vignette. Super good looking HTML templates for R Markdown output. I love and extensively use the readthedown format from this package.
epuRate. Package. Very good looking HTML templates for R Markdown by Yan Holtz. I really like being able to hide code and to have the table of contents on the side. This template is particularly useful in learning how to create your own templates, as detailed by Yan here. To install this package, go to the linked GitHub, download the repo on your computer (go to "Clone or download" and then click "Download zip"), unzip it and then go to RStudio and finally do library(devtools); install("path/to/epuRate"); library(epuRate) - do not use install_github as recommended on the repo because it gave me an error (in April, 2019). You may want to restart RStudio if it does not show up right away in your templates.
tufte. A package to create very pretty HTML and LaTeX outputs of identical formatting. It uses the Tufte style, which allows producing side-comments and side-plots super easily. Highly recommended!
bookdown. Package. Vignette. A package to write awesome-looking books using R Markdown. The definitive guide referenced here, as well as the celebrated Advanced R book by Hadley were written in bookdown and they all look super impressive! This package also makes it easy to have openly available books online.
rticles. Package. Vignette. Use templates to create articles written in a format appropriate for multiple journals of interest (including journals by Springer, Sage and Taylor & Francis). I have not used this package yet, but it is recommended by RStudio.
vitae. Tutorial. Package. A package designed to help construct and update CVs quickly, without the fuss usually required to update one. I have not tried this one out, but it looks promising!
kableExtra. Package. Tutorial. An indispensable package for creating beautiful tables, ready for publication. It does not come without its glitches (I have yet to make certain of its functions work, such as not repeating a word across rows in LaTeX), but I use it in almost every report I produce. It can be used to produce HTML files or LaTeX files and it always amazes me with the flexibility it allows. Two tips are: (1) use options() at the very start of your RMarkdown file to define table characteristics so that you do not have to do that every time you call kable() and (2) remember to set the chunks in which you are using kable() to results='asis'.
formattable. Package. Tutorial. A package that helps create beautiful tables that incorporate color and pictures to make them much more appealing and informative. Another great tutorial for formattable can be found here.
rvest. Package. Tutorial. My favorite package for scraping content off the web. It can be used to work with both HTML and XML data and it is part of the tidyverse family. It is a far more user-friendly way to work with the web than packages like XML or xml2. An instructive comparison between rvest and beautifulsoup (the equivalent Python library, which inspired rvest) can be found here.
Winners of the 1st Shiny Contest. Examples. These are examples of beautiful pages created with Shiny from the 2019 Shiny contest - code is available to look at and learn from!
modelDown. Package. The modelDown package turns classification or regression models into HTML static websites. With one command you can convert one or more models into a website with visual and tabular model summaries. Summaries like model performance, feature importance, single feature response profiles and basic model audits.
JuliaCall. Package. A package to use Julia from within R Markdown chunks. A very exciting development as Julia is becoming increasingly popular/useful, but I have not used this package yet.
reticulate. Package. Vignette. A package to interface Python within R Markdown. An exciting development, but I have not used this package enough to have specific opinions - so far, so good.
dlstats. Package. It reports monthly downloads of any package on CRAN or Bioconductor across time.
rentrez. Package. Tutorial. Another excellent package by ROpenSci, which lets you download information from PubMed. This has overtaken other PubMed packages (e.g. reutils and RISmed) as my favorite package to work with PubMed. Its syntax is not straightforward so go through the tutorial.
crminer. Package. Tutorial. A super useful package! Supply it with a DOI and it will give you the link to the article, the PDF of the article (if accessible) and it will even convert that PDF to a text file for you, amongst others! Having played with this for a bit, as of Dec 2018, I could not get hold of many of the PDFs I actually have access to through the university. Nevertheless, a very exciting prospect and a much-needed package.
rcrossref. Package. Tutorial. Another great package by the rOpenSci community! This one is super useful in using the large database of CrossRef to identify the DOI of an article for which you only have the PMID and most importantly, identify citation counts from those mapped by CrossRef!
tidypvals. Package. Tutorial. An awesome package developed by Jeff Leek of Johns Hopkins University. It compiles the p-values gathered by some of the most important papers systematically looking at p-values (including the Chavalarias paper) into one single R package!
fulltext. Package. Book. A package by the ROpenSci community that brings together several packages to provide meta-data, full text and links to full text from repositories such as Entrez (PubMed), arXiv, bioRxiv, PLOS, Scopus and Microsoft Academic. A great and very handy tool explained extremely well in the linked book.
googleCloudStorageR. Package. Tutorial. A report on how to interact with the Google Storage API of the Google Cloud Platform through R.
RStudio Cloud. Cloud service. Run RStudio on the cloud - no need to download R or RStudio on your computer - operational within minutes!
Code Ocean. Cloud service. You can launch a Cloud Workstation with JupyterLab or RStudio to develop or test code on a powerful cloud computing machine using familiar tools and workflows. Specifically designed to enhance reproducibility.
How to manage memory in R. Tutorial. A good guide on managing memory in R.
Schrute. Package. Tutorial. "This is a package that does/has only one thing: the complete transcriptions of all episodes of The Office! (US version). Use this data set to master NLP or text analysis. This tutorial scratches the surface of the subject with a few examples from the excellent Text Mining with R book, by Julia Silge and David Robinson."
BlueSky. Even though I have not used this GUI, it being so much easier to install than R Commander, being much easier on the eye and creating what look like more beautiful plots makes it quite promising for those interested in GUIs!
DataPackageR. A great way to turn datasets into easily reusable and downloadable packages.
Creating custom themes for RStudio. Guide. A guide by RStudio of the important variables you need to know to manipulate how RStudio looks. It also provides a link to a great editor to help you develop your own themes, even though they do not tend to translate well to RStudio.