
Data Science Toolbox

R Programming

Getting and Cleaning Data

Exploratory Data Analysis (EDA)

Reproducible Research

  • Structure of a data analysis
  • Checklist for reproducible data analysis

Statistical Inference

  • What is statistical inference?
  • An intro to probability
  • Conditional probability
  • Expected values
  • Variation
  • Three common distributions
  • Asymptotics
  • t confidence intervals
  • Hypothesis testing
  • p-values
  • Power
  • Multiple testing
  • Bootstrap and resampling

Regression Models

  • Influence measures (dffits, dfbetas, rstudent, rstandard, Cook's distance)
  • What is regression?
  • Statistical learning vs machine learning
  • Interpreting regression coefficients
  • Residuals
  • Regression inference
  • Multivariate regression
  • Model selection
  • GLMs
  • What are link functions?
  • Logistic regression
  • Poisson regression

Practical Machine Learning

  • Prediction study design
  • In-sample and out-of-sample errors
  • Overfitting
  • Receiver Operating Characteristic (ROC) curves
  • The caret package in R
  • Preprocessing and feature creation
  • Prediction with regression
  • Prediction with decision trees
  • Prediction with random forests
  • Boosting
  • Prediction blending

Developing Data Products

  • Plotly animation
  • Leaflet map tutorial
  • How R packages are made
  • How to Dockerize a Shiny app
  • R presentations
  • How to uninstall R