Independent T-test.Rmd

---
title: "t-test"
author: "William W. Lage"
date: "8/12/2020"
output: word_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r import data}
data <- read.csv("wave4.csv")
```

```{r library}
library(jmv) #contains the descriptive statistics, t-test, and histograms
library(groupdata2)  #for upsampling, or downsampling if you prefer
library(car) #for independent Levene's test, though this test can also be an argument in Ttest
library(effsize) #for Cliff's Delta
```

#prep work and cleaning
```{r recoding}
#in the codebook we learned that 996 was refused to answer and 998 was unknown. Since we don't know the value of these, we'll recode them to NA's so they're not included in our statistics
data$WEIGHT[data$WEIGHT=="996"]<-"NA"
data$WEIGHT[data$WEIGHT=="998"]<-"NA"
#entering NA converted the column into a string, so we're going to convert it back to an integer, and R will make the string "NA" into applicable NA's
data$WEIGHT <- as.integer(data$WEIGHT)

#let's also make SEX into male and female so that they have understandable meaning to us
data$SEX[data$SEX=="1"]<-"Male"
data$SEX[data$SEX=="2"]<-"Female"

#let's convert these strings into factor levels
data$SEX <- as.factor(data$SEX)

#the str function allows you to see the structure of your dataframe and varify the variables are configured correctly
str(data)
```
```{r cleaning}
#remove the NAs
data.noNA <- na.omit(data)
#alternatively, you could impute the values using
##data$WEIGHT<- round(with(data, impute(WEIGHT)), 0)

#next we need to remove univariate outliers
##this first function identifies outliers. As we can see individuals having weight >=340lbs or <=22 lbs were not normal for this sample
data.noNA[abs(scale(data.noNA$WEIGHT)) > 3, ]
##and this one gets rid of them
data.nouni <- data.noNA[!abs(scale(data.noNA$WEIGHT)) > 3, ]
```

```{r descriptives}
desc <- descriptives(data.nouni, vars = c('WEIGHT'), splitBy = c('SEX'), hist = TRUE, sd = TRUE, se = TRUE, skew = TRUE, kurt = TRUE)
desc
```
```{r making parametric}
#for the parametric t-test we need to work with data that has an equal representation of each of groups (male/female)
#one method to deal with this (if you have a large sample) is to randomly drop participants from your larger group to get it down to the number in the smaller group
down <- downsample(data = data.nouni, cat_col = 'SEX')
#alternatively, you can randomly sample from the smaller group with replacement to get up to the larger group
up <- upsample(data = data.nouni, cat_col = 'SEX')
#there are also mixed methods in the ROSE and DMWR libraries
```

#cleaned descriptives and t-test
```{r new descriptives}
desc2 <- descriptives(down, vars = c('WEIGHT'), splitBy = c('SEX'), hist = TRUE, sd = TRUE, se = TRUE, skew = TRUE, kurt = TRUE)
desc2
```
```{r equality of variance}
#it's important that the variance be similar for comparison
leveneTest(down$WEIGHT, down$SEX, center = mean)
##since p<.05 (Pr>F) we will have to use a Welch's t-test to account for the unequal variance 
```

```{r paramteric independent t}
ttestIS(data = down, vars = 'WEIGHT', group = 'SEX', effectSize = TRUE, ci = TRUE, desc = TRUE, welchs = TRUE) 
#you can make the levene test built in by passing the argument eqv = TRUE into the equation
```

```{r non-parametric}
#if we don't want to resample the data, and instead want to keep it's original proportion we can unclude a Mann-Whitney U test for non-parametric data
ttestIS(data = data.nouni, vars = 'WEIGHT', group = 'SEX', effectSize = TRUE, ci = TRUE, desc = TRUE, mann = TRUE)
```
```{r non-para effect size}
#Since we're looking a non-parametrics anyways, it is importnat to note that the t-test also has a non-paraemtric effect size calculator that accounts for the Likert-style data used in many surveys
cliff.delta(WEIGHT ~ SEX, data = data.nouni, conf.level = .95, magnitude = TRUE, method = "Cliff's Delta")
```

#visualization
```{r}
library(ggplot2)
```

```{r}
# creation of the bar graph - including specifications such as the color, title, addition of error bars, etc. 
bar1 <- ggplot(down, aes(SEX, y = WEIGHT)) +
  stat_summary(fun.y = mean,
               geom = "col",
               fill = 'dodgerblue3')+
  stat_summary(fun.data = mean_se,
               fun.args = list(mult = 1),
               geom = "errorbar")+
  theme_minimal() +
  ggtitle('Weight by Sex')

bar1
```