forked from rdpeng/RepData_PeerAssessment1
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
```{r}
data <- read.csv("activity.csv")
```
## What is mean total number of steps taken per day?
```{r}
steps_per_day <- tapply(data$steps, data$date, sum)
hist(steps_per_day)
mean(steps_per_day, na.rm = TRUE)
median(steps_per_day, na.rm = TRUE)
```
## What is the average daily activity pattern?
```{r}
steps_per_int <- tapply(data$steps, data$interval, mean, na.rm = TRUE)
plot(steps_per_int, type = "l", col = "blue", main = "daily activity pattern",
     xlab = "5-min interval", ylab = "average number of steps")
```
Which 5-minute interval, on average across all days in the dataset, contains the maximum number of steps? In the output, the interval identifier (the element name) is printed above the interval's position in `steps_per_int`:
```{r}
which(steps_per_int == max(steps_per_int))
```
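If the interval identifiers encode clock time as HHMM (an assumption about `activity.csv` that is not stated in this document), an identifier can be turned into a readable time of day. The helper `interval_to_clock` below is a hypothetical name introduced only for this sketch:

```{r}
# Assumption: identifiers are HHMM-encoded, e.g. 835 means 08:35.
# interval_to_clock is an illustrative helper, not part of the analysis above.
interval_to_clock <- function(interval) {
  sprintf("%02d:%02d", interval %/% 100, interval %% 100)
}
interval_to_clock(c(0, 55, 100, 835, 2355))
```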
## Imputing missing values
Total number of missing values in the dataset:
```{r}
sum(is.na(data$steps))
```
Impute missing values by filling in the mean of the same 5-minute interval across all days:
```{r}
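data$steps_imputed <- data$steps
# match each missing row to its interval's mean by name instead of relying on recycling
na_rows <- is.na(data$steps)
data$steps_imputed[na_rows] <- as.numeric(steps_per_int[as.character(data$interval[na_rows])])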
```
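A minimal sketch of interval-mean imputation on toy data may help show why the lookup should go by interval name. The data frame `toy` and its values are made up for illustration and are not taken from `activity.csv`; here the missing values do not cover whole days, so a plain recycled assignment would put the wrong means in the wrong rows.

```{r}
# Toy data (hypothetical values): 2 days x 3 intervals, NAs scattered
toy <- data.frame(
  date     = rep(c("2012-10-01", "2012-10-02"), each = 3),
  interval = rep(c(0, 5, 10), times = 2),
  steps    = c(NA, 4, NA, 2, 6, 10)
)
# mean per interval, ignoring missing values; names are the interval identifiers
toy_means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)
na_rows <- is.na(toy$steps)
toy$steps_imputed <- toy$steps
# look up the mean for each missing row by its interval identifier
toy$steps_imputed[na_rows] <- as.numeric(toy_means[as.character(toy$interval[na_rows])])
toy$steps_imputed
```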
Histogram, mean and median of the total number of steps per day after imputation:
```{r}
steps_per_day_imp <- tapply(data$steps_imputed, data$date, sum)
hist(steps_per_day_imp)
mean(steps_per_day_imp)
median(steps_per_day_imp)
```
## Are there differences in activity patterns between weekdays and weekends?
```{r}
# wday does not depend on the locale (unlike weekdays());
# values 0..6, starting on Sunday, cf. ?POSIXlt
dayOfWeek <- as.POSIXlt(data$date)$wday
data_augm <- data.frame(date = data$date,
                        interval = data$interval,
                        steps = data$steps_imputed,
                        weekend = (dayOfWeek %in% c(6, 0)))
steps_per_int_wd <- tapply(data_augm$steps,
                           list(data_augm$weekend, data_augm$interval),
                           mean, na.rm = TRUE)
par(mfrow = c(2, 1))
plot(steps_per_int_wd["TRUE", ], type = "l", col = "blue",
     main = "daily activity pattern: weekend",
     xlab = "5-min interval", ylab = "average number of steps")
plot(steps_per_int_wd["FALSE", ], type = "l", col = "blue",
     main = "daily activity pattern: workday",
     xlab = "5-min interval", ylab = "average number of steps")
```
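The weekend flag can be checked on a few known dates. This is only a sketch: the vector `dates` and the name `is_weekend` are introduced here for illustration. 2012-10-05 through 2012-10-08 run from a Friday to a Monday, so only the two middle dates should be flagged.

```{r}
# Friday, Saturday, Sunday, Monday (illustrative dates)
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
# wday is 0 for Sunday and 6 for Saturday, regardless of locale
is_weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)
is_weekend
```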