chapter1.Rmd


---
title       : Exploring Polling Data in R
description : This chapter will show you how to explore, refine, and visualize a sample of polling data from the 2016 primaries. You'll start with simple visualizations using base R commands before proceeding to more complicated plots in the ggplot2 package. The final exercises teach you how to generate maps in R using the maps package and the googleVis package.

attachments :


--- type:MultipleChoiceExercise lang:r xp:50 skills:1 key:9245d968f0
## Reading the polls

The first step in working with polling data is understanding what types of conclusions we can derive from the polls. Election polls are surveys designed to estimate voter preferences and generally rely on a large, representative sample of a given population. However, polling data can often be biased, unrepresentative of the population, or missing important variables related to election outcomes (for example, whether someone will actually vote).

Even the best polls are only an approximation of voter preferences. A poll that was taken three months, three weeks, or even three hours before the election still can't tell you exactly how someone will behave in the voting booth.

The plot on the right shows the percentage of voters supporting each candidate across all states in the 2016 Republican primaries. Which of the following is a reasonable conclusion to draw from these data?

*** =instructions
- Ted Cruz is unlikely to win a single state.
- Donald Trump is generally the most popular candidate, but may lag behind other candidates in certain states.
- Donald Trump is guaranteed to win the Republican nomination.
- Donald Trump will likely win the general election against the Democratic candidate in November.

*** =hint
Take a look at the plot. Which candidate appears to be leading? What can these data tell us about voting outcomes?


*** =pre_exercise_code
```{r}

polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)
library(ggplot2)
library(scales)

ggplot(data=subset(polls, type=="rep")) + 
  geom_jitter(aes(y=percent, x=polldate, col=candidate), alpha=0.3, na.rm=TRUE) + 
  geom_smooth(aes(y=percent, x=polldate, col=candidate), span=0.5, se=FALSE, alpha=0.7, na.rm=TRUE) + 
  scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b %Y")) + 
  scale_y_continuous(breaks = c(0, 20, 40, 60)) +
  labs(x="Poll Date", y="") +
  scale_colour_discrete(breaks=c("trump", "cruz", "kasich", "rubio"), 
                        labels=c("Donald Trump","Ted Cruz", "John Kasich","Marco Rubio")) +
  theme(legend.position="top", legend.title=element_blank())

```

*** =sct
```{r}

msg_bad <- "Not quite right! That may be true, but we can't tell from these polls."
msg_success <- "Exactly! Donald Trump has a lead on average, but that doesn't mean he'll win every state."
test_mc(correct = 2, feedback_msgs = c(msg_bad, msg_success, msg_bad, msg_bad))
```

--- type:NormalExercise lang:r xp:100 skills:1, 3 key:c03ca11be3
## Looking under the hood

In the previous exercise, you viewed polling data from the 2016 Republican primaries. In this exercise, you'll take a look at a larger dataset containing a sample of polls from both the Republican and Democratic primaries. 

A dataset with these data, `polls`, is available in the workspace.

*** =instructions
- Take a look at the structure of `polls` using `str()`.
- Select polls for the Democratic primaries only (`type == "dem"`). Assign these polls to `dem_polls`.
- Use R's base plot function, `plot()`, to plot the date of the poll (`dem_polls$polldate`) on the x-axis, candidate support (`dem_polls$percent`) on the y-axis, and give each candidate (`dem_polls#candidate`) their own color.


*** =hint
- Use `str()` for the first instruction.
- For the second instruction, you should use `subset(polls, polls$type= "..."`.
- For the plot, use `plot(x = ..., y = ..., col = ...)`.

*** =pre_exercise_code
```{r}

polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)


```

*** =sample_code
```{r}
# polls is available in your workspace

# Check out the structure of polls


# Select polls for the Democratic primaries only: dem_polls
dem_polls <- subset(polls, polls$type == "___")

# Using dem_polls, plot polldate on the x-axis, percent (i.e. percent of voters supporting a candidate) on the y-axis, and set the color using candidate
plot(___, ___, col= ___)

```

*** =solution
```{r}
# polls is available in your workspace

# Check out the structure of polls
str(polls)

# Select polls for the Democratic primaries only: dem_polls
dem_polls <- subset(polls, polls$type == "dem")

# Using dem_polls, plot polldate on the x-axis, percent (i.e. percent of voters supporting a candidate) on the y-axis, and set the color using candidate
plot(dem_polls$polldate, dem_polls$percent, col = dem_polls$candidate)

```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("str", args = "object",
              not_called_msg = "You didn't call `str()`!",
              incorrect_msg = "You didn't call `str(object = ...)` with the correct argument, `object`.")

test_object("dem_polls")

test_function("plot", args = "x")
test_function("plot", args = "y")
test_function("plot", args = "col")

test_error()

success_msg("Great work! Your polling data are accompanied by information about the poll operator (`pollster`) and sample size. That plot looks a little messy. In the next exercise, you'll use the `ggplot2` package to produce a more intuitive plot.")
```


--- type:NormalExercise lang:r xp:100 skills:1 key:b29b654048
## Visualizing polling trends

Your previous plot of Democratic primary polls was difficult to interpret because it didn't have a legend and didn't obviously show trends over time. To draw conclusions from polling data, analysts (and the general public!) often rely on crisp and clear visualizations. In this exercise, you'll use the [ggplot2](http://www.rdocumentation.org/packages/ggplot2/versions/2.1.0) package to produce a more intuitive plot of Democratic primary polls.

In this exercise, you'll return to the complete dataset of polls (`polls`), which is preloaded in your workspace.

*** =instructions
- Recreate your previous plot using the [ggplot()](http://www.rdocumentation.org/packages/ggplot2/versions/2.1.0/topics/ggplot). Select only data from Democratic primaries (`pollstype = "dem"`) using `subset()`. Plot `percent` on the y-axis, `polldate` on the x-axis, and set color to `candidate`. Store your plot as `dem_plot`.
- View your new plot.
- Add a trend line to `dem_plot` using [geom_smooth()](http://www.rdocumentation.org/packages/ggplot2/versions/2.1.0/topics/geom_smooth). Set the `span` equal to 0.5 and remove confidence intervals (`se = FALSE`).


*** =hint
- Use `subset(polls, polls$type == "dem")` to select only Democratic primaries.
- Make sure you set y = `percent`, x = `polldate`, and col = `candidate`.

*** =pre_exercise_code
```{r}
polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)

```

*** =sample_code
```{r}
# Load the ggplot2 package


# Create a plot of Democratic candidate support (percent) over time (polldate), setting color to candidate
dem_plot <- ggplot(data = subset(polls, polls$type == "___"), aes(y = ___, x = ___, col = ___)) + 
  geom_point(alpha = 0.5) 

# View your new plot


# Add a trend line for each candidate using geom_smooth()
dem_plot + 
  geom_smooth(___)

```

*** =solution
```{r}
# Load the ggplot2 package
library(ggplot2)

# Create a plot of Democratic candidate support (percent) over time (polldate), setting color to candidate
dem_plot <- ggplot(data = subset(polls, polls$type == "dem"), aes(y = percent, x = polldate, col = candidate)) + 
  geom_point(alpha = 0.5) 

# View your new plot
dem_plot

# Add a trend line for each candidate using geom_smooth()
dem_plot + 
  geom_smooth(span = 0.5, se = FALSE) 
  
```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("ggplot", args = "data",
              not_called_msg = "Make sure to call `ggplot()`",
              args_not_specified_msg = "Have you specified the argument `data` in `ggplot`?",
              incorrect_msg = "Have you correctly specified the argument `data` in `ggplot`?")

test_function("ggplot", args = "mapping")

test_function("geom_smooth", args = "span")
test_function("geom_smooth", args = "se")


test_error()

success_msg("Excellent! Your new plot of Democratic primary polls is crisp and easy to understand.")
```


--- type:NormalExercise lang:r xp:100 skills:1, 3, 8 key:a85763af21
## Improving poll data quality

You just made a great plot showing support for Hillary Clinton and Bernie Sanders over the course of the 2016 Democratic primaries. Before you draw any conclusions from this plot, you'll want to address issues of data quality. Even the best polls can suffer from inaccuracy caused by low sample size or pollster bias. 

In this exercise, you'll search your polling data for unrepresentative polls and refine your dataset to contain only the most accurate polls.

`polls` is available in your workspace. 

*** =instructions
- One common source of inaccuracy in polls is low sample size. Take a look at sample sizes across your data using [summary()](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/summary).
- Generate a histogram showing the distribution of polls where `samplesize` is less than 1000. A vector of breaks for your histogram (`breaks`) has been prepared for you.
- Create a new data frame that contains only polls with a sample size greater than 400. Save this as `polls2`.

*** =hint
- Specify the sample size variable within your `polls` data frame using `polls$samplesize`.
- Include only a subset of your data using `polls[polls$samplesize > ..., ]`.


*** =pre_exercise_code
```{r}
polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)

breaks <- seq.int(0, 1300, 100)

```


*** =sample_code
```{r}
# Summarize samplesize (contained in the polls data frame)
summary(___)

# Create a histogram showing the distribution of polls with a sample size below 1000. Use the vector `breaks` for your histogram breaks.
hist(___[polls$samplesize < 1000], breaks)

# Create a new data frame that contains only polls with a sample size greater than 400: polls2
polls2 <-


```

*** =solution
```{r}
# Summarize samplesize (contained in the polls data frame)
summary(polls$samplesize)

# Create a histogram showing the distribution of polls with a sample size below 1000. Use the vector `breaks` for your histogram breaks.
hist(polls$samplesize[polls$samplesize < 1000], breaks)

# Create a new data frame that contains only polls with a sample size greater than 400: polls2
polls2 <- polls[polls$samplesize > 400, ]


```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("hist", args = "x")

test_object("polls2")

test_error()

success_msg("Well done! You've removed a key source of inaccuracy in your data. With these refined data, you are ready to produce more accurate plots.")
```


--- type:NormalExercise lang:r xp:100 skills:1 key:ede1dd57e1
## Visualizing the refined data

Now that you've refined your polling data, you're ready to produce more accurate plots In this exercise, you'll create new plots for the Democratic and Republican primaries using the `ggplot2` package.

To view both plots at once, you'll need to use the [grid.arrange()](http://www.rdocumentation.org/packages/gridExtra/versions/2.2.1/topics/arrangeGrob) command from the [gridExtra](http://www.rdocumentation.org/packages/gridExtra/versions/2.2.1) package. This command creates a grid which you can fill with plot objects.

`polls2` is available in your workspace and ``ggplot2`` and `gridExtra` have been preloaded. 

*** =instructions
- Recreate your plot of the Democratic polling data using `polls2`. Remember to use `subset` with `type == "dem"`, set the y-axis to `percent`, set the x-axis to `polldate`, and set color to `candidate`. Save this plot as `dem_plot2`.
- Create a similar plot using the Republican polling data contained in `polls2`. Save this plot as `rep_plot2`.
- View both plots together using `grid.arrange()`.

*** =hint
- Use `subset(polls2, polls2$type == "dem")` to select only Democratic primaries. Use a similar command for Republican primaries (`type == "rep"`).
- Make sure you set y = `percent`, x = `polldate`, and col = `candidate` in each plot.


*** =pre_exercise_code
```{r}
polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)

polls2 <- polls[polls$samplesize > 400, ]

library(ggplot2)
library(gridExtra)

```


*** =sample_code
```{r}
# Recreate your plot of the Demcoratic polling data using polls2: dem_plot2
dem_plot2 <- ggplot(data = subset(___), aes(y = ___, x = ___, col = ___)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.5, se = FALSE, na.rm = TRUE)

# Create a similar plot for the Republican primaries: rep_plot2
rep_plot2 <- 

# View both plots together (do not modify this command)
grid.arrange(dem_plot2, rep_plot2, nrow = 2)


```

*** =solution
```{r}
# Recreate your plot of the Demcoratic polling data using polls2: dem_plot2
dem_plot2 <- ggplot(data = subset(polls2, polls2$type == "dem"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.5, se = FALSE, na.rm = TRUE)

# Create a similar plot for the Republican primaries: rep_plot2
rep_plot2 <- ggplot(data = subset(polls2, polls2$type == "rep"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.5, se = FALSE, na.rm = TRUE)

# View both plots together (do not modify this command)
grid.arrange(dem_plot2, rep_plot2, nrow = 2)


```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("ggplot", args = "data", index = 1,
              not_called_msg = "You didn't call `ggplot()`!",
              incorrect_msg = "You didn't call `ggplot(data=...)` with the correct argument")

test_function("ggplot", args = "mapping", index = 1)

test_function("ggplot", args = "data", index = 2,
              not_called_msg = "You didn't call `ggplot()`!",
              incorrect_msg = "You didn't call `ggplot(data=...)` with the correct argument")

test_function("ggplot", args = "mapping", index = 2)

test_error()

success_msg("Excellent! Those plots provide a much better basis for drawing conclusions about the popularity of each candidate.")
```


--- type:NormalExercise lang:r xp:100 skills:1 key:dfbe1e6b60
## The fifty-state strategy

Your plots look great! Now it's time to take a closer look at the polling data. Did you notice these polls are done at the state level? Especially during the primaries, campaigns tend to focus on winning individual states, rather than maintaining popularity across the country. Instead of averaging across all states, it might make sense to view polling data state-by-state.

In this exercise, you'll compare polls in the Republican primaries across three important early states: Iowa, New Hampshire, and South Carolina. This time, you'll keep the confidence intervals to help see how the candidates compare.

`polls2`as well as the `ggplot2` and `gridExtra` packages are preloaded into your workspace.


*** =instructions
- Create a plot using `ggplot()` that includes only polling data for Republicans (`type == "rep"`) in the state of Iowa (`location == "IA"`) drawn from `polls2`. Save this plot to `ia_plot`.
- Create a second plot that includes only polling data for Republicans in New Hampshire (`location == "NH"`). Save this plot to `nh_plot`.
- Create a third plot that includes only polling data for Republicans in South Carolina (`location == "SC"`). Save this plot to `sc_plot`.
- Take a look at all three plots together using `grid.arrange()`.


*** =hint
- Use `subset(polls2, polls2$type == "rep" & location =="IA")` to select only Republican primary polls in Iowa.
- Make sure you set y = `percent`, x = `polldate`, and col = `candidate` in each plot.


*** =pre_exercise_code
```{r}
polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)

polls2 <- polls[polls$samplesize > 400, ]

library(ggplot2)
library(gridExtra)

```


*** =sample_code
```{r}
# Create a plot that includes only polling data for Republicans in Iowa: ia_plot
ia_plot <- ggplot(data = subset(___), aes(y = ___, x = ___, col = ___)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="Iowa")

# Create another plot that includes only polling data for Republicans in New Hampshire: nh_plot
nh_plot <- ggplot(data = subset(___), aes(y = ___, x = ___, col = ___)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="New Hampshire")

# Create another plot that includes only polling data for Republicans in South Carolina: sc_plot
sc_plot <- ggplot(data = subset(___), aes(y = ___, x = ___, col = ___)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="South Carolina")

# Take a look at all three plots together (do not modify this command)
grid.arrange(ia_plot, nh_plot, sc_plot, nrow=3)


```

*** =solution
```{r}
# Create a plot that includes only polling data for Republicans in Iowa: ia_plot
ia_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "IA"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="Iowa")

# Create another plot that includes only polling data for Republicans in New Hampshire: nh_plot
nh_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "NH"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="New Hampshire")

# Create another plot that includes only polling data for Republicans in South Carolina: sc_plot
sc_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "SC"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="South Carolina")

# Take a look at all three plots together (do not modify this command)
grid.arrange(ia_plot, nh_plot, sc_plot, nrow=3)


```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("ggplot", args = "data", index = 1,
              not_called_msg = "You didn't call `ggplot()`!",
              incorrect_msg = "You didn't call `ggplot(data=...)` with the correct argument")

test_function("ggplot", args = "mapping", index = 1)

test_function("ggplot", args = "data", index = 2,
              not_called_msg = "You didn't call `ggplot()`!",
              incorrect_msg = "You didn't call `ggplot(data=...)` with the correct argument")

test_function("ggplot", args = "mapping", index = 2)

test_function("ggplot", args = "data", index = 3,
              not_called_msg = "You didn't call `ggplot()`!",
              incorrect_msg = "You didn't call `ggplot(data=...)` with the correct argument")

test_function("ggplot", args = "mapping", index = 3)


test_error()

success_msg("Great job! It looks like Iowa was a tight race, but Donald Trump had large leads in both New Hampshire and South Carolina.")
```


--- type:MultipleChoiceExercise lang:r xp:50 skills:1 key:0d50c40c7f
## Stronger predictions

Now that you've refined and visualized your polling data, you should have an easier time drawing conclusions from polls across different units (in this case, different parties and states). But be wary - polls are still only an approximation of voter preferences.

Which of the following conclusions can you make from the plots you constructed in the previous exercise?


*** =instructions
- There is a 100% chance that Donald Trump will win New Hampshire.
- There was never a chance of Marco Rubio winning in South Carolina.
- John Kasich will not win in New Hampshire.
- We can't be sure who will win in Iowa.

*** =hint
Have a look at the plots. Which candidate appears to be leading in each state? What can these data tell us about voting outcomes?


*** =pre_exercise_code
```{r}
library(ggplot2)
library(gridExtra)

polls <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/polls.csv")
polls$polldate <- as.Date(polls$polldate)
polls$electiondate <- as.Date(polls$electiondate)

polls2 <- polls[polls$samplesize > 400, ]
ia_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "IA"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="Iowa")

nh_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "NH"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="New Hampshire")

sc_plot <- ggplot(data = subset(polls2, polls2$type == "rep" & location == "SC"), aes(y = percent, x = polldate, col = candidate)) +
  geom_point(alpha = 0.5, na.rm = TRUE) +
  geom_smooth(span = 0.7, na.rm = TRUE) +
  labs(title="South Carolina")

grid.arrange(ia_plot, nh_plot, sc_plot, nrow=3)

```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

msg_bad <- "That's not quite right. Even the best polls are just estimates based on public opinion."
msg_success <- "Great job! Remember, even the strongest polls are limited."
test_mc(correct = 4, feedback_msgs = c(msg_bad, msg_bad, msg_bad, msg_success))
```


--- type:NormalExercise lang:r xp:100 skills:1 key:e7163b7798
## Mapping polling data

Now that you've looked at trends in a few important states, you have a good idea of the variation in polling from one state to another. A valuable way to visualize this variation is to attach polling data to a map of the United States. 

In this exercise, you'll use the [maps](http://www.rdocumentation.org/packages/maps/versions/3.1.1) package to generate a map of candidate support across each state in your sample. The `maps` package contains coordinates for important geographic and political units worldwide which R can use to generate maps. Before you can generate a map from your data, you'll need to merge your polling data with geographic information for each state.

A new dataset containing only the final polling data for Bernie Sanders before each state's primary (`sanders`) has been preloaded into your workspace. For plotting purposes, you'll use the `ggplot2` package and the [ggthemes](http://www.rdocumentation.org/packages/ggthemes/versions/3.2.0) package, the latter of which provides some useful pre-defined themes for plots made with `ggplot()`. Both packages are preloaded for you.


*** =instructions
- Load the `maps` package to gain access to geographic datasets and save the `states` data as an object.
- Use [merge()](http://www.rdocumentation.org/packages/base/versions/3.3.1/topics/merge) to combine `states` and `sanders` on the column `region`. Save this new data frame as `sanders_map`.
- Reorder `sanders_map` according to `order` to make sure R knows how to plot your data.
- Use `ggplot()` to project Bernie Sanders' polling data onto a map of the United States. The `theme_map()` option (from the `ggthemes` package) is a convenient way to produce a crisp and clean map.

*** =hint
- Be sure to specify `states` and `sanders` as the two data frames for your `merge()` command (in that order).
- Order the `sanders_map` data according to the `order` column.


*** =pre_exercise_code
```{r}
sanders <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/sanders.csv")

library(ggplot2)
library(ggthemes)

```


*** =sample_code
```{r}
# Load the maps package and state data
library(maps)
states <- map_data("state")

# Merge states and sanders by region: sanders_map
sanders_map <- merge(___, ___, by = ___, all = TRUE)

# Reorder Sanders map according to the maps package order column
sanders_map <- ___[order(sanders_map$___ ), ]

# Use ggplot() to produce a map of Bernie Sanders polling data (do not modify this command)
ggplot() +
  geom_polygon(data = sanders_map, aes(x = long, y = lat, group = group, fill = percent)) +
  labs(title = "Support for Bernie Sanders at Last Poll Before Primary") + 
  theme_map()

```

*** =solution
```{r}

# Load the maps package
library(maps)
states <- map_data("state")

# Merge states and sanders by region: sanders_map
sanders_map <- merge(states, sanders, by = "region", all = TRUE)

# Reorder Sanders map according to the maps package order column
sanders_map <- sanders_map[order(sanders_map$order), ]

# Use ggplot() to produce a map of Bernie Sanders polling data (do not modify this command)
ggplot() +
  geom_polygon(data = sanders_map, aes(x = long, y = lat, group = group, fill = percent)) +
  labs(title = "Support for Bernie Sanders at Last Poll Before Primary") + 
  theme_map()

```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("merge", args = c("x", "y"),
              not_called_msg = "You didn't call `merge()`!",
              incorrect_msg = "You didn't call `merge( x = ..., y = ...)` with the correct arguments")

test_object("states")

test_object("sanders_map")

test_error()

success_msg("Excellent! It looks like Bernie Sanders was polling very well before the Vermont primary. No surprise there. Sanders' polling numbers across the South are much lower.")
```


--- type:NormalExercise lang:r xp:100 skills:1 key:341d0fbe4d
## Interactive maps using googleVis

Your map looks excellent! Attaching polling data to a map allows for easy and intuitive identification of regional trends for each candidate. 

However, a color gradient alone makes it difficult to identify specific polling numbers in each state. Ideally, you want your map to display general trends and specific information without being too cluttered. To accomplish this, you'll use the [googleVis](http://www.rdocumentation.org/packages/googleVis/versions/0.6.0) package to generate an interactive map from your Sanders polling data.

The `googleVis` package allows you to create interactive charts and maps in R by providing a direct interface to the Google Charts API. Unlike the `maps` package, maps in `googleVis` do not require you to attach geographic coordinates to your data as long as they contain relevant geographic names (in this case, states). The [gvisGeoChart()](http://www.rdocumentation.org/packages/googleVis/versions/0.6.0/topics/gvisGeoChart) command requires you to specify your data, a location variable, and a color variable.

The Bernie Sanders polling data (`sanders`) is preloaded into your environment.

*** =instructions
- Load the `googleVis` package.
- Generate a googleVis object (`sanders_gvis`) by specifying `sanders` as the data, setting the location variable to `location`, and setting the color variable to `percent`.
- Plot `sanders_gvis` using the base R plot command (`plot()`).

*** =hint
- The `gvisGeoChart()` command requires you to specify data, a location variable, and a color variable. Be sure to use quotes when specifying variables.

*** =pre_exercise_code
```{r}
sanders <- read.csv("http://s3.amazonaws.com/assets.datacamp.com/production/course_1635/datasets/sanders.csv")
suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag = 'chart')


```


*** =sample_code
```{r}
# Load the googleVis package


# Generate a googleVis map object: sanders_gvis
sanders_gvis <- gvisGeoChart(data = ___, locationvar = , colorvar = ,
                          options=list(region="US", 
                                       displayMode="regions", 
                                       resolution="provinces"))

# Plot your googleVis object


```

*** =solution
```{r}
# Load the googleVis package
library(googleVis)

# Generate a googleVis map object: sanders_gvis
sanders_gvis <- gvisGeoChart(data = sanders, locationvar = "location", colorvar = "percent",
                          options=list(region="US", 
                                       displayMode="regions", 
                                       resolution="provinces"))

# Plot your googleVis object
plot(sanders_gvis)

```

*** =sct
```{r}
# SCT written with testwhat: https://github.com/datacamp/testwhat/wiki

test_function("gvisGeoChart", args = "data")
test_function("gvisGeoChart", args = "locationvar")
test_function("gvisGeoChart", args = "colorvar")

test_error()

success_msg("Great job! Now that you have an interactive map, you can get a general overview of trends across the United States as well as specific polling numbers for each state.")
```