-
Notifications
You must be signed in to change notification settings - Fork 51
Choosing a Scale Type
After deciding on the geographic level of detail of your choropleth, the next most important decision will be on the type of scale to use. choroplethr supports three type of scales: continuous, discrete and manual. They are controlled with the num_buckets
argument.
Here is a choropleth map of US state populations using a continuous scale.
library(choroplethr)
data(choroplethr)
choroplethr(df_pop_state, "state", num_buckets=1, title="2012 State Population Estimates, Continuous Scale")
One striking feature of this map is how much brighter California is than all its neighbors. Indeed, in the western half of the map only California and Texas stand out at all. The reason for this can be seen by looking at a boxplot of state populations; California (37M) and Texas (25M) are the two largest outliers:
ggplot(df_pop_state, aes(factor(1), value)) +
geom_boxplot() +
ggtitle("Boxplot of 2012 State Population Estimates") +
scale_y_continuous(label=comma)
This is both the blessing and curse of choropleths which use a continuous scale: outliers stand out dramatically, but it is difficult to detect differences among the non-outliers.
Setting num_buckets
to a value between 2 and 9 will color the regions in equally sized buckets. By default num_buckets = 9
.
choroplethr(df_pop_state, "state", num_buckets=9, title="2012 State Population Estimates, 9 Buckets")
Here California and Texas still stand out. But there are clear differences in the remaining states in the western half of the US. In particular, there are a cluster of low population states in northwest.
A common misconception is to think that more buckets is always better. For example, using two buckets highlights regions above and below the median value, which is itself quite informative.
choroplethr(df_pop_state, "state", num_buckets=2, title="2012 State Population Estimates, 2 Buckets")
Oftentimes when analyzing spatial data you want to manually decide the values of a scale. Here is how you would highlight states with populations above or below 1M residents:
library(Hmisc) # for cut2
df_pop_state$value = cut2(df_pop_state$value, cuts=c(0,1000000,Inf))
choroplethr(df_pop_state, "state", title="States with Populations Above or Below 1 Million")