Skip to content

Choosing a Scale Type

arilamstein edited this page Mar 22, 2014 · 1 revision

After deciding on the geographic level of detail of your choropleth, the next most important decision will be on the type of scale to use. choroplethr supports three type of scales: continuous, discrete and manual. They are controlled with the num_buckets argument.

Continuous Scales

Here is a choropleth map of US state populations using a continuous scale.

library(choroplethr)
data(choroplethr)
choroplethr(df_pop_state, "state", num_buckets=1, title="2012 State Population Estimates, Continuous Scale") 

One striking feature of this map is how much brighter California is than all its neighbors. Indeed, in the western half of the map only California and Texas stand out at all. The reason for this can be seen by looking at a boxplot of state populations; California (37M) and Texas (25M) are the two largest outliers:

ggplot(df_pop_state, aes(factor(1), value)) + 
  geom_boxplot() + 
  ggtitle("Boxplot of 2012 State Population Estimates") + 
  scale_y_continuous(label=comma)

This is both the blessing and curse of choropleths which use a continuous scale: outliers stand out dramatically, but it is difficult to detect differences among the non-outliers.

Discrete Scales

Setting num_buckets to a value between 2 and 9 will color the regions in equally sized buckets. By default num_buckets = 9.

choroplethr(df_pop_state, "state", num_buckets=9, title="2012 State Population Estimates, 9 Buckets") 

Here California and Texas still stand out. But there are clear differences in the remaining states in the western half of the US. In particular, there are a cluster of low population states in northwest.

A common misconception is to think that more buckets is always better. For example, using two buckets highlights regions above and below the median value, which is itself quite informative.

choroplethr(df_pop_state, "state", num_buckets=2, title="2012 State Population Estimates, 2 Buckets") 

Manual Scales

Oftentimes when analyzing spatial data you want to manually decide the values of a scale. Here is how you would highlight states with populations above or below 1M residents:

library(Hmisc) # for cut2
df_pop_state$value = cut2(df_pop_state$value, cuts=c(0,1000000,Inf))
choroplethr(df_pop_state, "state", title="States with Populations Above or Below 1 Million")