Which bins should be used for demographic data for a choropleth map?

Question

I'm currently working on a project which produces a choropleth map similar to a map like this: http://dl.dropboxusercontent.com/u/54512470/Telefondaten_VIZ/index.html

That map has 2x5 classifications (5 for positive migration numbers, 5 for negative ones.)

I have different underlying numbers than the one I linked. How do I decide the following things:

1. Should the break points increase linearly or exponentially (10, 20, 30, 40, 50 vs. 10, 20, 40, 80, 160)? Or even follow a square-root shape?
2. Related: if linear(ish), when is it OK to make some bins larger than others?
3. What should the 'final break point' be, i.e. the equivalent of -500 and 500 in the linked example?

Is there generic advice on these things which can be given without looking at the data?

Edit: The reason I don't want to post the data here is that I'd like to use some kind of algorithm to automatically create bins and resulting choropleth maps. Unlike the linked project, I don't have the migration data for a fixed timespan, but I have data for 2007, 2008, and so on up to 2011. I'd like a radio button to switch between the years (can combine several years).

I don't want the colors to be consistent between different settings (i.e. +5000 in year 2007 may have a different color than for 2010 + 2011), but to be informative for a fixed year.

One approach I was thinking of is: Take the largest absolute migration number (e.g. 1200) and round down to the nearest 50, 100, 500, 1000 and so on. This is the lower border of the dark orange, let's call this threshold number T (i.e. 1000 in this example). The threshold for the dark blue is everything below -T (i.e. -1000). The binning points of the linked project are T/10, T/5 and T/2 (in the example 100, 200, 500). I'm unsure about these particular break points. Why not T/10, T/4, T/2 (100, 250, 500)?
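To make the approach above concrete, here is a minimal Python sketch. The function names and the sample numbers are made up for illustration. It assumes the "nice" rounding series is 1, 5, 10, 50, 100, 500, 1000, ... (the paragraph above only starts it at 50) and that 0 separates the positive and negative classes, which gives the 2x5 layout of the linked map. Swapping `divisors=(10, 5, 2)` for `(10, 4, 2)` gives the alternative 100, 250, 500 breaks.

```python
from bisect import bisect_right


def nice_threshold(max_abs):
    """Largest value of the form 1, 5, 10, 50, 100, 500, ... that is <= max_abs."""
    if max_abs < 1:
        raise ValueError("expects a largest absolute value of at least 1")
    best = 1
    power = 1
    while power <= max_abs:
        # Within each decade, prefer 5 * 10^k if it still fits, else 10^k.
        best = 5 * power if 5 * power <= max_abs else power
        power *= 10
    return best


def diverging_breaks(values, divisors=(10, 5, 2)):
    """Symmetric break points around 0: -T, ..., -T/10, 0, T/10, ..., T."""
    t = nice_threshold(max(abs(v) for v in values))
    positive = [t / d for d in divisors] + [t]
    negative = [-b for b in reversed(positive)]
    return negative + [0] + positive


def class_index(value, breaks):
    """0 is the darkest negative class, len(breaks) the darkest positive one."""
    return bisect_right(breaks, value)


migration_2007 = [-1200, -340, -60, 15, 180, 450, 990]  # made-up numbers
breaks = diverging_breaks(migration_2007)
print(breaks)
print([class_index(v, breaks) for v in migration_2007])
```

Because T is recomputed from whatever year selection is active, the same absolute number can land in different classes for different selections, which is the behaviour described above.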

possible duplicate of Should I use a discrete or continuous scale for coloring a chloropleth? — Chris W, Apr 26 '14 at 19:26
The question I have flagged this as possible duplicate of comes at it from a color ramp rather than classification (bins) standpoint, but the best voted answer covers all the basics. If statistics is how to lie with numbers, then thematic mapping is how to lie with maps - meaning without knowing specifics of the data and what you're trying to show, the question boils down to 'how do I make the best choropleth from my data' and the answer is 'it depends.' If you feel like reading, Dent's Cartography: Thematic Map Design has several relevant sections/chapters. — Chris W, Apr 26 '14 at 19:34
@ChrisW: Thanks for the link. While it answers some of the questions, I'm not fully satisfied since I'm not sure if there are rules of thumb for when to use which kind of scale. I'll add some more info to show how my question might differ from the linked one. The most outstanding point is that I'm kind of aiming for some automation - I'd like a binning scheme which works on longitudinal data (not necessarily the same scale, but an algorithm to decide which scale to use). — Roland, Apr 26 '14 at 23:11
The additional info is helpful, more fully describes your goal, and narrows the scope of the question. You're looking to automatically class various data into a consistent number of bins when the distribution and range may vary significantly. I'm more comfortable with the cartography than the statistics, and this is something I've always done visually with a given data set. Why those break points? Perhaps they used others and ended up without sufficient areas in a given bin for the map to be meaningful. Normalizing to % change instead of actual value might solve your challenge. — Chris W, Apr 27 '14 at 00:21
I'm curious about your stated desire to NOT use the same colors for different years. This choice makes it harder for the user of the map to interpret meaning as they toggle from year to year. I understand that auto-scaling each year gives you the best use of colors for that particular year, but it comes at the cost of overall comprehension. — Llaves, Apr 27 '14 at 03:05
@Llaves: If I don't normalize the data in some way (or work with quantiles), the reference color palette would probably be based on the total migration data summed up over several years, which means that single years would usually only get the lightest colors: all of the districts would end up in the bin with the smallest label. It's worth noting that the actual migration numbers are provided on mouseover. I value knowing which district is the 'winner' or 'loser' relative to the other districts over knowing the exact number of migrants. — Roland, Apr 27 '14 at 08:36
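As a rough illustration of the quantile option mentioned in the comment above, the following Python sketch recomputes quintile breaks for each year selection, so every selection uses the full color range. The data and the `quantile_breaks` helper are hypothetical; `statistics.quantiles` is from the standard library (Python 3.8+). Note that plain quantiles ignore the sign of the values, so a diverging map would likely need the positive and negative districts handled separately.

```python
import statistics
from bisect import bisect_right


def quantile_breaks(values, n_classes=5):
    """Cut points that put roughly the same number of districts in each class."""
    return statistics.quantiles(values, n=n_classes, method="inclusive")


# Made-up net migration per district, keyed by the selected year(s).
by_selection = {
    "2007": [-120, -40, -5, 10, 30, 75, 90, 200],
    "2010+2011": [-900, -310, -20, 60, 150, 420, 800, 1500],
}

for label, values in by_selection.items():
    breaks = quantile_breaks(values)
    # Class 0 holds the lowest fifth of districts, class 4 the highest fifth.
    classes = [bisect_right(breaks, v) for v in values]
    print(label, breaks, classes)
```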
@ChrisW: If I were to normalize to % changes (i.e. take the max. absolute value and divide every entry by this number): 1. What's your suggestion for the bins? (I'd take 5 bins with 20% each - is this sensible?) 2. What's your suggestion for the labels of the legend to make it clear what we are choroplething there? — Roland, Apr 27 '14 at 08:43
Actually that wouldn't be the way to get your %s - you'd need to divide the net migration for a given timespan by the starting total of a district. In other words, the formula will be the same but the values will change as the timespan is selected. You're looking for the percent change, not the percent of the total population. While it happens to work for your example because the value range is small and the districts are very similar in size, choropleths are best suited for derived values and normalized data, as opposed to absolute values. — Chris W, Apr 27 '14 at 18:07
For the bins I think you'll have a range of %s that is much smaller - it would be a big event to see a region drop 30% in population over a short time span. You're not looking to divide up +/-100% into 10 bins, rather whatever your range is into perhaps no more than 9 bins - one for minimal/no change in the middle, and 4 either side for positive/negative change. The label would be % pop change, but the title would need to specify time span. — Chris W, Apr 27 '14 at 18:22
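A small Python sketch of the percent-change idea from the two comments above: net migration divided by the district's starting population, then nine classes with a neutral middle band and four equal-width classes on either side. The width of the neutral band (`neutral=0.5` percentage points), the equal-width spacing, the function names, and the sample districts are all assumptions made for illustration, not something the comments specify.

```python
from bisect import bisect_right


def percent_change(net_migration, start_population):
    """Net migration over the timespan as a percentage of the starting population."""
    return 100.0 * net_migration / start_population


def nine_class_breaks(pct_values, neutral=0.5):
    """8 symmetric breaks: a neutral band of +/- `neutral`, 4 classes per side."""
    max_abs = max(abs(p) for p in pct_values)
    if max_abs <= neutral:
        raise ValueError("all values fall inside the neutral band")
    step = (max_abs - neutral) / 4
    positive = [neutral + i * step for i in range(4)]
    return [-b for b in reversed(positive)] + positive


# Made-up districts: (net migration, starting population)
districts = [(45, 1000), (-10, 2000), (2, 1000), (-60, 1500)]
pct = [percent_change(net, pop) for net, pop in districts]
breaks = nine_class_breaks(pct)
# Class 4 is the neutral middle class; 0 and 8 are the extremes.
classes = [bisect_right(breaks, p) for p in pct]
print(pct, breaks, classes)
```

The breaks change with the selected timespan because `pct` does, while the formula stays the same.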
I also noticed at least three more questions under the Related heading on the right you may want to look at: Normalize Census, Choose Classification, Design Class Breaks for Time — Chris W, Apr 27 '14 at 18:26

0 Answers