Wednesday, September 25, 2013

(More) Fun with Congressional District Maps

My last post looked at the boundaries of Congressional Districts in the US and tried to draw some conclusions about the political motivations behind the drawing of their boundaries.  Specifically, I calculated the ratio of a district's perimeter to its area to find geometric oddballs--districts with funny shapes that I interpreted as evidence for gerrymandering.

It turns out I was on the right track, but didn't have things quite right.  A friend pointed me to a study that a professional geographer did on the same topic.  His analysis was pretty close to mine, except he used the ratio of perimeter-squared over area.  This keeps the ratio dimensionless and is the proper way to do it.  My ratio has units of inverse length, so it shrinks as a district gets bigger even if the shape stays the same.  That was biasing my result towards finding smaller districts (which tend to be in urban areas, which tend to be Democratic, yadda yadda yadda).
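To make the scaling argument concrete, here is a minimal sketch (not the code from the original analysis) that compares the two ratios for squares of different sizes.  The function name `ratios` and the side lengths are made up for illustration.

```python
def ratios(perimeter, area):
    """Return (perimeter / area, perimeter**2 / area) for a shape."""
    return perimeter / area, perimeter ** 2 / area

# Same shape (a square) at three different sizes; side lengths are arbitrary.
for side in (1.0, 10.0, 100.0):
    p_over_a, p2_over_a = ratios(4 * side, side ** 2)
    # perimeter/area falls as the square grows; perimeter^2/area does not change.
    print(side, p_over_a, p2_over_a)
```

The first ratio drops by a factor of ten every time the side grows by ten, while the second depends only on the shape.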

So I re-did my calculation and here is the revised histogram using the correct ratio:



You can see one or two Republican districts in the tail towards the right, but it's still a lot more blue than red.  A good sign that this is working is that I now have North Carolina's 12th, one of the most obviously gerrymandered cases, as the district with the highest ratio.

Piggybacking off another comment on my last post, I also decided to add a few more dimensions to my data set.  The commenter suggested that the perimeter-to-area ratio alone might be indicative of natural boundaries that occur in urban areas, and not any malicious political intent.  So I decided to add some extra information and see if it makes gerrymandered districts pop out more obviously.

I added two more variables for each district: the margin of victory in the 2012 election, and the fraction of neighboring districts with the same party affiliation.  Remember that "packing" is the strategy of putting all of your opponent's supporters in a single district so that you can win the surrounding districts and come away with a net gain in Congressional seats.  As a result, districts that are examples of packing should have a comfortable margin of victory and should be surrounded by districts under the control of the opposite party.

This seemed like it should be easy, but finding the data and getting it into a neat form for analysis proved more difficult than I expected.  I took the results of the 2012 Congressional elections from the Federal Election Commission.  Unfortunately, their monstrosity of a spreadsheet tallied all the races (primary and general) for every candidate in every wacko third party for every seat.  This took some work to clean up.  After I got it in order, I tallied up the margins of victory for each district on a scale ranging from 0 (a dead heat) to 1 (a blowout with one candidate receiving 0% of the vote).
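For illustration, here is one way a margin on that 0-to-1 scale could be computed.  This is a sketch under an assumption: it takes the margin to be the gap between the top two candidates divided by the total vote, which may not be the exact formula used for the plots.  The function name and the input layout (a dict of general-election vote counts per candidate) are hypothetical.

```python
def margin_of_victory(votes):
    """Margin on a 0-to-1 scale from a dict of candidate -> vote count.

    Assumed definition: (winner - runner-up) / total votes cast.
    """
    ranked = sorted(votes.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no votes recorded for this district")
    # An unopposed winner has no runner-up, so treat the runner-up as 0 votes.
    runner_up = ranked[1] if len(ranked) > 1 else 0
    return (ranked[0] - runner_up) / total
```

With this definition a tie gives 0 and a race where the winner takes every vote gives 1.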

The next task was to go through each district and calculate the fraction of neighboring districts under the same party's control.  The first step involved finding the neighboring districts.  This took a lot of thinking.  I was going to do this by going through each district and calculating the distance to all the other districts' boundaries but that would have been horrible.  The boundaries are specified in 100 meter intervals, which means the arrays that hold them have hundreds of thousands of points.  If I'm calculating distances, the number of calculations I'd have to do is astronomical: for P points and D districts it would be $O(P^2 D^2)$.  With P around 100,000 and D=435, that works out to roughly $2 \times 10^{15}$ distance calculations.

Instead of doing things the brute force way, I got creative and realized that neighboring districts would share some of the same points in their coordinates.  If I could convert these coordinate arrays for each district into sets, I could take advantage of Python's blazingly fast set operations.  With a quick one-liner I can find whether any two districts share common points (intersect) and the average-case time it takes for this operation is $O(P)$, a big improvement over $O(P^2)$.
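Here is a minimal sketch of the set idea, not the contents of district_findneighbors.py.  It assumes each district's boundary is available as a sequence of coordinate pairs and that neighboring districts repeat their shared boundary points exactly; the function names and the toy data layout are hypothetical.

```python
from itertools import combinations

def find_neighbors(boundaries):
    """Map each district id to the set of districts sharing a boundary point.

    boundaries: dict of district id -> iterable of (x, y) coordinate pairs.
    """
    # Tuples are hashable, so each boundary can be stored as a set of points.
    point_sets = {d: {tuple(p) for p in pts} for d, pts in boundaries.items()}
    neighbors = {d: set() for d in point_sets}
    for a, b in combinations(point_sets, 2):
        # isdisjoint stops at the first shared point it finds.
        if not point_sets[a].isdisjoint(point_sets[b]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def same_party_fraction(district, neighbors, party):
    """Fraction of a district's neighbors held by the same party (None if no neighbors)."""
    adjacent = neighbors[district]
    if not adjacent:
        return None
    same = sum(1 for n in adjacent if party[n] == party[district])
    return same / len(adjacent)
```

One caveat: this only works when shared points match exactly.  If two files store the same border with slightly different coordinates, the points would need to be rounded to a common precision first.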

Ok, so now that I've calculated everything I need, what do the data look like?  Here is a 3D plot of the perimeter-to-area ratio, margin of victory, and the fraction of neighboring districts under the same party control.  Dots are color coded according to the current controlling party.
3D scatterplot of all 435 Congressional Districts, broken down by perimeter-to-area ratio, margin of victory in the 2012 elections, and the fraction of neighboring districts under the same party's control.  Blue and red map to the current controlling parties in the obvious way.  Source code: district_findneighbors.py, explore_district.py

The most interesting cases are those with a large ratio (so they have a funny shape), high margin of victory (so they're primarily a single-party district) and a low fraction of neighboring districts under their party's control (so they're an isolated island).  These guys are prime examples of the "packing" strategy in action!



Gerrymandered districts show up in the lower right here
and the lower left here.
We can set up a rule for whether or not we label a district as gerrymandered.  For anything that lies in the area I've circled, it's a pretty safe bet that it's been gerrymandered.  Once again, the majority of these points are Democratically-controlled districts.  In my opinion, this shows that gerrymandering is indeed being done, and that it's the Republicans doing it now.  That's not to say that Democrats haven't done it in the past (they have).
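A rule like this could be written as a simple threshold test on the three variables.  The sketch below is only an illustration: the function name and all three cutoff values are placeholders I made up, not the boundaries of the circled regions in the plots.

```python
def looks_packed(shape_ratio, margin, same_party_fraction,
                 min_ratio=100.0, min_margin=0.3, max_same_party=0.3):
    """Flag a district as a likely 'packed' district.

    The default thresholds are hypothetical placeholders, not values
    taken from the plots.
    """
    return (shape_ratio >= min_ratio                   # funny shape
            and margin >= min_margin                   # lopsided election
            and same_party_fraction <= max_same_party) # isolated island
```

In practice the cutoffs would be tuned by eye against the scatterplots, which is effectively what circling a region does.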

As a final point, let's beat the dead horse that is North Carolina's 12th District.  This Democratically-controlled district shows up with the highest ratio, one of the largest margins of victory, and exactly ZERO neighboring Democratically-controlled districts.  If I've done nothing else, I've at least shown that politics in North Carolina are messed up.

Here it is, one more time.  The district that's been referred to as "political pornography".



Friday, September 13, 2013

Fun with Congressional District Maps

With Congress's approval rating hovering below 20%, it's safe to say that all sides of the political spectrum are unhappy.  When you consider that the 113th Congress is on pace to pass only 72 bills in its two-year session (compare that to the mere 900 bills passed by the 80th Congress that got it the nickname "the Do Nothing Congress") it's easy to see why.  Add to that the sharp uptick in partisan vitriol, and I think both Democrats and Republicans would agree that our lawmakers are more polarized now than ever.

So if you were to ask a political analyst why things are this way, I'm betting that many of them would suggest the cause, at least in the House of Representatives, is Gerrymandering of congressional districts.  The term Gerrymandering refers to the strategy of manipulating the boundaries of voting districts so as to gain an advantage over your opponent.  There are two strategies employed, known as "packing" and "cracking".  The goal of packing is to isolate as many of your opponents as possible into a single district, which you will concede, so that you can win a larger number of the surrounding districts.  Cracking does the opposite: it aims to dilute the strength of a district that is controlled by your opponent by mixing in your own supporters.  These two strategies are usually used in concert for the greatest effect.

Packing, in particular, tends to produce districts that elect Representatives with extreme political views.  Since the district is essentially a single-party district there's no challenge in the general election, and hence no incentive to run a moderate candidate.  This, I believe, is one of the more serious problems with our current Congress, and it got me wondering if it can be traced directly to the maps of Congressional districts.

What I needed was a diagnostic of a Gerrymandered district.  They're easy to spot by eye (they look kind of like salamanders, which is where the term comes from in the first place) but I had no intention of looking through all 435 Congressional districts for oddities.  I needed an automatic classifier that would be able to separate obviously-gerrymandered districts like North Carolina's 12th:


from districts with relatively normal boundaries like Georgia's 2nd:


I decided to use the ratio of a district's perimeter to its area as a proxy for gerrymandering.  For districts like Georgia's 2nd that are close to rectangular, this number is relatively small.  However, districts like North Carolina's 12th, with all its twists and turns, should have a much larger ratio relative to normal.
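As a rough sketch of the calculation (not the script linked from this post), the perimeter and area of a boundary can be computed from its vertices with the shoelace formula.  This assumes the boundary is a simple closed polygon whose points are already in a flat, projected coordinate system such as meters; raw longitude/latitude would need projecting first.  The function name is hypothetical.

```python
import math

def perimeter_and_area(points):
    """Perimeter and area of a simple polygon given as a list of (x, y) vertices."""
    n = len(points)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1  # shoelace term
    return perimeter, abs(twice_area) / 2.0

# A compact square versus a long thin strip (made-up shapes).
for shape in ([(0, 0), (2, 0), (2, 2), (0, 2)],
              [(0, 0), (100, 0), (100, 0.1), (0, 0.1)]):
    p, a = perimeter_and_area(shape)
    print(p / a)
```

The thin strip comes out with a much larger perimeter-to-area ratio than the square, which is the behavior the classifier relies on.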

So I wrote a script to compute this ratio for every district and make a map of the districts with the largest ratio.  Ok, so how did it do?  The answer is it performed surprisingly well given the very limited input information.

Here are some of the maps I made of districts with the highest ratio of perimeter to area:


Source Code
and here are some districts with a low ratio:
Notice how the obviously gerrymandered districts all occur around urban areas?  That's not by accident; it's a great example of "packing" in action.  After a quick inspection, I'm feeling pretty confident in my perimeter-to-area classifier's ability to pick out gerrymandered districts.  So let's see what else we can learn from this.

Here's a histogram of all the districts' ratios broken down by which party currently has control of the district.  
Every district with a ratio greater than about .0005 is Democratically-controlled.  If my classifier is working, that means that every one of the most gerrymandered districts is a Democratic district.  Doesn't that sound like shenanigans?  Source Code
I found it remarkable that all of the districts in the high-ratio tail of the distribution are Democratically-controlled.  This suggests that Republicans are winning the gerrymander battle and successfully concentrating Democrats into "packed" districts.  Maybe that's how, despite winning more votes than Republicans, Democrats currently own a 234-195 minority in the House.  Representative democracy in action!