Wednesday, September 9, 2015

Checking Out Federal Election Commission Data in Kibana


I really like using Kibana to look at data. Not only can you gain some insight, it's also just plain old fun. I recently had some free time and stumbled onto the Federal Election Commission (FEC) website, http://fec.gov/. For those of you not familiar with the FEC, it tracks all political contributions to PACs and political candidates.


In the spirit of transparency and gaining some insight, I decided to write a little script to pull the data and make it a bit easier to work with. Originally, the goal was to format it into a neat CSV with headers plus decoded and denormalized data for pandas, but I soon wrote another script to index the data into ElasticSearch so I could look at it in Kibana. Kibana is a great tool for exploring data, but it doesn't make the best presentation tool. Either way, here are a few screenshots.
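For the curious, here is a rough sketch of what the indexing step can look like. This is not the code from my loader; it only shows the general idea of turning CSV rows into the newline-delimited JSON that the ElasticSearch bulk API accepts. The index name `fec_contributions`, the function name and the sample row are made up for illustration, the column names are assumed to match the FEC headers, and older ElasticSearch versions also expect a document type in the action line, which is left out here.

```python
import csv
import io
import json


def rows_to_bulk(csv_text, index_name="fec_contributions"):
    """Turn CSV text (with a header row) into a bulk API payload.

    index_name is a hypothetical index name, not the one used in the post.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Amounts arrive as strings; cast them so they can be summed later.
        row["TRANSACTION_AMT"] = float(row["TRANSACTION_AMT"])
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    # The bulk API expects newline-delimited JSON that ends with a newline.
    return "\n".join(lines) + "\n"


# One invented sample row, just to show the shape of the output.
sample = 'CAND_NAME,STATE,TRANSACTION_AMT\n"SANDERS, BERNARD",VT,50\n'
payload = rows_to_bulk(sample)
print(payload)
```

The resulting string is what you would send as the body of a bulk request; actually sending it (and batching large files) is left out to keep the sketch short.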


Initially, it was interesting to find out exactly who is in the lead. There are a lot of elections and candidates, and I didn't want to overcrowd the charts, so I filtered out the bottom of the pack that will probably not win. Here is the list of included candidates as an ES query string that you can reuse:
(CAND_NAME.raw:"BUSH, JEB" OR CAND_NAME.raw:"RUBIO, MARCO" OR CAND_NAME.raw:"CAIN, HERMAN" OR CAND_NAME.raw:"SANDERS, BERNARD" OR CAND_NAME.raw:"PAUL, RAND" OR CAND_NAME.raw:"CLINTON, HILLARY RODHAM" OR CAND_NAME.raw:"CRUZ, RAFAEL EDWARD \"TED\"")
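If you want to build a filter like this for your own list of candidates, typing it by hand gets old fast. Here is a small sketch that generates the same kind of string. The function name `candidate_filter` is made up for this example; the `CAND_NAME.raw` field is the one used in the filter above. The only tricky part is escaping quotes inside a name, as with Ted Cruz.

```python
def candidate_filter(names, field="CAND_NAME.raw"):
    """Build an OR-ed query string that matches any of the given names."""
    clauses = []
    for name in names:
        # Escape backslashes first, then double quotes, so quoted nicknames
        # do not end the phrase early.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('{}:"{}"'.format(field, escaped))
    return "(" + " OR ".join(clauses) + ")"


candidates = ["BUSH, JEB", 'CRUZ, RAFAEL EDWARD "TED"']
query = candidate_filter(candidates)
print(query)
```

Paste the printed string into the Kibana search bar, or pass it as the query of a `query_string` search.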

I then built a quick visualization that sums up transaction amounts per candidate, bucketed by week.
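Kibana builds the request for you, but it helps to know roughly what it is asking ElasticSearch for. The sketch below is my approximation of such a request body, not a copy of what Kibana sends: a weekly date histogram, split by candidate, with a sum inside each bucket. The field names `TRANSACTION_DT` and `TRANSACTION_AMT` are assumptions based on the FEC column names, and the name of the interval setting depends on your ElasticSearch version (`interval` on older releases, `calendar_interval` on recent ones), so it is a parameter here.

```python
import json


def weekly_totals_query(query_string, interval_key="calendar_interval"):
    """Request body: sum of amounts per candidate within each week."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"query_string": {"query": query_string}},
        "aggs": {
            "per_week": {
                # Assumed date field; use "interval" as the key on old versions.
                "date_histogram": {"field": "TRANSACTION_DT", interval_key: "week"},
                "aggs": {
                    "per_candidate": {
                        "terms": {"field": "CAND_NAME.raw"},
                        "aggs": {
                            # Assumed numeric field holding the amount.
                            "total": {"sum": {"field": "TRANSACTION_AMT"}}
                        },
                    }
                },
            }
        },
    }


body = weekly_totals_query('CAND_NAME.raw:"PAUL, RAND"')
print(json.dumps(body, indent=2))
```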



No big surprise there: Hillary has been banking cash throughout the campaign. Cruz was off to a good start too, but dwindled after March. Check out Bernie, though: he was a bit low on cash until June, when he got some massive dough. Jeb's cash flow spiked a bit too. Overall, though, it's kind of hard to distinguish the candidates because so many of them are bunched up in the same place (the bottom).

Using Kibana, I can change the Y axis scale from linear to log to make this a bit better. Here is another version of exactly the same information. For the layman: notice how the distance between millions starts to shrink closer to the top.
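If the shrinking distances seem odd, a few lines of Python show what a log axis does. On a log scale a value is drawn at the height of its logarithm, so every tenfold jump takes up the same space, while a one-million step near the top takes up far less room than the same step near the bottom. The amounts below are round numbers picked for illustration, not figures from the FEC data.

```python
import math

# Round illustrative amounts, each ten times the previous one.
amounts = [10_000, 100_000, 1_000_000, 10_000_000]

# Height on a log axis is the base-10 logarithm of the value.
positions = [math.log10(a) for a in amounts]

# The gap between neighbouring tick marks is the same for every tenfold jump.
steps = [b - a for a, b in zip(positions, positions[1:])]
print(steps)

# The same one-million step is drawn much smaller near the top of the axis.
low_gap = math.log10(2_000_000) - math.log10(1_000_000)
high_gap = math.log10(10_000_000) - math.log10(9_000_000)
print(low_gap, high_gap)
```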

Not sure if this was terribly helpful in this case, but the functionality is there regardless.

Another thing I wondered is who actually contributes to these candidates, and what are they all about? This data comes from the FEC's individual contributions file, so it should be limited to individuals donating to various committees and candidates. Let's see if we can break down these contributions by state, not in a fancy map, just a regular bar plot (my favorite :))

Here are the top 10 states that gave money:



Looks like California, Texas, Florida and New York have really been leading the charge here... let's see what it looks like based on the total amount contributed. I changed the Y axis to the sum of total contributions.
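Counting contributions and summing them can rank states very differently, which is worth keeping in mind when switching the Y axis. Here is a tiny sketch of the difference. The rows are invented for the example and are not taken from the FEC file; each one is a (state, amount) pair.

```python
from collections import Counter, defaultdict

# Invented (state, amount) pairs, for illustration only.
contributions = [
    ("CA", 250.0), ("CA", 100.0), ("CA", 50.0),
    ("DC", 2700.0),
    ("TX", 500.0), ("TX", 200.0),
]

# Metric 1: how many contributions came from each state.
counts = Counter(state for state, _ in contributions)

# Metric 2: how much money came from each state.
totals = defaultdict(float)
for state, amount in contributions:
    totals[state] += amount

by_count = [state for state, _ in counts.most_common()]
by_total = sorted(totals, key=totals.get, reverse=True)

print("ranked by count:", by_count)
print("ranked by total:", by_total)
```

In this made-up sample, the state with the fewest contributions ends up first once the amounts are summed, because its single contribution is large.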


Looks like DC left all these states in the dust. Hmm, now I wonder: who from DC contributed, and to whom? We'll come back to that in another blog post, but for now let's check out the employment information of our generous contributors. A regular bar chart would be a bit boring, so let's do a split bar chart based on candidate and the top 5 occupations that contributed.


And here we are. The top 3 contributors to Hillary are attorneys, retirees and homemakers; these are also the top 3 occupations for almost all of the candidates. It's hard to tell what's up with Bernie, though: his bar is all squished up and indistinguishable. Let's adjust our bar chart to be a percent chart. This way we should see the relationship between slices in each bar, but not between the bars themselves.
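A percent chart simply rescales every bar so that its slices add up to 100, which is why small bars become readable and why bars can no longer be compared with each other. A minimal sketch of that rescaling is below; the candidate labels and amounts are placeholders I made up, and `to_percent` is a hypothetical helper, not something Kibana exposes.

```python
def to_percent(bars):
    """Rescale each bar so that its slices sum to 100 percent."""
    result = {}
    for candidate, slices in bars.items():
        total = sum(slices.values())
        if total == 0:
            # An empty bar has no meaningful shares; keep it empty.
            result[candidate] = {}
            continue
        result[candidate] = {
            occupation: 100.0 * amount / total
            for occupation, amount in slices.items()
        }
    return result


# Placeholder numbers: one large bar and one tiny bar.
bars = {
    "CANDIDATE A": {"ATTORNEY": 600.0, "RETIRED": 300.0, "HOMEMAKER": 100.0},
    "CANDIDATE B": {"NOT EMPLOYED": 30.0, "PROFESSOR": 10.0},
}
percent = to_percent(bars)
for candidate, shares in percent.items():
    print(candidate, shares)
```

After rescaling, both bars are the same height even though one candidate's total is 25 times larger in this made-up sample, so only the split inside each bar is left to compare.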

A few clicks in the options and here we are:





What the hell is that slice consuming Bernie's bar... Oh, it's the "Not Employed" occupation; looks like those guys contributed the most money to him. The other ones up top are professors and physicians.

However, it's weird that people who are not employed were giving Bernie so much money; I mean, where would they get it from? We can check this by adjusting the Y axis: instead of summing up the transaction amounts, we can just count the contributions. So here it is. Before you look at it, one thing I want to point out is that the order of the bar columns has changed, and so have the color labels. Kibana is just good like that; it keeps you on your toes to make sure you are always paying attention. And yep, that big bar, now blue, is still there.




Anyway, I hope you enjoyed this. I know it was pretty fun writing it. I am going to play around with the data a bit more and show some new stuff in a couple of weeks. If you want to experiment with this at home, all you need is ElasticSearch and Kibana (open source and free from https://www.elastic.co/) and my code to download and massage the FEC data (https://github.com/nickvasilyev/FEC_DATA_LOADER).