At long last, the day has come when the data management and analytics course begins its analytics stream. The first step? An online, pirate-themed course laying down the basics of programming in R.
Having mastered the basics, it was time for my first real-world challenge: taking a dataset into R, cleaning the data and producing graphics to illustrate useful trends and information from it.
I chose a dataset from Kaggle with the vote and population data of the United States from its recent Democratic and Republican primaries. In looking at the data, I wanted to focus on the results from the key battleground states in this year's election, i.e.:
Arizona, Colorado, Florida, Iowa, Michigan, Nevada, New Hampshire, North Carolina, Ohio, Pennsylvania, Virginia and Wisconsin.
I calculated the winner of each county within these states, then added some demographic data that I thought would influence the results in those counties:
Mean income, population density, white (non-Hispanic) population, Hispanic population, black population, Asian population, percentage of women and college degree attainment.
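The post doesn't include the R code for this step. As a rough illustration only, here is a minimal Python sketch of the same idea: keep the rows for the battleground states, take the top vote-getter in each county, then join the demographic data on a shared county key. The row layout (state, county id, party, candidate, votes), the county ids, the candidate labels and every number are hypothetical placeholders, not the Kaggle data. In R the join would more likely be done with merge().

```python
# Minimal sketch: county winners in battleground states, joined to demographics.
# All rows, county ids, candidate labels and numbers are hypothetical placeholders.

BATTLEGROUND = {
    "Arizona", "Colorado", "Florida", "Iowa", "Michigan", "Nevada",
    "New Hampshire", "North Carolina", "Ohio", "Pennsylvania",
    "Virginia", "Wisconsin",
}

# Assumed row layout: (state, county_id, party, candidate, votes).
primary_rows = [
    ("Ohio", "OH-1", "Republican", "Candidate A", 1200),
    ("Ohio", "OH-1", "Republican", "Candidate B", 1500),
    ("Ohio", "OH-2", "Republican", "Candidate A", 900),
    ("Ohio", "OH-2", "Republican", "Candidate B", 700),
    ("Texas", "TX-1", "Republican", "Candidate C", 2000),  # not a battleground state
]

# Hypothetical demographic table keyed by the same county id.
demographics = {
    "OH-1": {"mean_income": 40000, "college_pct": 12.0},
    "OH-2": {"mean_income": 45000, "college_pct": 18.0},
}


def county_winners(rows, party):
    """Map county_id -> (candidate, votes) for the top vote-getter in each
    battleground county of one party; on a tie the first row seen is kept."""
    best = {}
    for state, county, row_party, candidate, votes in rows:
        if state not in BATTLEGROUND or row_party != party:
            continue
        if county not in best or votes > best[county][1]:
            best[county] = (candidate, votes)
    return best


def attach_demographics(winners, demo):
    """Inner join: keep only counties that also appear in the demographic table."""
    return [
        {"county": county, "winner": candidate, **demo[county]}
        for county, (candidate, _votes) in sorted(winners.items())
        if county in demo
    ]


table = attach_demographics(county_winners(primary_rows, "Republican"), demographics)
for row in table:
    print(row)
```

Running the same function with `"Democratic"` would give the table for the other race.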
It produced a table like this for the Republican race:
I then created a table for both races showing the average county won by each candidate.
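An "average county" per winner is essentially a group-by followed by a mean of each demographic column. Here is a minimal Python sketch, assuming one row per county with a `winner` field and numeric demographic fields. The field names and values are hypothetical; in R, aggregate() with `FUN = mean` would be one way to do the same.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical joined table: one row per county with its winner and demographics.
counties = [
    {"winner": "Candidate A", "mean_income": 40000, "college_pct": 12.0},
    {"winner": "Candidate A", "mean_income": 50000, "college_pct": 20.0},
    {"winner": "Candidate B", "mean_income": 60000, "college_pct": 30.0},
]


def average_county(rows, fields):
    """For each winning candidate, average every demographic field over the
    counties that candidate won."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["winner"]].append(row)
    return {
        winner: {field: mean(r[field] for r in group) for field in fields}
        for winner, group in groups.items()
    }


avg = average_county(counties, ["mean_income", "college_pct"])
print(avg)
```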
With this information it was possible to start producing box plots and graphs that give a better overview of how the candidates in each race fared across a variety of factors. By looking at how the candidates fared with certain demographic groups in their primary races, we can hope to learn something about the strengths and weaknesses they possess going into the general election, and see where both can improve across these states, which will hold the balance of this year's election.
First, I wanted to look at how the candidates fared with the largest electoral group, non-Hispanic whites, in relation to the education level of this demographic.
We can see here, even from this small graph, that Hillary Clinton and Donald Trump won similar counties, but that Clinton managed to take the more educated areas that voted for Rubio over Trump, suggesting she has an advantage over him in demographics with higher college graduation rates.
Next, we can look at some box plots to see how much of the vote our candidates secured in relation to our key demographics.
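A box plot draws a five-number summary: minimum, lower quartile, median, upper quartile and maximum. As a minimal Python sketch (not the author's R code), this computes that summary for hypothetical per-county vote shares. The candidate labels and values are made up. Quartile conventions differ slightly between tools, so R's boxplot() can place the box edges a little differently on some data.

```python
from statistics import quantiles

# Hypothetical per-county vote shares for each candidate in counties where a
# given demographic group is large; the values are made up for illustration.
vote_shares = {
    "Candidate A": [0.2, 0.4, 0.5, 0.6, 0.8],
    "Candidate B": [0.1, 0.3, 0.35, 0.45, 0.9],
}


def box_summary(values):
    """Five-number summary (min, Q1, median, Q3, max) that a box plot draws."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


for candidate, shares in vote_shares.items():
    print(candidate, [round(x, 3) for x in box_summary(shares)])
```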
The information here would seem to suggest that Donald Trump enjoys a popularity among black voters somewhat similar to Hillary Clinton's. It is important to remember, however, that areas with large ethnic minority populations also tend to have fewer registered Republicans, who are mostly white. That small electorate can hand wins to a candidate in areas whose overall demographics are opposed to his base.
The same pinch of salt applies to our Hispanic demographics. We do note, though, that Rubio picked up a large share of the vote here, and now that he is out of the race he leaves behind a base that, while Republican in registration, has deep reservations about Trump as a candidate.
Our best measurement is to look at the candidates' shares of the vote as fractions of the overall totals, and at how these play out across our big demographics, using fraction tables.
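One way to build such a fraction table, sketched in Python under stated assumptions: counties are bucketed into income brackets using made-up cut-offs, and each candidate's votes per bracket are divided by that candidate's total, so each candidate's row sums to 1. This is similar in spirit to applying R's prop.table() to a table() of counts. The brackets, labels and numbers here are all hypothetical.

```python
from collections import defaultdict

# Hypothetical county rows: (candidate, county mean income, votes for that candidate).
rows = [
    ("Candidate A", 35000, 600),
    ("Candidate A", 55000, 300),
    ("Candidate A", 80000, 100),
    ("Candidate B", 35000, 200),
    ("Candidate B", 55000, 300),
    ("Candidate B", 80000, 500),
]


def income_bracket(income):
    """Illustrative cut-offs only; the post does not say which brackets it used."""
    if income < 45000:
        return "low"
    if income < 70000:
        return "middle"
    return "high"


def fraction_table(rows):
    """Share of each candidate's total votes that came from each income bracket."""
    counts = defaultdict(lambda: defaultdict(int))
    for candidate, income, votes in rows:
        counts[candidate][income_bracket(income)] += votes
    table = {}
    for candidate, by_bracket in counts.items():
        total = sum(by_bracket.values())
        table[candidate] = {bracket: v / total for bracket, v in by_bracket.items()}
    return table


for cand, shares in fraction_table(rows).items():
    print(cand, {b: round(s, 2) for b, s in shares.items()})
```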
These tables show that Clinton has far greater consistency across the demographic spectrum. Trump's strength lies with lower-income voters, whereas Clinton's support clearly rises as income increases. Her popularity with college-educated voters and in densely populated areas (cities) is also a distinct advantage. The number of voters in the Democratic primaries relative to the Republican primaries is another strong factor for Clinton.
Trump battleground total votes: 3,997,874
Clinton battleground total votes: 5,204,921
That is a difference of about 1.2 million votes. Though tiny compared with the number who will vote in the general election, it suggests greater enthusiasm among Democratic base supporters going into the election, and, combined with her slight demographic advantages, Clinton goes into the general election with a distinct head start.
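The arithmetic behind that figure, using only the two battleground totals given in this post, is shown in the short Python check below. Each total counts votes cast in that candidate's own party primary, not a head-to-head contest.

```python
# Battleground totals as reported in this post.
trump_total = 3_997_874
clinton_total = 5_204_921

difference = clinton_total - trump_total
ratio = clinton_total / trump_total

print(f"Difference: {difference:,}")  # Difference: 1,207,047
print(f"Clinton/Trump ratio: {ratio:.2f}")  # Clinton/Trump ratio: 1.30
```

In other words, Clinton's primary total in these states is roughly 30% larger than Trump's.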