Let’s have a look at the results of the 2016 Brexit vote in the UK. First we read the data using read_csv() and have a quick glimpse at the data

brexit_results <- read_csv(here::here("data","brexit_results.csv"))


glimpse(brexit_results)
## Rows: 632
## Columns: 11
## $ Seat        <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and Sale W…
## $ con_2015    <dbl> 50.6, 52.0, 53.0, 44.0, 60.8, 22.4, 52.5, 22.1, 50.7, 53.0…
## $ lab_2015    <dbl> 18.3, 22.4, 26.7, 34.8, 11.2, 41.0, 18.4, 49.8, 15.1, 21.3…
## $ ld_2015     <dbl> 8.82, 3.37, 8.38, 2.98, 7.19, 14.83, 5.98, 2.42, 10.62, 5.…
## $ ukip_2015   <dbl> 17.87, 19.62, 8.01, 15.89, 14.44, 21.41, 18.82, 21.76, 19.…
## $ leave_share <dbl> 57.9, 67.8, 38.6, 65.3, 49.7, 70.5, 59.9, 61.8, 51.8, 50.3…
## $ born_in_uk  <dbl> 83.1, 96.1, 90.5, 97.3, 93.3, 97.0, 90.5, 90.7, 87.0, 88.8…
## $ male        <dbl> 49.9, 48.9, 48.9, 49.2, 48.0, 49.2, 48.5, 49.2, 49.5, 49.5…
## $ unemployed  <dbl> 3.64, 4.55, 3.04, 4.26, 2.47, 4.74, 3.69, 5.11, 3.39, 2.93…
## $ degree      <dbl> 13.87, 9.97, 28.60, 9.34, 18.78, 6.09, 13.12, 7.90, 17.80,…
## $ age_18to24  <dbl> 9.41, 7.33, 6.44, 7.75, 5.73, 8.21, 7.82, 8.94, 7.56, 7.61…

The data comes from Elliott Morris, who cleaned it and made it available through his DataCamp class on analysing election and polling data in R.

Our main outcome variable (or y) is leave_share, which is the percent of votes cast in favour of Brexit, or leaving the EU. Each row is a UK parliament constituency.

To get a sense of the spread, or distribution, of the data, we can plot a histogram, a density plot, and the empirical cumulative distribution function of the leave % in all constituencies.

# histogram
ggplot(brexit_results, aes(x = leave_share)) +
  geom_histogram(binwidth = 2.5, fill = "steelblue") +
  labs(title="Histogram: Leave Share % in all constituencies",
    x= "Leave Share %",
    y= "Count"
  )

# density plot-- think smoothed histogram
ggplot(brexit_results, aes(x = leave_share)) +
  geom_density() +
  labs(title="Density Plot: Leave Share % in all constituencies",
    x= "Leave Share %",
    y= "Count"
  )

# The empirical cumulative distribution function (ECDF) 
ggplot(brexit_results, aes(x = leave_share)) +
  stat_ecdf(geom = "step", pad = FALSE) +
  scale_y_continuous(labels = scales::percent) +
  labs(title="Empirical Cumulative Distribution Function: Leave Share % in all constituencies",
    x= "Leave Share %",
    y= "Count"
  )

One common explanation for the Brexit outcome was fear of immigration and opposition to the EU’s more open border policy. We can check the relationship (or correlation) between the proportion of native born residents (born_in_uk) in a constituency and its leave_share. To do this, let us get the correlation between the two variables!

brexit_results %>% 
  select(leave_share, born_in_uk) %>% 
  cor()
##             leave_share born_in_uk
## leave_share       1.000      0.493
## born_in_uk        0.493      1.000

The correlation is almost 0.5, which shows that the two variables are positively correlated.

A scatterplot between these two variables with the best fit line:

ggplot(brexit_results, aes(x = born_in_uk, y = leave_share)) +
  geom_point(alpha=0.3) +
  
  # add a smoothing line, and use method="lm" to get the best straight-line
  geom_smooth(method = "lm") + 
  
  # use a white background and frame the plot with a black box
  theme_bw() +
   labs(title = "Constituency's Residents in favour of Brexit vs. Native Born Residents",
       x = "Constituency's Residents Born in the UK",
       y = "Constituency's Residents in favour of Brexit")

Upon examining the best fit line of the ‘Constituency’s Residents in favour of Brexit vs. Native Born Residents’ scatterplot as well as the positive correlation between the proportion of native born residents (born_in_uk) in a constituency and its leave_share, it is deduced that both variables move in tandem - more the number of native born residents, more is the proportion of residents in support of Brexit. For example, constituencies like Aldridge-Brownhills (born_in_uk: 96.12207, leave_share: 67.79635) and Amber Valley (born_in_uk: 97.30437, leave_share: 65.29912) display this trend. This is mainly owing to previously mentioned reasons like fear of immigration and opposition to the EU’s more open border policy. Other reasons could be the age of voters (with older voters being more likely to vote for Brexit, and a usually higher turnout of older voters affecting the overall result) and protest votes (dissatisfaction with politics at the time). Furthermore, on detailed analysis of the histogram plot, the density plot, and the empirical cumulative distribution function (ECDF), it is deduced that only about 15% of all constituencies’ leave_share count is less than 40. Thereby, further solidifying the belief that Brexit was imminent.

#Import the csv file

brexit <- read_csv(here::here("data", "brexit_results.csv"))
glimpse(brexit) # examine the data frame
## Rows: 632
## Columns: 11
## $ Seat        <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and Sale W…
## $ con_2015    <dbl> 50.6, 52.0, 53.0, 44.0, 60.8, 22.4, 52.5, 22.1, 50.7, 53.0…
## $ lab_2015    <dbl> 18.3, 22.4, 26.7, 34.8, 11.2, 41.0, 18.4, 49.8, 15.1, 21.3…
## $ ld_2015     <dbl> 8.82, 3.37, 8.38, 2.98, 7.19, 14.83, 5.98, 2.42, 10.62, 5.…
## $ ukip_2015   <dbl> 17.87, 19.62, 8.01, 15.89, 14.44, 21.41, 18.82, 21.76, 19.…
## $ leave_share <dbl> 57.9, 67.8, 38.6, 65.3, 49.7, 70.5, 59.9, 61.8, 51.8, 50.3…
## $ born_in_uk  <dbl> 83.1, 96.1, 90.5, 97.3, 93.3, 97.0, 90.5, 90.7, 87.0, 88.8…
## $ male        <dbl> 49.9, 48.9, 48.9, 49.2, 48.0, 49.2, 48.5, 49.2, 49.5, 49.5…
## $ unemployed  <dbl> 3.64, 4.55, 3.04, 4.26, 2.47, 4.74, 3.69, 5.11, 3.39, 2.93…
## $ degree      <dbl> 13.87, 9.97, 28.60, 9.34, 18.78, 6.09, 13.12, 7.90, 17.80,…
## $ age_18to24  <dbl> 9.41, 7.33, 6.44, 7.75, 5.73, 8.21, 7.82, 8.94, 7.56, 7.61…
temp_brexit <- brexit %>% 
  select(con_2015, lab_2015, ld_2015, ukip_2015, leave_share) %>% 
  pivot_longer(cols = 1:4, names_to = "party", values_to = "party_percentage")

cols <- c("con_2015" = "#0087dc", "lab_2015" = "#d50000", "ld_2015" = "#FDBB30", "ukip_2015" = "#EFE600")

temp_brexit %>% 
  ggplot(aes(x=party_percentage,y=leave_share, group=party, color=party)) +
  geom_point(alpha = 0.5)+
  geom_smooth(method=lm, size = 0.5)+
  labs(title = "How political affiliation translated to Brexit Voting", x = "Party % in the UK 2015 general election", y = "Leave % in the 2016 Brexit referendum")+
  scale_colour_manual(labels = c("Conservative", "Labour","Lib Dems","UKIP"), values = cols)+
  theme_bw()+
  theme(legend.position = "bottom",plot.title = element_text(face = "bold", size = 13))+
  theme(legend.title = element_blank())+
  NULL