With the launch of my Senate Forecast today, I get to do my favorite part of the forecast, writing the methodology (not really). After some successful 2020 models, I took a crack at the 2021 Virginia Gubernatorial Race, predicting the Youngkin victory on the last day of the . But back to 2022.
Projecting the Vote Share
There are four data categories that go into the model: fundamentals, polling, expert ratings, and state similarity. Each carries a different weight, and together they produce the projected vote share for each candidate (or for the remaining candidates combined).
The fundamentals encompass recent elections, incumbency, current political environment, and candidate adjustments.
PVI (Partisan Voting Index) measures how much a state leaned toward one party in the last two presidential elections compared to the nation as a whole. For example, in 2020 Texas voted for Trump by 5.6 points while the nation as a whole voted for Biden by 4.5 points, which gives Texas a 2020 PVI of R+10.1. Combining this with the 2016 figure, with more weight on the more recent election, gives the state's PVI. For the Senate model, the PVI is then adjusted toward the incumbent's margin in their most recent election.
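The PVI arithmetic can be sketched in a few lines. This is only an illustration: the 75/25 split between the two elections and the 2016 value are placeholder assumptions, not the model's actual weights or data, and the function names are made up for the example.

```python
def single_election_lean(state_margin: float, national_margin: float) -> float:
    """State lean relative to the nation, in points (positive = Republican)."""
    return state_margin - national_margin


def blended_pvi(lean_recent: float, lean_previous: float,
                recent_weight: float = 0.75) -> float:
    """Blend two election leans; recent_weight is an assumed value."""
    return recent_weight * lean_recent + (1.0 - recent_weight) * lean_previous


# Texas 2020 from the text: Trump +5.6 in the state, Biden +4.5 nationally.
# Margins are signed so that a Republican lead is positive.
texas_2020 = single_election_lean(5.6, -4.5)
print(f"Texas 2020 lean: R+{texas_2020:.1f}")

# Placeholder 2016 lean, used only to show the blending step.
hypothetical_2016_lean = 11.0
print(f"Blended PVI: R+{blended_pvi(texas_2020, hypothetical_2016_lean):.1f}")
```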
Incumbents earn a bump to their fundamentals based on how long they have been in office and on how large and how diverse their state is. Lisa Murkowski gets almost a 5-point bump toward her fundamental margin, while Mark Kelly gets only about 2.5 points.
The current political environment is calculated from the generic ballot average and Biden’s approval rating, since midterms are usually a referendum on the current President’s party.
Finally, the candidate adjustment is a small adjustment applied when one candidate has a major advantage over the other based on previously held offices. These components are then combined to produce a projected margin, which is used as the fundamentals margin.
States with unusual elections or a major third-party candidate use different calculations. In Alaska, the general election consists of the top four vote-getters from the jungle primary. The candidates’ fundamentals will be updated based on the primary vote share they receive. Currently the model uses the fundamental margin to get a rough two-party vote share, then adjusts based on each candidate’s endorsements, incumbency, and a few other things. The fourth candidate is a generic independent whose share comes from Alaska’s third-party index (how much a state has voted for third-party candidates in recent elections). In Utah, the Democrats decided to endorse independent candidate Evan McMullin, who got about 22% of the vote in Utah in 2016. So I treated McMullin almost as a Democrat, in the sense that he will get a lot of Democrats’ votes, and also as a major third-party candidate who will take some votes from Republicans. Louisiana has a jungle primary, so each candidate’s vote share is calculated based on their party, incumbency, and endorsements.
By the end of the election, the polling averages will carry the most weight of any category. They are calculated fairly simply.
Polls are given a weight based on how far away the election is, how long ago the poll was taken, how good the pollster is, the voter type, the sample size, and whether or not it was a party-sponsored poll. Recent polls hold their weight for about a week before they start dropping off. Pollsters are graded on their performance in recent elections, both relative to other pollsters and in raw error. Likely voter polls are given more weight than other voter types.
Polling percentages are adjusted based on pollster bias, party sponsorship, and the change in the political environment since the poll. Pollster bias is calculated by comparing a pollster’s results with other polls of the same elections and with the actual results of those elections. Some polls are sponsored by candidates, parties, or PACs, and these polls usually favor the candidate whose party ran the poll. Finally, polls are adjusted for the change in the generic ballot since the poll was conducted.
Lastly, all the polls are combined, using each poll’s adjusted percentages and weight, to output the polling average. The polling average’s weight in the model is based on the total weight of all the polls together.
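The weighting and adjustment steps can be sketched roughly as follows. Every constant here (the two-week half-life after the first week, the 0.7 and 0.6 multipliers, the 2-point sponsor shift, the 600-person reference sample) is an assumption chosen for illustration, and the `Poll` fields are hypothetical names, not the model's real inputs. It also works on margins rather than on each candidate's percentage, to keep the example short.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

SPONSOR_SHIFT = 2.0  # assumed points a sponsored poll favors its own party


@dataclass
class Poll:
    margin: float               # Democratic lead in points (negative = R lead)
    days_old: int
    sample_size: int
    pollster_grade: float       # hypothetical 0..1 quality score
    likely_voters: bool
    sponsor: Optional[str]      # "D", "R", or None
    house_effect: float         # pollster bias in points toward Democrats
    generic_ballot_then: float  # generic ballot margin when the poll was taken


def poll_weight(p: Poll) -> float:
    # Full weight for about a week, then an assumed exponential decay.
    recency = 1.0 if p.days_old <= 7 else 0.5 ** ((p.days_old - 7) / 14)
    size = math.sqrt(p.sample_size / 600)
    voter = 1.0 if p.likely_voters else 0.7
    sponsor = 0.6 if p.sponsor else 1.0
    return recency * size * p.pollster_grade * voter * sponsor


def adjusted_margin(p: Poll, generic_ballot_now: float) -> float:
    m = p.margin - p.house_effect
    if p.sponsor == "D":
        m -= SPONSOR_SHIFT
    elif p.sponsor == "R":
        m += SPONSOR_SHIFT
    # Shift by how much the national environment moved since the poll.
    return m + (generic_ballot_now - p.generic_ballot_then)


def polling_average(polls: List[Poll], generic_ballot_now: float) -> Tuple[float, float]:
    """Return (weighted average margin, total weight of all polls)."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    avg = sum(w * adjusted_margin(p, generic_ballot_now)
              for w, p in zip(weights, polls)) / total
    return avg, total
```

The total weight is returned alongside the average because the text says the polling category's weight in the model depends on it.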
I use the following experts for the expert ratings:
- The Cook Political Report
- Inside Elections
- Sabato’s Crystal Ball
I convert each expert’s rating into a win percentage, and then into a margin.
Then I combine all of the experts to create the expert ratings margin. States whose PVI is larger than the Solid R/D margin use their slightly adjusted fundamentals margin instead.
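One way to picture the rating-to-margin conversion is to map each rating to a win probability and then invert a normal distribution to get a margin. Both the probability table and the 6-point standard deviation below are assumptions made up for this sketch; the model's actual conversion may work differently.

```python
from statistics import NormalDist, mean

# Assumed Democratic win probabilities for each rating (illustrative only).
WIN_PROB = {
    "Solid D": 0.98, "Likely D": 0.85, "Lean D": 0.70, "Tilt D": 0.60,
    "Toss-up": 0.50,
    "Tilt R": 0.40, "Lean R": 0.30, "Likely R": 0.15, "Solid R": 0.02,
}
ASSUMED_SD = 6.0  # assumed uncertainty in the final margin, in points


def rating_to_margin(rating: str) -> float:
    """Margin (positive = Democratic) whose implied win probability matches the rating."""
    return ASSUMED_SD * NormalDist().inv_cdf(WIN_PROB[rating])


def expert_margin(ratings: list) -> float:
    """Simple unweighted average across experts."""
    return mean(rating_to_margin(r) for r in ratings)


# Three hypothetical ratings for one state.
print(round(expert_margin(["Lean D", "Toss-up", "Tilt D"]), 2))
```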
The final category is state similarity, which carries the least weight in the model. It takes the polling averages from the 10 most similar states and adjusts them back to the state in question. Similar states are determined by region, demographics, and partisanship. Each similar state’s polling average is compared to that state’s neutral margin (the fundamental margin minus the political environment), and a weighted average of those differences across the 10 similar states produces a state similarity index. The index is then combined with the state’s own neutral margin to output the state similarity margin.
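As a rough sketch of that calculation, assuming the index is a similarity-weighted average of how far each similar state's polling runs from its neutral margin, and that the index is simply added to the target state's neutral margin (the text does not spell out either detail):

```python
from typing import List, Tuple


def similarity_margin(neutral_margin: float,
                      neighbors: List[Tuple[float, float, float]]) -> float:
    """neighbors holds (similarity_weight, polling_average, neutral_margin) per similar state."""
    total_weight = sum(w for w, _, _ in neighbors)
    # How much polling over- or under-performs the neutral margin, on average.
    index = sum(w * (poll - neutral) for w, poll, neutral in neighbors) / total_weight
    return neutral_margin + index


# Two hypothetical similar states instead of the full 10, for brevity.
print(similarity_margin(4.0, [(2.0, 5.0, 3.0), (1.0, -1.0, 0.0)]))
```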
Combining the Data
The categories are then combined in a weighted average based on each category’s weight. This is displayed in the data smoothie graphic on each state’s page. Below is Georgia’s smoothie. The squares are sized according to each category’s weight in the model.
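The combination step itself is a plain weighted average. The margins and weights below are invented numbers to show the mechanics, not the values for Georgia or any other state.

```python
def combine(margins: dict, weights: dict) -> float:
    """Weighted average of category margins; weights need not sum to 1."""
    total = sum(weights[k] for k in margins)
    return sum(margins[k] * weights[k] for k in margins) / total


# Hypothetical category margins (points) and weights.
margins = {"fundamentals": 2.0, "polling": 4.0, "experts": 3.0, "similarity": 1.0}
weights = {"fundamentals": 0.30, "polling": 0.50, "experts": 0.15, "similarity": 0.05}
print(round(combine(margins, weights), 2))
```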
Simulating the Election
The model is run 10,000 times every day around 3 am. I run it 10,000 times so the results land as close as possible to the expected values.
In each simulation, a random number is generated to produce a national environment. Each state then generates its own number for that state’s environment. The two numbers are combined in a weighted average, based on a national correlation factor, to produce each state’s simulated number. That number is plugged into a t-distribution with 10 degrees of freedom to give each candidate a simulated vote percentage. The candidates are then sorted by simulated vote percentage to determine the winner. All the seats won are added to the seats not up for election to get that simulation’s outcome. The states are ordered from most Democratic to most Republican to find the tipping-point state of that simulation. This happens 10,000 times to give the win percentages for each state and for the Senate.
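Here is a simplified sketch of the simulation loop. It departs from the description above in a few labeled ways: it draws t-distributed shocks directly for the national and state environments and blends those, rather than blending raw random numbers and then mapping them through the t-distribution; it simulates a two-party margin instead of each candidate's vote percentage; and it skips the tipping-point step. The correlation factor, the swing scale, and the state margins are all assumed values.

```python
import math
import random

DF = 10                     # degrees of freedom, from the text
NATIONAL_CORRELATION = 0.5  # assumed share of each state's swing that is national
SWING_SCALE = 5.0           # assumed points of margin per unit of shock


def t_draw(rng: random.Random, df: int = DF) -> float:
    """Student-t draw: standard normal over sqrt(chi-square / df)."""
    chi_square = rng.gammavariate(df / 2, 2)
    return rng.gauss(0, 1) / math.sqrt(chi_square / df)


def simulate(projected: dict, safe_dem_seats: int, n_sims: int = 10_000, seed: int = 0):
    """projected maps state -> projected margin (positive = Democratic)."""
    rng = random.Random(seed)
    dem_wins = {state: 0 for state in projected}
    seat_counts = []
    for _ in range(n_sims):
        national = t_draw(rng)  # one national environment per simulation
        seats = safe_dem_seats  # Democratic seats not up for election
        for state, margin in projected.items():
            local = t_draw(rng)  # state-specific environment
            shock = NATIONAL_CORRELATION * national + (1 - NATIONAL_CORRELATION) * local
            if margin + SWING_SCALE * shock > 0:
                dem_wins[state] += 1
                seats += 1
        seat_counts.append(seats)
    win_prob = {state: wins / n_sims for state, wins in dem_wins.items()}
    return win_prob, seat_counts


# Hypothetical projected margins for three made-up states.
example_probs, example_seats = simulate({"A": 3.0, "B": -1.5, "C": 0.5}, safe_dem_seats=36)
print(example_probs)
```

Because the national shock is shared by every state within a simulation, states tend to miss in the same direction, which is the point of the correlation factor.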
Alaska has an unusual election, so it gets its own simulation. The state uses ranked choice voting to determine the winner. There will be four candidates in the general election, and if no one reaches 50%, 4th place’s votes are reallocated to the top 3. If someone still hasn’t reached 50%, 3rd place’s votes are reallocated to the final two to give us a winner. The model does the same thing. The first thing the model does in each simulation is randomize the matrix that determines where votes go when someone is eliminated. It starts with a base matrix and then randomly changes each cell of the matrix. For example, let's say the independent comes in 4th after the first round in a simulation, and the base matrix allocates 30% to Murkowski, 50% to Tshibaka, and 20% to the Democrat. The matrix gets randomized and could output something like 39% Murkowski, 38% Tshibaka, and 23% for the Democrat. These percentages are multiplied by the Independent’s vote % and added to the first-round % of the top 3, and the exact same is done for the final 2. For the tipping point, if the final 2 are both Republicans, Alaska is considered the most Republican state because there is no Democrat in the final round.
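The elimination-and-transfer logic can be sketched like this. Only the Independent's base row (30/50/20) comes from the example above; the first-round shares, the other base rows, and the size of the random perturbation are hypothetical. The sketch also assumes every eliminated candidate's votes transfer to someone, with no exhausted ballots.

```python
import random


def randomize_row(base_row: dict, rng: random.Random, spread: float = 0.10) -> dict:
    """Nudge each transfer share by up to +/- spread, then renormalize to sum to 1."""
    noisy = {c: max(0.0, s + rng.uniform(-spread, spread)) for c, s in base_row.items()}
    total = sum(noisy.values())
    if total == 0:
        return dict(base_row)
    return {c: v / total for c, v in noisy.items()}


def run_rcv(first_round: dict, base_transfers: dict, rng: random.Random,
            spread: float = 0.10) -> str:
    """Eliminate last place and reallocate until someone has a majority."""
    # Randomize the whole transfer matrix once at the start of the simulation.
    transfers = {out: randomize_row(row, rng, spread) for out, row in base_transfers.items()}
    votes = dict(first_round)
    while len(votes) > 1:
        leader = max(votes, key=votes.get)
        if votes[leader] > sum(votes.values()) / 2:
            return leader
        loser = min(votes, key=votes.get)
        pool = votes.pop(loser)
        # Keep only the shares going to candidates still in the race.
        row = {c: transfers[loser][c] for c in votes}
        row_total = sum(row.values())
        for c in votes:
            share = row[c] / row_total if row_total > 0 else 1 / len(votes)
            votes[c] += pool * share
    return next(iter(votes))


# Hypothetical first-round vote shares (percent).
first_round = {"Murkowski": 35.0, "Tshibaka": 38.0, "Democrat": 17.0, "Independent": 10.0}
base_transfers = {
    "Independent": {"Murkowski": 0.30, "Tshibaka": 0.50, "Democrat": 0.20},  # from the text
    "Democrat": {"Murkowski": 0.70, "Tshibaka": 0.10, "Independent": 0.20},  # hypothetical
    "Murkowski": {"Tshibaka": 0.50, "Democrat": 0.30, "Independent": 0.20},  # hypothetical
    "Tshibaka": {"Murkowski": 0.70, "Democrat": 0.10, "Independent": 0.20},  # hypothetical
}
print(run_rcv(first_round, base_transfers, random.Random(0)))
```

Passing `spread=0.0` turns off the randomization, which makes it easy to check the reallocation arithmetic by hand.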
That wraps it up for the 2022 Senate Forecast Methodology. Thanks for taking the time to read, and I hope you keep up with the model throughout the cycle. If you really enjoy my work, please contribute to the site.