
I want to predict the results of the parliamentary elections. My output will be the percentage of the vote each party receives. There are more than two parties, so (binary) logistic regression is not a viable option. I could fit a separate regression for each party, but then the results would be independent of each other, and nothing would ensure that they sum to 100%.

What regression (or other method) should I use? Is it possible to use this method in R or Python via a specific library?

Nitesh
Viktor

3 Answers


Robert is right: multinomial logistic regression is the best tool to use. However, you need an integer value representing the party as the dependent variable, for example:

1 = Conservative majority, 2 = Labour majority, 3 = Liberal majority, ... (and so on)

You can perform this in R using the nnet package. Here is a good place to quickly run through how to use it.
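Since the question also asks about Python, here is a minimal from-scratch sketch of multinomial (softmax) logistic regression trained by batch gradient descent. It is not the nnet implementation; the function names, the toy features, and the learning-rate and epoch settings are hypothetical. Python labels are coded from 0, so 0, 1, 2 stand in for the three parties above. The softmax step is what guarantees that the predicted probabilities across the parties sum to 1.

```python
import math


def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_multinomial(X, y, n_classes, lr=0.1, epochs=2000):
    """Fit multinomial logistic regression by batch gradient descent.

    X: list of feature rows; y: integer class labels 0..n_classes-1.
    Returns one weight row per class, with the bias stored last.
    """
    n_features = len(X[0]) + 1  # +1 for the bias term
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for row, label in zip(X, y):
            x = list(row) + [1.0]
            probs = softmax([sum(w * xi for w, xi in zip(W[k], x)) for k in range(n_classes)])
            for k in range(n_classes):
                # Gradient of the cross-entropy loss: (p_k - 1[k == label]) * x
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    grad[k][j] += err * x[j]
        for k in range(n_classes):
            for j in range(n_features):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict_proba(W, row):
    """Probability of each class for one feature row."""
    x = list(row) + [1.0]
    return softmax([sum(w * xi for w, xi in zip(Wk, x)) for Wk in W])


# Hypothetical training data: two numeric features per row and the party
# coded as 0 = Conservative, 1 = Labour, 2 = Liberal.
X = [[2.0, 0.0], [1.8, 0.2], [0.0, 2.0], [0.2, 1.8], [0.0, 0.0], [0.1, 0.1]]
y = [0, 0, 1, 1, 2, 2]

W = fit_multinomial(X, y, n_classes=3)
probs = predict_proba(W, [1.9, 0.1])
print([round(p, 3) for p in probs], sum(probs))  # the three probabilities sum to 1 (up to rounding)
```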

Kasra Manshaei

On what do you want to base your prediction? For my thesis, I tried to predict multiparty election results from the results of previous years: using this year's results from some polling stations, I predicted the results at all the other polling stations. The linear model I compared against estimated the number of votes each party would obtain by regressing on the votes from previous years. Once you have the estimated number of votes for every party, you can calculate the percentages from those. See Forecasts From Nonrandom Samples for the relevant paper, which extends the linear model.
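A minimal sketch of the simpler linear-model baseline described above, under stated assumptions: each party's votes at a polling station this year are modelled as a + b * (that party's votes at the same station in the previous election), fitted by ordinary least squares on the stations already counted. The fitted line predicts the uncounted stations, and the party totals are then converted into percentages. This is not the method from the cited paper, and the station data, party names, and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_shares(previous, current_reported):
    """previous: {party: [votes per station in the previous election]}
    current_reported: {party: {station_index: votes this year}} for counted stations.
    Returns {party: predicted percentage of the total vote}."""
    totals = {}
    for party, prev_votes in previous.items():
        reported = current_reported[party]
        idx = sorted(reported)
        a, b = fit_line([prev_votes[i] for i in idx], [reported[i] for i in idx])
        total = 0.0
        for i, pv in enumerate(prev_votes):
            # Use the actual count where available, else the regression estimate
            # (clipped at zero so a party never gets negative votes).
            total += reported[i] if i in reported else max(a + b * pv, 0.0)
        totals[party] = total
    grand = sum(totals.values())
    # Normalising by the grand total makes the percentages sum to 100.
    return {party: 100.0 * t / grand for party, t in totals.items()}


# Hypothetical data: 5 polling stations, stations 0-2 already counted.
previous = {"A": [100, 200, 150, 120, 180], "B": [80, 60, 90, 100, 70], "C": [20, 40, 30, 25, 35]}
current_reported = {
    "A": {0: 110, 1: 210, 2: 160},
    "B": {0: 70, 1: 50, 2: 80},
    "C": {0: 30, 1: 50, 2: 40},
}
shares = forecast_shares(previous, current_reported)
print({p: round(s, 1) for p, s in shares.items()})  # {'A': 59.3, 'B': 25.9, 'C': 14.8}
```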

Emre
Bas

This is not a regression but a multi-class classification problem. The output is typically the probability of each class for a given test instance (test row). So in your case, the output of the trained model for any given test row will be of the form:

prob_1, prob_2, prob_3,..., prob_k

where prob_i denotes the probability of the i-th class (in your case, the i-th party), assuming there are k classes in the response variable. Note that these k probabilities sum to 1. The predicted class is the one with the highest probability.
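A short sketch of that last step, assuming the trained model has already produced one probability vector per test row (the numbers and party labels below are made up): check that each row sums to 1, then take the class with the highest probability as the prediction.

```python
def predict_classes(prob_rows, class_names):
    """For each test row's probability vector, return the most probable class."""
    predictions = []
    for probs in prob_rows:
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each row of class probabilities should sum to 1")
        # Index of the largest probability (the first one wins on ties).
        best = max(range(len(probs)), key=lambda i: probs[i])
        predictions.append(class_names[best])
    return predictions


parties = ["Party 1", "Party 2", "Party 3"]  # hypothetical class labels
prob_rows = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
    [0.25, 0.25, 0.5],
]
print(predict_classes(prob_rows, parties))  # ['Party 2', 'Party 1', 'Party 3']
```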

There are many classifiers in R that handle multi-class classification. You could use multinomial logistic regression from the nnet package in R by invoking the multinom function.

As an alternative, you could use the gbm package in R and invoke the `gbm` function. To create a multi-class classifier, set `distribution="multinomial"` when calling `gbm`.
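For Python, scikit-learn offers counterparts to both R options; this sketch assumes scikit-learn is installed and uses a made-up toy dataset. `LogisticRegression` with its default lbfgs solver fits a multinomial (softmax) model when there are more than two classes, and `GradientBoostingClassifier` is scikit-learn's gradient boosting classifier (a different implementation from R's gbm). In both, `predict_proba` returns one probability per class for each row, and each row sums to 1.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features per row, party coded as 0, 1 or 2.
X = [[2.0, 0.0], [1.8, 0.2], [0.0, 2.0], [0.2, 1.8], [0.0, 0.0], [0.1, 0.1]]
y = [0, 0, 1, 1, 2, 2]

logit = LogisticRegression(max_iter=1000).fit(X, y)
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

new_row = [[1.9, 0.1]]
for model in (logit, gbm):
    probs = model.predict_proba(new_row)[0]  # one probability per party
    print(type(model).__name__, probs.round(3), "sum =", probs.sum())
```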

Nitesh