I have a data set containing a number of different variables that relate to the performances of footballers in matches. Examples include Accurate Passes/90 mins, Crosses/90 mins and Headers/90 mins.

Rather than taking the actual value in each column, I have replaced it with a rank. So, given there are 100 footballers in the data set, the player with the highest value in the Accurate Passes/90 mins column is assigned the number 1, the player with the second highest value is assigned 2, the third highest 3, and so on all the way to 100.
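A minimal sketch of that rank conversion, in plain Python. The helper name `rank_descending` and the sample values are made up for illustration, and ties are broken by original order, which is an assumption the description above does not address.

```python
def rank_descending(values):
    """Return ranks where the highest value gets rank 1.

    Ties are broken by original position (an assumption for this sketch).
    """
    # Indices sorted from highest value to lowest; sorted() is stable.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Made-up Accurate Passes/90 mins values for four players.
passes_per_90 = [41.2, 55.0, 38.7, 60.3]
pass_ranks = rank_descending(passes_per_90)
print(pass_ranks)
```

With 100 players the same function would return ranks from 1 to 100 for each column.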

For each position, I have taken a subset of the variables. For example, for defenders, I'm only interested in how many tackles and clearances they've made and not how many shots they've had.

My question is to do with how to weight these variables, as some are more important than others when evaluating a player. For example, for strikers, although I am interested in how many passes they make, the number of goals they've scored is much more important and should be weighted accordingly.

For each position, I have a player that I know is the best in that position and I would like to assign appropriate weightings to the relevant variables to help achieve this. Each player will be assigned a number (let's call it x) which is the sum of the weighting times the variable rank. My aim is for x to be the lowest number for the best player in each position. So:

$x_\alpha = w_1 V_1 + w_2 V_2 + w_3 V_3 + w_4 V_4 + \dots$

where

$0 \lt w_i \lt 1$, $i = 1, 2, 3, 4, \dots$

and, since each $V_j$ is a rank between 1 and 100,

$1 \le V_j \le 100$, $j = 1, 2, 3, 4, \dots$
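To make the scoring concrete, here is a small sketch of the weighted sum for one position. The weights, variable names and player ranks are all hypothetical placeholders, not values derived from any data.

```python
def weighted_rank_score(ranks, weights):
    """Sum of weight * rank over the variables chosen for a position."""
    return sum(weights[name] * ranks[name] for name in weights)


# Hypothetical striker weights, each strictly between 0 and 1.
striker_weights = {"goals": 0.9, "shots": 0.5, "passes": 0.2}

# Hypothetical ranks (1 = best in that column).
players = {
    "Player A": {"goals": 1, "shots": 3, "passes": 30},
    "Player B": {"goals": 10, "shots": 1, "passes": 2},
}

scores = {
    name: weighted_rank_score(ranks, striker_weights)
    for name, ranks in players.items()
}
# The lowest score is treated as the best player.
best = min(scores, key=scores.get)
print(scores, best)
```

Using a different weight dictionary per position gives the position-dependent weighting described next.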

It must also be noted that the weighting of each variable can change depending on which position is being examined. For example, when looking at strikers, the goals weighting will be much higher than when looking at midfielders. This is because, although goals can be used to evaluate how good both midfielders and strikers are, it is much more important for a striker to score goals than a midfielder. Therefore the goals variable will carry more weight when examining strikers compared to midfielders.

What would be the best way of working out the weightings? Any help is much appreciated!

OD1995
    You do not describe any basis for a statistical solution. Statistics can't do much, if anything, with vague qualitative statements like "more important." It can help you obtain a solution if either (a) you have outcome data you can analyze or (b) you can provide quantitative information of some sort either about (b.1) combinations of factors that should be ranked equally or (b.2) combinations where the relative ranking is clear. See Keeney & Raiffa's work on making decisions with multiple objectives, for instance. BTW, converting all variable values to ranks is a poor start. – whuber Sep 09 '21 at 15:08

1 Answer

It sounds very much like what you want to do is weight the beta coefficient of an interaction term:

  1. Players belong to a given subset (striker, midfielder, forward, etc.).
  2. In the "ranking" equation, certain interactions would be weighted more heavily (e.g. the interaction between striker and number of goals).

To achieve this, you need to create dummy variables for each position (1 or 0). You then weight the beta coefficient of the variable with a given dummy variable.

So your equation for rank might be Y (rank) = X (number of goals) + X (position) + (X + W) (striker × number of goals), where each X stands for the coefficient on the term in parentheses and W is some weight.
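As a rough sketch of that equation, assuming each X is read as a coefficient on the term in parentheses: the function below builds the striker dummy, forms the interaction, and adds W to the interaction coefficient. The coefficient names and numeric values are hypothetical and chosen only to show the mechanics.

```python
def striker_dummy(position):
    """Dummy variable: 1 if the player is a striker, otherwise 0."""
    return 1 if position == "striker" else 0


def interaction_score(goals_rank, position, b_goals, b_position, b_interaction, w):
    """Y = b_goals*goals + b_position*dummy + (b_interaction + w)*(dummy*goals)."""
    d = striker_dummy(position)
    interaction = d * goals_rank  # non-zero only for strikers
    return b_goals * goals_rank + b_position * d + (b_interaction + w) * interaction


# Hypothetical coefficients and weight.
coeffs = dict(b_goals=1.0, b_position=0.5, b_interaction=0.25, w=0.75)

striker_y = interaction_score(4, "striker", **coeffs)
midfielder_y = interaction_score(4, "midfielder", **coeffs)
print(striker_y, midfielder_y)
```

For the same goals rank, the extra weight only changes the striker's value, which is the point of weighting the interaction rather than the goals term itself.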

You can also do it as you suggested with some w(X) with each coefficient being weighted. It just depends on how you feel the weight should be applied.

The real trick to making this work well is deciding what W will be. So how can you choose it? The problem with weighting beta coefficients is that the weights imply some prior distribution, which you do not know. You could always do things the Bayesian way.

JWH2006
  • Great! Thank you very much for your help. How do you mean I could do it the Bayesian way? Would you have any useful links that might show me how? Thanks! – OD1995 May 21 '17 at 20:35