To achieve your goal, two tasks need to be solved:
- rank different variables according to their "direction" (whether lower or higher values should obtain the best rank)
- combine multiple ranks into a single Rank column
Obtaining variable ranks
For the first task I suggest cloning the original DataFrame and converting all variables to the same direction:
- make sure to handle NaN values according to column and task logic
- columns like Car Age can be left intact since they already follow "the lower the better" direction
- columns like MPG should be inverted (assuming no zero values) to follow "the lower the better" direction
- categorical columns like Defects can be converted to ordered categorical type by providing the order of the options from best to worst: pd.Categorical(cars_df.Defects, ordered=True, categories=['No', 'Unknown', 'Yes'])
After that you can call cars_for_ranking_df.rank(axis=0) to obtain ranks for each variable. Check method argument options in pd.DataFrame.rank() documentation to choose how you prefer the ties to be handled.
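The steps above can be sketched roughly as follows. The sample values are made up for illustration; only the column names come from the question. As an assumption of this sketch, the ordered categorical is replaced by its integer codes, so that every column is numeric before ranking and the result does not depend on how rank() treats mixed dtypes.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
cars_df = pd.DataFrame({
    "Car Age": [3, 10, 1, 7],
    "MPG": [30.0, 45.0, 25.0, 38.0],
    "Defects": ["No", "Yes", "Unknown", "No"],
})

# Work on a copy so the original data stays untouched.
cars_for_ranking_df = cars_df.copy()

# Car Age already follows "the lower the better", so it is left as is.

# MPG follows "the higher the better": invert it (assumes no zero values).
cars_for_ranking_df["MPG"] = 1 / cars_for_ranking_df["MPG"]

# Defects: order the options from best to worst, then use the integer
# codes (0 = best) so that the column is numeric.
defects = pd.Categorical(
    cars_df["Defects"], ordered=True, categories=["No", "Unknown", "Yes"]
)
cars_for_ranking_df["Defects"] = defects.codes

# Rank every variable separately; ties get the average rank by default.
variable_ranks = cars_for_ranking_df.rank(axis=0)
print(variable_ranks)
```

Note that missing values in a categorical column get the code -1, which would rank them as the best option, so handle NaN values before this step.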
Combining ranks
This task has multiple possible solutions, and the right choice will depend on your task. Even then it might be subjective: is a new car with known defects better than an old (but fuel-efficient) car? Depends on who you ask.
One possible way to obtain final ranks would be to average the ranks of all variables for each car, and then rank the obtained averages again:
final_ranks = cars_for_ranking_df.rank().mean(axis=1).rank(method='dense')
method='dense' is used to make sure final ranks are consecutive (1, 2, 3, ...) even when some cars are tied.
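To see what method='dense' changes, here is a small sketch with a tie. The per-variable ranks below are made-up numbers (one row per car), not results from the question's data.

```python
import pandas as pd

# Hypothetical per-variable ranks, one row per car (lower is better).
variable_ranks = pd.DataFrame({
    "Car Age": [2.0, 4.0, 1.0, 3.0],
    "MPG": [3.0, 1.0, 4.0, 2.0],
    "Defects": [1.5, 4.0, 3.0, 1.5],
})

# Average the ranks of all variables for each car.
mean_ranks = variable_ranks.mean(axis=1)

# Cars 0 and 3 have the same average, so they are tied.
final_dense = mean_ranks.rank(method="dense")  # no gap after the tie
final_min = mean_ranks.rank(method="min")      # gap after the tie

print(pd.DataFrame({"mean": mean_ranks, "dense": final_dense, "min": final_min}))
```

With method='dense' the tied cars share rank 1 and the next car gets rank 2, whereas method='min' would skip to rank 3.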
Other options have been mentioned in these questions: