I play a car racing video game on my tablet computer and I have collected data from 224 races. In the game, two cars are matched up head-to-head and race over a quarter-mile. I would like to determine the relative impact of certain factors on the likelihood of winning the race.
A data file is available at http://csr.datamustflow.com/csr_racing_times.csv . This file contains the columns listed below. In the file, a 'y' means 'yes/present' and 'n' means 'no/not present'.
Looking forward to any feedback you can give me on the best way to go about this analysis.
Edit: A new file with the race pairings is now available at http://csr.datamustflow.com/csr_times_paired.csv
Data fields:
Make_Model: name of the car
E: Engine upgrade that can be purchased with in-game cash; if present, should lower race time
N: Nitrous oxide upgrade; if present, should lower race time
T: Tire upgrade; if present, should lower race time
B: "Blogger" flag; if present, losing the race does not cost you any "game points"; not expected to have any effect on race time
win: whether the car won the race or not
time: The race time over a quarter-mile; lower time = faster race
me: whether it was me racing the car in that race or not
PP: An integer that describes the overall performance of the car; a higher number means a better, faster car
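To make the layout concrete, here is a minimal sketch of how rows with these fields could be read into typed values before any modelling. It assumes the CSV header names match the field names above exactly, which I have not verified against the file; the two sample rows are invented for illustration only and are not taken from my data, and `parse_row` and `load_races` are hypothetical helper names.

```python
import csv
import io

# Hypothetical sample rows, invented purely to show the layout; not real race data.
# Assumption: the header names match the field list in the post.
SAMPLE = """Make_Model,E,N,T,B,win,time,me,PP
Car A,y,n,y,n,y,12.345,y,300
Car B,n,n,y,n,n,12.900,n,295
"""

# Columns stored as 'y' (yes/present) or 'n' (no/not present).
FLAG_COLUMNS = ("E", "N", "T", "B", "win", "me")


def parse_row(row):
    """Convert one CSV record (a dict of strings) into typed values."""
    parsed = {"Make_Model": row["Make_Model"]}
    for col in FLAG_COLUMNS:
        # 'y' becomes True, anything else becomes False.
        parsed[col] = row[col].strip().lower() == "y"
    parsed["time"] = float(row["time"])  # quarter-mile time; lower is faster
    parsed["PP"] = int(row["PP"])        # overall performance rating
    return parsed


def load_races(text):
    """Parse CSV text into a list of typed row dictionaries."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


races = load_races(SAMPLE)
for race in races:
    print(race)
```

With the real file, the same helpers would apply to the downloaded text instead of `SAMPLE`, provided the header assumption holds.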
win reflects whether you won the race, and all your opponents' info is included in each race's row (in separate columns after the columns with your info) for use in predicting whether you beat your opponent that race. Also, for feature selection tips, check out this question. – Nick Stauner Mar 14 '14 at 11:35