I'm building a logistic regression model but am unsure whether the following is valid:
I'm trying to predict whether a person will pay for a high-cost hotel, defined as hotel_spend > 250. I know columns such as flight_spend and vehicle_spend are acceptable inputs to the model, but I'm unsure whether I can use total_spend, since it contains hotel_spend, which is used to create the target. This is highlighted by the last row, where hotel_spend equals total_spend and both exceed 250. My instinct is that I shouldn't do this, because I'd effectively be using hotel_spend to predict whether someone spends a certain amount on a hotel.
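To make the concern concrete: in every sample row, total_spend is exactly flight_spend + hotel_spend + vehicle_spend. A model that sees total_spend alongside the two allowed columns can therefore rebuild hotel_spend exactly, and the target is just a threshold on that rebuilt value. Below is a minimal Python sketch that checks this on the sample rows. It assumes total_spend is that sum, and the helper names (implied_hotel_spend, leaks_target) are hypothetical.

```python
# Sample rows from the question: (flight_spend, hotel_spend, vehicle_spend, total_spend)
ROWS = [
    (20, 49, 33, 102),
    (0, 59, 0, 59),
    (65, 100, 40, 205),
    (150, 250, 50, 450),
    (0, 300, 0, 300),
]


def implied_hotel_spend(flight_spend, vehicle_spend, total_spend):
    """Recover hotel_spend from the candidate inputs, ASSUMING
    total_spend = flight_spend + hotel_spend + vehicle_spend."""
    return total_spend - flight_spend - vehicle_spend


def leaks_target(rows):
    """True if hotel_spend (and hence the label) can be rebuilt exactly
    from flight_spend, vehicle_spend and total_spend for every row."""
    return all(implied_hotel_spend(f, v, t) == h for f, h, v, t in rows)


print(leaks_target(ROWS))  # prints True
```

If this check holds on real data, total_spend is not just correlated with the target. It encodes the exact quantity the label is derived from.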
I'm looking for advice on whether this is acceptable. I don't think it should be done, but I'd like other opinions.
flight_spend  hotel_spend  vehicle_spend  total_spend  hotel_spend_high_spend_label
          20           49             33          102                             0
           0           59              0           59                             0
          65          100             40          205                             0
         150          250             50          450                             1
           0          300              0          300                             1
… hotel_spend_high_spend_label. I think using flight_spend and vehicle_spend is OK, but not total_spend, as it contains information about the target. The point is to find people with characteristics similar to those who got a positive hotel_spend_high_spend_label, so that we can send them promotions etc. I hope this frames my question better. – Fungie Feb 06 '18 at 22:01
… total_spend would be incorrect? – Fungie Feb 06 '18 at 22:52
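For the stated goal of scoring customers who resemble the high-spend group, one minimal sketch is a logistic regression trained only on flight_spend and vehicle_spend, with total_spend left out. It is written in plain Python (no scikit-learn assumed) on the five sample rows. The learning rate, epoch count and L2 strength are arbitrary illustrative choices, and with five rows the fitted probabilities only demonstrate the mechanics.

```python
import math

# Sample rows from the question: (flight_spend, vehicle_spend, label).
# total_spend is deliberately excluded because it contains hotel_spend.
DATA = [
    (20, 33, 0),
    (0, 0, 0),
    (65, 40, 0),
    (150, 50, 1),
    (0, 0, 1),
]


def standardize(columns):
    """Scale each feature column to zero mean and unit variance; return rows."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        scaled.append([(x - mean) / std for x in col])
    return [list(row) for row in zip(*scaled)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Batch gradient descent on mean log loss + (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The intercept is not penalised; the weights are.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


flight = [r[0] for r in DATA]
vehicle = [r[1] for r in DATA]
labels = [r[2] for r in DATA]
X = standardize([flight, vehicle])
w, b = fit_logistic(X, labels)
scores = predict_proba(X, w, b)
for (f, v, y), p in zip(DATA, scores):
    print(f"flight={f:>3} vehicle={v:>3} label={y} p_high_hotel={p:.3f}")
```

Rows 2 and 5 both have flight_spend = vehicle_spend = 0 but different labels, so any model restricted to these two features must give them the same score. Only hotel_spend, directly or through total_spend, could separate them, which is the leakage in question. The small L2 penalty keeps the weights finite, since on these rows unpenalised gradient descent would keep growing them.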