I am using one-hot encoding to transform my categorical variable, but it's not just a presence-absence situation. Think of the variable as a device that comes in different brands, each with its own model numbers. For example, it can be Sony 10, Sony 10.5, LG 2000, or LG 3200. The brands differ, and the model numbers have their own range within each brand.
What I did was something like this. I convert:
---------------------------
| Index | Device    |
---------------------------
| 0     | Sony,10   |
| 1     | Sony,10.5 |
| 2     | LG,2000   |
| 3     | LG,3200   |
to:
-----------------------------
| Index | Dev_Sony | Dev_LG |
-----------------------------
| 0     | 10       | 0      |
| 1     | 10.5     | 0      |
| 2     | 0        | 2000   |
| 3     | 0        | 3200   |
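For concreteness, the conversion above could be reproduced roughly as follows. This is only a sketch: it assumes pandas and a single `Device` column holding "Brand,Number" strings, and the variable names (`parts`, `dummies`, `encoded`) are made up for illustration.

```python
import pandas as pd

# Toy data from the question: brand and model number share one string column.
df = pd.DataFrame({"Device": ["Sony,10", "Sony,10.5", "LG,2000", "LG,3200"]})

# Split "Brand,Number" into two columns and make the number numeric.
parts = df["Device"].str.split(",", expand=True)
brand = parts[0]
number = parts[1].astype(float)

# One-hot encode the brand, then multiply each dummy row by its model number,
# so each brand column holds the model number where the brand matches, else 0.
dummies = pd.get_dummies(brand, prefix="Dev", dtype=float)
encoded = dummies.mul(number, axis=0)[["Dev_Sony", "Dev_LG"]]

print(encoded)
```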
Question: I am using multiple linear regression. With the above encoding, the model numbers (e.g. 10 vs 10.5) are useful when comparing devices of the same brand, but I'm not sure they are meaningful when compared across brands. So, I was wondering if there is a better way of encoding such data.
UPDATE
Based on the answer, my dataframe would look like this:
--------------------------------------------
| Index | Dev_Sony | Dev_LG | Model_Number |
--------------------------------------------
| 0     | 1        | 0      | 10           |
| 1     | 1        | 0      | 10.5         |
| 2     | 0        | 1      | 2000         |
| 3     | 0        | 1      | 3200         |
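A minimal sketch of how this updated layout might be built, assuming pandas and assuming brand and model number have already been separated into two columns (the `Device` / `Model_Number` input frame here is illustrative, not taken from the original data):

```python
import pandas as pd

# Brand and model number kept as separate columns (assumed input layout).
df = pd.DataFrame({
    "Device": ["Sony", "Sony", "LG", "LG"],
    "Model_Number": [10, 10.5, 2000, 3200],
})

# Plain 0/1 dummies for the brand; the model number stays a numeric column.
updated = pd.get_dummies(df, columns=["Device"], prefix="Dev", dtype=int)
updated = updated[["Dev_Sony", "Dev_LG", "Model_Number"]]

print(updated)
```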
Python, if I understand your other post correctly, it should be modelled as "conditional variables": y ~ Device + Device:Model_Number? and why Device/Model_num rather than Device*Model_num? Can you give me a textbook reference where I can read more about this? – towi_parallelism Jan 27 '20 at 17:22

Dev/Num (which can be read Dev, and within Dev, Num) expands into Dev + Dev:Num. A good discussion is in https://www.springer.com/gp/book/9780387954578 – kjetil b halvorsen Jan 27 '20 at 18:45

… Update based on your answer. And then, I'll have Dev and Dev*Num in the formula. – towi_parallelism Feb 12 '20 at 05:13

… Dev*Num when the dummy variable is 1). My point is since they are collinear, we can then drop one of them, say the dummy variable itself, and we end up using only the interaction term, which is exactly what I used in the first place (the second dataframe in the question) – towi_parallelism Feb 13 '20 at 18:26
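One way to look at the collinearity point raised in the comments is to write out the columns of Dev + Dev:Num by hand and check the rank of the resulting matrix. The sketch below assumes NumPy and uses the four toy rows from the question; it builds the columns manually with one indicator per brand and no global intercept (an assumption made for simplicity, not the output of any formula library). On these toy rows the four columns come out linearly independent, which suggests the brand dummies would act as per-brand intercepts rather than duplicate the interaction columns, while keeping only the interaction columns would force each brand's line through the origin.

```python
import numpy as np

# Toy rows from the question: two Sony devices, two LG devices.
num = np.array([10.0, 10.5, 2000.0, 3200.0])
sony = np.array([1.0, 1.0, 0.0, 0.0])
lg = 1.0 - sony

# Dev + Dev:Num written out by hand:
# one intercept column and one slope column per brand.
X_nested = np.column_stack([sony, lg, sony * num, lg * num])

# Interaction columns only (the first encoding in the question):
# a slope per brand, but no per-brand intercept.
X_interaction_only = X_nested[:, 2:]

rank_nested = np.linalg.matrix_rank(X_nested)
rank_interaction = np.linalg.matrix_rank(X_interaction_only)

print("columns:", X_nested.shape[1], "rank:", rank_nested)
print("columns:", X_interaction_only.shape[1], "rank:", rank_interaction)
```

If the rank equals the number of columns, no column is a linear combination of the others. With only two rows per brand this is just a check on the toy data, not a statement about any real dataset.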