In a data analysis challenge (so I don't control the data I am given), I have a dataset with the price of a commodity in two locations (let's say Japan and Korea) on different days, to be related to other data, let's say the prices of oil and iron on the international market.
Typical rows look like this:
DAY_ID | COUNTRY | PRICE | OIL_PRICE | IRON_PRICE | ...
1 | JPN | 4.35 | 7.22 | 6.55 |
1 | KOR | 5.32 | 7.22 | 6.55 |
2 | JPN | 3.51 | 6.38 | 4.27 |
As you can see, the price of iron is the same for the first two rows since they are for the same day. Also, the data are incomplete: some rows are missing, meaning I can have the row for a given day in Japan but not in Korea.
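To make the missing-row pattern concrete, here is a minimal sketch with pandas, using only the three example rows above. The names `df`, `wide` and `both` are mine, not part of the challenge data.

```python
import pandas as pd

# The three example rows shown above.
df = pd.DataFrame({
    "DAY_ID": [1, 1, 2],
    "COUNTRY": ["JPN", "KOR", "JPN"],
    "PRICE": [4.35, 5.32, 3.51],
    "OIL_PRICE": [7.22, 7.22, 6.38],
    "IRON_PRICE": [6.55, 6.55, 4.27],
})

# One row per DAY_ID, one PRICE column per country.
# A missing (day, country) row shows up as NaN.
wide = df.pivot(index="DAY_ID", columns="COUNTRY", values="PRICE")

# Days for which both countries are observed.
both = wide.dropna()

print(wide)
print(both)
```

In this layout, day 2 has a price for JPN and a NaN for KOR, which is exactly the incompleteness described above.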
My problem
The DAY_ID is just an identification and does not reflect any chronological order. At the moment I don't know what to do with it, so I just drop this column and then train my regression model.
However, I feel that I am erasing information I could use, since the prices in Japan and Korea on the same day are correlated.
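One way to check whether dropping DAY_ID really discards information is to fit the regression without it and then look at whether the residuals of the two countries on the same day are correlated. The sketch below does this on synthetic data: every number in it (coefficients, noise levels, the shared `day_shock`) is an assumption made up for illustration, not something taken from the challenge data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days = 500

# Synthetic stand-in for the real data (all values are made up).
oil = rng.uniform(4, 8, n_days)
iron = rng.uniform(4, 7, n_days)
day_shock = rng.normal(0, 0.5, n_days)  # unobserved effect shared by both countries

rows = []
for day in range(n_days):
    for country, offset in (("JPN", 0.0), ("KOR", 1.0)):
        price = (offset + 0.6 * oil[day] + 0.2 * iron[day]
                 + day_shock[day] + rng.normal(0, 0.3))
        rows.append((day, country, price, oil[day], iron[day]))
df = pd.DataFrame(
    rows, columns=["DAY_ID", "COUNTRY", "PRICE", "OIL_PRICE", "IRON_PRICE"]
)

# Pooled ordinary least squares that ignores DAY_ID entirely.
X = np.column_stack([
    np.ones(len(df)),
    (df["COUNTRY"] == "KOR").to_numpy(dtype=float),
    df["OIL_PRICE"].to_numpy(),
    df["IRON_PRICE"].to_numpy(),
])
y = df["PRICE"].to_numpy()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
df["RESID"] = y - X @ beta

# Pair the residuals by DAY_ID; days with a missing country are dropped.
resid_wide = df.pivot(index="DAY_ID", columns="COUNTRY", values="RESID").dropna()
same_day_corr = resid_wide["JPN"].corr(resid_wide["KOR"])
print(same_day_corr)
```

If the same-day residual correlation is clearly positive on the real data, the model without DAY_ID is leaving shared day-level structure unexplained.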
How to use the DAY_ID column?
(JPN,KOR) values and the model allows for the vector error terms on any given day to be correlated. The [tag:duplicate-records] tag looks inappropriate to me. – whuber Aug 01 '23 at 14:35
DAY_ID is just an identification and does not reflect any chronological order" How can you have time series data where the time is unknown? – Sextus Empiricus Aug 03 '23 at 08:29
PRICE as a function of the price of everything else depending on the date and location. – EtienneBfx Aug 03 '23 at 16:16
DAY_ID – EtienneBfx Aug 03 '23 at 16:17
DAY_ID then every data but COUNTRY are the same. They represent the price on a international market. – EtienneBfx Aug 03 '23 at 16:18
PRICE compared to the oil price and iron price? – Dave Aug 07 '23 at 13:45
PRICE is the target, the price of the final commodity. We want to understand it with respect to 32 features like for example the price of the oil or iron on international market. – EtienneBfx Aug 07 '23 at 21:19
PRICE, not a vector of multiple prices (such as those of oil and iron). Or do you have a different interpretation of the earlier comment about what the PRICE column is? – Dave Aug 08 '23 at 14:31
PRICE for different countries, or at least there is a repetition of the DAY_ID variable, which allows a regression that is different from standard OLS (I assume that a mixed effects regression could deal with it). – Sextus Empiricus Aug 08 '23 at 15:12
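Following the mixed-effects suggestion in the last comment, here is a minimal sketch of a random-intercept model with DAY_ID as the grouping variable, using `statsmodels`. It is only one possible reading of that suggestion. The data are synthetic, and all coefficients, noise levels and the 10% missing-row rate are assumptions made up for illustration; only two of the 32 features are included.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_days = 300

# Synthetic stand-in for the real data (all values are made up).
oil = rng.uniform(4, 8, n_days)
iron = rng.uniform(4, 7, n_days)
day_shock = rng.normal(0, 0.5, n_days)  # unobserved day-level effect

rows = []
for day in range(n_days):
    for country, offset in (("JPN", 0.0), ("KOR", 1.0)):
        if rng.random() < 0.1:
            continue  # mimic rows that are missing for one country
        price = (offset + 0.6 * oil[day] + 0.2 * iron[day]
                 + day_shock[day] + rng.normal(0, 0.3))
        rows.append((day, country, price, oil[day], iron[day]))
df = pd.DataFrame(
    rows, columns=["DAY_ID", "COUNTRY", "PRICE", "OIL_PRICE", "IRON_PRICE"]
)

# Random intercept per DAY_ID: rows sharing a day share one latent shift,
# which induces correlation between the JPN and KOR errors on that day.
model = smf.mixedlm(
    "PRICE ~ C(COUNTRY) + OIL_PRICE + IRON_PRICE",
    data=df,
    groups=df["DAY_ID"],
)
result = model.fit()

day_variance = float(result.cov_re.iloc[0, 0])   # estimated variance of the day effect
oil_coef = float(result.fe_params["OIL_PRICE"])  # fixed-effect slope for oil
print(result.summary())
```

Here DAY_ID is used only as a grouping label, so the fact that it carries no chronological order does not matter, and days with a single observed country can still be included in the fit.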