I am working on an attrition dataset that has a large number of categorical features, each with high cardinality, so one-hot encoding them is out of the question. I was looking for models that can handle high-cardinality categorical data directly and came across CatBoost and LightGBM. CatBoost is working as expected. With LightGBM, however, I'm unable to use my categorical features. The following lines are from the official LightGBM documentation, and I am struggling to understand them.
LightGBM can use categorical features as input directly. It doesn’t need to convert to one-hot encoding, and is much faster than one-hot encoding (about 8x speed-up).
Note: You should convert your categorical features to int type before you construct Dataset.
How do I convert nominal data to int?!
If I follow the documentation and pass my DataFrame as-is, I get the following error:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in the following fields: Business, Segment Desc, Family Desc, Class Desc, Job Desc, Site Tag, City Desc, Employee Group, Gender, Marital Status, Award Desc, Shift Schedule
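For context, here is roughly what I gather the docs intend: map each string category to an integer code before building the Dataset. This is a minimal sketch with made-up data (the column names mirror two from my error message; whether `cat.codes` is the recommended conversion is exactly what I'm unsure about):

```python
import pandas as pd

# Toy frame with the same kind of object-dtype columns as my data
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Marital Status": ["Single", "Married", "Single", "Divorced"],
    "attrition": [0, 1, 0, 1],
})

cat_cols = ["Gender", "Marital Status"]

# Replace each string column with its integer category codes
for col in cat_cols:
    df[col] = df[col].astype("category").cat.codes

# The frame is now all-integer, so it should satisfy the dtype check;
# presumably I'd then pass categorical_feature=cat_cols when constructing
# lgb.Dataset so LightGBM treats the codes as categories, not numbers
print(df.dtypes)
```

Is this the right way to do it, or does it lose information compared to what CatBoost does with raw strings?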