I recently started working with sklearn and often find myself creating new features (with K-bins discretization, various encoders, etc.).
What I noticed, though, is that it is very difficult to create new features systematically (i.e. to have a general approach to creating them). For example, I was recently analyzing the Titanic dataset on Kaggle (https://www.kaggle.com/c/titanic/data).
Let's assume, for the sake of example, that I split the age into n bins and find that passengers in the bin covering ages 20 to 25 are much more likely to survive than those in the other age bins, while the other age bins are uncorrelated with survival.
In this case it does make sense to split the age into bins, since the 20-25 age bin gives me more insight into the data.
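To make this concrete, here is a minimal sketch of the binning step. The rows are made up for illustration (they are not the real Titanic data); `Age` and `Survived` follow the Kaggle column names, and the choice of 4 uniform-width bins is arbitrary. Note that uniform bins will not generally line up with a 20-25 range; that would need explicit bin edges.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Made-up toy rows for illustration only, NOT the real Titanic data.
df = pd.DataFrame({
    "Age":      [4, 22, 23, 24, 21, 35, 41, 58, 63, 30],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})

# Split Age into 4 equal-width bins; "ordinal" returns the bin index per row.
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
df["AgeBin"] = binner.fit_transform(df[["Age"]]).ravel().astype(int)

# Survival rate and number of passengers per age bin.
survival_by_bin = df.groupby("AgeBin")["Survived"].agg(["mean", "count"])

print(binner.bin_edges_[0])  # the bin boundaries learned from the data
print(survival_by_bin)
```

Comparing the `mean` column across bins is what would tell me whether one age range stands out, and the `count` column shows how many passengers that impression is based on.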
Once I start thinking like this, however, the possibilities are endless: how about creating an additional feature for people who are aged 20 to 25 and come from a certain port? How about a feature for people who are aged 20 to 25, come from a certain port, belong to a certain social class, and are male?
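A rough sketch of the kind of combined features I mean, using plain boolean masks in pandas. The rows are again made up; the column names (`Age`, `Embarked`, `Pclass`, `Sex`) follow the Kaggle dataset, while the specific port `"S"`, class `3`, and the new feature names are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up toy rows for illustration only, NOT the real Titanic data.
df = pd.DataFrame({
    "Age":      [22, 24, 38, 21, 25, 54],
    "Embarked": ["S", "C", "S", "S", "S", "Q"],
    "Pclass":   [3, 1, 1, 3, 3, 2],
    "Sex":      ["male", "female", "male", "male", "female", "male"],
})

# Base condition: age between 20 and 25 (inclusive on both ends).
in_age_band = df["Age"].between(20, 25)
from_port_s = df["Embarked"] == "S"      # hypothetical "certain port"
third_class = df["Pclass"] == 3          # hypothetical "certain social class"
is_male = df["Sex"] == "male"

# Each new feature is a 0/1 flag; every extra condition is one more column.
df["Age20to25"] = in_age_band.astype(int)
df["Age20to25_PortS"] = (in_age_band & from_port_s).astype(int)
df["Age20to25_PortS_Class3_Male"] = (
    in_age_band & from_port_s & third_class & is_male
).astype(int)

print(df)
```

Every additional condition, and every alternative value for each condition, adds another possible column, which is exactly how the number of candidate features gets out of hand.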
Looking at some examples online (for this specific dataset and others), I saw that the usual approach is to inspect the data visually (making charts, etc.) and then think about which new features to create. But is there a general way of creating new features without ending up with a dataset that is far bigger than the original one?