Feature engineering is the process of using domain knowledge of the data to create features for machine learning models. This tag is meant for both theoretical and practical questions about feature engineering, excluding questions that ask for code, which are off-topic on CrossValidated.
Questions tagged [feature-engineering]
757 questions
11
votes
2 answers
Understanding Feature Hashing
Wikipedia provides the following example when describing feature hashing, but the mapping does not seem consistent with the dictionary defined.
For example, "to" should be converted to 3 according to the dictionary, but it is encoded as 1 instead.
Is…
Josh
- 4,448
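For the feature hashing question above, a minimal sketch of the hashing trick may help show why a token's column need not match its position in a dictionary: the index comes from a hash of the token, taken modulo the number of buckets. The function names `hash_index` and `hash_features`, the choice of MD5, the bucket count of 8, and the sample sentence are all assumptions made for illustration; this is not the Wikipedia example itself.

```python
import hashlib


def hash_index(token: str, n_buckets: int) -> int:
    """Map a token to a bucket using a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the bucket count.
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_features(tokens, n_buckets: int = 8):
    """Build a count vector without storing any token-to-index dictionary."""
    vec = [0] * n_buckets
    for tok in tokens:
        vec[hash_index(tok, n_buckets)] += 1  # colliding tokens share a bucket
    return vec


# Hypothetical sample document, chosen only for illustration.
doc = "john likes to watch movies".split()
vec = hash_features(doc, n_buckets=8)
print(vec)
```

With a small number of buckets, distinct tokens can collide, which is the usual trade-off of hashing against an explicit dictionary.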
3
votes
0 answers
Optional Features
I'm trying to get my head around what I call "optional features" but since I don't know their proper name in statistics I can't find any information about them. Essentially, I'm looking at a problem where sometimes some of the features do not even…
skiman
- 31
1
vote
0 answers
How to handle features that rely on a category, where each category has a different set of features?
Dataset description
My dataset features are:
some features not important for this question
Price (target)
Collection (categorical feature, there are 1.8k collections)
latest 10 prices (time-series, latest price of the same collection)
up to 37…
Marcello
- 11
1
vote
0 answers
Feature engineering for license plate price data
Target = price.
Suppose I have license plates in the form {letter}-{number}.
For example, 'A-12345', or 'K-343'
The letter can be any letter from A to Z, and the number is 2 to 5 digits long.
Here are some of the features I am extracting, but I would like…
Fardin Ahsan
- 21
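For the license plate question above, the {letter}-{number} format lends itself to a handful of simple derived features. The sketch below is only one possibility: the list of features the asker actually extracts is cut off in the excerpt, so every feature name here (`letter_index`, `n_digits`, `is_palindrome`, and so on) is a hypothetical choice, as is the function name `plate_features`.

```python
def plate_features(plate: str) -> dict:
    """Derive illustrative features from a plate such as 'A-12345' or 'K-343'."""
    letter, number = plate.split("-")
    return {
        "letter": letter,
        # Position of the letter in the alphabet, with A = 0.
        "letter_index": ord(letter.upper()) - ord("A"),
        "n_digits": len(number),
        "number": int(number),
        "n_unique_digits": len(set(number)),
        # Reads the same forwards and backwards, e.g. '343'.
        "is_palindrome": number == number[::-1],
        # At least one pair of identical adjacent digits, e.g. '1223'.
        "has_adjacent_repeat": any(a == b for a, b in zip(number, number[1:])),
    }


features = plate_features("K-343")
print(features)
```

Whether any of these carries signal for price is an empirical question; the code only shows how such columns could be produced.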
1
vote
2 answers
Hiding features from your client
I have to automate a yes/no type business decision problem for a customer (think: Is the use of chemical compound X beneficial in combination with chemicals A,B,C?). He dumped on me a very large dataset that contains all the data that I need (and…
l7ll7
- 1,275
1
vote
1 answer
Useful Representation of Continuous and Nominal variables
I want to develop a prediction model (e.g. using SVM, neural networks, etc.) to predict the relationship between a protein and its DNA target. Each protein is represented using ~100 continuous [-infinity,+infinity] numerical variables + one…
Abbas
- 505
0
votes
0 answers
Help with generalizing a formula for historical fencing match factors
I am writing a small application to aggregate historical fencing match results. This data is then used to calculate a couple of factors:
Effectiveness: simply shows how many of your matches are won:
won matches
___________
all…
0
votes
1 answer
Feature Engineering: High cardinality
I was reviewing an ML notebook in which part of the EDA looks at the cardinality of categorical variables.
In the notebook as prepared there was no strange result, but what if an attribute has a very high cardinality? For example, if a dataset of 10000…
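For the high-cardinality question above, one common first step is to measure how many distinct values a categorical column has relative to its length, and then to pool rare levels into a single bucket. The sketch below assumes plain Python lists rather than any particular dataframe library; the helper names `cardinality_ratio` and `group_rare`, the `min_count` threshold, and the `"__other__"` label are hypothetical choices for illustration.

```python
from collections import Counter


def cardinality_ratio(values):
    """Fraction of distinct values; close to 1.0 means almost every row is unique."""
    return len(set(values)) / len(values)


def group_rare(values, min_count=2, other="__other__"):
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical toy column, chosen only for illustration.
column = ["a", "a", "b", "c", "a", "b"]
ratio = cardinality_ratio(column)
grouped = group_rare(column, min_count=2)
print(ratio, grouped)
```

Pooling rare levels is only one option; target encoding or hashing are alternatives, each with its own trade-offs.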