Questions tagged [feature-engineering]

Feature engineering is the process of using domain knowledge of the data to create features for machine learning models. This tag is meant for both theoretical and practical questions about feature engineering. Questions asking for code are excluded, as they are off-topic on CrossValidated.

757 questions
11
votes
2 answers

Understanding Feature Hashing

Wikipedia provides the following example when describing feature hashing, but the mapping does not seem consistent with the dictionary defined. For example, "to" should be converted to 3 according to the dictionary, but it is encoded as 1 instead. Is…
Josh
  • 4,448
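To make the distinction in this question concrete, here is a minimal sketch of the hashing trick. It assumes an MD5-based hash and 8 buckets, and the names `hashed_index` and `hash_features` are hypothetical; none of these choices comes from the Wikipedia example. The point it illustrates is that, under feature hashing, a token's index is computed by the hash function modulo the vector length rather than looked up in a dictionary, so it need not match any dictionary numbering.

```python
import hashlib


def hashed_index(token: str, n_buckets: int) -> int:
    """Map a token to a bucket index with a stable hash (MD5 is an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the vector length.
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_features(tokens, n_buckets: int = 8):
    """Build a count vector of fixed length without storing a vocabulary."""
    vec = [0] * n_buckets
    for tok in tokens:
        vec[hashed_index(tok, n_buckets)] += 1  # colliding tokens share a bucket
    return vec


# Illustrative input sentence; the resulting indices depend entirely on the hash.
tokens = "John likes to watch movies".split()
vec = hash_features(tokens)
```

Python's built-in `hash()` is avoided here because string hashes are randomized per process by default, which would make the indices differ between runs.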
3
votes
0 answers

Optional Features

I'm trying to get my head around what I call "optional features" but since I don't know their proper name in statistics I can't find any information about them. Essentially, I'm looking at a problem where sometimes some of the features do not even…
skiman
  • 31
1
vote
0 answers

How to handle features that rely on a category, where each category has a different set of features?

Dataset description. My dataset features are: some features not important for this question; Price (target); Collection (categorical feature, there are 1.8k collections); latest 10 prices (time series, latest price of the same collection); up to 37…
1
vote
0 answers

Feature engineering for license plate price data

Target = price. Suppose I have license plates in the form {letter}-{number}, for example 'A-12345' or 'K-343'. The letter can be any letter from A to Z, and the number is 2 to 5 digits long. Here are some of the features I am extracting, but I would like…
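As a rough sketch of the parsing step this question describes, the following splits a plate of the form {letter}-{number} and derives a few candidate features. The question's own feature list is truncated, so the features below (`n_digits`, `n_unique_digits`, `is_palindrome`) and the function name `plate_features` are hypothetical illustrations, not the asker's actual features.

```python
import re

# One uppercase letter, a hyphen, then 2 to 5 digits, as described in the question.
PLATE_RE = re.compile(r"^([A-Z])-([0-9]{2,5})$")


def plate_features(plate: str) -> dict:
    """Extract simple, illustrative features from a plate such as 'A-12345'."""
    match = PLATE_RE.match(plate)
    if match is None:
        raise ValueError(f"unexpected plate format: {plate!r}")
    letter, digits = match.groups()
    return {
        "letter": letter,                         # categorical
        "number": int(digits),                    # numeric value of the digits
        "n_digits": len(digits),                  # 2 to 5
        "n_unique_digits": len(set(digits)),      # fewer unique digits means more repetition
        "is_palindrome": digits == digits[::-1],  # e.g. '343'
    }


features = plate_features("K-343")
```

Whether any of these features helps predict price is an empirical question that this sketch does not address.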
1
vote
2 answers

Hiding features from your client

I have to automate a yes/no type business decision problem for a customer (think: Is the use of chemical compound X beneficial in combination with chemicals A,B,C?). He dumped on me a very large dataset that contains all the data that I need (and…
l7ll7
  • 1,275
1
vote
1 answer

Useful Representation of Continuous and Nominal variables

I want to develop a prediction model (e.g. using SVM, neural networks, etc.) to predict the relationship between a protein and its DNA target. Each protein is represented using ~100 continuous [-infinity,+infinity] numerical variables + one…
Abbas
  • 505
0
votes
0 answers

Help with generalizing a formula for historical fencing matches factors

I am writing a small application to aggregate the results of historical fencing matches. This data is then used to calculate a couple of factors. Effectiveness: simply shows how many of your matches are won: won matches / all…
0
votes
1 answer

Feature Engineering: High cardinality

I was reviewing an ML notebook in which part of the EDA looks at the cardinality of categorical variables. As the notebook was prepared, there was no strange result, but what if an attribute has a very high cardinality? For example, if a dataset of 10000…
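The kind of cardinality check this question mentions can be sketched in a few lines. The example below assumes the data is a list of dictionaries and uses only the standard library; the function name `cardinality_report` and the sample columns are hypothetical. It reports, for each categorical column, the number of distinct values and their ratio to the number of rows, which is one simple way to spot an attribute whose cardinality is close to the dataset size.

```python
def cardinality_report(rows, columns):
    """Return the distinct-value count and distinct/rows ratio for each column."""
    n_rows = len(rows)
    report = {}
    for col in columns:
        n_unique = len({row[col] for row in rows})
        ratio = n_unique / n_rows if n_rows else 0.0
        report[col] = {"n_unique": n_unique, "ratio": ratio}
    return report


# Tiny made-up sample: 'user_id' is unique per row, 'color' is not.
rows = [
    {"user_id": "u1", "color": "red"},
    {"user_id": "u2", "color": "blue"},
    {"user_id": "u3", "color": "red"},
    {"user_id": "u4", "color": "red"},
]
report = cardinality_report(rows, ["user_id", "color"])
```

A ratio near 1.0 means almost every row has its own category. What to do about such a column is the open question being asked and is not settled by this check.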