
I'm trying to get my head around what I call "optional features", but since I don't know their proper name in statistics I can't find any information about them. Essentially, I'm looking at a problem where some of the features sometimes do not even make sense. For example, if each individual has a feature "oldest son's age", what happens if a particular individual does not have a son? How can such a feature be used for regression or density estimation?

Sorry about the naive question. Just knowing what this kind of feature is called in statistics or machine learning would be enormously helpful.

skiman
  • It depends. If there is only one reason a feature could be missing, then you could just use your software's missing-value code for those instances. Then, if an analysis requires the oldest son's age, you use only the non-missing values (and the analysis applies only to that subpopulation); but if you're analyzing whether or not they have a son, you count the number of missing and non-missing values (possibly for each level of some other factors). But if you are concerned about why values are missing, you need another variable in the dataset with a code for why each one is missing. – Russ Lenth Aug 14 '14 at 02:29
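A minimal sketch of the approach described in the comment, in plain Python (standard library only). It assumes `None` plays the role of the software's missing-value code; the records, the field names, and the extra `why_missing` reason variable are all hypothetical illustrations, not part of the original question.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: oldest_son_age is None (the missing-value code) when absent.
# why_missing is a hypothetical extra variable coding WHY the value is absent:
# "no_son" means the feature does not apply; "not_reported" means truly unknown.
records = [
    {"id": 1, "region": "north", "oldest_son_age": 12.0, "why_missing": None},
    {"id": 2, "region": "north", "oldest_son_age": None, "why_missing": "no_son"},
    {"id": 3, "region": "south", "oldest_son_age": 30.0, "why_missing": None},
    {"id": 4, "region": "south", "oldest_son_age": None, "why_missing": "not_reported"},
    {"id": 5, "region": "south", "oldest_son_age": 18.0, "why_missing": None},
]


def sons_age_values(rows):
    """Keep only non-missing ages; results describe that subpopulation only."""
    return [r["oldest_son_age"] for r in rows if r["oldest_son_age"] is not None]


def missing_counts_by(rows, factor):
    """Count missing vs. present oldest_son_age for each level of another factor."""
    counts = Counter()
    for r in rows:
        status = "missing" if r["oldest_son_age"] is None else "present"
        counts[(r[factor], status)] += 1
    return counts


def reasons_missing(rows):
    """Tabulate the reason codes of the rows whose value is missing."""
    return Counter(r["why_missing"] for r in rows if r["oldest_son_age"] is None)


if __name__ == "__main__":
    print(mean(sons_age_values(records)))  # mean age among individuals with a recorded son's age
    print(missing_counts_by(records, "region"))
    print(reasons_missing(records))
```

Note that the missing/present counts answer "does this individual have a son?" only when "no son" is the sole reason for missingness; once a second reason such as `not_reported` appears, the reason-code variable is what separates "not applicable" from "unknown".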
