I have a binary classifier trained on both numeric and categorical variables. In a given month, new data will come in, amounting to roughly 5% of the training sample's observation count. It takes a long time to observe the true binary outcome for these incoming observations, but I know all of the right-hand-side variables (features) of each observation right away; this is a classic out-of-sample classification problem.
I make predictions on these data using the binary classifier. I would like a means of quantifying how well these new observations are represented in the training dataset, taking into account both categorical and numeric features. Can anyone recommend a methodology that would yield a score for how well represented an observation is in the development data?
For example, say $X_1$ and $X_2$ are features. In the development data, these variables' magnitudes are usually inversely correlated. If new observations come in with an atypical association between the two, say positively correlated, that atypical association should contribute to the "anomaly score".
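To make the request concrete, here is a minimal sketch of the kind of score I am after. It assumes scikit-learn's `IsolationForest` fit on the development features with categoricals one-hot encoded; the column names `x1`, `x2`, and `seg` are made up for illustration, and I am not committed to this particular algorithm (its splits are axis-aligned, so it only flags the correlation flip insofar as those points land in sparse regions of the joint distribution).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Toy development data: x1 and x2 are inversely correlated, plus one categorical.
x1 = rng.normal(size=n)
x2 = -0.8 * x1 + 0.3 * rng.normal(size=n)
train = pd.DataFrame({"x1": x1, "x2": x2, "seg": rng.choice(["A", "B", "C"], size=n)})

# Preprocess mixed feature types: scale numerics, one-hot encode categoricals.
prep = ColumnTransformer([
    ("num", StandardScaler(), ["x1", "x2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["seg"]),
])

# Fit the representativeness model on development features only (no outcome needed).
scorer = Pipeline([
    ("prep", prep),
    ("iforest", IsolationForest(n_estimators=500, random_state=0)),
])
scorer.fit(train)

# New monthly batch: x1 and x2 move together, unlike the development data.
new = pd.DataFrame({"x1": [1.8, -1.5], "x2": [1.7, -1.4], "seg": ["A", "B"]})

# score_samples is higher for "typical" points; negate so larger = more anomalous.
anomaly_score = -scorer.score_samples(new)
print(anomaly_score)
```

Something along these lines gives one number per incoming observation, but I would welcome alternatives that handle mixed numeric/categorical data and joint (not just marginal) atypicality more directly.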