I have a simple use case: as part of an sklearn pipeline, I want to generate the labels y from the features X (e.g. to predict some signal in X, n timesteps in the future).

I want this to be a pipeline step because the labels should be based on the transformed features, so the final pipeline would look something like this:

  1. Transform features X into X'
  2. Generate labels y based on (past) features of X'
  3. Fit a model based on X' and y

Based on the sklearn docs, there is no official way to do this. Are there any workarounds?

Raven
  • 131

1 Answer

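Simply create a separate Pipeline() that performs step 2 (X' is written as X2 in the code), then fit the model on X2 and the labels it produces:

from sklearn.pipeline import Pipeline

# Step 2: generate the labels from the transformed features X2
label_gen_pipe = Pipeline(...)
y = label_gen_pipe.fit_transform(X2)

# Step 3: fit the model on X2 and the generated labels
model.fit(X2, y)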
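
For illustration, here is a minimal end-to-end sketch of the three steps on toy data. Everything specific in it is an assumption and not part of the question: the random-walk data, the `StandardScaler` feature step, the `LinearRegression` model, the hypothetical helper `future_signal`, and the horizon `N_AHEAD`. The label is taken to be column 0 of X2, n timesteps ahead, and the last n rows (which have no future value) are dropped before fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

N_AHEAD = 2  # hypothetical horizon: predict the signal 2 timesteps ahead


def future_signal(X2, n=N_AHEAD):
    """Return column 0 of X2 shifted n steps into the future (NaN where unknown)."""
    y = np.full(X2.shape[0], np.nan)
    y[:-n] = X2[n:, 0]
    return y


# Toy time series (assumption): 50 timesteps, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)).cumsum(axis=0)

# Step 1: transform X into X2
feature_pipe = Pipeline([("scale", StandardScaler())])
X2 = feature_pipe.fit_transform(X)

# Step 2: generate labels from X2 with a separate pipeline
label_gen_pipe = Pipeline([("shift", FunctionTransformer(future_signal))])
y = label_gen_pipe.fit_transform(X2)

# Step 3: drop the trailing rows that have no label, then fit the model
mask = ~np.isnan(y)
model = LinearRegression().fit(X2[mask], y[mask])
```

Keeping label generation outside the main pipeline avoids the limitation that sklearn transformers in a `Pipeline` only transform X, not y; the trade-off is that the row filtering (the `mask` here) has to be applied to X2 and y by hand.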

amiando
  • 43