I'm using sklearn's `Pipeline` and `GridSearchCV` to run a grid search on a text classification problem, as follows:
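
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier

    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', SGDClassifier())])
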
I would like to concatenate another matrix to the result of the second step (i.e. the bag-of-words structure as a SciPy sparse matrix), so that the classifier is applied to the combined features. Following [this][1] question, I did something like:

    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('add_feature', add_features(sp.csr_matrix(features_train.values))),
                         ('clf', SGDClassifier())])

where the `add_features` class used in the `'add_feature'` step is defined as:

    class add_features(object):
        def __init__(self, features):
            self.features = features

        def transform(self, X, **transform_params):
            # hstack is assumed to be scipy.sparse.hstack
            return hstack([X, self.features])

        def fit(self, X, y=None, **fit_params):
            return self

Running `GridSearchCV(text_clf, parameters).fit(data, labels)` gives the following error:

    for key, value in six.iteritems(step.get_params(deep=True)):
    AttributeError: 'add_features' object has no attribute 'get_params'

What did I do wrong? How should `get_params` be added? Thanks!
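
For reference, here is a minimal sketch of what I understand `get_params` support to look like, assuming that inheriting from `sklearn.base.BaseEstimator` (which provides `get_params` and `set_params` based on the `__init__` signature) is an acceptable route. The class name `AddFeatures` and the tiny matrices are made up for illustration, and I have not run this inside the full grid search above:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin, clone


class AddFeatures(BaseEstimator, TransformerMixin):
    """Append a fixed sparse feature matrix to the incoming matrix."""

    def __init__(self, features):
        # Store the argument unchanged under the same name so the
        # inherited get_params()/set_params() can find it.
        self.features = features

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step can be chained.
        return self

    def transform(self, X):
        # X and self.features must have the same number of rows.
        return sp.hstack([X, self.features]).tocsr()


# Made-up stand-ins for the tf-idf output and the extra features.
bow = sp.csr_matrix(np.eye(3))
extra = sp.csr_matrix(np.array([[1.0], [2.0], [3.0]]))

step = AddFeatures(extra)
params = step.get_params(deep=True)      # the call that failed above
combined = step.fit(bow).transform(bow)  # 3 rows, 3 + 1 columns
cloned = clone(step)                     # cloning relies on get_params
```

One caveat with this sketch: `hstack` needs matching row counts, so a feature matrix fixed at construction time may not line up with the subsets of rows that cross-validation passes to `transform`.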