I'm using sklearn's Pipeline and GridSearchCV to run a grid search on a text classification problem, as follows:

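from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', SGDClassifier())])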

I would like to concatenate another matrix to the output of the second step (i.e. the bag-of-words structure, a scipy sparse matrix) before it is passed to the classifier. Following [this][1] question, I did something like:

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('add_feature', add_features(sp.csr_matrix(features_train.values))),
                     ('clf', SGDClassifier())])

where add_features (used in the 'add_feature' step) is defined as:

class add_features(object):

    def __init__(self, features):
        self.features = features

    def transform(self, X, **transform_params):
        return hstack([X, self.features])

    def fit(self, X, y=None, **fit_params):
        return self

Running GridSearchCV(text_clf, parameters).fit(data, labels) gives the following error:
"for key, value in six.iteritems(step.get_params(deep=True)): AttributeError: 'add_features' object has no attribute 'get_params'"

What did I do wrong? How should get_params be added? Thanks!
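
One possible way to get past the AttributeError, sketched here rather than taken from the original code: GridSearchCV relies on get_params to clone and reconfigure every pipeline step, and inheriting from sklearn.base.BaseEstimator supplies get_params and set_params derived from the __init__ arguments. The class name AddFeatures and the toy matrices are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin


class AddFeatures(BaseEstimator, TransformerMixin):
    """Append a fixed sparse matrix of extra columns to X (illustrative name)."""

    def __init__(self, features):
        # BaseEstimator.get_params reads attributes named after the
        # __init__ arguments, so the argument is stored unchanged.
        self.features = features

    def fit(self, X, y=None):
        # Nothing to learn; the extra columns are fixed at construction.
        return self

    def transform(self, X):
        # Requires X and self.features to have the same number of rows.
        return sp.hstack([X, self.features], format="csr")


# Made-up data standing in for the tf-idf output and the external features.
extra = sp.csr_matrix(np.array([[1.0], [2.0], [3.0]]))
bow = sp.csr_matrix(np.eye(3))

step = AddFeatures(extra)
params = step.get_params()      # available through BaseEstimator
combined = step.transform(bow)  # 3 rows, 3 + 1 columns
```

This only addresses the missing get_params. The stored matrix still has one row per sample of the full training set, so inside cross-validation, where X contains only the rows of one fold, the hstack call fails with a shape mismatch unless the extra features travel with the data being split.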

Eitan
  • I cannot add comments due to low reputation, but here there's a tutorial on concatenating heterogeneous features – Net_Raider Oct 07 '15 at 10:43
  • If you think your question is answered, please choose the best answer – Net_Raider Oct 15 '15 at 07:46
  • Well, eventually it did not solve my problem, as I'm using an external feature set, independent of the corpus data, together with the bag-of-words features. It seems that FeatureUnion can only be applied to classes based on the corpus data, unless I missed something. – Eitan Oct 15 '15 at 08:34
  • In fact, the point of the link is to use the custom class ItemSelector with the feature union. – Net_Raider Oct 15 '15 at 08:39
  • I did use it. It works fine in FeatureUnion for classification, but it doesn't work when applying cross-validated grid search. – Eitan Oct 15 '15 at 09:13
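
A different approach from the ItemSelector one discussed in these comments, given only as a hedged sketch: if the external features are stored in the same pandas DataFrame as the text, GridSearchCV slices both together for every fold, and sklearn's ColumnTransformer can route the text column through the bag-of-words steps while passing the other columns through unchanged. The column names, toy data, and parameter grid below are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy data: one text column plus two external numeric features.
data = pd.DataFrame({
    "text": ["good movie", "bad movie", "great film", "awful film",
             "good film", "bad plot", "great plot", "awful movie"],
    "feat_a": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "feat_b": [0.5, 0.1, 0.7, 0.2, 0.6, 0.3, 0.9, 0.1],
})
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Bag-of-words branch, same two steps as in the question.
bow = Pipeline([("vect", CountVectorizer()),
                ("tfidf", TfidfTransformer())])

features = ColumnTransformer([
    # A scalar column name hands the vectorizer a 1-D sequence of strings.
    ("bow", bow, "text"),
    # The external features are appended as they are.
    ("extra", "passthrough", ["feat_a", "feat_b"]),
])

text_clf = Pipeline([("features", features),
                     ("clf", SGDClassifier(random_state=0))])

# Nested parameter names follow the step names joined by double underscores.
parameters = {"features__bow__vect__ngram_range": [(1, 1), (1, 2)],
              "clf__alpha": [1e-4, 1e-3]}

search = GridSearchCV(text_clf, parameters, cv=2)
search.fit(data, labels)
```

Because every step here is a regular sklearn estimator, get_params works throughout, and the rows of the text and the external features stay aligned in each cross-validation split.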

0 Answers