How to find Top features from Naive Bayes using sklearn pipeline

Hi all,

I am trying to apply Naive Bayes (MultinomialNB) using pipelines, and I came up with the code below. I would like to find the top 10 positive and negative words, but I have not been able to. The code I found for extracting top features does not use a pipeline, and when I apply it to my pipeline I get the errors shown below. Could you please help me find the feature importance from the pipeline output?

    from sklearn import preprocessing
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Pipeline dictionary
    pipelines = {
        'bow_MultinomialNB': make_pipeline(
            CountVectorizer(),
            preprocessing.Normalizer(),
            MultinomialNB()
        )
    }

    # List tuneable hyperparameters of our pipeline
    pipelines['bow_MultinomialNB'].get_params()

    # BOW - MultinomialNB hyperparameters
    bow_MultinomialNB_hyperparameters = {
        'multinomialnb__alpha': [1000, 500, 100, 50, 10, 5, 1, 0.5, 0.1, 0.05,
                                 0.01, 0.005, 0.001, 0.0005, 0.0001]
    }

    # Create hyperparameters dictionary
    hyperparameters = {
        'bow_MultinomialNB': bow_MultinomialNB_hyperparameters
    }

    tscv = TimeSeriesSplit(n_splits=3)  # For time-based splitting

    for name, pipeline in pipelines.items():
        print("NAME:", name)
        print("PIPELINE:", pipeline)

    # Create empty dictionary called fitted_models
    fitted_models = {}

    # Loop through model pipelines, tuning each one and saving it to fitted_models
    for name, pipeline in pipelines.items():
        # Create cross-validation object from pipeline and hyperparameters
        model = GridSearchCV(pipeline, hyperparameters[name], cv=tscv, n_jobs=1, verbose=1)

        # Fit model on X_train, y_train (both defined earlier, not shown here)
        model.fit(X_train, y_train)

        # Store model in fitted_models[name]
        fitted_models[name] = model

        print(name, 'has been fitted.')

FEATURE IMPORTANCE:

    pipelines['bow_MultinomialNB'].steps[2][1].classes_

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-125-7d45b007e86b> in <module>()
    ----> 1 pipelines['bow_MultinomialNB'].steps[2][1].classes_

    AttributeError: 'MultinomialNB' object has no attribute 'classes_'


    pipelines['bow_MultinomialNB'].steps[0][1].get_feature_names()

    ---------------------------------------------------------------------------
    NotFittedError                            Traceback (most recent call last)
    <ipython-input-126-2883929221d1> in <module>()
    ----> 1 pipelines['bow_MultinomialNB'].steps[0][1].get_feature_names()

    ~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in get_feature_names(self)
        958     def get_feature_names(self):
        959         """Array mapping from feature integer indices to feature name"""
    --> 960         self._check_vocabulary()
        961
        962         return [t for t, i in sorted(six.iteritems(self.vocabulary_),

    ~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _check_vocabulary(self)
        301         """Check if vocabulary is empty or missing (not fit-ed)"""
        302         msg = "%(name)s - Vocabulary wasn't fitted."
    --> 303         check_is_fitted(self, 'vocabulary_', msg=msg),
        304
        305         if len(self.vocabulary_) == 0:

    ~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
        766
        767     if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
    --> 768         raise NotFittedError(msg % {'name': type(estimator).__name__})
        769
        770

    NotFittedError: CountVectorizer - Vocabulary wasn't fitted.


    x = pipelines['bow_MultinomialNB'].steps[0][1]._validate_vocabulary()
    x.get_feature_names()

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-120-f620c754a34e> in <module>()
    ----> 1 x.get_feature_names()

    AttributeError: 'NoneType' object has no attribute 'get_feature_names'

Regards,
Shree

edited Nov 12 at 2:18

asked Nov 11 at 20:17

premgnc1983

Is there a reason you're looking at the pipelines object instead of the fitted model?
– Jarad
Nov 12 at 3:38

Either way, it did not work. I am saving each fitted model with fitted_models[name] = model, as in the code above. I am just interested in getting the lines that raise the errors to work.
– premgnc1983
Nov 12 at 12:46
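
The point raised in the comment above can be sketched as follows. This is a minimal illustration, not a confirmed fix for the asker's data. It assumes that GridSearchCV fits clones of the pipeline, which would explain why the objects stored in `pipelines` stay unfitted, and that the refit pipeline is reachable through `best_estimator_`. The toy documents, labels, `cv=2` and the helper `top_features` are hypothetical. Ranking words by `feature_log_prob_` is only one common heuristic for "top" features in Naive Bayes.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Toy data, purely illustrative (not the asker's dataset).
docs = [
    "good great excellent", "great good fine", "excellent good happy",
    "bad awful terrible", "terrible bad poor", "awful poor bad",
]
labels = np.array([1, 1, 1, 0, 0, 0])

pipeline = make_pipeline(CountVectorizer(), Normalizer(), MultinomialNB())
search = GridSearchCV(pipeline, {"multinomialnb__alpha": [0.1, 1.0]}, cv=2)
search.fit(docs, labels)

# GridSearchCV works on clones, so the original pipeline object is still unfitted.
original_is_fitted = hasattr(pipeline.named_steps["multinomialnb"], "classes_")

# The fitted steps live on the refit best estimator.
best = search.best_estimator_
vectorizer = best.named_steps["countvectorizer"]
clf = best.named_steps["multinomialnb"]

# get_feature_names_out exists in scikit-learn >= 1.0; older releases
# (like the one in the traceback) only have get_feature_names.
if hasattr(vectorizer, "get_feature_names_out"):
    feature_names = np.asarray(vectorizer.get_feature_names_out())
else:
    feature_names = np.asarray(vectorizer.get_feature_names())


def top_features(clf, feature_names, class_label, n=10):
    """Return the n words with the highest log P(word | class) (hypothetical helper)."""
    row = list(clf.classes_).index(class_label)
    order = np.argsort(clf.feature_log_prob_[row])[::-1][:n]
    return feature_names[order].tolist()


top_positive = top_features(clf, feature_names, class_label=1, n=3)
top_negative = top_features(clf, feature_names, class_label=0, n=3)
print("original pipeline fitted:", original_is_fitted)
print("top positive words:", top_positive)
print("top negative words:", top_negative)
```

Applied to the question's code, the same lookups would presumably go through `fitted_models['bow_MultinomialNB'].best_estimator_` rather than `pipelines['bow_MultinomialNB']`.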

scikit-learn pipeline feature-extraction naivebayes
