Sentiment analysis using Hidden Markov Model











I have a list of reviews; each element of the list is a review from the IMDB data set on Kaggle. There are 25,000 reviews in total. Each review has a label: +1 for positive and -1 for negative.



I want to train a Hidden Markov Model with these reviews and labels.



1- What is the sequence that I should give to the HMM? Is it something like bag of words, or something else, like probabilities that I need to calculate? What kind of feature extraction method is appropriate? I was told to use bag of words on the list of reviews, but when I searched a little I found out that an HMM cares about order, while bag of words doesn't preserve the order of the words in a sequence. How should I prepare this list of reviews so that I can feed it into an HMM?



2- Is there a framework for this? I know hmmlearn, and I think I should use MultinomialHMM; correct me if I'm wrong. But it is not supervised: its models do not take labels as input for training, and I get some strange errors that I don't know how to solve, because of my first question about the correct type of input. seqlearn is one I found recently; is it good, or is there a better one to use?



I appreciate any guidance since I have almost zero knowledge about NLP.
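On the first question, one order-preserving alternative to bag of words is to map each word to an integer ID and keep the IDs in their original order, so that each review becomes a sequence of discrete observations. The following is only a minimal sketch under that assumption: the helper names (`tokenize`, `build_vocab`, `encode`) and the toy reviews are hypothetical, and the tokenizer is deliberately naive.

```python
import re

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(reviews):
    """Give each distinct word an integer ID; ID 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for review in reviews:
        for word in tokenize(review):
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(review, vocab):
    """Turn a review into a list of word IDs, keeping the original word order."""
    return [vocab.get(word, 0) for word in tokenize(review)]

# Toy data standing in for the real list of reviews (assumption).
reviews = ["A great movie", "Not a great plot"]
vocab = build_vocab(reviews)
sequences = [encode(r, vocab) for r in reviews]

# Many HMM libraries expect all sequences concatenated, plus their lengths.
flat = [word_id for seq in sequences for word_id in seq]
lengths = [len(seq) for seq in sequences]
```

Whether a given library wants `flat` and `lengths` in exactly this shape is something to check in its documentation; hmmlearn, for example, takes a `lengths` argument alongside the concatenated observations.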










  • HMMs are used when you need to assign one label for each item in a sequence. In sentiment analysis, you assign a single label to the whole sequence (the review), so HMMs are not very appropriate for this task. Instead, you can turn to a Naive Bayes classifier (as in this blog post). Both HMMs and Naive Bayes can be learned either in a supervised setting or in an unsupervised setting (you specify the number of labels, and usually use the Expectation-Maximization algorithm to learn them without supervision).
    – mcoav
    yesterday










  • Indeed, that is what I found out too: you give a label to each item in a sequence. But this is a project for my class and I must use an HMM; I can't use anything else. I know how an HMM works at an abstract level, but I can't map my little knowledge of HMMs to this problem. Thanks for the feedback.
    – leo
    yesterday
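If an HMM is mandatory, one common way to get a single label per review is to keep one HMM per class and assign the review to the class whose model gives it the higher likelihood. The sketch below shows only the scoring step, using the forward algorithm in log space. All parameters are made up for illustration (three observation symbols, two hidden states per model); in practice each model would be fitted on the reviews of its own class, which is not shown here, and equal class priors are assumed.

```python
import math

NEG_INF = float("-inf")

def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM.

    start[i], trans[i][j] and emit[i][o] are ordinary probabilities.
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify(obs, models):
    """Return the label whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: forward_loglik(obs, *models[label]))

# Hypothetical symbols: 0 = "good", 1 = "bad", 2 = "movie".
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
models = {
    +1: (start, trans, [[0.7, 0.1, 0.2], [0.3, 0.1, 0.6]]),  # positive reviews
    -1: (start, trans, [[0.1, 0.7, 0.2], [0.1, 0.3, 0.6]]),  # negative reviews
}

label = classify([0, 2, 0], models)  # "good movie good"
```

With these toy parameters the positive model emits "good" more readily in every state, so `label` comes out as +1.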















python nlp sentiment-analysis hidden-markov-models






asked yesterday by leo (edited yesterday)













1 Answer
I was able to do it somehow, with surprisingly good accuracy, yet I am not sure what exactly happened. I used the seqlearn framework, whose documentation is sparse. I would really suggest using MATLAB instead of Python for HMMs.



I used sklearn TfidfVectorizer for feature extraction, then I did this:



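import numpy as np
import seqlearn.hmm
from sklearn.feature_extraction.text import TfidfVectorizer

# train_review / test_review: lists of raw review strings; Y: training labels (+1 / -1)
vectorizer = TfidfVectorizer(norm=None)
x_train = vectorizer.fit_transform(train_review)
x_test = vectorizer.transform(test_review)

# lengths arrays for seqlearn; "//" keeps the repeat count an integer on Python 3
len_train_seq = np.array([[1,1]]*(len(train_review)//2)) # this part was really annoying
len_test_seq = np.array([1]*len(test_review))

model = seqlearn.hmm.MultinomialHMM()
HMM_Classifier = model.fit(x_train, Y, lengths = len_train_seq)
y_predict = HMM_Classifier.predict(x_test, lengths=len_test_seq)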


I would still appreciate it if someone knowledgeable about HMMs gave a more robust and clean guideline for doing sentiment analysis with an HMM.
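One possible reading of why the code above works: if, as the `lengths` arrays suggest, every review is passed as a sequence of length 1, then no transition between hidden states is ever used when decoding. Picking the best state for a single observation reduces to maximizing prior times emission probability, which is the Naive Bayes decision rule. The sketch below illustrates that reduction on toy data with plain word counts; it is not seqlearn's implementation, the function names are hypothetical, and the answer's code uses TF-IDF weights rather than counts.

```python
import math
from collections import Counter

def train(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed per-class word log-probabilities."""
    vocab = sorted({word for doc in docs for word in doc})
    log_prior, log_emit = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(word for doc in class_docs for word in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        log_emit[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_emit

def predict(doc, log_prior, log_emit):
    """Length-1 'sequence': choose the state maximizing log prior + log emission."""
    def score(c):
        # Words never seen in training are skipped.
        return log_prior[c] + sum(log_emit[c][w] for w in doc if w in log_emit[c])
    return max(log_prior, key=score)

# Toy tokenized reviews and labels (assumption, not the IMDB data).
docs = [["great", "fun"], ["great", "acting"], ["boring", "plot"], ["boring", "bad"]]
labels = [1, 1, -1, -1]
log_prior, log_emit = train(docs, labels)

prediction = predict(["great", "plot", "fun"], log_prior, log_emit)
```

Seen this way, the sequential part of the HMM is not contributing anything in that setup, which would be consistent with the comment under the question that a per-review label is a poor fit for a model that labels each item in a sequence.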






answered yesterday by leo (edited yesterday)






























             
