Sentiment analysis using Hidden Markov Model

I have a list of reviews; each element of the list is one review from the IMDB data set on Kaggle. There are 25,000 reviews in total, and each review has a label: +1 for positive and -1 for negative.

I want to train a Hidden Markov Model with these reviews and labels.

1- What sequence should I give to the HMM? Is it something like a bag of words, or something else, such as probabilities that I need to calculate? What kind of feature extraction method is appropriate? I was told to use bag of words on the list of reviews, but after searching a little I found out that an HMM cares about order, while bag of words does not preserve the order of words in a sequence. How should I prepare this list of reviews so that I can feed it into an HMM?

2- Is there a framework for this? I know hmmlearn, and I think I should use MultinomialHMM; correct me if I'm wrong. But it is not supervised: its models do not take labels as input for training, and I get some strange errors that I don't know how to solve, because of my first question about the correct type of input. seqlearn is one I found recently; is it good, or is there a better one to use?

I appreciate any guidance since I have almost zero knowledge about NLP.
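To make the ordering concern in point 1 concrete, here is a minimal sketch of the two representations. The helper names (`build_vocab`, `to_sequence`, `to_bag`) and the whitespace tokenization are assumptions made for illustration only; they do not come from hmmlearn or seqlearn.

```python
from collections import Counter

def build_vocab(reviews):
    """Map each distinct lowercase token to an integer id, in first-seen order."""
    vocab = {}
    for review in reviews:
        for token in review.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_sequence(review, vocab):
    """Ordered list of token ids: word order is preserved."""
    return [vocab[t] for t in review.lower().split() if t in vocab]

def to_bag(review, vocab):
    """Token-id counts: word order is discarded."""
    return Counter(to_sequence(review, vocab))

# Two toy reviews that use the same words in a different order.
reviews = ["not good but fun", "good but not fun"]
vocab = build_vocab(reviews)
seqs = [to_sequence(r, vocab) for r in reviews]
bags = [to_bag(r, vocab) for r in reviews]

print(seqs)                # the two ordered sequences differ
print(bags[0] == bags[1])  # the two bags are identical
```

The two toy reviews get different id sequences but identical bags, which is exactly the information a bag-of-words representation throws away and a sequence model could in principle use.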

edited yesterday

asked yesterday

leo

1618

HMMs are used when you need to assign one label for each item in a sequence. In sentiment analysis, you assign a single label to the whole sequence (the review), so HMMs are not very appropriate for this task. Instead, you can turn to a Naive Bayes classifier (as in this blog post). Both HMMs and Naive Bayes can be learned either in a supervised setting or in an unsupervised setting (you specify the number of labels, and usually use the Expectation-Maximization algorithm to learn them without supervision).
– mcoav
yesterday

Indeed, that is what I found out too: you give a label to each item in a sequence. But this is a project for my class and I must use an HMM; I can't use anything else. I know how an HMM works at an abstract level, but I can't map my limited knowledge of HMMs to this problem. Thanks for the feedback.
– leo
yesterday
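One common way to reconcile a single label per review with an HMM is to keep one HMM per class and pick the class whose model assigns the review the higher likelihood. The sketch below shows only that scoring step, using the forward algorithm. The parameters are hand-picked toy values (not learned from the IMDB data), and the names `log_forward`, `classify` and `models` are hypothetical. In practice each model's parameters would be estimated from that class's reviews, for example with EM.

```python
import math

def log_forward(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under one HMM (forward algorithm).

    start[i]    : P(first state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][o]  : P(observation o | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # rescale at every step to avoid underflow on long reviews
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(obs, models):
    """Return the label whose HMM gives the sequence the highest log-likelihood."""
    return max(models, key=lambda label: log_forward(obs, **models[label]))

# Toy vocabulary (assumption): 0 = "good", 1 = "bad", 2 = "movie".
# Two hidden states per model; labels follow the question (+1 / -1).
models = {
    +1: dict(start=[0.6, 0.4],
             trans=[[0.7, 0.3], [0.4, 0.6]],
             emit=[[0.7, 0.1, 0.2], [0.3, 0.2, 0.5]]),
    -1: dict(start=[0.6, 0.4],
             trans=[[0.7, 0.3], [0.4, 0.6]],
             emit=[[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]),
}

print(classify([0, 2, 0], models))  # "good movie good"
print(classify([1, 2, 1], models))  # "bad movie bad"
```

Here the hidden states carry no predefined meaning; only the per-class likelihood matters, so the class label is attached to the whole sequence rather than to each word.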

python nlp sentiment-analysis hidden-markov-models

1 Answer
I was able to do it somehow, with surprisingly good accuracy, though I am not sure exactly what happened. I used the seqlearn framework, whose documentation is poor. I would suggest using MATLAB instead of Python for HMMs.

I used sklearn TfidfVectorizer for feature extraction, then I did this:

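import numpy as np
from seqlearn.hmm import MultinomialHMM
from sklearn.feature_extraction.text import TfidfVectorizer

# train_review / test_review: lists of review strings; Y: one label per training review
vectorizer = TfidfVectorizer(norm=None)
x_train = vectorizer.fit_transform(train_review)
x_test = vectorizer.transform(test_review)

# One entry per review: every review is treated as its own sequence of length 1
# (the original used len(train_review)/2, which is a float in Python 3)
len_train_seq = np.ones(len(train_review), dtype=int)
len_test_seq = np.ones(len(test_review), dtype=int)

model = MultinomialHMM()
HMM_Classifier = model.fit(x_train, Y, lengths=len_train_seq)
y_predict = HMM_Classifier.predict(x_test, lengths=len_test_seq)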

I would still appreciate it if someone knowledgeable about HMMs gave a more robust and clean guideline for doing sentiment analysis with an HMM.
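As far as I understand the seqlearn convention, all sequences are stacked into one matrix, and `lengths` says how many consecutive rows belong to each sequence. The sketch below illustrates that partitioning in plain Python. `split_by_lengths` is a hypothetical helper written for this illustration, not a seqlearn function.

```python
from itertools import accumulate

def split_by_lengths(rows, lengths):
    """Cut a stacked list of rows into consecutive sequences of the given lengths."""
    if sum(lengths) != len(rows):
        raise ValueError("lengths must sum to the number of rows")
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [rows[s:e] for s, e in zip(starts, ends)]

rows = ["r0", "r1", "r2", "r3"]  # stand-ins for four feature rows

# All-ones lengths, as in the code above: every row is its own sequence.
print(split_by_lengths(rows, [1, 1, 1, 1]))

# A different grouping: the first three rows form one sequence.
print(split_by_lengths(rows, [3, 1]))
```

With all lengths equal to 1 there are no state-to-state transitions inside any sequence, so presumably the transition part of the model has nothing to learn from and the classifier behaves much like the Naive Bayes model mentioned in the comments. That may explain the good accuracy.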

edited yesterday

answered yesterday

leo

1618

