Sentiment analysis using Hidden Markov Model
I have a list of reviews; each element is one review from the IMDB data set on Kaggle. There are 25,000 reviews in total, and each has a label: +1 for positive, -1 for negative.
I want to train a Hidden Markov Model (HMM) with these reviews and labels.
1- What sequence should I give to the HMM? Is it something like a bag of words, or something else, such as probabilities that I need to calculate? What kind of feature extraction is appropriate? I was told to use a bag of words on the list of reviews, but after searching a little I found that an HMM cares about order, while a bag of words does not preserve the order of the words in a sequence. How should I prepare this list of reviews so I can feed it into an HMM?
2- Is there a framework for this? I know hmmlearn, and I think I should use MultinomialHMM; correct me if I'm wrong. But it is not supervised: its models do not take labels as input for training, and I get errors that I don't know how to solve, probably because I don't know the correct type of input (see question 1). I recently found seqlearn; is it good, or is there a better one to use?
I'd appreciate any guidance, since I have almost zero knowledge of NLP.
python nlp sentiment-analysis hidden-markov-models
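To make the ordering concern in question 1 concrete, here is a minimal sketch of one way to turn reviews into ordered sequences of integer word ids, next to a bag of words for comparison. The helper names (`tokenize`, `build_vocab`, `encode`) and the toy reviews are assumptions made up for illustration, not part of any library or of the IMDB data.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a review and split it into word tokens, keeping their order."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(reviews):
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    words = sorted({tok for review in reviews for tok in tokenize(review)})
    return {w: i + 1 for i, w in enumerate(words)}

def encode(review, vocab):
    """Turn one review into an ordered list of word ids (an observation sequence)."""
    return [vocab.get(tok, 0) for tok in tokenize(review)]

reviews = ["A great movie, great cast", "Not a great movie"]  # toy data, not from IMDB
vocab = build_vocab(reviews)
sequences = [encode(r, vocab) for r in reviews]

# hmmlearn and seqlearn both take all sequences concatenated into one array,
# plus a list holding the length of each individual sequence.
flat = [i for seq in sequences for i in seq]
lengths = [len(seq) for seq in sequences]

# A bag of words keeps only the counts, so the word order is lost.
bag = Counter(sequences[0])

print(sequences)  # ordered ids, one list per review
print(lengths)    # one length per review
print(bag)        # counts only
```

The ordered `sequences` are what a sequence model can exploit; the `bag` for the first review would be identical for any reordering of its words.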
HMMs are used when you need to assign one label for each item in a sequence. In sentiment analysis, you assign a single label to the whole sequence (the review), so HMMs are not very appropriate for this task. Instead, you can turn to a Naive Bayes classifier (as in this blog post). Both HMMs and Naive Bayes can be learned either in a supervised setting or in an unsupervised setting (you specify the number of labels, and usually use the Expectation-Maximization algorithm to learn them without supervision).
– mcoav
yesterday
Indeed, that is what I found too: an HMM gives a label to each item in a sequence. But this is a project for my class and I must use an HMM; I can't use anything else. I know how an HMM works at an abstract level, but I can't map my limited knowledge of HMMs to this problem. Thanks for the feedback.
– leo
yesterday
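For reference, the Naive Bayes alternative mentioned in mcoav's comment can be sketched in a few lines of plain Python. This is only an illustrative multinomial Naive Bayes with add-one smoothing on made-up toy reviews; the function names `train_nb` and `predict_nb` are hypothetical, and a real project would more likely use an existing library implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes: class log-priors and smoothed word log-likelihoods."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_counts[y] / len(labels))
        log_like = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict_nb(model, doc):
    """Return the label with the highest posterior score; unseen words are skipped."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(log_like[w] for w in doc if w in log_like)
    return max(model, key=score)

# Toy tokenized reviews with +1 / -1 labels (made up for illustration).
train_docs = [["great", "movie"], ["loved", "it"], ["boring", "movie"], ["awful", "plot"]]
train_labels = [1, 1, -1, -1]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "plot", "loved"]))
```

Note that this assigns one label to the whole review and ignores word order, which is exactly the contrast with an HMM drawn in the comment.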
edited yesterday
asked yesterday
leo
1618
1 Answer
I was able to do it somehow, with surprisingly good accuracy, though I am not sure exactly what happened. I used the seqlearn framework, whose documentation is poor. I really suggest using MATLAB instead of Python for HMMs.
I used sklearn's TfidfVectorizer for feature extraction, then did this:
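import numpy as np
import seqlearn.hmm
from sklearn.feature_extraction.text import TfidfVectorizer

# train_review / test_review: lists of review strings; Y: training labels (+1 / -1)
vectorizer = TfidfVectorizer(norm=None)
x_train = vectorizer.fit_transform(train_review)
x_test = vectorizer.transform(test_review)
# Integer division: in Python 3, len(...) / 2 is a float and cannot repeat a list
len_train_seq = np.array([[1, 1]] * (len(train_review) // 2))  # this part was really annoying
len_test_seq = np.array([1] * len(test_review))
model = seqlearn.hmm.MultinomialHMM()
HMM_Classifier = model.fit(x_train, Y, lengths=len_train_seq)
y_predict = HMM_Classifier.predict(x_test, lengths=len_test_seq)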
I would still appreciate it if someone knowledgeable about HMMs could give a more robust and clean guideline for doing sentiment analysis with an HMM.
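One commonly used way to apply HMMs when there is a single label per sequence, offered here only as a hedged sketch and not as what the code above does, is to keep one HMM per class and assign a review to the class whose model gives the observation sequence the highest likelihood. The example below scores sequences with the scaled forward algorithm. All parameters are hand-set and hypothetical (two hidden states, three observation symbols: 0 = neutral word, 1 = positive word, 2 = negative word); in practice each class's parameters would be estimated from that class's reviews, for example with the Expectation-Maximization procedure mentioned in the comments.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under an HMM (scaled forward algorithm)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)          # P(obs[t] | obs[:t])
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

# Hypothetical hand-set parameters; the two models differ only in which
# sentiment-bearing symbol they tend to emit.
pos_hmm = dict(start=[0.6, 0.4],
               trans=[[0.7, 0.3], [0.4, 0.6]],
               emit=[[0.8, 0.15, 0.05], [0.2, 0.7, 0.1]])
neg_hmm = dict(start=[0.6, 0.4],
               trans=[[0.7, 0.3], [0.4, 0.6]],
               emit=[[0.8, 0.05, 0.15], [0.2, 0.1, 0.7]])
models = {1: pos_hmm, -1: neg_hmm}

def classify(obs, models):
    """Pick the label whose HMM assigns the sequence the highest log-likelihood."""
    return max(models, key=lambda label: forward_loglik(obs, **models[label]))

print(classify([0, 1, 1, 0], models))
print(classify([2, 0, 2], models))
```

With this setup the word order does enter the score through the transition probabilities, and the review labels are used only to decide which reviews train which model.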
edited yesterday
answered yesterday
leo
1618