How to get the document_topics distribution of all documents in gensim LDA?
I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # docs is assumed to be the preprocessed corpus: a list of token lists
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    num_topics = 10
    chunksize = 2000
    passes = 20
    iterations = 400
    eval_every = None  # skip perplexity evaluation during training

    temp = dictionary[0]  # accessing an item forces the lazy id2token mapping to load
    id2word = dictionary.id2token

    model = LdaModel(corpus=corpus, id2word=id2word, chunksize=chunksize,
                     alpha='auto', eta='auto',
                     random_state=42,
                     iterations=iterations, num_topics=num_topics,
                     passes=passes, eval_every=eval_every)

I want the topic distribution for every document in docs, with the probabilities of all 10 topics for each document, but when I run:

    get_document_topics = model.get_document_topics(corpus)
    print(get_document_topics)

the output is only:

    <gensim.interfaces.TransformedCorpus object at 0x000001DF28708E10>

How do I get the topic distribution of the docs?
python-3.x gensim lda topic-modeling probability-distribution
asked Nov 15 '18 at 6:23 by wayne64001 (edited Nov 15 '18 at 6:45)
1 Answer
When get_document_topics is given a single document in BOW format, it returns that document's topic distribution as a list of (topic_id, probability) pairs. You're calling it on the full corpus (a list of documents), so it returns a lazy iterable object (the TransformedCorpus you see printed) that yields the scores for each document as you iterate over it or index into it.

You have a few options. If you just want one document, run it on the document you want the values for:

    get_document_topics = model.get_document_topics(corpus[0])

Or do the following to get a list of scores for all the documents:

    get_document_topics = [model.get_document_topics(item) for item in corpus]

Or index directly into the object from your original code:

    get_document_topics = model.get_document_topics(corpus)
    print(get_document_topics[0])
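If you need all 10 probabilities for every document, note that get_document_topics drops topics whose probability falls below minimum_probability, so the default output often lists fewer than num_topics entries. Below is a minimal sketch, assuming the `model`, `corpus` and `num_topics` from the question; the helper name `document_topic_matrix` is made up for illustration and is not part of gensim. Even with minimum_probability=0.0, gensim may still omit topics with vanishingly small probability, so the helper fills any missing entries with 0.0.

```python
def document_topic_matrix(model, corpus, num_topics):
    """Return one dense row of `num_topics` probabilities per document.

    `model` is expected to behave like a trained gensim LdaModel and
    `corpus` like a list of bag-of-words documents.
    """
    rows = []
    for bow in corpus:
        # Ask for every topic instead of only those above the default threshold.
        sparse = model.get_document_topics(bow, minimum_probability=0.0)
        row = [0.0] * num_topics  # topics missing from the output stay at 0.0
        for topic_id, prob in sparse:
            row[topic_id] = float(prob)
        rows.append(row)
    return rows

# Usage with the objects from the question (not executed here):
# matrix = document_topic_matrix(model, corpus, num_topics)
# print(matrix[0])  # the 10 topic probabilities of the first document
```

Each row should sum to roughly 1, which is a quick sanity check on the result.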
answered Nov 15 '18 at 8:41 by Andrew McDowell
Thanks! Is it possible to get a topic distribution for the docs as a whole rather than for a single document? I want to check the importance of the 10 topics in the corpus.
– wayne64001
Nov 15 '18 at 9:29
I'm not sure exactly what you're looking for. LDA works by figuring out how important a topic is for a document, relative to the whole corpus. If you want to see what it thinks of as a topic, use model.show_topics(). According to the gensim documentation at radimrehurek.com/gensim/models/… : "Unlike LSA, there is no natural ordering between the topics in LDA. The returned topics subset of all topics is therefore arbitrary and may change between two LDA training runs."
– Andrew McDowell
Nov 15 '18 at 10:28
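One possible reading of "the importance of the 10 topics in the corpus" is the average weight each topic receives across all documents. That interpretation is an assumption, not something the answer or the gensim documentation defines. A rough sketch under that assumption, where each input row is one document's dense list of topic probabilities (for example, built from get_document_topics output) and the function name `corpus_topic_weights` is hypothetical:

```python
def corpus_topic_weights(doc_topic_rows):
    """Average each topic's probability over all documents."""
    if not doc_topic_rows:
        return []
    num_docs = len(doc_topic_rows)
    num_topics = len(doc_topic_rows[0])
    return [
        sum(row[k] for row in doc_topic_rows) / num_docs
        for k in range(num_topics)
    ]

# Toy input: 2 documents, 3 topics (made-up numbers, not model output).
rows = [[0.5, 0.25, 0.25],
        [0.0, 0.25, 0.75]]
print(corpus_topic_weights(rows))  # [0.25, 0.25, 0.5]
```

This treats every document equally; weighting each row by document length would be another reasonable choice if long documents should count for more.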