How to get the document-topic distribution of all documents in gensim LDA?

I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code:

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# docs is the preprocessed corpus: a list of token lists
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

num_topics = 10
chunksize = 2000
passes = 20
iterations = 400
eval_every = None  # skip perplexity evaluation during training

temp = dictionary[0]  # accessing an item forces the lazy id2token mapping to be built
id2word = dictionary.id2token

model = LdaModel(corpus=corpus, id2word=id2word, chunksize=chunksize,
                 alpha='auto', eta='auto',
                 random_state=42,
                 iterations=iterations, num_topics=num_topics,
                 passes=passes, eval_every=eval_every)

I want the topic distribution for every document in docs, that is, the probabilities of all 10 topics for each document, but when I use:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics)

the output is only:

<gensim.interfaces.TransformedCorpus object at 0x000001DF28708E10>

How do I get the topic distribution of the docs?

edited Nov 15 '18 at 6:45

asked Nov 15 '18 at 6:23

wayne64001

python-3.x gensim lda topic-modeling probability-distribution


1 Answer
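The function get_document_topics is normally called on a single document in bag-of-words (BOW) format. You're calling it on the full corpus (a list of documents), so instead of a list of scores it returns a lazy iterable (a TransformedCorpus) that produces the scores for each document as you iterate over it or index into it.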

You have a few options. If you just want one document, run it on the document you want the values for:

get_document_topics = model.get_document_topics(corpus[0])

Or do the following to get a list of scores for all the documents:

get_document_topics = [model.get_document_topics(item) for item in corpus]

Or index directly into the object returned by your original code:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics[0])
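One caveat if you need all 10 probabilities per document: by default get_document_topics leaves out topics whose probability is below the model's minimum_probability threshold (0.01 unless you changed it), so a document can come back with fewer than 10 entries. Passing minimum_probability=0.0 should keep them all. Below is a minimal sketch of collecting the results into one full row per document; doc_topic_matrix is a hypothetical helper name of mine, not part of gensim, and it only assumes that model has a get_document_topics method accepting minimum_probability.

```python
def doc_topic_matrix(model, corpus, num_topics):
    """Return one row per document, each a list of num_topics probabilities.

    doc_topic_matrix is a hypothetical helper, not a gensim function.
    """
    rows = []
    for bow in corpus:
        # minimum_probability=0.0 asks gensim not to filter out
        # low-probability topics for this document.
        sparse = model.get_document_topics(bow, minimum_probability=0.0)
        # Start from zeros so any topic that is still missing stays at 0.0.
        row = [0.0] * num_topics
        for topic_id, prob in sparse:
            row[topic_id] = float(prob)
        rows.append(row)
    return rows


# Usage with the model and corpus from the question:
# matrix = doc_topic_matrix(model, corpus, num_topics)
# print(matrix[0])  # the 10 topic probabilities of the first document
```

This loads every row into memory at once, which is fine for a small corpus; for a large one you may prefer to iterate over model.get_document_topics(corpus) lazily instead.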

answered Nov 15 '18 at 8:41

Andrew McDowell


Thanks! Is it possible to get a topic distribution for the whole set of docs rather than a single document? I want to check the importance of the 10 topics in the corpus.

– wayne64001
Nov 15 '18 at 9:29

I'm not sure exactly what you're looking for. LDA works by figuring out how important a topic is for a document, relative to the whole corpus. If you want to see what it thinks of as a topic, use model.show_topics(). According to the gensim documentation at radimrehurek.com/gensim/models/… : "Unlike LSA, there is no natural ordering between the topics in LDA. The returned topics subset of all topics is therefore arbitrary and may change between two LDA training runs."

– Andrew McDowell
Nov 15 '18 at 10:28
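If "importance of the topics in the corpus" is taken to mean how much weight each topic gets on average across documents (this is my reading of the comment, not something the answer states), one rough approach is to average the per-document distributions. The sketch below assumes you already have each document's full distribution as a list of num_topics probabilities, for example built from get_document_topics with minimum_probability=0.0; mean_topic_weights is a hypothetical helper name, not a gensim function.

```python
def mean_topic_weights(doc_topic_rows):
    """Average dense per-document topic distributions, topic by topic.

    doc_topic_rows: list of rows, each a list of num_topics probabilities.
    Returns a list with one average weight per topic.
    """
    if not doc_topic_rows:
        return []
    num_docs = len(doc_topic_rows)
    num_topics = len(doc_topic_rows[0])
    totals = [0.0] * num_topics
    for row in doc_topic_rows:
        for topic_id, prob in enumerate(row):
            totals[topic_id] += prob
    # Every document counts equally, regardless of its length.
    return [total / num_docs for total in totals]
```

Weighting every document equally is a design choice; weighting by document length would give a different, equally defensible ranking.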
