Text Similarity - Cosine - Control
I would like to ask if anybody could check my code, because it was behaving strangely: not working and giving me errors, then suddenly working without my changing anything. The code is at the bottom.
Background: My goal is to calculate text similarity [cosine, for now] between annual statements given by several countries at the UN General Assembly. More specifically, to find the similarity between statement x and statement y in a given year, and to do that for all 45 years, so I can graph its evolution.
How I went about it: Since I am a novice, I decided to do the work in several steps: first find the similarity of Country A's statements to Country B's, and then redo the work for the other countries (Country A stays fixed; everything is compared to Country A).
So I filtered the statements for Country A and arranged them by year, did text pre-processing (tokenization, lowercasing, stop-word removal, stemming, bag-of-words), and then built a TF-IDF matrix from it, named text.tokens.tfidf.
I did the same process for Country B and got text.tokensChina.tfidf, just replacing text.tokens with text.tokensChina in a new script. So each matrix contains the TF-IDF of annual statements from 1971-2005, where rows = documents (years) and columns = terms.
Calculating cosine similarity: I decided to use text2vec as described here; however, I did not define a common space and project the documents into it, and I don't know whether that is crucial. I then decided to test the two functions sim2 and psim2 in parallel, since I did not know the difference.
What was wrong at the start: When I first ran the functions, I got an error, apparently telling me that the numbers of columns in my two TF-IDF matrices did not match:
ncol(x) == ncol(y) is not TRUE
However, after re-running all my steps and trying again, it worked, even though I had not changed anything ...
Results: The result of sim2 is a strange [1:45, 1:45] table. Clearly not what I wanted: one column with the similarity between the speeches of Country A and Country B in each given year.
The result of psim2 is better: one column with the results [not sure how correct they are, though].
Technical questions: psim2 is what I want. Now I see that sim2 created something like a correlation heat map; my mistake. But why does psim2 work even when the numbers of columns differ (picture)? Also, did I do anything wrong, especially by not creating a common space?
Code, picture:
# *** Text Pre-Processing with quanteda ***
# 1. Tokenization
text.tokens <- tokens(docs$text, what = 'word',
                      remove_numbers = TRUE,
                      remove_punct = TRUE,
                      remove_symbols = TRUE,
                      remove_hyphens = TRUE)

# 2. Transform words to lower case
text.tokens <- tokens_tolower(text.tokens)

# 3. Remove stop-words (using quanteda's built-in stopwords list)
text.tokens <- tokens_select(text.tokens, stopwords(),
                             selection = 'remove')

# 4. Perform stemming on the tokens
text.tokens <- tokens_wordstem(text.tokens, language = 'english')

# 5. Create the bag-of-words model / document-feature matrix
text.tokens.dfm <- dfm(text.tokens, tolower = FALSE)

# 6. Transform to a plain matrix to work with and inspect
text.tokens.matrix <- as.matrix(text.tokens.dfm)
dim(text.tokens.matrix)

# *** TF-IDF ***
# Function for calculating relative term frequency (TF)
term.frequency <- function(row) {
  row / sum(row)
}

# Function for calculating inverse document frequency (IDF)
inverse.doc.freq <- function(col) {
  corpus.size <- length(col)
  doc.count <- length(which(col > 0))
  log10(corpus.size / doc.count)
}

# Function for calculating TF-IDF
tf.idf <- function(tf, idf) {
  tf * idf
}

# 1. First step: normalize all documents via TF
#    (applying over rows transposes the result: documents become columns)
text.tokens.df <- apply(text.tokens.matrix, 1, term.frequency)
dim(text.tokens.df)

# 2. Second step: calculate the IDF vector
text.tokens.idf <- apply(text.tokens.matrix, 2, inverse.doc.freq)
str(text.tokens.idf)

# 3. Lastly, calculate TF-IDF for the corpus
#    (apply over columns, because the matrix is transposed from the TF step)
text.tokens.tfidf <- apply(text.tokens.df, 2, tf.idf, idf = text.tokens.idf)
dim(text.tokens.tfidf)

# Transpose the matrix back so rows = documents again
text.tokens.tfidf <- t(text.tokens.tfidf)
dim(text.tokens.tfidf)

# Cosine similarity using text2vec, comparing Country A's matrix to Country B's
# (the scraped original passed the China matrix as both arguments, which must
#  have been a copy-paste slip given the ncol(x) == ncol(y) error reported above)
similarity.sim2 <- sim2(text.tokens.tfidf, text.tokensChina.tfidf,
                        method = "cosine", norm = "none")
similarity.psim2 <- psim2(text.tokens.tfidf, text.tokensChina.tfidf,
                          method = "cosine", norm = "none")
similarity.psim2 <- as.data.frame(similarity.psim2)
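To see why sim2() insists on matching columns, and why a row-wise comparison still needs a shared vocabulary, here is a minimal base-R sketch of the common-space idea. The matrices and term names are toy examples invented for illustration; the last line computes exactly the per-year ("parallel") cosine that psim2() returns.

```r
# Two toy TF-IDF matrices with different vocabularies, as happens when two
# countries' statements are preprocessed separately
a <- matrix(c(1, 0, 2,
              0, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("1971", "1972"), c("peac", "trade", "nation")))
b <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE,
            dimnames = list(c("1971", "1972"), c("peac", "war")))

ncol(a) == ncol(b)   # FALSE: this is what sim2()'s ncol(x) == ncol(y) check trips on

# Project both into one common space: the union of the two vocabularies,
# padding terms a document never uses with zeros
vocab <- union(colnames(a), colnames(b))
pad <- function(m, vocab) {
  out <- matrix(0, nrow(m), length(vocab),
                dimnames = list(rownames(m), vocab))
  out[, colnames(m)] <- m
  out
}
a2 <- pad(a, vocab)
b2 <- pad(b, vocab)

# Row-wise cosine similarity: one value per year (what psim2() computes)
cos.row <- rowSums(a2 * b2) / (sqrt(rowSums(a2^2)) * sqrt(rowSums(b2^2)))
cos.row   # 1971: 0.4, 1972: 0
```

Once both matrices live in the same column space, the per-year similarities are well defined; without the padding step, column i of one matrix and column i of the other may refer to entirely different terms.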
Global Environment picture: [screenshot of the Global Environment and the psim2 results]
r cosine-similarity linguistics quanteda text2vec
This is not really the kind of question for Stack Overflow. You might do better to post this on Code Review.
– G5W
Nov 15 '18 at 19:16
Oh, thank you, I was not aware of Code Review :/
– Kamil Liskutin
Nov 15 '18 at 19:34
See quanteda::dfm_tfidf().
– Ken Benoit
Nov 15 '18 at 20:02
Yeah, I know you can skip most of the preprocessing code; I just wanted to have it step by step because I am new to this, so I remember it. Afterwards I will be using that.
– Kamil Liskutin
Nov 15 '18 at 20:40
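As the comment above suggests, the entire hand-rolled TF-IDF section can be collapsed into one quanteda call. A sketch, assuming text.tokens.dfm from step 5 of the question's code; scheme_tf = "prop" reproduces the relative-frequency TF, and the default base-10 log in scheme_df = "inverse" matches the hand-rolled log10 IDF:

```r
library(quanteda)

# TF-IDF in one step: proportional term frequency times log10 inverse
# document frequency, same weighting as the manual apply() pipeline
text.tokens.tfidf <- as.matrix(
  dfm_tfidf(text.tokens.dfm, scheme_tf = "prop", scheme_df = "inverse")
)
```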
asked Nov 15 '18 at 18:43 – Kamil Liskutin
1 Answer
Well, the outcome is that the whole thing is complete nonsense: I did not compare the documents in one vector space. Not to mention that the best method would be doc2vec, but I tried to figure that out for several days and got nowhere, unfortunately.
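The common-space problem the answer identifies can be avoided by building a single dfm over both countries' statements, so every document shares one vocabulary, and only then splitting the rows. A sketch under assumptions: a single data frame docs with text, country, and year columns, and the country labels "USA" and "China", are all hypothetical names for illustration.

```r
library(quanteda)
library(text2vec)

# One tokens object and ONE dfm over all statements from both countries,
# so every row shares the same columns (terms) by construction
toks <- tokens(docs$text, remove_numbers = TRUE, remove_punct = TRUE,
               remove_symbols = TRUE)
toks <- tokens_wordstem(tokens_remove(tokens_tolower(toks), stopwords()))
all.tfidf <- dfm_tfidf(dfm(toks))

# Split the rows back out per country; ncol() now matches automatically
tfidf.A <- as.matrix(all.tfidf[docs$country == "USA", ])
tfidf.B <- as.matrix(all.tfidf[docs$country == "China", ])

# One cosine similarity per year, assuming both subsets are ordered by year
yearly.sim <- psim2(tfidf.A, tfidf.B, method = "cosine", norm = "l2")
```

With this layout, psim2 returns the single column of year-by-year similarities the question was after, and the ncol(x) == ncol(y) error cannot occur.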
answered Nov 18 '18 at 1:01 – Kamil Liskutin