Text Similarity - Cosine - Control












I would like to ask if anybody could check my code, because it was behaving strangely - not working and giving me errors, then suddenly working without my changing anything. The code is at the bottom.



Background: My goal is to calculate text similarity (cosine, for now) between the annual statements given by several countries at the UN General Assembly. More specifically, to find the similarity between statement x and statement y in a given year, and to do this for all 45 years, so I can graph its evolution.



How I went about it: Since I am a novice, I decided to do the work in several steps - first finding the similarity between the statements of country A and country B, and then redoing the work for the other countries (country A stays fixed; everything is compared to country A).



So I filtered the statements for country A and arranged them by year. I did the text preprocessing (tokenization, lowercasing, stopword removal, stemming, bag-of-words) and then built a TF-IDF matrix from it, named text.tokens.tfidf.



I did the same for country B and got text.tokensChina.tfidf - just replacing every text.tokens with text.tokensChina in a new script. So each matrix contains the TF-IDF of annual statements from 1971 - 2005, where rows = documents (years) and columns = terms.
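The TF-IDF weighting used below (relative term frequency times log10 inverse document frequency) can be sanity-checked on a toy count matrix. A minimal Python sketch with made-up numbers, mirroring the R functions further down:

```python
import numpy as np

# Toy document-term count matrix: 2 documents (rows) x 2 terms (columns)
counts = np.array([[2.0, 0.0],
                   [1.0, 1.0]])

# Relative term frequency: each row divided by its row sum
tf = counts / counts.sum(axis=1, keepdims=True)

# Inverse document frequency: log10(corpus size / number of docs containing the term)
idf = np.log10(counts.shape[0] / (counts > 0).sum(axis=0))

# TF-IDF: term frequency scaled by IDF (rows = documents, columns = terms)
tfidf = tf * idf
```

A term that appears in every document gets IDF 0 and therefore TF-IDF 0, which is the intended behavior of this weighting scheme.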



Calculating cosine similarity: I decided to use text2vec as described here - however, I did not define a common space and project the documents into it; I don't know if that is crucial. I then decided to test two functions, sim2 and psim2, since I did not know the difference between them.



What was wrong at the start: When first running the functions, I was getting an error, apparently telling me that the numbers of columns in the two TF-IDF matrices did not match:




ncol(x) == ncol(y) is not TRUE




However, after re-running the code for all my steps and trying again, it worked - even though I did not change anything ...



Results: The result of sim2 is a strange table [1:45, 1:45]. Clearly not what I wanted - I wanted one column with the similarity between the speeches of country A and country B in each year.



The result of psim2 is better - one column with the results (not sure how correct they are, though).



Technical questions: psim2 is what I want - now I see that sim2 created something like a correlation heat map, my bad. But why does the psim2 function work even when the numbers of columns differ (picture)? Also, did I do anything wrong, especially since I did not create a common space?
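For what "common space" means here, a minimal sketch in Python with toy data (illustrative only, not the actual R pipeline): both documents have to be vectorized against one shared vocabulary before cosine similarity between their vectors is meaningful.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity of two equal-length vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

doc_a = "peace and security council".split()
doc_b = "peace treaty and security".split()

# Common space: one shared vocabulary built from BOTH documents,
# so both vectors index the same terms in the same order
vocab = sorted(set(doc_a) | set(doc_b))
vec_a = np.array([doc_a.count(t) for t in vocab], dtype=float)
vec_b = np.array([doc_b.count(t) for t in vocab], dtype=float)

print(cosine(vec_a, vec_b))  # 0.75
```

Building each matrix from its own corpus, as in the question, produces different vocabularies (different columns), which is exactly what the `ncol(x) == ncol(y)` check complains about.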



Code, picture:



    # *** Text Pre-Processing with Quanteda ***
    library(quanteda)
    library(text2vec)

    # 1. Tokenization
    text.tokens <- tokens(docs$text, what = 'word',
                          remove_numbers = TRUE,
                          remove_punct = TRUE,
                          remove_symbols = TRUE,
                          remove_hyphens = TRUE)

    # 2. Transform words to lower case
    text.tokens <- tokens_tolower(text.tokens)

    # 3. Remove stop-words (using quanteda's built-in stopwords list)
    text.tokens <- tokens_select(text.tokens, stopwords(),
                                 selection = 'remove')

    # 4. Perform stemming on the tokens
    text.tokens <- tokens_wordstem(text.tokens, language = 'english')

    # 5. Create the bag-of-words model / document-feature (frequency) matrix
    text.tokens.dfm <- dfm(text.tokens, tolower = FALSE)

    # 6. Transform to a matrix to work with and inspect
    text.tokens.matrix <- as.matrix(text.tokens.dfm)
    dim(text.tokens.matrix)

    # *** Doing TF-IDF ***
    # Function for calculating relative term frequency (TF)
    term.frequency <- function(row) {
      row / sum(row)
    }

    # Function for calculating inverse document frequency (IDF)
    inverse.doc.freq <- function(col) {
      corpus.size <- length(col)
      doc.count <- length(which(col > 0))
      log10(corpus.size / doc.count)
    }

    # Function for calculating TF-IDF
    tf.idf <- function(tf, idf) {
      tf * idf
    }

    # 1. First step: normalize all documents via TF
    text.tokens.df <- apply(text.tokens.matrix, 1, term.frequency)
    dim(text.tokens.df)

    # 2. Second step: calculate the IDF vector
    text.tokens.idf <- apply(text.tokens.matrix, 2, inverse.doc.freq)
    str(text.tokens.idf)

    # 3. Lastly, calculate TF-IDF for the corpus
    # Apply over columns, because the matrix is transposed by the TF step
    text.tokens.tfidf <- apply(text.tokens.df, 2, tf.idf, idf = text.tokens.idf)
    dim(text.tokens.tfidf)

    # Transpose the matrix back
    text.tokens.tfidf <- t(text.tokens.tfidf)
    dim(text.tokens.tfidf)

    # Cosine similarity using text2vec
    similarity.sim2 <- sim2(text.tokensChina.tfidf, text.tokensChina.tfidf,
                            method = "cosine", norm = "none")

    similarity.psim2 <- psim2(text.tokensChina.tfidf, text.tokensChina.tfidf,
                              method = "cosine", norm = "none")
    similarity.psim2 <- as.data.frame(similarity.psim2)
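For the sim2/psim2 distinction, a rough numpy analogue (a sketch of the behavior, not text2vec's implementation): sim2 returns the full matrix of all pairwise cosines between rows of x and rows of y, while psim2 compares row i of x with row i of y and returns one value per row (here, per year).

```python
import numpy as np

def sim2_like(x, y):
    # all-pairs cosine: returns an n_x by n_y matrix (sim2-style output)
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return xn @ yn.T

def psim2_like(x, y):
    # parallel (row-by-row) cosine: row i of x vs row i of y (psim2-style output);
    # the element-wise product requires the two matrices to have identical shapes
    num = np.sum(x * y, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    return num / den

# Toy data: 3 "years" of 2-term vectors for each "country"
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

print(sim2_like(x, y).shape)  # (3, 3) - the "heat map" table
print(psim2_like(x, y))       # one similarity per year
```

So the [1:45, 1:45] table from sim2 and the single column from psim2 are both the documented shapes for those functions; only psim2 gives the one-value-per-year series the question asks for.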


Global Environment picture:
Picture of my screen with Global Environment + Psim2 Results










  • This is not really the kind of question for Stack Overflow. You might do better to post this on Code Review

    – G5W
    Nov 15 '18 at 19:16











  • Oh thank you, was not aware of the code review :/

    – Kamil Liskutin
    Nov 15 '18 at 19:34











  • See quanteda::dfm_tfidf().

    – Ken Benoit
    Nov 15 '18 at 20:02











  • Yeah, I know you can skip most of the preprocessing code; I just wanted to have it step by step because I am new to this, so I remember it. Afterward I will be using that.

    – Kamil Liskutin
    Nov 15 '18 at 20:40
















r cosine-similarity linguistics quanteda text2vec






asked Nov 15 '18 at 18:43









Kamil Liskutin
13




1 Answer






Well, the outcome is that the whole thing is complete BS: I did not compare things in one vector space. Not to mention that the best method would be to use doc2vec, but I tried to figure that out for several days and got nowhere, unfortunately.






        answered Nov 18 '18 at 1:01









Kamil Liskutin
13



