Multiset: Problem with multiset adding more than one version of a word and cannot handle large amounts of...























Update (fixed): I have fixed the problem causing the error message. Huge thanks to user PaulMcKenzie for helping me understand what the error message was telling me! My program crashed whenever it encountered a letter with a mark above it (diacritical marks, I think they are called). I have adjusted my code to account for these, and now it doesn't crash at all! Another huge thanks to user ihavenoidea for helping me understand multisets! My program is now working the way it's supposed to!
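For readers hitting the same crash: the character classification functions in <cctype>, such as std::ispunct, have undefined behavior when given a negative value other than EOF, and a plain char holding a byte of an accented letter is often negative. The sketch below shows one common way to avoid that, by converting each character to unsigned char before the call. The function name stripPunctuation is hypothetical and is not the fix the original poster applied (that code was not shown); it also uses the erase-remove idiom instead of the manual loop in cleanUpPunc.

```cpp
#include <algorithm>  // std::remove_if
#include <cctype>     // std::ispunct
#include <string>

// Hypothetical stand-in for cleanUpPunc: removes punctuation from a word.
// The lambda takes its argument as unsigned char, so std::ispunct never
// receives a negative value, even for bytes of accented letters.
std::string stripPunctuation(std::string word) {
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}
```

In the default "C" locale only ASCII punctuation is classified as punctuation, so the bytes of accented letters are left in the word rather than removed.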



Original post:
****I am VERY new to C++ so any and all help is appreciated!****



Ok, so I'm trying to use a multiset to sort words so I can see how many times each word appears in a text. First, my program accepts a file, then it reads the words and takes out any punctuation, then it puts them into a multiset. After this, it is supposed to write the results to a text file that the user names.



My first issue is that the multiset seems to be creating more than one element for the same word (for example, in one of my tests I saw a(4) listed in the text document 3 times in a row instead of once).
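A note on this symptom: a std::multiset stores every inserted copy, so a loop that advances its iterator one step at a time visits each copy and prints the same word and count repeatedly. One way to visit each distinct word once is to jump the iterator with upper_bound, as in the sketch below. The helper name countDistinct is hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns one (word, count) pair per distinct word stored in the multiset.
std::vector<std::pair<std::string, std::size_t>>
countDistinct(const std::multiset<std::string>& words) {
    std::vector<std::pair<std::string, std::size_t>> result;
    // upper_bound(*it) returns the first element greater than the current
    // word, which skips over all remaining copies of that word.
    for (auto it = words.begin(); it != words.end(); it = words.upper_bound(*it)) {
        result.emplace_back(*it, words.count(*it));
    }
    return result;
}
```

The pairs come back in sorted order because the multiset keeps its elements sorted.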



My second issue is that when I try to read in large text documents (I'm using John Collier's story "Bottle Party" http://ciscohouston.com/docs/docs/greats/bottle_party.html to test it), my program crashes completely, but it doesn't crash when I test it with a smaller text document (small being about 5-10 lines of text). I'm using Visual Studio (which I'm also new to) and I don't know what the error message is trying to tell me, but it says:
error message



After selecting retry:
error2
As always, any and all help is greatly appreciated.



Code here:



#include <iostream>
#include <string> //for strings
#include <fstream> //for files
#include <set> //for use of multiset
#include <cctype> //for ispunct
#include <cstdlib> //for system and exit

using namespace std;

string cleanUpPunc(string);

//Global variables
multiset <string> words; //will change back to local variable later

int main() {
    //Starting variables
    string fileName1 = "", fileName2 = "", input = "", input2 = ""; //To hold the input file and the file we wish to print data to if desired
    ifstream fileStream; //gets info from file

    //Program start
    cout << "Welcome to Bags Program by Rachel Woods!" << endl;
    cout << "Please enter the name of the file you wish to input data from: ";
    getline(cin, fileName1);

    //Tries to open file
    try {
        fileStream.open(fileName1);
        if (!fileStream) {
            cerr << "Unable to open file, please check file name and try again." << endl;
            system("PAUSE");
            exit(1);
        }

        while (fileStream >> input) {
            input2 = cleanUpPunc(input); //sends the input word to check for punctuation
            words.insert(input2); //puts the 'cleaned up' word into the multiset for counting
        }
        fileStream.close();

        //Sends it to a text document
        cout << "Please name the file you would like to put the results into: ";
        getline(cin, fileName2);

        ofstream toFile; //writes info to a file

        //Code to put info into text file
        toFile.open(fileName2);
        if (toFile.is_open()) {

            multiset<string>::iterator pos;
            for (pos = words.begin(); pos != words.end(); pos++) {
                toFile << *pos << " " << words.count(*pos) << endl;
            }
            toFile.close();
            cout << "Results written to file!" << endl;

        }
        else {
            cout << "Could not create file, please try again." << endl;
        }

    }catch (exception e) {
        cout << "Stop that. ";
        cout << e.what();
    }

    cout << "Thanks for using this program!" << endl;
    system("PAUSE");
    return 0;
}

string cleanUpPunc(string maybe) {
    //Takes out punctuation from string
    //Variables
    string takeOut = maybe;

    //Method

    for (int i = 0, len = maybe.size(); i < len; i++) {
        if (ispunct(takeOut[i])) {
            takeOut.erase(i--, 1);
            len = takeOut.size();
        }
    }

    return takeOut;
}

































  • You have a debugger, so please use that to tell us more information about what is going wrong, including the variables at that point. You may also find this link useful: ericlippert.com/2014/03/05/how-to-debug-small-programs . Finally, your bug has nothing to do with multisets.
    – Ken Y-N
    Nov 12 at 2:13








  • Multisets allow duplicate elements, that's why you have multiple copies of strings.
    – ihavenoidea
    Nov 12 at 2:18








  • I would suggest using a dictionary (std::map in C++). Check this answer, which addresses exactly your problem: stackoverflow.com/questions/16867944/…
    – ihavenoidea
    Nov 12 at 2:30






  • In fact, multisets and sets (mathematically speaking) have nothing to do with the number of times an element is present. It's just that one allows duplicates and the other does not (among other things, like how each one does certain set operations).
    – ihavenoidea
    Nov 12 at 2:35








  • @RJWoods That crash indicates you probably have a non-ASCII character being processed, and you're feeding that character into a function that expects ASCII characters. It has nothing to do with multiset. Also, your cleanUpPunc does way too much work. To do that in C++ is a one-line call to std::erase_if.
    – PaulMcKenzie
    Nov 12 at 2:43























c++ multiset














edited Nov 12 at 6:53

























asked Nov 12 at 2:01









RJWoods































