Multiset: Problem with multiset adding more than one version of a word and cannot handle large amounts of...
Update and fixed: I have fixed the problem causing the error message. Huge thanks to user PaulMcKenzie for helping me understand what the error message was telling me! When my program encountered a letter with a mark above it (diacritical marks, I think they are called), it crashed. I have adjusted my code to account for these and now it doesn't crash at all! Another huge thanks to user ihavenoidea for helping me understand multisets! My program is now working the way it's supposed to!
Original post:
****I am VERY new to C++ so any and all help is appreciated!****
Ok, so I'm trying to use a multiset to sort words so I can see how many times each word appears in a text. First, my program accepts a file, then it reads the words and takes out any punctuation, then it puts them into a multiset. After this, it is supposed to write the results to a text file that the user names.
My first issue is that the multiset seems to be creating more than one element for the same word (for example, in one of my tests I saw a(4) listed in the text document three times in a row instead of once).
My second issue is that when I try to read in large text documents (I'm using John Collier's story "Bottle Party" http://ciscohouston.com/docs/docs/greats/bottle_party.html to test it) my program completely crashes, but it doesn't crash when I test it with a smaller text document (say, about 5-10 lines of text). I'm using Visual Studio (which I'm also new to) and I don't know what the error message is trying to tell me, but it says:

After selecting retry:

As always, any and all help is greatly appreciated.
Code here:
#include <iostream>
#include <string>  //for strings
#include <fstream> //for files
#include <set>     //for use of multiset
#include <cctype>  //for ispunct
#include <cstdlib> //for system and exit
using namespace std;

string cleanUpPunc(string);

//Global variables
multiset<string> words; //will change back to local variable later

int main() {
    //Starting variables
    string fileName1 = "", fileName2 = "", input = "", input2 = ""; //To hold the input file and the file we wish to print data to if desired
    ifstream fileStream; //gets info from file
    //Program start
    cout << "Welcome to Bags Program by Rachel Woods!" << endl;
    cout << "Please enter the name of the file you wish to input data from: ";
    getline(cin, fileName1);
    //Tries to open file
    try {
        fileStream.open(fileName1);
        if (!fileStream) {
            cerr << "Unable to open file, please check file name and try again." << endl;
            system("PAUSE");
            exit(1);
        }
        while (fileStream >> input) {
            input2 = cleanUpPunc(input); //sends the input word to check for punctuation
            words.insert(input2); //puts the 'cleaned up' word into the multiset for counting
        }
        fileStream.close();
        //Sends it to a text document
        cout << "Please name the file you would like to put the results into: ";
        getline(cin, fileName2);
        ofstream toFile; //writes info to a file
        //Code to put info into text file
        toFile.open(fileName2);
        if (toFile.is_open()) {
            multiset<string>::iterator pos;
            for (pos = words.begin(); pos != words.end(); pos++) {
                toFile << *pos << " " << words.count(*pos) << endl;
            }
            toFile.close();
            cout << "Results written to file!" << endl;
        }
        else {
            cout << "Could not create file, please try again." << endl;
        }
    } catch (exception e) {
        cout << "Stop that. ";
        cout << e.what();
    }
    cout << "Thanks for using this program!" << endl;
    system("PAUSE");
    return 0;
}

string cleanUpPunc(string maybe) {
    //Takes out punctuation from string
    //Variables
    string takeOut = maybe;
    //Method
    for (int i = 0, len = maybe.size(); i < len; i++) {
        if (ispunct(takeOut[i])) {
            takeOut.erase(i--, 1);
            len = takeOut.size();
        }
    }
    return takeOut;
}
c++ multiset
You have a debugger, so please use that to tell us more information about what is going wrong, including the variables at that point. You may also find this link useful: ericlippert.com/2014/03/05/how-to-debug-small-programs . Finally, your bug has nothing to do with multisets.
– Ken Y-N
Nov 12 at 2:13
Multisets allow duplicate elements, that's why you have multiple copies of strings.
– ihavenoidea
Nov 12 at 2:18
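To illustrate the point above: the output loop in the question visits every stored element, so a word that was inserted several times is also written several times. One possible way to keep the multiset and still write each word once is to jump past the duplicates with `upper_bound`. This is only a sketch; the helper name `writeCounts` is hypothetical and not part of the original post.

```cpp
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): writes each distinct
// word once, followed by the number of times it occurs in the multiset.
void writeCounts(const std::multiset<std::string>& words, std::ostream& out) {
    // upper_bound(*pos) returns the first element greater than *pos,
    // so each step skips every duplicate of the current word.
    for (auto pos = words.begin(); pos != words.end(); pos = words.upper_bound(*pos)) {
        out << *pos << " " << words.count(*pos) << '\n';
    }
}
```

Called with a multiset holding "a" four times and "b" once, this writes the line "a 4" followed by "b 1".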
I would suggest using a dictionary (std::map in C++). Check this answer, which addresses exactly your problem: stackoverflow.com/questions/16867944/…
– ihavenoidea
Nov 12 at 2:30
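A minimal sketch of the `std::map` approach suggested above, assuming words are separated by whitespace as in the question. The function name `countWords` is hypothetical, and this is not the code from the linked answer.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads whitespace-separated words from a stream
// and returns how often each one occurs.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        // operator[] inserts a zero-initialized count for a new word,
        // so the increment works for first and repeated occurrences alike.
        ++counts[word];
    }
    return counts;
}
```

Because a map stores each key only once, iterating over the result yields one entry per distinct word, in sorted order, with its count as the mapped value.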
In fact, multisets and sets (mathematically speaking) have nothing to do with the number of times an element is present. It's just that one allows duplicates and the other does not (among other things, like how each one does certain set operations).
– ihavenoidea
Nov 12 at 2:35
@RJWoods That crash indicates you probably have a non-ASCII character being processed, and you're feeding that character into a function that expects ASCII characters. It has nothing to do with multiset. Also, your takeOutPunct is way too much work. To do that in C++ is a one-line call to std::erase_if.
– PaulMcKenzie
Nov 12 at 2:43
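A sketch of what the comment above describes, under two assumptions: that the crash comes from passing a negative `char` (such as a byte of an accented letter) to `ispunct`, which is undefined behavior, and that a pre-C++20 compiler is in use, so the erase-remove idiom stands in for `std::erase_if`. The name `stripPunctuation` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical replacement for the punctuation-stripping loop in the question.
std::string stripPunctuation(std::string word) {
    // The lambda takes unsigned char, so bytes above 127 are converted to
    // non-negative values before they reach std::ispunct.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}
```

With C++20, the body could likely be reduced to a single `std::erase_if(word, predicate)` call using the same lambda. Note that in the default "C" locale, bytes above 127 are not classified as punctuation, so accented letters are kept unchanged.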
edited Nov 12 at 6:53
asked Nov 12 at 2:01
RJWoods
215