Java: full-text inverted index, defining a word



























I am working on a simple full-text inverted index, building an index of the words I extract from PDF files. I am using the PDFBox library for the extraction.



However, I would like to know how one should define what counts as a word to index. Currently my indexer treats every space-separated chunk of text as a word token. For example,



This string, is a code.


In this case, the index table would contain:



This
string,
is
a
code.


The flaw is that a token such as "string," keeps its trailing comma (and "code." keeps its period). Just "string" would be sufficient, because nobody searches for a word with punctuation attached.



So my question is: is there a standard rule I could use to define word tokens that prevents this kind of issue with my current code?



Code:



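import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.pdfbox.pdmodel.PDDocument;      // PDFBox 2.x API
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfWordCollector {
    public static void main(String[] args) {
        File folder = new File("D:\\PDF1");         // the backslash must be escaped in a Java string
        File[] listOfFiles = folder.listFiles();     // listFiles() returns an array
        if (listOfFiles == null) {
            return;                                  // folder missing or not a directory
        }
        for (File file : listOfFiles) {
            if (file.isFile()) {
                Set<String> uniqueWords = new HashSet<>();
                // file already carries its full path, so there is no need to rebuild it
                try (PDDocument document = PDDocument.load(file)) {
                    if (!document.isEncrypted()) {
                        PDFTextStripper tStripper = new PDFTextStripper();
                        String pdfFileInText = tStripper.getText(document);
                        String[] lines = pdfFileInText.split("\\r?\\n");
                        for (String line : lines) {
                            // naive tokenization: every space-separated chunk is a token,
                            // so "string," and "code." keep their punctuation
                            String[] words = line.split(" ");
                            for (String word : words) {
                                uniqueWords.add(word);
                            }
                        }
                    }
                } catch (IOException e) {
                    System.err.println("Exception while trying to read pdf document - " + e);
                }
            }
        }
    }
}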
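The code above collects a separate set of words per file, whereas an inverted index usually maps each word to the documents that contain it. Below is a minimal sketch of that structure, assuming the PDF text has already been extracted (for example with tStripper.getText) into a map from file name to text. The class name, the method names, and the normalization choices (lower-casing, trimming punctuation at the edges of each token) are hypothetical illustrations, not part of the original code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Hypothetical normalization: lower-case the token, then strip non-word
    // characters from its start and end, so "String" and "string," both map to "string".
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("^\\W+|\\W+$", "");
    }

    // Builds an inverted index: word -> names of the documents containing it.
    // TreeMap/TreeSet are used only so the printed result is sorted.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // \\s+ splits on any run of whitespace, so double spaces and
            // line breaks do not produce empty tokens in the middle of the text
            for (String raw : doc.getValue().split("\\s+")) {
                String word = normalize(raw);
                if (word.isEmpty()) {
                    continue; // pure punctuation, or a leading "" from leading whitespace
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a.pdf", "This string, is a code.");
        docs.put("b.pdf", "A string is text.");
        System.out.println(buildIndex(docs));
        // {a=[a.pdf, b.pdf], code=[a.pdf], is=[a.pdf, b.pdf], string=[a.pdf, b.pdf], text=[b.pdf], this=[a.pdf]}
    }
}
```

Looking up a search term is then a single index.get(normalize(query)), which is why the same normalization has to be applied both when indexing and when querying.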









java pdfbox














asked Nov 14 '18 at 1:51 by Daredevil, edited Nov 14 '18 at 2:04 by GBlodgett













  • Why don't you replace "," with ""?

    – Scary Wombat
    Nov 14 '18 at 1:54











  • @ScaryWombat What do you mean? Sorry, I'm a bit unclear on this regular expression thing.

    – Daredevil
    Nov 14 '18 at 1:54











  • Let's see: a word is a String, and String has a method replace, so replace "," with "". This is not a regex. Then add the result to your List.

    – Scary Wombat
    Nov 14 '18 at 1:56











  • I see, but would that break special cases like a sentence containing the date 15/12/2018 or f(x) = 2x +3y, where it would be ideal to keep these parts intact as words because they are not separated by spaces?

    – Daredevil
    Nov 14 '18 at 1:58











  • The logic is yours; in my example, all I am replacing is the comma.

    – Scary Wombat
    Nov 14 '18 at 1:58
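The distinction drawn in these comments matters in Java: String.replace treats both arguments literally, while String.replaceAll compiles its first argument as a regular expression, so a character such as . becomes a wildcard. A small illustration (the class name is just for the example); note that a literal comma replacement also removes commas inside a token such as 1,000:

```java
public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        // replace: literal match, so only the comma is removed
        System.out.println("string,".replace(",", ""));    // string

        // it removes every comma, including one inside a number
        System.out.println("1,000".replace(",", ""));      // 1000

        // replaceAll with "." : the dot is a regex wildcard matching any character,
        // so every character is removed and an empty line is printed
        System.out.println("69.4".replaceAll(".", ""));

        // escape the dot to match a literal period, or use the literal replace
        System.out.println("69.4".replaceAll("\\.", ""));  // 694
        System.out.println("69.4".replace(".", ""));       // 694
    }
}
```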



































2 Answers















Yes. You can use the replaceAll method to strip non-word characters from the start and end of each token, like this:



uniqueWords.add(word.replaceAll("([\\W]+$)|(^[\\W]+)", ""));
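In Java source code, the backslash in \W has to be doubled inside a string literal. Here is a minimal runnable sketch of this edge-trimming approach; the class and helper names are hypothetical. Because both alternatives are anchored ($ and ^), only characters at the start or end of a token are stripped, so an internal "/" in a date or "." in 69.4 survives. Tokens made entirely of punctuation become empty strings, so the sketch skips them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class EdgeTrimDemo {

    // Removes non-word characters (\W) only at the end ([\W]+$) or the start (^[\W]+).
    static String trimEdges(String word) {
        return word.replaceAll("([\\W]+$)|(^[\\W]+)", "");
    }

    static Set<String> tokenize(String line) {
        Set<String> uniqueWords = new LinkedHashSet<>(); // keeps insertion order for printing
        for (String word : line.split(" ")) {
            String cleaned = trimEdges(word);
            if (!cleaned.isEmpty()) {   // a token made only of punctuation becomes ""
                uniqueWords.add(cleaned);
            }
        }
        return uniqueWords;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This string, is a code."));  // [This, string, is, a, code]
        System.out.println(trimEdges("10/12/2018"));              // 10/12/2018
        System.out.println(trimEdges("69.4"));                    // 69.4
        System.out.println(trimEdges("\"animal\""));              // animal
    }
}
```

One caveat: by default Java's \w covers only [a-zA-Z_0-9], so an accented letter at the edge of a token (as in "café") counts as a non-word character and is stripped. Compiling the pattern with Pattern.UNICODE_CHARACTER_CLASS, or prefixing it with the inline flag (?U), makes \w and \W Unicode-aware.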














answered Nov 14 '18 at 2:14 by Aleksandr Gromov, edited Nov 14 '18 at 4:01













  • What is \W? I am confused.

    – Daredevil
    Nov 14 '18 at 2:14











  • Non-word characters; the W should be a capital one.

    – Aleksandr Gromov
    Nov 14 '18 at 2:15













  • But there's going to be a problem: say I have a date 10/12/2018 and I need to include it whole in my index; then it will omit the "/", which I don't want.

    – Daredevil
    Nov 14 '18 at 2:18











  • Edited. I added an exclusion: you can add exclusions in the [^/] part. Now it removes all non-word characters except those listed in [^/].

    – Aleksandr Gromov
    Nov 14 '18 at 2:42













  • There is a problem. If I have animal. then I get animal, which is fine. But if I have 69.4 and want to keep it in the same form, it omits the dot and becomes 694.

    – Daredevil
    Nov 14 '18 at 2:44
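The exclusion idea can be written with character-class intersection (&&), which java.util.regex.Pattern supports: [\W&&[^%/]] means "any non-word character except % and /". Below is a sketch that combines it with the anchored edge trimming, assuming you want to keep % and / even at the edges of a token. The kept characters and the class name are only illustrative; add or remove characters inside [^...] to suit your data.

```java
public class EdgeTrimWithExclusions {

    // Non-word characters except '%' and '/', using character-class intersection.
    // The excluded characters are an illustrative choice.
    private static final String STRIPPABLE = "[\\W&&[^%/]]";

    static String trimEdges(String word) {
        // strip strippable characters at the end or at the start of the token only
        return word.replaceAll("(" + STRIPPABLE + "+$)|(^" + STRIPPABLE + "+)", "");
    }

    public static void main(String[] args) {
        System.out.println(trimEdges("50%,"));     // 50%
        System.out.println(trimEdges("code."));    // code
        System.out.println(trimEdges("and/or,"));  // and/or
        System.out.println(trimEdges("(69.4)"));   // 69.4
    }
}
```

Because the pattern is anchored, the internal dot in 69.4 is never touched; only the surrounding parentheses are removed.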


































To remove common punctuation marks, you could do:



for (String word : words) {
    uniqueWords.add(word.replaceAll("[.,!?]", ""));
}


This removes all periods, commas, exclamation marks, and question marks.





If you also want to get rid of double quotes, you can do:



uniqueWords.add(word.replaceAll("[.,?!\"]", ""));
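Because this character class removes the listed characters anywhere in the token, not just at its edges, it leaves dates alone (they contain none of those characters) and unwraps quoted words. However, it also deletes a decimal point. Here is a quick check of that behavior; the class and method names are just for the example.

```java
public class PunctuationStripDemo {

    // Period, comma, question mark, exclamation mark and double quote.
    // The quote is escaped as \" inside the Java string literal; inside a
    // character class, '.', '?' and '!' are literal characters.
    static String strip(String word) {
        return word.replaceAll("[.,?!\"]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("string,"));     // string
        System.out.println(strip("\"animal\""));  // animal
        System.out.println(strip("10/2/18"));     // 10/2/18 (no listed characters)
        System.out.println(strip("69.4"));        // 694 (the decimal point is removed too)
    }
}
```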














answered Nov 14 '18 at 1:54 by GBlodgett, edited Nov 14 '18 at 2:26













  • What does it do? And what if my sentence contains, say, 11/2/2018 and I want to keep it whole as one word? It would eliminate it, right?

    – Daredevil
    Nov 14 '18 at 1:56






  • Which will replace all periods, commas, exclamation marks, and question marks

    – Scary Wombat
    Nov 14 '18 at 1:57











  • @Daredevil No, it will not. Try it for yourself: System.out.println("10/2/18".replaceAll("[.,!?]", ""));

    – GBlodgett
    Nov 14 '18 at 1:58











  • Would it be possible to turn "animal" (with the quotes) into just animal? I tried including the quote as well, but it wouldn't take the argument.

    – Daredevil
    Nov 14 '18 at 2:14











  • @Daredevil What do you mean? Replace animal with what?

    – GBlodgett
    Nov 14 '18 at 2:15
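Another option, not mentioned in either answer, is the JDK's java.text.BreakIterator. Its word instance finds word boundaries using locale-sensitive rules instead of a hand-written regex. The sketch below keeps only segments that start with a letter or digit; the class name and that filtering rule are assumptions for illustration. How it splits things like dates or decimals depends on the JDK's boundary rules, so check the result against your own PDFs before relying on it.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorWords {

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        // each [start, end) range is one segment: a word, whitespace, or punctuation
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // keep segments that begin with a letter or digit; drop spaces and punctuation
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("This string, is a code."));  // [This, string, is, a, code]
    }
}
```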


















