How to search a field that could contain spaces, a hyphen, and a concatenated number?



























Hi, I have a field with the following schema:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="0" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


I am storing complete PDF documents.

Now suppose I have 4 documents with the following content:

1. stackoverflow is a good site.
2. stack-overflow is a good site.
3. stack overflow is a good site.
4. stackoverflow2018 is a good site.

When I search for stackoverflow, it should return document 1.
When I search for stack-overflow, it should return document 2.
When I search for stack overflow, it should return document 3.
When I search for stackoverflow2018, it should return document 4.

What should the schema be for this? The schema above does not work in this case.
Is there anything I could specify in the query?










      solr lucene solr6
















      asked Nov 13 '18 at 12:15









      Root

























          1 Answer














          A Word Delimiter Graph Filter will split on non-alphanumerics (-), case changes, and numbers by default.




          The rules for determining delimiters are as follows:

          • A change in case within a word: "CamelCase" -> "Camel", "Case". This can be disabled by setting splitOnCaseChange="0".

          • A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000"; "4500XL" -> "4500", "XL". This can be disabled by setting splitOnNumerics="0".

          • Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"

          • A trailing "'s" is removed: "O’Reilly’s" -> "O", "Reilly"

          • Any leading or trailing delimiters are discarded: "--hot-spot--" -> "hot", "spot"




          If you don't want that behavior, remove the WordDelimiterFilter from your filter list and add other filters to support the part of the WDF behavior that you need.
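
          As a rough sketch of what that could look like for the four example documents: a field type that only splits on whitespace and lowercases, so stack-overflow and stackoverflow2018 each stay a single token. The name text_exact is made up for illustration, and the stop word and synonym filters from the question are left out only to keep the example short. Treat it as a possible starting point, not a tested drop-in schema.

```xml
<!-- Hypothetical field type name; the tokenizer and filter classes are stock Solr. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer element applies to both indexing and querying. -->
  <analyzer>
    <!-- Split on whitespace only, so hyphens and digits stay inside the token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase so that matching is case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

          With this chain, stack overflow is indexed as two separate tokens, so it would have to be searched as a phrase ("stack overflow") to require both words next to each other. One caveat: the whitespace tokenizer keeps punctuation attached, so site. is indexed with its trailing period. Existing documents need to be reindexed after the analysis chain changes.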






          answered Nov 13 '18 at 12:23









          MatsLindh














          • Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot". Is there any way to disable this behavior?

            – Root
            Nov 13 '18 at 15:10

          • Not everything at once, but you can use the types parameter with a file that redefines - to be an alphanumeric: - => ALPHANUM. See the types parameter in the source linked above.

            – MatsLindh
            Nov 13 '18 at 15:48
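
          A minimal sketch of that suggestion, assuming a types file named wdfftypes.txt (the file name is arbitrary) placed in the core's conf directory next to the schema. splitOnCaseChange and splitOnNumerics are switched off as described in the answer, and the remaining attributes are copied from the index analyzer in the question. This has not been run against the example documents, so check the result on the Analysis screen before relying on it.

```xml
<!--
  Assumed content of conf/wdfftypes.txt (one mapping per line):

    - => ALPHANUM

  This tells the filter to treat the hyphen as an alphanumeric character
  rather than as a delimiter.
-->
<filter class="solr.WordDelimiterFilterFactory"
        types="wdfftypes.txt"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        generateWordParts="0"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"
        protected="protwords.txt"/>
```

          The same filter line would presumably go into both the index and the query analyzer so that both sides produce the same tokens, followed by a reindex.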


















