Elasticsearch Edge NGram tokenizer higher score when word begins with n-gram























Suppose there is the following mapping with Edge NGram Tokenizer:



{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "tokenizer": "autocomplete_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace"
        }
      },
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "symbol"
          ]
        }
      }
    }
  },
  "mappings": {
    "tag": {
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete_analyzer",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}


And the following documents are indexed:



POST /tag/tag/_bulk
{"index":{}}
{"name" : "HITS FIND SOME"}
{"index":{}}
{"name" : "TRENDING HI"}
{"index":{}}
{"name" : "HITS OTHER"}


Then searching



{
  "query": {
    "match": {
      "name": {
        "query": "HI"
      }
    }
  }
}


returns all three documents with the same score, or ranks TRENDING HI above one of the others.



How can this be configured so that entries that actually start with the n-gram get a higher score? In this case, HITS FIND SOME and HITS OTHER should score higher than TRENDING HI, while TRENDING HI should still appear in the results.
















































































      elasticsearch search n-gram
















      asked yesterday









      m3th0dman

























          1 Answer






























          In this particular case you could add a match_phrase_prefix clause to your query, which does a prefix match on the last term of the query text:



          {
            "query": {
              "bool": {
                "should": [
                  {
                    "match": {
                      "name": "HI"
                    }
                  },
                  {
                    "match_phrase_prefix": {
                      "name": "HI"
                    }
                  }
                ]
              }
            }
          }


          The match clause will match all three documents, but the match_phrase_prefix clause won't match TRENDING HI. As a result, you'll get all three items in the results, but TRENDING HI will appear with a lower score.
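
Relevance scores depend on index statistics such as term frequencies and how documents are spread across shards, so the ordering is worth verifying on your own data. One way to do that, sketched here against the question's name field, is to set explain to true in the search body, which asks Elasticsearch to return a score breakdown for each hit:

```json
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "HI"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "HI"
          }
        }
      ]
    }
  }
}
```

Each hit in the response then carries an _explanation section showing how much each should clause contributed to its score.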



          Quoting the docs:




          The match_phrase_prefix query is a poor-man’s autocomplete [...] For better solutions for search-as-you-type see the completion suggester and Index-Time Search-as-You-Type.




          On a side note, if you're introducing that bool query, you'll probably want to look at the minimum_should_match option, depending on the results you want.
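
As a rough sketch of that option: in a bool query that has only should clauses, at least one clause must match by default, so setting minimum_should_match to 1 below only makes that behavior explicit and keeps TRENDING HI in the results, while raising it to 2 would require both clauses to match. The boost of 2 on the match_phrase_prefix clause is an arbitrary illustrative value, not something from the question, and would need tuning against real data:

```json
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "match": {
            "name": "HI"
          }
        },
        {
          "match_phrase_prefix": {
            "name": {
              "query": "HI",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}
```

Boosting the second clause widens the score gap between documents that satisfy both clauses and documents that satisfy only the plain match.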





























          • But I need TRENDING HI as a result; just with a lower score.
            – m3th0dman
            21 hours ago










          • @m3th0dman the overall results are a combination of matching results for each term, so TRENDING HI will appear in the results, and it will appear with a lower score. Edited the answer to make this clearer.
            – AdrienF
            17 hours ago



















































          edited 17 hours ago

























          answered yesterday









          AdrienF
































































































