Elasticsearch Edge NGram tokenizer higher score when word begins with n-gram
Suppose there is the following mapping with Edge NGram Tokenizer:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "tokenizer": "autocomplete_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "whitespace"
        }
      },
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "symbol"
          ]
        }
      }
    }
  },
  "mappings": {
    "tag": {
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete_analyzer",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
And the following documents are indexed:
POST /tag/tag/_bulk
{"index":{}}
{"name" : "HITS FIND SOME"}
{"index":{}}
{"name" : "TRENDING HI"}
{"index":{}}
{"name" : "HITS OTHER"}
Then searching
{
  "query": {
    "match": {
      "name": {
        "query": "HI"
      }
    }
  }
}
yields all three documents with the same score, or TRENDING HI with a score higher than one of the others.
How can this be configured so that entries that actually start with the n-gram get a higher score? In this case, HITS FIND SOME and HITS OTHER should score higher than TRENDING HI; at the same time, TRENDING HI should still be in the results.
elasticsearch search n-gram
asked yesterday
m3th0dman
1 Answer
In this particular case you could add a match_phrase_prefix clause to your query, which does a prefix match on the last term of the query text:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "HI"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "HI"
          }
        }
      ]
    }
  }
}
The match clause will match all three documents, but the match_phrase_prefix clause won't match TRENDING HI. As a result, you'll get all three items in the results, but TRENDING HI will appear with a lower score.
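To check how much each clause actually contributes for a given document, a sketch like the following may help. It assumes the same index and name field as above. The explain flag is a standard search request option that returns a per-document score breakdown. The boost value of 2 is an arbitrary, illustrative choice (not something from the question) that only widens whatever contribution the prefix clause makes.

```json
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "HI"
          }
        },
        {
          "match_phrase_prefix": {
            "name": {
              "query": "HI",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}
```

Each hit then carries an _explanation section, which is a reasonable way to confirm the ranking on your own data before relying on it.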
Quoting the docs:
The match_phrase_prefix query is a poor-man’s autocomplete [...] For better solutions for search-as-you-type see the completion suggester and Index-Time Search-as-You-Type.
On a side note, if you're introducing that bool query, you'll probably want to look at the minimum_should_match option, depending on the results you want.
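As a rough illustration of that option, assuming the same two clauses as above: when a bool query has only should clauses, at least one of them must match by default, so the value 1 below simply makes that default explicit. Raising it to 2 would require both clauses to match, which would drop any document matched by only one of them.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "HI"
          }
        },
        {
          "match_phrase_prefix": {
            "name": "HI"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
```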
But I need TRENDING HI as a result, just with a lower score.
– m3th0dman
21 hours ago
@m3th0dman The overall results are a combination of the matching results for each clause, so TRENDING HI will appear in the results, and it will appear with a lower score. Edited the answer to make this clearer.
– AdrienF
17 hours ago
edited 17 hours ago
answered yesterday
AdrienF