Elasticsearch aggregation on values in nested list (array)
I have stored some values in Elasticsearch as an array of plain strings (which I took to be the nested data type), without using key/value pairs. An example record:
{
"categories": [
"Category1",
"Category2"
],
"product_name": "productx"
}
Now I want to run an aggregation query to get the list of unique categories. However, all the examples I've found use a mapping with key/value pairs. Can I use the schema above as is, or do I need to change it to something like this to run the aggregation query?
{
"categories": [
{"name": "Category1"},
{"name": "Category2"}
],
"product_name": "productx"
}
elasticsearch elasticsearch-aggregation
asked Nov 16 '18 at 3:24
Sameera Godakanda
1 Answer
Regarding the JSON structure, you need to take a step back and decide whether you want a list or key-value pairs.
Looking at your example, I don't think you need key-value pairs. That said, it depends on your domain: if categories may later carry more properties than just a name, objects would make more sense.
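As for aggregations, they don't require key-value pairs. Elasticsearch indexes each element of an array of strings as a separate value of the same field, so a terms aggregation on that field returns one bucket per distinct value.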
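If you only need the distinct categories (without pairing them with the product name), a plain terms aggregation on the array field should be enough. This is a sketch that assumes categories is mapped as keyword (with dynamic mapping, use categories.keyword instead); the aggregation name unique_categories is an arbitrary placeholder.

```
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "unique_categories": {
      "terms": {
        "field": "categories",
        "size": 100
      }
    }
  }
}
```

For the single example document, this should return one bucket for Category1 and one for Category2.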
For the data you've shown, you can use the aggregation query below. I'm assuming the fields are mapped as keyword.
Aggregation Query
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "source": """
            // Emit one "<category>, <product_name>" key per category value;
            // assumes both fields are keyword (use the .keyword sub-fields with dynamic mapping)
            def list = new ArrayList();
            for (int i = 0; i < doc['categories'].size(); i++) {
              list.add(doc['categories'][i] + ", " + doc['product_name'].value);
            }
            return list;
          """
        }
      }
    }
  }
}
Aggregation Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "category1, productx",
          "doc_count": 1
        },
        {
          "key": "category2, productx",
          "doc_count": 1
        }
      ]
    }
  }
}
Hope it helps!
I read more about the array data type. If I use the nested type, I can't put a plain array field inside the nested field, so my example wouldn't work. In my original mapping (the categories field was created dynamically) it was mapped as a text field, with keyword only as a sub-field, so the aggregation didn't work. I guess I'll settle for key/value pairs and the nested data type.
– Sameera Godakanda
Nov 16 '18 at 8:56
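For reference, when a string field such as categories is created by dynamic mapping (Elasticsearch 6.x and later), GET <your_index_name>/_mapping typically shows something like the fragment below. The field itself is text, which can't be used in a terms aggregation by default, and the aggregatable keyword version lives in the categories.keyword sub-field.

```
"categories": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```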
The aggregation I've provided works if you replace categories with categories.keyword. That said, I'd suggest experimenting with nested and non-nested fields and queries to understand how they work, then seeing how they match your requirements and use cases and adjusting accordingly. Also note that for the nested type you need to use nested aggregations.
– Kamal
Nov 16 '18 at 9:19
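If you do switch to the key/value schema from the question and map categories as nested, the terms aggregation has to be wrapped in a nested aggregation. A sketch, assuming Elasticsearch 7.x typeless mappings (6.x needs a mapping type name) and a keyword sub-property called name as in the question's second example; the aggregation names are placeholders.

```
PUT <your_index_name>
{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "categories": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" }
        }
      }
    }
  }
}

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "categories_nested": {
      "nested": { "path": "categories" },
      "aggs": {
        "unique_categories": {
          "terms": {
            "field": "categories.name",
            "size": 100
          }
        }
      }
    }
  }
}
```

Note that inside a nested aggregation, doc_count counts nested category objects rather than root product documents; a reverse_nested sub-aggregation is needed to count products per category.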
Yes, I had used the keyword field before. However, your query above was missing the size parameter inside "terms": {, so it only showed 10 results. Apparently you can't use size: 0 there and have to use some large value instead.
– Sameera Godakanda
Nov 16 '18 at 11:18
Good observation. Yes, a terms aggregation returns only the top 10 buckets by default, so you need to set size explicitly to get a longer list. Thanks for bringing this up; I've updated my answer and set size to 100.
– Kamal
Nov 16 '18 at 11:41
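Since size: 0 no longer means "all terms" in a terms aggregation (that option was removed in Elasticsearch 5.0), another way to retrieve every distinct category without guessing a large size is to page through them with a composite aggregation. A sketch, assuming Elasticsearch 6.3 or later (for after_key) and the categories.keyword field from dynamic mapping; the names all_categories and category are arbitrary placeholders.

```
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "all_categories": {
      "composite": {
        "size": 100,
        "sources": [
          { "category": { "terms": { "field": "categories.keyword" } } }
        ]
      }
    }
  }
}
```

To fetch the next page, repeat the request with an "after" parameter inside "composite" set to the after_key object from the previous response (for example "after": { "category": "Category2" }), and stop once a page comes back with no buckets.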
edited Nov 16 '18 at 11:41
answered Nov 16 '18 at 7:03
Kamal