
[BUG] reverse_nested loses context when shifting from nested to root and back to a nested path #17109

Open
Shivacharangoud opened this issue Jan 24, 2025 · 6 comments
Labels: enhancement, Search:Aggregations


Shivacharangoud commented Jan 24, 2025

Describe the bug

**mapping:**

PUT reverse_nested
{
  "mappings": {
    "properties": {
      "priority": {
        "type": "long"
      },
      "teams": {
        "type": "nested",
        "properties": {
          "hours": {
            "type": "long"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

**data:**

PUT reverse_nested/_doc/1
{
  "priority":1,
  "teams":[
    {"name":"team1", "hours":10},
    {"name":"team2", "hours":20}
  ]
}

**query:**

GET reverse_nested/_search
{
  "aggs": {
    "group_by_team": {
      "nested": {
        "path": "teams"
      },
      "aggs": {
        "group_by_teamm": {
          "terms": {
            "field": "teams.name.keyword",
            "size": 10
          },
          "aggs": {
            "sum_of_hours": {
              "sum": {
                "field": "teams.hours"
              }
            },
            "reverse_to_base": {
              "reverse_nested": {},
              "aggs": {
                "group_by_priority": {
                  "terms": {
                    "field": "priority"
                  },
                  "aggs":{
                    "nested_again":{
                      "nested": {
                        "path": "teams"
                      },
                      "aggs": {
                        "sum_of_hours": {
                          "sum": {
                            "field": "teams.hours"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

**and the response:**
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "reverse_nested",
        "_id": "1",
        "_score": 1,
        "_source": {
          "priority": 1,
          "teams": [
            {
              "name": "team1",
              "hours": 10
            },
            {
              "name": "team2",
              "hours": 20
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "group_by_team": {
      "doc_count": 2,
      "group_by_teamm": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "team1",
            "doc_count": 1,
            "reverse_to_base": {
              "doc_count": 1,
              "group_by_priority": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 1,
                    "doc_count": 1,
                    "nested_again": {
                      "doc_count": 2,
                      "sum_of_hours": {
                        "value": 30
                      }
                    }
                  }
                ]
              }
            },
            "sum_of_hours": {
              "value": 10
            }
          },
          {
            "key": "team2",
            "doc_count": 1,
            "reverse_to_base": {
              "doc_count": 1,
              "group_by_priority": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 1,
                    "doc_count": 1,
                    "nested_again": {
                      "doc_count": 2,
                      "sum_of_hours": {
                        "value": 30
                      }
                    }
                  }
                ]
              }
            },
            "sum_of_hours": {
              "value": 20
            }
          }
        ]
      }
    }
  }
}

The goal is a report of hours by team by priority. The query does a nested aggregation on teams.name, then a reverse_nested aggregation to group on priority, then a second nested aggregation to sum teams.hours. This double-counts hours: the second nested aggregation on teams knows nothing about the upstream nesting. It runs in the context of the root document, so under each top-level team bucket it sums the hours of every team in that document. The sum of hours under each team is 10 and 20, but when we group again by priority (under each team) and sum the hours, the result is 30, which means both nested team objects are counted. How can this be solved?
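To make the double counting concrete, here is a toy, in-memory Python model of the aggregation chain applied to the single document above. It is an assumption-laden sketch, not how OpenSearch implements aggregations: nested children are modeled as plain dicts that point at their root document, and the names ROOT_DOCS, NESTED_DOCS and team_then_priority_then_nested are hypothetical.

```python
# Toy in-memory model (an assumption, not OpenSearch internals): one root
# document (_id 1) with two nested "teams" children, as in the issue.
ROOT_DOCS = {1: {"priority": 1}}
NESTED_DOCS = [
    {"root": 1, "name": "team1", "hours": 10},
    {"root": 1, "name": "team2", "hours": 20},
]


def team_then_priority_then_nested(nested_docs, root_docs):
    """Mimic nested > terms(name) > reverse_nested > terms(priority) > nested > sum(hours)."""
    report = {}
    for name in sorted({d["name"] for d in nested_docs}):
        # Outer terms bucket: only the nested children with this team name.
        bucket = [d for d in nested_docs if d["name"] == name]
        # reverse_nested: jump from those children to their root documents.
        roots = {d["root"] for d in bucket}
        by_priority = {}
        for root_id in sorted(roots):
            priority = root_docs[root_id]["priority"]
            # nested again: expands the root to ALL of its children; it has no
            # record of which child placed the root in this team bucket.
            children = [d for d in nested_docs if d["root"] == root_id]
            hours = sum(c["hours"] for c in children)
            by_priority[priority] = by_priority.get(priority, 0) + hours
        report[name] = {
            "sum_of_hours": sum(d["hours"] for d in bucket),
            "nested_again_by_priority": by_priority,
        }
    return report


if __name__ == "__main__":
    for team, row in team_then_priority_then_nested(NESTED_DOCS, ROOT_DOCS).items():
        print(team, row)
```

Running it prints a sum_of_hours of 10 and 20 for the two teams, but 30 for priority 1 under both, which mirrors the nested_again values in the response above.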

Related component

Search:Aggregations

To Reproduce

Use the mapping, data, and query from the description above.

Expected behavior

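Expected result (only the sum_of_hours values under nested_again differ from the actual response: 10 and 20 instead of 30):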

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "reverse_nested",
        "_id": "1",
        "_score": 1,
        "_source": {
          "priority": 1,
          "teams": [
            {
              "name": "team1",
              "hours": 10
            },
            {
              "name": "team2",
              "hours": 20
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "group_by_team": {
      "doc_count": 2,
      "group_by_teamm": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "team1",
            "doc_count": 1,
            "reverse_to_base": {
              "doc_count": 1,
              "group_by_priority": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 1,
                    "doc_count": 1,
                    "nested_again": {
                      "doc_count": 2,
                      "sum_of_hours": {
                        "value": 10
                      }
                    }
                  }
                ]
              }
            },
            "sum_of_hours": {
              "value": 10
            }
          },
          {
            "key": "team2",
            "doc_count": 1,
            "reverse_to_base": {
              "doc_count": 1,
              "group_by_priority": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": 1,
                    "doc_count": 1,
                    "nested_again": {
                      "doc_count": 2,
                      "sum_of_hours": {
                        "value": 20
                      }
                    }
                  }
                ]
              }
            },
            "sum_of_hours": {
              "value": 20
            }
          }
        ]
      }
    }
  }
}

Additional Details

  • OS: AWS managed OpenSearch Service
  • Version: 2.1
kkewwei (Contributor) commented Jan 27, 2025

> teams.hours double-counts hours because second nesting on teams knows nothing about upstream nesting.

@Shivacharangoud, reverse_nested only handles one case, shifting from nested to root. It does not cover going from the root back to nested again; in that case, the nested context starts fresh.

Should we support this? @msfroh, please help evaluate.

Shivacharangoud (Author)

@kkewwei Thank you for your response. I understand that reverse_nested works for shifting from nested to root but does not maintain the context when shifting back to nested. However, maintaining consistent context across such transitions is critical for accurate aggregations in cases like this.

Is there any alternative solution or approach that could achieve the desired outcome, for instance a different query structure or another aggregation feature? This feels like a basic requirement.

Looking forward to your thoughts or suggestions on how this could be resolved. Thank you!

Shivacharangoud (Author)

@kkewwei, @msfroh, please respond.

msfroh (Collaborator) commented Jan 29, 2025

> @kkewwei, @msfroh please respond.

Sorry -- we're in the midst of the code freeze for OpenSearch 2.19 and my days are spent reviewing pull requests from across the OpenSearch community. I am not able to jump to attention at this time.

Also, it's Chinese New Year, so I believe kkewwei is not working this week.

sandeshkr419 added the enhancement label and removed the untriaged and bug labels on Jan 29, 2025.

kkewwei (Contributor) commented Feb 2, 2025

Yes, it's the end of the Chinese New Year.

@Shivacharangoud, we need to use the nested type to preserve each nested object; there seems to be no alternative solution.

Maybe we can make reverse_nested maintain the context. If you don't have time, I'd be happy to give it a try.
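One way to picture what "maintaining the context" could mean is a toy Python model in which the aggregation remembers which nested children led into each bucket and, when re-entering the nested path from the root, only visits those children. This is a hypothetical sketch of the desired semantics, not existing OpenSearch behavior or code; keep_context, hours_by_team_by_priority and the per-child "id" field are illustrative assumptions.

```python
# Toy model (an assumption): nested children carry an "id" so an aggregation
# can remember which children led into a bucket.
ROOT_DOCS = {1: {"priority": 1}}
NESTED_DOCS = [
    {"id": 0, "root": 1, "name": "team1", "hours": 10},
    {"id": 1, "root": 1, "name": "team2", "hours": 20},
]


def hours_by_team_by_priority(nested_docs, root_docs, keep_context=True):
    """Sum hours per team per priority; keep_context is a hypothetical switch."""
    report = {}
    for name in sorted({d["name"] for d in nested_docs}):
        # Remember, per root document, which children fell into this team bucket.
        origin = {}
        for d in nested_docs:
            if d["name"] == name:
                origin.setdefault(d["root"], set()).add(d["id"])
        by_priority = {}
        for root_id, child_ids in origin.items():
            priority = root_docs[root_id]["priority"]
            children = [d for d in nested_docs if d["root"] == root_id]
            if keep_context:
                # Re-enter the nested path only through the originating children.
                children = [c for c in children if c["id"] in child_ids]
            total = sum(c["hours"] for c in children)
            by_priority[priority] = by_priority.get(priority, 0) + total
        report[name] = by_priority
    return report


if __name__ == "__main__":
    print(hours_by_team_by_priority(NESTED_DOCS, ROOT_DOCS))  # context kept
    print(hours_by_team_by_priority(NESTED_DOCS, ROOT_DOCS, keep_context=False))
```

With the context kept, team1 and team2 get 10 and 20 under priority 1, matching the expected result in the issue; with keep_context=False, both fall back to 30, as in the current response.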

Shivacharangoud (Author)

@kkewwei Thanks for the response! Maintaining context in reverse_nested is a common use case, and supporting it would make aggregations much more intuitive.
If this enhancement is possible, that would be great! I appreciate your time and effort on this. Looking forward to any updates or suggestions.
