[BUG] Incomplete results with search_after and multiple shards #14824

Open
TatianaNeuer opened this issue Jul 18, 2024 · 1 comment · May be fixed by #14852
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments

@TatianaNeuer

Describe the bug

In some cases, a search request with "search_after" and "track_total_hits=false" does not return all expected documents; some documents are missing.

Related component

Search

To Reproduce

  1. Create an index with 10 shards (a different number of shards might not trigger the bug):
    PUT /test_index
{
    "settings": {
        "index": {
            "number_of_shards": 10
        }
    }
}
  1. Index some documents (using different _id values might not trigger the bug):
    POST /_bulk
{ "index": { "_index": "test_index", "_id": "test_index-id-doc1" } }
{"docNb": "doc1","name": "bob"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc2" } }
{"docNb": "doc2","name": ""}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc3" } }
{"docNb": "doc3"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc4" } }
{"docNb": "doc4","name": "ana"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc5" } }
{"docNb": "doc5","name": ""}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc6" } }
{"docNb": "doc6"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc7" } }
{"docNb": "doc7"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc8" } }
{"docNb": "doc8"}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc9" } }
{"docNb": "doc9","name": ""}
{ "index": { "_index": "test_index", "_id": "test_index-id-doc10" } }
{"docNb": "doc10","name": ""}

  1. Search documents:
    GET /test_index/_search
{
    "size": 20,
    "track_total_hits": false,
    "sort": [
        {
            "name.keyword": {
                "order": "desc"
            }
        },
        {
            "docNb.keyword": {
                "order": "asc"
            }
        }
    ],
    "search_after": [
        "ana",
        "doc4"
    ]
}
  1. The result contains 7 documents instead of 8. Executing the same request with "track_total_hits": true returns the correct number of documents.
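
As a sanity check on the expected count of 8, the sort and search_after cutoff can be replayed locally. This is a minimal sketch, assuming documents without a name field sort last (the default missing-value behavior) and that an empty string counts as a real value; the helper names are illustrative and not part of any OpenSearch API.

```python
from functools import cmp_to_key

# The ten documents from the bulk request above.
DOCS = [
    {"docNb": "doc1", "name": "bob"},
    {"docNb": "doc2", "name": ""},
    {"docNb": "doc3"},
    {"docNb": "doc4", "name": "ana"},
    {"docNb": "doc5", "name": ""},
    {"docNb": "doc6"},
    {"docNb": "doc7"},
    {"docNb": "doc8"},
    {"docNb": "doc9", "name": ""},
    {"docNb": "doc10", "name": ""},
]


def sort_values(doc):
    """Return the (name, docNb) sort tuple; a missing name is None."""
    return (doc.get("name"), doc["docNb"])


def compare_sort_values(a, b):
    """Order by name desc (missing last, an assumption), then docNb asc."""
    name_a, nb_a = a
    name_b, nb_b = b
    if name_a != name_b:
        if name_a is None:
            return 1
        if name_b is None:
            return -1
        return -1 if name_a > name_b else 1
    return (nb_a > nb_b) - (nb_a < nb_b)


def search_after(docs, after):
    """Return the docs that sort strictly after the given sort values."""
    ordered = sorted(
        docs,
        key=cmp_to_key(
            lambda x, y: compare_sort_values(sort_values(x), sort_values(y))
        ),
    )
    return [d for d in ordered if compare_sort_values(sort_values(d), after) > 0]


expected = search_after(DOCS, ("ana", "doc4"))
print(len(expected))                    # 8
print([d["docNb"] for d in expected])
```

Under these assumptions the four documents with an empty name and the four documents without a name all sort after ("ana", "doc4"), which matches the expected total of 8.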

Expected behavior

A request with "search_after" and "track_total_hits:false" should return the correct number of documents.

Additional Details

Host/Environment (please complete the following information):

  • OS: Windows 10 with WSL2 and docker
  • Version : docker image: opensearchproject/opensearch:2.15.0
  • One OpenSearch node, run with the following Docker Compose file:
version: '3'
services:
  opensearch:
    image: opensearchproject/opensearch:2.15.0
    container_name: opensearch
    environment:
      - cluster.name=opensearch-cluster # Name the cluster
      - node.name=opensearch # Name the node that will run in this container
      - discovery.seed_hosts=opensearch # Nodes to look for when discovering the cluster
      - cluster.initial_cluster_manager_nodes=opensearch # Nodes eligible to serve as cluster manager
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
      - "DISABLE_INSTALL_DEMO_CONFIG=true" # Prevents execution of bundled demo script which installs demo certificates and security configurations to OpenSearch
      - "DISABLE_SECURITY_PLUGIN=true" # Disables Security plugin
    ulimits:
      memlock:
        soft: -1 # Set memlock to unlimited (no soft or hard limit)
        hard: -1
      nofile:
        soft: 65536 # Maximum number of open files for the opensearch user - set to at least 65536
        hard: 65536
    volumes:
      - opensearch:/usr/share/opensearch/data # Creates volume called opensearch and mounts it to the container
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    networks:
      - opensearch-net # All of the containers will join the same Docker bridge network


volumes:
  opensearch:

networks:
  opensearch-net:
@TatianaNeuer TatianaNeuer added bug Something isn't working untriaged labels Jul 18, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Jul 18, 2024
@bugmakerrrrrr
Contributor

bugmakerrrrrr commented Jul 19, 2024

The docs test_index-id-doc1 and test_index-id-doc6 are on the same shard, so the MinAndMax value of the name field on this shard is bob (doc6 has no name field). If we set track_total_hits=false, the search_after param is taken into consideration during the can-match phase. Because the MinAndMax of the shard is larger than the search_after value ana, this shard is filtered out as unable to match. I think we should take missing values into consideration during the can-match phase. I can help fix this issue.
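
The reasoning in this comment can be illustrated with a small model of the shard-skipping decision. This is a simplified sketch, not OpenSearch source code: the function names are hypothetical, only the primary descending sort on name is modeled, and documents without the field are assumed to sort last.

```python
# Simplified model of the can-match check for a descending sort on "name".
# Hypothetical helper names; this does not reproduce OpenSearch internals.

def min_name(shard_docs):
    """Smallest indexed name on the shard, ignoring docs without the field."""
    names = [d["name"] for d in shard_docs if "name" in d]
    return min(names) if names else None


def can_match_min_only(shard_docs, after_name):
    """Skip the shard when even its smallest name sorts before search_after."""
    lowest = min_name(shard_docs)
    if lowest is None:
        return True
    return lowest <= after_name


def can_match_missing_aware(shard_docs, after_name):
    """Same check, but never skip a shard holding docs that lack the field,
    since under missing-last ordering they sort after any real value."""
    if any("name" not in d for d in shard_docs):
        return True
    return can_match_min_only(shard_docs, after_name)


# The shard described above: doc1 has name "bob", doc6 has no name field.
shard = [{"docNb": "doc1", "name": "bob"}, {"docNb": "doc6"}]
print(can_match_min_only(shard, "ana"))       # False: shard skipped, doc6 lost
print(can_match_missing_aware(shard, "ana"))  # True: shard is searched
```

In this model the min-only check drops the shard because "bob" sorts before "ana" in descending order, even though doc6 should still appear after the search_after position.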
