[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time column #135041

Closed
Tracked by #127224
crespocarlos opened this issue Jun 23, 2022 · 10 comments
Labels
bug, Feature:Stack Monitoring, Team:Infra Monitoring UI - DEPRECATED

Comments

@crespocarlos
Contributor

crespocarlos commented Jun 23, 2022

Summary

We heard of a problem where the Total Time column in the Shard Activity table shows N/A for completed recoveries.

Image

After some investigation, I found that the /api/monitoring/v1/clusters/{clusterUuid}/elasticsearch endpoint gets the shard activity by querying both ECS and legacy index patterns. However, both the endpoint and the table read fields that don't exist in ECS docs.

These fields are:

  • start_time_in_millis
  • total_time_in_millis

The existing fields on ECS are start_time and stop_time. This mismatch causes the table to display wrong info.
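
For illustration, here is a minimal TypeScript sketch of a fallback that would tolerate both document shapes. The interfaces, the function names, and the `.ms` nesting under start_time and stop_time are assumptions made for this example. This is not Kibana's actual code and not necessarily how the problem gets fixed:

```typescript
// Hypothetical shapes: field names follow the issue text above,
// the `.ms` nesting on the ECS side is an assumption.
interface LegacyRecovery {
  start_time_in_millis?: number;
  total_time_in_millis?: number;
}

interface EcsRecovery {
  start_time?: { ms?: number };
  stop_time?: { ms?: number };
}

type Recovery = LegacyRecovery & EcsRecovery;

// Prefer the legacy fields when present, otherwise derive the values
// from the ECS start/stop timestamps.
function resolveTimes(r: Recovery): { startMs?: number; totalMs?: number } {
  const startMs = r.start_time_in_millis ?? r.start_time?.ms;
  const stopMs = r.stop_time?.ms;
  const derivedTotal =
    startMs !== undefined && stopMs !== undefined && stopMs >= startMs
      ? stopMs - startMs
      : undefined;
  const totalMs = r.total_time_in_millis ?? derivedTotal;
  return { startMs, totalMs };
}

// Only fall back to "N/A" when no total time could be resolved at all.
function formatTotalTime(totalMs?: number): string {
  return totalMs === undefined ? 'N/A' : `${totalMs} ms`;
}
```

With a fallback like this, an ECS doc that only carries start and stop timestamps would still produce a total time, and N/A would be reserved for docs where neither shape provides enough data.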

ES mapping: https://github.com/elastic/elasticsearch/blob/b318cd6f80aaa94fcd074fae2509dc5e028c1b31/x-pack/plugin/core/src/main/resources/monitoring-es-mb.json#L1417

What metricbeat does: https://github.com/elastic/beats/blob/87949288faf8450b35a12ae946050444d4272c18/metricbeat/module/elasticsearch/index_recovery/data.go#L76

AC

  • The table should show the correct start time (shown when you hover over the total time) and the correct total time.
@crespocarlos added the bug, Team:Infra Monitoring UI - DEPRECATED, and Feature:Stack Monitoring labels on Jun 23, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@crespocarlos changed the title from "[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time" to "[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time column" on Jun 23, 2022
@MakoWish

MakoWish commented Aug 1, 2022

Any traction on this? It seems the lack of the Total Time metric also affects the sorting of shard recoveries/relocations. Makes it a real pain to follow the progress of actions in the GUI when they keep rearranging with every refresh of the data.

@MakoWish

Still no action on this? We just upgraded to 8.6.2, and this bug has still not been addressed.

@klacabane
Contributor

Didn't notice the issue was already logged when creating elastic/beats#34427. The fix is currently in review and will be available in 8.7.0.

Closing this in favor of the dup elastic/beats#34427

@MakoWish

This is unfortunately still not fixed in 8.8.1. We just upgraded today and are quite disappointed this issue is more than a year old now and still not resolved.

@miltonhultgren
Contributor

miltonhultgren commented Jun 13, 2023

Screenshot 2023-06-13 at 09 52 30

This screenshot is from my local machine, running the stack at 8.8.1 from source, which works.

@MakoWish Can you share more details about your setup? I will share in a moment the query to run and the expected results so you can compare your own data to that.

@miltonhultgren
Contributor

miltonhultgren commented Jun 13, 2023

I'm assuming you're using Metricbeat so this should be the query (replace YOUR_CLUSTER_UUID with the UUID of your cluster):

GET .monitoring-es-8-*,metrics-elasticsearch.stack_monitoring.index_recovery-*/_search
{
  "size": 10000,
  "_source": [
    "elasticsearch.index.recovery",
    "@timestamp"
  ],
  "sort": {
    "timestamp": {
      "order": "desc",
      "unmapped_type": "long"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "data_stream.dataset": "elasticsearch.stack_monitoring.index_recovery"
                }
              },
              {
                "term": {
                  "metricset.name": "index_recovery"
                }
              },
              {
                "term": {
                  "type": "index_recovery"
                }
              }
            ]
          }
        },
        {
          "term": {
            "cluster_uuid": "YOUR_CLUSTER_UUID"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "max_timestamp": {
      "max": {
        "field": "@timestamp"
      }
    }
  }
}

And this is what the result should look like: result.json.zip

Can you please verify that the recoveries in your result have the property total_time?
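
If the result set is large, a small script can do that check instead of reading the JSON by hand. The following is a rough TypeScript sketch. The function name is made up for this example, and it assumes each hit holds a single recovery object under `_source.elasticsearch.index.recovery`, which may not match your documents exactly:

```typescript
// Minimal view of a search response; only the parts this check needs.
interface SearchHit {
  _id?: string;
  _source?: Record<string, any>;
}

interface SearchResponse {
  hits: { hits: SearchHit[] };
}

// Returns an identifier for every hit whose recovery object has no
// `total_time` property (assumed location, see the note above).
function findHitsMissingTotalTime(response: SearchResponse): string[] {
  const missing: string[] = [];
  response.hits.hits.forEach((hit, i) => {
    const recovery = hit._source?.elasticsearch?.index?.recovery;
    const hasTotalTime =
      recovery != null && typeof recovery === 'object' && 'total_time' in recovery;
    if (!hasTotalTime) {
      missing.push(hit._id ?? `hit #${i}`);
    }
  });
  return missing;
}
```

Pass the parsed response of the query above to `findHitsMissingTotalTime`. An empty array means every returned recovery has the property.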

@MakoWish

@MakoWish Can you share more details about your setup? I will share in a moment the query to run and the expected results so you can compare your own data to that.

We are using Elastic Agent with the Elasticsearch integration to collect logs and metrics.

What kind of voodoo are you pulling over there? I just went to grab a screenshot, and it is now showing the times correctly again. We upgraded our cluster yesterday, performing a rolling restart on each node, including our two Kibana/coordinating nodes, and it was still showing N/A. Nothing has changed since yesterday aside from a reboot on my local machine, but it is now showing correctly. I feel like I just drove my truck to the dealership, and the check engine light turned itself off.

Disregard!

@miltonhultgren
Contributor

Elastic Agent runs Metricbeat as a subprocess and their versions should align. So if you have Agent 8.8.1 it should bundle Metricbeat 8.8.1 which should include the fix as far as I can tell. Oddly enough we had a similar issue pop up internally, perhaps I can share something more if that investigation leads somewhere.

The only other aspect I could think of is the mappings applied. That shouldn't really affect the shape of the reported documents, but there are mappings included in the Elasticsearch Integration. Perhaps there was an update of the version there that could have made a change?

In any case, glad it's working but I would be happier if we knew why :D

@MakoWish

Yeah, would be good to know why it was still not working initially, but also glad it is now.

5 participants