[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time column #135041

crespocarlos · 2022-06-23T15:17:20Z (edited by weltenwort)

Summary

We heard about a problem where the Total Time column in the Shard Activity table shows N/A.

After some investigation, I found that the /api/monitoring/v1/clusters/{clusterUuid}/elasticsearch endpoint gets the shard activity by querying both the ECS and the legacy index patterns. However, both the endpoint and the table read fields that don't exist in ECS documents.

These fields are:

start_time_in_millis
total_time_in_millis

The fields that do exist in ECS documents are start_time and stop_time. This mismatch causes the table to display incorrect information.

ES mapping: https://github.com/elastic/elasticsearch/blob/b318cd6f80aaa94fcd074fae2509dc5e028c1b31/x-pack/plugin/core/src/main/resources/monitoring-es-mb.json#L1417

What metricbeat does: https://github.com/elastic/beats/blob/87949288faf8450b35a12ae946050444d4272c18/metricbeat/module/elasticsearch/index_recovery/data.go#L76
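A minimal TypeScript sketch of the mismatch and of one possible fallback, for illustration only; it is not the Kibana implementation. It assumes the ECS values are nested as start_time.ms, stop_time.ms and total_time.ms, which should be checked against the mapping linked above, and the interface names and getRecoveryTimes are hypothetical. A reader that looks only for the legacy names gets undefined from an ECS-shaped document, which is what ends up rendered as N/A.

```typescript
// Hypothetical shapes for illustration only. The nesting of the ECS values
// under start_time.ms / stop_time.ms / total_time.ms is an assumption.
interface LegacyRecoveryFields {
  start_time_in_millis?: number;
  total_time_in_millis?: number;
}

interface EcsRecoveryFields {
  start_time?: { ms?: number };
  stop_time?: { ms?: number };
  total_time?: { ms?: number };
}

type RecoveryDoc = LegacyRecoveryFields & EcsRecoveryFields;

interface RecoveryTimes {
  startMs?: number; // undefined: nothing to show on hover
  totalMs?: number; // undefined: the cell would render N/A
}

// Prefer the legacy names, fall back to the ECS-shaped fields, and derive the
// total from stop - start only when no total is reported but both ends exist.
function getRecoveryTimes(doc: RecoveryDoc): RecoveryTimes {
  const startMs = doc.start_time_in_millis ?? doc.start_time?.ms;
  const stopMs = doc.stop_time?.ms;
  let totalMs = doc.total_time_in_millis ?? doc.total_time?.ms;
  if (totalMs === undefined && startMs !== undefined && stopMs !== undefined) {
    totalMs = stopMs - startMs;
  }
  return { startMs, totalMs };
}

// Reading only the legacy name from an ECS-shaped document finds nothing.
const ecsDoc: RecoveryDoc = { start_time: { ms: 1000 }, stop_time: { ms: 4500 } };
console.log(ecsDoc.total_time_in_millis); // undefined
console.log(getRecoveryTimes(ecsDoc)); // { startMs: 1000, totalMs: 3500 }
```

Deriving the total only applies when both timestamps are present; otherwise the sketch returns undefined rather than guessing a value.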

AC

The table should show the correct start time (which is shown when you hover over the total time) and the correct total time.
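The acceptance criteria can be sketched as two small formatters: one for the Total Time cell, which falls back to N/A only when no value is available, and one for the start-time text shown on hover. The function names and output formats here are hypothetical; Kibana's actual formatting helpers are not part of this issue.

```typescript
// Hypothetical Total Time cell formatter: "N/A" only when the value is missing
// or unusable, otherwise milliseconds, seconds, or minutes plus seconds.
function formatTotalTime(totalMs: number | undefined): string {
  if (totalMs === undefined || !Number.isFinite(totalMs) || totalMs < 0) {
    return "N/A";
  }
  if (totalMs < 1000) return `${totalMs}ms`;
  const totalSeconds = Math.floor(totalMs / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Hypothetical hover text for the start time, as an ISO-8601 UTC timestamp.
function formatStartTooltip(startMs: number | undefined): string {
  return startMs === undefined ? "N/A" : new Date(startMs).toISOString();
}
```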


elasticmachine · 2022-06-23T15:17:44Z

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

MakoWish · 2022-08-01T19:55:26Z

Any traction on this? It seems the lack of the Total Time metric also affects the sorting of shard recoveries/relocations. Makes it a real pain to follow the progress of actions in the GUI when they keep rearranging with every refresh of the data.
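Rows that lack a total time give a sort on that column nothing to order by, so their relative position can change between refreshes. One way to keep the order stable regardless of missing values is a comparator with deterministic tie-breakers. This is a sketch with a hypothetical row shape (index, shardId, totalMs), not the Kibana table's actual model or sort logic.

```typescript
// Hypothetical row shape for a shard recovery entry.
interface ShardRecoveryRow {
  index: string;
  shardId: number;
  totalMs?: number;
}

// Sort by total time descending; rows without a total go last. Ties, including
// all rows with a missing total, fall back to index name and then shard id, so
// the same data always produces the same order.
function compareRecoveries(a: ShardRecoveryRow, b: ShardRecoveryRow): number {
  const aMissing = a.totalMs === undefined;
  const bMissing = b.totalMs === undefined;
  if (aMissing !== bMissing) return aMissing ? 1 : -1;
  if (a.totalMs !== undefined && b.totalMs !== undefined && a.totalMs !== b.totalMs) {
    return b.totalMs - a.totalMs;
  }
  if (a.index !== b.index) return a.index < b.index ? -1 : 1;
  return a.shardId - b.shardId;
}
```

Usage: `rows.sort(compareRecoveries)`.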

MakoWish · 2023-02-23T16:51:59Z

Still no action on this? We just upgraded to 8.6.2, and this bug has still not been addressed.

klacabane · 2023-02-23T17:10:07Z

Didn't notice the issue was already logged when creating elastic/beats#34427. The fix is currently in review and will be available in 8.7.0.

Closing this in favor of the duplicate, elastic/beats#34427.

MakoWish · 2023-06-12T19:14:23Z

This is unfortunately still not fixed in 8.8.1. We just upgraded today and are quite disappointed this issue is more than a year old now and still not resolved.

miltonhultgren · 2023-06-13T07:54:45Z

This screenshot is from my local machine, running the stack at 8.8.1 from source, where it works as expected.

@MakoWish Can you share more details about your setup? I will share in a moment the query to run and the expected results so you can compare your own data to that.

miltonhultgren · 2023-06-13T08:06:46Z

I'm assuming you're using Metricbeat, so this should be the query (replace YOUR_CLUSTER_UUID with the UUID of your cluster):

```
GET .monitoring-es-8-*,metrics-elasticsearch.stack_monitoring.index_recovery-*/_search
{
  "size": 10000,
  "_source": [
    "elasticsearch.index.recovery",
    "@timestamp"
  ],
  "sort": {
    "timestamp": {
      "order": "desc",
      "unmapped_type": "long"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "data_stream.dataset": "elasticsearch.stack_monitoring.index_recovery"
                }
              },
              {
                "term": {
                  "metricset.name": "index_recovery"
                }
              },
              {
                "term": {
                  "type": "index_recovery"
                }
              }
            ]
          }
        },
        {
          "term": {
            "cluster_uuid": "YOUR_CLUSTER_UUID"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "max_timestamp": {
      "max": {
        "field": "@timestamp"
      }
    }
  }
}
```

And this is what the result should look like: result.json.zip

Can you please verify that the recoveries in your result have the property total_time?
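To check the response programmatically rather than by eye, a small helper can list the hits whose elasticsearch.index.recovery object has no total_time property. The response typing is a minimal, hypothetical subset of the _search response covering only the fields used here, and the helper name is made up; hits with no recovery object at all are skipped rather than reported.

```typescript
// Minimal, hypothetical typing of the parts of the _search response used here.
interface SearchHit {
  _index: string;
  _id: string;
  _source?: {
    "@timestamp"?: string;
    elasticsearch?: { index?: { recovery?: Record<string, unknown> } };
  };
}

interface SearchResponse {
  hits: { hits: SearchHit[] };
}

// Returns the _id of every hit whose recovery object lacks a total_time key.
function hitsMissingTotalTime(resp: SearchResponse): string[] {
  return resp.hits.hits
    .filter((hit) => {
      const recovery = hit._source?.elasticsearch?.index?.recovery;
      return recovery !== undefined && !("total_time" in recovery);
    })
    .map((hit) => hit._id);
}
```

For example, pass `JSON.parse(...)` of the response body copied from Dev Tools; an empty array means every returned recovery carries total_time.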

MakoWish · 2023-06-13T14:59:07Z

> @MakoWish Can you share more details about your setup? I will share in a moment the query to run and the expected results so you can compare your own data to that.

We are using Elastic Agent with the Elasticsearch integration to collect logs and metrics.

What kind of voodoo are you pulling over there? I just went to grab a screenshot, and it is now showing the times correctly again. We upgraded our cluster yesterday, performing a rolling restart on each node, including our two Kibana/coordinating nodes, and it was still showing N/A. Nothing has changed since yesterday aside from a reboot on my local machine, but it is now showing correctly. I feel like I just drove my truck to the dealership, and the check engine light turned itself off.

Disregard!

miltonhultgren · 2023-06-15T10:48:48Z

Elastic Agent runs Metricbeat as a subprocess and their versions should align. So if you have Agent 8.8.1 it should bundle Metricbeat 8.8.1 which should include the fix as far as I can tell. Oddly enough we had a similar issue pop up internally, perhaps I can share something more if that investigation leads somewhere.
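Following that reasoning, whether a given Agent (and so its bundled Metricbeat) should contain the fix comes down to comparing its version against 8.7.0, the release the fix was announced for. A minimal sketch, assuming plain major.minor.patch strings without pre-release or build suffixes; parseVersion and isAtLeast are hypothetical names.

```typescript
// Parses "major.minor.patch"; rejects anything else (e.g. "8.7" or "8.7.0-SNAPSHOT").
function parseVersion(v: string): [number, number, number] {
  const parts = v.split(".").map((p) => Number(p));
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n))) {
    throw new Error(`Unsupported version string: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// True when `version` is at or above `minimum`, compared component by component.
function isAtLeast(version: string, minimum: string): boolean {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

const FIX_VERSION = "8.7.0"; // release the fix was announced for
console.log(isAtLeast("8.6.2", FIX_VERSION)); // false
console.log(isAtLeast("8.8.1", FIX_VERSION)); // true
```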

The only other factor I can think of is the applied mappings. They shouldn't really affect the shape of the reported documents, but there are mappings included in the Elasticsearch integration. Perhaps an update to the integration version made a change?

In any case, glad it's working but I would be happier if we knew why :D

MakoWish · 2023-06-15T15:27:23Z

Yeah, would be good to know why it was still not working initially, but also glad it is now.

crespocarlos added the bug, Team:Infra Monitoring UI - DEPRECATED, and Feature:Stack Monitoring labels Jun 23, 2022

crespocarlos changed the title from "[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time" to "[Stack Monitoring] Shard Activity for completed recoveries shows N/A on Total Time column" Jun 23, 2022

matschaffer mentioned this issue Jul 4, 2022 in Stack Monitoring Tech Debt Plan #127224 (closed, 39 tasks)

klacabane closed this as completed Feb 23, 2023

miltonhultgren mentioned this issue Jun 13, 2023 in [Stack Monitoring] shard activity does not show total time for metricbeat data elastic/beats#34427 (closed)
