[RFC] [Remote Store] Remote Store Stats API #7153

linuxpi · 2023-04-14T01:27:29Z

Is your feature request related to a problem? Please describe.
#6789 discusses adding stats related to Remote Store. These stats will help us identify how Remote Store enabled indices are performing. We need a REST API to expose these stats and provide visibility into that performance.

The scope of this issue is to add the required API(s) for exposing Remote Store related stats.

Describe the solution you'd like

GET /_remote_store/stats/<index>/<shardId>

{
  "shard_id" : "[my-index-1][0]",
  "local_refresh_timestamp_in_millis" : 196439653,
  "local_refresh_cumulative_count" : 0,
  "remote_refresh_timestamp_in_millis" : 196439653,
  "remote_refresh_cumulative_count" : 0,
  "bytes_lag" : 0,
  "rejection_count" : 0,
  "consecutive_failure_count" : 0,
  "total_remote_refresh" : {
    "started" : 0,
    "succeeded" : 0,
    "failed" : 0
  },
  "total_uploads_in_bytes" : {
    "started" : 0,
    "succeeded" : 0,
    "failed" : 0
  },
  "remote_refresh_size_in_bytes" : {
    "last_successful" : 0,
    "moving_avg" : 0.0
  },
  "upload_latency_in_bytes_per_sec" : {
    "moving_avg" : 0.0
  },
  "remote_refresh_latency_in_nanos" : {
    "moving_avg" : 0.0
  }
}
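To make the shard-level response more concrete, here is a minimal client-side sketch that builds the proposed path and reads a few of the draft fields. The helper names (shard_stats_path, ShardRemoteStats, parse_shard_stats) and the derived refresh_time_lag_ms value are hypothetical illustrations and are not part of the proposal. The sketch also assumes the field names shown above stay as drafted.

```python
import json
from dataclasses import dataclass


def shard_stats_path(index: str, shard_id: int) -> str:
    """Build the proposed shard-level path: GET /_remote_store/stats/<index>/<shardId>."""
    return f"/_remote_store/stats/{index}/{shard_id}"


@dataclass
class ShardRemoteStats:
    """Hypothetical client-side view of a few fields from the draft response."""
    shard_id: str
    local_refresh_timestamp_in_millis: int
    remote_refresh_timestamp_in_millis: int
    bytes_lag: int
    consecutive_failure_count: int

    @property
    def refresh_time_lag_ms(self) -> int:
        # How far the last remote refresh trails the last local refresh, in ms.
        return self.local_refresh_timestamp_in_millis - self.remote_refresh_timestamp_in_millis


def parse_shard_stats(body: str) -> ShardRemoteStats:
    """Parse a shard-level response body; fields not listed here are ignored."""
    doc = json.loads(body)
    return ShardRemoteStats(
        shard_id=doc["shard_id"],
        local_refresh_timestamp_in_millis=doc["local_refresh_timestamp_in_millis"],
        remote_refresh_timestamp_in_millis=doc["remote_refresh_timestamp_in_millis"],
        bytes_lag=doc["bytes_lag"],
        consecutive_failure_count=doc["consecutive_failure_count"],
    )


# Values copied from the sample response above.
SAMPLE_RESPONSE = json.dumps({
    "shard_id": "[my-index-1][0]",
    "local_refresh_timestamp_in_millis": 196439653,
    "remote_refresh_timestamp_in_millis": 196439653,
    "bytes_lag": 0,
    "consecutive_failure_count": 0,
})

stats = parse_shard_stats(SAMPLE_RESPONSE)
print(stats.shard_id, stats.refresh_time_lag_ms)  # [my-index-1][0] 0
```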

Index Level Stats

GET /_remote_store/stats/<index>

[
  {
     "shard_id": <>,
     ...
  },
  {
     "shard_id": <>,
     ...
  }
  ...
]
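The index-level response is a list of per-shard entries, so a caller can scan the list to spot the shards that are furthest behind. The following is a minimal sketch of that scan. It assumes each element carries the same fields as the shard-level response, including bytes_lag. Both spellings of the shard key are accepted because the draft uses both. The lagging_shards helper, its threshold parameter, and the sample values are made up for illustration.

```python
import json
from typing import List, Tuple


def lagging_shards(body: str, min_bytes_lag: int) -> List[Tuple[str, int]]:
    """Return (shard, bytes_lag) pairs with bytes_lag >= min_bytes_lag, worst first."""
    result = []
    for entry in json.loads(body):
        # The draft spells the key both "shard_id" and "shardId"; accept either.
        shard = entry.get("shard_id", entry.get("shardId"))
        lag = entry.get("bytes_lag", 0)
        if lag >= min_bytes_lag:
            result.append((shard, lag))
    # Largest lag first, so the most-behind shards are listed at the top.
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result


# Made-up sample values for illustration only.
body = json.dumps([
    {"shard_id": "[my-index-1][0]", "bytes_lag": 0},
    {"shard_id": "[my-index-1][1]", "bytes_lag": 2048},
    {"shard_id": "[my-index-1][2]", "bytes_lag": 512},
])
print(lagging_shards(body, 1))
# [('[my-index-1][1]', 2048), ('[my-index-1][2]', 512)]
```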



ashking94 · 2023-04-17T06:50:14Z

@linuxpi A couple of points:

We can add a field, bytes_behind, that shows how many bytes we are lagging behind the local store.
We could allow passing '*' as the index value in the stats API (/cat/remote_store/{index}). This would allow fetching stats for all shards in one API call.
We could have the API return the total sum across all shards in the cluster.
We would also want the information aggregated on a per-node basis. This would provide data on outgoing traffic to the remote store and serve as feedback for reallocating shards across nodes, manually or programmatically.
Currently, each remote-backed index lets the user set the translog and segment repositories. We should also aggregate at the repository level, which would give insight into when a repository is not behaving as usual.
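The node-level and repository-level aggregation suggested above amounts to grouping shard stats by some key and summing a counter. Here is a minimal sketch of that grouping. Note that the draft shard response has no node or repository field, so "node_id" and "segment_repository" are hypothetical field names. Summing total_uploads_in_bytes.succeeded is just one example of an outgoing-traffic measure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_uploaded_bytes(shards: Iterable[Mapping], group_by: str) -> Dict[str, int]:
    """Sum successfully uploaded bytes per value of `group_by`.

    `group_by` names a hypothetical per-shard field such as "node_id" or
    "segment_repository"; the draft shard response defines neither.
    """
    totals: Dict[str, int] = defaultdict(int)
    for shard in shards:
        totals[shard[group_by]] += shard["total_uploads_in_bytes"]["succeeded"]
    return dict(totals)


# Made-up sample values for illustration only.
shards = [
    {"node_id": "node-a", "segment_repository": "repo-1", "total_uploads_in_bytes": {"succeeded": 100}},
    {"node_id": "node-a", "segment_repository": "repo-2", "total_uploads_in_bytes": {"succeeded": 200}},
    {"node_id": "node-b", "segment_repository": "repo-1", "total_uploads_in_bytes": {"succeeded": 50}},
]
print(aggregate_uploaded_bytes(shards, "node_id"))             # {'node-a': 300, 'node-b': 50}
print(aggregate_uploaded_bytes(shards, "segment_repository"))  # {'repo-1': 150, 'repo-2': 200}
```

The same grouping function serves both levels, so adding a cluster, node, or repository view only changes the key.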

sachinpkale · 2023-04-24T03:42:18Z

@linuxpi Can you please also provide details on the API permissions?

linuxpi · 2023-04-25T08:46:53Z

@ashking94

We can add a field, bytes_behind, that shows how many bytes we are lagging behind the local store.

Yes, I'll update the structure.

We could allow passing '*' as the index value in the stats API (/cat/remote_store/{index}). This would allow fetching stats for all shards in one API call.

I'm not sure this is a good idea. If a cluster has many shards, the API response would become huge.

We could have the API return the total sum across all shards in the cluster.

A sum of all metrics? I don't think every metric would make sense when summed.
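One way to illustrate why not every metric can be summed is to split the draft fields into additive counters and everything else. The following is a minimal sketch of that split. The assignment of fields to buckets is an assumption and is not defined by the RFC. Reporting the oldest timestamp and the worst moving average, rather than a sum, is a hypothetical design choice.

```python
from typing import Dict, List, Mapping

# Counters that stay meaningful when added across shards. Which draft fields
# belong here is an assumption, not something the RFC defines.
ADDITIVE_FIELDS = (
    "bytes_lag",
    "rejection_count",
    "local_refresh_cumulative_count",
    "remote_refresh_cumulative_count",
)


def summarize(shards: List[Mapping]) -> Dict[str, float]:
    """Aggregate shard stats: sum only counters, reduce other fields differently."""
    if not shards:
        return {}
    summary: Dict[str, float] = {f: sum(s[f] for s in shards) for f in ADDITIVE_FIELDS}
    # A sum of timestamps means nothing; the oldest remote refresh is a useful signal.
    summary["oldest_remote_refresh_timestamp_in_millis"] = min(
        s["remote_refresh_timestamp_in_millis"] for s in shards
    )
    # A sum of moving averages is not an average; report the worst shard instead.
    summary["max_remote_refresh_latency_moving_avg"] = max(
        s["remote_refresh_latency_in_nanos"]["moving_avg"] for s in shards
    )
    return summary
```

Taking the minimum timestamp and the maximum latency keeps the summary pointed at the slowest shard. That shard is usually what an operator wants to find first.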

We would also want the information aggregated on a per-node basis. This would provide data on outgoing traffic to the remote store and serve as feedback for reallocating shards across nodes, manually or programmatically.

Node-level aggregation would be very useful, but I am planning to add it incrementally.

Currently, each remote-backed index lets the user set the translog and segment repositories. We should also aggregate at the repository level, which would give insight into when a repository is not behaving as usual.

That's a good point. We can implement all aggregate-level metrics (cluster, node, and repository level) incrementally.

sachinpkale · 2023-04-25T09:50:33Z

@linuxpi What does started signify under upload_bytes, total_uploads, and total_deletes? Can we check the existing stats APIs and use the same naming conventions?

linuxpi · 2023-04-25T10:52:52Z

@sachinpkale started signifies the bytes/objects sent for upload; succeeded and failed reflect how many of those succeeded or failed. started should equal succeeded + failed.
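The started/succeeded/failed triple can be modeled as a small counter with a consistency check. Here is a minimal sketch. The TransferCounter name is hypothetical. The sketch also makes an assumption that the thread does not state: a counter read while uploads are still in progress may show started greater than succeeded + failed. The difference is therefore treated as in-flight work, and equality is expected only once nothing is in flight.

```python
from dataclasses import dataclass


@dataclass
class TransferCounter:
    """Hypothetical model of the started/succeeded/failed triple in the draft response."""
    started: int
    succeeded: int
    failed: int

    @property
    def in_flight(self) -> int:
        # Uploads that were started but have not completed either way yet.
        return self.started - self.succeeded - self.failed

    def is_consistent(self) -> bool:
        # Completed uploads can never exceed started ones.
        return self.in_flight >= 0

    def is_settled(self) -> bool:
        # started == succeeded + failed holds once nothing is in flight.
        return self.in_flight == 0


counter = TransferCounter(started=10, succeeded=7, failed=2)
print(counter.in_flight, counter.is_consistent(), counter.is_settled())  # 1 True False
```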

linuxpi · 2023-04-25T10:58:39Z

Can we check the existing stats APIs and use the same naming conventions?

@sachinpkale I checked various stats objects that are part of ClusterStatsIndices. What I've noticed there is that each stat name is suffixed with _in_bytes wherever applicable. I'll check more and try to comply as much as possible, but if you have anything specific in mind, do let me know.

sachinpkale · 2023-05-09T14:04:59Z

@linuxpi Which metric(s) will be used to determine the time it takes to complete one run of segment uploads after a refresh? Reference: #7474

linuxpi · 2023-05-16T15:04:40Z

@linuxpi Which metric(s) will be used to determine the time it takes to complete one run of segment uploads after a refresh? Reference: #7474

This should be covered by remote_refresh_latency_in_nanos.
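The draft response reports remote_refresh_latency_in_nanos as a moving average. Here is a minimal sketch of how such a value could be maintained. It assumes a simple (unweighted) moving average over a fixed window of the most recent post-refresh upload runs. The MovingAverage class and its window size are hypothetical. The actual implementation may weight or window samples differently.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` samples (window size is hypothetical)."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._samples: deque = deque(maxlen=window)
        self._sum = 0.0

    def record(self, value: float) -> None:
        if len(self._samples) == self._samples.maxlen:
            # The deque is about to drop its oldest sample; remove it from the running sum.
            self._sum -= self._samples[0]
        self._samples.append(value)
        self._sum += value

    @property
    def average(self) -> float:
        # 0.0 before any sample is recorded, matching the draft's initial moving_avg.
        return self._sum / len(self._samples) if self._samples else 0.0


latency = MovingAverage(window=3)
for nanos in (100, 200, 300, 400):
    latency.record(nanos)
print(latency.average)  # 300.0
```

Each sample could come from timing one post-refresh upload run, for example by calling time.monotonic_ns() before and after the run and recording the difference.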

linuxpi · 2024-03-05T18:21:29Z

Closing this, as the API was released in 2.10.

linuxpi added the enhancement and untriaged labels Apr 14, 2023

minalsha added the RFC label and removed the untriaged label Apr 14, 2023

sachinpkale added the Storage:Durability label Apr 24, 2023

This was referenced May 3, 2023:

[Meta] Remote Store: 2.8.0 - Release Tracking #7382 (Closed)

[Remote Store] Add API to check status of remote store sync #3146 (Closed)

linuxpi mentioned this issue May 5, 2023

[Remote Store] Add Remote store stats api #7441 (Merged)

sachinpkale mentioned this issue May 15, 2023

Identify stats for Remote Translog and integrate with remote store stats API #7559 (Closed)

linuxpi changed the title ~~[Draft] [RFC] [Remote Store] Remote Store Stats API~~ [RFC] [Remote Store] Remote Store Stats API May 16, 2023

gbbafna closed this as completed in #7441 May 18, 2023

ashking94 mentioned this issue May 24, 2023

[Remote Store] Add Remote store stats api (#7441) #7719 (Closed)

ashking94 reopened this Jun 21, 2023

github-actions bot added the untriaged label Jun 21, 2023

BhumikaSaini-Amazon mentioned this issue Jun 28, 2023

[RFC] [Remote Store] /_remotestore/stats API and _nodes/stats API enhancements for observability on Remote Translog Store upload operations #8311 (Closed)

anasalkouz removed the untriaged label Jul 10, 2023

linuxpi self-assigned this Nov 26, 2023

Bukhtawar added this to Storage Project Board Feb 15, 2024

github-project-automation bot moved this to 🆕 New in Storage Project Board Feb 15, 2024

linuxpi closed this as completed Mar 5, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board Mar 5, 2024

gbbafna mentioned this issue Aug 8, 2024

Add Varun Bansal as maintainer #15163 (Merged)
