Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(node | das | libs/header/sync): add total uptime node metrics + totalSampled das metrics + totalSynced sync metrics #1638

Merged
merged 17 commits into from
Feb 8, 2023

Conversation

derrandz
Copy link
Contributor

@derrandz derrandz commented Jan 25, 2023

Overview

This PR introduces node uptime metrics + das total sampled headers metrics to support calculating the uptime index proposed by mustafa on the monitoring side.

This PR introduces a new module named Telemetry to support node related telemetry. This module can also host all general telemetry and observability that does not interest specific modules.

Changes

  • Introduced uptime metrics for node under nodebuilder/node/uptime.go
  • Introduced persistent uptime metrics using datastore to persist node start time
  • Testing for uptime metrics persistence using the store
  • Unit testing for uptime metrics
  • Integration testing for uptime metrics
  • e2e testing for uptime metrics

Checklist

  • New and updated code has appropriate documentation
  • New and updated code has new and/or updated testing
  • Required CI checks are passing
  • Visual proof for any user facing features like CLI or documentation updates
  • Linked issues closed with keywords

Blocked By

PR: #1537

@derrandz derrandz changed the title Feat: Node Uptime Metrics feat(node/das): add node start and total uptime node metrics + totalSampled das metrics Jan 25, 2023
nodebuilder/config.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
das/coordinator.go Outdated Show resolved Hide resolved
das/metrics.go Outdated Show resolved Hide resolved
das/metrics.go Show resolved Hide resolved
das/metrics.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2023

Codecov Report

Merging #1638 (e2795d7) into main (b51509d) will increase coverage by 1.12%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##             main    #1638      +/-   ##
==========================================
+ Coverage   55.11%   56.24%   +1.12%     
==========================================
  Files         230      232       +2     
  Lines       15029    15151     +122     
==========================================
+ Hits         8283     8521     +238     
+ Misses       5862     5725     -137     
- Partials      884      905      +21     
Impacted Files Coverage Δ
das/options.go 60.00% <ø> (ø)
libs/header/sync/sync.go 68.15% <0.00%> (-1.33%) ⬇️
share/availability/mocks/availability.go 59.25% <ø> (ø)
share/mocks/getter.go 45.45% <ø> (ø)
das/metrics.go 47.86% <48.57%> (+34.38%) ⬆️
das/checkpoint.go 90.00% <50.00%> (-10.00%) ⬇️
header/metrics.go 66.66% <50.00%> (+66.66%) ⬆️
libs/header/sync/metrics.go 66.66% <66.66%> (ø)
nodebuilder/node/metrics.go 79.31% <79.31%> (ø)
das/coordinator.go 88.88% <100.00%> (+0.65%) ⬆️
... and 12 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@derrandz derrandz added area:node Node area:das Related to DASer area:metrics Related to measuring/collecting node metrics kind:improvement labels Jan 27, 2023
@derrandz
Copy link
Contributor Author

We're successfully retrieving metrics at the moment, however we need to think about calculating the uptime index (from mustafa's proposal) somewhere either in node or in mojtaba's API, since Grafana's min function returns a min over a vector and not a minimum of two values.

Here's a screen short of the two values, green being totalSampled/network_head, and yellow being (time-nodeStartTs)/total_uptime_in_s

Screen Shot 2023-01-27 at 14 56 12

das/coordinator.go Outdated Show resolved Hide resolved
das/checkpoint.go Outdated Show resolved Hide resolved
das/checkpoint.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just define the const in uptime.go

@derrandz derrandz requested a review from renaynay January 27, 2023 20:16
@derrandz
Copy link
Contributor Author

@mojtaba-esk and I discussed the following comment and concluded to not store any metrics on store at the node side, and defer this requirement to the parties that require it (i.e: NodeLogger and Leaderboard backend)

das/coordinator.go Outdated Show resolved Hide resolved
das/coordinator.go Outdated Show resolved Hide resolved
das/metrics.go Outdated Show resolved Hide resolved
nodebuilder/settings.go Outdated Show resolved Hide resolved
nodebuilder/node/uptime.go Outdated Show resolved Hide resolved
@derrandz derrandz requested a review from walldiss January 30, 2023 12:54
@derrandz
Copy link
Contributor Author

Here's a screenshot of the last changes that shows:

  1. In green: the time factor of uptime: (total node runtime in seconds) / (time() - node start time ts)
  2. In yellow: the work factor of uptime (total headers sampled) / network head

Screen Shot 2023-01-31 at 01 10 36

@derrandz derrandz requested review from vgonkivs and walldiss February 7, 2023 18:02
das/metrics.go Show resolved Hide resolved
das/metrics.go Outdated Show resolved Hide resolved
libs/header/sync/sync.go Outdated Show resolved Hide resolved
nodebuilder/tests/swamp/swamp_tx.go Outdated Show resolved Hide resolved
@derrandz derrandz requested a review from renaynay February 7, 2023 20:56
@derrandz derrandz modified the milestone: 2023 Q1 Onsite Feb 8, 2023
@renaynay renaynay changed the title feat(node/das): add node start and total uptime node metrics + totalSampled das metrics feat(node | das | libs/header/sync): add total uptime node metrics + totalSampled das metrics + totalSynced sync metrics Feb 8, 2023
renaynay
renaynay previously approved these changes Feb 8, 2023
distractedm1nd
distractedm1nd previously approved these changes Feb 8, 2023
nodebuilder/node_test.go Outdated Show resolved Hide resolved
@renaynay renaynay dismissed stale reviews from distractedm1nd and themself via 7dbdf89 February 8, 2023 14:06
distractedm1nd
distractedm1nd previously approved these changes Feb 8, 2023
Copy link
Member

@vgonkivs vgonkivs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits regarding type conversion but other than that lgtm

das/checkpoint.go Outdated Show resolved Hide resolved
@renaynay renaynay merged commit 6933f03 into celestiaorg:main Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:das Related to DASer area:metrics Related to measuring/collecting node metrics area:node Node
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants