Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cosmovisor): premature upgrade on restart #22528

Merged
merged 8 commits into from
Nov 18, 2024

Conversation

akhilkumarpilli
Copy link
Contributor

@akhilkumarpilli akhilkumarpilli commented Nov 14, 2024

Description

Closes: #22167

When on restarting, unable to fetch current height and height will be 0, which will update binary


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Introduced new commands: cosmovisor show-upgrade-info, add-batch-upgrade, and prepare-upgrade for enhanced functionality.
  • Bug Fixes

    • Resolved issues with premature upgrades during restarts of cosmovisor.
  • Documentation

    • Updated changelog for clarity on features, improvements, and bug fixes, following Semantic Versioning principles.

@github-actions github-actions bot added the C:Cosmovisor Issues and PR related to Cosmovisor label Nov 14, 2024
Copy link
Contributor

coderabbitai bot commented Nov 14, 2024

📝 Walkthrough

Walkthrough

The pull request updates the changelog for the cosmovisor tool, adding a new version entry v1.7.0 with new commands like add-batch-upgrade, prepare-upgrade, and cosmovisor show-upgrade-info. It also includes a bug fix for premature upgrades upon restarting the service. Additionally, a new error variable errUntestAble is introduced in the codebase to enhance error handling during testing scenarios, refining the control flow in the CheckUpdate and checkHeight methods.

Changes

File Path Change Summary
tools/cosmovisor/CHANGELOG.md Updated to include new version v1.7.0, added commands add-batch-upgrade, prepare-upgrade, and cosmovisor show-upgrade-info, and included a bug fix for premature upgrades.
tools/cosmovisor/scanner.go Introduced new error variable errUntestAble, modified checkHeight and CheckUpdate methods for improved error handling.

Assessment against linked issues

Objective Addressed Explanation
Fix premature upgrades after service restart (#22167)

Possibly related PRs

Suggested reviewers

  • facundomedica
  • julienrbrt
  • tac0turtle

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@akhilkumarpilli akhilkumarpilli marked this pull request as ready for review November 15, 2024 06:53

This comment has been minimized.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
tools/cosmovisor/scanner.go (2)

154-156: Improve error handling clarity and documentation

The current implementation combines multiple conditions in a way that makes the logic hard to follow. Consider separating the conditions and improving the comment.

Apply this change:

-	currentHeight, err := fw.checkHeight()
-	if (err != nil || currentHeight < info.Height) && !errors.Is(err, errUntestAble) { // ignore this check for tests
-		return false
-	}
+	currentHeight, err := fw.checkHeight()
+	// Skip height verification in test environment
+	if errors.Is(err, errHeightCheckUnavailable) {
+		return true
+	}
+	// Return false if there's an error or if we haven't reached the upgrade height
+	if err != nil || currentHeight < info.Height {
+		return false
+	}

186-188: Approve with minor comment improvement

The implementation correctly prevents premature upgrades by handling test scenarios appropriately. However, the comment could be more descriptive.

Apply this change:

-	if testing.Testing() { // we cannot test the command in the test environment
+	if testing.Testing() { // Skip height check in test environment as status command is not available
 		return 0, errUntestAble
 	}
tools/cosmovisor/CHANGELOG.md (1)

52-52: Enhance the bug fix description with more context.

While the entry follows the format, consider adding more details to help users understand the issue and its resolution. Suggested revision:

-* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor.
+* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor where upgrades were incorrectly applied before reaching the designated block height.
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e74799e and cd2028d.

📒 Files selected for processing (2)
  • tools/cosmovisor/CHANGELOG.md (1 hunks)
  • tools/cosmovisor/scanner.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tools/cosmovisor/CHANGELOG.md (1)

Pattern **/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

tools/cosmovisor/scanner.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

🔇 Additional comments (3)
tools/cosmovisor/scanner.go (1)

Line range hint 154-188: Verify the fix for premature upgrades

The implementation looks correct, but let's verify that it properly handles the restart scenario described in issue #22167.

✅ Verification successful

Let me gather more information about the height checking implementation to ensure it's properly handling edge cases.


Let me check the test coverage and implementation details one more time to ensure the restart scenario is properly tested.


Implementation correctly handles restart scenarios and premature upgrades

The code review confirms that the implementation properly addresses the premature upgrade issue with multiple safeguards:

  • The initialized flag correctly tracks daemon restarts
  • Height checks are properly guarded against test scenarios and errors
  • The upgrade name comparison during restart prevents incorrect upgrades
  • Empty file handling with retry mechanism prevents race conditions
  • Height comparison ensures upgrades only trigger at the right block height
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions or edge cases in height checking

# Look for other height checks that might need similar protection
rg -n "latest_block_height|LatestBlockHeight" --type go

# Check for any restart-related error handling
rg -n "daemon has restarted" --type go -A 5

Length of output: 1480


Script:

#!/bin/bash
# Check the complete implementation of height checking and upgrade logic
ast-grep --pattern 'func ($_ *fileWatcher) checkHeight() {
  $$$
}'

# Look for upgrade-related error handling
rg -n "needsUpdate" --type go -A 3

# Check for any test cases related to restart scenarios
rg -n "TestFileWatcher.*restart|TestUpgrade.*restart" --type go

Length of output: 1506


Script:

#!/bin/bash
# Check test files for upgrade and restart scenarios
rg -n "func Test.*" tools/cosmovisor/scanner_test.go -B 1 -A 5

# Look for the complete implementation of the scanner
cat tools/cosmovisor/scanner.go

# Check for any upgrade-related test utilities
rg -n "func.*upgrade" tools/cosmovisor/testdata/ --type go

Length of output: 6583

tools/cosmovisor/CHANGELOG.md (2)

43-43: LGTM! Well-formatted changelog entry.

The new feature entry follows the changelog guidelines with proper PR reference and clear description.


Line range hint 1-52: LGTM! Changelog structure is well-maintained.

The changelog follows the Keep a Changelog format consistently, with proper organization of sections and version history.

tools/cosmovisor/scanner.go Show resolved Hide resolved
Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK. Can you add on top of the changelog a 1.7.0 header so it's ready to tag. We should release as soon as this is merged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
tools/cosmovisor/CHANGELOG.md (2)

39-40: Update the release date

The release date is set to November 15, 2024, which is in the future. Consider updating it to the actual release date or moving these changes to the "Unreleased" section until the release is made.


52-54: Enhance the bug fix description

While the entry correctly references the PR, it would be more helpful to:

  1. Add a reference to the original issue ([Bug]: Cosmovisor prematurely upgrades after systemctl restart cosmovisor.service #22167)
  2. Provide more context about the bug and its impact
  3. Mention that it specifically fixes the issue where Cosmovisor was upgrading prematurely when the node was restarted before reaching the target height

Suggested revision:

-* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor.
+* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor where upgrades were incorrectly applied before reaching the target height (fixes #22167).
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cd2028d and 6b837c9.

📒 Files selected for processing (1)
  • tools/cosmovisor/CHANGELOG.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tools/cosmovisor/CHANGELOG.md (1)

Pattern **/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

🔇 Additional comments (1)
tools/cosmovisor/CHANGELOG.md (1)

42-45: LGTM! Well-documented features

The Features section is well-structured, with clear descriptions and proper issue references for each new command.

Copy link
Contributor

@hieuvubk hieuvubk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
tools/cosmovisor/CHANGELOG.md (1)

54-54: Enhance the bug fix description to better reflect the PR objectives.

While the entry correctly references PR #22528, it could be more descriptive about the specific issue being fixed.

Consider expanding the description to:

-* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor.
+* [#22528](https://github.com/cosmos/cosmos-sdk/pull/22528) Fix premature upgrades on restarting cosmovisor by ensuring upgrades only occur at the specified block height.
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6b837c9 and 2b56f7e.

📒 Files selected for processing (1)
  • tools/cosmovisor/CHANGELOG.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tools/cosmovisor/CHANGELOG.md (1)

Pattern **/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"

🔇 Additional comments (2)
tools/cosmovisor/CHANGELOG.md (2)

39-40: LGTM: Version entry follows the changelog format.

The new version entry follows the correct format with version number and date.


42-45: LGTM: Features section is well-documented.

Each feature entry properly includes:

  • PR reference link
  • Clear description of the new functionality
  • Consistent formatting

@akhilkumarpilli akhilkumarpilli added this pull request to the merge queue Nov 18, 2024
Merged via the queue into main with commit fbd725d Nov 18, 2024
72 of 73 checks passed
@akhilkumarpilli akhilkumarpilli deleted the akhil/fix-cosmovisor-upgrade branch November 18, 2024 08:52
alpe added a commit that referenced this pull request Nov 18, 2024
* main:
  fix(cosmovisor): premature upgrade on restart (#22528)
  fix(store/v2/pebble): handle version 0 in keys (#22524)
@julienrbrt
Copy link
Member

Tagged: https://github.com/cosmos/cosmos-sdk/releases/tag/tools%2Fcosmovisor%2Fv1.7.0

alpe added a commit that referenced this pull request Nov 21, 2024
* main:
  build(deps): Bump cosmossdk.io/math from 1.3.0 to 1.4.0 (#22580)
  fix(server/v2/api/telemetry): enable global metrics  (#22571)
  refactor(server/v2/cometbft): add `codec.Codec` and clean-up APIs (#22566)
  feat(core/coretesting): make memDB satisfy db.Db interface (#22570)
  Merge commit from fork
  fix(server(/v2)): fix fallback genesis path (#22564)
  fix: only overwrite context chainID when necessary (#22568)
  docs(client): Update setFeeGranter and setFeePayer comments (#22526)
  fix(baseapp): populate header info in `NewUncachedContext` (#22557)
  build(deps): Bump buf.build/gen/go/cometbft/cometbft/protocolbuffers/go from 1.35.1-20240701160653-fedbb9acfd2f.1 to 1.35.2-20240701160653-fedbb9acfd2f.1 in /api (#22551)
  build(deps): Bump github.com/creachadair/atomicfile from 0.3.5 to 0.3.6 in /tools/confix (#22552)
  docs: Update reference of Approximation (#22550)
  fix(server/v2/comebft): wire missing services + fix simulation (#21964)
  ci: fix permissions for GITHUB_TOKEN on dependabot workflows (#22547)
  ci: fix permissions for GITHUB_TOKEN in spell check workflow (#22545)
  build(deps): Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 (#22537)
  fix(cosmovisor): premature upgrade on restart (#22528)
  fix(store/v2/pebble): handle version 0 in keys (#22524)
  refactor(server/v2/telemetry): swap redirects (#22520)
  docs: Update content in  CODE_OF_CONDUCT.md (#22518)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Cosmovisor Issues and PR related to Cosmovisor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Cosmovisor prematurely upgrades after systemctl restart cosmovisor.service
4 participants