Add vmId to the Rntbd Health Check Error Message #43079

Merged 13 commits on Jan 10, 2025

@nehrao1 nehrao1 commented Nov 25, 2024

Description

Issue 42811
Currently, vmId is part of our client-side diagnostics. However, diagnostics can only be obtained once an operation succeeds or fails; if it hangs, we see nothing. During a WMT Sev2, we saw channel health check failure logs (which include the source and destination IPs) but no vmId, so we could not diagnose the VM well enough. With this change, the vmId is included in the health check error message, so it can be extracted from the logs and correlated to give us VM-level visibility.
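
To illustrate the idea, here is a minimal sketch of a health check failure message that carries the vmId, together with a helper that pulls it back out of a log line for correlation. This is not the SDK's actual code: the class name `HealthCheckLogSketch`, both method names, and the `vmId=...` field layout are assumptions made for this example, and the real Rntbd message format may differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and message layout are illustrative only.
class HealthCheckLogSketch {
    // Assumed field format "vmId=<value>"; the value ends at a comma,
    // whitespace, or closing bracket.
    private static final Pattern VM_ID = Pattern.compile("vmId=([^,\\s\\]]+)");

    /** Builds an illustrative health check failure message that includes the vmId. */
    public static String buildMessage(String reason, String localAddress,
                                      String remoteAddress, String vmId) {
        return String.format(
            "Health check failed: %s [local=%s, remote=%s, vmId=%s]",
            reason, localAddress, remoteAddress,
            vmId == null ? "unknown" : vmId);
    }

    /** Extracts the vmId from a log line, if one is present. */
    public static Optional<String> extractVmId(String logLine) {
        Matcher m = VM_ID.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // All values below are made-up sample data.
        String line = buildMessage("no response within read timeout",
            "10.0.0.4:51234", "10.0.1.7:14001", "vm-1234");
        System.out.println(line);
        System.out.println(extractVmId(line).orElse("<none>"));
    }
}
```

Because the vmId sits in the log line itself, it remains available even when an operation hangs and never produces diagnostics.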

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@nehrao1 nehrao1 requested review from kirankumarkolli and a team as code owners November 25, 2024 04:49
@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Left one blocking comment

Member

@tvaron3 tvaron3 left a comment

LGTM besides some small comments

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks!

@nehrao1 nehrao1 enabled auto-merge (squash) January 6, 2025 22:09
@jeet1995
Member

jeet1995 commented Jan 7, 2025

One suggestion is to modify the logs when requests see cancellation or transit timeouts. It is not fully clear whether we'll definitely have diagnostics in these scenarios (or at least not clear to me :P) so it would not hurt to have clientVmId or equivalent here too.

Member

@jeet1995 jeet1995 left a comment

LGTM - thanks!

@jeet1995
Member

jeet1995 commented Jan 9, 2025

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@nehrao1
Member Author

nehrao1 commented Jan 10, 2025

/check-enforcer override

@nehrao1 nehrao1 merged commit f0a3999 into Azure:main Jan 10, 2025
83 of 88 checks passed
5 participants