Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLPX-78776 estat zpl may warn about missing more events than were obs… #79

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

brad-lewis
Copy link
Contributor

This script has been reporting warnings like this for awhile:

$ sudo estat zpl 60
12/14/21 - 01:23:07 UTC

 Tracing enabled... Hit Ctrl-C to end.
WARNING: probe r_zfs_read_bcc_56749 missed 7846 of 2037 events
WARNING: probe r_zfs_write_bcc_56749 missed 12893 of 3297 events

Investigations have found that the issue is in the the code to determine the spa_name from the inode. The objset field is not always set on entering zfs_write or zfs_read. In the most recent version of zfs this appears to be the case all the time. In order to get some data, the check for the pool was pulled out though this results in the rpool operations be included by default. Also there is no way to check for SYNC_ALWAYS. This seems better than no data.

Some other cleanup was done, including removing of the tracking sync vs async reads (whatever that was supposed to mean).

@sebroy
Copy link
Contributor

sebroy commented Mar 9, 2022

Removing the pool selection doesn't feel like the right solution, the reason being that the root disk often has different performance characteristics than disks in the zpl; so it would be hard to understand why there might be latency outliers (is it a performance issue on domain0, or is it just what we'd expect from rpool?). In Azure, for example, the root disk is on a fully virtualized I/O controller that performs rather poorly.

Are there other possible approaches?

@brad-lewis
Copy link
Contributor Author

Looking harder for another approach, we could check the spa name somewhere further down the call stack. For the writes, zil_log_write() should be called at least once for the normal path and the zilog structure gives access to the spa to get the name and the os to check for SYNC_ALWAYS.

I read seems to have multiple paths. Thought it might be worth discussing before further investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants