Correct refcount_add in dmu_zfetch #12602
Conversation
I propose we instead ignore tracking if holder == NULL. I see no point in tracking NULL pointers, since it won't give us anything useful.
Special-casing NULL doesn't seem like it adds any benefit, to me. If you think that the refcount interface is too heavy for this, maybe it shouldn't be a refcount, and it should just be some simple atomic_adds on an int instead of a zfs_refcount_t, or an additional set of function indirections for not tracking, if people really want to use the interface. But "using it this way panics unless you pass a magic value" seems unfortunate.
The proposed replacement of a single atomic with multiple atomics in a loop does not buy anything; it only wastes time. So either pass some real pointers there that are useful for debugging or, as you have said, replace it with plain atomics. Or create the refcount with zfs_refcount_create_untracked().
Yes, no purpose at all, other than the minor issue of not panicking. Personally, I think changing it to refcount_create_untracked() would be reasonable, but using refcount_{add,remove}_many() like that here is a bad idea. IMO, if you have to start remembering whether refcount_create versus _untracked was used to create something to know if using it a certain way will work or panic, you should probably use a different interface. The only place refcount_create_untracked() is currently called is in dnode.c, and there's no _many calls to be found there.
It seems to me the fix proposed here is right; this is how the interface was intended to be used. The last thing we want is for callers to need to care whether the refcount was created as tracked or untracked. Switching to the atomic interfaces seems like the right way to handle this if we think the added loop will measurably impact performance. But I'm fine with the change as is since it fixes the core issue.
It increases the number of atomics there from one per call (if any new block was prefetched) to one per block prefetched. It may not be dramatic (though somebody could test this with recordsize = 4KB) and may be "right", but it is absolutely pointless with NULL as a pointer. Either pass some useful pointer or value there for it to make sense, or just axe the tracking one way or another.
Either passing a pointer value or switching to the atomic interface would be fine with me.
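For readers following along, the "just use atomics" option mentioned in this thread could look roughly like the sketch below. It is written in portable C11 with `<stdatomic.h>` rather than the kernel's own atomic helpers, and every name in it (`toy_inflight_t`, `toy_inflight_issue`, `toy_inflight_done`) is hypothetical and not taken from dmu_zfetch.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical in-flight I/O counter; names are illustrative only. */
typedef struct toy_inflight {
	atomic_uint_fast64_t count;
} toy_inflight_t;

static void
toy_inflight_init(toy_inflight_t *c)
{
	atomic_init(&c->count, 0);
}

/* Account for n newly issued prefetch I/Os with a single atomic add. */
static void
toy_inflight_issue(toy_inflight_t *c, uint64_t n)
{
	atomic_fetch_add(&c->count, n);
}

/* One I/O completed; returns how many remain outstanding. */
static uint64_t
toy_inflight_done(toy_inflight_t *c)
{
	uint64_t before = atomic_fetch_sub(&c->count, 1);

	/* Catches an underflow, but cannot say which caller caused it. */
	assert(before > 0);
	return (before - 1);
}
```

A plain counter keeps the single bulk add per call that was asked for above, and there is no tracked or untracked mode for callers to remember. The trade-off is that it records nothing about who holds a reference.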
If I hadn't already written the patch this way initially and decided not to include it because it wasn't necessary, I'd probably have just closed the PR at this point. "I hate how I wrote this code 7 months ago, please go rewrite every callsite with a cosmetic change to get your one-line fix in" is a great way to discourage contributors.
Rich, I appreciate your wish to fix the problem. I am just trying to get the most out of it. Don't take it the wrong way. Speaking about the last revision, I am not sure what additional benefit passing an identical zfetch_t pointer every time gives. In what way should it help the tracker do helpful things? The tracker is supposed to help find lost references and references dropped more than once, and in that context passing a unique pointer (like a pointer to a specific ZIO or data block) allows us to find out what that block is and possibly trace its history. Passing the same pointer everywhere won't help with that. It is only a pointer to the zfetch_t, which may already be available from other places. It may be better than nothing, but is it enough? Unfortunately for this code, zfetch operates only with block numbers, not the blocks themselves. We could probably pass the block numbers there instead of a pointer, but there could be a few tricks on that route, plus it is a bit less useful for debugging. That is why I'd prefer tracking to be disabled here, or just a switch to atomics. If it is too much to ask of you, I may do it myself one day.
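To make the point about unique holders concrete, here is a small sketch of a tracker keyed by a per-block tag. It is an illustration only: the fixed-size table, the `toy_*` names, and the way a block number is turned into a holder value are all assumptions of this sketch, not anything proposed in the PR or present in OpenZFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_MAX_HOLDS	16

/* Hypothetical tracker: one slot per outstanding hold. */
typedef struct toy_tracker {
	const void *holder[TOY_MAX_HOLDS];	/* NULL means slot is free */
} toy_tracker_t;

/*
 * Turn a block number into a holder tag. The +1 keeps block 0 from
 * colliding with NULL; that offset is this sketch's own convention.
 */
static const void *
toy_blkid_tag(uint64_t blkid)
{
	return ((const void *)(uintptr_t)(blkid + 1));
}

/* Record a hold; returns false if the table is full. */
static bool
toy_hold(toy_tracker_t *t, const void *holder)
{
	assert(holder != NULL);
	for (size_t i = 0; i < TOY_MAX_HOLDS; i++) {
		if (t->holder[i] == NULL) {
			t->holder[i] = holder;
			return (true);
		}
	}
	return (false);
}

/* Drop a hold; returns false if this holder has none (double release). */
static bool
toy_release(toy_tracker_t *t, const void *holder)
{
	assert(holder != NULL);
	for (size_t i = 0; i < TOY_MAX_HOLDS; i++) {
		if (t->holder[i] == holder) {
			t->holder[i] = NULL;
			return (true);
		}
	}
	return (false);
}

/* Is a hold tagged with this holder still outstanding? */
static bool
toy_is_held(const toy_tracker_t *t, const void *holder)
{
	for (size_t i = 0; i < TOY_MAX_HOLDS; i++) {
		if (t->holder[i] == holder)
			return (true);
	}
	return (false);
}
```

With per-block tags, a leftover entry names the block that never completed, and a second release of the same block is caught. If every hold carried the same pointer, the tracker could only report that some hold was still outstanding.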
I'm glad you think this should be improved. I would suggest that a two-line fix for a panic in the code you wrote is not the correct place to plan or request that. @behlendorf please merge this or the prior version without unnecessary changes if you like; if you'd like all the calls reconstructed, please let me know, and I'll just close the PR and keep it in my local branch so I can use the feature without a panic.
I've already said what I think about the previous patch. I'd prefer this refcount to be marked as untracked, with a corresponding comment, until it is rewritten to atomics, unless you wish to do it now. I am going on vacation in a few hours; otherwise I'd have just done it myself.
@jwk404 Unless you chose to close this in a pretty unusual way, I think that commit had the wrong Closes...
@rincebrain if you can restore the original patch let's proceed with that minimal fix to resolve the ASSERT when tracking is enabled. Then when @amotin returns from vacation he can propose a PR which either tracks some pointer values (which I think we all agree would be more useful) or simply switches it all to atomics. I've gone ahead and reopened this issue, which was closed due to that commit typo.
refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and reference_tracking_enable=1.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Sure, tracking a useful value would be helpful for anyone trying to use the functionality moving forward. Reworking that just seemed much more like "future work". Pushed.
...that's very strange. Both the two runs of sanity that I triggered trying to push the old version of the commit without vacuously updating it (which triggered running against 15cc4ed) and against 4b99eaf all failed on the same tests in destroy (among others). I'm reasonably confident I did not somehow break it between the first time I pushed this before and now, so...not sure why they're failing suddenly when they didn't before? I'll go see about reproducing this on my 20.04 VM...
I just started seeing these same failures in other PRs so I'm pretty sure it's unrelated. But I'm not sure what changed.
When it passed originally, it was on the 20210929.1 snapshot. Failing now is on 20211004.1. Perhaps something is rotten there. edit: This run was on 20210929.1 and was the last in the long line of passing, then the very next one where it started failing was 20211004.1...and the outlier that succeeded later was also 20210929.1.
refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12589
Closes openzfs#12602
For those not already familiar with the code base it can be a challenge to understand how the libraries are laid out. This has sometimes resulted in functionality being added in the wrong place. To help avoid that in the future this commit documents the high-level dependencies for easy reference in lib/Makefile.am. It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil. This change makes it clear libzpool should never depend on the ioctl() functionality provided by libzfs_core.
- Moved zfs_ioctl_fd() from libzutil to libzfs_core and renamed it lzc_ioctl_fd(). Normal access to the kmods should all be funneled through the libzfs_core library. The sole exception is the pool_active() which was updated to not use lzc_ioctl_fd() to remove the libzfs_core dependency.
- Removed libzfs_core dependency on libzutil.
- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c source file which was all dead code.
- Removed libzfs_core dependency from mkbusy and ctime test utilities. It was only needed for some trivial wrapper functions and that code is easy to replicate to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#12602
Motivation and Context
#12589
Description
Exploding refcount_add_many(foo, N, ...) into for (i=0; i<N; i++) { refcount_add(foo, ...); }
I'm pretty confident that's what was intended, between the NULL identifier and the lack of any refcount_remove_many calls anywhere in the patches that added it.
(In a separate PR, I've got a patch that lets you turn the panic down to a "whoa now, that's not ideal", but it's tangled up with some hacky things...)
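Why the two forms differ only when tracking is on can be illustrated with a toy model. The sketch below is not the OpenZFS implementation. It assumes that a tracked refcount keeps one record per add call and that a remove must find a record matching both the holder and the count. The `toy_*` names are hypothetical, and a `false` return stands in for the panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model of a tracked refcount; not the OpenZFS code. */
typedef struct toy_ref {
	struct toy_ref *next;
	const void *holder;	/* who took the hold */
	uint64_t number;	/* references taken by this one call */
} toy_ref_t;

typedef struct toy_refcount {
	toy_ref_t *refs;	/* one record per add call */
	uint64_t count;		/* total references outstanding */
} toy_refcount_t;

/* Record a single hold worth `number` references. */
static void
toy_add_many(toy_refcount_t *rc, uint64_t number, const void *holder)
{
	toy_ref_t *ref = malloc(sizeof (*ref));

	assert(ref != NULL);
	ref->holder = holder;
	ref->number = number;
	ref->next = rc->refs;
	rc->refs = ref;
	rc->count += number;
}

/*
 * Drop a hold. Assumption: a record matches only if both holder and
 * number are equal. Returns false where a tracked refcount would panic.
 */
static bool
toy_remove_many(toy_refcount_t *rc, uint64_t number, const void *holder)
{
	for (toy_ref_t **pp = &rc->refs; *pp != NULL; pp = &(*pp)->next) {
		toy_ref_t *ref = *pp;

		if (ref->holder == holder && ref->number == number) {
			*pp = ref->next;
			free(ref);
			rc->count -= number;
			return (true);
		}
	}
	return (false);
}

/* Free any records left behind so the demo does not leak. */
static void
toy_destroy(toy_refcount_t *rc)
{
	while (rc->refs != NULL) {
		toy_ref_t *ref = rc->refs;

		rc->refs = ref->next;
		free(ref);
	}
	rc->count = 0;
}

/* Take n references (in bulk or one at a time), then drop them singly. */
static bool
toy_single_removes_succeed(uint64_t n, bool bulk_add)
{
	toy_refcount_t rc = { NULL, 0 };
	bool ok = true;

	if (bulk_add) {
		toy_add_many(&rc, n, NULL);
	} else {
		for (uint64_t i = 0; i < n; i++)
			toy_add_many(&rc, 1, NULL);
	}
	for (uint64_t i = 0; i < n; i++)
		ok = toy_remove_many(&rc, 1, NULL) && ok;
	toy_destroy(&rc);
	return (ok);
}
```

In this model, `toy_single_removes_succeed(3, true)` fails because the only record has a count of 3 and no single remove matches it, while `toy_single_removes_succeed(3, false)` succeeds because each add left its own record of 1.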
How Has This Been Tested?
A round of zfs-tests.sh -r sanity with reference_tracking_enable=1.