Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use atomic reads and writes in code that uses double-checked locking. #819

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

hawkinsp
Copy link
Contributor

In a couple of places in nanobind we see this idiom:

nb_internals *internals_ = internals;
PyTypeObject *tp = internals_->nb_ndarray;

if (NB_UNLIKELY(!tp)) {
    lock_internals guard(internals_);
    tp = internals_->nb_ndarray;
    if (tp)
        return tp;

    // ... build tp
    internals_->nb_ndarray = tp;
}

This is the classic double-checked locking idiom, which on architectures that don't have total store ordering is racy (e.g. ARM, not x86). To use this pattern correctly, we need to use an atomic acquire load for the read outside the lock, and to use an atomic store release for the store inside the lock. These add the necessary fences to ensure that, for example, the contents of tp do not appear populated to the reader before the writer has stored them to memory.

This PR adds an include of <atomic> to nb_internals.h if free-threading is enabled. I was unable to think of a good way to avoid this, bar using intrinsics. The use of atomics seemed appropriate to me in the presence of free threading.

In a couple of places in nanobind we see this idiom:
```
nb_internals *internals_ = internals;
PyTypeObject *tp = internals_->nb_ndarray;

if (NB_UNLIKELY(!tp)) {
    lock_internals guard(internals_);
    tp = internals_->nb_ndarray;
    if (tp)
        return tp;

    // ... build tp
    internals_->nb_ndarray = tp;
}
```

This is the classic double-checked locking idiom, which on architectures
that don't have total store ordering is racy (e.g. ARM, not x86). To use
this pattern correctly, we need to use an atomic acquire load for the
read outside the lock, and to use an atomic store release for the store
inside the lock. These add the necessary fences to ensure that, for
example, the contents of `tp` do not appear populated to the reader
before the writer has stored them to memory.

This PR adds an include of `<atomic>` to nb_internals.h if
free-threading is enabled. I was unable to think of a good way to avoid
this, bar using intrinsics. The use of atomics seemed appropriate to me
in the presence of free threading.
@hawkinsp
Copy link
Contributor Author

I should note: another option would be to use std::call_once or similar. I avoided that because as best I can tell you do not already use <mutex> and you're working hard to avoid including STL headers.

@wjakob
Copy link
Owner

wjakob commented Dec 17, 2024

Thanks for catching this. This is the way to go and preferable to a std::call_once.

@wjakob wjakob merged commit 20a367a into wjakob:master Dec 17, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants