Use atomic reads and writes in code that uses double-checked locking. #819

hawkinsp · 2024-12-16T21:56:43Z

In a couple of places in nanobind we see this idiom:

nb_internals *internals_ = internals;
PyTypeObject *tp = internals_->nb_ndarray;

if (NB_UNLIKELY(!tp)) {
    lock_internals guard(internals_);
    tp = internals_->nb_ndarray;
    if (tp)
        return tp;

    // ... build tp
    internals_->nb_ndarray = tp;
}

This is the classic double-checked locking idiom, which is racy on architectures that don't have total store ordering (e.g. ARM, but not x86). To use this pattern correctly, the read outside the lock must be an atomic acquire load, and the store inside the lock must be an atomic release store. Together these add the necessary fences to ensure that, for example, a reader never observes a non-null tp before the writer's stores to the contents of tp have become visible.
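
As a rough illustration of the acquire/release pairing described above, here is a minimal standalone sketch. It is not the actual diff in this PR: `TypeObject`, `Internals`, and `get_ndarray_type` are hypothetical stand-ins for `PyTypeObject`, `nb_internals`, and the surrounding nanobind code, and a plain `std::mutex` is assumed in place of `lock_internals`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for PyTypeObject; 'ready' models "the contents of tp".
struct TypeObject {
    int ready = 0;
};

// Hypothetical stand-in for nb_internals.
struct Internals {
    std::mutex mutex;                                // assumed stand-in for lock_internals
    TypeObject ndarray_storage;                      // backing storage for the lazily built object
    std::atomic<TypeObject *> ndarray_type{nullptr}; // published pointer, null until built
};

TypeObject *get_ndarray_type(Internals &internals) {
    // Fast path, outside the lock: the acquire load pairs with the release
    // store below, so a non-null pointer implies fully written contents.
    TypeObject *tp = internals.ndarray_type.load(std::memory_order_acquire);

    if (!tp) {
        std::lock_guard<std::mutex> guard(internals.mutex);

        // Second check, under the lock. A relaxed load suffices here because
        // the mutex already orders this read after any earlier writer.
        tp = internals.ndarray_type.load(std::memory_order_relaxed);
        if (tp)
            return tp;

        // ... build tp (here: just fill in the stand-in object)
        tp = &internals.ndarray_storage;
        tp->ready = 1;

        // Publish only after the contents are written.
        internals.ndarray_type.store(tp, std::memory_order_release);
    }

    assert(tp->ready == 1); // invariant guaranteed by the acquire/release pair
    return tp;
}
```

With plain (non-atomic) reads and writes, the fast-path reader on a weakly ordered machine could see the pointer before `ready == 1`; the release store and acquire load are what rule that out in this sketch.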

This PR adds an include of <atomic> to nb_internals.h if free-threading is enabled. I was unable to think of a good way to avoid this, bar using intrinsics. The use of atomics seemed appropriate to me in the presence of free threading.

hawkinsp · 2024-12-16T22:19:35Z

I should note: another option would be to use std::call_once or similar. I avoided that because as best I can tell you do not already use <mutex> and you're working hard to avoid including STL headers.
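
For comparison, the `std::call_once` alternative might look roughly like the sketch below. This is only an illustration of the option that was not taken: `TypeObject`, `Internals`, and `get_ndarray_type_once` are hypothetical names, and the approach requires `<mutex>`, which is the header dependency mentioned above.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for PyTypeObject.
struct TypeObject {
    int ready = 0;
};

// Hypothetical stand-in for nb_internals.
struct Internals {
    std::once_flag ndarray_once;        // guards one-time initialization
    TypeObject ndarray_storage;         // backing storage for the lazily built object
    TypeObject *ndarray_type = nullptr; // written exactly once, inside call_once
};

TypeObject *get_ndarray_type_once(Internals &internals) {
    // std::call_once runs the callable exactly once; every caller that
    // returns from call_once is guaranteed to see its side effects.
    std::call_once(internals.ndarray_once, [&internals] {
        // ... build tp (here: just fill in the stand-in object)
        internals.ndarray_storage.ready = 1;
        internals.ndarray_type = &internals.ndarray_storage;
    });

    assert(internals.ndarray_type->ready == 1);
    return internals.ndarray_type;
}
```

This hides the memory-ordering details behind the standard library, at the cost of pulling in `<mutex>`.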

wjakob · 2024-12-17T02:03:57Z

Thanks for catching this. This is the way to go and preferable to a std::call_once.

wjakob merged commit 20a367a into wjakob:master Dec 17, 2024
31 checks passed
