Refactor ebpf_ring_buffer to remove lock #4204

Open
mikeagun wants to merge 16 commits into main
Conversation

@mikeagun (Contributor) commented Feb 12, 2025

Remove lock from ebpf_ring_buffer and use acquire/release and atomics to ensure safe ordering.

Description

Removes the lock from ebpf_ring_buffer and uses acquire/release semantics (and one compare exchange) to ensure safe ordering in producer record reservation.

The ring buffer keeps the same producer and consumer offsets as before, but now uses acquire/release and atomic operations to safely order updates to the offsets and record headers.

This change also updates the record header to match the 64-bit header used on Linux.

There is no effect on the public API (the ring buffer record struct changes, but it isn't currently exposed through libbpf).
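
For illustration, here is a sketch of what the 64-bit record header could look like. The field and flag names are assumptions loosely mirroring Linux's bpf_ringbuf header (a 32-bit length whose top bits carry the busy/discard flags, plus a 32-bit page offset); the PR's actual struct may differ. The later sketches in this description reuse these names.

```c
// Hedged sketch of a 64-bit record header; all names are illustrative assumptions.
#include <stdint.h>

#define EBPF_RINGBUF_LOCK_BIT (1u << 31)    // Set while a producer owns the record.
#define EBPF_RINGBUF_DISCARD_BIT (1u << 30) // Set when the record is discarded.

typedef struct _ebpf_ring_buffer_record_sketch
{
    struct
    {
        uint32_t length;      // Data length in bytes; bits 31/30 are the lock/discard flags.
        uint32_t page_offset; // Record offset in the data pages (mirrors Linux's pg_off).
    } header;
    uint8_t data[];           // Record payload; total record size is padded to 8 bytes.
} ebpf_ring_buffer_record_sketch_t;
```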

Producer Algorithm

ebpf_ring_buffer_reserve

This function ensures safe ordering between concurrent producers.

After the call to reserve, the producer can write to the record and then submit or discard it.

The lock bit in the record header will be set when reserve returns, and will stay set until the record is submitted or discarded, so the consumer knows when it can read the record.

  • This doesn't affect current consumer code, which uses the libbpf callback-based API.

A code sketch of this flow follows the steps below.
  1. ReadAcquire the producer reserve offset.
  2. If there isn't enough space left, return EBPF_NO_MEMORY.
  3. Calculate new reserve offset = reserve offset + padded record size.
  4. CompareExchange the reserve offset with the new reserve offset.
    • If another thread incremented the reserve offset before us, go to step 1.
    • If we succeeded, we have allocated the space for the record.
  5. WriteNoFence the header of the record at the reserve offset (the original offset, before we added the record size).
    • Locks the record by writing record.header.length = record length | 1<<31.
    • NoFence suffices because the only requirement is that the header write is ordered before the release in step 7.
  6. Spin on the producer offset until it matches our reserve offset.
    • We must wait until all previous records are locked before updating the producer offset.
      • Several threads may have advanced the reserve offset but not yet finished writing their record headers.
    • This loop also serializes producer offset updates.
      • Without it, the producer offset updates from two concurrent producers could land out of order and corrupt the ring buffer.
    • This runs at dispatch level, so at worst we wait for N-1 threads to get from step 5 to step 7 on an N-CPU system.
      • The algorithm also works at passive level if ALL producers run at passive level and the spin loop yields.
  7. WriteRelease producer offset = new reserve offset.
    • The release ensures the record header write is visible before the updated producer offset.
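
A minimal C sketch of this flow, not the PR's exact code: the ring structure fields (producer_reserve_offset, producer_offset, consumer_offset, length, data) are assumed names, offsets are assumed to grow monotonically and be masked into a power-of-two buffer as on Linux, and the record/flag names come from the header sketch above.

```c
// Assumed ring layout for these sketches (not the PR's actual structure).
typedef struct _ebpf_ring_buffer_sketch
{
    volatile uint64_t producer_reserve_offset; // Next free offset to reserve.
    volatile uint64_t producer_offset;         // Records below this offset are locked or visible.
    volatile uint64_t consumer_offset;         // Next record for the consumer to read.
    uint64_t length;                           // Power-of-two data size in bytes.
    uint8_t* data;                             // Ring data pages.
} ebpf_ring_buffer_sketch_t;

// Hedged sketch of the reservation flow; ebpf_result_t codes are from ebpf_result.h,
// and the Read*/Write* acquire-release intrinsics are the winnt.h/wdm.h ones.
static ebpf_result_t
ebpf_ring_buffer_reserve_sketch(ebpf_ring_buffer_sketch_t* ring, uint8_t** data, size_t length)
{
    // 8-byte header plus data, padded to a multiple of 8 bytes.
    uint64_t padded_size = (sizeof(uint64_t) + length + 7) & ~7ull;
    for (;;) {
        // Step 1: ReadAcquire the producer reserve offset.
        uint64_t reserve_offset = ReadULong64Acquire(&ring->producer_reserve_offset);
        // Step 2: fail if the record doesn't fit in the remaining free space.
        if (reserve_offset + padded_size - ReadULong64NoFence(&ring->consumer_offset) > ring->length) {
            return EBPF_NO_MEMORY;
        }
        // Steps 3+4: claim the space with a compare-exchange; on failure another
        // producer advanced the reserve offset first, so retry from step 1.
        uint64_t new_reserve_offset = reserve_offset + padded_size;
        if ((uint64_t)InterlockedCompareExchange64(
                (volatile LONG64*)&ring->producer_reserve_offset,
                (LONG64)new_reserve_offset,
                (LONG64)reserve_offset) != reserve_offset) {
            continue;
        }
        // Step 5: write the locked header; no fence needed because the step 7
        // release orders it before the producer offset becomes visible.
        ebpf_ring_buffer_record_sketch_t* record =
            (ebpf_ring_buffer_record_sketch_t*)&ring->data[reserve_offset & (ring->length - 1)];
        WriteULongNoFence((volatile ULONG*)&record->header.length,
                          (ULONG)length | EBPF_RINGBUF_LOCK_BIT);
        // Step 6: wait for earlier reservations to lock their records, which also
        // serializes producer offset updates across concurrent producers.
        while (ReadULong64Acquire(&ring->producer_offset) != reserve_offset) {
            YieldProcessor();
        }
        // Step 7: release-publish the new producer offset after the header write.
        WriteULong64Release(&ring->producer_offset, new_reserve_offset);
        *data = record->data;
        return EBPF_SUCCESS;
    }
}
```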

ebpf_ring_buffer_submit

  1. WriteRelease record header to clear lock bit.
    • The release ensures the record data is visible before the record is unlocked.
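
A matching sketch of submit, under the same assumptions as the reserve sketch above:

```c
// Hedged sketch of submit: one release write of the final header clears the
// lock bit and publishes the record to the consumer.
static void
ebpf_ring_buffer_submit_sketch(ebpf_ring_buffer_record_sketch_t* record, size_t length)
{
    // Release ensures all payload writes are visible before the lock bit clears.
    WriteULongRelease((volatile ULONG*)&record->header.length, (ULONG)length);
}
```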

ebpf_ring_buffer_discard

  1. WriteNoFence record header to set discard bit and clear lock bit.
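
And a matching sketch of discard, again reusing the assumed names from above:

```c
// Hedged sketch of discard: a single write sets the discard bit and clears the
// lock bit. No fence is needed because the consumer never reads the payload of
// a discarded record.
static void
ebpf_ring_buffer_discard_sketch(ebpf_ring_buffer_record_sketch_t* record, size_t length)
{
    WriteULongNoFence((volatile ULONG*)&record->header.length,
                      (ULONG)length | EBPF_RINGBUF_DISCARD_BIT);
}
```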

ebpf_ring_buffer_output

  1. ebpf_ring_buffer_reserve
  2. memcpy
  3. ebpf_ring_buffer_submit
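
Composing the sketches above gives the output path (all names remain illustrative assumptions):

```c
// Hedged sketch of output as reserve + memcpy + submit.
static ebpf_result_t
ebpf_ring_buffer_output_sketch(ebpf_ring_buffer_sketch_t* ring, const uint8_t* data, size_t length)
{
    uint8_t* record_data;
    ebpf_result_t result = ebpf_ring_buffer_reserve_sketch(ring, &record_data, length);
    if (result != EBPF_SUCCESS) {
        return result;
    }
    memcpy(record_data, data, length);
    // The 8-byte record header immediately precedes the returned data pointer.
    ebpf_ring_buffer_record_sketch_t* record =
        (ebpf_ring_buffer_record_sketch_t*)(record_data - sizeof(uint64_t));
    ebpf_ring_buffer_submit_sketch(record, length);
    return EBPF_SUCCESS;
}
```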

Consumer Algorithm

Currently ebpf-for-windows only exposes the libbpf callback-based consumer API, so there is no effect on consumers.

The new algorithm also supports lock-free polling consumers at the ebpf_ring_buffer level by taking advantage of the serialization of record header and producer offset updates done in ebpf_ring_buffer_reserve. A sketch of a polling consumer follows the steps below.

  1. If consumer offset == producer offset, the ring buffer is empty.
    • Poll until consumer offset != producer offset to wait for the next record.
  2. If the record at the consumer offset is locked, we are done.
    • It is possible that later records are ready, but the consumer must always read records in order.
    • An actively waiting consumer can instead poll the lock bit of the current record until it is ready.
  3. If the current record has not been discarded, read it.
  4. WriteNoFence advance the consumer offset to the next record, then go to step 1.
    • Add the data length plus the header length (8 bytes), padded to a multiple of 8 bytes.
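
A hedged sketch of such a polling consumer, under the same assumptions as the producer sketches above (the callback parameter is illustrative; the real consumer today is the libbpf callback-based API):

```c
// Hedged sketch of a lock-free polling consumer.
static void
ebpf_ring_buffer_poll_sketch(
    ebpf_ring_buffer_sketch_t* ring, void (*callback)(const uint8_t* data, uint32_t length))
{
    uint64_t consumer_offset = ReadULong64NoFence(&ring->consumer_offset);
    for (;;) {
        // Step 1: the ring is empty when the consumer catches up to the producer.
        if (consumer_offset == ReadULong64Acquire(&ring->producer_offset)) {
            break; // A waiting consumer polls here until the offsets differ.
        }
        ebpf_ring_buffer_record_sketch_t* record =
            (ebpf_ring_buffer_record_sketch_t*)&ring->data[consumer_offset & (ring->length - 1)];
        // Step 2: stop at the first locked record -- records are read in order.
        // Acquire pairs with the release in submit, so the payload reads below
        // cannot be reordered before this header read.
        uint32_t header = ReadULongAcquire((volatile ULONG*)&record->header.length);
        if (header & EBPF_RINGBUF_LOCK_BIT) {
            break; // An actively waiting consumer polls the lock bit here instead.
        }
        uint32_t length = header & ~(EBPF_RINGBUF_LOCK_BIT | EBPF_RINGBUF_DISCARD_BIT);
        // Step 3: skip discarded records, deliver the rest.
        if (!(header & EBPF_RINGBUF_DISCARD_BIT)) {
            callback(record->data, length);
        }
        // Step 4: advance past the 8-byte header plus data, padded to 8 bytes.
        consumer_offset += (sizeof(uint64_t) + length + 7) & ~7ull;
        WriteULong64NoFence(&ring->consumer_offset, consumer_offset);
    }
}
```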

Testing

Updated existing ring buffer tests and added new stress tests to exercise the synchronization changes.

Documentation

Algorithm documented in ebpf_ring_buffer.c.

Installation

N/A

Michael Agun added 2 commits February 11, 2025 16:49
@mikeagun mikeagun changed the title Refactor ebpf_ring_buffer and redesign synchronization Refactor ebpf_ring_buffer to remove lock Feb 12, 2025
@mikeagun mikeagun marked this pull request as ready for review February 13, 2025 00:52
```cpp
    REQUIRE(remaining_failed_returns == 0);
}

TEST_CASE("ring_buffer_stress_tests", "[ring_buffer_stress]")
```
@mikeagun (Contributor, Author) commented Feb 13, 2025
Where should the stress/speed tests go, or should they be in a new exe? We have ebpf_stress_tests, but that is closer to end-to-end testing while these test the internal ebpf_ring_buffer data structure.

If they stay here then they should probably be disabled by default -- there are already unit tests for ring buffer, these just try to hit any race conditions and take a while to complete.

(Also, there are print statements and comments in the stress tests that were useful during development but should be removed/reduced before merging -- depending on where these tests go they should be updated accordingly).

```c
// - We only advance producer offset once the producer_offset matches the producer_reserve_offset we
//   originally got.
// - This guarantees any records allocated before us are locked before we update offset.
while (reserve_offset != ReadULong64Acquire(&ring->producer_offset)) {
```
A reviewer (Member) commented:
Please add comments explaining why this is sufficient to ensure ordering.

@mikeagun (Contributor, Author) replied:
I updated the comment above the spin loop to explain the ordering requirements and how they are fulfilled.
