Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Flush to manually unblock Read/ReadInto #1491

Merged
merged 2 commits into from
Jun 24, 2024

Conversation

brycekahle
Copy link
Contributor

When using WakeupEvents for perf buffers or BPF_RB_NO_WAKEUP for ring buffers, you can enqueue data that does not immediately trigger a reader blocked in Read/ReadInto to wakeup.

This PR adds a function Flush to manually cause those blocked readers to wakeup. Further, a sentinel error FlushCompleteError is returned when all the rings have been read, indicating all the data already queued when Flush was called has been read.

@brycekahle brycekahle requested review from florianl and a team as code owners June 18, 2024 21:03
@brycekahle brycekahle force-pushed the bryce.kahle/perf-flush-end branch 2 times, most recently from 88a6afe to 9782078 Compare June 18, 2024 21:07
Copy link
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! 🙏

internal/epoll/poller.go Outdated Show resolved Hide resolved
perf/reader.go Outdated Show resolved Hide resolved
perf/reader.go Outdated Show resolved Hide resolved
perf/reader_test.go Show resolved Hide resolved
p.flushEvent.read()
events = slices.Delete(events, i, i+1)
n -= 1
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the behaviour is that Wait will return 0, nil for a flush. But what about a case where another ring becomes ready simultaneously with flushEvent? In that case we'd get two events from epoll, would discard one and then return 1, nil. In that case we drop an ErrFlushed from Reader.Read which the user is expecting.

How about removing the event here, and doing return n, ErrFlushed from Wait instead? Then ringbuf and perf re-export epoll.ErrFlushed and the rest can mostly stay the same.

Copy link
Contributor Author

@brycekahle brycekahle Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we drop an ErrFlushed from Reader.Read which the user is expecting.

I don't follow. Flush is the one that sets up a pending error of ErrFlushed, which will still be returned after all the rings have been read.

The n return value from Wait isn't even used. Only if an error is returned does a different code path happen. We want Flush to behave as if a regular ring epoll event happened, which is what it is doing.

Copy link
Contributor Author

@brycekahle brycekahle Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually want to ensure we read all the events in case there is both a Flush and Close in the same wakeup.

@brycekahle brycekahle force-pushed the bryce.kahle/perf-flush-end branch from e5f3960 to 6255d47 Compare June 21, 2024 19:03
@brycekahle
Copy link
Contributor Author

The test failures on Go 1.21 do not seem to be related to my PR

@lmb
Copy link
Collaborator

lmb commented Jun 24, 2024

The test failures on Go 1.21 do not seem to be related to my PR

yeah i think it's due to sometimes getting a slow GH runner or something along those lines. are you able to re-run failed runs? if not we might need to fiddle with permissions.

@brycekahle
Copy link
Contributor Author

are you able to re-run failed runs?

It didn't seem like I could.

@brycekahle brycekahle force-pushed the bryce.kahle/perf-flush-end branch from 201d1d0 to e5d823f Compare June 24, 2024 14:21
@lmb
Copy link
Collaborator

lmb commented Jun 24, 2024

@brycekahle both Flush implementations were racing with Read. I've pushed a commit which fixes this, by returning ErrFlushed from the poller as I mentioned. PTAL.

@brycekahle
Copy link
Contributor Author

Looks good

brycekahle and others added 2 commits June 24, 2024 16:49
Add a method Flush which interrupts a perf or ringbuf reader and
causes it to read all data from the ring. This is very similar to
the logic we use to check for data in the ring when a deadline is
expired. The semantics of the Read function change slightly: a
caller is now guaranteed to receive an os.ErrDeadlineExceeded
which wasn't the case when the ring contained data.

Signed-off-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Co-authored-by: Lorenz Bauer <lmb@isovalent.com>
Reader has two separate locks which guard separate data.
Re-organise the fields so that it's clearer which lock protects
which fields.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb lmb force-pushed the bryce.kahle/perf-flush-end branch from 4f6ad84 to c052c45 Compare June 24, 2024 15:53
@lmb lmb merged commit 9a10881 into cilium:main Jun 24, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants