rt: Fix data race #146

Merged: 1 commit merged on Oct 24, 2022

Conversation

ollie-etl
Contributor

@ollie-etl ollie-etl commented Oct 24, 2022

Fixes #145. A data race was observed in PR #144: with high concurrency in user space and a single-threaded sqpoll in the ring, submitting an entry to the queue could trigger a panic.

The speculation is that the submission queue is full, but the ring has not yet executed any ops and placed them in the completion queue. The submit call therefore submits the queue but doesn't free any SQEs.

The fix, which is not very elegant, busy-polls when the queue is full.
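For readers who want the shape of the busy-poll-on-full idea without the ring itself, here is a minimal std-only sketch. It is not the patch from this PR and does not use the io-uring or tokio-uring APIs: `BoundedQueue`, `push_busy_poll`, and `run_demo` are hypothetical names, and a consumer thread stands in for the sqpoll thread that eventually frees slots.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a fixed-capacity submission queue.
/// This is NOT the io-uring or tokio-uring API.
struct BoundedQueue {
    slots: Mutex<VecDeque<u64>>,
    capacity: usize,
}

impl BoundedQueue {
    /// `capacity` is assumed to be non-zero.
    fn new(capacity: usize) -> Self {
        Self {
            slots: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
        }
    }

    /// Try to enqueue an entry; hand it back if the queue is full.
    fn try_push(&self, entry: u64) -> Result<(), u64> {
        let mut slots = self.slots.lock().unwrap();
        if slots.len() == self.capacity {
            return Err(entry);
        }
        slots.push_back(entry);
        Ok(())
    }

    /// Take the oldest entry, if any (the role played by the poller).
    fn pop(&self) -> Option<u64> {
        self.slots.lock().unwrap().pop_front()
    }
}

/// Retry until the entry fits instead of panicking on a full queue.
/// Returns how many times the push had to be retried.
fn push_busy_poll(queue: &BoundedQueue, mut entry: u64) -> usize {
    let mut retries = 0;
    loop {
        match queue.try_push(entry) {
            Ok(()) => return retries,
            Err(rejected) => {
                entry = rejected;
                retries += 1;
                // Give the consumer a chance to free a slot.
                thread::yield_now();
            }
        }
    }
}

/// Push `total` entries through a queue of `capacity` slots while a single
/// consumer thread drains it; returns the entries in the order consumed.
fn run_demo(capacity: usize, total: u64) -> Vec<u64> {
    let queue = Arc::new(BoundedQueue::new(capacity));
    let consumer_queue = Arc::clone(&queue);
    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        while (seen.len() as u64) < total {
            match consumer_queue.pop() {
                Some(entry) => seen.push(entry),
                None => thread::yield_now(),
            }
        }
        seen
    });
    for entry in 0..total {
        push_busy_poll(&queue, entry);
    }
    consumer.join().unwrap()
}
```

Calling `run_demo(4, 1000)` pushes far more entries than the queue can hold at once, so the producer only makes progress because it keeps retrying while the consumer drains. Yielding between retries is a choice made for this sketch; a real driver would more likely flush or wait on the ring inside the loop.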

@ollie-etl changed the title from "Fix data race" to "rt: Fix data race" on Oct 24, 2022
@Noah-Kennedy merged commit 13f7409 into tokio-rs:master on Oct 24, 2022
@FrankReh
Collaborator

@ollie-etl I wonder if the inner.submit call in the loop is causing tick to be called? I have also wondered what the logic in driver.Inner.submit accomplishes: it calls uring.submit first and then, depending on whether that returns ok or err, calls submission().sync, calls tick, or simply returns the error. We can at least tell that inner.submit isn't returning an error, or else submit_with would be failing. I think the submission sync should be done before uring.submit, so we are sure the kernel sees our changes to the submission queue.

Because of the common name submit, I was confused a few weeks ago by the runtime's on_thread_park function, which calls uring.submit directly rather than the driver's inner submit.

These are one or two of the reasons I was waiting for the driver rc changes to settle before checking whether my own changes actually made sense. But what you were seeing, and how you got around it, is very interesting.

I don't understand how this change helped, but I probably don't have my head around the implications of using sqpoll.

Noah-Kennedy pushed a commit that referenced this pull request Nov 5, 2022
# 0.4.0 (November 5th, 2022)

### Fixed

- Fix panic in Deref/DerefMut for Slice extending into uninitialized
part of the buffer ([#52])
- docs: all-features = true ([#84])
- fix fs unit tests to avoid parallelism ([#121])
- Box the socket address to allow moving the Connect future ([#126])
- rt: Fix data race ([#146])

### Added

- Implement fs::File::readv_at()/writev_at() ([#87])
- fs: implement FromRawFd for File ([#89])
- Implement `AsRawFd` for `TcpStream` ([#94])
- net: add TcpListener.local_addr method ([#107])
- net: add TcpStream.write_all ([#111])
- driver: add Builder API as an option to start ([#113])
- Socket and TcpStream shutdown ([#124])
- fs: implement fs::File::from_std ([#131])
- net: implement FromRawFd for TcpStream ([#132])
- fs: implement OpenOptionsExt for OpenOptions ([#133])
- Add NoOp support ([#134])
- Add writev to TcpStream ([#136])
- sync TcpStream, UnixStream and UdpSocket functionality ([#141])
- Add benchmarks for no-op submission ([#144])
- Expose runtime structure ([#148])

### Changed

- driver: batch submit requests and add benchmark ([#78])
- Depend on io-uring version ^0.5.8 ([#153])

### Internal Improvements

- chore: fix clippy lints ([#99])
- io: refactor post-op logic in ops into Completable ([#116])
- Support multi completion events: v2 ([#130])
- simplify driver operation futures ([#139])
- rt: refactor runtime to avoid Rc\<RefCell\<...>> ([#142])
- Remove unused dev-dependencies ([#143])
- chore: types and fields explicitly named ([#149])
- Ignore errors from uring while cleaning up ([#154])
- rt: drop runtime before driver during shutdown ([#155])
- rt: refactor drop logic ([#157])
- rt: fix error when calling block_on twice ([#162])

### CI changes

- chore: update actions/checkout action to v3 ([#90])
- chore: add all-systems-go ci check ([#98])
- chore: add clippy to ci ([#100])
- ci: run cargo test --doc ([#135])


[#52]: #52
[#78]: #78
[#84]: #84
[#87]: #87
[#89]: #89
[#90]: #90
[#94]: #94
[#98]: #98
[#99]: #99
[#100]: #100
[#107]: #107
[#111]: #111
[#113]: #113
[#116]: #116
[#121]: #121
[#124]: #124
[#126]: #126
[#130]: #130
[#131]: #131
[#132]: #132
[#133]: #133
[#134]: #134
[#135]: #135
[#136]: #136
[#139]: #139
[#141]: #141
[#142]: #142
[#143]: #143
[#144]: #144
[#146]: #146
[#148]: #148
[#149]: #149
[#153]: #153
[#154]: #154
[#155]: #155
[#157]: #157
[#162]: #162
Development

Successfully merging this pull request may close these issues.

driver::op::submit_with() panic