Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

purge events on stop #552

Merged
merged 1 commit into from
Jan 4, 2024

Conversation

e-dant
Copy link

@e-dant e-dant commented Dec 30, 2023

There are edge-cases, when many events are pending, despite the stream being stopped, that the stream's associated callback will be invoked. Purging events is intended to prevent this.

Discovered and head-banged on this one on my watcher for a while. Eventually fixed here: e-dant/watcher@3a45a7f

@e-dant
Copy link
Author

e-dant commented Dec 30, 2023

Oh, interesting. I don't think you're using dispatch queues (at least not directly), you're with the cf run-loops. I wonder if the issue is present with them.

Do you have very many-watcher open/close tests? Say, a dozen or so watchers created and destroyed very quickly with some events in the background for them may trigger it. (The fault is not deterministic, and I haven't pinned down anything in particular that triggers is, but repeatedly opening and closing watchers with some events for them in the background does make it more likely.)

If so, have you ever seen a segfault with a stack like this?

... Userland symbols here and above in garbage memory ...
4   FSEvents                      	       0x18cf54ef0 implementation_callback_rpc + 3656
5   FSEvents                      	       0x18cf54020 _Xcallback_rpc + 220
6   FSEvents                      	       0x18cf53f18 FSEventsD2F_server + 68
7   FSEvents                      	       0x18cf57720 receive_and_dispatch_rcv_msg + 316
8   libdispatch.dylib             	       0x18457f910 _dispatch_client_callout + 20
9   libdispatch.dylib             	       0x184582dc8 _dispatch_continuation_pop + 600
10  libdispatch.dylib             	       0x184596be4 _dispatch_source_latch_and_call + 420
11  libdispatch.dylib             	       0x1845957b4 _dispatch_source_invoke + 832
12  libdispatch.dylib             	       0x18459261c _dispatch_root_queue_drain_deferred_wlh + 288
13  libdispatch.dylib             	       0x184591e90 _dispatch_workloop_worker_thread + 404
14  libsystem_pthread.dylib       	       0x184729114 _pthread_wqthread + 288
15  libsystem_pthread.dylib       	       0x184727e30 start_wqthread + 8

@0xpr03
Copy link
Member

0xpr03 commented Jan 4, 2024

We currently have no tests for a ton of files. Might be interesting to reproduce this.

@0xpr03 0xpr03 added A-bug os-mac B-unconfirmed Needs verification (by maintainer or testing party) labels Jan 4, 2024
@0xpr03 0xpr03 merged commit c3929ed into notify-rs:main Jan 4, 2024
13 checks passed
@e-dant
Copy link
Author

e-dant commented Mar 9, 2024

This might be a dispatch-specific problem. I was unable to reproduce the issue with a minimal FSEvents+CFRunLoop-based implementation or with this library.

A minimal-ish FSEvents+Dispatch-implementation seems to reliably reproduce the issue, given enough invocations. There's a file here: https://github.com/e-dant/watcher/blob/release/etc/wip-fsevents-issue/main.cpp which, although there's ample debug logging and commented blocks about, after a few hundred runs and enough events bottled up does seem to hit the issue.

As best I can tell, there's some inconsistency between a user asking FSEvents to stop processing events and it actually doing so. Particularly, lots of events over a short period of time being received by very short lived watchers seems to be ill-handled by FSEvents when it's scheduled with dispatch.

I think we should revert this PR.

If you're interested and if I have some more spare time this weekend, I could get some basic tests for rapidly opening and closes some watchers. If so, do you prefer tests in the module or in the tests directory?

@0xpr03
Copy link
Member

0xpr03 commented Mar 28, 2024

do you prefer tests in the module or in the tests directory

If possible in the tests directory, what ever works better.

Is there a strong reason to revert it ? If it changes to behavior for users, we can leave it as an attempted improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-bug B-unconfirmed Needs verification (by maintainer or testing party) os-mac
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants