Which filesystem events correspond to which event::effect_type values? #39
Comments
I want to look more into 1 and 3 before I respond fully. However, here are my thoughts so far: For 1, my sense is that the events reported are accurate -- At least from the kernel's perspective. However, there are some platform-specific oddities which we smooth over when we can. Maybe this is one which we can add logic to handle. (For example, on Darwin, we keep a map of seen created paths, and only report new create events after the path has been removed, because that system tends to over-report create events for directories.) For 3, my sense is that we aren't watching the file, so the event isn't being reported to us. I will need to confirm that there is no information available to us that we are missing. |
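The Darwin workaround described above (remember which paths have been reported as created, and only report a new create after the path has been removed) could be sketched roughly as follows. This is an illustration only, not the library's code: the `effect` enum and the `create_filter` class are hypothetical names invented for this example.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical event kinds for this sketch; not the library's actual types.
enum class effect { create, modify, destroy, rename };

// Suppresses repeated create events for a path until that path is destroyed.
class create_filter {
public:
  // Returns true if the event should be forwarded to the user.
  bool should_report(std::string const& path, effect e) {
    switch (e) {
      case effect::create:
        // insert().second is false when the path was already recorded,
        // which means this create is an over-report.
        return seen_created_.insert(path).second;
      case effect::destroy:
        // Forget the path so that a later create is reported again.
        seen_created_.erase(path);
        return true;
      default:
        return true;
    }
  }

private:
  std::unordered_set<std::string> seen_created_;
};
```

The cost of this approach is one set entry per live created path, so it trades a little memory for fewer surprising events.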
2, 4 and 6 are accurate and intentional as-is. 5 is typical, but not necessarily correct. I will look into it. I have also seen plenty of "duplicate" modification events from editors. I should look into whether or not they are valid. One way to test this is to match the events to a system-call trace of the editor, or just tee. However, I have thought of those events as accurate, a reflection of extra work done by the editors. |
The interesting one -- Properly handling rename events -- Has been on my mind for many months now. I would appreciate your thoughts on how to handle them. One of the design goals of this project is to be simple (for users to use). Representing rename events (same as move events) is, in my mind, tricky to do ergonomically for the user. Some options:
Update on 1. Impression is that there's either (unlikely?) a bug in how the kernel reports modify events, or (more likely?) one of the syscalls used for opening the file for writing is causing a modify event. The watcher is getting a "duplicate" event here:
While, from echo+redirection's perspective, it only wrote once:
In particular, I'm wondering if either the |
Update on 3.
It seems that when using the inotify adapter on linux, we don't receive an event for files which are moved-from an unwatched directory into a watched directory. This behavior is different when using fanotify -- We do get the rename event when a file is moved-from an unwatched directory into a watched directory. To be clear, the library doesn't even seem to be notified that there is an event in this case when we're using inotify. If you need a more precise watcher on linux, fanotify is generally the better choice. That adapter has unlimited file descriptors and fewer quirks. The downside is that you need to have root privileges. The fanotify adapter will always be used if you have a kernel version greater than or equal to 5.9.0 and you have root privileges (i.e. run with sudo). I still want to look more into any quirks with inotify that might let us have that event. I'm not sure of any off the top of my head. Maybe there's a way... |
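The selection rule stated above (fanotify when the kernel is at least 5.9.0 and the process has root privileges, inotify otherwise) can be written as a small pure function. This is a sketch under assumptions: `linux_adapter`, `kernel_at_least_5_9` and `pick_adapter` are hypothetical names, and the library may well make this decision differently. On a real system the release string would typically come from the `release` field of `uname(2)`, and the privilege check from `geteuid() == 0`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical names for this sketch; not the library's API.
enum class linux_adapter { fanotify, inotify };

// Parses the leading "major.minor" of a kernel release string such as
// "6.1.0-13-amd64" and reports whether it is at least 5.9.
inline bool kernel_at_least_5_9(std::string const& release) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) < 2) {
    return false;  // Unparseable release string: fall back conservatively.
  }
  return major > 5 || (major == 5 && minor >= 9);
}

// fanotify only when both conditions from the comment above hold.
inline linux_adapter pick_adapter(std::string const& release, bool is_root) {
  return (is_root && kernel_at_least_5_9(release)) ? linux_adapter::fanotify
                                                   : linux_adapter::inotify;
}
```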
Hi @e-dant, thanks for your thorough analysis! Here are my thoughts on your findings.
That would be appreciated, though it is quite easy to solve from the user side too. If the logic gets too complicated, it might be best just to leave this for the user to deal with. In that case, I would suggest stating in the documentation that copy commands cause 2 events and not 1.
Thanks for the info, it might be best just to leave this to the user. In my case, I will simply keep track of file checksums and compare them on the modify event to double-check if there are any changes that need to be addressed. Users could also use event timestamps and filter out any duplicate modification events that occur within 10ms of the last modify event. It might be useful to add to the documentation, that modify events caused by editors will result in duplicates. That way, users know to watch out for such behavior.
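The timestamp-based filtering suggested above could look something like this on the user side. It is a minimal sketch, assuming event times are nanoseconds since the epoch; the `modify_debouncer` class is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical user-side helper: drops a modify event for a path when it
// arrives within `window_ns` of the previous modify event for that path.
class modify_debouncer {
public:
  explicit modify_debouncer(std::int64_t window_ns = 10'000'000)  // 10 ms
      : window_ns_(window_ns) {}

  // Returns true if the event should be handled, false if it looks like a
  // duplicate. Times are assumed to be nanoseconds since the epoch.
  bool accept(std::string const& path, std::int64_t effect_time_ns) {
    auto it = last_seen_.find(path);
    bool duplicate = false;
    if (it != last_seen_.end()) {
      std::int64_t delta = effect_time_ns - it->second;
      // Out-of-order (negative delta) events are not treated as duplicates.
      duplicate = delta >= 0 && delta < window_ns_;
    }
    last_seen_[path] = effect_time_ns;
    return !duplicate;
  }

private:
  std::int64_t window_ns_;
  std::unordered_map<std::string, std::int64_t> last_seen_;
};
```

Because the last-seen time is updated on every event, a long burst of closely spaced writes collapses into a single accepted event. Whether that is desirable depends on the use case.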
I don't think cookies would be very easy to use for the user. As you state yourself, users would have to keep track of them and that's extra work for the user, Refactoring I think adding an extra Unless, of course, more information needs to be presented to the user. Does
Good to know that. It would be very useful to document when each adapter is used for Linux systems and what the difference between them is. |
I'll update the documentation as you suggest. I began work on rename-from/rename-to events. Currently, the implementation is storing a pointer to an associated event (within the existing event object). That might change. I want to leave room for other kinds of associated events -- Not just rename events. |
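To make the idea of an "associated event" concrete, here is one possible shape for it. This is a sketch, not the library's definition: the field names, the use of `std::shared_ptr`, and the choice to put the old path on the main event and the new path on the associated one are all assumptions made for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical event type for this sketch; the real layout may differ.
struct event {
  enum class effect_type { create, modify, destroy, rename };

  std::string path_name;
  effect_type effect;
  long long effect_time;              // nanoseconds since the epoch (assumed)
  std::shared_ptr<event> associated;  // null when there is no associated event
};

// Builds a rename event whose associated event carries the destination path.
inline event make_rename(std::string from, std::string to, long long t) {
  auto to_event = std::make_shared<event>(
      event{std::move(to), event::effect_type::rename, t, nullptr});
  return event{std::move(from), event::effect_type::rename, t,
               std::move(to_event)};
}
```

A user would then check `ev.associated` for null before reading the other half of the rename, which keeps the simple case (no association) as cheap as before.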
I mentioned here: #40 (comment) That the linux adapters (on the It will probably be a few more weeks before this is released and up on Conan. If it is more convenient for you, using the header off the |
Your feedback is welcome about the current implementation, especially from an API perspective. Do you have more thoughts? Does the pointer to an "associated" event seem ergonomic from a user's perspective? Are there any notes or pain-points you might come across? |
Hi @e-dant, thanks for your work! I created a super simple executable to play around with your new changes to see what works and what doesn't. You can find it here. I hope it's okay that I copy-pasted it, if you want, I can use git modules to link to your sources, it's just copy-paste was easier. Anyways, here are my findings:
|
I think a shared pointer sounds good here. The non-const unique pointer would be nice, maybe ideal, except that I don't want users to get tripped up by this:
So, I think a shared pointer seems good.
I thought about that a while ago, but I was unsure if it was possible for a (particularly odd) system to go back in time to before the epoch. I'm not sure. The time types are usually signed (a-la time_t, and I think most of the time representations chrono uses as well). I have no idea when, in a running system without any other issues, it would be possible to go back in time before the epoch. Maybe in a container or VM migration the system clocks would be all out of whack, but I'm not even sure about that one. If it's possible to go back before the epoch, it's definitely not common. But, lots of those time types are signed, so I can only assume there's some reason for it. It would probably be fine to change the type we use to a size_t, or better:
I think we can and should do that.
I agree, and I think this would be useful in some cases. Maybe there's room for an alternative API that does give the user more control and inspectability. I don't want that to affect the "fast path" to get a user up and running. Maybe this is an exception, I'm not sure. In either case, I think we can push that a release or two down the road.
These are (in all the cases I've seen them) "destructive" rename events; A rename-and-overwrite. (Did A very similar pattern happens on Darwin. You can see my full notes about them here. Most of those notes apply to the Linux adapters. I'm not sure if we should filter out and ignore the special case of a destructive rename, indicate a destroy event immediately before the destructive rename event, or just filter it out. It might not always be possible to get the full picture of a destructive rename event; we can peek ahead for all the events we know about, and we can almost always catch that pattern when we see it -- but it's not guaranteed that very closely timed events end up reported in the same call to For now, we're just reporting them as-is. What do you think we should do there?
Sounds like a bug. I think I see it, a missing flag -- TYVM
I completely agree. It's a pain point, and I'm not sure how best to handle it. I personally think that CamelCase types make more sense, especially in this context, and I have no problem changing the type names in the future. I am also perfectly happy to rename the fields and values so that they are not shadowing one another. I would prefer to lump that along into a release or two in the future, though. |
Thank you very much for your feedback |
Hi @e-dant, great summary! Regarding rename events, I try to ensure that it is a pure rename and not rename-and-overwrite. I do that as follows:
Here is a bit more detailed explanation:
Observing /test_watcher_changes/observed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Event Type: Create
Path Type: Watcher
Path Name: s/self/live@/test_watcher_changes/observed
Effect Time: 1698747739951805717
Associated event:
None
================================================================================
Event Type: Create
Path Type: File
Path Name: /test_watcher_changes/observed/new_file
Effect Time: 1698747785194756460
Associated event:
None
================================================================================
Event Type: Rename
Path Type: File
Path Name: /test_watcher_changes/observed/new_file
Effect Time: 1698747811846163594
Associated event:
Event Type: Rename
Path Type: File
Path Name: /test_watcher_changes/observed/old_file
Effect Time: 1698747811846152650
Associated event:
None
================================================================================
Event Type: Rename
Path Type: File
Path Name: /test_watcher_changes/observed/old_file
Effect Time: 1698747811846289196
Associated event:
None
================================================================================
I ran Regarding
I think that is the correct approach. At least from my point of view, this library just reports various filesystem events as they are reported by the system. If the user needs to ensure that the file/directory really did change, then the user should keep a map of monitored files and their hashsums to double-check against. It's just a bit tricky to know which filesystem events trigger which |
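The "map of monitored files and their hashsums" idea could be sketched like this. It is only an illustration: `content_tracker` and `read_file` are hypothetical helpers, and `std::hash` stands in for a real checksum (it is not cryptographic, so a production version would likely use something stronger).

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <unordered_map>

// Reads a whole file into memory; returns an empty string if it cannot be
// opened. Hypothetical helper for this sketch.
inline std::string read_file(std::string const& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Remembers a digest per path and reports whether the contents changed.
class content_tracker {
public:
  // Returns true on first sight of a path or when its digest differs from
  // the recorded one; returns false when the contents look unchanged.
  bool update(std::string const& path, std::string const& contents) {
    std::size_t digest = std::hash<std::string>{}(contents);
    auto it = digests_.find(path);
    if (it != digests_.end() && it->second == digest) {
      return false;
    }
    digests_[path] = digest;
    return true;
  }

  // Drops the record for a path, for example after a destroy event.
  void forget(std::string const& path) { digests_.erase(path); }

private:
  std::unordered_map<std::string, std::size_t> digests_;
};
```

On a modify event, a user might call `tracker.update(path, read_file(path))` and only act when it returns true.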
Update on this. The issues around rename events should be mostly resolved. I plan to release soon. There are two issues I want to look into a bit more and get your feedback on.
I went ahead and started work on this along with the groundwork for a user-selectable adapter. The implementation reorganizes the directory tree a bit (to elide some namespaces), changes the watcher's return type to a result-style type, and places all the watch adapter symbols in the final binary, but stubbed out to return an "unsupported" result. I think that only being able to observe the currently-running would leave some people wanting to configure it. This adds about 17k to the final binary and adds a tiny bit of overhead scrolling through function pointers to a supported watcher, if one isn't given. Both are probably negligible, but it does feel a bit like feature creep. Not sure about this.
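The "stubbed adapters plus a scan over function pointers" arrangement described above might look roughly like this. Every name here (`watch_result`, `adapter_fn`, `unsupported_stub`, `always_ok`, `watch_first_supported`) is hypothetical; this is a sketch of the general technique, not the code on the branch.

```cpp
#include <cstddef>

// Hypothetical result-style return type for this sketch.
enum class watch_result { ok, unsupported };

// Every adapter shares one signature so they can sit in one table.
using adapter_fn = watch_result (*)(char const* path);

// Stand-in for an adapter that is compiled in but not usable on this platform.
inline watch_result unsupported_stub(char const*) {
  return watch_result::unsupported;
}

// Stand-in for an adapter that works; a real one would start watching `path`.
inline watch_result always_ok(char const*) { return watch_result::ok; }

// Tries each adapter in order and returns the first result that is not
// "unsupported". If none is supported, reports "unsupported".
inline watch_result watch_first_supported(adapter_fn const* adapters,
                                          std::size_t count,
                                          char const* path) {
  for (std::size_t i = 0; i < count; ++i) {
    watch_result r = adapters[i](path);
    if (r != watch_result::unsupported) {
      return r;
    }
  }
  return watch_result::unsupported;
}
```

The scan is linear in the number of adapters, which is consistent with the "tiny bit of overhead" mentioned above when no adapter is chosen explicitly.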
This one would require keeping a map of On linux and, perhaps, windows, this is a feasible thing to do. We already have data structures for keeping bits of information on paths. On some linux adapters, for example, we keep file descriptor to path name maps. I'm worried about the darwin implementation of this. Not because of its feasibility, but safety. I'm convinced there's a bug somewhere in darwin's dispatch implementation, or just fsevents not behaving properly under extreme load, that leads to calling into a queue after it has been released. That's bad, and leads to a segfault. It's an extraordinarily rare issue only observable on some of the very high-throughput performance tests we run, but it exists. The current workaround is a 1-millisecond sleep after we've asked fsevents to stop, which is ugly. I am in the process of removing the context from the darwin adapter. If it's stateless, that issue from dispatch doesn't exist. Would be hard to get those owner events without a context. |
Hi @e-dant! Thanks for the info and sorry that I did not reply for a long while, I did not have my work setup to test your changes during the holidays. I am glad to confirm that the latest commit on the next branch, 3a45a7f does indeed fix the renaming issue. Now I get one main 'rename' event for the old element name and one associated 'rename' event with the new name in it. However, I noticed that you still use
Regarding the user-selectable adapter: that does sound like feature creep. I would first implement the reporting functionality and only then gauge if users want/need to choose which type of adapter to use at runtime. From my point of view, it should be a compile-time decision rather than a runtime one. I don't see the need to change the adapter at runtime, only logging the type for debug purposes. If it's not hard to do and you want to implement it, my suggestion would be to add a cmake option that enables this at compile time, thus letting the user decide if this feature is desired and warrants the added binary size. Also, could you link to the code that is supposed to do this?
Regarding the Another option would be to have simple I am not sure how often a use-case for
Finally, do you have a timetable when the next release will go live?
I forgot one thing, love the new One little thing though, I would avoid overcomplicating the documented example. Instead of declaring the |
Glad to hear it! And no, I don't recall any issue making it a shared pointer. I think we can do so.
It's almost all done during compile-time. There is only one adapter used on Windows and Darwin. These folders contain inline namespaces, so the adapter::watch function, when invoked, selects the function defined once in those files. There is some extra work we do for linux which selects fanotify if we have privileges for it and it's available, inotify otherwise.
Two things are holding up the release:
The safety issues on Darwin are somewhat outside of our control. The tradeoff currently staged on the The documentation just takes time, and it's something I have yet to get around to. The release was otherwise ready a while ago. I don't have a timeline for the above two items. |
I authored and a corresponding PR was merged into notify-rs in re. the Darwin bug. I often hope someone at Apple will look into it, but I can't, their code being closed and whatnot. |
Hi @e-dant, I am a bit unsure as to what event::effect_type values are set for which filesystem events. This might be due to my lack of experience with filesystem events, or due to my Debian Bookworm setup (I am using Debian Bookworm with KDE Plasma 5.27.5). Could you clarify a few points, please?
1. cp new_file watched_dir/ command, 2 events are triggered: event::effect_type::create first and then event::effect_type::modify. Is this intended or should only 1 event be set?
2. mv old_filename new_filename command, only the event::effect_type::rename with the old path is set. How should I get the new filename? Shouldn't an event::effect_type::create be set?
3. mv new_file watched_dir/ command to move a file into the watched directory does not trigger any events at all. Shouldn't this set an event::effect_type::create event?
4. mv filename .. in the watched directory triggers an event::effect_type::rename and not event::effect_type::destroy. Is this intended?
5. event::effect_type::modify is set twice after file modifications are saved. Is this intended?
6. .filename.txt.swp or .filename.txt.kate-swp for kate) is created. This file also triggers watch events. It's not hard to filter them out by checking the event::path_name filename content, but shouldn't these files not trigger any watch events?
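For the swap-file case, a user-side filter on the path name could be as small as the sketch below. The helper names are hypothetical, and the two suffixes are just the ones mentioned in this issue (`.swp` and `.kate-swp`); other editors use other patterns.

```cpp
#include <string>

// True when `s` ends with `suffix`.
inline bool ends_with(std::string const& s, std::string const& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical user-side check: true for hidden files that look like the
// editor swap files mentioned above.
inline bool is_editor_swap_file(std::string const& path_name) {
  std::string::size_type slash = path_name.find_last_of('/');
  std::string name = (slash == std::string::npos)
                         ? path_name
                         : path_name.substr(slash + 1);
  // Swap files in this issue are hidden, so require a leading dot.
  if (name.size() < 2 || name[0] != '.') {
    return false;
  }
  return ends_with(name, ".swp") || ends_with(name, ".kate-swp");
}
```

A callback could return early when this check is true for the event's path name and handle everything else as usual.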