Immediate Rename of uploaded file sometimes fails on Windows #54
Comments
I'll see if I can find some time to look at the trace later. Did you check that, in all cases, curl refrains from sending the RNFR and RNTO before it has received the DONE? On Linux, you can even delete a file while some other process is using it. On Windows, the default policy is different. So, one way around this could be to make a WriteableFile class in file_man.cpp/h. On Unix, it should just wrap a std::ofstream. On Windows, it should use CreateFile from the Win32 API to change the default sharing policy so that two processes can access the same file.
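For illustration, here is a minimal sketch of what such a WriteableFile could look like (the class name comes from the comment above, but the write/close interface and the use of CreateFileA are my assumptions, not the actual fineFTP code; a real implementation would likely need CreateFileW plus a UTF-8 to UTF-16 conversion to keep the UTF-8 support discussed later in this thread):

```cpp
// Hypothetical sketch of the proposed WriteableFile wrapper, NOT the actual
// fineFTP implementation. On Windows it opens the file with FILE_SHARE_DELETE
// so that a concurrent RNFR/RNTO can rename it; elsewhere it falls back to a
// plain std::ofstream.
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
  #include <windows.h>
#else
  #include <fstream>
#endif

class WriteableFile
{
public:
  explicit WriteableFile(const std::string& path)
  {
#ifdef _WIN32
    handle_ = ::CreateFileA(path.c_str(),
                            GENERIC_WRITE,
                            FILE_SHARE_DELETE,   // allow rename/delete while open
                            nullptr,
                            CREATE_ALWAYS,
                            FILE_ATTRIBUTE_NORMAL,
                            nullptr);
#else
    stream_.open(path, std::ios::binary | std::ios::trunc);
#endif
  }

  ~WriteableFile() { close(); }

  bool is_open() const
  {
#ifdef _WIN32
    return handle_ != INVALID_HANDLE_VALUE;
#else
    return stream_.is_open();
#endif
  }

  void write(const char* data, std::size_t size)
  {
#ifdef _WIN32
    DWORD written = 0;
    ::WriteFile(handle_, data, static_cast<DWORD>(size), &written, nullptr);
#else
    stream_.write(data, static_cast<std::streamsize>(size));
#endif
  }

  void close()
  {
#ifdef _WIN32
    if (handle_ != INVALID_HANDLE_VALUE)
    {
      ::CloseHandle(handle_);
      handle_ = INVALID_HANDLE_VALUE;
    }
#else
    if (stream_.is_open())
      stream_.close();
#endif
  }

private:
#ifdef _WIN32
  HANDLE handle_ = INVALID_HANDLE_VALUE;
#else
  std::ofstream stream_;
#endif
};
```

The key point is the FILE_SHARE_DELETE share mode: with it, another process (or the RNFR/RNTO handler) may rename or delete the file even while this handle is still open.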
Yes, I did, at least for one occurrence of the problem. Here are the packet numbers from the pcap file I uploaded:
But why is it in use? The last time I tried to fix this issue, I came up with the following: fineftp-server/fineftp-server/src/ftp_session.cpp, lines 1348 to 1354 at c12589a.
I did that to make sure that the file is fully written to disk AND closed before sending the response.
Hm, sounds unsafe to rename a file that somebody has a handle to with write-access... Are you sure that this would work?
I created a test for my WritableFile struct: I let 100 threads create 1000 files each in a loop and immediately rename them. It worked like a charm, no issues whatsoever. Maybe some other code of fineFTP is blocking the file. There are some checks that make sure that the source file exists and that no existing file is overwritten. That code was not part of the test.
Wrt deleting or renaming a file used by another process: From what I understand, on most Linux filesystems, a filename is just a friendly name to an inode. The inode is what has the file contents. If you create a hard link to a file, you essentially just add another friendly name to the inode. If you then delete one of them, the inode is still accessible under the other name, and if you rename one of the friendly names, the underlying inode is also still there. And if you delete (that is, unlink) the last friendly name, the inode only goes away when the last process is done with the inode. I'm guessing Windows NTFS works the same way, as it supports hard links, too.
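The behavior described above can be demonstrated with a small sketch (hypothetical file names; assumes a POSIX system, where renaming a file that is open for writing succeeds):

```cpp
// Demonstrates the inode semantics described above on POSIX systems: a file
// that is open for writing can be renamed, and the open stream keeps writing
// to the same inode under its new name.
#include <cassert>
#include <cstdio>    // std::rename
#include <fstream>
#include <iterator>
#include <string>

// Writes to `original`, renames it to `renamed` while the stream is still
// open, keeps writing, then returns the contents found under the new name.
std::string rename_while_open(const std::string& original,
                              const std::string& renamed)
{
  std::ofstream out(original);
  out << "first\n";
  out.flush();

  // Rename while the ofstream still holds the file open. On Linux/macOS this
  // succeeds because the name and the inode are independent; on Windows it
  // would typically fail with a sharing violation unless the file had been
  // opened with FILE_SHARE_DELETE.
  if (std::rename(original.c_str(), renamed.c_str()) != 0)
    return {};

  out << "second\n";  // still writes to the same inode, now under the new name
  out.close();

  std::ifstream in(renamed);
  return std::string((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
}
```

On Windows, the std::rename call would be the step that fails with a sharing violation unless the writer had opened the file with FILE_SHARE_DELETE.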
Note that there is at least one place in ftp_session.cpp where an std::ifstream lives longer than it has to: in handleFtpCommandSize(), for example, the ifstream is not closed until after the response has been sent. This could in principle lead to the problems exhibited here. Better to close the file before sending the response.
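As an illustration of the suggested ordering, a hypothetical SIZE-style handler (not the actual fineFTP code; the reply strings are made up for this sketch) would close the stream before the reply is handed to the socket:

```cpp
// Hypothetical sketch of the ordering suggested above for a SIZE-style
// handler: query the size, close the stream, and only then build and send the
// response, so no file handle is alive while the reply is in flight.
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

std::string size_response(const std::string& path)
{
  std::ifstream file(path, std::ios::ate | std::ios::binary);
  if (!file.good())
    return "550 Could not get file size.\r\n";

  const std::int64_t size = static_cast<std::int64_t>(file.tellg());
  file.close();  // release the handle BEFORE the response leaves the server

  // Only now would the response string be handed to the command socket.
  return "213 " + std::to_string(size) + "\r\n";
}
```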
I've just pushed a fix that seems to resolve the race condition. However, one of the newly added tests concerning a path vulnerability is now failing. |
Hmm. I was a bit too fast. It seems that the rename test still fails on the GitHub build system; I couldn't get it to fail when I ran it locally.
The last change I did on my pull request seems to be needed no matter what: if you run the server with more than one thread, the order in which you post actions begins to matter. So, I could foresee a case in which endDataReceiving was potentially called before the file writing was done in another thread. And I don't even think that std::ofstream is thread-safe. So, to prevent race conditions, I think it is important that we only have one outstanding transaction per file, that is, we never attempt to post more than one action at a time that could touch the std::ofstream. The same kind of restriction should probably be enforced elsewhere: never post more than one action that could access the same variables, as that could potentially lead to race conditions when the server is running with more than one thread.
I'll continue debugging when I find some more time for it. |
I created a test that pre-populates the FTP dir and then only uses curl to rename the files, without actually uploading them. curl will, however, always send a LIST command to the server, which caused issues. I tracked it down to fineFTP not properly closing the data socket.
That test is supposed to remind me to fix #52 as well. Functionality-wise we don't need to be concerned about this.
That sounds like a reason for the issues we are facing.
This, too :D
Well, yeah, it's totally fine for me if things start to fail when you access the same file in parallel. But currently it fails by uploading and then renaming the same file, which should behave like a serial transaction, one after another.
I'm reading up on the use of strands and post() vs. defer(). They may be protecting us from the race conditions I suspected. So, I'm no longer sure that my latest changes are needed.
Using strands is necessary, I know. I am already using strands to serialize everything that must be serialized, but apparently I am doing it wrong 😑 |
I've now implemented the proposed WriteableFile that allows renaming a file. See my latest pull request (note that it may be premature to integrate it). But, as a side effect, it now seems that connection issues arise. Perhaps because the threads are sometimes being blocked for too long? Probably not... Also, sometimes the renamed file ends up having 0 size. Not good. In any case, I'm able to run the tests a couple of times without problems (except for the path vulnerability test), but every now and then the upload_and_rename tests fail. I have tried to bump the version of asio to 1.28.1, but that didn't improve things. I was then going to try out the async file writing in asio 1.28.1, but then realized that this would break UTF-8 support on Windows, so I decided not to go that way after all. The benefit of async file writing would have been that the worker threads would not be blocked while writing to files. Unfortunately, I don't think I will be able to invest much more time in this during this month.
I had to apply a little fix to your Win32 WritableFile class (it crashed on close()), but aside from that it seems to work extremely well. I am not able to reproduce any rename issues anymore.
I am not able to reproduce that 🤔
I am not able to reproduce that either 🤨
That's fine, you already brought us way closer to the solution. I will try to fix some remaining issues. Maybe you will find some time to check whether you can still reproduce the issues you wrote about.
I currently don't have a native Windows build environment easily accessible, so all my observations are from the tests running on GitHub.
Oh, OK? So you are developing with that nasty Win32 API without even having a Windows machine available? That's impressive! Nevertheless, here is my fix for your branch (yeah, the branch names are getting out of hand). I was wondering why you modified so many of the strands. Was that necessary, or was it a byproduct of you trying to fix the rename issue? If it was only a byproduct, I would try to revert a couple of those changes.
Sorry for not documenting my changes properly. That's not my usual style; I was running out of time. I'll submit a patch with improved documentation.

The reason for the strand modifications was that I wanted to eliminate all potential race conditions (and resolve some of your TODOs) without the use of mutexes. To close the data_socket from readFtpCommand, for example, it is therefore important that the close operation is done from within the same strand as the other operations on that socket. But this socket was being used in two different strands, so I had to merge them into one (the data_socket_strand). You can also see that I'm deliberately accessing the data_socket_weakptr_ only from within that strand, for the same reason: to avoid race conditions. Similarly for the command_strand_; it is now used for both reading and writing on the command socket to be absolutely sure that we don't have any race conditions. The intention is to make it easier to convince oneself that there are none.

Another pull request I'm considering is to upgrade to asio 1.28.1 and make use of the async writing to files on Linux/macOS within the implementation of WriteableFile. This would reduce pressure on the asio service threads, that is, they would have more time to service additional clients. This will not work on Windows as long as we insist on UTF-8 support on Windows, so there it may be more difficult to make that optimization.
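Since this whole argument hinges on the strand guarantee (handlers posted to the same strand never run concurrently), here is a minimal, asio-free illustration of that invariant. This Strand class is a toy written for this example, not asio's implementation, and unlike asio it may run handlers inline in the posting thread:

```cpp
// Toy illustration of the strand invariant discussed above (this is NOT asio
// code): handlers posted to the same Strand never run concurrently, so state
// that is only ever touched from one strand needs no mutex of its own.
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class Strand
{
public:
  void post(std::function<void()> handler)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    queue_.push_back(std::move(handler));
    if (running_) return;      // another thread is already draining the queue
    running_ = true;
    while (!queue_.empty())
    {
      auto job = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();           // run the handler outside the lock
      job();
      lock.lock();
    }
    running_ = false;
  }

private:
  std::mutex mutex_;
  std::deque<std::function<void()>> queue_;
  bool running_ = false;
};
```

Because only one thread drains the queue at a time, an object that is touched exclusively through one strand (like the merged data_socket_strand described above) needs no additional mutex during normal execution.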
OK, I will not change it back, then. However, I will add mutexes, too, for a very good reason: while you can serialize everything perfectly during execution with strands, the destructor is not serialized with them. There is no way of knowing from which thread the destructor is executed and whether the io_context is still alive. Thus, there must be a mutex that prevents sockets from being closed while another thread is currently accessing them.
Would be awesome to have. But I am currently glad to have a working state, so I guess we should save that optimization for a later PR 😁
I'll have a look at your latest asio project later, but if you look at the comments I made in the destructor, the argument is that the destructor is only called when the last shared_ptr is dying. And for that reason, there can be no other thread accessing the Ftp_session and its sockets. You may not know which thread is calling the destructor, but you know that no other thread is accessing the object.
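That shared_ptr argument can be demonstrated directly (Session here is a stand-in for the real session class, not the fineFTP type): whichever thread drops the last reference runs the destructor, exactly once, and by that point no other owner can still exist.

```cpp
// Demonstrates the ownership argument above: the destructor runs exactly once,
// in whichever thread happens to release the last shared_ptr, and only after
// every other owner (here: worker threads holding copies) is gone.
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

struct Session
{
  explicit Session(std::atomic<int>& destroyed) : destroyed_(destroyed) {}
  ~Session() { destroyed_.fetch_add(1); }  // counts destructor invocations
  std::atomic<int>& destroyed_;
};
```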
I've now had a very brief look at the eCAL service. From what I can tell, the challenges in that project are a little bit different, because there is at least one extra thread outside the threads driving the asio::io_context, for example when calling stop() on the client. So, for that reason, there is a need to synchronize between the non-asio::io_context threads and the asio::io_context threads. That's not a challenge we have in the ftp_session, I think.
You are actually right. I also took a look at fineFTP. I think it's absolutely fine this way, as there are no callbacks or other references to unknown user code.
Solved by #58
When uploading files and immediately renaming them with RNFR + RNTO, the rename sometimes fails on Windows. It seems to work fine on Linux and macOS. When checking with Wireshark, one can see that fineFTP Server properly returns DONE to indicate that the file transfer is successful (from the server side), but then the rename fails with the error: "The process cannot access the file, as it is still being used by another process."
The following pcap file shows the issue of one renamed file.
fineftp_rename_issue.zip
I assume that Windows has some fancy algorithms in place that aim to flush the files in the background.
The following branch has tests in place that reproduce the issue easily by using multiple CURL clients: https://github.com/eclipse-ecal/fineftp-server/tree/test/memory_mapped_files
@bjuulp: If you have any more input feel free to post it here 😊
Btw: If I recall correctly, other FTP servers like FileZilla have the same issue, but I would still prefer to have it fixed for fineFTP.