Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LinkFileTransfer #200

Merged
merged 4 commits into from
Apr 17, 2024
Merged

Add LinkFileTransfer #200

merged 4 commits into from
Apr 17, 2024

Conversation

jl-wynen
Copy link
Collaborator

Fixes #38

First version of a file transfer that directly accesses files if they are available.

@jl-wynen jl-wynen requested a review from nvaytet April 12, 2024 14:21
@jl-wynen jl-wynen force-pushed the link-file-transfer branch from 076557e to b7980d0 Compare April 12, 2024 14:27

.. code-block::

/downloads/file1.dat -> /dataset/source/file1.dat
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be making symlinks or should we be making copies?
I guess when you are using a sftp file transfer, you make a copy. And then you are allowed to manipulate/modify the file you get as you wish.

If we make symlinks to read-only files, we won't then be able to do the same.
This would imply that using a different file-transfer would give you different results, even if the file you are targetting is the same. Does it seem inconsistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It is a bad idea to modify the files regardless because that can ultimately lead to derived data that was produced from different files than it claims. E.g., reduced data that is uploaded to SciCat and linked to an input dataset even though it was produced from a different file. (The same happens with any storage solution that links back to the input.)

So given this, I think that enabling this behaviour is not worth the cost of copying files. Do you agree?

tests/transfer/link_test.py Outdated Show resolved Hide resolved
con.download_files(
remote=[RemotePath(str(remote_dir / "text.txt"))],
local=[local_dir / "text.txt"],
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need a test for when the local directory does not exist?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. The file transfers never create the directory, but Client does. Is this something we need to test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not needed.

Co-authored-by: Neil Vaytet <39047984+nvaytet@users.noreply.github.com>
@jl-wynen
Copy link
Collaborator Author

Actually needs a test to check what happens when a file already exists where it tries to create a link.

@jl-wynen jl-wynen merged commit d144b2d into main Apr 17, 2024
12 checks passed
@jl-wynen jl-wynen deleted the link-file-transfer branch April 17, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileTransfer that links files
2 participants