Currently, `CachingFileSystem` has a single boolean option, `same_names`, that switches the layout of cached files from `/hash` to `/url-filename`, and thus leaves no room for "improvement":
Under heavy use of the cache, a flat tree of files (whether `/hash` or `/url-filename` based) could lead to a very large directory, so the file system could become inefficient at listing that directory, etc.
A common workaround (look under `.git/objects`; git-annex, girder, etc. use the same approach) is to establish leading directories, e.g. for a `/hash` the path to the file could be `/hash[:2]/hash[2:4]/hash[4:]`, thus reducing the impact on the file system.
For a URL-based path, it could simply be a path constructed from the URI components, e.g. the URL `http://domain/p1/p2/filename` could become the path `http/domain/p1/p2/filename`, which would disambiguate between file systems, etc., and also avoid conflicts between different files that share a common filename (as I guess happens now with `same_names=True`).
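
To make the two proposed layouts concrete, here is a minimal sketch of how such relative cache paths could be derived. The function names `hashtree_path` and `url_fullpath` are hypothetical, and hashing the URL with SHA-256 is an assumption made for illustration; this is not how `CachingFileSystem` is implemented.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def hashtree_path(url: str) -> str:
    """Spread a hashed name over two levels of leading directories.

    Hypothetical helper: SHA-256 of the URL is an assumption; any
    stable hex digest would work the same way.
    """
    h = hashlib.sha256(url.encode()).hexdigest()
    # hash[:2]/hash[2:4]/hash[4:], as in .git/objects-style layouts
    return posixpath.join(h[:2], h[2:4], h[4:])


def url_fullpath(url: str) -> str:
    """Build a relative cache path from the URL's scheme, host and path.

    Hypothetical helper: empty, "." and ".." segments are dropped so the
    result cannot escape the cache directory.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return posixpath.join(parts.scheme, parts.netloc, *segments)
```

With this sketch, `url_fullpath("http://domain/p1/p2/filename")` gives `http/domain/p1/p2/filename`, and two URLs that end in the same filename but differ in host or directory no longer collide. Query strings are ignored here, which a real implementation would probably need to handle.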
With the above in mind, I think it would be nice if, instead of `same_names`, there were a `layout={hash,hashtree,url_filename,url_fullpath}` option or the like, allowing users to switch to the most appropriate layout for their use case.
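
One way such an option could coexist with the existing flag is sketched below. The `resolve_layout` helper and the `LAYOUTS` tuple are hypothetical names, and mapping `same_names=True` to `url_filename` is my assumption about how backward compatibility might be kept; none of this exists in the library.

```python
from typing import Optional

# Layout names taken from the proposal above.
LAYOUTS = ("hash", "hashtree", "url_filename", "url_fullpath")


def resolve_layout(layout: Optional[str] = None,
                   same_names: Optional[bool] = None) -> str:
    """Pick a layout name, honouring the legacy same_names flag.

    Hypothetical helper, not part of any existing API.
    """
    if layout is not None and same_names is not None:
        raise ValueError("pass either layout or same_names, not both")
    if layout is None:
        # Legacy behaviour: True means /url-filename, otherwise /hash.
        return "url_filename" if same_names else "hash"
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout {layout!r}; expected one of {LAYOUTS}")
    return layout
```

Existing callers that pass `same_names=True` would keep getting the `/url-filename` layout, while new callers could opt into `hashtree` or `url_fullpath` explicitly.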