import/get: meta-issue #9296

Closed
6 of 11 tasks
dberenbaum opened this issue Mar 31, 2023 · 7 comments
Labels
A: data-sync Related to dvc get/fetch/import/pull/push A: get Related to dvc get p2-medium Medium priority, should be done, but less important performance improvement over resource / time consuming tasks

Comments

@dberenbaum
Collaborator

dberenbaum commented Mar 31, 2023

This issue collects improvements needed for dvc import/get:

p2/research issues for performance improvements:

@dberenbaum dberenbaum added performance improvement over resource / time consuming tasks A: data-sync Related to dvc get/fetch/import/pull/push A: get Related to dvc get p2-medium Medium priority, should be done, but less important labels Mar 31, 2023
@efiop
Contributor

efiop commented Apr 2, 2023

For the record: Edited first (fixed) and third (need scmrepo mirrors) points.

@dberenbaum
Collaborator Author

@aguschin Any details you can add about monorepo support?

@aguschin
Contributor

No, @dberenbaum. I think I'll add something when I start implementing import/get.

@pmrowla
Contributor

pmrowla commented Jul 19, 2023

With regard to clone caching, do we want it to be configurable? I'm not sure having the caching as default behavior is ideal.

Right now if you import/get with a specific branch or tag name, we make a shallow clone with only that specific branch or tag. If a user has a giant git repo (which seems likely in the monorepo case), it is probably going to be faster for DVC to always make a new temporary shallow clone that only contains the specific thing they want to import than it would be for us to mirror their entire monorepo once and then re-use that cached mirror for future imports.

It seems to me that it would make more sense to have the caching be optional (and non-default).
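
To make the trade-off concrete, here is a minimal sketch of the two clone strategies expressed as plain git command lines. It is illustrative only and is not taken from DVC or scmrepo; the helper names `shallow_clone_cmd` and `mirror_clone_cmd`, the example URL, and the destination paths are all hypothetical.

```python
from typing import List


def shallow_clone_cmd(url: str, rev: str, dest: str) -> List[str]:
    """Hypothetical helper: fetch only the tip commit of one branch or tag."""
    return [
        "git", "clone",
        "--depth", "1",       # history truncated to a single commit
        "--single-branch",    # do not fetch any other refs
        "--branch", rev,      # accepts a branch name or a tag name
        url, dest,
    ]


def mirror_clone_cmd(url: str, dest: str) -> List[str]:
    """Hypothetical helper: fetch every ref with full history into a bare repo."""
    return ["git", "clone", "--mirror", url, dest]


if __name__ == "__main__":
    url = "https://example.com/org/monorepo.git"  # placeholder URL
    print(" ".join(shallow_clone_cmd(url, "v1.0", "/tmp/shallow")))
    print(" ".join(mirror_clone_cmd(url, "/tmp/mirror")))
```

For a very large repository, the first command transfers a single commit's worth of objects on every import, while the second pays the full-history cost once and would only pay off if the mirror is reused often.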

@efiop
Contributor

efiop commented Jul 19, 2023

@pmrowla Though global caching might not imply mirroring. We have caching within the session already, maybe there is a way to just make that global as is so we cache those shallow clones in site_cache_dir. And maybe there is a way to be smart about unshallowing them when we need to.
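
One way this idea could look, as a rough sketch under stated assumptions rather than DVC's actual implementation: shallow clones are stored under a cache root (standing in for `site_cache_dir`), keyed by URL and revision, and are unshallowed only when full history is requested. The names `cached_clone_path`, `is_shallow`, and `plan_commands` are hypothetical, and the sketch only plans git commands instead of running them.

```python
import hashlib
from pathlib import Path
from typing import List


def cached_clone_path(cache_root: Path, url: str, rev: str) -> Path:
    """Hypothetical layout: one directory per (url, rev) pair."""
    key = hashlib.sha256(f"{url}\0{rev}".encode("utf-8")).hexdigest()
    return cache_root / "clones" / key


def is_shallow(repo: Path) -> bool:
    """A non-bare shallow clone has a .git/shallow file listing its grafts."""
    return (repo / ".git" / "shallow").exists()


def plan_commands(
    cache_root: Path, url: str, rev: str, need_full_history: bool = False
) -> List[List[str]]:
    """Return the git commands needed to satisfy a request from the cache."""
    dest = cached_clone_path(cache_root, url, rev)
    cmds: List[List[str]] = []
    fresh = not dest.exists()
    if fresh:
        # Cache miss: make the same cheap shallow clone as today.
        cmds.append(
            ["git", "clone", "--depth", "1", "--branch", rev, url, str(dest)]
        )
    if need_full_history and (fresh or is_shallow(dest)):
        # Deepen only on demand; skipped if the clone is already complete.
        cmds.append(["git", "-C", str(dest), "fetch", "--unshallow"])
    return cmds
```

A cache hit on a clone that already satisfies the request yields an empty plan, which is where any performance gain would come from. Locking, eviction, and refreshing branches that have moved are left out of this sketch.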

@dberenbaum
Collaborator Author

Moved caching clones to the bottom of the checklist. I think what @efiop suggested makes sense, but if it's overly complex and/or isn't likely to be a major performance improvement, we can skip this item.

@efiop
Contributor

efiop commented Aug 6, 2023

For the record: all main tasks are done now.

4 participants