Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ls/import/get: introduce --config #9747

Merged
merged 1 commit into from
Jul 20, 2023
Merged

ls/import/get: introduce --config #9747

merged 1 commit into from
Jul 20, 2023

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Jul 19, 2023

Adds support for --config to all three commands. For ls and get, it is a purely runtime option that will merge the config you provide with target repo's configs (similar to how .dvc/config.local works). E.g.

$ cat myconfig
[remote "myremote"]
    access_key_id = 12345
    secret_access_key = 98765

$ dvc get https://github.com/efiop/mydataregistry recent-grads.csv --config myconfig

In case of dvc import, the semantics are the same, but the value is also recorded in the resulting dvcfile. E.g.

$ dvc import https://github.com/efiop/mydataregistry recent-grads.csv --config myconfig
...
$ cat recent-grads.csv.dvc
md5: 85c86eeb3afa6d1f7d11beedc644d28e
frozen: true
deps:
- path: recent-grads.csv
  repo:
    url: https://github.com/efiop/mydataregistry
    config: myconfig
    rev_lock: af22e06bbfe34d62a140437abf32f48b535944a7
outs:
- md5: f447e23a170a9f25a7799f069a1ba934
  size: 26872
  hash: md5
  path: recent-grads.csv

This is the most general and powerful mechanism that we can give, that works
for everything from reconfiguring your default remote to completely changing
your remote layout and beyond (will be handy in the future).

Fixes #2466

@efiop efiop force-pushed the fix-2466 branch 7 times, most recently from f28cb7d to 69c06fd Compare July 20, 2023 14:31
Adds support for `--config` to all three commands. For `ls` and `get`, it is
a purely runtime option that will merge the config you provide with target
repo's configs (similar to how .dvc/config.local works). E.g.

```
$ cat myconfig
[remote "myremote"]
    access_key_id = 12345
    secret_access_key = 98765

$ dvc get https://github.com/efiop/mydataregistry recent-grads.csv --config myconfig
```

In case of `dvc import`, the semantics are the same, but the value is also
recorded in the resulting dvcfile. E.g.

```
$ dvc import https://github.com/efiop/mydataregistry recent-grads.csv --config myconfig
...
$ cat recent-grads.csv.dvc
md5: 85c86eeb3afa6d1f7d11beedc644d28e
frozen: true
deps:
- path: recent-grads.csv
  repo:
    url: https://github.com/efiop/mydataregistry
    config: myconfig
    rev_lock: af22e06bbfe34d62a140437abf32f48b535944a7
outs:
- md5: f447e23a170a9f25a7799f069a1ba934
  size: 26872
  hash: md5
  path: recent-grads.csv
```

This is the most general and powerful mechanism that we can give, that works
for everything from reconfiguring your default remote to completely changing
your remote layout and beyond (will be handy in the future).

Fixes iterative#2466
@codecov
Copy link

codecov bot commented Jul 20, 2023

Codecov Report

Patch coverage: 83.05% and project coverage change: -0.02 ⚠️

Comparison is base (41194b7) 90.47% compared to head (61607e5) 90.45%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9747      +/-   ##
==========================================
- Coverage   90.47%   90.45%   -0.02%     
==========================================
  Files         480      480              
  Lines       36534    36572      +38     
  Branches     5252     5259       +7     
==========================================
+ Hits        33053    33081      +28     
- Misses       2887     2892       +5     
- Partials      594      599       +5     
Impacted Files Coverage Δ
tests/unit/command/test_get.py 100.00% <ø> (ø)
tests/unit/command/test_imp.py 100.00% <ø> (ø)
dvc/repo/imp.py 71.42% <33.33%> (-28.58%) ⬇️
dvc/repo/get.py 92.85% <50.00%> (-7.15%) ⬇️
dvc/repo/ls.py 94.87% <60.00%> (-5.13%) ⬇️
dvc/dependency/repo.py 92.53% <77.77%> (-5.51%) ⬇️
dvc/commands/get.py 85.71% <100.00%> (+0.34%) ⬆️
dvc/commands/imp.py 81.25% <100.00%> (+0.60%) ⬆️
dvc/commands/ls/__init__.py 90.24% <100.00%> (+0.24%) ⬆️
dvc/config.py 98.56% <100.00%> (+0.04%) ⬆️
... and 1 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@efiop efiop marked this pull request as ready for review July 20, 2023 15:03
@efiop efiop merged commit 53ec0fa into iterative:main Jul 20, 2023
@dberenbaum
Copy link
Collaborator

After this change, I'm getting:

$ dvc import
ERROR: the following arguments are required: url, path
ERROR: unexpected error - 'tuple' object has no attribute 'strip'

$ dvc get
ERROR: the following arguments are required: url, path
ERROR: unexpected error - 'tuple' object has no attribute 'strip'

Also, similar to #9656, can we introduce --remote and --remote-config? I know --config is more flexible, but being able to quickly specify an alternate remote that's already configured in the repo or add credentials for the default remote seems easier (plus it keeps the cli and api aligned).

@shcheklein
Copy link
Member

Also, similar to #9656, can we introduce --remote and --remote-config? I know --config is more flexible, but being able to quickly specify an alternate remote that's already configured in the repo or add credentials for the default remote seems easier (plus it keeps the cli and api aligned).

I agree with this. I didn't have nearly enough time to spend on thinking about the task and how it can look like and the workflow. But it seems too low level for me, very low level. (and there might be a good reason).

@efiop
Copy link
Contributor Author

efiop commented Jul 20, 2023

Taking a look at that error, thanks! #9750

Regarding --remote/remote-config - yes, same as with api's config vs remote discussion, trying to get the most general mechanism out first because it works for scenarios of higher complexity (multiple remotes, remote-per-output, etc). Will add remote/remote-config to api and and cli in a separate patch (there are questions on how it should behave with multiple remotes, how to store that in dvc import and so on, hence my hesitation).

efiop added a commit to efiop/dvc that referenced this pull request Jul 20, 2023
efiop added a commit that referenced this pull request Jul 20, 2023
@dberenbaum
Copy link
Collaborator

I guess you don't want to use remote: since that refers to a remote in the workspace repo, not the import repo.

Can config: hardcode the actual values instead of a path to a config file like:

md5: 85c86eeb3afa6d1f7d11beedc644d28e
frozen: true
deps:
- path: recent-grads.csv
  repo:
    url: https://github.com/efiop/mydataregistry
    config:
      core:
        remote: myremote
      myremote:
        access_key_id: 12345
        secret_access_key: 98765
    rev_lock: af22e06bbfe34d62a140437abf32f48b535944a7
outs:
- md5: f447e23a170a9f25a7799f069a1ba934
  size: 26872
  hash: md5
  path: recent-grads.csv

@efiop
Copy link
Contributor Author

efiop commented Jul 20, 2023

@dberenbaum Totally that would work, but what I don't like about that is that you are forced to put your secret keys into your dvcfile, which is pretty bad. With --config you at least depend on a file that you can .gitignore.

@efiop
Copy link
Contributor Author

efiop commented Jul 20, 2023

I guess you don't want to use remote: since that refers to a remote in the workspace repo, not the import repo.

@dberenbaum Actually those are clearly separated. If this remote: will be under repo section - it will refer to the target repo. And if it will be in outs - it will refer to remote-per-output.

@dberenbaum
Copy link
Collaborator

Okay, I get it now. So for --remote and --remote-config, let's hardcode remote: but make --remote-config temporary (and I see how --config is useful then).

@efiop
Copy link
Contributor Author

efiop commented Jul 20, 2023

@dberenbaum I actually really like the way you did it in the example. Both remote and remote-config under config section, looks and feels nice (at least to me). Separating those will probably be uglier (esp since remote is not supported by deps schema anyway). I'll introduce a PR and then I guess we could chat further there about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get/import/list/etc command should accept a remote as argument
3 participants