Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download progress? #670

Closed
deitch opened this issue Feb 4, 2020 · 10 comments
Closed

download progress? #670

deitch opened this issue Feb 4, 2020 · 10 comments

Comments

@deitch
Copy link
Collaborator

deitch commented Feb 4, 2020

When doing a pull, is it possible to get download progress? It is fairly straightforward to calculate the entire size from an image by adding up the layers sizes. If it is possible to get download updates, it would be easy to calculate percentage done.

@jonjohnsonjr
Copy link
Collaborator

I've avoided this because I'm not sure how to implement it without pulling in a ton of dependencies. Doing cross-platform tty detection properly seemed nontrivial as well. I'm also not sure how I'd plumb it through the library and what should be responsible for managing it.

I'm open to a PR if it makes sense, but I wouldn't really know where to start.

If you wanted to implement this without needing to change go-containerregistry, you could probably implement a clever http.RoundTripper that looks for */manifests/* and */blobs/* in the path of http requests/responses to get a signal for what's going on by parsing the manifests and using a TeeReader when downloading blobs.

@deitch
Copy link
Collaborator Author

deitch commented Feb 4, 2020

Oh quite. I think that outputting the status should be a client usage problem. But if the download had a way to report it without dealing with ttys and such, it would be easier.

I was thinking something like:

downloaded := make(chan int64)
err := v1tarball.WriteToFileWithUpdates(path, ref, img, downloaded)

And the various write funcs would send periodic "I have downloaded this much cumulatively" to the channel over time. It is up to the consumer to use it or not, and decide if it wants to send the output to a tty or something else.

This at least would provide a reliable basis for it.

@deitch
Copy link
Collaborator Author

deitch commented Feb 10, 2020

Any thoughts on this? I had some time to mull it over, and I like the approach. It basically lets the library wash its hands of how it is used, just says, "give me a channel, and I will send updates there."

@jonjohnsonjr
Copy link
Collaborator

I'm not really sure where this should live.

We could pass something into remote.Image that could tell us progress when downloading layers from the registry, or we could pass something into tarbal.Write... that could tell us progress when writing to the tarball.

It's unclear which is more correct or efficient (especially given the matrix of source -> sink combinations).

If the progress bar is in the reader, we can more efficiently calculate the Total size of a given layer. E.g. with remote, we know we can access the image manifest to get the blob size. With a tarball, we know we can stat the files to determine the file size.

On the other hand, the writer knows if it's going to skip certain layers or not e.g. if the tarball already has some base image layers present.

It seems to me that accurately (and efficiently) calculating progress requires some context from both the source and the sink.

Source Blob Set Blob Sizes Read Progress
remote Manifest() Manifest() http.Response.Body
tarball manifest.json tar.Header tar.Reader
...
Sink Blobs Skipped Write Progress
remote HEAD .../blobs/... http.Request.Body
tarball tar.Header tar.Writer
...

Docker has a progress package that we might want to use? https://github.com/moby/moby/blob/master/pkg/progress/progress.go

If you want to convey a lot of information like layer granularity, skipped layers, mounted layers (like docker), we probably need to come up with something similar to what they do. If you just want a total percentage (or even just bytes downloading and/or written), we could also do that... but I think we need something that we can pass into every package so that each package can signal some events. It might make sense to standardize on some events, as well. Docker's xfer package seems kind of complex, too: https://github.com/moby/moby/tree/master/distribution/xfer

@deitch
Copy link
Collaborator Author

deitch commented Feb 11, 2020

Docker's xfer package seems kind of complex, too

It is, but in the end, it does something similar to what I was suggesting: sends the Download() function a chan to which it can send updates. It uses a chan<- Progress, which type Progress is a nice struct with lots of information about totals and current and everything. I don't know how necessary that is. I think we could aim for simple that has one API call: when I ask for something to be made available (as a tar file, directory, whatever), it usually knows at that point the total bytes to be placed, and it can know progress as it goes along. I don't think we need to get too fancy with bytes per layer and such.

So maybe the channel is not chan int64 but maybe instead chan Progress where

type Progres struct {
    total int64
    downloaded int64
}

And let others deal with it. You don't really know the total until you start to download it, since you might just have a ref to an image index, which only is resolved to a particular image (and therefore its manifest, and therefore its size) once you start to download.

It's unclear which is more correct or efficient (especially given the matrix of source -> sink combinations).

I am unsure of the part that is unsure. I would think that progress only matters when downloading, which means when calling a writer, is it not? When I retrieve a manifest, progress isn't really relevant. It is only when I am writing v1tarball.WriteToFile or legacytarball.Write or v1layout.Write that progress is relevant.

Yes, we need information from the source (i.e. image info) when writing in order to get progress info, but don't we pass that into the writer anyways? E.g. if I want to pull down an image:

// getting image info, manifest, etc.
desc, err = remote.Get(ref, options...)
img, err = desc.Image()
tag, ok := ref.(name.Tag)
switch pullWriteFormat {
case FORMATV1:
        err = v1tarball.WriteToFile(writePath, tag, img)
case FORMATLEGACY:
        w, err := os.Create(writePath)
        defer w.Close()
        err = legacytarball.Write(tag, img, w)
case FORMATLAYOUT:
        ii, err := desc.ImageIndex()
        _, err = layout.Write(writePath, ii)
}

all of the source info was passed in at the writing stage anyways in the case statements.

@jonjohnsonjr
Copy link
Collaborator

I don't think we need to get too fancy with bytes per layer and such.

It would be nice to be able to represent this information somehow, though...

I would think that progress only matters when downloading,

I'm not sure about this -- we'd want progress when uploading this too (for pushing). It might make some sense to get progress when just converting between formats, but local disk -> disk stuff is probably fast enough that it doesn't matter.

which means when calling a writer, is it not?

The point I was trying to make is that it might be easier to instrument the source of bytes to get progress than in the writer path, but after giving it some thought, I don't think it really matters.

Yes, we need information from the source (i.e. image info) when writing in order to get progress info, but don't we pass that into the writer anyways?

I guess it's reasonable to just track progress on the writer side of things... I'd expect this to look something like a WithProgress option that gets passed to Write functions. I don't have a ton of free time right now to dig into this, but if you can put together a PR I'm happy to review it.

@deitch
Copy link
Collaborator Author

deitch commented Feb 17, 2020

Take a look at #675 as a starting point, @jonjohnsonjr . I can expand it as needed to the other formats; I just wanted to get the conversation moved to something tangible.

@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@jonjohnsonjr
Copy link
Collaborator

@deitch this got closed because it was stale, but did we get to a happy point where it's ~fixed?

@deitch
Copy link
Collaborator Author

deitch commented Jan 6, 2021

I think so. This was only for tarball with #675 but that should be good enough for now. If we come across needing it elsewhere, we can worry about it then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants