-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Two copies of the image references parser included #16205
Comments
containers/image/v5/docker/reference has same content, but less overhead. Partial fix for: containers#16205 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
containers/image/v5/docker/reference has same content, but less overhead. Partial fix for: containers#16205 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@alexlarsson I guess eventually we should catch up with (unreleased) https://github.com/distribution/distribution/commit/1c89ce5fc1cec85e0726f598bccc12b873d04de3 . . So far I’ve been preferring to keep c/image/v5/docker/reference as similar to the other one as possible, but we can diverge with a good reason. (I’m not inclined to think that < 1.8ms is a good reason, but I’m open to changing my mind on that one. Is it really noticeable in the total?) |
2msec is not a huge problem in general, but the use of 1.3 meg of ram completely unnecessary seems not great. Also, I should note that the actual code is pretty slow. Loading the shortname alias file is very slow, and the majority of that time is spent in reference.Parse(). So, rewriting this to use a hand-written parser instead of some humongous regexps might be a good idea in general. |
For a POC, https://github.com/mtrmac/image/tree/parse-manual :
which is relatively a huge improvement — OTOH 2 µs should still be dwarfed by network latency in most actual uses, and it took so many iterations to make at least the test pass, and I have very little confidence in correctness. I guess that’s … a week of work to add tests and make this actually reliable? |
Yeah, i guess not all the 1.3 MB are kept, but still, lots of churn. For the |
By lot of time, i was seeing 63msec, although 20msec of those was in the toml parser, but the rest was mainly in the reference parser. |
For the API issue, maybe one could use something like https://github.com/charlievieth/reonce to lazily initialize the regexps? |
Also, It seems a bit weird to do the regexp construction at runtime, using those helper functions. I understand that it makes things easier to write/maintain, but could maybe have that code as is and run at buildtime to just generate the final regexp strings. |
Just taking over the note from another issue that parsing short names is only an issue when using short names. No cycles are wasted parsing aliases when using a FQN. A week worth of work may very well be more effective tackling issues that pop up when using FQNs; at least in the automotive context. Yet, I am all in favor of optimizing the reference package. |
AFAICS that would still be an API break. |
That linked upstream commit doesn’t quite do that, but at least it only concatenates strings. Sure, that could be improved further to move to build time. Ideally, I’d like to see a generator that turns the regexes into parsers similar to the test branch above. That parser really is a rat nest of bug opportunities. (Or, well — alternatively, I’d just like the regex engine to be fast. It’s understandable that processing a regexp representation can be slower than hand-crafted code (although not strictly obvious, e.g. character class testing can be done with bitmaps that programmers aren’t going to prepare manually), but 7 times does seem quite excessive.) |
A friendly reminder that this issue had no activity for 30 days. |
@alexlarsson @mtrmac @vrothberg What should we do with this one? |
Keep the docker one as is, unless someone wants to sign up to forking the repo and maintaining the fork over time. WRT the c/image one, I can imagine updating it to the current distribution/distribution implementation which uses fewer regexes, or replacing the parsers entirely, but I’m not going to work on either short-term. |
Improvements on containers/podman#16205 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Improvements on containers/podman#16205 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
I'm not sure what the best thing to do is here, because I understand not wanting to spend time on essentially useless things like this. However, we're currently looking at the performance of |
I’m not at all saying this work is useless, just that I have to prioritize and spending a lot of time on this is currently, short-term, not high enough vs. data corruption and business-requested features. Others might have more time. containers/image#1730 is working on reducing the overhead of one of the copies. WRT the other one … either way we are forking something (docker/distribution/reference the implementation or docker/docker the caller). Using a |
@mtrmac, I can carve out some time next week. @alexlarsson brought up the idea of c/image directly using the docker/reference regex where possible. I didn't explore it further but found the idea pragmatic and efficient. WDYT? Update: That means c/image/docker/reference using the docker/reference regex. |
Thanks!
In general, that is somewhat worrisome; the reference parser is a critical dependency of signing, and with the Go vendoring model, a completely unrelated dependency of a larger project can trigger an update of the docker/distribution subpackage that changes behavior in unanticipated ways. And in particular, containers/image#1730 (comment) is an immediate example of this concern. We can’t just switch to those regexes without a detailed investigation of the impact, and possible changes to other code. (I.e. no matter what we do about this issue, it’s not a small dependency change; it’s at least one of a non-trivial implementation effort or a non-trivial investigation effort.) |
I opened containers/image#1733 as an alternative. |
A friendly reminder that this issue had no activity for 30 days. |
I think containers/image#1733 is as good as we're likely to get without major work. Lets close this. |
Podman vendors the
github.com/containers/image/v5/docker/reference
, with the README.md saying:However, it also vendors the above mentioned
github.com/docker/distribution/reference
, including the "digestset" dependency that was apparently problematic.This is not great, because both of these libraries are very heavy during initialization. Running podman with
GODEBUG=inittrace=1
produces this:I.e. each instance creates 1.3 MB of compiled regexps.
The text was updated successfully, but these errors were encountered: