-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--mount=type=cache id should not default to the target #1706
Comments
This is expected. Specific contents of the cache mounts is never guaranteed and you build should always work with any state of files, just with different performance implications. Mostly this is used for package managers or specific language cache. Eg. for go, all projects should share the go build cache as it is safe, same for npm tarballs, apt etc. If you want project specifics then make sure to use ID parameter. |
But I think the problem is that the target directory is somewhat arbitrary. For this approach to work (i.e. for people to benefit from shared caches), you would need everyone who builds Go projects to agree on the cache directory being, say, This is somewhat related to my comments in #1512 (comment) and #1705. Because ultimately you need/want the caller to have to specify the location of caches, not the Hence my conclusion that id defaulting to the target directory is not actually that useful, and indeed likely more harmful. I therefore think |
No,
Even when #1512 is added it will be a special case for advanced usage. Mostly so that you can transfer your cache or reuse it with a new buildkit instance. All the dockerfiles need to work without passing any specific location. ID defaults to target path because that is the expected behavior for 95% of the practical valid uses of the cache mounts. |
For But here in lies the problem because this is what the default Which means that the actual module cache in effect is
will not result in anything being written to/read from the cache. And this is effectively the problem I'm trying to get at. Hence why I think pushing the responsibility back to the caller is the right thing to do here (which I why I think there is a connection to #1512). The current approach does not allow me to say "oh and where you need a Go build cache use location X". It relies entirely on convention/identical setup between build containers and that (in my limited experience) doesn't feel particularly robust. Instead, if you make this an optional part of the "API" of such a build container, then the user can pass arguments that satisfy that API in much the same way that
I agree with the sentiment of this point, the bit I'm not clear on is how accurate this 95% figure is, especially given the example above. The issue being that for the other 5% (which I think is likely much higher) there is a cost to bear in terms of cache misses, and critically no way of fixing the issue. How about an approach that combines the best of both worlds? i.e. one where the caller can optionally provide an argument to parameterise the cache source; if not provided then it defaults to the target directory. Incidentally my main use case for #1512 and fixing this issue is CI, where as the caller I need to be able to say where caches should be sourced from (because those caches are themselves cached as part of a CI run). |
That is why if you use the official golang image the "standard" is https://github.com/moby/buildkit/blob/master/Dockerfile#L87-L88 instead.
Yes, if you want to use IDs it is harder because you need to set it and maintain that different components use the same strings. But you want to make it harder for everybody. With ID you can also set it to a path if you want to share cache with existing Dockerfile that you can't change in a different target path. If you want to share cache with a Dockerfile that you don't control you will always have this problem that the Dockerfiles need to agree that they want to share cache. With custom IDs it would just be a huge mess and almost never work as paths are much more stable than random identifier strings. The use case to use IDs is to set it to something project-specific if you want to make sure your different projects don't share some cache artifacts. The possibility that you maintain 2 projects that use the same application cache but on a different path is pretty much non-existent. |
Indeed, but my point is that another Go-based build image will not necessarily use that same approach, and so you end up with multiple modules caches (because they did not override
I agree with this, you make the point well that the choosing of IDs that can't be validated in any way is hard. The benefit of the use of target path is that it is validated, by virtue of it corresponding to a location on disk. Hence the current approach of defaulting the
This is exactly the problem, but also that it takes control away from me, the caller because I can't override this. Which is why I wondered in #1706 (comment) whether we can take a slightly different approach here. But, and apologies if this was obvious to you before, I think the penny has just dropped for me with respect to #1512. My comment in #1706 (comment) about parameterising the cache source is, I think, exactly what is proposed in #1512. In that issue, given:
you propose something like:
Given our discussion above, I would instead declare my
Using the current definition of
If so then I think this absolutely satisfies my need to be able to "inject" a different cache strategy as the caller. |
Yes, that would still work. All cache mounts have an ID, just the user doesn't need to type it in most cases. |
Then I think I will close this issue because my question/concern here is effectively covered by #1512 (modulo one further question of whether one can provide an "offset" into a volume). Thanks very much for engaging to talk this through. |
I asked the following question on Docker slack in
#buildkit
:@tonistiigi kindly replied with:
My conclusion from this is that if I have two separate
Dockerfile
's, where both use--mount=type=cache,target=/blah
, they will share the same cache, even thought their usage of that could be entirely different/conflicting. This seems somewhat surprising (dangerous?) as a default.Would it instead be better to require an
id
for--mount=type=cache
? That way, the caller can ensure caches are "re-used" when expected, not by accident.Somewhat related to #1512 because in that case the volume itself would be the
id
.The text was updated successfully, but these errors were encountered: