Parallel push for multi-module project sometimes fails with MANIFEST_INVALID #2230
Hi @kan-izh, does your multi-module setup push multiple images with Jib (i.e., do you have multiple sub-modules that configure Jib)? If there are multiple Jib sub-modules, do they depend on one another somehow?
@chanseokoh, there is one common JAR and three Jib modules depending on that JAR. So... I guess the Jib modules share the same dependency layers, and they may push them concurrently.
Assuming that the blob was eventually uploaded, can you download the blob and check its contents? The blob will most likely be a gzipped tarball, since Jib layers are .tar.gz blobs. One way I can think of to download the blob is to use Crane. For example, see the sketch below.
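A minimal sketch, assuming crane (from google/go-containerregistry) is installed; the registry host, repository path, and digest are placeholders for your own values:

```sh
# Fetch the layer blob by digest; crane writes it to stdout.
crane blob my-artifactory.example.com/my-project/my-image@sha256:0123456789abcdef > layer.tar.gz

# List the layer contents to see which files it carries.
tar -tzf layer.tar.gz
```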
I've reproduced it once again and found the layer in the target/jib-cache folder. It looks like the layer with the failing sha256 digest contained the Jib module's JAR... so it must be unique... weird. Does that help you?
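As an aside, a sketch of locating the cached layer by digest, assuming Jib's default per-project cache layout (the exact layout depends on the Jib version, and the digest is a placeholder):

```sh
# Jib's per-project cache stores layer blobs under directories named
# by their digests.
ls target/jib-cache/layers/
ls target/jib-cache/layers/0123456789abcdef*/
```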
That helps, thanks. In addition to getting debug logs (e.g., by running Maven with -X), do you own your Artifactory repo? I wonder if it is easy to check via its admin UI whether a blob exists or not.
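A sketch of such a debug run; the goals are assumptions about your lifecycle binding, and the grep pattern is a guess at what the log contains:

```sh
# Re-run the push with full Maven debug output and capture the log.
mvn -X compile jib:build > jib-debug.log 2>&1

# Look for Jib's blob-existence checks against the registry; adjust
# the pattern to the actual log format.
grep -i "blobs/sha256" jib-debug.log
```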
Oh, sorry. I forgot to say you should remove
Unfortunately I've had no luck reproducing it now... I suspect it may be linked to a recent change we had in our code. There were some conflicting transitive dependencies (i.e., a library had two different versions, and Maven may pick one at random). Could that potentially clutter the Docker layer generation or something?
Honestly, I have been thinking this could be a server issue. For one, we've observed a Quay server malfunctioning and returning "500 Internal Server Error" under a high level of concurrency: #1986 (comment) and #2013. At the time, I had my own theory for the Quay malfunction. Another example is a concurrency issue (#2372 (comment)) in Sonatype Nexus. Our code is designed so that the thread that pushes the manifest can run only after every thread that uploads a blob gets a concrete confirmation from the server that the blob was uploaded successfully or already exists. (The manifest upload is the final network interaction with the server.) The fact that Jib started to upload a manifest means that there was no error or exception in the previous operations and everything seemed normal. If there were any anomaly, the Jib execution would simply crash and the manifest upload could not start at all. And yet, the server seems to say that some layer is absent or not found when Jib uploads the final manifest. That is, the server appears to think a blob referenced in the manifest doesn't exist (hence MANIFEST_INVALID).
So I was trying to see whether, when this happened, the layer actually existed in the registry. Immediately running a second Jib push with debug logging would have revealed what the server responds when Jib asks, in that second run, whether the blob already exists.
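For illustration, a minimal sketch of the registry exchange described above, expressed with curl against the Docker Registry HTTP API v2; the host, repository, tag, and digest are placeholders, and authentication is omitted:

```sh
# 1. Blob-existence check: HTTP 200 means the blob already exists in
#    the registry, 404 means it still needs to be uploaded.
curl -sI https://registry.example.com/v2/my-project/my-image/blobs/sha256:0123456789abcdef

# 2. Manifest upload, the final interaction. If the registry believes
#    a blob referenced by the manifest is missing, it rejects the
#    request (e.g., with a MANIFEST_INVALID or BLOB_UNKNOWN error).
curl -s -X PUT \
  -H "Content-Type: application/vnd.docker.distribution.manifest.v2+json" \
  --data-binary @manifest.json \
  https://registry.example.com/v2/my-project/my-image/manifests/latest
```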
So I think changing your application or custom base image to create new layers to push may increase the chance of hitting this. Emptying the registry repo may help, but some registries retain blobs internally for a while, so I think creating new layers is the better option (see the sketch below).
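A sketch of forcing fresh layer digests between runs, assuming a standard Maven layout; the marker file path is made up:

```sh
# Any change to the application files changes the layer contents, and
# therefore its digest, so every push uploads new blobs.
echo "build $(date +%s)" > src/main/resources/build-marker.txt
mvn -T 4 compile jib:build
```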
Closing, as we haven't heard back from the user. Also, the user doesn't seem able to reproduce this after some change on their side. Still, I suspect this is a bug in the registry (Artifactory).
Yes, it seems it was related to a Nexus bug, maybe this one: https://issues.sonatype.org/browse/NEXUS-20640. I haven't seen it since we had Nexus upgraded.
Environment:
Description of the issue:
Sometimes jib:build fails when running a parallel build for a multi-module project. I use "mvn -T4" for four build threads (see the sketch below).
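For reference, a sketch of the invocation; the exact goals are assumptions about the lifecycle binding:

```sh
# Parallel Maven build with four threads; each Jib module pushes its
# own image, potentially uploading shared layers concurrently.
mvn -T 4 clean compile jib:build
```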
Expected behavior:
Push images in parallel successfully.
Steps to reproduce:
That's hard... It happens sporadically, like any multithreading issue. A non-parallel build apparently works fine.
Log output:
Additional Information:
I don't use OCI anywhere in the configuration, so the error text looks misleading.
I'm not sure whether it is our Artifactory that fails, or whether the Jib plugin mishandles pushing layers shared between projects.