Terragrunt run-all init --upgrade is inconsistent regarding use of the shared-cache. #1875

Houlistonm · 2021-10-15T17:45:42Z

We have two use cases where we run `terragrunt init --upgrade`:

1. We topo-sort our layers and run `terragrunt init --upgrade --terragrunt-working-dir ${LAYER}` (and other commands).
2. We run `terragrunt run-all init --upgrade`.

For both use cases, we preload the shared cache with the set of providers we've tested and are now deploying.
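For illustration, the first use case (one layer in flight at a time, in dependency order) could be sketched roughly as follows. The layer names and the dependency map are hypothetical; only the `terragrunt init --upgrade --terragrunt-working-dir` invocation comes from this report. It defaults to a dry run that prints the commands.

```python
import subprocess
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical layer graph: each key lists the layers it depends on.
LAYERS = {
    "network": [],
    "database": ["network"],
    "app": ["network", "database"],
}


def init_commands(layers):
    """Return one `terragrunt init --upgrade` command per layer, dependencies first."""
    order = TopologicalSorter(layers).static_order()
    return [
        ["terragrunt", "init", "--upgrade", "--terragrunt-working-dir", layer]
        for layer in order
    ]


def run_serially(commands, dry_run=True):
    """Run the commands one at a time so only one init touches the shared cache."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing layer


if __name__ == "__main__":
    run_serially(init_commands(LAYERS))
```

Because each command finishes before the next one starts, no two `terraform init` processes ever read or write the shared cache at the same time.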

Use case 1 is rock solid: it finds every module in the shared cache and completes successfully (only one layer is ever in flight).
Use case 2 is flaky. Snippets from the logs are below.

Initializing the backend...
Initializing modules... <REDACTED>

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Installing hashicorp/local v2.1.0...

Initializing provider plugins...
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/local v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Installed hashicorp/template v2.2.0 (unauthenticated)
- Installing hashicorp/tls v3.1.0...
- Installed hashicorp/tls v3.1.0 (unauthenticated)
- Installing grafana/grafana v1.13.4...
- Installed grafana/grafana v1.13.4 (unauthenticated)
- Installing hashicorp/random v3.1.0...
- Installed hashicorp/random v3.1.0 (unauthenticated)
- Installing hashicorp/null v3.1.0...
- Installing hashicorp/dns v3.2.1...
- Installed hashicorp/null v3.1.0 (unauthenticated)
- Installing hashicorp/external v2.1.0...
- Installed hashicorp/dns v3.2.1 (unauthenticated)
- Using grafana/grafana v1.13.4 from the shared cache directory
- Installed hashicorp/external v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Using hashicorp/external v2.1.0 from the shared cache directory
- Using hashicorp/tls v3.1.0 from the shared cache directory
- Using hashicorp/null v3.1.0 from the shared cache directory
- Using hashicorp/random v3.1.0 from the shared cache directory
- Using hashicorp/template v2.2.0 from the shared cache directory
╷
│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/aws v3.63.0: the current package for
│ registry.terraform.io/hashicorp/aws 3.63.0 doesn't match any of the
│ checksums previously recorded in the dependency lock file
╵

The AWS provider was downloaded instead of being used from the shared cache.
This is just one example of the failure; which providers are reused and which are downloaded varies randomly between runs.

Version info.
Terragrunt v0.35.3
Terraform v1.0.9


zen · 2021-12-07T16:23:47Z

Suffering from a very similar problem.

amkartashov · 2022-10-07T12:17:07Z

I believe this is a Terraform bug: there is a race condition between two `terraform init` processes trying to install the same version of the same provider. First it calls installFromHTTPURL, which downloads to a temporary file with a random name, but then it calls installFromLocalArchive, which unpacks directly into the global plugin cache directory. This is where the race condition occurs.
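To illustrate the kind of race described here, the sketch below shows one common way to avoid it: unpack into a private staging directory and publish it with a single rename, so another process never sees a half-written package. This is not Terraform's code; `install_atomically`, its parameters, and the flat `package_name` directory layout are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def install_atomically(archive_path, cache_dir, package_name):
    """Unpack archive_path into cache_dir/package_name without exposing partial contents.

    package_name is assumed to be a single path component (hypothetical layout).
    """
    final_dir = os.path.join(cache_dir, package_name)
    if os.path.isdir(final_dir):
        return final_dir  # already installed, possibly by another process
    os.makedirs(cache_dir, exist_ok=True)
    # Stage inside cache_dir so the rename below stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=package_name + ".", dir=cache_dir)
    try:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(staging)
        try:
            os.rename(staging, final_dir)  # publish in one step
        except OSError:
            # Lost the race: another process published final_dir first.
            shutil.rmtree(staging, ignore_errors=True)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

Unpacking straight into the final directory, by contrast, lets a second process observe (or overwrite) files while the first is still writing them, which would be consistent with the checksum mismatches reported above.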

amkartashov · 2022-10-07T15:29:21Z

As per this comment, this is expected behavior, so maybe terragrunt should not run `terraform init` in parallel.

lorengordon · 2022-10-07T16:13:09Z

If a change is made to run init serially, it would be nice to put that behind a cli-arg instead of making it the default. We pre-populate the plugin_cache_dir when we run terragrunt, using a provider mirror, so the plugins are already present and parallel init calls do not step on each other.

levkohimins · 2023-12-04T12:42:42Z

This issue should already be solved. Can someone who has encountered this issue check this?

davidgwps · 2024-01-30T18:09:43Z

@levkohimins I'm still seeing this. As soon as I put back `--terragrunt-parallelism 1`, it's solved. It's killing me, we could have our init run so much faster :(

╷
│ Error: Failed to install provider from shared cache
│ 
│ Error while importing hashicorp/google v5.9.0 from the shared cache
│ directory: the provider cache at .terraform/providers has a copy of
│ registry.terraform.io/hashicorp/google 5.9.0 that doesn't match any of the
│ checksums recorded in the dependency lock file.
╵

levkohimins · 2024-02-12T23:10:36Z

@davidgwps,
The only way at the moment is to run two commands:

1. `run-all init`, which will automatically be executed sequentially for all modules, just like `--terragrunt-parallelism 1`.
2. Any other command, which can be executed in parallel.
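A minimal sketch of this two-command workaround is shown below. The helper names are hypothetical, and passing `--terragrunt-parallelism 1` explicitly on the init step is an assumption based on the earlier report that the flag avoids the error. It defaults to a dry run that prints the commands.

```python
import subprocess


def two_phase_commands(command_args):
    """Build the serial init step followed by the command that may run in parallel."""
    return [
        # Step 1: init every module one at a time.
        ["terragrunt", "run-all", "init", "--terragrunt-parallelism", "1"],
        # Step 2: the real command, left at Terragrunt's default parallelism.
        ["terragrunt", "run-all", *command_args],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # do not continue if init fails


if __name__ == "__main__":
    run(two_phase_commands(["plan"]))
```

The cost is that init runs serially, but the second command can still use parallelism.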

We are working on a better solution in #2920.

levkohimins · 2024-04-10T16:09:56Z

Resolved in v0.56.4 release. Make sure to read Provider Caching.

yorinasub17 added bug, needs-triage labels Oct 19, 2021

infraredgirl added p:needs triage and removed needs-triage labels Oct 20, 2021

amkartashov mentioned this issue Oct 7, 2022

Allow multiple Terraform instances to write to plugin_cache_dir concurrently hashicorp/terraform#31964

Open

denis256 added this to Terragrunt Roadmap Aug 2, 2023

denis256 moved this to To do in Terragrunt Roadmap Aug 2, 2023

levkohimins self-assigned this Dec 4, 2023

levkohimins mentioned this issue Feb 5, 2024

Add guidance/features for reducing disk space and bandwidth usage #2920

Open

levkohimins closed this as completed Apr 10, 2024
