Terragrunt run-all init --upgrade is inconsistent regarding use of the shared-cache. #1875

Closed
Houlistonm opened this issue Oct 15, 2021 · 8 comments
Labels: bug, p:needs triage

Comments

@Houlistonm

We have two use cases where we run `terragrunt init --upgrade`:

  1. We topo-sort our layers and run `terragrunt init --upgrade --terragrunt-working-dir ${LAYER}` (and other commands) for each layer in turn.
  2. We run `terragrunt run-all init --upgrade`.

For both use cases, we preload the shared cache with the set of providers we've tested and are now deploying.
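To make use case 1 concrete, here is a minimal sketch of a serial, topologically ordered init. The layer names and the dependency map are hypothetical; only the `terragrunt init --upgrade --terragrunt-working-dir` invocation comes from the report above. The sketch only builds and prints the commands; the actual call is left commented out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def serial_init_commands(dependencies):
    """Return one `terragrunt init --upgrade` argv per layer, dependencies first.

    `dependencies` maps each layer directory to the set of layers it depends on.
    """
    order = TopologicalSorter(dependencies).static_order()
    return [
        ["terragrunt", "init", "--upgrade", "--terragrunt-working-dir", layer]
        for layer in order
    ]


# Hypothetical layout: "database" needs "network"; "app" needs both.
layers = {
    "network": set(),
    "database": {"network"},
    "app": {"network", "database"},
}

commands = serial_init_commands(layers)
for argv in commands:
    print(" ".join(argv))
    # To actually run it, one layer at a time:
    # subprocess.run(argv, check=True)
```

Because each command finishes before the next starts, only one process ever touches the shared cache, which is consistent with use case 1 never hitting the checksum error.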

Use case 1 is rock solid: it finds every provider in the shared cache and completes successfully (only one layer is ever in flight).
Use case 2 is flaky. Snippets from the logs are below.

Initializing the backend...
Initializing modules... <REDACTED>

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Installing hashicorp/local v2.1.0...

Initializing provider plugins...
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/local v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Installed hashicorp/template v2.2.0 (unauthenticated)
- Installing hashicorp/tls v3.1.0...
- Installed hashicorp/tls v3.1.0 (unauthenticated)
- Installing grafana/grafana v1.13.4...
- Installed grafana/grafana v1.13.4 (unauthenticated)
- Installing hashicorp/random v3.1.0...
- Installed hashicorp/random v3.1.0 (unauthenticated)
- Installing hashicorp/null v3.1.0...
- Installing hashicorp/dns v3.2.1...
- Installed hashicorp/null v3.1.0 (unauthenticated)
- Installing hashicorp/external v2.1.0...
- Installed hashicorp/dns v3.2.1 (unauthenticated)
- Using grafana/grafana v1.13.4 from the shared cache directory
- Installed hashicorp/external v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Using hashicorp/external v2.1.0 from the shared cache directory
- Using hashicorp/tls v3.1.0 from the shared cache directory
- Using hashicorp/null v3.1.0 from the shared cache directory
- Using hashicorp/random v3.1.0 from the shared cache directory
- Using hashicorp/template v2.2.0 from the shared cache directory
╷
│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/aws v3.63.0: the current package for
│ registry.terraform.io/hashicorp/aws 3.63.0 doesn't match any of the
│ checksums previously recorded in the dependency lock file
╵

The AWS provider was downloaded instead of being used from the cache.
This is just one example of the failure; which providers are reused and which are downloaded varies randomly from run to run.

Version info.
Terragrunt v0.35.3
Terraform v1.0.9

@yorinasub17 yorinasub17 added bug Something isn't working needs-triage labels Oct 19, 2021
@infraredgirl infraredgirl added p:needs triage Needs to be processed by maintainer and issue type / priority added and removed needs-triage labels Oct 20, 2021
@zen

zen commented Dec 7, 2021

Suffering from a very similar problem.

@amkartashov

I believe this is a Terraform bug: there is a race condition between two terraform init processes trying to install the same version of the same provider. Terraform first calls installFromHTTPURL, which downloads to a temporary file with a random name, but then it calls installFromLocalArchive, which unpacks directly into the global plugin cache directory. This is where the race condition occurs.
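To illustrate the kind of race described here, the following sketch shows the usual way to avoid it: unpack into a private staging directory and publish the result with a single rename. This is not Terraform's implementation; the function name, the package directory name, and the placeholder archive are all hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def install_atomically(archive_path, cache_dir, package_name):
    """Unpack `archive_path` into cache_dir/package_name without exposing partial contents."""
    final_dir = os.path.join(cache_dir, package_name)
    if os.path.isdir(final_dir):
        return final_dir  # another process already published this package
    os.makedirs(cache_dir, exist_ok=True)
    # Stage inside the cache directory so the rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=cache_dir)
    try:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(staging)
        try:
            os.rename(staging, final_dir)  # publish in one step
        except OSError:
            # Lost the race: keep the copy the other process published.
            if not os.path.isdir(final_dir):
                raise
    finally:
        # No-op after a successful rename; cleans up if we lost the race or failed.
        shutil.rmtree(staging, ignore_errors=True)
    return final_dir


with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, "provider.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("terraform-provider-example", "binary placeholder")
    cache_dir = os.path.join(workdir, "plugin-cache")
    first = install_atomically(archive_path, cache_dir, "example_1.0.0")
    second = install_atomically(archive_path, cache_dir, "example_1.0.0")
    installed_files = sorted(os.listdir(first))
    leftovers = [n for n in os.listdir(cache_dir) if n.startswith(".staging-")]
```

Unpacking straight into the final directory, by contrast, lets a second process observe (and checksum) a half-written package, which would match the "doesn't match any of the checksums" error in the logs.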

@amkartashov

As per this comment, this is expected behavior, so maybe terragrunt should not run terraform init in parallel.

@lorengordon
Contributor

If a change is made to run init serially, it would be nice to put that behind a cli-arg instead of making it the default. We pre-populate the plugin_cache_dir when we run terragrunt, using a provider mirror, so the plugins are already present and parallel init calls do not step on each other.
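As a rough illustration of the "plugins are already present" precondition, the sketch below checks a plugin cache directory for a list of expected providers before any parallel init is started. It assumes the unpacked cache layout `<cache>/registry.terraform.io/<namespace>/<type>/<version>/<os>_<arch>/`; the function name, the default platform, and the provider list are hypothetical and not taken from this thread's setup.

```python
import os
import tempfile


def missing_from_cache(cache_dir, providers, platform="linux_amd64"):
    """Return the (source, version) pairs with no unpacked directory in the cache.

    Assumes the layout <cache>/registry.terraform.io/<namespace>/<type>/<version>/<platform>/.
    """
    missing = []
    for source, version in providers:
        path = os.path.join(
            cache_dir, "registry.terraform.io", *source.split("/"), version, platform
        )
        if not os.path.isdir(path):
            missing.append((source, version))
    return missing


# Demo against a throwaway directory that only contains the aws provider.
with tempfile.TemporaryDirectory() as cache_dir:
    os.makedirs(
        os.path.join(
            cache_dir, "registry.terraform.io", "hashicorp", "aws", "3.63.0", "linux_amd64"
        )
    )
    wanted = [("hashicorp/aws", "3.63.0"), ("hashicorp/tls", "3.1.0")]
    missing = missing_from_cache(cache_dir, wanted)
    print(missing)
```

A non-empty result would mean at least one init still has to download and write into the shared cache, which is the situation where parallel runs can collide.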

@levkohimins
Contributor

This issue should already be solved. Can someone who has encountered this issue check this?

@levkohimins levkohimins self-assigned this Dec 4, 2023
@davidgwps

davidgwps commented Jan 30, 2024

@levkohimins I'm still seeing this. As soon as I put back --terragrunt-parallelism 1, it's solved. It's killing me; our init could run so much faster :(

╷
│ Error: Failed to install provider from shared cache
│ 
│ Error while importing hashicorp/google v5.9.0 from the shared cache
│ directory: the provider cache at .terraform/providers has a copy of
│ registry.terraform.io/hashicorp/google 5.9.0 that doesn't match any of the
│ checksums recorded in the dependency lock file.
╵

@levkohimins
Contributor

@davidgwps,
The only way at the moment is to run two commands:

  1. run-all init will be automatically executed sequentially for all modules, just like --terragrunt-parallelism 1
  2. Any other command that can be executed in parallel.
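The two steps above can be sketched as a pair of command lines. The `--terragrunt-parallelism 1` flag is the one reported earlier in this thread, passed explicitly here; `--upgrade` is carried over from the original report, and `plan` is only an assumed stand-in for "any other command". The sketch builds the commands without running them.

```python
def two_phase_commands(command="plan"):
    """Return [serial init argv, parallel follow-up argv] for a run-all workflow."""
    # Phase 1: init one module at a time so nothing races on the shared cache.
    init = [
        "terragrunt", "run-all", "init", "--upgrade",
        "--terragrunt-parallelism", "1",
    ]
    # Phase 2: the follow-up command (assumed here to be `plan`) at default parallelism.
    other = ["terragrunt", "run-all", command]
    return [init, other]


phases = two_phase_commands()
for argv in phases:
    print(" ".join(argv))
    # To actually run each phase in order:
    # subprocess.run(argv, check=True)
```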

We are working on a better solution in #2920.

@levkohimins
Contributor

Resolved in the v0.56.4 release. Make sure to read the Provider Caching documentation.
