Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Clone only once per PR #2921

Closed
wants to merge 21 commits into from

Conversation

Fabianoshz
Copy link
Contributor

@Fabianoshz Fabianoshz commented Jan 4, 2023

what

Ensure that we clone the repository only once per execution, this should reduce the amount of git clones.

why

  • Reduce the amount of calls to VCS servers.

tests

references

@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 4, 2023

Tests are broken, I will fix them if we agree that this is a good thing to add.

I still don't know how to solve the clone on the api_controller, it seems to rely on the command running the clone, but I have removed this from the commands, I will take a closer look later to see if can fix.

@GenPage
Copy link
Member

GenPage commented Jan 4, 2023

How about instead of making new function calls within the workingDir struct, we expand the logic of the clone based on the workspace that is passed into it? This would limit the blast radius by avoiding rewriting calls to the Clone function and making how we handle workspace copies (instead of cloning) internal to the function.

@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 4, 2023

@GenPage we could do that, but if we go this way I believe it would be nice to change the Clone to something else, a name that clearly define that we only clone if needed.

@nitrocode
Copy link
Member

nitrocode commented Jan 4, 2023

@GenPage please see the linked pr #2882 (comment).

Tldr. Recloning on new workspaces and copying code from default workspace is an antipattern. We can utilize the TF_DATA_DIR per workspace to use a single clone per PR for multiple workspaces.

@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 6, 2023

Hey @nitrocode I'll test the TF_DATA_DIR logic you said in your comment, I believe that we might have conflicts with the plan files if we do this, anyway I will dig in the plan/apply part of the code to try to test.

If the TF_DATA_DIR works I'm fine with going with it.

@GenPage
Copy link
Member

GenPage commented Jan 7, 2023

@nitrocode @Fabianoshz Apologies for delaying in response, I've been out sick.

I completely agree that re-cloning is an anti-pattern. the Clone function already has logic for determining if a git repo has been cloned or not and when to force a re-clone (delete -> clone again) . I was just emphasizing within the context of this PR change to keep the call stack simple as is and move the business logic of the CreateWorkspace function into Clone . Renaming a function is a noop change compared to adding new function calls.

As for the topic of the PR comment and TF_DATA_DIR, I do agree that it is a potential solution. Is there a reason for keeping separate data dirs instead of just supporting a switch in TF_WORKSPACE env var? I know this is proposed as part of how we handle workflow hooks.

@Fabianoshz
Copy link
Contributor Author

@nitrocode changed the code to use TF_DATA_DIR, did some pretty basic tests, I will test more tomorrow.

@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 8, 2023

Did some more testing (who needs sleep?), looks promising:

dev
image

prd
image

default
image

❯ ls -la repos/Fabianoshz/atlantis-test/14/standard/
total 56
drwxr-xr-x 5 fabiano fabiano 4096 jan  7 23:54 .
drwx------ 6 fabiano fabiano 4096 jan  7 23:53 ..
-rw-r--r-- 1 fabiano fabiano 1478 jan  7 23:54 default.json
-rw-r--r-- 1 fabiano fabiano 2150 jan  7 23:54 default.tfplan
-rw-r--r-- 1 fabiano fabiano 1478 jan  7 23:54 dev.json
-rw-r--r-- 1 fabiano fabiano 2150 jan  7 23:54 dev.tfplan
-rw-r--r-- 1 fabiano fabiano  116 jan  7 23:53 main.tf
-rw-r--r-- 1 fabiano fabiano 1478 jan  7 23:54 prd.json
-rw-r--r-- 1 fabiano fabiano 2150 jan  7 23:54 prd.tfplan
drwxr-xr-x 3 fabiano fabiano 4096 jan  7 23:54 .terraform-default
drwxr-xr-x 3 fabiano fabiano 4096 jan  7 23:54 .terraform-dev
-rw-r--r-- 1 fabiano fabiano 1154 jan  7 23:54 .terraform.lock.hcl
drwxr-xr-x 3 fabiano fabiano 4096 jan  7 23:54 .terraform-prd
-rw-r--r-- 1 fabiano fabiano   30 jan  7 23:53 terragrunt.hcl

Will adjust tests tomorrow and mark this ready to review after that.

@jamengual
Copy link
Contributor

jamengual commented Jan 9, 2023

Well, this looks actually pretty simple, this basically relies on the TF_DATA_DIR functionality and gets rid of the recloning which is good.

@GenPage there was no reason to have multiple dirs for workspace and even talking to Luke there was not a specific reason, my only guess was that at the time it made sense but since then Terraform has changed a lot and made available this new option but Atlantis never went back and improved upon.

Technically PRs come from one repo and every PR will have a checkout/clone repo in atlantis datadir and that should suffice with any workflow that you run against the PR so having one dir for all the workspaces should not affect anything since the workspace switch is done by the TF binary.

I think this makes Atlantis more closely matching a regular TF behavior than doing its own way of workspaces.

@GenPage
Copy link
Member

GenPage commented Jan 9, 2023

Yeah, It would probably be more idempotent to use TF_DATA_DIR just to be safe.

@nitrocode
Copy link
Member

If the following directories were modified in the same PR branch and each directory had workspaces prod and dev, how many git clones would we expect?

components/terraform/eks
components/terraform/vpc

Would this PR only clone once? Would it look like this?

# TF_DATA_DIR=.terraform-prod
/Users/user/.atlantis/repos/org/infra/1/components/terraform/eks/.terraform-prod
# TF_DATA_DIR=.terraform-dev
/Users/user/.atlantis/repos/org/infra/1/components/terraform/eks/.terraform-dev
# TF_DATA_DIR=.terraform-prod
/Users/user/.atlantis/repos/org/infra/1/components/terraform/vpc/.terraform-prod
# TF_DATA_DIR=.terraform-dev
/Users/user/.atlantis/repos/org/infra/1/components/terraform/vpc/.terraform-dev

@GenPage
Copy link
Member

GenPage commented Jan 9, 2023

@nitrocode In theory it should only clone once, but the problem is with the ForceClone function. We don't pull down new commit changes from what I can tell, we delete and reclone the whole repo again.

See: https://github.com/runatlantis/atlantis/blob/main/server/events/working_dir.go#L104-L125

There is bugs reported where if someone triggers commits too quickly with autoplan on it blows up as its trying to delete the repo before re-cloning. See https://atlantis-community.slack.com/archives/C5MGGAV0C/p1672846827664109

Also see #2952

@Fabianoshz
Copy link
Contributor Author

Hi guys, I was going to finish this yesterday, but had some personal problems alongside work on another projects. I believe I can finish this later today after work.

If the following directories were modified in the same PR branch and each directory had workspaces prod and dev, how many git clones would we expect?

One clone.

Would this PR only clone once? Would it look like this?

Yes, that's exactly what I had on my tests, but I think we should test this a lot just to make sure.

@GenPage, I see that this bug is related, but maybe we can finish this PR and work on the ForceClone on another PR. TBF my interest lies on solving this clone issue so I can focus on #2882 to solve #260. And I can see these 2 solutions evolving alongside each other.

@Fabianoshz Fabianoshz marked this pull request as ready for review January 10, 2023 02:56
@Fabianoshz Fabianoshz requested a review from a team as a code owner January 10, 2023 02:56
@Fabianoshz Fabianoshz marked this pull request as draft January 10, 2023 03:17
@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 10, 2023

Looks like the apply logic depends on this:

func (p *DefaultPendingPlanFinder) findWithAbsPaths(pullDir string) ([]PendingPlan, []string, error) {
	workspaceDirs, err := os.ReadDir(pullDir)
	if err != nil {
		return nil, nil, err
	}
	var plans []PendingPlan
	var absPaths []string
	for _, workspaceDir := range workspaceDirs {
		workspace := workspaceDir.Name()
		repoDir := filepath.Join(pullDir, workspace)

pullDir is the base repository, the code assumes that each directory under this directory is as workspace, but we don't have it anymore. I will look for another solution for this.

@krrrr38
Copy link
Contributor

krrrr38 commented Jan 10, 2023

I like TF_DATA_DIR idea and this impl should be default.

Just one question. When users use custom workflow plan stage which write same file, will parallel plan override each workspace files by this impl? If true, is it better to write document about it like after version v0.22.3...?

@nitrocode
Copy link
Member

nitrocode commented Jan 10, 2023

When users use custom workflow plan stage which write same file, will parallel plan override each workspace files by this impl?

@krrrr38 that's a good point. We'd need to ensure the plan file name contains the workspace name.

If true, is it better to write document about it like after version v0.22.3...?

I do not understand this proposal. Could you explain it more?

As for the topic of the PR comment and TF_DATA_DIR, I do agree that it is a potential solution. Is there a reason for keeping separate data dirs instead of just supporting a switch in TF_WORKSPACE env var? I know this is proposed as part of how we handle workflow hooks.

@GenPage The way the workspace is set, either by the command line or by the TF_WORKSPACE env var, a file within the data dir is modified to contain the value of the workspace.

If we did not use a separate data dir per workspace, parallel plans and applies would stomp over each other. By using a separate data for each workspace, we prevent that issue.

@nitrocode nitrocode changed the title Clone only once feat: Clone only once per PR Jan 10, 2023
@krrrr38
Copy link
Contributor

krrrr38 commented Jan 10, 2023

@krrrr38 that's a good point. We'd need to ensure the plan file name contains the workspace name.

I do not understand this proposal. Could you explain it more?

$PLANFILE already contains workspace name, so no problem. I think users need to migrate file generation like global.tf.

  • asis custom workflow
    • ./generate-var-by-workspace.sh > global.tf && terraform plan > $PLANFILE
  • tobe custom workflow example
    • ./generate-var-by-workspace.sh > global-$WORKSPACE.tf && terraform plan -var-file=global-$WORKSPACE.tf > $PLANFILE

This is a problem on parallel run. It might be ok to be well documented. (or this feature become plugable by user config like --working-dir-strategy)

@nitrocode
Copy link
Member

$PLANFILE already contains workspace name, so no problem. I think users need to migrate file generation like global.tf.

Thank you for confirming that.

This is a problem on parallel run. It might be ok to be well documented. (or this feature become plugable by user config like --working-dir-strategy)

Whether you use parallel run or not, you'd need to pass in the appropriate tfvars file for your workspace run.

@nitrocode nitrocode added this to the 0.23.0 milestone Jan 11, 2023
@nitrocode
Copy link
Member

Has this PR been tested with locking in these scenarios?

  1. PR with files changing in different directories using default workspace (no workspace)
    • Lock should be for each directory for the default workspace
  2. PR with files changing in different directories using custom workspace (single custom workspace)
    • Lock should be for each directory for the custom workspace
  3. PR with files changing in directory using different workspaces
    • Lock should be per workspace e.g. atlantis plan -d terraform/xyz -w custom should lock for custom and not for default. Unsure if this is the current status quo.

Either way, the locking mechanism should not be affected by this change. 🙏

@Fabianoshz
Copy link
Contributor Author

Fabianoshz commented Jan 11, 2023

Hi @nitrocode, I didn't had time to look at locking yet, right now I'm still trying to figure out how to get the workspace on findWithAbsPaths considering that we don't have the workspace directory structure anymore.

Before the code would get the workspace from the directory name and them it would use a regex to get the project name from the plan file, they are separated using a -, which works when you already know the workspace or the projectName, but currently we don't know any beforehand because we removed the workspace directories, so if anyone uses a - either on their workspace or project name we would return invalid values if using this PR code.

I will set this ready to review so you guys can have a look at the tests and maybe can help me how to solve this. For reference:

server/core/runtime/runtime.go @ ProjectNameFromPlanfile - Get's the project name for a file (it must receive the workspace)
server/events/pending_plan_finder.go @ findWithAbsPaths - Reads the pull directory and get's the files to be planned (this used to read the workspace directory to pass the correct information to ProjectNameFromPlanfile)

Edit:
About workspace locking, I've removed any kind of workspace locking, I will still look at this since locking is the main reason I'm working on this PR, I'm pretty sure that we can have a lock per directory which would be enough to avoid conflicts, but we will see when I start working on that, first I need to ensure the basic workspace logic is working at all.

@Fabianoshz Fabianoshz marked this pull request as ready for review January 11, 2023 15:59
@@ -469,6 +469,7 @@ func (c *DefaultClient) prepCmd(log logging.SimpleLogging, v *version.Version, w
fmt.Sprintf("WORKSPACE=%s", workspace),
fmt.Sprintf("ATLANTIS_TERRAFORM_VERSION=%s", v.String()),
fmt.Sprintf("DIR=%s", path),
fmt.Sprintf("TF_DATA_DIR=.%s", workspace),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before overwriting this env var, can we first check to see if the user supplied it in a custom workflow, and then if it's not set, then we can set it ourselves ?

We should have a test for this too to ensure we get the user provided TF_DATA_DIR if one is set

example:

workflows:
  default:
    plan:
      steps:
      - env:
          name: TF_DATA_DIR
          command: 'echo ".terraform-$WORKSPACE-mydir"'

    apply:
      steps:
      - env:
          name: TF_DATA_DIR
          command: 'echo ".terraform-$WORKSPACE-mydir"'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this scenario the user-provided environments override the TF_DATA_DIR both on RunCommandAsync and RunCommandWithVersion since prepCmd comes before than these on the call stack.

}

// Remove TF_DATA_DIR for the workspace.
os.RemoveAll(fmt.Sprintf("%s/%s/.terraform-%s", w.cloneDir(r, p), path, workspace))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding this information, can we retrieve the exact TF_DATA_DIR to remove?

Copy link
Contributor

@jamengual jamengual Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem that we do not need to set up the TF_DATA_DIR at all since Terraform will set it up if a workspace is defined automatically so in that case we will not have a TF_DATA_DIR exposed, no?

I do not think the terraform binary will set this ENV variable when switching to a workspace in the shell context

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TF_DATA_DIR is not set, it defaults to .terraform, if it is set, it can be pulled from the env value.

I'm suggesting, instead of assuming the data dir on this line, we should have a way of retrieving it programmatically rather than guessing the path.

If the user sets the TF_DATA_DIR for instance to something that does not match the above line, then this line will fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I did some testing and we really need the TF_DATA_DIR for parallel jobs.

But, it's not that easy to get the TF_DATA_DIR for this function, this is executed on the atlantis unlock command (and maybe in a few more places), so by the time we reach this function the TF_DATA_DIR env is long gone.

Also the functions that have access to the final result of the TF_DATA_DIR only return a string containing the output and are being called everywhere in the code, I can create a model to represent the result containing both the output and the TF_DATA_DIR (and even more stuff related to the command result), but it will take me a few more days to adjust the code and the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uff I see what you are saying, and yes that will be a bunch more work.
Take all the time you need Fabiano.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to draft based on this thread. Yes please take all the time you need since this is a big change. Thank you for your contribution here

@Fabianoshz
Copy link
Contributor Author

@nitrocode looking at TF_DATA_DIR documentation I could not find anything that is workspace specific, so what prevent us from not changing TF_DATA_DIR at all? I think that if we just leave the default .terraform nothing should break, actually if we change workspace on the same directory terraform will make less calls to get provider and modules configs.

@github-actions github-actions bot added go Pull requests that update Go code provider/github labels Jan 25, 2023
@jamengual
Copy link
Contributor

jamengual commented Jan 25, 2023 via email

@nitrocode
Copy link
Member

@nitrocode looking at TF_DATA_DIR documentation I could not find anything that is workspace specific, so what prevent us from not changing TF_DATA_DIR at all? I think that if we just leave the default .terraform nothing should break, actually if we change workspace on the same directory terraform will make less calls to get provider and modules configs.

The TF_DATA_DIR is workspace specific. Here's an example.

mkdir tmp
cd tmp
terraform init
terraform workspace new fabianoshz
terraform workspace new jamengual
terraform workspace select jamengual

For @jamengual workspace

✗ TF_DATA_DIR=.terraform terraform workspace list
  default
  fabianoshz
* jamengual

✗ cat .terraform/environment
jamengual

For @Fabianoshz workspace

✗ TF_DATA_DIR=.terraform-fabianoshz terraform init
✗ TF_DATA_DIR=.terraform-fabianoshz terraform workspace select fabianoshz
✗ TF_DATA_DIR=.terraform-fabianoshz terraform workspace list
  default
* fabianoshz
  jamengual

✗ cat .terraform-fabianoshz/environment
fabianoshz

Since we want to do parallel plans and parallel applies for workspaces, we must have unique TF_DATA_DIR for each dir-workspace or parallel plans and applies will stomp over each other since each terraform workspace select will change the TF_DATA_DIR/environment value which controls the selected workspace which controls the tfstate to save the resources too.

@nitrocode nitrocode marked this pull request as draft January 30, 2023 12:47
@nitrocode nitrocode modified the milestones: v0.23.0, v0.24.0 Feb 18, 2023
@nitrocode
Copy link
Member

Hi @Fabianoshz any update on this PR ? We'd love to see this get implemented for the next release (0.24.0?).

If you can, please test out your fork in your own setup for a while to make sure there are no issues. Please let us know.

@jamengual
Copy link
Contributor

I will close this since updates have been stale for a long time.
If anyone is interested in working on this, comment on this PR or contact us in slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Pull requests that update Go code provider/github work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants