Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid parsing symlinked files twice to save CPU and memory #1409

Open
radeksimko opened this issue Sep 15, 2023 · 0 comments
Open

Avoid parsing symlinked files twice to save CPU and memory #1409

radeksimko opened this issue Sep 15, 2023 · 0 comments
Labels
enhancement New feature or request performance Gotta go fast

Comments

@radeksimko
Copy link
Member

Context

As pointed out initially in #1398 the language server currently handles symlinks in way that is far from ideal from resource consumption perspective.

For example, given two modules which have files linked between them

.
├── module-1
│   ├── main.tf
│   └── shared.tf
└── module-2
    ├── main.tf
    └── link-to-shared.tf -> ../module-1/shared.tf

we spend CPU to parse shared.tf twice and also store both the AST (*hcl.File) and any diagnostics ([]*hcl.Diagnostic) twice (i.e. it ends up taking twice as much space in memory).

Proposal

Update the parser package to ensure that it looks up the AST & diagnostics for existing file - i.e. does a lookup in memdb and saves the same pointers to the map/slice.

One unfortunate edge case (effectively a race condition) which needs addressing is that the link destination (module which contains the regular file which has links pointing to it from elsewhere) may not be parsed yet at the time that the link itself is being parsed.

We can try to decouple file walking logic from the terraform/parser package to

  • walk files in a directory before scheduling jobs to discover any links
  • ensure that modules are walked in the right order

Implementation Notes

Here's a simple test which can test whether two symlinked files are are sharing the same pointer:

func TestParseModuleConfiguration(t *testing.T) {
	ctx := context.Background()

	testFsPath, err := filepath.Abs(".")
	if err != nil {
		t.Fatal(err)
	}
	testPath := filepath.Join("testdata", "modules-with-symlinks")
	alphaPath := filepath.Join(testPath, "alpha")
	betaPath := filepath.Join(testPath, "beta")

	testFs := os.DirFS(testFsPath).(ReadOnlyFS)

	ss, err := state.NewStateStore()
	if err != nil {
		t.Fatal(err)
	}

	err = ss.Modules.Add(alphaPath)
	if err != nil {
		t.Fatal(err)
	}
	err = ss.Modules.Add(betaPath)
	if err != nil {
		t.Fatal(err)
	}

	err = ParseModuleConfiguration(ctx, testFs, ss.Modules, alphaPath)
	if err != nil {
		t.Fatal(err)
	}

	err = ParseModuleConfiguration(ctx, testFs, ss.Modules, betaPath)
	if err != nil {
		t.Fatal(err)
	}

	alphaMod, err := ss.Modules.ModuleByPath(alphaPath)
	if err != nil {
		t.Fatal(err)
	}

	betaMod, err := ss.Modules.ModuleByPath(betaPath)
	if err != nil {
		t.Fatal(err)
	}

	if alphaMod.ParsedModuleFiles["shared.tf"] != betaMod.ParsedModuleFiles["link-to-shared.tf"] {
		t.Fatal("linked file mismatch")
	}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Gotta go fast
Projects
None yet
Development

No branches or pull requests

1 participant