I had reported an apparently glaring performance issue with the LSP here: #99 (comment)
It turns out this was due to an inefficiency while searching for types in module children: using `range` on a map yields a copy of each value, and it seems that this, in combination with `.Modules()`, caused entire modules to be cloned somehow (or something of the sort). I switched to `ModuleIds()`. In addition, I avoided cloning module children by just caching their names. With that, the issue seems to be fixed: CPU usage has gone down 5-6x when triggering completions and is now "normal", even in a project importing a large library such as `raylib.c3l`.

This could be made even more efficient by keeping a global map of type names to module pointers in the symbol table, but this should do for now.
Please consider giving it a quick spin locally first to confirm it's all right :)
## How I got here
To debug this, I added a CPU profiler to the LSP with the following diff, thanks to some sources found in a Google search:[^1][^2]
And then, while the LSP was running, I executed in my terminal:

```shell
curl -o analyze1.prof "localhost:6060/debug/pprof/profile?seconds=20"
```

and then proceeded to perform the operations that caused severe CPU usage and lag (triggering completions).
The saved file would then contain the information necessary to generate a flame graph of expensive function calls in the SpeedScope online app[^3], which I found in a blog post[^4]:
After some research, I found a StackOverflow post[^5] indicating that repeated calls to `runtime.duffcopy` mean a value is being copied many times, pointing to the usage of `range` in for loops. I tried fiddling with the outermost for loop in the lowest function in the stack, but it turned out that wasn't enough, leading me to create another flame graph, which pointed to the line iterating over `Modules()` as the culprit. That eventually led me to replace it with `ModuleIds()` followed by a simple `Get()`.

I also created `ChildrenNames()` as an additional optimization to avoid cloning module children unnecessarily. While Indexable types are usually pointers (and I forced `BaseIndexable` to be a pointer for its indexable implementation), some types, such as `Type` itself, don't appear to be.

After all this, I got the desired result of a fast and lean LSP. 🚀
[^1]: "Profiling Go programs with pprof", by Julia Evans: https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
[^2]: Official documentation for `net/http/pprof`, used to spin up a web server for generating CPU profiles while a program is running: https://golang.org/pkg/net/http/pprof/
[^3]: SpeedScope visualizer: https://www.speedscope.app/
[^4]: "FlameGraphs for Code Optimization with Golang and SpeedScope", by Sathish Vj: https://sathishvj.medium.com/flamegraphs-for-code-optimization-with-golang-and-speedscope-80c20725fdd2
[^5]: "runtime.duffcopy is called a lot" - Stack Overflow: https://stackoverflow.com/questions/45786687/runtime-duffcopy-is-called-a-lot