Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use caching for foldlevels to improve fold calculation performance #3054

Closed
wants to merge 2 commits into from

Conversation

lervag
Copy link
Owner

@lervag lervag commented Dec 11, 2024

Should resolve #3049.

One thing, though: How can I properly test that this indeed performs better than before?

@Konfekt Do you have any suggestion for how to more precisely test performance for folding?

@lervag lervag mentioned this pull request Dec 11, 2024
@Konfekt
Copy link
Contributor

Konfekt commented Dec 11, 2024

Works for me, it's a lot snappier as for larger files this is a question of seconds. One could think about retaining v:lnum in vimtex#fold#level(v:lnum), though for backwards compatibility (as I had set it explicitly in ftplugin).

I tested with a larger Tex file and

function! TestFoldingSpeed()
  let start_time = reltime()

  normal! zMzX

  let elapsed_time = reltimestr(reltime(start_time))
  echomsg "Folding took: " . elapsed_time
endfunction

sped up from 1.2 to 0.01 seconds if no change, and to 0.08 after a change

@lervag
Copy link
Owner Author

lervag commented Dec 13, 2024

One could think about retaining v:lnum in vimtex#fold#level(v:lnum), though for backwards compatibility (as I had set it explicitly in ftplugin).

Yes, I was unsure about this. The docs claim it is less efficient to pass the line number as an argument, but I'm not sure if that's really true or significant.

Works for me, it's a lot snappier as for larger files this is a question of seconds.

Well, I've tested this more thoroughly, and I think you might be wrong. I made the following test file test.vim

set nocompatible
" Use path to VimTeX here!
set runtimepath^=../..
filetype plugin on

function! Report()
  let l:sum = 0
  let l:max = 0
  for [l:key, l:value] in items(g:time_per_lnum)
    let l:sum += l:value
    if l:value > l:max
      let l:max = l:value
    endif
  endfor

  echo 'Number:  ' .. len(g:time_per_lnum)
  echo 'Sum:     ' .. l:sum
  echo 'Max:     ' .. l:max
  echo 'Avg:     ' .. l:sum/len(g:time_per_lnum)
  echo 'Repeats: ' .. len(g:repeats_per_lnum)
endfunction

let g:time_per_lnum = {}
let g:repeats_per_lnum = {}
function! FoldLevel()
  let start_time = reltime()
  let l:res = vimtex#fold#level(v:lnum)
  let elapsed_time = reltimefloat(reltime(start_time))
  if has_key(g:time_per_lnum, v:lnum)
    let g:repeats_per_lnum[v:lnum] = get(g:repeats_per_lnum, v:lnum, 0) + 1
  endif

  let g:time_per_lnum[v:lnum] = elapsed_time

  return l:res
endfunction

let g:vimtex_cache_root = "."
let g:vimtex_cache_persistent = v:false

setlocal foldmethod=expr
setlocal foldexpr=FoldLevel()
setlocal foldtext=vimtex#fold#text()

nnoremap <space>tt :<c-u>call Report()<cr>

silent edit LARGE_FILE.tex

On a file LARGE_FILE.tex with 26771 lines, I do find that it loads slowly now because the folds are computing (setting g:vimtex_fold_enabled = 0 makes it load fast). On the current master branch, the report after loading with nvim --clean -u test.vim and then doing <space>tt shows:

Number:  26771
Sum:     1.440198
Max:     3.84004e-4
Avg:     5.379695e-5
Repeats: 0

With the current perf/folding branch, I get this:

Number:  26771
Sum:     1.473739
Max:     1.340373
Avg:     5.504984e-5
Repeats: 0

When I move somewhere and start typing in the master branch, it won't repeat the fold calls on all lines. Instead, it only repeats from the position it changed something and upwards until a known state. See :help fold-expr; this is where the fold expression results "=", "s1", "a1" and so on are relevant - these require that we check the fold of the previous line and so on.

Notice, though, that the current perf/folding branch does not improve anything here significant here. In fact, it seems to break things, because we suddenly force refreshing on all lines when the lasttick is updated. With my minimal example above, if you press o, you will notice a large lag because it forces a recompute of the folds. There is no such lag on the current master branch.

@lervag
Copy link
Owner Author

lervag commented Dec 13, 2024

Or, to summarize: Vim and neovim already has a mechanism for caching fold calculations. My understanding is that the idea presented here with using the b:lasttick and a manual cache does not improve anything and in fact degrades performance, because it forces VimTeX to compute folds where it does not need to.

I will be glad to be proven wrong, but I would also be surprised.

Notice, though: zx and zX is meant to recompute the fold levels. And yes, if you did not change anything, this will clearly be wasteful - and in this particular case, this PR would improve things. However, on should in general not use zx or zX unless necessary - and it usually is not necessary.

lervag added a commit that referenced this pull request Dec 13, 2024
@lervag
Copy link
Owner Author

lervag commented Dec 13, 2024

I've added the fold timing measure script to the repo, for convenience: https://github.com/lervag/vimtex/blob/perf/folding/test/perf-folding/measure.vim

@Konfekt
Copy link
Contributor

Konfekt commented Dec 13, 2024

Well, I am surprised as well. Only loading a Lua file (see the referred PR) with folding enabled took ages in Vim without caching. On my old laptop, your large file loads in a few seconds with the performance branch, but takes felt a minute on the standard branch.

In numbers:

Number:  26771
Sum:     4.133511
Max:     3.64441e-4
Avg:     1.544026e-4
Repeats: 0

vs

Number:  26771
Sum:     43.644661
Max:     4.059075
Avg:     0.00163
Repeats: 0

I tested on Vim though.
I deemed custom fold functions on Neovim redundant thanks to Treesitter (absent on Vim).

@lervag
Copy link
Owner Author

lervag commented Dec 13, 2024

On my old laptop, your large file loads in a few seconds with the performance branch, but takes felt a minute on the standard branch.

I guess you meant "takes like a minute"?

I tested on Vim though. I deemed custom fold functions on Neovim redundant thanks to Treesitter (absent on Vim).

This is indeed very strange. Are you testing with the minimal configuration file or did you copy the test code into your full config?

On my end, I get comparable timings with both Vim and neovim. I'm on Arch Linux with neovim nightly and Vim 9.1.866.

It seems like my test code is useful, though, as it did capture the very long load time... if I were to be able to reproduce, I might figure out what is going on.

@Konfekt
Copy link
Contributor

Konfekt commented Dec 14, 2024

I guess you meant "takes like a minute"?

Yes, I guess I wanted to write takes what felt like a minute (but was a bit less).

I ran it with vim --clean -u test.vim and then doing <space>tt on Debian 12 with Vim 9.1.915

Comparing your results

Sum:     1.440198
Sum:     1.473739

the current method does not seem clearly faster, there's always an error margin.

It seems like my test code is useful, though

But does not the very simple

function! TestFoldingSpeed()
  let start_time = reltime()

  normal! zMzX

  let elapsed_time = reltimestr(reltime(start_time))
  echomsg "Folding took: " . elapsed_time
endfunction

also at least capture the time for recomputation of folds :help zx, something that we're interested in as well ? Why's that wrong?

When I move somewhere and start typing in the master branch, it won't repeat the fold calls on all lines. Instead, it only repeats from the position it changed something and upwards until a known state.

This is great as it shows that my issue vim/vim#16184 to implement this natively in Vim is already resolved. Where this is documented at :help fold-expr, though ? (Are you looking It seemingly says something opposite

Note: Since the expression has to be evaluated for every line, this fold
method can be very slow!

Of course, everyone can have a different interpretation of every when the containing set is not mentioned, say every line of all lines that changed, but the most naive interpretation is all lines of the buffer.

See :help fold-expr; this is where the fold expression results "=", "s1", "a1" and so on are relevant - these require that we check the fold of the previous line and so on.

This shows mastery of :help fold-expr but again I don't see how this restricts the foldexpr to the changed lines only.

Looking at the source (let me take Vim's, as Neovim's uses Treesitter), there's

//  src/vim.h (line 1851)
# define MAXLNUM LONG_MAX		// maximum (invalid) line number

used in

/*  src/fold.c (lines 1216-1223) */
checkupdate(win_T *wp)
{
    if (!wp->w_foldinvalid)
	return;

    foldUpdate(wp, (linenr_T)1, (linenr_T)MAXLNUM); // will update all
    wp->w_foldinvalid = FALSE;
}

called by

/*  src/fold.c (lines 836-863) */
	// Mark all folds from top to bot (or bot to top) as maybe-small.
	if (top > bot)
	{
	    maybe_small_start = bot;
	    maybe_small_end = top;
	}
	(void)foldFind(&wp->w_folds, maybe_small_start, &fp);
	while (fp < (fold_T *)wp->w_folds.ga_data + wp->w_folds.ga_len
		&& fp->fd_top <= maybe_small_end)
	{
	    fp->fd_small = MAYBE;
	    ++fp;
	}
    }

    if (foldmethodIsIndent(wp)
	    || foldmethodIsExpr(wp)
	    || foldmethodIsMarker(wp)
#ifdef FEAT_DIFF
	    || foldmethodIsDiff(wp)
#endif
	    || foldmethodIsSyntax(wp))
    {
	int save_got_int = got_int;

	// reset got_int here, otherwise it won't work
	got_int = FALSE;
	foldUpdateIEMS(wp, top, bot);

on updating folds; at least in Vim no check for changed lines springs to my eye, but this was just a hasty grep.

@Konfekt
Copy link
Contributor

Konfekt commented Dec 14, 2024

I might figure out what is going on.

I noticed that you cannot use your test.vim as is for perf/folding 833db84 as it uses v:lnum in the foldexpr; it works for master 3401dc8 though

@lervag
Copy link
Owner Author

lervag commented Dec 14, 2024

I noticed that you cannot use your test.vim as is for perf/folding 833db84 as it uses v:lnum in the foldexpr; it works for master 3401dc8 though

Oh, no, I think it should work as expected. But you will probably need to force pull. That is, I force pushed the branch a couple of times. So do git fetch then git co perf/folding then git --reset hard origin/perf/folding.

I guess you meant "takes like a minute"?

Yes, I guess I wanted to write takes what felt like a minute (but was a bit less).

I ran it with vim --clean -u test.vim and then doing <space>tt on Debian 12 with Vim 9.1.915

Comparing your results

Sum:     1.440198
Sum:     1.473739

Well, that's clearly not a significant difference.

But does not the very simple

function! TestFoldingSpeed()
  let start_time = reltime()

  normal! zMzX

  let elapsed_time = reltimestr(reltime(start_time))
  echomsg "Folding took: " . elapsed_time
endfunction

No, not really. zX will force Vim to recompute the folds. And clearly, if there was no change, then b:changedtick did not change and the cache does not need to update. But my point is that you more or less never really need to do zx or zX, so this simple test does not really test anything useful.

also at least capture the time for recomputation of folds :help zx, something that we're interested in as well ? Why's that wrong?

It is wrong because, as I said, we want to inspect the cost of recomputing folds when there actually was a change that requires recomputing anything. Users should not need to do zx or zX.

When I move somewhere and start typing in the master branch, it won't repeat the fold calls on all lines. Instead, it only repeats from the position it changed something and upwards until a known state.

This is great as it shows that my issue vim/vim#16184 to implement this natively in Vim is already resolved.

I think this is not a recent change, but how foldexpr has worked for a long time.

Where this is documented at :help fold-expr, though ? (Are you looking It seemingly says something opposite

Note: Since the expression has to be evaluated for every line, this fold
method can be very slow!

Yes; when you open a file, the expression is evaluated for every line. But this does not actually say it is evaluated for every line every time.

See :help fold-expr; this is where the fold expression results "=", "s1", "a1" and so on are relevant - these require that we check the fold of the previous line and so on.

This shows mastery of :help fold-expr but again I don't see how this restricts the foldexpr to the changed lines only.

This part of the docs partly explain how the foldexpr function works. After an initial recompute (e.g. on file open or after zx), consequent recomputes on a line (e.g. if it changed) will recursively recompute until if finds a line with a well-defined state. I.e., if I changed something on line 253, then Vim will recompute the fold. If the result is =, then it goes to line 252. Still =, then go to 251, and let's say the result is now a1, then we need to continue to 250. If the result on line 250 is >1, then the state is known and we don't need to look further.

Looking at the source (let me take Vim's, as Neovim's uses Treesitter), there's

No, Neovim uses more or less the same source for the foldexpr code. I mean, if it did not, then VimTeX's foldexpr function wouldn't work on Neovim.

If you want, I can try to study the source code to show how this built-in cache works. To be honest, I'm not very well versed in C or C++, so it'll be challenging for me. But I am quite certain I'm right here.

on updating folds; at least in Vim no check for changed lines springs to my eye, but this was just a hasty grep.

Yes, I think you are looking at the wrong parts here.


That said, you claim that you have an example/situation where the present feature in pref/folding improves the loading time for VimTeX from ~40 seconds to just ~1 seconds. That really sounds wrong to me, but if it is true, then I would be very happy if you could help me reproduce this.

@Konfekt
Copy link
Contributor

Konfekt commented Dec 14, 2024

But my point is that you more or less never really need to do zx or zX,

Nothing is needed, it's comes down to user preference.
After having positioned my cursor somewhere inside folds, hitting zx, closing folds opened elsewhere and opening those around the cursor, helps to locate the cursor globally inside the buffer.

we want to inspect the cost of recomputing folds when there actually was a change that requires recomputing anything. Users should not need to do zx or zX.

I checked above

sped up from 1.2 to 0.01 seconds if no change, and to 0.08 after a change, but found the speed-up to be still significant enough.

After an initial recompute (e.g. on file open or after zx), consequent recomputes on a line (e.g. if it changed) will recursively recompute until if finds a line with a well-defined state. I.e., if I changed something on line 253, then Vim will recompute the fold. If the result is =, then it goes to line 252. Still =, then go to 251, and let's say the result is now a1, then we need to continue to 250.

I cannot infer this from the help. We are reading it differently. I rather read this as another warning

Try to avoid the "=", "a" and "s" return values, since Vim often has to search
backwards for a line for which the fold level is defined. This can be slow.

that one should not try to compute the fold level of a line by using those of other lines.

Yes; when you open a file, the expression is evaluated for every line. But this does not actually say it is evaluated for every line every time.

What you explain sounds very sensible (and would have designed it that way), and that's why I'm surprised that something like this doesn't seem to happen. I couldn't see it in the Vim source, but what brought me here was rather that the simple Lua foldexpr (converted from Lua to vimscript in vim/vim#16151) was too slow to load on larger files, say https://github.com/skywind3000/z.lua/blob/1.8.13/z.lua before caching was added.

Yes, I think you are looking at the wrong parts here.

Proving wrong is the best way to resolve open questions on the internet

@lervag
Copy link
Owner Author

lervag commented Dec 14, 2024

But my point is that you more or less never really need to do zx or zX,

Nothing is needed, it's comes down to user preference. After having positioned my cursor somewhere inside folds, hitting zx, closing folds opened elsewhere and opening those around the cursor, helps to locate the cursor globally inside the buffer.

I don't think you need zx for that? You could use zv? Again, zx recomputes the folds, which you don't need to do. If you want to somehow reset which folds are open, you should rather use zv, zm, zM, zr, zR, or similar.

we want to inspect the cost of recomputing folds when there actually was a change that requires recomputing anything. Users should not need to do zx or zX.

I checked above

sped up from 1.2 to 0.01 seconds if no change, and to 0.08 after a change, but found the speed-up to be still significant enough.

I'm not trying to be annoying here, but: how did you measure this? If you still insist on using zx or zX while measuring this, then yes, you will see the speed-up.

After an initial recompute (e.g. on file open or after zx), consequent recomputes on a line (e.g. if it changed) will recursively recompute until if finds a line with a well-defined state. I.e., if I changed something on line 253, then Vim will recompute the fold. If the result is =, then it goes to line 252. Still =, then go to 251, and let's say the result is now a1, then we need to continue to 250.

I cannot infer this from the help. We are reading it differently.

Well, perhaps I need to read the source code to show you what I mean then. But first, let me try one more time. My understanding, both from reading the docs and from having spent a lot of time writing foldexprs and testing them:

  • Vim will compute the foldlevel for each line when a file is opened.
  • Vim will compute the foldlevel for each line when you do zx or zX.
  • For all other situations, Vim does not compute the foldlevel for each line, only for the lines from a specific change and upwards until it reaches a known state.

I claim that my measure.vim file proves this. If you use the same FoldLevel() function you will find that g:time_per_lnum will be populated with a key for each line number when you open a file. If you were to reset this dictionary with :let g:time_per_lnum = {}, then make some changes in your file, then you should notice that it will not be populated with a key for each line of the file, only for the line numbers that were needed to get a new valid fold state.

… what brought me here was rather that the simple Lua foldexpr (converted from Lua to vimscript in vim/vim#16151) was too slow to load on larger files, say https://github.com/skywind3000/z.lua/blob/1.8.13/z.lua before caching was added.

Well, in the version of the Lua foldexpr before you added the cache (based on this diff), the calculations are done by actually iterating every single line every time. That's very inefficient! And the cache you've implemented here will be a huge improvement, because even if you do refresh by recomputing the folds on each line everytime you make a change, it is still much better than it would be.

But now we are not discussing the Lua foldlevel function. Let's focus on VimTeX and its foldlevel function. I won't mind improving it, but I am claiming that the current PR will lead to a worse experience. As a proof of this:

  1. Checkout the last version of the branch perf/folding.
  2. Ensure you have VimTeX folds enabled.
  3. Open a huge TeX file.
  4. Go somewhere in the file and press o. Since this makes a change and b:changedtick is increased it will recompute the folds on each line.

On my end, this lags a lot! While on the master branch there is no lag, because Vim will not, as I've already claimed a lot of times, not recompute foldlevel on every line.

Further, the startuptime on the master branch is more or less exactly the same as on the perf/folding branch.

The only scenario where perf/folding is better is when I do zx or zX without having changed anything (so that b:changedtick did not change). And yes, in this particular case, there is a huge improvement. But I've tried to explain why I think this does not really matter, because you should avoid using zx - instead do e.g. zMzv or similar.

Now, you say "Proving wrong is the best way to resolve open questions on the internet", and I agree. I've tried to do exactly that.

@Konfekt
Copy link
Contributor

Konfekt commented Dec 14, 2024

You could use zv?

I thought about that while writing, but it doesn't close other folds so helps not as much locating the global position inside a long buffer. But you already give the remedy

zMzv

I never gave it much thought, as the slight lag on hitting zx was just an occasion to recompute folds (postponed by https://github.com/rkhaja/FastFold).

Funnily my muscle memory navigating folds has been tied to zx, so I measured the efficiency of a fold function by the speed of it which is the only scenario you exclude.

This explains our complimentary experiences.

It used to be that the recomputation of folds slowed down insert mode by recomputing them on every changed character (also in TeX files).
Maybe I should have followed the evolution of your &foldexpr that eventually rendered workarounds like Fastfold unnecessary.

:let g:time_per_lnum = {}, then make some changes in your file, then you should notice that it will not be populated with a key for each line of the file, only for the line numbers that were needed to get a new valid fold state.

I can confirm this, so yes, that's an efficient &foldexpr.

Pulling and resetting hard to latest master b8bb79b and perf/folding 79e78ee the results of vim -u NONE test.vim followed by <space>tt vary wildly, but with no clear winner.
So yes, my impression was won by looking at the performance zx.

both from reading the docs and from having spent a lot of time writing foldexpr

Hats off!
I get an understanding that Vim's notoriously slow folding hinges rather on amateurish &foldexprs than Vim's evaluation speed.

I am not sure whether similar improvements can be made to syntax folding, though.
Maybe that's the reason this plug-in used a new &foldexpr instead of Vim's built-in syntax folding?

@lervag
Copy link
Owner Author

lervag commented Dec 14, 2024

zMzv

I never gave it much thought, as the slight lag on hitting zx was just an occasion to recompute folds (postponed by https://github.com/rkhaja/FastFold).

Funnily my muscle memory navigating folds has been tied to zx, so I measured the efficiency of a fold function by the speed of it which is the only scenario you exclude.

This explains our complimentary experiences.

Great, I'm glad to see we are reaching a common understanding here.

It used to be that the recomputation of folds slowed down insert mode by recomputing them on every changed character (also in TeX files). Maybe I should have followed the evolution of your &foldexpr that eventually rendered workarounds like Fastfold unnecessary.

Yes, I do remember that I started using Fastfold at some time and that it really did feel necessary. I no longer use it, and I don't know when it stopped being unnecessary. But it would not be surprising if there were updates to Vim and neovim that somehow did improve the fold experience in a way that rendered Fastfold less relevant.

:let g:time_per_lnum = {}, then make some changes in your file, then you should notice that it will not be populated with a key for each line of the file, only for the line numbers that were needed to get a new valid fold state.

I can confirm this, so yes, that's an efficient &foldexpr.

Good, thanks. I've actually put a lot of work in figuring out how to make it relatively efficient.

But, as is also clear from the present issue: on huge files, there is still a toll e.g. on startup time and it can take on the order of a second to load a file just for the folding. It's not great, but I don't know how to improve that much more. Still, most files I work on are not nearly as large and for just commonly large files it is not a big problem.

Pulling and resetting hard to latest master b8bb79b and perf/folding 79e78ee the results of vim -u NONE test.vim followed by <space>tt vary wildly, but with no clear winner. So yes, my impression was won by looking at the performance zx.

If you have zx imprinted as muscle memory, then I think I would advice to use nnoremap zx zMzv or even nnoremap zx zMzvzz. I think that would more or less give the same experience, except it avoids the recomputations. And for the very rare occasions where you might want the recomputation, you could just do :normal! zx or zX.

And just to be clear: sometimes you might want to do zx or zX. In some cases I do find that foldexpr based folding, and/or perhaps syntax based folding, becomes "insonsistent", and it in these cases a zx may actually help.

both from reading the docs and from having spent a lot of time writing foldexpr

Hats off! I get an understanding that Vim's notoriously slow folding hinges rather on amateurish &foldexprs than Vim's evaluation speed.

Thanks; and yes, indeed, I think that observation is quite correct. Although, it is really not trivial to write perfomant foldexpr code, due to the complexity of how it works. So perhaps using the word "amateurish" is being a little bit too harsh.

I am not sure whether similar improvements can be made to syntax folding, though. Maybe that's the reason this plug-in used a new &foldexpr instead of Vim's built-in syntax folding?

I've never really liked syntax based folding; the times I've used it I've often found myself thinking it is not so robust. But to be fair, I don't think I ever actually wrote syntax rules that also specify folding rules. I think I found that using &foldexpr was easier to understand, which is probably the main reason I've gone that route.


Ok, it's been a relatively long discussion here. But if I understand correctly we are now in agreement that this PR should not be merged. Right?

Konfekt added a commit to Konfekt/FastFold that referenced this pull request Dec 15, 2024
@Konfekt
Copy link
Contributor

Konfekt commented Dec 15, 2024

Yes, closing it is fine (as well as vim/vim#16184 (comment)). I am now skipping foldexpr in Fastfold a try, but keep it for syntax folds.

What would make sense though could be to document your shared knowledge at :help fold-expr by a PR at https://github.com/vim/vim/ as likely one of world's leading experts on this; I could first try to write down what I understood, and ask for you comments

@lervag lervag closed this Dec 15, 2024
@lervag lervag deleted the perf/folding branch December 15, 2024 11:40
@lervag
Copy link
Owner Author

lervag commented Dec 15, 2024

Yes, closing it is fine (as well as vim/vim#16184 (comment)).

👍

What would make sense though could be to document your shared knowledge at :help fold-expr by a PR at https://github.com/vim/vim/ as likely one of world's leading experts on this; I could first try to write down what I understood, and ask for you comments

Yes, I guess that's a good idea. I don't think we really need much, except perhaps rewording a few of the paragraphs that cause confusion? Like for instance, we could qualify the sentence that says "... every line" to indicate that this is only relevant when initializing (or recomputing) folds.

I'll be happy to review and contribute, especially if you write a first version!

@Konfekt
Copy link
Contributor

Konfekt commented Dec 15, 2024

Here we go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cache fold levels
2 participants