-
-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precomputing JS regexes for perf #878
Comments
Yeah, that's something I have been thinking about for a while but haven't come up with good ideas to handle it well. The main challenge is that it will introduce two versions of incompatible grammars, and if we could communicate well with the usage. I'll keep an eye on that.
Definitely! Thank you! Tho I am not sure if I would ever be able to help :P Seeing how you work I won't say I know anything about RegExp 🤣 |
Very cool to see this landing. 🎉 Re: the potential perf optimization described above, see the review comment I added here #880 (review) |
Do you think it would be a good idea to precompute regexes used by the JS engine so that the regex transpilation step isn't needed at runtime? Perhaps this would be optional. Presumably, it would cause some challenges for the fine-grained bundle approach, but maybe that could be avoided/simplified in a new major Shiki version.
With some extra, somewhat-complex work in the build script for grammars, this could also make the JS regexes used by some grammars faster, by avoiding the use of a
RegExp
subclass when it's not needed (details follow).oniguruma-to-es
'stoRegExp
outputs aRegExp
, which might be a regexp literal or might be an instance of itsEmulatedRegExp
subclass. The subclass has some minor perf overhead, which presumably can add up for large grammars applied to large source files.oniguruma-to-es
usesEmulatedRegExp
for two kinds of cases:\G
.groups
,indices
, andindices.groups
). This applies when using atomic groups, possessive quantifiers, subroutines, and some uses of recursion. In some cases the subclass simply hides injected "emulation groups" from results, and in other cases it's more sophisticated, transferring subpattern match values/indices in order to accurately emulate complex edge-case effects of Oniguruma subroutines.oniguruma-to-es
includes an option (avoidSubclass
) that avoids the use of a subclass (and any changes to the pattern that would rely on the subclass to be interpreted correctly). This is relevant to perf because, although enablingavoidSubclass
for regexes in case 1 (above) results in regexes not being emulatable (and throwing), for case-2 regexes it results in faster regexes that still match the same strings.The trick is that, if a particular regex in a grammar doesn't reference subpattern matches from outside the regex to do any additional handling, you can get a safe perf increase by not using the subclass. The build step that precomputes regexes could determine this based on whether the grammar refers to subpatterns for a given regex (using properties
captures
,beginCaptures
, etc. in the JSON). If it doesn't, you could either:toRegExp
throws withavoidSubclass
enabled. Or...toDetails
instead oftoRegExp
and check whether the result includes.options.strategy
. If so,EmulatedRegExp
needs to be used as a constructor to handle case 1, but otherwise you can run it again withavoidSubclass
and then just use a regexp literal orRegExp
as the constructor. Or...oniguruma-to-es
v0.10.0, you can check for.rawArgs.options.strategy
onEmulatedRegExp
instances. If present, you can use.toString()
(or.source
and.flags
) and use those to construct aRegExp
without the subclass.This extra handling when precomputing regexes would be totally optional, but should provide a meaningful perf increase for some grammars.
Perhaps this perf optimization could even be built into the current JS engine (without the need to precompile regexes), if the
patterns
passed to itsJavaScriptScanner
could include data about whether the patterns will have their subpatterns referred to from outside the regexes.Feel free to close this if you think it's not the right road to go down for whatever reason.
Aside: Just let me know @antfu if you're ever interested in being added as a collaborator for
oniguruma-to-es
. I'd be happy for you to make even fundamental changes that you think would be helpful.The text was updated successfully, but these errors were encountered: