-
-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise inline parsing #549
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
colinodell
added
enhancement
New functionality or behavior
performance
Something could be made faster or more efficient
labels
Sep 26, 2020
colinodell
commented
Sep 26, 2020
colinodell
force-pushed
the
revise-inline-parsing
branch
from
September 26, 2020 21:41
0e5ed0d
to
00649fb
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
enhancement
New functionality or behavior
performance
Something could be made faster or more efficient
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR implements a new approach to inline parsing which is more flexible (and also 8-10% faster) than before.
Under the hood
In 1.x, an inline parser needed to define each character it was interested in parsing and then only attempting to parse the inlines (via more-expensive, time-consuming) regular expressions when those characters were encountered. This optimization drastically reduced the number of regular expressions we needed to parse and worked well for most of our uses cases.
The core part of this code looked something like this over-simplified example:
However, #514 and #492 (comment) showed us that this approach was not feasible in many cases where we need to match on more than just a single character. The "optimized" code above, while better than checking every single character with every single inline parser, still resulted in the inline parsers running many regexes, including many cases where they failed to match (and thus wasted valuable time).
So in 2.x we're taking a different approach. Instead of running
preg_match
for each parser every time we come across an interesting character, we're now runningpreg_match_all
for each parser exactly once per line. This provides the engine with a list of all positions in theCursor
that the parsers are interested in, which we can then iterate over.This approach has a few nice benefits:
InlineParserInterface
- it's no longer this completely separate step)From the developer's perspective
This new approach does involve tweaking implementations of
InlineParserInterface
but those tweaks are very straightforward. At a minimum, you must:getCharacters()
method togetMatchDefinition()
and change the return value$match
argument to theparse()
methodFor example:
That's the bare minimum you'd need to do. However, that new
$match
argument will contain the text matched by whatever you defined ingetMatchDefinition()
, meaning you could further optimize your parsing method. Take this parser for example:With the new approach,
parse()
is only going to be called if theInlineParserEngine
finds one of those two strings (case-insensitive), so$match
is always going to contain one of those results - no need to try and re-match it ourselves.New possibilities
This new approach opens up some new possibilities like #514, and it also resolves #492.