Refine inline parsing #560

Merged
merged 5 commits into from
Oct 3, 2020
9 changes: 8 additions & 1 deletion CHANGELOG.md
Expand Up @@ -11,6 +11,7 @@ See <https://commonmark.thephpleague.com/2.0/upgrading/> for detailed informatio
- Added new `FrontMatterExtension` ([see documentation](https://commonmark.thephpleague.com/extensions/front-matter/))
- Added the ability to delegate event dispatching to PSR-14 compliant event dispatcher libraries
- Added the ability to configure disallowed raw HTML tags (#507)
- Added the ability for Mentions to use multiple characters for their symbol (#514, #550)
- Added `heading_permalink/min_heading_level` and `heading_permalink/max_heading_level` options to control which headings get permalinks (#519)
- Added `footnote/backref_symbol` option for customizing backreference link appearance (#522)
- Added new `HtmlFilter` and `StringContainerHelper` utility classes
Expand Down Expand Up @@ -39,6 +40,10 @@ See <https://commonmark.thephpleague.com/2.0/upgrading/> for detailed informatio
- `FencedCode::setInfo()`
- `Heading::setLevel()`
- `HtmlRenderer::renderDocument()`
- `InlineParserContext::getFullMatch()`
- `InlineParserContext::getFullMatchLength()`
- `InlineParserContext::getMatches()`
- `InlineParserContext::getSubMatches()`
- `InvalidOptionException::forConfigOption()`
- `InvalidOptionException::forParameter()`
- `LinkParserHelper::parsePartialLinkLabel()`
Expand All @@ -54,6 +59,9 @@ See <https://commonmark.thephpleague.com/2.0/upgrading/> for detailed informatio
### Changed

- `CommonMarkConverter::convertToHtml()` now returns an instance of `RenderedContentInterface`. This can be cast to a string for backward compatibility with 1.x.
- Changes to configuration options:
- `mentions/*/symbol` has been renamed to `mentions/*/prefix`
- `mentions/*/regex` now requires partial regular expressions (without delimiters or flags)
- Event dispatching is now fully PSR-14 compliant
- Moved and renamed several classes - [see the full list here](https://commonmark.thephpleague.com/2.0/upgrading/#classesnamespaces-renamed)
- Implemented a new approach to block parsing. This was a massive change, so here are the highlights:
Expand All @@ -63,7 +71,6 @@ See <https://commonmark.thephpleague.com/2.0/upgrading/> for detailed informatio
- The paragraph parser no longer needs to be added manually to the environment
- Implemented a new approach to inline parsing where parsers can now specify longer strings or regular expressions they want to parse (instead of just single characters):
- `InlineParserInterface::getCharacters()` is now `getMatchDefinition()` and returns an instance of `InlineParserMatch`
- `InlineParserInterface::parse()` has a new parameter containing the pre-matched text
- `InlineParserContext::__construct()` now requires the contents to be provided as a `Cursor` instead of a `string`
- Implemented delimiter parsing as a special type of inline parser (via the new `DelimiterParser` class)
- Changed block and inline rendering to use common methods and interfaces
Expand Down
45 changes: 26 additions & 19 deletions docs/2.0/customization/inline-parsing.md
Expand Up @@ -56,8 +56,20 @@ This method will be called if both conditions are met:

#### Parameters

* `string $match` - Contains the text that matches the start pattern from `getMatchDefinition()`
* `InlineParserContext $inlineContext` - Encapsulates the current state of the inline parser, including the [`Cursor`](/2.0/customization/cursor/) used to parse the current line. (Note that the cursor will be positioned **before** the matching text, so you must advance it yourself if you determine it's a valid match)
* `InlineParserContext $inlineContext` - Encapsulates the current state of the inline parser - see more information below.

##### InlineParserContext

This class has several useful methods:

* `getContainer()` - Returns the current container block the inline text was found in. You'll almost always call `$inlineContext->getContainer()->appendChild(...)` to add the parsed inline text inside that block.
* `getReferenceMap()` - Returns the document's reference map
* `getCursor()` - Returns the current [`Cursor`](/2.0/customization/cursor/) used to parse the current line. (Note that the cursor will be positioned **before** the matched text, so you must advance it yourself if you determine it's a valid match)
* `getDelimiterStack()` - Returns the current delimiter stack. Only used in advanced use cases.
* `getFullMatch()` - Returns the full string that matched your `InlineParserMatch` definition
* `getFullMatchLength()` - Returns the length of the full match - useful for advancing the cursor
* `getSubMatches()` - If your `InlineParserMatch` used a regular expression with capture groups, this will return the text matched by those groups.
* `getMatches()` - Returns an array where index `0` is the "full match", plus any sub-matches. It basically simulates `preg_match()`'s behavior.
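
To make the relationship between the match-related methods concrete, here is a plain-PHP sketch that calls `preg_match()` directly instead of going through the library. It assumes, based on the description above, that `getMatches()` mirrors the `$matches` array from `preg_match()`. The sample text and variable names are made up for illustration, and the way the partial regex is wrapped in delimiters here is an assumption for this standalone demo, not necessarily how the library does it internally.

```php
<?php

// Partial regex, as it would be handed to InlineParserMatch::regex() (no delimiters or flags)
$partialRegex = '@([A-Za-z0-9_]{1,15}(?!\w))';

// Assumption: we add the delimiters ourselves purely for this standalone demo
\preg_match('/' . $partialRegex . '/', 'Follow @colinodell today', $matches);

// Index 0 is the full match - analogous to getFullMatch()
$fullMatch = $matches[0];

// Analogous to getFullMatchLength(); strlen() is enough for this ASCII-only input
$fullMatchLength = \strlen($fullMatch);

// Everything after index 0 comes from capture groups - analogous to getSubMatches()
$subMatches = \array_slice($matches, 1);
```

Here `$matches` plays the role of `getMatches()`: the full match sits at index `0` and the handle captured by the group sits at index `1`.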

#### Return value

Expand Down Expand Up @@ -87,10 +99,9 @@ class TwitterHandleParser implements InlineParserInterface
{
public function getMatchDefinition(): InlineParserMatch
{
// Note that you could match the entire regex here instead of in parse() if you wish
return InlineParserMatch::string('@');
return InlineParserMatch::regex('@([A-Za-z0-9_]{1,15}(?!\w))');
}
public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();
// The @ symbol must not have any other characters immediately prior
Expand All @@ -99,17 +110,13 @@ class TwitterHandleParser implements InlineParserInterface
// peek() doesn't modify the cursor, so no need to restore state first
return false;
}
// Save the cursor state in case we need to rewind and bail
$previousState = $cursor->saveState();
// Advance past the @ symbol to keep parsing simpler
$cursor->advance();
// Parse the handle
$handle = $cursor->match('/^[A-Za-z0-9_]{1,15}(?!\w)/');
if (empty($handle)) {
// Regex failed to match; this isn't a valid Twitter handle
$cursor->restoreState($previousState);
return false;
}

// This seems to be a valid match
// Advance the cursor to the end of the match
$cursor->advanceBy($inlineContext->getFullMatchLength());

// Grab the Twitter handle
[$handle] = $inlineContext->getSubMatches();
$profileUrl = 'https://twitter.com/' . $handle;
$inlineContext->getContainer()->appendChild(new Link($profileUrl, '@' . $handle));
return true;
Expand Down Expand Up @@ -140,17 +147,17 @@ class SmilieParser implements InlineParserInterface
return InlineParserMatch::oneOf(':)', ':(');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();

// Advance the cursor past the 2 matched chars since we're able to parse them successfully
$cursor->advanceBy(2);

// Add the corresponding image
if ($match === ':)') {
if ($inlineContext->getFullMatch() === ':)') {
$inlineContext->getContainer()->appendChild(new Image('/img/happy.png'));
} elseif ($match === ':(') {
} elseif ($inlineContext->getFullMatch() === ':(') {
$inlineContext->getContainer()->appendChild(new Image('/img/sad.png'));
}

Expand Down
12 changes: 12 additions & 0 deletions docs/2.0/upgrading.md
Expand Up @@ -132,6 +132,14 @@ As a result of making this change, the `addBlockParser()` method on `Configurabl

See [the block parsing documentation](/2.0/customization/block-parsing/) for more information on this new approach.

## New Inline Parsing Approach

The `getCharacters()` method on `InlineParserInterface` has been replaced with a more-robust `getMatchDefinition()` method which allows your parser to match against more than just single characters. All custom inline parsers will need to change to this new approach.

Additionally, when the `parse()` method is called, the Cursor is no longer proactively advanced past the matching character/start position for you; you'll need to advance it yourself. However, the `InlineParserContext` now provides the fully-matched text and its length, so you can simply `advanceBy()` the cursor instead of running an expensive `$cursor->match()` yourself, which is a nice performance optimization.

See [the inline parsing documentation](/2.0/customization/inline-parsing/) for more information on this new approach.
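
As a rough illustration of the "advance it yourself" step, the sketch below uses a tiny stand-in class rather than the library's real `Cursor`. `SketchCursor` and its `getRemainder()` method are hypothetical and exist only for this demo, and the hard-coded `$fullMatch` stands in for what `getFullMatch()` would report for this input.

```php
<?php

// Hypothetical stand-in for the library's Cursor, with just enough behavior for this demo
final class SketchCursor
{
    private string $line;
    private int $position = 0;

    public function __construct(string $line)
    {
        $this->line = $line;
    }

    // Move forward by the given number of characters, never past the end of the line
    public function advanceBy(int $characters): void
    {
        $this->position = \min($this->position + $characters, \strlen($this->line));
    }

    // Text that has not been consumed yet (ASCII-only demo, so byte offsets are fine)
    public function getRemainder(): string
    {
        return \substr($this->line, $this->position);
    }
}

// The cursor starts *before* the matched text, as it does when parse() is called
$cursor = new SketchCursor(':) hello');

// Stand-in for what getFullMatch() would report for this input
$fullMatch = ':)';

// Consume the match using its known length instead of re-running a regex
$cursor->advanceBy(\strlen($fullMatch));

$remainder = $cursor->getRemainder();
```

In a real parser the length would come from `$inlineContext->getFullMatchLength()`, and the advance would only happen once you've decided the match is valid.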

## Rendering Changes

This library no longer differentiates between block renderers and inline renderers - everything now uses "node renderers" which allow us to have a unified approach to rendering! As a result, the following changes were made, which you may need to change in your custom extensions:
Expand Down Expand Up @@ -237,6 +245,10 @@ This previously-deprecated constant was removed in 2.0. Use `HeadingPermalinkRen

This previously-deprecated configuration option was removed in 2.0. Use `heading_permalink/symbol` instead.

## `mentions/*/regex` configuration option

Full regexes are no longer supported. Remove the leading/trailing `/` delimiters and any PCRE flags. For example: `/[\w_]+/iu` should be changed to `[\w_]+`.
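
If you have many such options to migrate, a small helper along these lines could do the conversion. This is only a sketch: `toPartialRegex()` is a hypothetical function, not part of the library, and it assumes the same character is used as both the opening and closing delimiter (so bracket-style delimiters such as `{...}` are not handled). Any flags are simply dropped, so check whether your pattern relied on them.

```php
<?php

// Hypothetical helper (not part of the library): strips the delimiters and
// trailing PCRE flags from a full regex, leaving only the partial pattern.
function toPartialRegex(string $fullRegex): string
{
    if ($fullRegex === '') {
        throw new \InvalidArgumentException('Expected a non-empty regex');
    }

    // Assumption: the first character is the delimiter and it also closes the pattern
    $delimiter = $fullRegex[0];
    $end = \strrpos($fullRegex, $delimiter);

    if ($end === false || $end === 0) {
        throw new \InvalidArgumentException('Closing delimiter not found');
    }

    // Keep everything between the two delimiters; flags after the last one are discarded
    return \substr($fullRegex, 1, $end - 1);
}

$partial = toPartialRegex('/[\w_]+/iu');
```

With the example from above, `$partial` ends up as `[\w_]+`, which is the form the `mentions/*/regex` option now expects.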

## `ArrayCollection` methods

Several methods were removed from this class - here are the methods along with possible alternatives you can switch to:
Expand Down
4 changes: 2 additions & 2 deletions src/Delimiter/DelimiterParser.php
Expand Up @@ -32,9 +32,9 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::oneOf(...$this->collection->getDelimiterCharacters());
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$character = $match;
$character = $inlineContext->getFullMatch();
$numDelims = 0;
$cursor = $inlineContext->getCursor();
$processor = $this->collection->getDelimiterProcessor($character);
Expand Down
14 changes: 10 additions & 4 deletions src/Exception/InvalidOptionException.php
Expand Up @@ -16,12 +16,18 @@
final class InvalidOptionException extends \UnexpectedValueException
{
/**
* @param string $option Name/path of the option
* @param mixed $valueGiven The invalid option that was provided
* @param string $option Name/path of the option
* @param mixed $valueGiven The invalid option that was provided
* @param ?string $description Additional text describing the issue (optional)
*/
public static function forConfigOption(string $option, $valueGiven): self
public static function forConfigOption(string $option, $valueGiven, ?string $description = null): self
{
return new self(\sprintf('Invalid config option for "%s": %s', $option, self::getDebugValue($valueGiven)));
$message = \sprintf('Invalid config option for "%s": %s', $option, self::getDebugValue($valueGiven));
if ($description !== null) {
$message .= \sprintf(' (%s)', $description);
}

return new self($message);
}

/**
Expand Down
6 changes: 3 additions & 3 deletions src/Extension/Attributes/Parser/AttributesInlineParser.php
Expand Up @@ -27,12 +27,12 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::oneOf(' ', '{');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$char = $match;
$char = $inlineContext->getFullMatch();
$cursor = $inlineContext->getCursor();
if ($char === '{') {
$char = (string) $cursor->getCharacter($cursor->getPosition() - 1);
$char = (string) $cursor->peek(-1);
}

$attributes = AttributesHelper::parseAttributes($cursor);
Expand Down
13 changes: 7 additions & 6 deletions src/Extension/Autolink/EmailAutolinkParser.php
Expand Up @@ -27,20 +27,21 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::regex(self::REGEX);
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$email = $inlineContext->getFullMatch();
// The last character cannot be - or _
if (\in_array(\substr($match, -1), ['-', '_'], true)) {
if (\in_array(\substr($email, -1), ['-', '_'], true)) {
return false;
}

// Does the URL end with punctuation that should be stripped?
if (\substr($match, -1) === '.') {
$match = \substr($match, 0, -1);
if (\substr($email, -1) === '.') {
$email = \substr($email, 0, -1);
}

$inlineContext->getCursor()->advanceBy(\strlen($match));
$inlineContext->getContainer()->appendChild(new Link('mailto:' . $match, $match));
$inlineContext->getCursor()->advanceBy(\strlen($email));
$inlineContext->getContainer()->appendChild(new Link('mailto:' . $email, $email));

return true;
}
Expand Down
2 changes: 1 addition & 1 deletion src/Extension/Autolink/UrlAutolinkParser.php
Expand Up @@ -76,7 +76,7 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::oneOf(...$this->prefixes);
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();

Expand Down
20 changes: 10 additions & 10 deletions src/Extension/CommonMark/Parser/Inline/AutolinkParser.php
Expand Up @@ -25,30 +25,30 @@
final class AutolinkParser implements InlineParserInterface
{
private const EMAIL_REGEX = '<([a-zA-Z0-9.!#$%&\'*+\\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)>';
private const OTHER_LINK_REGEX = '<[A-Za-z][A-Za-z0-9.+-]{1,31}:[^<>\x00-\x20]*>';
private const OTHER_LINK_REGEX = '<([A-Za-z][A-Za-z0-9.+-]{1,31}:[^<>\x00-\x20]*)>';

public function getMatchDefinition(): InlineParserMatch
{
return InlineParserMatch::regex(self::EMAIL_REGEX . '|' . self::OTHER_LINK_REGEX);
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();
if ($m = $cursor->match('/^' . self::EMAIL_REGEX . '/')) {
$email = \substr($m, 1, -1);
$inlineContext->getContainer()->appendChild(new Link('mailto:' . UrlEncoder::unescapeAndEncode($email), $email));
$inlineContext->getCursor()->advanceBy($inlineContext->getFullMatchLength());
$matches = $inlineContext->getMatches();

if ($matches[1] !== '') {
$inlineContext->getContainer()->appendChild(new Link('mailto:' . UrlEncoder::unescapeAndEncode($matches[1]), $matches[1]));

return true;
}

if ($m = $cursor->match('/^' . self::OTHER_LINK_REGEX . '/')) {
$dest = \substr($m, 1, -1);
$inlineContext->getContainer()->appendChild(new Link(UrlEncoder::unescapeAndEncode($dest), $dest));
if ($matches[2] !== '') {
$inlineContext->getContainer()->appendChild(new Link(UrlEncoder::unescapeAndEncode($matches[2]), $matches[2]));

return true;
}

return false;
return false; // This should never happen
}
}
6 changes: 3 additions & 3 deletions src/Extension/CommonMark/Parser/Inline/BacktickParser.php
Expand Up @@ -29,11 +29,11 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::regex('`+');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$ticks = $match;
$ticks = $inlineContext->getFullMatch();
$cursor = $inlineContext->getCursor();
$cursor->advanceBy(\mb_strlen($ticks));
$cursor->advanceBy($inlineContext->getFullMatchLength());

$currentPosition = $cursor->getPosition();
$previousState = $cursor->saveState();
Expand Down
4 changes: 2 additions & 2 deletions src/Extension/CommonMark/Parser/Inline/BangParser.php
Expand Up @@ -29,11 +29,11 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::string('![');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();

$cursor->advanceBy(2);

$node = new Text('![', ['delim' => true]);
$inlineContext->getContainer()->appendChild($node);

Expand Down
Expand Up @@ -46,7 +46,7 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::string(']');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
// Look through stack of delimiters for a [ or !
$opener = $inlineContext->getDelimiterStack()->searchByCharacter(['[', '!']);
Expand Down
8 changes: 5 additions & 3 deletions src/Extension/CommonMark/Parser/Inline/EntityParser.php
Expand Up @@ -30,10 +30,12 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::regex(RegexHelper::PARTIAL_ENTITY);
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$inlineContext->getCursor()->advanceBy(\mb_strlen($match));
$inlineContext->getContainer()->appendChild(new Text(Html5EntityDecoder::decode($match)));
$entity = $inlineContext->getFullMatch();

$inlineContext->getCursor()->advanceBy($inlineContext->getFullMatchLength());
$inlineContext->getContainer()->appendChild(new Text(Html5EntityDecoder::decode($entity)));

return true;
}
Expand Down
2 changes: 1 addition & 1 deletion src/Extension/CommonMark/Parser/Inline/EscapableParser.php
Expand Up @@ -30,7 +30,7 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::string('\\');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$cursor = $inlineContext->getCursor();
$nextChar = $cursor->peek();
Expand Down
8 changes: 5 additions & 3 deletions src/Extension/CommonMark/Parser/Inline/HtmlInlineParser.php
Expand Up @@ -29,10 +29,12 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::regex(RegexHelper::PARTIAL_HTMLTAG);
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$inlineContext->getCursor()->advanceBy(\mb_strlen($match));
$inlineContext->getContainer()->appendChild(new HtmlInline($match));
$inline = $inlineContext->getFullMatch();

$inlineContext->getCursor()->advanceBy($inlineContext->getFullMatchLength());
$inlineContext->getContainer()->appendChild(new HtmlInline($inline));

return true;
}
Expand Down
Expand Up @@ -29,7 +29,7 @@ public function getMatchDefinition(): InlineParserMatch
return InlineParserMatch::string('[');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
$inlineContext->getCursor()->advanceBy(1);
$node = new Text('[', ['delim' => true]);
Expand Down
15 changes: 6 additions & 9 deletions src/Extension/Footnote/Parser/AnonymousFootnoteRefParser.php
Expand Up @@ -43,19 +43,16 @@ public function __construct()

public function getMatchDefinition(): InlineParserMatch
{
return InlineParserMatch::regex('\^\[[^\]]+\]');
return InlineParserMatch::regex('\^\[([^\]]+)\]');
}

public function parse(string $match, InlineParserContext $inlineContext): bool
public function parse(InlineParserContext $inlineContext): bool
{
if (\preg_match('#\^\[([^\]]+)\]#', $match, $matches) <= 0) {
return false;
}
$inlineContext->getCursor()->advanceBy($inlineContext->getFullMatchLength());

$inlineContext->getCursor()->advanceBy(\mb_strlen($match));

$reference = $this->createReference($matches[1]);
$inlineContext->getContainer()->appendChild(new FootnoteRef($reference, $matches[1]));
[$label] = $inlineContext->getSubMatches();
$reference = $this->createReference($label);
$inlineContext->getContainer()->appendChild(new FootnoteRef($reference, $label));

return true;
}
Expand Down