Skip to content

Commit

Permalink
Add max_delimiters_per_line config option
Browse files Browse the repository at this point in the history
  • Loading branch information
colinodell committed Dec 7, 2024
1 parent 5ce491f commit 5156796
Show file tree
Hide file tree
Showing 10 changed files with 101 additions and 6 deletions.
1 change: 1 addition & 0 deletions .phpstorm.meta.php
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
'html_input',
'allow_unsafe_links',
'max_nesting_level',
'max_delimiters_per_line',
'renderer',
'renderer/block_separator',
'renderer/inner_separator',
Expand Down
7 changes: 6 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,14 @@ Updates should follow the [Keep a CHANGELOG](https://keepachangelog.com/) princi

### Added

- Added `max_delimiters_per_line` config option to prevent denial of service attacks when parsing malicious input
- Added `table/max_autocompleted_cells` config option to prevent denial of service attacks when parsing large tables
- The `AttributesExtension` now supports attributes without values (#985, #986)
- The `AutolinkExtension` exposes two new configuration options to override the default behavior (#969, #987):
- `autolink/allowed_protocols` - an array of protocols to allow autolinking for
- `autolink/default_protocol` - the default protocol to use when none is specified
- Added `RegexHelper::isWhitespace()` method to check if a given character is an ASCII whitespace character
- Added `CacheableDelimiterProcessorInterface` to ensure linear complexity for dynamic delimiter processing
- Added `table/max_autocompleted_cells` config option to prevent denial of service attacks when parsing large tables
- Added `Bracket` delimiter type to optimize bracket parsing

### Changed
Expand Down
2 changes: 2 additions & 0 deletions docs/2.5/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ $config = [
'html_input' => 'escape',
'allow_unsafe_links' => false,
'max_nesting_level' => PHP_INT_MAX,
'max_delimiters_per_line' => PHP_INT_MAX,
'slug_normalizer' => [
'max_length' => 255,
],
Expand Down Expand Up @@ -73,6 +74,7 @@ Here's a list of the core configuration options available:
- `escape` - Escape all HTML
- `allow_unsafe_links` - Remove risky link and image URLs by setting this to `false` (default: `true`)
- `max_nesting_level` - The maximum nesting level for blocks (default: `PHP_INT_MAX`). Setting this to a positive integer can help protect against long parse times and/or segfaults if blocks are too deeply-nested.
- `max_delimiters_per_line` - The maximum number of delimiters (e.g. `*` or `_`) allowed in a single line (default: `PHP_INT_MAX`). Setting this to a positive integer can help protect against long parse times and/or segfaults if lines are too long.
- `slug_normalizer` - Array of options for configuring how URL-safe slugs are created; see [the slug normalizer docs](/2.5/customization/slug-normalizer/#configuration) for more details
- `instance` - An alternative normalizer to use (defaults to the included `SlugNormalizer`)
- `max_length` - Limits the size of generated slugs (defaults to 255 characters)
Expand Down
22 changes: 21 additions & 1 deletion docs/2.5/security.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ In order to be fully compliant with the CommonMark spec, certain security settin

- `html_input`: How to handle raw HTML
- `allow_unsafe_links`: Whether unsafe links are permitted
- `max_nesting_level`: Protected against long render times or segfaults
- `max_nesting_level`: Protect against long render times or segfaults
- `max_delimiters_per_line`: Protect against long parse times or rendering segfaults

Further information about each option can be found below.

Expand Down Expand Up @@ -88,6 +89,25 @@ echo $converter->convert($markdown);

See the [configuration](/2.5/configuration/) section for more information.

## Max Delimiters Per Line

Similarly to the maximum nesting level, **no maximum number of delimiters per line is enforced by default.** Delimiters can be nested (like `*a **b** c*`) or un-nested (like `*a* *b* *c*`) - in either case, having too many in a single line can result in long parse times. We therefore have a separate option to limit the number of delimiters per line.

If you need to parse untrusted input, consider setting a reasonable `max_delimiters_per_line` (perhaps 100-1000) depending on your needs. Once this level is hit, any subsequent delimiters on that line will be rendered as plain text.

### Example - Prevent too many delimiters

```php
use League\CommonMark\CommonMarkConverter;

$markdown = '*a* **b *c **d** c* b**'; // 8 delimiters (* and **)

$converter = new CommonMarkConverter(['max_delimiters_per_line' => 6]);
echo $converter->convert($markdown);

// <p><em>a</em> **b *c <strong>d</strong> c* b**</p>
```

## Additional Filtering

Although this library does offer these security features out-of-the-box, some users may opt to also run the HTML output through additional filtering layers (like HTMLPurifier). If you do this, make sure you **thoroughly** test your additional post-processing steps and configure them to work properly with the types of HTML elements and attributes that converted Markdown might produce, otherwise, you may end up with weird behavior like missing images, broken links, mismatched HTML tags, etc.
7 changes: 7 additions & 0 deletions docs/2.6/upgrading.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,13 @@ redirect_from: /upgrading/

# Upgrading from 2.5 to 2.6

## `max_delimiters_per_line` Configuration Option

The `max_delimiters_per_line` configuration option was added in 2.6 to help protect against malicious input that could
cause excessive memory usage or denial of service attacks. It defaults to `PHP_INT_MAX` (no limit) for backwards
compatibility, which is safe when parsing trusted input. However, if you're parsing untrusted input from users, you
should probably set this to a reasonable value (somewhere between `100` and `1000`) to protect against malicious inputs.

## Custom Delimiter Processors

If you're implementing a custom delimiter processor, and `getDelimiterUse()` has more logic than just a
Expand Down
11 changes: 10 additions & 1 deletion src/Delimiter/DelimiterStack.php
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,13 @@ final class DelimiterStack
*/
private $missingIndexCache;

public function __construct()

private int $remainingDelimiters = 0;

public function __construct(int $maximumStackSize = PHP_INT_MAX)
{
$this->remainingDelimiters = $maximumStackSize;

if (\PHP_VERSION_ID >= 80000) {
/** @psalm-suppress PropertyTypeCoercion */
$this->missingIndexCache = new \WeakMap(); // @phpstan-ignore-line
Expand All @@ -51,6 +56,10 @@ public function __construct()

public function push(DelimiterInterface $newDelimiter): void
{
if ($this->remainingDelimiters-- <= 0) {
return;
}

$newDelimiter->setPrevious($this->top);

if ($this->top !== null) {
Expand Down
1 change: 1 addition & 0 deletions src/Environment/Environment.php
Original file line number Diff line number Diff line change
Expand Up @@ -432,6 +432,7 @@ public static function createDefaultConfiguration(): Configuration
'html_input' => Expect::anyOf(HtmlFilter::STRIP, HtmlFilter::ALLOW, HtmlFilter::ESCAPE)->default(HtmlFilter::ALLOW),
'allow_unsafe_links' => Expect::bool(true),
'max_nesting_level' => Expect::type('int')->default(PHP_INT_MAX),
'max_delimiters_per_line' => Expect::type('int')->default(PHP_INT_MAX),
'renderer' => Expect::structure([
'block_separator' => Expect::string("\n"),
'inner_separator' => Expect::string("\n"),
Expand Down
4 changes: 2 additions & 2 deletions src/Parser/InlineParserContext.php
Original file line number Diff line number Diff line change
Expand Up @@ -42,12 +42,12 @@ final class InlineParserContext
*/
private array $matches;

public function __construct(Cursor $contents, AbstractBlock $container, ReferenceMapInterface $referenceMap)
public function __construct(Cursor $contents, AbstractBlock $container, ReferenceMapInterface $referenceMap, int $maxDelimitersPerLine = PHP_INT_MAX)
{
$this->referenceMap = $referenceMap;
$this->container = $container;
$this->cursor = $contents;
$this->delimiterStack = new DelimiterStack();
$this->delimiterStack = new DelimiterStack($maxDelimitersPerLine);
}

public function getContainer(): AbstractBlock
Expand Down
2 changes: 1 addition & 1 deletion src/Parser/InlineParserEngine.php
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ public function parse(string $contents, AbstractBlock $block): void
$contents = \trim($contents);
$cursor = new Cursor($contents);

$inlineParserContext = new InlineParserContext($cursor, $block, $this->referenceMap);
$inlineParserContext = new InlineParserContext($cursor, $block, $this->referenceMap, $this->environment->getConfiguration()->get('max_delimiters_per_line'));

// Have all parsers look at the line to determine what they might want to parse and what positions they exist at
foreach ($this->matchParsers($contents) as $matchPosition => $parsers) {
Expand Down
50 changes: 50 additions & 0 deletions tests/functional/MaxDelimitersPerLineTest.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
<?php

declare(strict_types=1);

/*
* This file is part of the league/commonmark package.
*
* (c) Colin O'Dell <colinodell@gmail.com>
*
* For the full copyright and license information, please view the LICENSE
* file that was distributed with this source code.
*/

namespace League\CommonMark\Tests\Functional;

use League\CommonMark\CommonMarkConverter;
use PHPUnit\Framework\TestCase;

final class MaxDelimitersPerLineTest extends TestCase
{
/**
* @dataProvider provideTestCases
*/
public function testIt(string $input, int $maxDelimsPerLine, string $expectedOutput): void
{
$converter = new CommonMarkConverter(['max_delimiters_per_line' => $maxDelimsPerLine]);

$this->assertEquals($expectedOutput, \trim($converter->convert($input)->getContent()));
}

/**
* @return iterable<array<mixed>>
*/
public function provideTestCases(): iterable
{
yield ['*a* **b *c* b**', 6, '<p><em>a</em> <strong>b <em>c</em> b</strong></p>'];

yield ['*a* **b *c **d** c* b**', 0, '<p>*a* **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 1, '<p>*a* **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 2, '<p><em>a</em> **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 3, '<p><em>a</em> **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 4, '<p><em>a</em> **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 5, '<p><em>a</em> **b *c **d** c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 6, '<p><em>a</em> **b *c <strong>d</strong> c* b**</p>'];
yield ['*a* **b *c **d** c* b**', 7, '<p><em>a</em> **b <em>c <strong>d</strong> c</em> b**</p>'];
yield ['*a* **b *c **d** c* b**', 8, '<p><em>a</em> <strong>b <em>c <strong>d</strong> c</em> b</strong></p>'];
yield ['*a* **b *c **d** c* b**', 9, '<p><em>a</em> <strong>b <em>c <strong>d</strong> c</em> b</strong></p>'];
yield ['*a* **b *c **d** c* b**', 100, '<p><em>a</em> <strong>b <em>c <strong>d</strong> c</em> b</strong></p>'];
}
}

0 comments on commit 5156796

Please sign in to comment.