Max nesting limit #243
Comments
This could either be implemented in core or as an external Document Processor which users can include as needed. Would you mind sharing why this particular feature would be useful for you? In other words, what's the major use case it solves? |
Mainly just useful to prevent abuse via user input. For example, Reddit comments allow up to 10,000 characters. If nearly all 10,000 of these characters were '>', then you would end up with incredibly deeply nested quotes and a large AST that could take several seconds to render to HTML, assuming it doesn't run out of memory first. I don't think a custom Document Processor would prevent this abuse, since the issue is related to parsing and the creation of a massive AST. Snudown considers this limit while parsing. |
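To make the scale of that input concrete, here is a rough Python sketch (not code from this project or from Snudown) that counts how many block quote levels a single line would open. The helper name `blockquote_depth` is made up for illustration, and it deliberately ignores the CommonMark rule that limits leading indentation to three spaces.

```python
def blockquote_depth(line: str) -> int:
    """Count leading '>' markers, skipping the spaces between them.

    Simplified assumption: any run of spaces is allowed between markers,
    which is looser than the real CommonMark rules.
    """
    depth = 0
    for ch in line:
        if ch == " ":
            continue  # spaces between markers do not add a level
        if ch != ">":
            break  # first non-marker character ends the prefix
        depth += 1
    return depth


# A comment made of 10,000 '>' characters opens 10,000 nested quotes.
hostile_comment = ">" * 10_000
print(blockquote_depth(hostile_comment))
```

Each of those levels becomes a node in the AST, so a comment that fits within the character limit can still produce a tree thousands of levels deep.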
@colinodell here is a little demo of the problem, as a gist: https://gist.github.com/cebe/615bef8086c9a11d81f568284c18fec5
This should cause PHP to segfault. |
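The gist is not reproduced in this thread. Independent of its contents, the general hazard of walking a very deep tree recursively can be sketched in Python. This is only an analogy: Python raises `RecursionError` where PHP may crash outright, and the dictionary-based "AST" and all function names here are invented for illustration.

```python
def build_nested(depth: int):
    """Build a chain of `depth` nested nodes without using recursion."""
    node = None
    for _ in range(depth):
        node = {"child": node}
    return node


def depth_recursive(node) -> int:
    """Recursive walk: uses one call-stack frame per nesting level."""
    return 0 if node is None else 1 + depth_recursive(node["child"])


def depth_iterative(node) -> int:
    """Iterative walk: constant stack usage regardless of nesting."""
    depth = 0
    while node is not None:
        node = node["child"]
        depth += 1
    return depth


tree = build_nested(10_000)
print(depth_iterative(tree))  # the loop handles any depth that fits in memory

try:
    depth_recursive(tree)
except RecursionError:
    print("recursive walk exceeded the interpreter's recursion limit")
```

An iterative walk avoids the stack problem, but the tree still has to fit in memory, which is why a limit applied during parsing is attractive.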
Here is how I implemented this in my markdown parser: https://github.com/cebe/markdown/blob/e2a490ceec590bf5bfd1b43bd424fb9dceceb7c5/Parser.php#L157-L160 |
Ah okay, that makes sense. Thanks for helping to clarify that. The parsing and AST traversing is done without using any type of recursion to avoid segfaults. The rendering process doesn't use this same approach though. We could potentially change that to avoid segfaults, but we'd still hit the memory limit that @mareeo mentioned. I think adding this to the core is a great idea, and I like the approach you're using @cebe - if the limit is reached, simply encode the remaining Markdown as text. |
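A minimal Python sketch of that idea, restricted to block quotes on a single line: once the limit is reached, the leftover markers are no longer parsed and are emitted as escaped text. This is an assumption-laden illustration, not the code from either parser; `render_quote_line` is a hypothetical helper, and only the parameter name `max_nesting_level` is borrowed from the setting discussed in this issue.

```python
from html import escape


def render_quote_line(line: str, max_nesting_level: int = 10) -> str:
    """Render one line of nested block quotes, capping the depth.

    Simplified assumption: spaces before each '>' are ignored, which is
    looser than the real CommonMark indentation rules.
    """
    depth = 0
    rest = line
    # Consume markers in a loop, never deeper than the configured limit.
    while depth < max_nesting_level:
        stripped = rest.lstrip(" ")
        if not stripped.startswith(">"):
            break
        rest = stripped[1:]
        depth += 1
    # Whatever remains, including unparsed '>' markers, becomes plain text.
    text = escape(rest.strip())
    return "<blockquote>" * depth + text + "</blockquote>" * depth


print(render_quote_line(">>> hi", max_nesting_level=2))
# prints: <blockquote><blockquote>&gt; hi</blockquote></blockquote>
```

With the limit in place, the hostile 10,000-marker input produces at most `max_nesting_level` nested elements plus one long text node, so the tree depth stays bounded no matter what the user submits.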
I've got the nesting limits working for blocks - those were easy. Limiting the inlines (while adhering to the CommonMark spec) is proving to be quite difficult though. We essentially can't reliably know how deep the inlines will be until after they're processed (because of how delimiter runs need to be processed according to the spec), and by then it's too late. I found one possible approach that almost works, but the output isn't even close to what you'd expect, doesn't fit with the spec, and it severely reduces performance. IMO a feature to ensure performance in 0.01% of cases shouldn't negatively impact performance for the other 99.99%. And even if it did, it would violate our top priority of maintaining spec compliance. I think we ultimately have four options:
Option 4 would be nice, but I simply don't have the time to implement a major refactoring of that size right now. Option 3 seems like the next-best choice IMO. What are your thoughts? |
I don't think option 3 solves this problem, since if the issues are going to happen, they're likely to happen during parsing. Option 2 is what I've done in the meantime, but I don't think it's very good (use this configuration option to prevent issues...sometimes!). From what you've said and the testing I've done, it just doesn't seem like limiting inline element depth while parsing is going to be possible while adhering to the CommonMark spec. Personally, I like the idea that if you set a max nesting limit and it is reached, then there's no guarantee the element containing the deep nesting will be rendered according to spec. Garbage in, garbage out. But this may conflict with the '100% compliance' goal of this project. A simple warning that 100% compliance is no longer guaranteed once a nesting limit is set and reached would be fine for me. This lets the user decide between safety and compliance. Other Markdown parsers just automatically make this choice for you. |
Implement new max_nesting_level setting (#243)
Some Markdown parsers include nesting limits which cause elements past a certain depth to be omitted. This is useful when allowing user input and wanting to put certain restrictions on it. Thoughts on adding this as a configuration parameter?