-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ruff_python_formatter: support reformatting Markdown code blocks #9030
Conversation
When I renamed `line` to `trimmed`, I forgot one. (Previously, I had rebound `line`, and I think that made these sorts of bugs impossible.) This regression is covered by new tests for Markdown supported in a subsequent commit.
8e1a295
to
9ffad6d
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
crates/ruff_python_formatter/src/expression/string/docstring.rs
Outdated
Show resolved
Hide resolved
crates/ruff_python_formatter/src/expression/string/docstring.rs
Outdated
Show resolved
Hide resolved
return None; | ||
} | ||
if !original.line.trim_whitespace().is_empty() | ||
&& indentation_length(original.line) < self.opening_fence_indent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this case? The code is under-indented compared to the opening of the code fence? I actually don't know what Markdown does in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a weird case. I believe it is technically valid Markdown. But when I allowed it, it led to an idempotency bug that seemed a little tricky to fix. Basically, it seemed like a bug that was probably okay to ship with, and we can prioritize fixing it based on user feedback.
Good call out though. I added this comment:
// When a line in a Markdown fenced closed block is indented *less*
// than the opening indent, we treat the entire block as invalid.
//
// I believe that code blocks of this form are actually valid Markdown
// in some cases, but the interplay between it and our docstring
// whitespace normalization leads to undesirable outcomes. For example,
// if the line here is unindented out beyond the initial indent of the
// docstring itself, then this causes the entire docstring to have
// its indent normalized. And, at the time of writing, a subsequent
// formatting run undoes this indentation, thus violating idempotency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
My only fear is into how many weird edge cases we'll run where we would need a full markdown parser to parse nested structures correctly. But that's something we can figure out later.
I recommend releasing this soon under the formatter Beta where we have some more wiggle room to make breaking changes if any of our assumptions were wrong.
pass | ||
|
||
|
||
def markdown_rst_directive(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test suite and I like how you documented the cases.
Should we add a test where the code block itself contains closing quotes at a different indentation level:
def test() {
"""
You can use this function to render markdown, even conditionally.
```py
if 5 == 4:
a = """
It supports **code blocks**
```py
print("Hello World")
```
"""
A simplified version that triggered me to make this example is that
def test() {
"""
Code block with incorrectly placed closing fences
```python
```
# The above fences should not close the code block
a = 10
```
Now the code block is closed
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the first case, the """
ends the docstring and makes the entire thing invalid Python. I tried playing with it a bit to massage it into a more "useful" test, but couldn't get to anything I liked.
In the second case, we do actually allow the first set of closing fences to close the block. Markdown technically allows up to three spaces of indentation before the start of the fences, but I think that's a little tricky to enforce in this context given that we're in a docstring. So I ended up just deciding that Markdown fences could be indented as much as possible. This isn't as much of an issue here, since we specifically do not support (at least, not yet anyway) the sort of Markdown code block that is inserted purely by indentation and not fences. I added this as a test case though.
crates/ruff_python_formatter/src/expression/string/docstring.rs
Outdated
Show resolved
Hide resolved
Yeah right, this PR only covers naked code fence blocks. It won't work with fenced code blocks inside of quote blocks for example. I do indeed feel like that's something we can add later if there's demand for it. And yeah, that may wind up requiring a full Markdown parser (
Yeah the plan is to get this out soon. Basically want to get this merged, get #8855 fixed up and then write a blog post. :-) |
9ffad6d
to
02fa505
Compare
This commit slots in support for formatting Markdown fenced code blocks[1]. With the refactoring done for reStructuredText previously, this ended up being pretty easy to add. Markdown code blocks are also quite a bit easier to parse and recognize correctly. One point of contention in #8860 is whether to assume that unlabeled Markdown code fences are Python or not by default. In this PR, we make such an assumption. This follows what `rustdoc` does. The mitigation here is that if an unlabeled code block isn't Python, then it probably won't parse as Python. And we'll end up skipping it. So in the vast majority of cases, the worst thing that can happen is a little bit of wasted work. Closes #8860 [1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
02fa505
to
d838fa1
Compare
Going to bring this in. As usual, if y'all have any more feedback, I'd be happy to address it in follow-up PRs. |
Is it supported to have additional text after the language marker? For example: https://github.com/pypa/hatch/blob/52a81d80cd03bfb90d99a5837c778971ea0c8c22/src/hatch/env/plugin/interface.py#L23-L41 |
@ofek asked [about this][ref]. I did specifically add support for it, but neglected to add a test. This PR adds a test. [ref]: #9030 (comment)
Ah yeah good catch. I did indeed specifically add support for that, but neglected to add a test. I added one in #9050. |
@ofek asked [about this][ref]. I did specifically add support for it, but neglected to add a test. This PR adds a test. [ref]: #9030 (comment)
(This is not possible to actually use until #8854 is merged.)
This commit slots in support for formatting Markdown fenced code blocks1. With the refactoring done for reStructuredText previously, this ended up being pretty easy to add. Markdown code blocks are also quite a bit easier to parse and recognize correctly.
One point of contention in #8860 is whether to assume that unlabeled Markdown code fences are Python or not by default. In this PR, we make such an assumption. This follows what
rustdoc
does. The mitigation here is that if an unlabeled code block isn't Python, then it probably won't parse as Python. And we'll end up skipping it. So in the vast majority of cases, the worst thing that can happen is a little bit of wasted work.Closes #8860