-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for formatting reStructuredText code snippets #9003
Conversation
782586b
to
9e0f147
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
My only concern is that I find it difficult to assess how the changes introducing the action queue impact performance.
I think we can get a good answer to this by simply enabling docstring formatting for our benchmarks until we find the time to add dedicated docstring tests. This gives us at least an idea on how expensive the feature is for docstring not containing code examples.
crates/ruff_python_formatter/resources/test/fixtures/ruff/docstring_code_examples.py
Show resolved
Hide resolved
crates/ruff_python_formatter/resources/test/fixtures/ruff/docstring_code_examples.py
Show resolved
Hide resolved
self.kind = Some(CodeExampleKind::Rst(litblock)); | ||
queue.push_back(CodeExampleAddAction::Print { original }); | ||
} else { | ||
queue.push_back(CodeExampleAddAction::Print { original }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we'll push to the VecDeque
even in case where the docstring contains no code exampels? I would be interested in understanding the performance implication of allocating here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it will. Although it will push to the queue for every line, it should only allocate the first time. Even in the case where a code example is present, I don't expect the queue to ever grow beyond more than a couple elements, so the allocation should be amortized. But it's only amortized within each docstring.
It does indeed look like the happy path in the status quo does not allocate. I still perceive this to likely be a marginal cost, but I'll see about adding some benchmarks to this PR.
crates/ruff_python_formatter/src/expression/string/docstring.rs
Outdated
Show resolved
Hide resolved
// Pad to the next multiple of tab_width | ||
seen_indent_len += 8 - (seen_indent_len.rem_euclid(8)); | ||
line = &line[1..]; | ||
} else if char.is_whitespace() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python only supports \t
and
(space) as valid indentation characters (and \f
form feeds that reset the indentation). It's unclear to me if supporting all white space is okay because we're inside a docstring or should limit it to Python-whitespace only (the same applies for indent_with_suffix
, we have trim_whitespace_start
helper to do so)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I see. I hadn't known about this or the trim_whitespace_start
helper.
One possible issue I see here is that the docstring whitespace normalization uses just the bare trim_start()
and trim_end()
routines in places. Is that correct, or should those be switched to the Python specific definition too? Same deal with indentation_length
. It uses char::is_whitespace
from std, which covers all the Unicode forms of whitespace too.
In any case, I've updated my uses of whitespace to trim_whitespace_start
and is_python_whitespace
. (Probably that module should get re-tooled a bit. i.e., Add an extension trait for char
and rename trim_whitespace_start
and assorted methods to trim_python_start
.)
crates/ruff_python_formatter/src/expression/string/docstring.rs
Outdated
Show resolved
Hide resolved
@zanieb should we remove the formatter label from the PR, considering that the functionality isn't available yet or will we remove the changelog entry manually? |
@MichaReiser - We clean the changelogs manually, so no issue keeping the label on there. |
This fixes a bug where if the very last line in a docstring is a doctest, then it wasn't getting formatted. We do a little more than fix that bug in this commit. In particular, we industrialize how we handle actions by using a queue. Instead of trying to do too much with each action, we make each one a bit simpler and build the infrastructure necessary to permit more arbitrary composition. This in particular lets us handle weird corner cases like "the code example and the docstring end at the same time" without much fuss. In essence, actions like Format and Reset no longer carry the "original" line. Instead of doing that, we generate two actions: the first to format and the second (if necessary) to print the original line. This infrastructure will become more apparently useful when dealing with reStructuredText blocks.
There was really no good reason to have it panic if the code block given is empty. Instead, we can just return `None` and everything will get handled correctly. (It will turn into a `Reset` with zero lines, which will in turn be a no-op.) In practice I do believe empty code blocks are not possible, but I felt like this makes the code a bit more robust.
The comment in the source code should explain, but basically, if the user has tab indentation enabled, then this winds up screwing with code snippet formatting. The essential problem seems to be that docstring normalization strips tabs, and this interplay between code snippet reformatting with tabs winds up making formatting non-idempotent. We justify this hard-coded option by pointing out that tabs get stripped as part of docstring normalization anyway.
Small style tweak. . o O { too bad `imports_granularity` isn't stable in rustfmt yet }
9e0f147
to
9769bf6
Compare
This commit makes use of the refactoring done in prior commits to slot in reStructuredText support. Essentially, we add a new type of code example and look for *both* literal blocks and code block directives. Literal blocks are treated as Python by default because it seems to be a common practice[1]. That is, literal blocks like this: ``` def example(): """ Here's an example:: foo( 1 ) All done. """ pass ``` Will get reformatted. And code blocks (via reStructuredText directives) will also get reformatted: ``` def example(): """ Here's an example: .. code-block:: python foo( 1 ) All done. """ pass ``` When looking for a code block, it is possible for it to become invalid. In which case, we back out of looking for a code example and print the lines out as they are. As with doctest formatting, if reformatting the code would result in invalid Python or if the code collected from the block is invalid, then formatting is also skipped. A number of tests have been added to check both the formatting and resetting behavior. Mixed indentation is also tested a fair bit, since one of my initial attempts at dealing with mixed indentation ended up not working. Closes #8859 [1]: adamchainz/blacken-docs#195
This is what seems consistent with the prevailing code.
9769bf6
to
618e48d
Compare
In lieu of micro-benchmarks, I decided to do a little ad hoc benchmarking with hyperfine on dagster. Using this branch with a cherry-pick of #8854, I ran formatting with the default config and formatting with
Where:
I ran it multiple times, and the same result occurred. So there is just a very slight observable slow-down here. But, this particular pile of code does have a large number of reStructuredText code blocks. So you'd expect it to possibly run a little more slowly since it is doing more work. On a profile, I can see if !self.code_example.kind.is_none()
|| CodeExampleDoctest::new(line).is_some()
|| CodeExampleRst::new(line).is_some()
{
self.code_example.add(line, &mut self.action_queue);
self.run_action_queue()
} else {
self.print_one(&line.as_output())
} So basically, as long as we weren't already collecting a code example and the current line didn't look like the start of one, then we could avoid the queue and just print the line directly. I then baked this off against what I had:
I ran this a few times and sometimes the fast path would be a hair faster and other times it would be flipped. Running without docstring formatting enabled at all was consistently faster by 1.00-1.01 times. I think this satisfies me personally that the queue overhead is probably negligible, at least until we have some data suggesting otherwise. (I'm sure there are more synthetic benchmarks one could construct that might show a bigger difference.) |
Going to bring this, but @MichaReiser feel free to leave more feedback and I'll address it in a follow-up PR. :-) |
Thanks for running the manual benchmark. This looks good to me. |
(This is not possible to actually use until #8854 is merged.)
ruff_python_formatter: add reStructuredText docstring formatting support
This commit makes use of the refactoring done in prior commits to slot
in reStructuredText support. Essentially, we add a new type of code
example and look for both literal blocks and code block directives.
Literal blocks are treated as Python by default because it seems to be a
common practice.
That is, literal blocks like this:
Will get reformatted. And code blocks (via reStructuredText directives)
will also get reformatted:
When looking for a code block, it is possible for it to become invalid.
In which case, we back out of looking for a code example and print the
lines out as they are. As with doctest formatting, if reformatting the
code would result in invalid Python or if the code collected from the
block is invalid, then formatting is also skipped.
A number of tests have been added to check both the formatting and
resetting behavior. Mixed indentation is also tested a fair bit, since
one of my initial attempts at dealing with mixed indentation ended up
not working.
I recommend working through this PR commit-by-commit. There is in
particular a somewhat gnarly refactoring before reST support is added.
Closes #8859