Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docstring code formatter: add support for Markdown Python code snippets #8860

Closed
BurntSushi opened this issue Nov 27, 2023 · 8 comments · Fixed by #9030
Closed

docstring code formatter: add support for Markdown Python code snippets #8860

BurntSushi opened this issue Nov 27, 2023 · 8 comments · Fixed by #9030
Labels
docstring Related to docstring linting or formatting formatter Related to the formatter

Comments

@BurntSushi
Copy link
Member

The initial implementation of the docstring code snippet formatter in #8811 only reformats doctest code snippets. This issue tracks the work for making it also reformat Markdown Python code snippets. Work should probably proceed by adding a new CodeExampleKind:

/// The kind of code example observed in a docstring.
#[derive(Clone, Debug, Eq, PartialEq)]
enum CodeExampleKind {
/// Code found in Python "doctests."
///
/// Documentation describing doctests and how they're recognized can be
/// found as part of the Python standard library:
/// https://docs.python.org/3/library/doctest.html.
///
/// (You'll likely need to read the regex matching used internally by the
/// doctest module to determine more precisely how it works.)
Doctest(CodeExampleDoctest),
}

@BurntSushi BurntSushi added the formatter Related to the formatter label Nov 27, 2023
@ofek
Copy link
Contributor

ofek commented Nov 27, 2023

I am especially looking forward to this as my code snippets render as documentation and are almost exclusively written as Markdown Python code blocks.

@BurntSushi
Copy link
Member Author

@ofek Do you have any example code I could take a look at that uses Markdown blocks?

@konstin
Copy link
Member

konstin commented Nov 27, 2023

Another example, with mixed non-python and python blocks: https://github.com/sacdallago/bio_embeddings/blob/efb9801f0de9b9d51d19b741088763a7d2d0c3a2/bio_embeddings/embed/bepler_embedder.py#L1-L38. i think it makes sense to assume blocks are python just like rustdoc defaults to code blocks being python, but we should skip blocks with an explicitly different language

@ofek
Copy link
Contributor

ofek commented Nov 27, 2023

I would prefer if we don't treat Markdown blocks differently compared to other parsers and assume unspecified blocks are generic, similar to the first block in your example where they show shell commands to run.

@BurntSushi
Copy link
Member Author

I think it makes sense to assume unmarked blocks are Python code. As @konstin said, this is what rustdoc does and it works great. The mitigation here is that if the blocks are not valid Python code, then we can leave them as-is untouched.

@ofek
Copy link
Contributor

ofek commented Nov 27, 2023

I understand what you both are saying but I'm saying that model does not work for Python because historically we do not use Markdown and have no such concept of embedded documentation that is rendered in a standard way. If you do what you say here you're going to immediately fail on the vast majority of code blocks, because most are not Python, because we do not do that.

@MichaReiser MichaReiser added the docstring Related to docstring linting or formatting label Nov 27, 2023
@BurntSushi BurntSushi added this to the Formatter: Stable milestone Nov 29, 2023
BurntSushi added a commit that referenced this issue Dec 6, 2023
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
BurntSushi added a commit that referenced this issue Dec 6, 2023
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
BurntSushi added a commit that referenced this issue Dec 6, 2023
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
BurntSushi added a commit that referenced this issue Dec 7, 2023
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
BurntSushi added a commit that referenced this issue Dec 7, 2023
This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
BurntSushi added a commit that referenced this issue Dec 7, 2023
(This is not possible to actually use until
#8854 is merged.)

This commit slots in support for formatting Markdown fenced code
blocks[1]. With the refactoring done for reStructuredText previously,
this ended up being pretty easy to add. Markdown code blocks are also
quite a bit easier to parse and recognize correctly.

One point of contention in #8860 is whether to assume that unlabeled
Markdown code fences are Python or not by default. In this PR, we make
such an assumption. This follows what `rustdoc` does. The mitigation
here is that if an unlabeled code block isn't Python, then it probably
won't parse as Python. And we'll end up skipping it. So in the vast
majority of cases, the worst thing that can happen is a little bit of
wasted work.

Closes #8860

[1]: https://spec.commonmark.org/0.30/#fenced-code-blocks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docstring Related to docstring linting or formatting formatter Related to the formatter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants