rustdoc - implement #[doc(codeblock_attr = ...)] #84597

jsgf · 2021-04-26T17:39:25Z

This PR implements a #[doc(codeblock_attr = )] specification which allows all the codeblock attrs in a doc comment to be overridden. For example, if the doc comment has come from outside the Rust ecosystem (eg from bindgen, protoc, etc) then it may appear to have code blocks in its doc comments, but they are not intended to be in Rust. They will either fail to parse, causing spurious warnings, or worse, parse and do unintended things.

To address this, this PR allows all the code block attributes to be overridden for both ``` and indent-based code blocks (which don't currently have a way to specify attributes). For example:

#[doc = r#"
This is a random piece of code

    10 print "I don't have a closing quote
"#]
#[doc(codeblock_attr = "text")] // force indented code block to be treated as plain text
struct Foo;

I have some possible discussion points:

Name of codeblock_attr
Should there also be an "extend" counterpart. I don't have a need for it, in the use-cases I'm thinking of, but it's worth considering (not for this PR, but making sure this PR doesn't preclude a future extension).
module/crate-level override/default? Doing it at a per-doc comment level seems simplest, so long as you can update your code generator to emit it. Doing it at a larger scope just means updating a prolog/template.

Personally I think this PR lands on the right answers for all of these, but I'm open to other thoughts.

https://internals.rust-lang.org/t/blanket-tags-for-doc-code-blocks/14567/ for irlo discussion.

Relevant issue: #59867

cc @joshtriplett @ehuss

Restructure the loop to avoid awkward mid-loop continues which complicate the flow control, and the duplicated code. Result is simpler and should be functionally identical.

The `codeblock_attr` parameter for `#[doc]` overrides the codeblock attributes - that is, the text `foo` after ```` ```foo ````. This also allows a codeblock attribute to be applied to indentation code blocks. This is intended for documentation coming from non-Rust-aware sources which may contain code examples, but are not written in Rust. Such examples - at best - will cause compilation warnings from rustdoc, or could compile but have unintended consequences.

rust-highfive · 2021-04-26T17:39:29Z

r? @jackh726

(rust-highfive has picked a reviewer for you, use r? to override)

jsgf · 2021-04-26T17:40:51Z

Oh, need to update the docs.

joshtriplett · 2021-04-26T17:43:30Z

This looks great, and thank you for adding the detailed tests for an item-level attr overriding a module-level one.

jsgf · 2021-04-26T17:49:45Z

@joshtriplett - The module-level attr only affects the module's #[doc] attribute - it doesn't have any effect on the docs within the module. I should probably add a test to make sure the module-level one doesn't affect the docs within the module.

joshtriplett · 2021-04-26T17:57:26Z

Oh, I misread the test then. I would have expected the attribute at the module level to affect any items within the module, unless the item itself had an overriding attribute or an individual code block did. That would make this easier to apply to an entire module at once.

jsgf · 2021-04-26T18:16:40Z

Yeah, but a module can have its own doc comment, so there's an ambiguity about whether you want to just override the module-level codeblocks or all the codeblocks within the module. That could be resolved by having a second doc attribute parameter, but that's getting more complex (esp since we'd need to define how it interacts with the per-doc parameters).

It just seemed simpler to keep it as a doc-by-doc thing. It enables the basic functionality, and could be extended if this form is too awkward in practice.

jyn514 · 2021-04-26T20:02:58Z

Has this gone through an RFC? Rustdoc has a lot of #[doc] attributes already.

r? @GuillaumeGomez

jyn514 · 2021-04-26T20:07:23Z

@jsgf what should happen if there are multiple codeblocks for the same item? Will this apply to all of them? If the item is re-exported, will codeblock_attr apply to comments added on the re-export?

I don't think codeblock_attr is a good name, "attribute" has a specific meaning in rust and this isn't it - I would call these "lang strings" or something. Maybe codeblock_lang is better?

jyn514 · 2021-04-26T20:10:15Z

compiler/rustc_passes/src/check_attr.rs

+        if let Some(c) = doc_codeblock_attr
+            .chars()
+            .find(|&c| c == ',' || c == '`' || c == '"' || c == '\'' || c.is_whitespace())


This check is used somewhere else in rustdoc - could you find it and factor out an "is_lang_string_separator" function? Getting it wrong means rustdoc can't fix it without breaking backwards compatibility.

jyn514 · 2021-04-26T20:11:18Z

compiler/rustc_passes/src/check_attr.rs

+            );
+            return false;
+        }
+        if doc_codeblock_attr.starts_with(' ') || doc_codeblock_attr.ends_with(' ') {


This check will never run, you already give an error above if the lang string contains whitespace.

jsgf · 2021-04-26T20:14:41Z

@jyn514

what should happen if there are multiple codeblocks for the same item? Will this apply to all of them? If the item is re-exported, will codeblock_attr apply to comments added on the re-export?

Yes, it overrides all the codeblocks in a given doc comment. I don't know how re-export works; I didn't test that. Does re-export also re-export the doc tests? Mechanically the override is attached to the same doc attr node that has the docs themselves, so I would imagine that they'd follow along on any re-export.

I don't think codeblock_attr is a good name, "attribute" has a specific meaning in rust and this isn't it - I would call these "lang strings" or something. Maybe codeblock_lang is better?

Yeah I agree about "attributes" but apparently that's the markdown name for them. pulldown_cmark avoids giving them a name at all other than "the language", and rustdoc internally uses "syntax". I'm happy to rename it to codeblock_lang.

jyn514 · 2021-04-26T20:16:43Z

Yes, it overrides all the codeblocks in a given doc comment. I don't know how re-export works; I didn't test that. Does re-export also re-export the doc tests? Mechanically the override is attached to the same doc attr node that has the docs themselves, so I would imagine that they'd follow along on any re-export.

Yes, any re-export inherits all the attributes of the original item, see merge_attrs:

rust/src/librustdoc/clean/inline.rs

Line 286 in ae54ee6

fn merge_attrs(

But anyway my question isn't how it currently behaves - re-exports are the most complicated part of rustdoc and I fully expect the first draft to be wrong - my question is how you think it should behave.

GuillaumeGomez · 2021-04-26T20:23:01Z

I'm shared on this: on one side, I'd say to simply re-format the text before putting it into your rust code, that's up to you, not to rustdoc to fix your doc comment. On another, I can see how it could be used, so let's say that for now, I'm skeptical but not completely opposed to this feature.

Now a few issues remain here. @jyn514 already underlined the issue in case you have multiple code blocks in one doc comment. In this case, it's all or nothing and I don't see how to go around that limitation.

Another problem to be seen is: is it really that useful? It feels like we're using a bazooka to kill a fly here. Markdown provides already everything you need if you don't want your code to be interpreted as a rust one. It's being done everywhere, so why not in your case? You can easily overload bindgen output after all.

And yes, I think it'd need to go at least through an RFC here to see if the rest of the team is up with this feature. Because that's also something to be taken into consideration: new feature means more code to be supported, tested and kept up-to-date. Considering how small the gain is, I'd personally say it's not worth it, but if people think otherwise, why not. Hence the need to go through an RFC first here.

jyn514 · 2021-04-26T21:49:55Z

Another problem to be seen is: is it really that useful? It feels like we're using a bazooka to kill a fly here. Markdown provides already everything you need if you don't want your code to be interpreted as a rust one. It's being done everywhere, so why not in your case? You can easily overload bindgen output after all.

Actually yeah, why is this the easiest solution? Couldn't you do the equivalent of re.sub(docs, "^ ", " ") in your build.rs? (edit: github is mangling my spaces, but there are 4 in the first arg and 3 in the second)

jsgf · 2021-04-26T22:15:18Z

@jyn514 @GuillaumeGomez:

OK, so my use case is this:

I'm specifically focusing on the fbthrift compiler, which generates the Rust code via moustache templates: https://github.com/facebook/fbthrift/blob/master/thrift/compiler/generate/templates/rust/lib.rs.mustache (this doesn't currently have #[doc] support in it, but it will land shortly).
The build is being driven by Buck in our case, so build.rs isn't involved. However for Cargo builds it would probably use a build.rs, and I'd want it to work in a uniform way regardless.
Thrift supports many languages, and the formatting of the doc comments, uh, reflects that diversity, so unfortunately there's no uniform syntax or markup language used. But enough of it is markdown-ish that treating it that way often gives reasonable results. Either way, even if it is markdown with code blocks, they're almost certainly not Rust.

In the common case in Rust code, the docs are being written for that specific piece of Rust code, so any example or doctest will also be in Rust. But in this case, we're effectively importing documentation from an external source, which only incidentally supports Rust. The goal here is to have a uniform way to neutralize any unintended consequences of doing this. At the very least it could result in spurious compiler warnings about malformed code, but it could extend to a possible attack vector (nobody would expect a thrift or protobuf file could result in arbitrary code, but a doc comment and associated doctests allow arbitrary code to be injected).

I'd really like to avoid any kind of ad-hoc regex approach - if nothing else I don't see how one could reliably implement it. You'd have to at least tokenize the .rs file to get the full docstring text, then apply the regex to it, and then regenerate the source with the updated text. I don't see how a re.sub(...) approach would work directly on the source file itself given all the possible ways of expressing the same doc comment (several kinds of cooked comment format, and numerous string literal formats and ways of quoting newlines).

In my case, in principle the thrift compiler could parse the comment as markdown while generating the Rust code and do the cleanups there. But in practice this is awkward because it's written in C++ and there's no guarantee that its markdown implementation would match Rust's.

So I do think this is the right place to implement this feature. But I'm open to suggestions about details.

jsgf · 2021-04-26T22:16:52Z

@jyn514

But anyway my question isn't how it currently behaves - re-exports are the most complicated part of rustdoc and I fully expect the first draft to be wrong - my question is how you think it should behave.

I think the codeblock_lang parameter should just stick with their corresponding docs regardless of how they're re-exported. Since there's no inheritance mechanism, there are no ambiguities about they should be resolved (as there would be if there were crate or module-wide overrides).

jyn514 · 2021-04-26T22:20:59Z

I think the codeblock_lang parameter should just stick with their corresponding docs regardless of how they're re-exported. Since there's no inheritance mechanism, there are no ambiguities about they should be resolved (as there would be if there were crate or module-wide overrides).

Sorry, I don't follow - do you mean that re-exporting them shouldn't apply to the new docs, or that it should?

jyn514 · 2021-04-26T22:26:16Z

Ok, looking at #59867, the attribute suggested there is much more limited in scope:

So what I need is an opt-in flag that you can declare module-wide (and children modules and such) so that any code block without a declared language is not treated as rust code by default, so then it won't become a doctest.

I think an attribute is better so you can have it in code, so maybe something like #![doc(default_lang = "text")] would work. Making this general and customizable per-item just seems like a headache. If you still want to opt-in to doc-tests for some items, you can use ```rust like normal.

jsgf · 2021-04-26T23:12:49Z

What I'm saying is that if you have something like:

#[doc = r#"
Random doc from unknown source
```
hi_some_random_code()
```
"#]
#[doc(codeblock_lang = "text")]
pub struct SomeImportantStruct { ... }

then if you re-export SomeImportantStruct and it's associated documentation, then the #doc[codeblock_lang = "text"] should be considered an intrinsic part of the documentation wherever it ends up. (Though if I've misunderstood what re-exporting docs means then please point me in the right direction. I can see how re-exporting docs as docs makes sense, but it's unclear to me whether it makes sense to talk about re-exporting a doctest.)

My concern with any kind of module or crate-level #[doc(default_lang = ...)] is that I'm not sure how that would interact with re-exporting - should it only apply to the original doc, or should it follow the re-export? If it's re-exported to a context with a different default_lang, which should take precedence?

We can definitely come up with something reasonable, but it seems like more up-front complexity. I prefer the clarity of explicitly tagging each #[doc] attribute, since it gives us the basic capability in a relatively simple to explain (and implement) way. If it turns out that this is a useful capability but it's awkward to apply to every doc attribute, then we can extend it with a defaulting mechanism later.

jyn514 · 2021-04-26T23:35:12Z

My concern with any kind of module or crate-level #[doc(default_lang = ...)] is that I'm not sure how that would interact with re-exporting - should it only apply to the original doc, or should it follow the re-export? If it's re-exported to a context with a different default_lang, which should take precedence?

The attribute from the original crate should take precedence, the same way intra-doc links when re-exported use the scope they were originally written in: https://doc.rust-lang.org/nightly/rustdoc/linking-to-items-by-name.html#warnings-re-exports-and-scoping. So for example this would not run any doctest:

// crate1
#![doc(default_lang = "text")]
pub struct S {}

// crate2
```
assert!(true);
```
pub use crate1::S;

if you re-export SomeImportantStruct and it's associated documentation, then the #doc[codeblock_lang = "text"] should be considered an intrinsic part of the documentation wherever it ends up.

Sorry, my question was unclear. That behavior looks right to me, but it wasn't what I was asking about, I meant this case:

mod inner {
    /// ```
    /// assert!(true);
    /// ```
    pub fn f() {}
}

#[doc(codeblock_lang = "text")]
/// ```
/// this is invalid syntax
/// ```
pub use inner::f; // should the assert!(true) run or not?

I prefer the clarity of explicitly tagging each #[doc] attribute

It's impossible to tag each attribute, because that means you'd have to write each attribute as #[doc = "..."]. I think you meant each item.

If it turns out that this is a useful capability but it's awkward to apply to every doc attribute, then we can extend it with a defaulting mechanism later.

That seems really complicated, rustdoc already has enough bugs around re-exports. If we limit this to a crate-level setting, then it's simple: use the default of the crate the item was defined in.

jsgf · 2021-04-27T00:02:58Z

Hm, I'm not sure I really understand what the doc comment on the use statement really does - I couldn't get it to appear in the generated docs. But it does emit a doctest, which seems separate and distinct from the one on the mod inner doctest, so I would say that its codeblock_attr only applies to it, and not the inner one.

That seems really complicated, rustdoc already has enough bugs around re-exports. If we limit this to a crate-level setting, then it's simple: use the default of the crate the item was defined in.

Hm, well I'll think about it and maybe give the PR a new spin. Crate-level seems too coarse, but maybe module level.

jyn514 · 2021-04-27T03:11:40Z

I would rather not do module level because modules can also be re-exported. It also introduces ambiguity about whether the attribute applies to all items in the module or just the module itself.

GuillaumeGomez · 2021-04-27T11:59:25Z

@jyn514 @GuillaumeGomez:

OK, so my use case is this:

* I'm specifically focusing on the fbthrift compiler, which generates the Rust code via moustache templates: https://github.com/facebook/fbthrift/blob/master/thrift/compiler/generate/templates/rust/lib.rs.mustache (this doesn't currently have `#[doc]` support in it, but it will land shortly).

* The build is being driven by Buck in our case, so build.rs isn't involved. However for Cargo builds it would probably use a build.rs, and I'd want it to work in a uniform way regardless.

* Thrift supports _many_ languages, and the formatting of the doc comments, uh, reflects that diversity, so unfortunately there's no uniform syntax or markup language used. But enough of it is markdown-ish that treating it that way often gives reasonable results. Either way, even if it is markdown with code blocks, they're almost certainly not Rust.

...

That definitely sounds like an issue on your side. You can always use a markdown parser on your side if you don't want to rely on regex. But again, that brings a lot of changes on rustdoc for a very specific issue that I'm pretty sure isn't common.

jyn514 · 2021-04-27T15:25:13Z

Again, I think this should go through an RFC first. As we've found yesterday and today, this is a very large design space and I'd like to hear input from other potential users.

Hm, I'm not sure I really understand what the doc comment on the use statement really does - I couldn't get it to appear in the generated docs.

Oh ugh, I just checked and you're right, it works with cross-crate re-exports but not intra-crate re-exports. I'll open an issue.

(I think this reinforces my point that re-exports are the most complicated part of rustdoc.)

Mark-Simulacrum · 2021-05-04T13:01:29Z

It does seem like at least the warnings rustdoc currently emits should be allowable with an attribute (including at the crate level). It may be reasonable for rustdoc to not treat code blocks implicitly as rust code if they don't parse as such, too, including not running doc tests.

The case of valid rust code that gets run and is not intended to be run (e.g., deleting important files or whatever) seems much less "real"; but even so, could be mitigated with existing cfgs it sounds like.

jyn514 · 2021-05-04T13:28:04Z

It does seem like at least the warnings rustdoc currently emits should be allowable with an attribute (including at the crate level).

#84587

bors · 2021-05-04T23:51:45Z

☔ The latest upstream changes (presumably #84707) made this pull request unmergeable. Please resolve the merge conflicts.

jyn514 · 2021-05-18T18:35:35Z

@jsgf do you still think this is useful now that #84587 has been merged?

crlf0710 · 2021-06-05T09:41:10Z

@jsgf Ping from triage! Any updates on this?

crlf0710 · 2021-07-09T11:10:15Z

@jsgf I'm gonna close this due to inactivity. Feel free to reopen or create a new pr when you've got time to work on this again. Thanks!

jsgf · 2021-07-09T18:23:41Z

@crlf0710 Yeah, I think this is stalled on needing an RFC.

jsgf added 2 commits April 26, 2021 01:39

rustdoc: refactor rust_code_blocks

3c0c49d

Restructure the loop to avoid awkward mid-loop continues which complicate the flow control, and the duplicated code. Result is simpler and should be functionally identical.

rust-highfive assigned jackh726 Apr 26, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 26, 2021

rust-highfive assigned GuillaumeGomez and unassigned jackh726 Apr 26, 2021

jyn514 added T-rustdoc needs-fcp This change is insta-stable, so needs a completed FCP to proceed. labels Apr 26, 2021

jyn514 reviewed Apr 26, 2021

View reviewed changes

jyn514 mentioned this pull request Apr 27, 2021

Documentation on re-exports is ignored if the item is defined in the same crate #84619

Closed

jyn514 mentioned this pull request May 14, 2021

Scrape code examples from examples/ directory for Rustdoc rust-lang/rfcs#3123

Merged

crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 5, 2021

jyn514 added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. and removed T-rustdoc-temp labels Jun 22, 2021

crlf0710 added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 9, 2021

crlf0710 closed this Jul 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustdoc - implement #[doc(codeblock_attr = ...)] #84597

rustdoc - implement #[doc(codeblock_attr = ...)] #84597

jsgf commented Apr 26, 2021 •

edited

Loading

rust-highfive commented Apr 26, 2021

jsgf commented Apr 26, 2021

joshtriplett commented Apr 26, 2021

jsgf commented Apr 26, 2021 •

edited

Loading

joshtriplett commented Apr 26, 2021

jsgf commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 26, 2021

jyn514 commented Apr 26, 2021

jyn514 Apr 26, 2021

jyn514 Apr 26, 2021

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021

GuillaumeGomez commented Apr 26, 2021

jyn514 commented Apr 26, 2021 •

edited

Loading

jsgf commented Apr 26, 2021

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 26, 2021 •

edited

Loading

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021

jsgf commented Apr 27, 2021

jyn514 commented Apr 27, 2021 •

edited

Loading

GuillaumeGomez commented Apr 27, 2021

jyn514 commented Apr 27, 2021

Mark-Simulacrum commented May 4, 2021

jyn514 commented May 4, 2021

bors commented May 4, 2021

jyn514 commented May 18, 2021

crlf0710 commented Jun 5, 2021

crlf0710 commented Jul 9, 2021

jsgf commented Jul 9, 2021

rustdoc - implement #[doc(codeblock_attr = ...)] #84597

rustdoc - implement #[doc(codeblock_attr = ...)] #84597

Conversation

jsgf commented Apr 26, 2021 • edited Loading

rust-highfive commented Apr 26, 2021

jsgf commented Apr 26, 2021

joshtriplett commented Apr 26, 2021

jsgf commented Apr 26, 2021 • edited Loading

joshtriplett commented Apr 26, 2021

jsgf commented Apr 26, 2021 • edited Loading

jyn514 commented Apr 26, 2021

jyn514 commented Apr 26, 2021

jyn514 Apr 26, 2021

Choose a reason for hiding this comment

jyn514 Apr 26, 2021

Choose a reason for hiding this comment

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021

GuillaumeGomez commented Apr 26, 2021

jyn514 commented Apr 26, 2021 • edited Loading

jsgf commented Apr 26, 2021

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021 • edited Loading

jyn514 commented Apr 26, 2021 • edited Loading

jsgf commented Apr 26, 2021

jyn514 commented Apr 26, 2021

jsgf commented Apr 27, 2021

jyn514 commented Apr 27, 2021 • edited Loading

GuillaumeGomez commented Apr 27, 2021

jyn514 commented Apr 27, 2021

Mark-Simulacrum commented May 4, 2021

jyn514 commented May 4, 2021

bors commented May 4, 2021

jyn514 commented May 18, 2021

crlf0710 commented Jun 5, 2021

crlf0710 commented Jul 9, 2021

jsgf commented Jul 9, 2021

jsgf commented Apr 26, 2021 •

edited

Loading

jsgf commented Apr 26, 2021 •

edited

Loading

jsgf commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 26, 2021 •

edited

Loading

jyn514 commented Apr 27, 2021 •

edited

Loading