-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc - implement #[doc(codeblock_attr = ...)] #84597
Conversation
Restructure the loop to avoid awkward mid-loop continues which complicate the flow control, and the duplicated code. Result is simpler and should be functionally identical.
The `codeblock_attr` parameter for `#[doc]` overrides the codeblock attributes - that is, the text `foo` after ```` ```foo ````. This also allows a codeblock attribute to be applied to indentation code blocks. This is intended for documentation coming from non-Rust-aware sources which may contain code examples, but are not written in Rust. Such examples - at best - will cause compilation warnings from rustdoc, or could compile but have unintended consequences.
r? @jackh726 (rust-highfive has picked a reviewer for you, use r? to override) |
Oh, need to update the docs. |
This looks great, and thank you for adding the detailed tests for an item-level attr overriding a module-level one. |
@joshtriplett - The module-level attr only affects the module's #[doc] attribute - it doesn't have any effect on the docs within the module. I should probably add a test to make sure the module-level one doesn't affect the docs within the module. |
Oh, I misread the test then. I would have expected the attribute at the module level to affect any items within the module, unless the item itself had an overriding attribute or an individual code block did. That would make this easier to apply to an entire module at once. |
Yeah, but a module can have its own doc comment, so there's an ambiguity about whether you want to just override the module-level codeblocks or all the codeblocks within the module. That could be resolved by having a second It just seemed simpler to keep it as a doc-by-doc thing. It enables the basic functionality, and could be extended if this form is too awkward in practice. |
Has this gone through an RFC? Rustdoc has a lot of |
@jsgf what should happen if there are multiple codeblocks for the same item? Will this apply to all of them? If the item is re-exported, will codeblock_attr apply to comments added on the re-export? I don't think |
if let Some(c) = doc_codeblock_attr | ||
.chars() | ||
.find(|&c| c == ',' || c == '`' || c == '"' || c == '\'' || c.is_whitespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is used somewhere else in rustdoc - could you find it and factor out an "is_lang_string_separator" function? Getting it wrong means rustdoc can't fix it without breaking backwards compatibility.
); | ||
return false; | ||
} | ||
if doc_codeblock_attr.starts_with(' ') || doc_codeblock_attr.ends_with(' ') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will never run, you already give an error above if the lang string contains whitespace.
Yes, it overrides all the codeblocks in a given doc comment. I don't know how re-export works; I didn't test that. Does re-export also re-export the doc tests? Mechanically the override is attached to the same
Yeah I agree about "attributes" but apparently that's the markdown name for them. |
Yes, any re-export inherits all the attributes of the original item, see rust/src/librustdoc/clean/inline.rs Line 286 in ae54ee6
But anyway my question isn't how it currently behaves - re-exports are the most complicated part of rustdoc and I fully expect the first draft to be wrong - my question is how you think it should behave. |
I'm shared on this: on one side, I'd say to simply re-format the text before putting it into your rust code, that's up to you, not to rustdoc to fix your doc comment. On another, I can see how it could be used, so let's say that for now, I'm skeptical but not completely opposed to this feature. Now a few issues remain here. @jyn514 already underlined the issue in case you have multiple code blocks in one doc comment. In this case, it's all or nothing and I don't see how to go around that limitation. Another problem to be seen is: is it really that useful? It feels like we're using a bazooka to kill a fly here. Markdown provides already everything you need if you don't want your code to be interpreted as a rust one. It's being done everywhere, so why not in your case? You can easily overload bindgen output after all. And yes, I think it'd need to go at least through an RFC here to see if the rest of the team is up with this feature. Because that's also something to be taken into consideration: new feature means more code to be supported, tested and kept up-to-date. Considering how small the gain is, I'd personally say it's not worth it, but if people think otherwise, why not. Hence the need to go through an RFC first here. |
Actually yeah, why is this the easiest solution? Couldn't you do the equivalent of |
OK, so my use case is this:
In the common case in Rust code, the docs are being written for that specific piece of Rust code, so any example or doctest will also be in Rust. But in this case, we're effectively importing documentation from an external source, which only incidentally supports Rust. The goal here is to have a uniform way to neutralize any unintended consequences of doing this. At the very least it could result in spurious compiler warnings about malformed code, but it could extend to a possible attack vector (nobody would expect a thrift or protobuf file could result in arbitrary code, but a doc comment and associated doctests allow arbitrary code to be injected). I'd really like to avoid any kind of ad-hoc regex approach - if nothing else I don't see how one could reliably implement it. You'd have to at least tokenize the .rs file to get the full docstring text, then apply the regex to it, and then regenerate the source with the updated text. I don't see how a In my case, in principle the thrift compiler could parse the comment as markdown while generating the Rust code and do the cleanups there. But in practice this is awkward because it's written in C++ and there's no guarantee that its markdown implementation would match Rust's. So I do think this is the right place to implement this feature. But I'm open to suggestions about details. |
I think the codeblock_lang parameter should just stick with their corresponding docs regardless of how they're re-exported. Since there's no inheritance mechanism, there are no ambiguities about they should be resolved (as there would be if there were crate or module-wide overrides). |
Sorry, I don't follow - do you mean that re-exporting them shouldn't apply to the new docs, or that it should? |
Ok, looking at #59867, the attribute suggested there is much more limited in scope:
I think an attribute is better so you can have it in code, so maybe something like |
What I'm saying is that if you have something like:
then if you re-export My concern with any kind of module or crate-level We can definitely come up with something reasonable, but it seems like more up-front complexity. I prefer the clarity of explicitly tagging each |
The attribute from the original crate should take precedence, the same way intra-doc links when re-exported use the scope they were originally written in: https://doc.rust-lang.org/nightly/rustdoc/linking-to-items-by-name.html#warnings-re-exports-and-scoping. So for example this would not run any doctest: // crate1
#![doc(default_lang = "text")]
pub struct S {}
// crate2
```
assert!(true);
```
pub use crate1::S;
Sorry, my question was unclear. That behavior looks right to me, but it wasn't what I was asking about, I meant this case: mod inner {
/// ```
/// assert!(true);
/// ```
pub fn f() {}
}
#[doc(codeblock_lang = "text")]
/// ```
/// this is invalid syntax
/// ```
pub use inner::f; // should the assert!(true) run or not?
It's impossible to tag each attribute, because that means you'd have to write each attribute as
That seems really complicated, rustdoc already has enough bugs around re-exports. If we limit this to a crate-level setting, then it's simple: use the default of the crate the item was defined in. |
Hm, I'm not sure I really understand what the doc comment on the
Hm, well I'll think about it and maybe give the PR a new spin. Crate-level seems too coarse, but maybe module level. |
I would rather not do module level because modules can also be re-exported. It also introduces ambiguity about whether the attribute applies to all items in the module or just the module itself. |
That definitely sounds like an issue on your side. You can always use a markdown parser on your side if you don't want to rely on regex. But again, that brings a lot of changes on rustdoc for a very specific issue that I'm pretty sure isn't common. |
Again, I think this should go through an RFC first. As we've found yesterday and today, this is a very large design space and I'd like to hear input from other potential users.
Oh ugh, I just checked and you're right, it works with cross-crate re-exports but not intra-crate re-exports. I'll open an issue. (I think this reinforces my point that re-exports are the most complicated part of rustdoc.) |
It does seem like at least the warnings rustdoc currently emits should be allowable with an attribute (including at the crate level). It may be reasonable for rustdoc to not treat code blocks implicitly as rust code if they don't parse as such, too, including not running doc tests. The case of valid rust code that gets run and is not intended to be run (e.g., deleting important files or whatever) seems much less "real"; but even so, could be mitigated with existing cfgs it sounds like. |
|
☔ The latest upstream changes (presumably #84707) made this pull request unmergeable. Please resolve the merge conflicts. |
@jsgf Ping from triage! Any updates on this? |
@jsgf I'm gonna close this due to inactivity. Feel free to reopen or create a new pr when you've got time to work on this again. Thanks! |
@crlf0710 Yeah, I think this is stalled on needing an RFC. |
This PR implements a
#[doc(codeblock_attr = )]
specification which allows all the codeblock attrs in a doc comment to be overridden. For example, if the doc comment has come from outside the Rust ecosystem (eg from bindgen, protoc, etc) then it may appear to have code blocks in its doc comments, but they are not intended to be in Rust. They will either fail to parse, causing spurious warnings, or worse, parse and do unintended things.To address this, this PR allows all the code block attributes to be overridden for both
```
and indent-based code blocks (which don't currently have a way to specify attributes). For example:I have some possible discussion points:
codeblock_attr
Personally I think this PR lands on the right answers for all of these, but I'm open to other thoughts.
https://internals.rust-lang.org/t/blanket-tags-for-doc-code-blocks/14567/ for irlo discussion.
Relevant issue: #59867
cc @joshtriplett @ehuss