- Feature Name: `stable_rustdoc_urls`
- Start Date: 2020-09-20
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
Make the URLs that rustdoc generates stable relative to the docs being generated, not just relative to the rustdoc version.
Rustdoc generates a separate HTML page for each item in a crate.
The URL for this page is currently stable relative to rustdoc; in other words,
Rustdoc guarantees that updating rustdoc
without changing the source code will not change the URL generated.
This is a 'de facto' guarantee - it's not documented, but there's been no breaking change to the format since pre-1.0.
However, Rustdoc does not currently guarantee that making a semver-compatible change to your code will preserve the same URL.
This means that, for instance, making a type an `enum` instead of a `struct` will change the URL, even if your change is in every other way semver-compatible. After this RFC, Rustdoc will guarantee that the URL stays the same.
The primary motivation for this feature is to allow linking to a semantic version of the docs, rather than an exact version. This has several applications:
- docs.rs could link to `/package/0.2/path` instead of `/package/0.2.5/path`, making the documentation users see more up-to-date (rust-lang/docs.rs#1055)
- blogs could link to exact URLs without fear of the URL breaking (rust-lang/rust#55160 (comment))
- URLs in the standard library documentation would change less often (rust-lang/rust#55160)
Note that this is a different, but related, use case from intra-doc links. Intra-doc links allow linking consistently in the presence of re-exports for relative links, whereas this RFC is intended for absolute links. Additionally, this would allow linking consistently outside of Rust code.
Rustdoc will make the following changes to URL structure:
-
Item pages will be dependent only on the namespace, not the kind of the item.
Consider the struct `std::process::Command`. Currently, the URL for it looks like `std/process/struct.Command.html`. This RFC proposes to change the URL to `std/process/type.Command.html`. Pages named `kind.name.html` would still be generated (to avoid breaking existing links), but would immediately redirect to the new URL.
-
Re-exports will generate a page pointing to the canonical version of the documentation.
Consider the following Rust code:

    pub struct Foo;

Rustdoc currently generates a page for this at `struct.Foo.html`. Now, consider what happens when you move the struct to a different module and re-export it (which is a semver-compatible change):

    pub mod foo {
        pub struct Foo;
    }
    pub use foo::Foo;

This generates a page at `foo/struct.Foo.html`, but not at `struct.Foo.html`. After this change, rustdoc will generate a page at the top level which redirects to the version nested in the module.

This does not affect inlining; if `mod foo` is private, the page will still be displayed at the top-level without a redirect.
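
As an illustration of the redirect behaviour described above, here is a minimal sketch of how a legacy `kind.name.html` page could forward to the new location. The helper names `legacy_page_name` and `redirect_stub` are hypothetical and are not part of rustdoc. The use of an HTML meta refresh is an assumption, since the RFC only says the old page "would immediately redirect". The sketch also assumes the target is a plain file name built from Rust identifiers, so no HTML escaping is performed.

```rust
/// Builds the pre-RFC file name for an item, e.g. `struct.Command.html`.
/// (Hypothetical helper, for illustration only.)
pub fn legacy_page_name(kind: &str, name: &str) -> String {
    format!("{}.{}.html", kind, name)
}

/// Builds a minimal HTML page that immediately redirects to `target`.
/// Assumption: `target` contains only characters that are safe inside an
/// HTML attribute, which holds for names made of Rust identifiers.
pub fn redirect_stub(target: &str) -> String {
    format!(
        concat!(
            "<!DOCTYPE html>\n",
            "<html lang=\"en\">\n",
            "<head>\n",
            "  <meta http-equiv=\"refresh\" content=\"0;URL={0}\">\n",
            "</head>\n",
            "<body>\n",
            "  <p>Redirecting to <a href=\"{0}\">{0}</a>...</p>\n",
            "</body>\n",
            "</html>\n",
        ),
        target
    )
}
```

Under these assumptions, the file written at `legacy_page_name("struct", "Command")` would contain `redirect_stub("type.Command.html")`, so existing links to `struct.Command.html` keep working.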
Rust has three namespaces. For simplicity, this will only consider items that can be at the module level, since function locals cannot be documented.
- The value namespace. This includes `fn`, `const`, and `static`.
- The type namespace. This includes `mod`, `struct`, `union`, `enum`, `trait`, and `type`.
- The macro namespace. This includes `macro_rules!`, attribute macros, and derive macros.
Rust does not permit overlaps within a namespace; overlaps in globbing cause the glob import to be shadowed and unusable. This means that a name and a namespace are always sufficient to identify an item.
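
To make the namespace rule concrete, the following sketch declares three items that share one name but live in different namespaces. The name `Widget` and the function `describe_all` are made up for this example. Note that the struct is declared with braces: a unit or tuple struct would also add a constructor to the value namespace and conflict with the function.

```rust
/// Type namespace: a braced struct adds no constructor to the value namespace.
pub struct Widget {}

/// Value namespace: a function may share its name with the type above.
#[allow(non_snake_case)]
pub fn Widget() -> &'static str {
    "value"
}

/// Macro namespace: a macro may share the name as well.
macro_rules! Widget {
    () => {
        "macro"
    };
}

/// Uses all three items; each use site picks the namespace by syntax.
pub fn describe_all() -> [&'static str; 2] {
    let _instance: Widget = Widget {}; // type namespace
    [Widget(), Widget!()] // value namespace, then macro namespace
}
```

Because each of the three items is identified by its name plus its namespace, a namespace-based scheme can give each one a distinct page name without knowing the item's exact kind.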
Rustdoc will use the following links, depending on the namespace:
- `value.Name.html` for values
- `type.Name.html` for types
- `macro.Name.html` for macros
Rustdoc will continue to use directories (and `index.html`) for modules.
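
The mapping above can be sketched as a small function from item kind to page name. The types `ItemKind` and `Namespace` and the function `page_name` are hypothetical names chosen for this example and do not correspond to rustdoc's internal data structures. Modules are deliberately left out because they continue to use directories.

```rust
/// Item kinds that get their own page (modules excluded; they use directories).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    Fn,
    Const,
    Static,
    Struct,
    Union,
    Enum,
    Trait,
    TypeAlias,
    MacroRules,
    AttrMacro,
    DeriveMacro,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Namespace {
    Value,
    Type,
    Macro,
}

impl ItemKind {
    /// Returns the namespace the item lives in, following the lists above.
    pub fn namespace(self) -> Namespace {
        match self {
            ItemKind::Fn | ItemKind::Const | ItemKind::Static => Namespace::Value,
            ItemKind::Struct
            | ItemKind::Union
            | ItemKind::Enum
            | ItemKind::Trait
            | ItemKind::TypeAlias => Namespace::Type,
            ItemKind::MacroRules | ItemKind::AttrMacro | ItemKind::DeriveMacro => {
                Namespace::Macro
            }
        }
    }
}

impl Namespace {
    /// URL prefix proposed for each namespace.
    pub fn prefix(self) -> &'static str {
        match self {
            Namespace::Value => "value",
            Namespace::Type => "type",
            Namespace::Macro => "macro",
        }
    }
}

/// Builds the proposed page name, e.g. `type.Command.html`.
pub fn page_name(kind: ItemKind, name: &str) -> String {
    format!("{}.{}.html", kind.namespace().prefix(), name)
}
```

With this sketch, `page_name(ItemKind::Struct, "Command")` and `page_name(ItemKind::Enum, "Command")` yield the same file name, which is the stability property the RFC is after.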
The redirect page will go in the same place as the re-export would be if it were inlined with `#[doc(inline)]` after this RFC.
There will not be a page generated at `kind.name.html` at the level of the re-export, since it's not possible for there to be any existing links there that were not broken.
- Rust is case-sensitive, but some filesystems (especially on Windows) are not, so there are naming collisions in the files Rustdoc generates (#25879). If Rustdoc combines several 'kinds' into one namespace, there will be more conflicts than currently:
struct Command; // page generated at `type.Command.html`
enum command {} // page generated at `type.command.html`
@nemo157 has kindly conducted a survey of the docs.rs documentation and found that there are about 700,000 items that currently overlap. After this change, that would go up to about 850,000 items that overlap. docs.rs has 308,064,859 total items in the inventory, so previously 0.23% of files conflicted, and after this RFC 0.28% of files would conflict.
In the opinion of the author, since this is an existing problem, it does not need to be solved in order to go forward with the RFC.
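
To illustrate the drawback above, here is a minimal sketch of how such conflicts could be detected on a case-insensitive filesystem. The function `case_insensitive_collisions` is a hypothetical helper and is not the method used in the docs.rs survey. It approximates the filesystem's behaviour with `str::to_lowercase`, which is an assumption: real filesystems apply their own case-folding rules.

```rust
use std::collections::BTreeMap;

/// Groups file names that differ only by case and returns every group
/// with more than one member. Assumption: lowercasing approximates how a
/// case-insensitive filesystem compares names.
pub fn case_insensitive_collisions(names: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in names {
        groups
            .entry(name.to_lowercase())
            .or_default()
            .push(name.to_string());
    }
    // Keep only the groups where two or more names map to the same key.
    groups.into_values().filter(|group| group.len() > 1).collect()
}
```

For the `Command` / `command` example, the current names `struct.Command.html` and `enum.command.html` produce no collision, while the proposed names `type.Command.html` and `type.command.html` fall into the same group.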
There were three main criteria for choosing the URLs (in vague order of priority):
- They should be based on the namespace, not the 'kind' of the item. Otherwise there's not much point to the RFC, because the URLs won't be stable.
- They should make sense when viewed; for example `a`, `b`, `c` would be bad choices for the names.
- They should be fairly short, so they're easy to type; for example `type_namespace.` would not be a great choice.
Note that these names are easy to 'bikeshed' and don't substantially change the RFC.
- Rustdoc could remove the `value.` prefix for items in the value namespace. This would make the URLs for functions slightly less confusing, but introduce a conflict for functions named `index()`, since rustdoc has to generate `index.html` for modules.
- Rustdoc could shorten the prefixes to `v.`, `t.` and `m.`. This makes the URLs shorter, but also harder to read. It does however remove some possible confusion around `type.Trait.html`. This URL scheme has precedent in #35236.
- Rustdoc could use the existing specific names (`struct.`) only when there is no risk of a semver-compatible change being able to change the kind. This would need careful inspection to make sure there is in fact no risk. It would also be slightly inconsistent with other URLs.
These alternatives are substantial changes to the RFC.
- Rustdoc could stabilize the links it uses, but without keeping backwards compatibility, by not generating `kind.name.html`. This has little benefit over the RFC, other than slightly less disk space used and less implementation complexity.
- Rustdoc could keep the status quo. This causes no additional naming conflicts on Windows, but has the drawback that links could silently break even for semver-compatible changes.
- Rustdoc could choose to make URLs stable neither across rustdoc versions nor across versions of the code being documented, for example by using `kind.name.SHA256SUM(rustdoc version).html`. This makes it more clear that the URLs are not intended to be stable, at the cost of breaking links across much of the ecosystem.
- `go doc` generates all documentation on one page and uses URL hashes, without namespacing. This causes conflicts when two items from different namespaces are in the same package.
- `java` only allows classes at the top-level, so `javadoc` has no need for namespacing. To distinguish between methods and fields, `javadoc` includes `()` in the URL fragment for methods.
- `Racket` only allows functions at the top-level, and so has no need for namespacing.
- `doxygen` names HTML pages after their C++ source files, and appends a random hash in the URL fragment to avoid namespace conflicts.
- Is there a way to resolve the naming conflicts on Windows? If not, is that worth blocking the RFC, given there are existing conflicts?
- Are there other semver-incompatibilities in the current pages that haven't been addressed?
Rustdoc could stabilize page hashes:
-
Associated items for traits will contain the name of the trait if there is a conflict.
Currently, the `from` function in both of the trait implementations has the same hash:

    enum Int {
        A(usize),
        B(isize),
    }
    impl From<usize> for Int {
        fn from(u: usize) -> Self { Int::A(u) }
    }
    impl From<isize> for Int {
        fn from(i: isize) -> Self { Int::B(i) }
    }

This means it is impossible to refer to one or the other (which has caused trouble for intra-doc links). Rustdoc could instead include the name and generic parameters in the hash: `#method.from-usize.from` and `#method.from-isize.from`. It is an unresolved question how this would deal with multiple traits with the same name, or how this would deal with types with characters that can't go in URL hashes (such as `()`). Rustdoc could possibly use percent-encoding for the second issue.
-
All other URL fragments would be kept the same:
  - `#method.{name}` for inherent methods
  - `#tymethod.{name}` for methods from a trait
  - `#variant.{name}` for enum variants
  - `#structfield.{name}` for struct fields
  - `#variant.{parent}.field.{name}` for anonymous structs in enums (`enum Parent { A { field: usize }}`). This may require redesign to avoid conflicts in fields between different variants.
  - `#associatedconstant.{name}` for associated constants in traits. This may require redesign when RFC 195 is implemented.
  - `#associatedtype.{name}` for associated types (same as above)
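
The disambiguated fragments for trait implementations suggested above could be built roughly as follows. This is only a sketch of one possible reading: the functions `percent_encode_fragment` and `trait_method_fragment` are hypothetical. Lowercasing the trait name is an assumption inferred from the `from-usize` example, and encoding every byte outside the RFC 3986 unreserved set is a deliberately conservative choice rather than something the RFC specifies.

```rust
/// Percent-encodes every byte that is not an RFC 3986 "unreserved"
/// character (ASCII letters, digits, `-`, `.`, `_`, `~`).
pub fn percent_encode_fragment(raw: &str) -> String {
    let mut out = String::new();
    for byte in raw.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds a fragment such as `#method.from-usize.from` for a method of a
/// trait implementation with a single generic argument.
/// Assumption: the trait name is lowercased, as in the example above.
pub fn trait_method_fragment(trait_name: &str, type_arg: &str, method: &str) -> String {
    format!(
        "#method.{}-{}.{}",
        trait_name.to_lowercase(),
        percent_encode_fragment(type_arg),
        method
    )
}
```

Under these assumptions, `impl From<usize> for Int` and `impl From<isize> for Int` get distinct fragments, and a type argument such as `()` is encoded as `%28%29`. The sketch does not address the open question of two different traits that share a name.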