Feature Name: stable_rustdoc_urls
Start Date: 2020-09-20
RFC PR: rust-lang/rfcs#0000

Rust Issue: rust-lang/rust#0000

Summary

Make the URLs that rustdoc generates stable relative to the docs being generated, not just relative to the rustdoc version.

Motivation

Rustdoc generates a separate HTML page for each item in a crate. The URL for this page is currently stable relative to rustdoc; in other words, Rustdoc guarantees that updating rustdoc without changing the source code will not change the URL generated. This is a 'de facto' guarantee - it's not documented, but there's been no breaking change to the format since pre-1.0.

However, Rustdoc does not currently guarantee that making a semver-compatible change to your code will preserve the same URL. This means that, for instance, making a type an enum instead of a struct will change the URL, even if your change is in every other way semver-compatible. After this RFC, Rustdoc will guarantee that the URL would stay the same.

The primary motivation for this feature is to allow linking to a semantic version of the docs, rather than an exact version. This has several applications:

- docs.rs could link to /package/0.2/path instead of /package/0.2.5/path, making the documentation users see more up-to-date (rust-lang/docs.rs#1055)
- blogs could link to exact URLs without fear of the URL breaking (rust-lang/rust#55160 (comment))
- URLs in the standard library documentation would change less often (rust-lang/rust#55160)

Note that this is a different, but related, use case than intra-doc links. Intra-doc links allow linking consistently in the presence of re-exports for relative links. This is intended to be used for absolute links. Additionally, this would allow linking consistently outside of Rust code.

Guide-level explanation

Rustdoc will make the following changes to URL structure:

Item pages will be dependent only on the namespace, not the type of the item.

Consider the struct std::process::Command. Currently, the URL for it looks like std/process/struct.Command.html. This RFC proposes to change the URL to std/process/type.Command.html. Pages named kind.name.html would still be generated (to avoid breaking existing links), but would immediately redirect to the new URL.

Re-exports will generate a page pointing to the canonical version of the documentation.

Consider the following Rust code:
```
 pub struct Foo;
```
Rustdoc currently generates a page for this at struct.Foo.html. Now, consider what happens when you move the struct to a different module and re-export it (which is a semver-compatible change):
```
 pub mod foo { pub struct Foo; }
 pub use foo::Foo;
```
This generates a page at foo/struct.Foo.html, but not at struct.Foo.html. After this change, rustdoc will generate a page at the top level which redirects to the version nested in the module.

This does not affect inlining; if mod foo is private, the page will still be displayed at the top level without a redirect.

Reference-level explanation

Item pages will be dependent only on the namespace

Rust has three namespaces. For simplicity, this will only consider items that can be at the module level, since function locals cannot be documented.

- The value namespace. This includes fn, const, and static.
- The type namespace. This includes mod, struct, union, enum, trait, and type.
- The macro namespace. This includes macro_rules!, attribute macros, and derive macros.

Rust does not permit there to be overlaps within a namespace; overlaps in globbing cause the glob import to be shadowed and unusable. This means that a name and namespace is always sufficient to identify an item.
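
As a small illustration of why the name alone is not enough but name plus namespace is, the following sketch declares three items that share one name, each in a different namespace. The name `Item` is hypothetical and chosen only for this example.

```rust
// A braced struct occupies only the type namespace.
pub struct Item {}

// A function with the same name occupies the value namespace.
#[allow(non_snake_case)]
pub fn Item() -> &'static str {
    "value"
}

// A macro with the same name occupies the macro namespace.
#[allow(unused_macros)]
macro_rules! Item {
    () => {
        "macro"
    };
}
```

All three can coexist in one module, so under this RFC they would need three distinct pages, one per namespace.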

Rustdoc will use the following links, depending on the namespace:

- value.Name.html for values
- type.Name.html for types
- macro.Name.html for macros

Rustdoc will continue to use directories (and index.html) for modules.
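
The mapping above can be sketched as follows. This is a minimal illustration, not rustdoc's implementation: `ItemKind`, `Namespace`, `namespace_of`, and `stable_file_name` are hypothetical names, and modules are left out because they keep using directories.

```rust
/// The three namespaces described in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Namespace {
    Value,
    Type,
    Macro,
}

/// Module-level item kinds named in this RFC (hypothetical enum,
/// not rustdoc's internal representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    Fn,
    Const,
    Static,
    Struct,
    Union,
    Enum,
    Trait,
    TypeAlias,
    MacroRules,
    AttrMacro,
    DeriveMacro,
}

/// Maps an item kind to the namespace it is documented under.
pub fn namespace_of(kind: ItemKind) -> Namespace {
    match kind {
        ItemKind::Fn | ItemKind::Const | ItemKind::Static => Namespace::Value,
        ItemKind::Struct
        | ItemKind::Union
        | ItemKind::Enum
        | ItemKind::Trait
        | ItemKind::TypeAlias => Namespace::Type,
        ItemKind::MacroRules | ItemKind::AttrMacro | ItemKind::DeriveMacro => Namespace::Macro,
    }
}

/// Builds the page file name from the namespace and the item name only.
pub fn stable_file_name(kind: ItemKind, name: &str) -> String {
    let prefix = match namespace_of(kind) {
        Namespace::Value => "value",
        Namespace::Type => "type",
        Namespace::Macro => "macro",
    };
    format!("{}.{}.html", prefix, name)
}
```

With this scheme, `stable_file_name(ItemKind::Struct, "Command")` and `stable_file_name(ItemKind::Enum, "Command")` both yield `type.Command.html`, which is the stability property the RFC is after.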

Re-exports will generate a page pointing to the canonical version

The redirect page will go in the same place as the re-export would be if it were inlined with #[doc(inline)] after this RFC.

There will not be a page generated at kind.name.html at the level of the re-export, since it's not possible for there to be any existing links there that were not broken.
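
One possible shape for such a redirect page is sketched below. The RFC does not specify the redirect mechanism; this sketch assumes an HTML meta refresh, and `redirect_page` is a hypothetical helper. It also assumes the target path contains only characters that need no HTML escaping.

```rust
/// Builds a minimal HTML page that immediately redirects to `target`,
/// a path relative to the redirect page itself.
/// Assumption: `target` needs no HTML escaping.
pub fn redirect_page(target: &str) -> String {
    format!(
        concat!(
            "<!DOCTYPE html>\n",
            "<html>\n",
            "<head>\n",
            "<meta http-equiv=\"refresh\" content=\"0;URL={t}\">\n",
            "</head>\n",
            "<body>\n",
            "<p>Redirecting to <a href=\"{t}\">{t}</a>...</p>\n",
            "</body>\n",
            "</html>\n"
        ),
        t = target
    )
}
```

For the `pub use foo::Foo;` example from the guide-level explanation, the page written at the top-level `type.Foo.html` would be `redirect_page("foo/type.Foo.html")`.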

Drawbacks

Rust is case-sensitive, but some filesystems (especially on Windows) are not, so there are naming collisions in the files Rustdoc generates (#25879). If Rustdoc combines several 'kinds' into one namespace, there will be more conflicts than currently:

struct Command; // page generated at `type.Command.html`
enum command {} // page generated at `type.command.html`
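
To make the collision concrete, here is a rough sketch of how conflicting file names could be detected. `case_insensitive_collisions` is a hypothetical helper; it assumes the input names are distinct and approximates a case-insensitive filesystem with `str::to_lowercase`, which real filesystems may not match exactly.

```rust
use std::collections::BTreeMap;

/// Groups file names that differ only by case and returns every group
/// with more than one member, ordered by the lowercased name.
pub fn case_insensitive_collisions(file_names: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in file_names {
        groups
            .entry(name.to_lowercase())
            .or_default()
            .push(name.to_string());
    }
    groups
        .into_values()
        .filter(|group| group.len() > 1)
        .collect()
}
```

Under the current scheme, `struct.Command.html` and `enum.command.html` do not collide; under the proposed scheme, `type.Command.html` and `type.command.html` do.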

@nemo157 has kindly conducted a survey of the docs.rs documentation and found that there are about 700,000 items that currently overlap. After this change, that would go up to about 850,000 items. docs.rs has 308,064,859 total items in the inventory, so currently about 0.23% of files conflict, and after this RFC about 0.28% of files would conflict.
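
The percentages follow directly from the survey counts. A quick check, using the approximate figures quoted above (`percent` is a hypothetical helper):

```rust
/// Returns `part` as a percentage of `total`.
pub fn percent(part: u64, total: u64) -> f64 {
    part as f64 / total as f64 * 100.0
}
```

Formatting `percent(700_000, 308_064_859)` and `percent(850_000, 308_064_859)` to two decimal places reproduces the two figures in the text.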

In the opinion of the author, since this is an existing problem, it does not need to be solved in order to go forward with the RFC.

Rationale and alternatives

How were the URLs chosen?

There were three main criteria for choosing the URLs (in vague order of priority):

- They should be based on the namespace, not the 'kind' of the item. Otherwise there's not much point to the RFC, because the URLs won't be stable.
- They should make sense when viewed; for example a, b, c would be bad choices for the names.
- They should be fairly short, so they're easy to type; for example type_namespace. would not be a great choice.

Naming alternatives

Note that these names are easy to 'bikeshed' and don't substantially change the RFC.

- Rustdoc could remove the value. prefix for items in the value namespace. This would make the URLs for functions slightly less confusing, but introduce a conflict for functions named index(), since rustdoc has to generate index.html for modules.
- Rustdoc could shorten the prefixes to v., t. and m.. This makes the URLs shorter, but also harder to read. It does however remove some possible confusion around type.Trait.html. This URL scheme has precedent in #35236.
- Rustdoc could use the existing specific names (struct.) only when there is no risk of a semver-compatible change being able to change the kind. This would need careful inspection to make sure there is in fact no risk. It would also be slightly inconsistent with other URLs.

Alternatives

These alternatives are substantial changes to the RFC.

- Rustdoc could stabilize the links it uses without keeping backwards compatibility, that is, without generating kind.name.html redirect pages. This has little benefit over the RFC, other than slightly less disk space used and lower implementation complexity.
- Rustdoc could keep the status quo. This introduces no additional naming conflicts on Windows, but has the drawback that links could silently break even for semver-compatible changes.
- Rustdoc could choose to make URLs stable neither across rustdoc versions nor across versions of the code being documented, for example by using kind.name.SHA256SUM(rustdoc version).html. This makes it clearer that the URLs are not intended to be stable, at the cost of breaking links across much of the ecosystem.

Prior art

- go doc generates all documentation on one page and uses URL hashes, without namespacing. This causes conflicts when two items from different namespaces are in the same package.
- Java only allows classes at the top level, so javadoc has no need for namespacing. To distinguish between methods and fields, javadoc includes () in the URL fragment for methods.
- Racket only allows functions at the top level, and so has no need for namespacing.
- doxygen names HTML pages after their C++ source files, and appends a random hash in the URL fragment to avoid namespace conflicts.

Unresolved questions

- Is there a way to resolve the naming conflicts on Windows? If not, is that worth blocking the RFC, given there are existing conflicts?
- Are there other semver-incompatibilities in the current pages that haven't been addressed?

Future possibilities

Rustdoc could stabilize page hashes:

Associated items for traits will contain the name of the trait if there is a conflict.

Currently, the from function in both of the trait implementations has the same hash:
```
enum Int {
    A(usize),
    B(isize),
}
impl From<usize> for Int {
    // `From::from` must return `Self`.
    fn from(u: usize) -> Self {
        Int::A(u)
    }
}
impl From<isize> for Int {
    fn from(i: isize) -> Self {
        Int::B(i)
    }
}
```
This means it is impossible to refer to one or the other (which has caused trouble for intra-doc links). Rustdoc could instead include the trait name and generic parameters in the hash: #method.from-usize.from and #method.from-isize.from. It is an unresolved question how this would deal with multiple traits with the same name, or how this would deal with types with characters that can't go in URL hashes (such as ()). Rustdoc could possibly use percent-encoding for the second issue.
All other URL fragments would be kept the same:
- #method.{name} for inherent methods
- #tymethod.{name} for methods from a trait
- #variant.{name} for enum variants
- #structfield.{name} for struct fields
- #variant.{parent}.field.{name} for anonymous structs in enums (enum Parent { A { field: usize }}). This may require redesign to avoid conflicts in fields between different variants.
- #associatedconstant.{name} for associated constants in traits. This may require redesign when RFC 195 is implemented.
- #associatedtype.{name} for associated types (same as above)
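
The disambiguated method fragments and the percent-encoding idea discussed above could look roughly like this. The exact fragment layout is not settled by this RFC: `disambiguated_method_fragment` and `percent_encode_fragment` are hypothetical helpers, and lowercasing the trait name is an assumption made only to reproduce the example fragments in the text.

```rust
/// Percent-encodes every byte outside the RFC 3986 "unreserved" set
/// (ASCII letters, digits, `-`, `.`, `_`, `~`).
pub fn percent_encode_fragment(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for byte in s.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Builds a fragment (without the leading `#`) of the form
/// `method.{trait}-{generic args}.{method}`.
/// Assumption: the trait name is lowercased and arguments are joined with `-`.
pub fn disambiguated_method_fragment(
    trait_name: &str,
    generic_args: &[&str],
    method: &str,
) -> String {
    let mut label = trait_name.to_lowercase();
    for arg in generic_args {
        label.push('-');
        label.push_str(&percent_encode_fragment(arg));
    }
    format!("method.{}.{}", label, method)
}
```

For the two impls above this gives `method.from-usize.from` and `method.from-isize.from`; an impl of `From<()>` would give `method.from-%28%29.from`. It does not address the open question of two traits that share a name.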
