rework the README.md for rustc and add other readmes #44505

Merged
merged 8 commits · Sep 20, 2017
341 changes: 185 additions & 156 deletions src/librustc/README.md

Large diffs are not rendered by default.

119 changes: 119 additions & 0 deletions src/librustc/hir/README.md
@@ -0,0 +1,119 @@
# Introduction to the HIR

The HIR -- "High-level IR" -- is the primary IR used in most of
rustc. It is a desugared version of the "abstract syntax tree" (AST)
that is generated after parsing, macro expansion, and name resolution
have completed. Many parts of HIR resemble Rust surface syntax quite
closely, with the exception that some of Rust's expression forms have
been desugared away (as an example, `for` loops are converted into a
`loop` and do not appear in the HIR).

This README covers the main concepts of the HIR.

### Out-of-band storage and the `Crate` type

The top-level data-structure in the HIR is the `Crate`, which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate). Whereas in the AST the crate
data structure basically just contains the root module, the HIR
`Crate` structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g., modules,
functions, traits, impls, etc) in the HIR are not immediately
accessible in the parents. So, for example, if we had a module item `foo`
containing a function `bar()`:

```
mod foo {
    fn bar() { }
}
```

Then in the HIR the representation of module `foo` (the `Mod`
struct) would have only the **`ItemId`** `I` of `bar()`. To get the
details of the function `bar()`, we would look up `I` in the
`items` map.

One nice result from this representation is that one can iterate
over all items in the crate by iterating over the key-value pairs
in these maps (without the need to trawl through the IR in total).
There are similar maps for things like trait items and impl items,
as well as "bodies" (explained below).
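
For instance, a sketch of both operations -- iterating over every item
in the crate and resolving a single `ItemId` -- might look like this
(the field names are assumed from the description above, not copied
from the actual definitions):

```rust
// Sketch only: assumes `krate: &hir::Crate` is in scope and that the
// out-of-band storage is a map from node-ids to items, as described above.
fn walk_crate(krate: &hir::Crate) {
    // Iterate over *all* items without traversing the module tree.
    for (id, item) in &krate.items {
        println!("item {:?} is named {:?}", id, item.name);
    }
}

// Sketch only: resolve the `ItemId` stored in `foo`'s `Mod` to the full item.
fn lookup_bar<'hir>(krate: &'hir hir::Crate, bar_id: hir::ItemId) -> &'hir hir::Item {
    &krate.items[&bar_id.id]
}
```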

The other reason to set up the representation this way is for better
integration with incremental compilation. This way, if you gain access
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
gain access to the contents of the function `bar()`. Instead, you only
gain access to the **id** for `bar()`, and you must invoke some
function to look up the contents of `bar()` given its id; this gives us
a chance to observe that you accessed the data for `bar()` and record
the dependency.

### Identifiers in the HIR

Most of the code that has to deal with things in HIR tends not to
carry around references into the HIR, but rather to carry around
*identifier numbers* (or just "ids"). Right now, you will find four
sorts of identifiers in active use (a simplified sketch of their
structure follows this list):

- `DefId`, which primarily names "definitions" or top-level items.
  - You can think of a `DefId` as being shorthand for a very explicit
    and complete path, like `std::collections::HashMap`. However,
    these paths are able to name things that are not nameable in
    normal Rust (e.g., impls), and they also include extra information
    about the crate (such as its version number, as two versions of
    the same crate can co-exist).
  - A `DefId` really consists of two parts, a `CrateNum` (which
    identifies the crate) and a `DefIndex` (which indexes into a list
    of items that is maintained per crate).
- `HirId`, which combines the index of a particular item with an
  offset within that item.
  - The key point of a `HirId` is that it is *relative* to some item
    (which is named via a `DefId`).
- `BodyId`, an absolute identifier that refers to a specific
  body (definition of a function or constant) in the crate. It is currently
  effectively a "newtype'd" `NodeId`.
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
  - While these are still in common use, **they are being slowly phased out**.
  - Since they are absolute within the crate, adding a new node
    anywhere in the tree causes the node-ids of all subsequent code in
    the crate to change. This is terrible for incremental compilation,
    as you can perhaps imagine.
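
As a rough mental model, the identifiers above can be pictured like
this (a simplified sketch, not the actual rustc definitions):

```rust
// Simplified sketch of the identifier kinds described above; the real
// definitions live in `hir::def_id` and `hir` and differ in detail.
struct DefId {
    krate: CrateNum,   // which crate the definition lives in
    index: DefIndex,   // index into that crate's per-crate list of definitions
}

struct HirId {
    owner: DefIndex,        // the item this node belongs to
    local_id: ItemLocalId,  // offset of the node *within* that item
}

struct BodyId {
    node_id: NodeId,   // effectively a "newtype'd" `NodeId`, as noted above
}
```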

### HIR Map

Most of the time when you are working with the HIR, you will do so via
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
the `hir::map` module). The HIR map contains a number of methods to
convert between ids of various kinds and to look up data associated
with a HIR node.

For example, if you have a `DefId`, and you would like to convert it
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
returns an `Option<NodeId>` -- this will be `None` if the def-id
refers to something outside of the current crate (since then it has no
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
the definition.
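
In code, that conversion typically looks something like this sketch
(assuming a `tcx` and a `def_id` in scope):

```rust
// Sketch: map a `DefId` back to its HIR node-id, if the definition is local.
if let Some(node_id) = tcx.hir.as_local_node_id(def_id) {
    // `node_id` identifies the HIR node for `def_id` in the current crate.
} else {
    // `def_id` refers to something from another crate; it has no HIR here.
}
```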

Similarly, you can use `tcx.hir.find(n)` to look up the node for a
`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum
defined in the map; by matching on this you can find out what sort of
node the node-id referred to and also get a pointer to the data
itself. Often, you know what sort of node `n` is -- e.g., if you know
that `n` must be some HIR expression, you can do
`tcx.hir.expect_expr(n)`, which will extract and return the
`&hir::Expr`, panicking if `n` is not in fact an expression.
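
A sketch of the general lookup (the `Node` variant name shown here is
an assumption, not checked against the actual enum):

```rust
// Sketch: inspect what kind of node a `NodeId` refers to.
match tcx.hir.find(n) {
    Some(hir::map::Node::NodeExpr(expr)) => {
        // `n` was an expression; `expr` is the `&hir::Expr`.
    }
    Some(_other) => {
        // `n` referred to some other kind of node (item, pattern, ...).
    }
    None => {
        // `n` is not present in the HIR map at all.
    }
}
```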

Finally, you can use the HIR map to find the parents of nodes, via
calls like `tcx.hir.get_parent_node(n)`.

### HIR Bodies

A **body** represents some kind of executable code, such as the body
of a function/closure or the definition of a constant. Bodies are
associated with an **owner**, which is typically some kind of item
(e.g., a `fn()` or `const`), but could also be a closure expression
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
associated with a given def-id (`maybe_body_owned_by()`) or to find
the owner of a body (`body_owner_def_id()`).

Contributor comment: typo: find find
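
A sketch of going from a definition to its body and back (the exact
argument and id types these methods take are assumed here):

```rust
// Sketch: find the body owned by a definition, then recover its owner.
if let Some(body_id) = tcx.hir.maybe_body_owned_by(id) {
    let body = tcx.hir.body(body_id);                  // the `&hir::Body` itself
    let owner_def_id = tcx.hir.body_owner_def_id(body_id);
    // `owner_def_id` names the fn/const/closure that owns `body`.
}
```
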
4 changes: 4 additions & 0 deletions src/librustc/hir/map/README.md
@@ -0,0 +1,4 @@
The HIR map, accessible via `tcx.hir`, allows you to quickly navigate the
HIR and convert between various forms of identifiers. See [the HIR README] for more information.

[the HIR README]: ../README.md
26 changes: 25 additions & 1 deletion src/librustc/hir/mod.rs
@@ -413,6 +413,10 @@ pub struct WhereEqPredicate {

pub type CrateConfig = HirVec<P<MetaItem>>;

/// The top-level data structure that stores the entire contents of
/// the crate currently being compiled.
///
Contributor comment: Does this newline serve a purpose?

/// For more details, see [the module-level README](README.md).
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Debug)]
pub struct Crate {
pub module: Mod,
@@ -927,7 +931,27 @@ pub struct BodyId {
pub node_id: NodeId,
}

/// The body of a function or constant value.
/// The body of a function, closure, or constant value. In the case of
/// a function, the body contains not only the function body itself
/// (which is an expression), but also the argument patterns, since
/// those are something that the caller doesn't really care about.
///
/// # Examples
///
/// ```
/// fn foo((x, y): (u32, u32)) -> u32 {
///     x + y
/// }
/// ```
///
/// Here, the `Body` associated with `foo()` would contain:
///
/// - an `arguments` array containing the `(x, y)` pattern
/// - a `value` containing the `x + y` expression (maybe wrapped in a block)
/// - `is_generator` would be false
///
/// All bodies have an **owner**, which can be accessed via the HIR
/// map using `body_owner_def_id()`.
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
pub struct Body {
pub arguments: HirVec<Arg>,
23 changes: 22 additions & 1 deletion src/librustc/lib.rs
@@ -8,7 +8,28 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! The Rust compiler.
//! The "main crate" of the Rust compiler. This crate contains common
//! type definitions that are used by the other crates in the rustc
//! "family". Here are some prominent examples (note that each of these
//! modules has its own README with further details):
//!
//! - **HIR.** The "high-level (H) intermediate representation (IR)" is
//! defined in the `hir` module.
//! - **MIR.** The "mid-level (M) intermediate representation (IR)" is
//! defined in the `mir` module. This module contains only the
//! *definition* of the MIR; the passes that transform and operate
//! on MIR are found in the `librustc_mir` crate.
//! - **Types.** The internal representation of types used in rustc is
//! defined in the `ty` module. This includes the **type context**
//! (or `tcx`), which is the central context during most of
//! compilation, containing the interners and other things.
//! - **Traits.** Trait resolution is implemented in the `traits` module.
//! - **Type inference.** The type inference code can be found in the `infer` module;
//! this code handles low-level equality and subtyping operations. The
//! type check pass in the compiler is found in the `librustc_typeck` crate.
//!
//! For a deeper explanation of how the compiler works and is
//! organized, see the README.md file in this directory.
//!
//! # Note
//!
165 changes: 165 additions & 0 deletions src/librustc/ty/README.md
@@ -0,0 +1,165 @@
# Types and the Type Context

The `ty` module defines how the Rust compiler represents types
internally. It also defines the *typing context* (`tcx` or `TyCtxt`),
which is the central data structure in the compiler.

## The tcx and how it uses lifetimes

The `tcx` ("typing context") is the central data structure in the
compiler. It is the context that you use to perform all manner of
queries. The struct `TyCtxt` defines a reference to this shared context:

```rust
tcx: TyCtxt<'a, 'gcx, 'tcx>
// -- ---- ----
// | | |
// | | innermost arena lifetime (if any)
// | "global arena" lifetime
// lifetime of this reference
```

As you can see, the `TyCtxt` type takes three lifetime parameters.
These lifetimes are perhaps the most complex thing to understand about
the tcx. During Rust compilation, we allocate most of our memory in
**arenas**, which are basically pools of memory that get freed all at
once. When you see a reference with a lifetime like `'tcx` or `'gcx`,
you know that it refers to arena-allocated data (or data that lives as
long as the arenas, anyhow).

We use two distinct levels of arenas. The outer level is the "global
arena". This arena lasts for the entire compilation: so anything you
allocate in there is only freed once compilation is basically over
(actually, when we shift to executing LLVM).

To reduce peak memory usage, when we do type inference, we also use an
inner level of arena. These arenas get thrown away once type inference
is over. This is done because type inference generates a lot of
"throw-away" types that are not particularly interesting after type
inference completes, so keeping around those allocations would be
wasteful.

Often, we wish to write code that explicitly asserts that it is not
taking place during inference. In that case, there is no "local"
arena, and all the types that you can access are allocated in the
global arena. To express this, the idea is to use the same lifetime
for the `'gcx` and `'tcx` parameters of `TyCtxt`. Just to be a touch
confusing, we tend to use the name `'tcx` in such contexts. Here is an
example:

```rust
fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) {
    //                                           ----  ----
    //                                           Using the same lifetime here asserts
    //                                           that the innermost arena accessible through
    //                                           this reference *is* the global arena.
}
```

In contrast, if we want code that is usable during type inference, then we
need to declare distinct `'gcx` and `'tcx` lifetime parameters:

```rust
fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) {
    //                                                 ----  ----
    //                                                 Using different lifetimes here means that
    //                                                 the innermost arena *may* be distinct
    //                                                 from the global arena (but doesn't have to be).
}
```

### Allocating and working with types

Rust types are represented using the `Ty<'tcx>` defined in the `ty`
module (not to be confused with the `Ty` struct from [the HIR]). This
is in fact a simple type alias for a reference with `'tcx` lifetime:

```rust
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
```

[the HIR]: ../hir/README.md

You can basically ignore the `TyS` struct -- you will almost never
access it explicitly. We always pass it by reference using the
`Ty<'tcx>` alias -- the only exception I think is to define inherent
methods on types. Instances of `TyS` are only ever allocated in one of
the rustc arenas (never e.g. on the stack).

One common operation on types is to **match** and see what kinds of
types they are. This is done by matching on `ty.sty`, sort of like this:

```rust
fn test_type<'tcx>(ty: Ty<'tcx>) {
    match ty.sty {
        ty::TyArray(elem_ty, len) => { ... }
        ...
    }
}
```

The `sty` field (the origin of this name is unclear to me; perhaps
structural type?) is of type `TypeVariants<'tcx>`, which is an enum
defining all of the different kinds of types in the compiler.

> NB: inspecting the `sty` field on types during type inference can be
> risky, as there may be inference variables and other things to
> consider, or sometimes types are not yet known and will only become
> known later.

To allocate a new type, you can use the various `mk_` methods defined
on the `tcx`. These have names that correspond mostly to the various kinds
of type variants. For example:

```rust
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>` -- note that the lifetime you
get back is the lifetime of the innermost arena that this `tcx` has
access to. In fact, types are always canonicalized and interned (so we
never allocate exactly the same type twice) and are always allocated
in the outermost arena where they can be (so, if they do not contain
any inference variables or other "temporary" types, they will be
allocated in the global arena). However, the lifetime `'tcx` is always
a safe approximation, so that is what you get back.

> NB. Because types are interned, it is possible to compare them for
> equality efficiently using `==` -- however, this is almost never what
> you want to do unless you happen to be hashing and looking for
> duplicates. This is because often in Rust there are multiple ways to
> represent the same type, particularly once inference is involved. If
> you are going to be testing for type equality, you probably need to
> start looking into the inference code to do it right.

You can also find various common types in the tcx itself by accessing
`tcx.types.bool`, `tcx.types.char`, etc (see `CommonTypes` for more).
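
For example, combining a common type with one of the `mk_` constructors
(a small sketch assuming a `tcx` in scope):

```rust
// Sketch: build the type `[bool; 4]` from a common type and `mk_array`.
let bool_ty = tcx.types.bool;
let array_ty = tcx.mk_array(bool_ty, 4);

// Because types are interned, building the "same" type again yields the
// same allocation, so `==` here is a cheap pointer comparison (though, as
// noted above, raw equality is rarely what you want semantically).
assert!(array_ty == tcx.mk_array(tcx.types.bool, 4));
```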

### Beyond types: Other kinds of arena-allocated data structures

In addition to types, there are a number of other arena-allocated data
structures that you can allocate, and which are found in this
module. Here are a few examples:

- `Substs`, allocated with `mk_substs` -- this will intern a slice of types, often used to
  specify the values to be substituted for generics (e.g., `HashMap<i32, u32>`
  would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- `TraitRef`, typically passed by value -- a **trait reference**
  consists of a reference to a trait along with its various type
  parameters (including `Self`), like `i32: Display` (here, the def-id
  would reference the `Display` trait, and the substs would contain
  `i32`).
- `Predicate` defines something the trait system has to prove (see `traits` module).

### Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:

```rust
use ty::{self, Ty, TyCtxt};
```

In particular, since they are so common, the `Ty` and `TyCtxt` types
are imported directly. Other types are often referenced with an
explicit `ty::` prefix (e.g., `ty::TraitRef<'tcx>`). But some modules
choose to import a larger or smaller set of names explicitly.
7 changes: 4 additions & 3 deletions src/librustc/ty/context.rs
@@ -793,9 +793,10 @@ impl<'tcx> CommonTypes<'tcx> {
}
}

/// The data structure to keep track of all the information that typechecker
/// generates so that so that it can be reused and doesn't have to be redone
/// later on.
/// The central data structure of the compiler. It stores references
/// to the various **arenas** and also houses the results of the
/// various **compiler queries** that have been performed. See [the
/// README](README.md) for more details.
#[derive(Copy, Clone)]
pub struct TyCtxt<'a, 'gcx: 'a+'tcx, 'tcx: 'a> {
gcx: &'a GlobalCtxt<'gcx>,
1,551 changes: 0 additions & 1,551 deletions src/librustc/ty/maps.rs

This file was deleted.

302 changes: 302 additions & 0 deletions src/librustc/ty/maps/README.md
@@ -0,0 +1,302 @@
# The Rust Compiler Query System

The Compiler Query System is the key to our new demand-driven
organization. The idea is pretty simple. You have various queries
that compute things about the input -- for example, there is a query
called `type_of(def_id)` that, given the def-id of some item, will
compute the type of that item and return it to you.

Query execution is **memoized** -- so the first time you invoke a
query, it will go do the computation, but the next time, the result is
returned from a hashtable. Moreover, query execution fits nicely into
**incremental computation**; the idea is roughly that, when you do a
query, the result **may** be returned to you by loading stored data
from disk (but that's a separate topic we won't discuss further here).
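
As a self-contained illustration of the memoization idea (a toy model,
not rustc's actual query machinery):

```rust
use std::collections::HashMap;

// Toy model of a memoized query: the first call runs the "provider",
// later calls with the same key are served straight from the cache.
struct Queries {
    type_of_cache: HashMap<u32, String>, // toy "def-id" -> toy "type"
}

impl Queries {
    fn type_of(&mut self, def_id: u32) -> String {
        if let Some(cached) = self.type_of_cache.get(&def_id) {
            return cached.clone(); // cache hit: no recomputation
        }
        // Stand-in for the real provider function.
        let result = format!("type_of({})", def_id);
        self.type_of_cache.insert(def_id, result.clone());
        result
    }
}

fn main() {
    let mut queries = Queries { type_of_cache: HashMap::new() };
    let first = queries.type_of(42);  // computed
    let second = queries.type_of(42); // returned from the cache
    assert_eq!(first, second);
}
```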

The overall vision is that, eventually, the entire compiler
control-flow will be query driven. There will effectively be one
top-level query ("compile") that will run compilation on a crate; this
will in turn demand information about that crate, starting from the
*end*. For example:

- This "compile" query might demand to get a list of codegen-units
(i.e., modules that need to be compiled by LLVM).
- But computing the list of codegen-units would invoke some subquery
that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the
actual parsing.

However, that vision is not fully realized. Still, big chunks of the
compiler (for example, generating MIR) work exactly like this.

### Invoking queries

Invoking a query is simple. The tcx ("type context") offers a method
for each defined query. So, for example, to invoke the `type_of`
query, you would just do this:

```rust
let ty = tcx.type_of(some_def_id);
```

### Cycles between queries

Currently, cycles during query execution should always result in a
compilation error. Typically, they arise because of illegal programs
that contain cyclic references they shouldn't (though sometimes they
arise because of compiler bugs, in which case we need to factor our
queries in a more fine-grained fashion to avoid them).

However, it is nonetheless often useful to *recover* from a cycle
(after reporting an error, say) and try to soldier on, so as to give a
better user experience. In order to recover from a cycle, you don't
get to use the nice method-call-style syntax. Instead, you invoke
using the `try_get` method, which looks roughly like this:

```rust
use ty::maps::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
    Ok(result) => {
        // no cycle occurred! You can use `result`
    }
    Err(err) => {
        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
        // meaning essentially an "in-progress", not-yet-reported error message.
        // See below for more details on what to do here.
    }
}
```

So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
you must ensure that a compiler error message is reported. You can do that in two ways:

The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.

However, often cycles happen because of an illegal program, and you
know at that point that an error either already has been reported or
will be reported due to this cycle by some other bit of code. In that
case, you can invoke `err.cancel()` to not emit any error. It is
traditional to then invoke:

```
tcx.sess.delay_span_bug(some_span, "some message")
```

`delay_span_bug()` is a helper that says: we expect a compilation
error to have happened or to happen in the future; so, if compilation
ultimately succeeds, make an ICE with the message `"some
message"`. This is basically just a precaution in case you are wrong.

### How the compiler executes a query

So you may be wondering what happens when you invoke a query
method. The answer is that, for each query, the compiler maintains a
cache -- if your query has already been executed, then the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an `Rc` if necessary).

#### Providers

If, however, the query is *not* in the cache, then the compiler will
try to find a suitable **provider**. A provider is a function that has
been defined and linked into the compiler somewhere that contains the
code to compute the result of the query.

**Providers are defined per-crate.** The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the **local crate** (that is, the one being compiled)
and providers for queries about **external crates** (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the *kind* of query, but the *key*.
For example, when you invoke `tcx.type_of(def_id)`, that could be a
local query or an external query, depending on what crate the `def_id`
is referring to (see the `self::keys::Key` trait for more information
on how that works).

Providers always have the same signature:

```rust
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}
```

Providers take two arguments: the `tcx` and the query key. Note also
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
twice), rather than taking a tcx with some active inference context.
They return the result of the query.

#### How providers are set up

When the tcx is created, it is given the providers by its creator using
the `Providers` struct. This struct is generated by the macros here, but it
is basically a big list of function pointers:

```rust
struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```

At present, we have one copy of the struct for local crates, and one
for external crates, though the plan is that we may eventually have
one per crate.

These `Provider` structs are ultimately created and populated by
`librustc_driver`, but it does this by distributing the work
throughout the other `rustc_*` crates. This is done by invoking
various `provide` functions. These functions tend to look something
like this:

```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```

That is, they take an `&mut Providers` and mutate it in place. Usually
we use the formulation above just because it looks nice, but you could
as well do `providers.type_of = type_of`, which would be equivalent.
(Here, `type_of` would be a top-level function, defined as we saw
before.) So, if we wanted to add a provider for some other query,
let's call it `fubar`, into the crate above, we might modify the `provide()`
function like so:

```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
```

NB. Most of the `rustc_*` crates only provide **local
providers**. Almost all **extern providers** wind up going through the
`rustc_metadata` crate, which loads the information from the crate
metadata. But in some cases there are crates that provide queries for
*both* local and external crates, in which case they define both a
`provide` and a `provide_extern` function that `rustc_driver` can
invoke.
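
A sketch of what such a crate's wiring might look like (the provider
names `type_of` and `extern_type_of` here are placeholders):

```rust
// Hypothetical sketch of a crate supplying providers for both local and
// external crates, following the `provide`/`provide_extern` split above.
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,            // handles def-ids from the local crate
        ..*providers
    };
}

pub fn provide_extern(providers: &mut Providers) {
    *providers = Providers {
        type_of: extern_type_of,  // handles def-ids from dependencies
        ..*providers
    };
}
```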

### Adding a new kind of query

So suppose you want to add a new kind of query, how do you do so?
Well, defining a query takes place in two steps:

1. first, you have to specify the query name and arguments; and then,
2. you have to supply query providers where needed.

To specify the query name and arguments, you simply add an entry
to the big macro invocation in `mod.rs`. This will probably have changed
by the time you read this README, but at present it looks something
like:

```
define_maps! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
    ...
}
```

Each line of the macro defines one query. The name is broken up like this:

```
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|     |        |          |         |
|     |        |          |         result type of query
|     |        |          query key type
|     |        dep-node constructor
|     name of query
query flags
```

Let's go over them one by one:

- **Query flags:** these are largely unused right now, but the intention
is that we'll be able to customize various aspects of how the query is
processed.
- **Name of query:** the name of the query method
(`tcx.type_of(..)`). Also used as the name of a struct
(`ty::maps::queries::type_of`) that will be generated to represent
this query.
- **Dep-node constructor:** indicates the constructor function that
connects this query to incremental compilation. Typically, this is a
`DepNode` variant, which can be added by modifying the
`define_dep_nodes!` macro invocation in
`librustc/dep_graph/dep_node.rs`.
  - However, sometimes we use a custom function, in which case the
    name will be in snake case and the function will be defined at the
    bottom of the file. This is typically used when the query key is
    not a def-id, or just not the type that the dep-node expects.
- **Query key type:** the type of the argument to this query.
This type must implement the `ty::maps::keys::Key` trait, which
defines (for example) how to map it to a crate, and so forth.
- **Result type of query:** the type produced by this query. This type
should (a) not use `RefCell` or other interior mutability and (b) be
cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
non-trivial data types.
  - The one exception to those rules is the `ty::steal::Steal` type,
    which is used to cheaply modify MIR in place. See the definition
    of `Steal` for more details. New uses of `Steal` should **not** be
    added without alerting `@rust-lang/compiler`.

So, to add a query:

- Add an entry to `define_maps!` using the format above.
- Possibly add a corresponding entry to the dep-node macro.
- Link the provider by modifying the appropriate `provide` method;
or add a new one if needed and ensure that `rustc_driver` is invoking it.
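
Putting those steps together for a hypothetical `fubar` query (the
names `Fubar` and `FubarResult` are placeholders for illustration):

```rust
// 1. Declare the query in the `define_maps!` invocation in `mod.rs`
//    (and, if needed, add a `Fubar` variant to `define_dep_nodes!`):
//
//        [] fn fubar: Fubar(DefId) -> FubarResult<'tcx>,

// 2. Write a provider for it somewhere in a `rustc_*` crate:
fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> FubarResult<'tcx> {
    // ... compute the result for `key` ...
    unimplemented!()
}

// 3. Link the provider up in that crate's `provide` function:
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        fubar,
        ..*providers
    };
}
```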

#### Query structs and descriptions

For each kind, the `define_maps` macro will generate a "query struct"
named after the query. This struct is a kind of place-holder
describing the query. Each such struct implements the
`self::config::QueryConfig` trait, which has associated types for the
key/value of that particular query. Basically the code generated looks something
like this:

```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;
}
```

There is an additional trait that you may wish to implement called
`self::config::QueryDescription`. This trait is used during cycle
errors to give a "human readable" name for the query, so that we can
summarize what was happening when the cycle occurred. Implementing
this trait is optional if the query key is `DefId`, but if you *don't*
implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:

```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
```

492 changes: 492 additions & 0 deletions src/librustc/ty/maps/config.rs

Large diffs are not rendered by default.

162 changes: 162 additions & 0 deletions src/librustc/ty/maps/keys.rs
@@ -0,0 +1,162 @@
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Defines the set of legal keys that can be used in queries.
use hir::def_id::{CrateNum, DefId, LOCAL_CRATE, DefIndex};
use mir::transform::{MirSuite, MirPassIndex};
use ty::{self, Ty, TyCtxt};
use ty::subst::Substs;
use ty::fast_reject::SimplifiedType;

use std::fmt::Debug;
use std::hash::Hash;
use syntax_pos::{Span, DUMMY_SP};
use syntax_pos::symbol::InternedString;

/// The `Key` trait controls what types can legally be used as the key
/// for a query.
pub trait Key: Clone + Hash + Eq + Debug {
    /// Given an instance of this key, what crate is it referring to?
    /// This is used to find the provider.
    fn map_crate(&self) -> CrateNum;

    /// In the event that a cycle occurs, if no explicit span has been
    /// given for a query with key `self`, what span should we use?
    fn default_span(&self, tcx: TyCtxt) -> Span;
}

impl<'tcx> Key for ty::InstanceDef<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }

    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(self.def_id())
    }
}

impl<'tcx> Key for ty::Instance<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }

    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(self.def_id())
    }
}

impl Key for CrateNum {
    fn map_crate(&self) -> CrateNum {
        *self
    }
    fn default_span(&self, _: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl Key for DefIndex {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _tcx: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl Key for DefId {
    fn map_crate(&self) -> CrateNum {
        self.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        tcx.def_span(*self)
    }
}

impl Key for (DefId, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (CrateNum, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.0
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (DefId, SimplifiedType) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.0.default_span(tcx)
    }
}

impl<'tcx> Key for (DefId, &'tcx Substs<'tcx>) {
    fn map_crate(&self) -> CrateNum {
        self.0.krate
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.0.default_span(tcx)
    }
}

impl Key for (MirSuite, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.1.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.1.default_span(tcx)
    }
}

impl Key for (MirSuite, MirPassIndex, DefId) {
    fn map_crate(&self) -> CrateNum {
        self.2.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.2.default_span(tcx)
    }
}

impl<'tcx> Key for Ty<'tcx> {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _: TyCtxt) -> Span {
        DUMMY_SP
    }
}

impl<'tcx, T: Key> Key for ty::ParamEnvAnd<'tcx, T> {
    fn map_crate(&self) -> CrateNum {
        self.value.map_crate()
    }
    fn default_span(&self, tcx: TyCtxt) -> Span {
        self.value.default_span(tcx)
    }
}

impl Key for InternedString {
    fn map_crate(&self) -> CrateNum {
        LOCAL_CRATE
    }
    fn default_span(&self, _tcx: TyCtxt) -> Span {
        DUMMY_SP
    }
}
453 changes: 453 additions & 0 deletions src/librustc/ty/maps/mod.rs

Large diffs are not rendered by default.

494 changes: 494 additions & 0 deletions src/librustc/ty/maps/plumbing.rs

Large diffs are not rendered by default.

49 changes: 49 additions & 0 deletions src/librustc/ty/maps/values.rs
@@ -0,0 +1,49 @@
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use ty::{self, Ty, TyCtxt};

use syntax::symbol::Symbol;

pub(super) trait Value<'tcx>: Sized {
    fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> Self;
}

impl<'tcx, T> Value<'tcx> for T {
    default fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> T {
        tcx.sess.abort_if_errors();
        bug!("Value::from_cycle_error called without errors");
    }
}

impl<'tcx, T: Default> Value<'tcx> for T {
    default fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> T {
        T::default()
    }
}

impl<'tcx> Value<'tcx> for Ty<'tcx> {
    fn from_cycle_error<'a>(tcx: TyCtxt<'a, 'tcx, 'tcx>) -> Ty<'tcx> {
        tcx.types.err
    }
}

impl<'tcx> Value<'tcx> for ty::DtorckConstraint<'tcx> {
    fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> Self {
        Self::empty()
    }
}

impl<'tcx> Value<'tcx> for ty::SymbolName {
    fn from_cycle_error<'a>(_: TyCtxt<'a, 'tcx, 'tcx>) -> Self {
        ty::SymbolName { name: Symbol::intern("<error>").as_str() }
    }
}

6 changes: 6 additions & 0 deletions src/librustc_back/README.md
@@ -0,0 +1,6 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

`librustc_back` contains some very low-level details that are
specific to different LLVM targets and so forth.
12 changes: 12 additions & 0 deletions src/librustc_driver/README.md
@@ -0,0 +1,12 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `driver` crate is effectively the "main" function for the Rust
compiler. It orchestrates the compilation process and "knits together"
the code from the other crates within rustc. This crate itself does
not contain any of the "main logic" of the compiler (though it does
have some code related to pretty printing or other minor compiler
options).

Member comment (@qmx, Sep 12, 2017): This definition of driver as being the "main" function for the compiler is really nice. What about giving more emphasis to it on the top-level README file?

8 changes: 7 additions & 1 deletion src/librustc_trans/README.md
@@ -1 +1,7 @@
See [librustc/README.md](../librustc/README.md).
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `trans` crate contains the code to convert from MIR into LLVM IR,
and then from LLVM IR into machine code. In general it contains code
that runs towards the end of the compilation process.
48 changes: 48 additions & 0 deletions src/librustc_typeck/README.md
@@ -0,0 +1,48 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `rustc_typeck` crate contains the source for "type collection" and
"type checking", as well as a few other bits of related functionality.
(It draws heavily on the [type inferencing][infer] and
[trait solving][traits] code found in librustc.)

[infer]: ../librustc/infer/README.md
[traits]: ../librustc/traits/README.md

## Type collection

Type "collection" is the process of converting the types found in the
HIR (`hir::Ty`), which represent the syntactic things that the user
wrote, into the **internal representation** used by the compiler
(`Ty<'tcx>`) -- we also do similar conversions for where-clauses and
other bits of the function signature.

To try and get a sense for the difference, consider this function:

```rust
struct Foo { }
fn foo(x: Foo, y: self::Foo) { .. }
//        ^^^     ^^^^^^^^^
```

Those two parameters `x` and `y` each have the same type, but they
will have distinct `hir::Ty` nodes. Those nodes will have different
spans, and of course they encode the path somewhat differently. But
once they are "collected" into `Ty<'tcx>` nodes, they will be
represented by the exact same internal type.

Collection is defined as a bundle of queries (e.g., `type_of`) for
computing information about the various functions, traits, and other
items in the crate being compiled. Note that each of these queries is
concerned with *interprocedural* things -- for example, for a function
definition, collection will figure out the type and signature of the
function, but it will not visit the *body* of the function in any way,
nor examine type annotations on local variables (that's the job of
type *checking*).

For more details, see the `collect` module.
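
For example, a sketch of asking collection for facts about the `foo`
item above (`type_of` and `generics_of` are collection queries named
elsewhere in these docs; `fn_sig` and `foo_def_id` are assumptions for
illustration):

```rust
// Sketch: collection results are demanded through queries on the tcx.
let fn_ty = tcx.type_of(foo_def_id);        // the type of `foo` as an item
let generics = tcx.generics_of(foo_def_id); // its generic parameters
let sig = tcx.fn_sig(foo_def_id);           // its signature
```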

## Type checking

TODO
59 changes: 15 additions & 44 deletions src/librustc_typeck/collect.rs
@@ -8,50 +8,21 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

/*
# Collect phase

The collect phase of type check has the job of visiting all items,
determining their type, and writing that type into the `tcx.types`
table. Despite its name, this table does not really operate as a
*cache*, at least not for the types of items defined within the
current crate: we assume that after the collect phase, the types of
all local items will be present in the table.

Unlike most of the types that are present in Rust, the types computed
for each item are in fact type schemes. This means that they are
generic types that may have type parameters. TypeSchemes are
represented by a pair of `Generics` and `Ty`. Type
parameters themselves are represented as `ty_param()` instances.

The phasing of type conversion is somewhat complicated. There is no
clear set of phases we can enforce (e.g., converting traits first,
then types, or something like that) because the user can introduce
arbitrary interdependencies. So instead we generally convert things
lazilly and on demand, and include logic that checks for cycles.
Demand is driven by calls to `AstConv::get_item_type_scheme` or
`AstConv::trait_def`.

Currently, we "convert" types and traits in two phases (note that
conversion only affects the types of items / enum variants / methods;
it does not e.g. compute the types of individual expressions):

0. Intrinsics
1. Trait/Type definitions

Conversion itself is done by simply walking each of the items in turn
and invoking an appropriate function (e.g., `trait_def_of_item` or
`convert_item`). However, it is possible that while converting an
item, we may need to compute the *type scheme* or *trait definition*
for other items.

There are some shortcomings in this design:

- Because the item generics include defaults, cycles through type
  parameter defaults are illegal even if those defaults are never
  employed. This is not necessarily a bug.
*/
//! "Collection" is the process of determining the type and other external
//! details of each item in Rust. Collection is specifically concerned
//! with *interprocedural* things -- for example, for a function
//! definition, collection will figure out the type and signature of the
//! function, but it will not visit the *body* of the function in any way,
//! nor examine type annotations on local variables (that's the job of
//! type *checking*).
//!
//! Collecting is ultimately defined by a bundle of queries that
//! inquire after various facts about the items in the crate (e.g.,
//! `type_of`, `generics_of`, `predicates_of`, etc). See the `provide` function
//! for the full set.
//!
//! At present, however, we do run collection across all items in the
//! crate as a kind of pass. This should eventually be factored away.
use astconv::{AstConv, Bounds};
use lint;
7 changes: 7 additions & 0 deletions src/libsyntax/README.md
@@ -0,0 +1,7 @@
NB: This crate is part of the Rust compiler. For an overview of the
compiler as a whole, see
[the README.md file found in `librustc`](../librustc/README.md).

The `syntax` crate contains those things concerned purely with syntax
– that is, the AST ("abstract syntax tree"), parser, pretty-printer,
lexer, macro expander, and utilities for traversing ASTs.