Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IDL subtyping #832

Closed
wants to merge 8 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
180 changes: 154 additions & 26 deletions design/IDL.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ To document, discover, and interact with actors on the platform, we need an inte
* Language-independent description of actor interfaces and the data they
exchange (names, parameter and result formats of actor methods)
* Simple and canonical constructs (C-like; algebraically: sums, products, exponentials)
* Extensible, backwards-compatible
* Well-formedness is checkable and guaranteed by the platform
* Extensible, backwards-compatibleBecuase this is an old branch that does things that cause headaches to the infra team, closing. One cold also update it and reopen; as long as the bulid system is recent it’s fine to keep them around in open state.
* Well-formedness is checkable
* Deterministic mapping to serialised representation
* Human-readable and machine-readable
* Declarative, usable as input to binding code generators
Expand Down Expand Up @@ -77,8 +77,7 @@ This is a summary of the grammar proposed:
| bool
| text
| null
| reserved
| empty
| any | empty

<constype> ::=
| opt <datatype>
Expand Down Expand Up @@ -274,22 +273,22 @@ The type `null` has exactly one value (the *null* value) and therefor carries no
<primtype> ::= ... | null | ...
```

#### Reserved
#### Any and Empty

The type `reserved` is a type with unknown content that ought to be ignored. Its purpose is to occupy field ids in records in order to prevent backwards/forwards compatibility problems, see the description of record types below.
The type `any` is a type with unknown content (i.e., it is the type of all possible values) that ought to be ignored. Its primary purpose is to occupy yet unused field ids in records in order to prevent backwards/forwards compatibility problems, see the description of record type subtyping below.
```
<primtype> ::= ... | reserved
<primtype> ::= ... | any
```
**Note:** This type has a similar role as *reserved fields* in proto buffers.


#### Empty

The type `empty` is the type of values that are not present. Its purpose is to mark variants that are not actually there, or -- as argument types of function reference -- indicate that they will not be called.
Dually, the type `empty` is the type of unused or absent content (i.e., it is the type with no values). Its primary purpose is to occupy no longer used field ids in records in order to prevent backwards/forwards compatibility problems, see the description of record type subtyping below.
It can also be used to mark variants that are not actually there, or -- as argument types of function reference -- indicate that they will not be called.
```
<primtype> ::= ... | empty
```

Technically, the types of the IDL form a lattice by subtyping, where `any` and `empty` are the top and bottom element.


### Constructed Data

*Constructed types* describe compound or aggregated forms of values.
Expand All @@ -301,6 +300,17 @@ An *option* is a value of a specific data type that may be absent.
```
<constype> ::= opt <datatype> | ...
```
##### Shorthands: Reserved and Deprecated

When evolving record types over time (see below), it sometimes is useful to *reserve* fields for future use, or *deprecate* old fields so that they are no longer used. Defining the fields or keeping them around as "occupied" can prevent backwards/forwards compatibility problems.
The types `reserved` and `deprecated` can be used for this purpose.
See the discussion of record subtyping for more details.
```
<constype> ::= ....
| reserved := opt empty
| deprecated := opt any
```


#### Vectors

Expand Down Expand Up @@ -541,8 +551,23 @@ For upgrading data structures passed between service and client, it is important

That is, outbound message results can only be replaced with a subtype (more fields) in an upgrade, while inbound message parameters can only be replaced with a supertype (fewer fields). This corresponds to the notions of co-variance and contra-variance in type systems.

In addition, we define

* *Round-tripping* data is both outbound and inbound, i.e., it is returned from the server to the client and later passed back to the server. Such round trips ought to be possible without losing fields that the client read (it may still lose new fields if the client is out of date with the server's interface).

Subtyping applies recursively to the types of the fields themselves. Moreover, the directions get *inverted* for inbound function and actor references, in compliance with standard rules.

To make these constraints as flexible as possible, two special rules apply:

* An absent record field is considered equivalent to a present field with value `null`. Moreover, a record field of type `null` is a subtype of a field with type `opt <datatype>`. That way, a field of option (or null) type can always be added to a record type, no matter whether in co- or contra-variant position. If an optional field is added to an inbound record, and the client did not provide it, the service will read it as if its value was null.

- in an outbound record, a field of option (or null) type can also be removed in an upgrade, in which case the client will read it as if its value was null;
- in an inbound record, a field of option (or null) type can also be added, in which case the service will read it as if its value was null.

Taken together, the rules allow even round-tripping record types to be upgraded by adding new fields, as long as these fields are optional.

Possible future extensions: defaults for options, including for variants?

### Rules

#### Primitive Types
Expand Down Expand Up @@ -587,7 +612,7 @@ vec <datatype> <: vec <datatype'>
```
Furthermore, an option type can be specialised to either `null` or to its constituent type:
```
------------------------
----------------------
null <: opt <datatype>

not (null <: <datatype>)
Expand All @@ -610,11 +635,92 @@ record { <fieldtype'>;* } <: record { }

<datatype> <: <datatype'>
record { <fieldtype>;* } <: record { <fieldtype'>;* }
----------------------------------------------------------------------------------------------
record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* }
--------------------------------------------------------------------------------------------
record { <fieldtype>;* <nat> : <datatype> } <: record { <fieldtype'>;* <nat> : <datatype'> }
```

In addition, record fields of `null` or option type can be removed, treating the absent field as having value `null`.
```
record { <fieldtype>;* } <: record { <fieldtype'>;* }
------------------------------------------------------------------
record { <fieldtype>;* } <: record { <fieldtype'>;* <nat> : null }

record { <fieldtype>;* } <: record { <fieldtype'>;* }
----------------------------------------------------------------------------
record { <fieldtype>;* } <: record { <fieldtype'>;* <nat> : opt <datatype> }
```
Taken together, these rules ensure that adding an optional field creates both a co- and a contra-variant subtype. Consequently, it is always possible to add an optional field. In particular, that allows extending round-tripping record types as well. For example,
```
type T = {};
actor { f : T -> T };
```
can safely be upgraded to
```
type T' = {x : opt text};
actor { f : T' -> T' };
```
for all first-order uses of `T`, because both of the following hold:
```
T' <: T (by usual width subtyping)
T <: T' (by option field subtyping)
```
And hence:
```
(T' -> T') <: (T -> T)
```
Moreover, this extends to the higher-order case, where e.g. a function of the above type is expected as a parameter:
```
actor { g : (h : T -> T) -> ()}
```
Upgrading `T` as above contra-variantly requires
```
(T -> T) <: (T' -> T')
```
which also holds.

Finally, one auxiliary rule is needed to ensure that subtyping is still transitive:
```
not (<datatype> <: opt <datatype'>)
record { <fieldtype>;* } <: record { <fieldtype'>;* }
------------------------------------------------------------------------------------------------
record { <fieldtype>;* <nat> : <datatype> } <: record { <fieldtype'>;* <nat> : opt <datatype'> }
```
This rule allows the consecutive removal and later re-addition of a field with different types over multiple upgrades. For example:
```
record {x : text} <: record {} <: record {x : opt nat}
```
The field will also read as `null` in this case.
Transitivity is necessary so that data compatibility is guaranteed in all cases under the earlier rules, even across multiple upgrades.

Note that this rule does not cover the case where the new version of the field has the same type as the old:
```
record {x : opt text} <: record {} <: record {x : opt text}
```
In this case, a direct access of the old value under the new interface, without going through the intermediate version, will be handled according to the ordinary rule for options, keeping its value.
The semantics has no way of knowing whether that is intended or not, but this behaviour avoids any semantic ambiguity.

However, this implies that -- although subtyping is transitive and the respective conversions compose -- the composition of conversions is not always *coherent*, i.e., it may return values different from a direct conversion.
Unfortunately, a fully coherent semantics seems impossible to achieve for this case.

In practice, users are not recommended to remove optional fields in an upgrade, avoiding the scenario.
Instead, the preferred way to avoid backwards compatibility issues is to *deprecate* such fields by giving them type `deprecated`, a shorthand for `opt empty`.
Accordingly, both the following hold:
```
record {x : opt T} <: record {x : deprecated}
record {x : deprecated} <: record {x : opt T}
```
The first is for inbound uses and holds by usual subtyping, the second is for outbound uses, by the auxiliary rule above.

Dually, forward compatibility can be ensured by *reserving* a field via the type `reserved`, a shorthand for `opt any`.
Accordingly, both the following hold:
```
record {x : reserved} <: record {x : opt T}
record {x : opt T} <: record {x : reserved}
```
The first use again is for inbound uses, by usual subtyping, the second for outbound ones, by the auxiliary rule above.

Note: The auxiliary subtyping rule above should not be invoked in a static check for upgradability of a service, with the exception of these two cases. We expect tooling to warn about any such dependency on the rule. The only purpose of allowing the other cases is for dynamic coercions, i.e., to make sure that subtyping is transitive and therefore avoids unexpected runtime failure after multiple upgrades in all cases.

**NOTE**: There is a need for a mechanism to also remove fields (which means adding a field when a record appears as an argument). The precise mechanism is still work in progress.

#### Variants

Expand All @@ -626,8 +732,8 @@ variant { } <: variant { <fieldtype'>;* }

<datatype> <: <datatype'>
variant { <fieldtype>;* } <: variant { <fieldtype'>;* }
------------------------------------------------------------------------------------------------
variant { <nat> : <datatype>; <fieldtype>;* } <: variant { <nat> : <datatype'>; <fieldtype'>;* }
----------------------------------------------------------------------------------------------
variant { <fieldtype>;* <nat> : <datatype> } <: variant { <fieldtype'>;* <nat> : <datatype'> }
```

#### Functions
Expand Down Expand Up @@ -661,6 +767,7 @@ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>;
### Elaboration

To define the actual coercion function, we extend the subtyping relation to a ternary *elaboration* relation `T <: T' ~> f`, where `f` is a suitable coercion function of type `T -> T'`.
To express this function, we assume a simple lambda calculus over IDL values.


#### Primitive Types
Expand All @@ -671,15 +778,15 @@ To define the actual coercion function, we extend the subtyping relation to a te
<primtype> <: <primtype> ~> \x.x


------------------
Nat <: Int ~> \x.x
-------------------
Nat <: Int ~> \x.+x


------------------------------
<datatype> <: reserved ~> \x.x
---------------------------------
<datatype> <: reserved ~> \x.null


------------------------------
-------------------------------------
empty <: <datatype> ~> \_.unreachable
```

Expand Down Expand Up @@ -715,10 +822,27 @@ record { <fieldtype'>;* } <: record { } ~> \x.{}
<datatype> <: <datatype'> ~> f1
record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f2
----------------------------------------------------------------------------------------------
record { <nat> : <datatype>; <fieldtype>;* } <: record { <nat> : <datatype'>; <fieldtype'>;* }
record { <fieldtype>;* <nat> : <datatype> } <: record { <fieldtype'>;* <nat> : <datatype'> }
~> \x.{f2 x with <nat> = f1 x.<nat>}

record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f
------------------------------------------------------------------
record { <fieldtype>;* } <: record { <fieldtype'>;* <nat> : null }
~> \x.{f x; <nat> = null}

record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f
-----------------------------------------------------------------------------
record { <fieldtype>;* } <: record { <fieldtype'>;* <nat> : opt <datatype> }
~> \x.{f x; <nat> = null}

not (<datatype> <: opt <datatype')
record { <fieldtype>;* } <: record { <fieldtype'>;* } ~> f
------------------------------------------------------------------------------------------------
record { <fieldtype>;* <nat> : <datatype> } <: record { <fieldtype'>;* <nat> : opt <datatype'> }
~> \x.{f x; <nat> = null}
```


#### Variants

```
Expand Down Expand Up @@ -877,7 +1001,7 @@ T(int64) = sleb128(-12) = 0x74
T(float32) = sleb128(-13) = 0x73
T(float64) = sleb128(-14) = 0x72
T(text) = sleb128(-15) = 0x71
T(reserved) = sleb128(-16) = 0x70
T(any) = sleb128(-16) = 0x70
T(empty) = sleb128(-17) = 0x6f

T : <constype> -> i8*
Expand Down Expand Up @@ -944,7 +1068,7 @@ M(z : float<N>) = f<N>(z)
M(b : bool) = i8(if b then 1 else 0)
M(t : text) = leb128(|utf8(t)|) i8*(utf8(t))
M(_ : null) = .
M(_ : reserved) = .
M(_ : any) = .
// NB: M(_ : empty) will never be called

M : <val> -> <constype> -> i8*
Expand Down Expand Up @@ -1040,6 +1164,10 @@ Note:

To enable convenient debugging, we also specify a text format for IDL values.
The types of these values are assumed to be known from context, so the syntax does not attempt to be self-describing.
Type annotations can be given to disambiguate.

Note: It is expected that reading IDL values (in diagnostics or logs) is by far the dominant use case.
For that reason, the syntax is designed for easy readability instead of convenient writability. To that end, it intentionally matches the type syntax.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneakily sneaking controverial changes in with unrelated PRs… ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a change, just a clarification/rationale for what's there already. ;) But I'm happy to do that in a separate PR if you think that's worthwhile.


```
<val> ::=
Expand Down