Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: Do a subtyping check when decoding #168

Merged
merged 13 commits into from
Apr 22, 2021
173 changes: 62 additions & 111 deletions spec/Candid.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Candid Specification

Version: 0.1.2
Version: 0.1.3

Date: Aug 25, 2020
Date: February 3, 2021

## Motivation

Expand Down Expand Up @@ -882,200 +882,150 @@ service { <name> : <functype>; <methtype>;* } <: service { <name> : <functype'>;

### Coercion

This subtyping is implemented during the deserialisation of Candid at an expected type. As described in [Section Deserialisation](#deserialisation), the binary value is conceptually first _decoded_ into an abstract value, and then _coerced_ to the expected type.
This subtyping is implemented during the deserialisation of Candid at an expected type. As described in [Section Deserialisation](#deserialisation), the binary value is conceptually first _decoded_ into the actual type and a value of that type, and then that value is _coerced_ into the expected type.

This section describes the coercion, as a ternary relation `V ~> V' : T` to describe when a value `V` can be coerced to a value `V'` of type `T`. The fields `V` and `T` can be understood as inputs and `V'` as the output of this relation.
To model this, we define, for every `t1, t2` with `t1 <: t2`, a function `C[t1<:t2] : t1 -> t2`. This function maps values of type `t1` to values of type `t2`, and is indeed total.

Here `V` models untyped values, which form an abstract data model of both the message in transit (`V`), as well as the result of the coercion (`V'`). In the following, we re-use the syntax of the textual representation, with the following changes to make it free of overloading:

* Number literals (`<primval>`) must be immediately enclosed with an `<annval>` that clarifies the precise number type.
* The canonical value of type `reserved` is expressed as `(null : reserved)`.
* No other use of `<annval>`.
to describe these values, we re-use the syntax of the textual representation, and use the the `<annval>` syntax (i.e. `(v : t)`) if necessary to resolve overloading.

#### Primitive Types

Values of primitive types coerce successfully at their own type:
On primitve types, coercion is the identity:
```
--------------------------------------------------
(<x> : <numtype>) ~> (<x> : <numtype>) : <numtype>

-------------------
true ~> true : bool

---------------------
false ~> false : bool

-----------------------
<text> ~> <text> : text

-------------------
null ~> null : null
C[<t> <: <t>](x) = x for every <t> ∈ <numtype>, bool, text, null
```

Values of type `nat` coerce successfully at type `int`:
Values of type `nat` coerce at type `int`:
```
--------------------------------
(<x> : nat) ~> (<x> : int) : int
C[nat <: int](<nat>) = <nat>
```

Any value coerces at type `reserved`, producing the canonical value of type `reserved`:

Coercion into `reserved` is the constant map (this is arbitrarily using `null` as “the” value of `reserved`):
```
-----------------------------------
<v> ~> (null : reserved) : reserved
C[<t> <: reserved](_) = null
```

NB: No rule is needed for type `empty`, because there are no values of that type. By construction, `<v> ~> _ : empty` will not hold.
NB: No rule is needed for type `empty`, because there are no values of that type. By construction, `C[empty <: <t>]` is the unique function on the empty domain.

#### Vectors

Only vectors coerce at vector types, and only if all elements coerce successfully.

Vectors coerce pointwise:
```
<v> ~> <v'> : <t>
-----------------------------------------
vec { <v>;* } ~> vec { <v'>;* } : vec <t>
C[vec <t> <: vec <t'>]( vec { <v>;* } ) = vec { C[<t> <: <t'>](<v>);* }
```

#### Options

The null value coerce at any option type:
The null type coerces into any option type:
```
----------------------
null ~> null : opt <t>
C[null <: opt <t>](null) = null
```

An optional value coerces at an option type, if the constituent value coerces at the constituent type:
```
<v> ~> <v'> : <t>
-----------------------------
opt <v> ~> opt <v'> : opt <t>
An optional value coerces at an option type, if the constituent value has a suitable type, and else goes to `null`:
```

If an optional value _fails_ to coerce at an optional type, or the value is `reserved`, the result is `null`, not failure:
C[opt <t> <: opt <t'>](null) = null
C[opt <t> <: opt <t'>](opt <v>) = opt C[<t> <: <t'>](v) if <t> <: <t'>
C[opt <t> <: opt <t'>](opt <v>) = null if not(<t> <: <t'>)
```
not (<v> ~> _ : <t>)
-------------------------
opt <v> ~> null : opt <t>

-----------------------------------
(null : reserved) ~> null : opt <t>
```
NOTE: These rules above imply that a Candid decoder has to be able to decide the subtyping relation at runtime.

Coercing a non-null, non-optional and non-reserved value at an option type treats it as an optional value:
```
<v> ≠ null
<v> ≠ (null : reserved)
<v> ≠ opt _
not (null <: <t>)
opt <v> ~> <v'> : opt <t>
-------------------------
<v> ~> <v'> : opt <t>
C[<t> <: opt <t'>](<v>) = opt C[<t> <: <t'>](v) if not (null <: <t>) and <t> <: <t'>
C[<t> <: opt <t'>](_) = null if not (null <: <t>) and not (<t> <: <t'>)
Comment on lines +934 to +935
Copy link
Collaborator Author

@nomeata nomeata Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
C[<t> <: opt <t'>](<v>) = opt C[<t> <: <t'>](v) if not (null <: <t>) and <t> <: <t'>
C[<t> <: opt <t'>](_) = null if not (null <: <t>) and not (<t> <: <t'>)
C[<t> <: opt <t'>](<v>) = opt C[<t> <: <t'>](v) if not (null <: <t'>) and <t> <: <t'>
C[<t> <: opt <t'>](_) = null if not (null <: <t'>) and not (<t> <: <t'>)

These conditions need to be stricter, see https://dfinity.slack.com/archives/C016H8MCVRC/p1617755023039700, else C[bool <: fix opt](true) = fix opt, and we certainly don’t want infinite values here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the empty record case?
C[ μx. record { x } <: μx. opt record { x } ](_) = opt C[μx. record { x } <: record { μx. opt record { x } } ]

Do we want to allow μx. record { x } to be subtype of opt t?

Copy link
Collaborator Author

@nomeata nomeata Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every type is a subtype of opt t. The question is only whether it coerces to null or something else.

I think this case is ok: Note that the coercion function takes a value v : µx.record{x}, and no such value exists, so C[ μx. record { x } <: μx. opt record { x } ] is (independent of the equations) the empty function with an empty domain.

But you are right, this is a bit fishy, in the sense that implementations must be careful not to loop. But we already have similar issues with µx.record{x}, even in the value decoder, where we’d easily run into a loop if we are not careful.

Copy link
Contributor

@chenyan-dfinity chenyan-dfinity Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to detect loops in this case. The usual trick is to count the depth of the continuous nesting of records. But here, µx.record{x} doesn't change, we are only reducing the expected type, i.e. C[ μx. record { x } <: μx. opt record { x } ] = opt C[ μx. record { x } <: record { μx. opt record { x } } ]. Operationally, the type on the left doesn't change at all, so we cannot count the depth.

Of course, there are more expensive ways to detect loops, e.g. memoization or decoding the value with the actual type first. But to rule out other possible implementations just because of this case seems a bit wasteful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a new problem? The problem with µx.record{x} is about decoding, not coercion, and that isn’t changed by this PR? There are no values of that type anyways, so one wouldn’t ever get past the decoding stage before erroring out (or looping.)

In the Motoko decoder, I bump a counter whenever I go from record { t1, … } to t1, and reset whenever I actually consum a byte from the input stream. If that counter exceeds the table size I have recognized a loop.

In the Haskell decoder, I pre-process the type table and replace any empty type like µx. record { …., x , …} with empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
C[<t> <: opt <t'>](_) = null if not (null <: <t>) and not (<t> <: <t'>)
C[<t> <: opt <t'>](_) = null otherwise

We don't have a rule for null <: t' then. I think this applies to the null <: t' case as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. But let’s be explicit here, and and not rely on fall-through-semantics.

So it seems the fomulation that we want, so that each line is correct independent of their order, might be

C[<t> <: opt <t'>](<v>) = opt C[<t> <: <t'>](v)  if not (null <: <t'>) and <t> <: <t'>
C[<t> <: opt <t'>](_)   = null                   if not (null <: <t'>) and not (<t> <: <t'>)
C[<t> <: opt <t'>](_)   = null                   if null <: <t'> and not (null <: <t>)

(Remember that null <: t implies that t = null, t = opt … or t = reserved, and these cases have their own rules)

I think this needs a table:

null <: t null <: t' t <: t' Rule?
no no no second rule
no no yes first rule
no yes no third rule
no yes yes third rule
yes no no second rule
yes no yes impossible (<: is transitive)
yes yes no see rules for t = null, t = opt … or t = reserved
yes yes yes see rules for t = null, t = opt … or t = reserved

C[reserved <: opt <t'>](_) = null
```


#### Records

Only records coerce at record type. Missing fields of option or reserved type turn into `null`.
In the following rule:

* The `<nat1>*` field names are those present in both the actual and expected type. The values are coerced.
* the `<nat2>*` field names those only in the actual type. The values are ignored.
* the `<nat3>*` field names are those only in the expected type, which therefore must be of optional or reserved type. The `null` value is used for these.

In the following rule, the `<nat1>*` field names are those present in both the value and the type, the `<nat2>*` field names only those in the value, and `<nat3>*` are those only in the type.
```
<v1> ~> <v1'> : <t1>
opt empty <: <t3>
---------------------------------------------------------------------------------------------------------------------------------------
record { <nat1> = <v1>;* <nat2> = <v2>;* } ~> record { <nat1> = <v1'>;* <nat3> = null;* } : record { <nat1> = <t1>;* <nat3> = <t3>;* }
C[record { <nat1> = <t1>;* <nat2> = <t2>;* } <: record { <nat1> = <t1'>;* <nat3> = <t3>;* }](record { <nat1> = <v1>;* <nat2> = <v2>;* })
= record { <nat1> = C[<t1> <: <t1'>](<v1>);* <nat3> = null;* }
```


#### Variants

Only a variant value with an expected tag coerces at variant type.

```
<v> ~> <v'> : <t>
----------------------------------------------------------------------------------
variant { <nat> = <v> } ~> variant { <nat> = <v'> } : variant { <nat> = <t>; _;* }
C[variant { <nat> = <t>; _;* } <: variant { <nat> = <t'>; _;* }](variant { <nat> = <v> })
= variant { <nat> = C[<t> <: <t'>](<v>) }
```


#### References

Function and services references coerce unconditionally

```
------------------------------------------------------
func <text>.<id> ~> func <text>.<id> : func <functype>
```

```
------------------------------------------------------
service <text> ~> service <text> : service <actortype>
```
Function and services reference values are untyped, so the coercion function is the identity here:

```
------------------------------------------------
principal <text> ~> principal <text> : principal
C[func <functype> <: func <functype'>](func <text>.<id>) = func <text>.<id>
C[service <actortype> <: service <actortype'>](service <text>) = service <text>
C[principal <: principal](principal <text>) = principal <text>
```

#### Tuple types

Whole argument and result sequences are coerced with the same rules as tuple-like records. In particular, extra arguments are ignored, and optional parameters read as as `null` if the argument is missing or fails to coerce:

```
record {<v>;*} ~> record {<v'>,*} : record {<t>;*}
--------------------------------------------------
(<v>,*) ~> (<v'>,*) : (<t>,*)
C[(<t>,*) <: (<t'>,*)]((<v>,*)) = (<v'>,*)
if C[record {<t>;*} <: record {<t'>,*}](record {<v>;*}) = record {<v'>;*}
```


## Properties

The relations above have certain properties. To express them, we need the relation `V : T`, expressing that `V` has inherently type `T`. Instead of defining this relation on its own, we take the first property below as its definition:
The relations above have certain properties. As in the previous section, `<v> : <t>` means that <v> has inherently type `<t>`.

* Correctness and completeness of decoding:
* Reflexivity of subtyping:
```
(v : T) ⟺ (∃ v'. v' ~> v : T)
<t> <: <t>
```

* Roundtripping:
* Transitivity of subtyping:
```
(v : T) ⟺ v ~> v : T
<t> <: <t'>, <t'> <: <t''> ⇒ <t> <: <t'>
```

* Uniqueness of decoding:
* Roundtripping:
```
v ~> v1 : T ∧ v ~> v2 : T ⇒ v1 = v2
(<v> : <t>) ⟺ C[<t> <: <t>](<v>) = <v>
```

* Soundness of subtyping:
* Soundness of subtyping (or, alternatively, well-definedness of coercion)
```
T <: T' ⇒ ∀ v : T. ∃ v'. v ~> v' : T
<t> <: <t'> ⇒ ∀ <v> : <t>. v[<t> <: <t'>](<v>) : <t'>
```

* Higher-order soundness of subtyping
See <./IDL-Soundness.md>, with the following instantiations:
```
s1 ~> s2 ≔ s2 <: s1
t1 <: t2 ≔ (∀ v. v : t1 ⇒ v ~> _ : t2 )
t1 <: t2 ≔ t1 <: t2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could just drop this line?

Copy link
Collaborator Author

@nomeata nomeata Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t think so, the left hand side is “parameters of IDL-Soundness”, the right hand side is “stuff defined here”. The punnig is mostly accidential (and if we would distinguish “desired subtyping” and “value subtyping”, it might matter.)

s1 in t1 <: s2 in t2 ≔ (to be done)
s1 <:h s2 ≔ (host-language dependent)
```

* Transitivity of subtyping:
* NB: Transitive coherence does not hold:
```
T1 <: T2, T2 <: T3 ⇒ T1 <: T3
<t1> <: <t2>, <t2> <: <t3>
```

* Transitive coherence does not hold:
does not imply
```
T1 <: T2, T2 <: T3
v1 : T1
v1 ~> v3 : T3
v1 ~> v2 : T2, v2 ~> v2 : T3
C[<t1> <: <t3>] = C[<t2> <: <t3>] ⚬ C[<t1> <: <t2>]
```
does not imply `v3 = v3'`.

However, it implies that `R(v3,v3')`, where `R` is the smallest homomorphic, reflexive, symmetric relation `R` that satisfies `∀ v. R(opt v, null)`.
However, it implies
```
C[<t1> <: <t3>] ~ (C[<t2> <: <t3>] ⚬ C[<t1> <: <t2>])
```
where ~ is the smallest homomorphic, reflexive, symmetric relation that satisfies `∀ v. opt v ~ null`.

The goal of “subtyping completeness” has not been cast into a formal formulation yet.

Expand Down Expand Up @@ -1299,9 +1249,10 @@ Note:
Deserialisation at an expected type sequence `(<t'>,*)` proceeds by

* checking for the magic number `DIDL`
* using the inverse of the `T` function to decode the type definitions `(<t>*)`
* using the inverse of the `M` function, indexed by `(<t>*)`, to decode the values `(<v>*)`
* use the coercion relation `(<v>,*) ~> (<v'>,*) : (<t'>,*)` to try to understand the decoded values at the expected type.
* using the inverse of the `T` function to decode the type definitions `(<t>,*)`
* check that `(<t>,*) <: (<t'>,*)`, else fail
* using the inverse of the `M` function, indexed by `(<t>,*)`, to decode the values `(<v>,*)`
* use the coercion function `C[(<t>,*) <: (<t'>,*)]((<v>,*))` to understand the decoded values at the expected type.

### Deserialisation of future types

Expand Down
2 changes: 2 additions & 0 deletions test/construct.test.did
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
/*
Encoding tests for construct types

Corresponding to spec version version 0.1.2
*/

// Type definitions
Expand Down
2 changes: 2 additions & 0 deletions test/overshoot.test.did
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
/*
Corresponding to spec version version 0.1.2

The tests in this file involve large leb-encoded values that indicate sizes (of
tables, arrays, blobs, etc.). In all cases, a decode should be able to catch
that eary by noticing that the overall input is too short to potentially
Expand Down
2 changes: 2 additions & 0 deletions test/prim.test.did
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
/*
Encoding tests for primitive types

Corresponding to spec version version 0.1.2
*/

// fundamentally wrong
Expand Down
2 changes: 2 additions & 0 deletions test/reference.test.did
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
/*
Encoding tests for reference types

Corresponding to spec version version 0.1.2
*/

// principal
Expand Down