-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
History based IDL subtyping #1106
Comments
Thanks for picking this up again, I must admit that I still lack a good idea how to get unstuck on this. Quick question: from a first read it sounds as if you are basically suggesting to move from structural to nominal subtyping, i.e., your history tree is essentially a nominal subtype hierarchy (a.k.a., interface inheritance hierarchy). Is that right or is there something more subtle happening? |
Sounds like you could be right. It is still structural in the sense the hstory doesn't have identity: If your type evolution and my type evoluion are the same (because, commonly, we both imported the types from the same IDL), then our types are compatible. |
FWIW, I endorse this line of thinking enthusiastically. @nomeata This is a concise, clear exposition of the idea I was trying to form and communicate late last year. Thanks for thinking this through, and writing this up. Much appreciated. : ) BTW, this "structural versus nominal" typing distinction is exactly the idea I have in mind too. IDL should give canisters a nominal typing when we enter the world of evolution (canister types changing over time) so that their identity can remain fixed amidst this evolution, however we characterize it. |
Would a simple motivating example extend its life here? |
Unfortunately, I do not see any other way to make IDL typing work amidst multiple versions of the API. I think the way that this could be made practical is the following two assumptions:
Aside: This window may make it easier for canisters to "merge" again too; not sure if that's desirable or not. We may want to prevent it in some cases? |
It would surely be enlightening! But the bigger problem is keeping track of the history. I am not worried about keeping a long history. I am worried about he tooling impact of keeping any history at all; it is pretty unusual that the content of your source tree somehow depends on when and which versions you deployed. Assuming we could solve that, I don't think a long history is a problem, so I don’t see the need to worry about windows etc. Maybe there is a solution where the history isn’t stored with the source, but rather with the canister. I.e. the source code has (or generates) IDL with a normal type, but upon build and deploy, the the deployment tool fetches the IDL history from the canister, checks that the current type can be evolved to the new type, and includes that full history in the canister. But all very hairy 💇 |
My understanding is that the primary goal of @nomeata's idea is to avoid a structural subtype test at runtime in favour of a cheaper "nominal" test. But if you ignore implementation pragmatics, then from a purely semantic perspective, nominal is strictly less flexible and more bureaucratic than structural. @matthewhammer, you seem to think that is a semantic advantage, but can you give an argument why? |
nomeata:
Yeah, no matter where you store the history, my gut feeling is that it would induce a global implementation (and usability) cost that is much higher than the local implementation cost it tries to avoid. I guess that's what you mean by impractical? |
There is a bit more than just replacing the runtime subtype test with a simpler one. There is also the idea to “to cast via the common ancestor”, and not directly across. It is unclear to me how much the practical effect is. But sometimes it does kick in, as in this example:
This situation arises when two parties at some point agree on the interface The dynamic subtype test would find that In this example, this is desired behavor. But it’s too fragile: If two parties independently change the interface the same way, the data still comes through. And the intermediate steps in the type history are significant. |
Presumably, this could be made more solid by stamping versions, i.e., going to a truly nominal semantics (I guess that's what @matthewhammer is getting at?). But it's not clear where these stamps would be coming from, and it looks like requiring new global infrastructure. |
Timestamps? Or, less hacky, UUIDs? |
The latter, probably. But that would require the notion of an "owner" of a type definition, and it's not clear what that means across importing IDLs, or what if you replace an import with a copy. |
Would it? Maybe on a level of convention. I assume that whenever someone changes a type, they pick a new UUID. If I copy a type from fromwhere else, and then change it, then I update the UUID, because I just forked the evolution.
If you import, you import with the UUID, because “import” likely means that you want to be compatible with that service.
With an unchanged UUID? Then you are still compatible, as expected. If you copy and change the UUID, then that’s a way of forking, and thus indicating that although the types happen to be the same, you don't intend to be compatible. (Maybe it shouldn’t be a pure UUID, but rather a hash of a UUID and a canonicalized version of the type, to have the property that while same types can have differnent ids, you can’t have differnent types with the same ids.) |
You mean manually putting the id in the IDL sources? How can you ensure that's done consistently, i.e., nobody reuses the same id for different versions? And how/when would it be generated from Motoko? |
Maybe by mixing in a (hash) of the type, as just described?
Right, the toll on the tooling is just too taxing. |
Regarding my motivating examplesI always come back to wanting "free evolution" of variants, or more generally, "free evolution" of algebraic data types from ML. Nearly every canister I'd ever want to write myself and deploy would expose at least one DSL, and perhaps many. To make things concrete here, let's imagine that I have a spreadsheet DSL as part of my IDL spec, for a canister that does something like Excel, but very simple for now. For each operation you might think of as a public canister call in a "calculator" canister, I instead want to have a binary constructor in my DSL for this spreadsheet canister. Then, I'll construct formulas as expression trees, as usual. The expressions will be stored in cells, which I'll just ignore for now. Let's focus on the expression tree type, in IDL/Motoko:
In the context of this IDL evolution discussion, we've been discussing adding and removing public functions or record fields. That's not the issue here, where I really want to add or remove a variant case (e.g., add My understanding of Andreas's other proposal (the "current front runner" in this design space), is that we cannot evolve ADTs freely. My understanding of Joachim's design above is that we can. (Please correct me if any of that is unclear or seems wrong, of course.) Regarding "ownership" of a canister's IDL typeIf we try to divorce ourselves from our PL design principles (e.g., nominal typing introduces certain problems, at a fundamental, conceptual level) we cannot deny that many of our canister developers (perhaps even most of them?) will think of themselves as authors of the running canisters that they maintain and evolve. By extension, these canisters become "life-like" things whose evolution that they own and control. In past meetings we've discussed briefly how our bigger story of platform risk seems connected, fundamentally, to questions about this evolutionary control. In my mind, the root question is all about identity of canisters over time. Meaning, if I have a canister named As a user of this canister, I want to use it with a fixed identity that is tied to its purpose, so that I can use for all time, and so that I can enjoy liberation from "platform risk" (the risk of this canister changing in a way that's out of spirit with its original purpose and mission, and why I chose to start using it and storing data there, etc.). Etc
Ah, okay. I'm not either. Thanks for clarifying. FWIW, I imagined that whatever layer of "the system" that does the IDL checks we want to impose would also be responsible for knowing this history. How it is best stored in between those checks (and updates) is not clear to me. |
Yeah, I don't see why that would be the case. Nominal typing allows strictly fewer relations than structural (which sometimes is what you want), but it doesn't magically allow additional relations, because structural is already characterised by allowing all operationally possible relations. So to make evolution of variants work you'd need additional mechanisms, and those would be unlikely to depend on nominal typing AFAICS.
Hm, it seems to me that you are conflating canisters with canister interfaces. The latter are used in dual roles as both assurances of a given canister as well as assumptions about other canisters, e.g., expected as a parameter or import. In the parameter case there is a many-to-one relationship, so any possible interface naming would generally be independent from canister identities. And even for imports it depends on who provides the IDL. Note @nomeata talking about version trees for interfaces, whereas evolution of an individual canister is linear. Subtyping is what connects assumptions and assurances. And the challenge here is that both ends of such a relation in a given network can evolve independently. What we need to define, then, is what independent interface evolution is legal and how it is handled in the case of divergence. Version naming only helps in so far that it potentially enables imposing additional restrictions. |
@rossberg said above about this proposal:
Yes, I think I agree with this (still), but to be clear, I think of the lattice elements of the ordering as entire histories, aka, lists of IDL service types. (I believe this is also what Joachim is saying above too, but getting explicit confirmation is always nice.)
But in what you call the "nominal case" (aka "history-based IDL subtyping", this proposal) we are not choosing names as strings of characters that form identifers. Rather, "names" (the histories) are things with lots of structure, and specifically, their structure is a list of IDL types. In that sense, this "nominal" idea for naming IDL versions is really "structural", at least in some pretty concrete formal ways. Given this, don't see how to map your argument about nominal-vs-structural onto the proposal that Joachim gives above. I believe the "rules of evolution" here are more permissive than the subtyping rules in your proposal (wrt adding/removing variants); note that the "subtyping" here is not about the individual rules of evolution here, but rather, about lists of related evolutions (aka, histories). It feels like your argument is assuming that we have an apples and apples comparison to draw, but we have apples (evolution via subtyping) and "oranges" (evolution via an independent relation, which then is latter used to define subtyping over the histories it permits). |
Fair point. I feel myself doing this subconsciously all of the time. However, I do not think that the arguments I am making here are doing that, at least how I'm thinking of them internally. Perhaps the wording and communication of those arguments is doing that, though.
This table above is (partially) how I've internalized the key contrasts, and why I think Andreas's arguments about nominal-vs-structural typing may not apply here, to this proposal. (@nomeata Can you verify this matches what you are saying above? I believe it does, but I've been wrong before.) |
To be fair, it doesn't seem like variants are considered explicitly in the list of evolution rules given above. But, I think you can imagine having analogues to the rules for adding and removing fields of records that correspond to adding and removing constructors of sum types. |
An example of how I'd like to change variants, in both arguments and results, without being hamstrung by the ordering imposed by subtyping. Version A:
Notably, the variant Version B, evolved from A.Now, the service evolves by supporting an additional operation, expressed as a new constructor for
The other definitions are not directly affected, but are indirectly affected. I could have repeated the same text for the RHS of Evolution checkI argue that we want to permit service evolution to support this example, which represents a design pattern, really, not a corner case at all (at least to me). If we agree (do we?), I think we can do so by extending Joachim's proposal accordingly, where we have a notion of "optional constructors" that can be added in the same spirit as optional fields. I do not see how there's an obstacle there, but I'm not 100% confident. AFAIK, the only reason this wasn't considered before is because the records were already "hard" to support, and it wasn't clear how to generalize that solution, which conflates the IDL notions of valid subtyping and valid evolution. This proposal offers a way to separate them, and unblock us. |
True, but as @nomeata points out above, this naming scheme actually is too weak to really protect against all accidents, while proper generative names would do that. So, I think that particular difference is rather accidental than fundamental.
I'm afraid you lost me there.
I don't think that's what this proposal does in any interesting sense. AFAICS, all it does, effectively, is restricting the subtype lattice to certain edges, i.e., it's a smaller relation than the full structural one. @nomeata, please correct me if I'm misinterpreting it.
Yes, but then you have to define what that means. And that requires coming up with some rules for mapping unknown cases, which is the non-obvious part. But I don't see why such rules would have any dependency on the suggestion in this issue; I'm pretty sure they would essentially be equivalent to some coercive subtyping rules. At least I don't see what else they would possibly be -- they ultimately mediate individual values between two concrete types, not between hierarchies. |
Yes and no. It may be true that if
|
@nomeata, yes, I agree. But to clarify, the context (that I elided above) of the quote was @matthewhammer's hope that this semantics would somehow enable evolution of variants or similar relaxations, which are currently absent from the IDL. I was trying to explain why that's mostly an orthogonal problem and does not follow from your proposal. |
I agree it is orthogonal. In my proposal, the problems around extending variants would be part of the evolves-to releation ( And of course the problem is dual:
So yes, variant evolution is no harder of simpler than record evolution: You “just” have to only allow it inside And yes, orthogonal to history-based or not. |
Ah, great observation about the duality.
More precisely, it would have to be wrapped in opt already. And that makes it much less useful from a practical perspective, I fear. I had previously thought along the lines of mapping to a "default" variant, but that seems even more ad-hoc, as it would require marking such a variant. |
Thanks for all of the discussion and clarifications above. To step back a bit, my primary concern is that whatever enforcement mechanism we have for evolving the IDL of my canister affords me the ability to add new variant constructors, somehow. (As in my expression tree example above). I had thought that the "fully structural" approach (#832) would somehow never extend to that, because it would somehow be too complex to do so. After Joachim's duality observation above, I hope I'm interpreting the reaction accurately. The message I'm getting from Andreas and Joachim above is that we can in fact extend #832 to get what I want, and support my expression tree evolution above. Correct? If so, then great. This history-based approach is interesting, but once separated from my motivating example through this orthogonality argument, it looses most of its interest to me, at least for now. Again, all of the discussion above was really helpful, in any case. Thanks again. |
I realize it's not the perfect solution, given what Andreas says ("...less useful from a practical perspective"), but it's not totally forbidden, either. In the case of my expression tree example, I can imagine how the If that all sounds plausible, then I'm happy. |
👍 👍 |
Just a (hopefully clarifying) note: Andreas was confused by me saying this above:
I now see that I was confused. Backing up, I didn't appreciate the orthogonality that Andreas and Joachim have both observed above, that the question of history-based versus history-independent (ordinary, structural) subtyping is separate from supporting the variant evolution in my expression tree example. Prior to seeing this fact, I had thought that it was important to separate the concerns of subtyping (and coercions, etc) from evolving the types defined in the service. Obviously they cannot be totally separated, of course. I thought that histories were going to provide that indirection. In fact, the concern is orthogonal. I welcome this outcome. |
I think the discussion here has ebbed down, and there isn’t much more to learn from these ideas. Closing to keep the issue list tidy, but nothnig is lost of course. More fresh ideas in #1150. |
Closing, Candid discussion can happen at the Candid repo. |
These are just some ideas, the result of more thinking about how to add optional fields in records, and the hope of finding a solution that is not as “dynamic” as Andreas’ #832.
This is also thinking more about what @matthewhammer might have in mind or gut feeling when keeps repeating that type evolution isn't subtyping.
I do not think this scheme is practically viable! This is more food for further thought.
Core idea
IDL types remember their history.
This way the receiver of a message can interpret that value at the common ancestor of the message type and the expected type.
Type evolution
First, inspired by Matthew, I define a relation of “allowed type evolution”
t1 ~> t2
. For records, the allowed steps are the deprecated of fields, the addition of optional fields, and evolving a type in a nested position:{ x : t; … } ~> { x : deprecated; … }
(with some dedicateddeleted
type){ … } ~> { x : opt t; … }
{ x : t; … } ~> { x : t'; … }
ift ~> t'
.t1 ~> t3
ift1 ~> t2
andt2 ~> t3
(we can do multiple steps at once).Given
t1 ~> t2
we can calculate (unique) coercionsf: t1 -> t2
andf : t2 -> t1
in the obivous way, and write(f1, f2) : t1 ~> t2
for that.Type history
Canister methods’s arguments and result types are now not just simple types (
arg_t -> ret_t
) but type histories (ret_th -> arg_th
), where type histories are non-empty sequences of types related by evolutionLet’s define
current(t1,…,tn) = tn
.When passing a value
v : current(th)
to a methodfoo : th -> ()
, the sender sents(th,v)
, i.e. includes a type description as now.When the implementation of
foo : th2 -> ()
receives(th1,v)
, itth1
andth2
. Trap if none is found. Lett
be the last type of that sequence.(_, f) : t ~> current(th1)
and(g, _) : t ~> current(th1)
g(f(v)) : current(th1)
and continue with that.Type evolution is a tree
The type histories here are sequences, but if you take the set of all the sequences ever encountered, you get the evolution tree of types (in the set-of-paths representation). In particlar, if some type is forked in incompatible ways at some point, this tree branches.
During decoding, the type of the value is located on that tree, and then the coercion is built along a path on that tree.
What if it is a dag?
The formulation above does not allow two type evolutions to merge again. One could allow that (e.g. picking the “best”
t
hat is both inth1
andth2
, but that is somewhat non-deterministic. Maybe “the latestt
inth2
that is also inth1
is the best candidate, as that has the most useful information to the receiver.What are the effects?
(One could argue that I am implementing a subtype check of sorts, just with a small set of possible “subtypes” to check, therefore reducing it a finite number of type equalities.)
{ foo: Int; bar: Int; …}
and the other one adds{ foo : Int; bar : Text; … }
, then the messages will ignore bothfoo
andbar
, on the grounds that the overall type is different, and that therefore passing throughfoo
would be accidential.(But this would be much more convincing if we get the same effect if both parties happen to add just
{ foo : Int, …}
, but independently. But I think this will be impossible in a structurally typed system: Distinguishing between “two parties added the same type by chance” and “two parties add the same type by coordination”.Conclusion
Because of the need to track the history of types, this is not practical. And I fail to work out an convincing example why this would even be good.
I’ll leave this open for a week or so if people find it insightful, and close it aftwards.
The text was updated successfully, but these errors were encountered: