-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CIP-0085? | Sums-of-products in Plutus Core #455
CIP-0085? | Sums-of-products in Plutus Core #455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving this mainly (and immediately) because any disputes or changes to this would be well above my level. The explanations are good enough that even at a dumb level I feel like I can see why this is important & worth the "costs" that are admitted.
The Path to Active suggests this is fait accompli, and although that in itself shouldn't be grounds for approving the CIP, the proposal establishes that the team is going ahead with this after having considered & tested enough alternatives & confirming that it's the best way forward.
Since it mentions interfacing with the Ledger I also want to ask @JaredCorduan if he & any others in that area could question & comment on the Ledger aspect of the proposal. 🤓
That's not quite what I meant to convey! I wouldn't say it's a fait accompli - we have a prototype, but we had to have a prototype in order to know whether it was a good idea or not! We have yet to turn it into a production implementation, and it may be that we'll get input from other people that makes us reconsider things. This is of course complicated in that it's a CIP by the currently listed reviewer of the Plutus CIPs i.e. me 😅 So yeah, it's the case that I currently think we should accept it. Also, I guess I could have written the initial version of this CIP when we first had the idea, and then iterated from there. That would have been more open (there were at least two private design docs that preceded this), but I didn't want to do a proper proposal until I had a high-quality argument, which required a lot of performance work in the background (e.g. I actually spent a long time on multi-lambdas before eventually discarding them). |
Interesting details will probably emerge when we go to actually implement this, but as of right now I think that I understand the implications to ledger and I think that it looks like a great idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting proposal, some things are a bit over my level of understanding as well (operational semantics). So my first question is only on how other usages of Data
will be affected?
Yes please |
CIP-????/README.md
Outdated
|
||
### Discussion | ||
|
||
#### Why is case not strict? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this allow us to make a faster ifThenElse
than the builtins-based one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surprised if the Scott-encoded boolean case expression was already faster than the builtin version. As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surprised if the Scott-encoded boolean case expression was already faster than the builtin version.
Sure, but that one has to go through the delay
/force
hoops, while case-expressions give us enough leeway to bypass that.
As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!
Right... So do we normally use Scott-encoded booleans and only resort to built-in ones when dealing with builtins, which we then/immediately convert to Scott-encoded ones? Can't remember. If that's the case, then we should benefit a little from not Scott-encoding booleans (which I assume we'll get to automatically just by giving up on Scott-encoding in favor of native data types?) and not introducing any ifThenElse
, but instead inlining the case
at each call site, so that we can remove delay
/force
noise.
Which makes me wonder if there's a lot more pattern matching that we'd better inline during compilation instead of abstracting it out into a matcher.
Or am I saying nonsense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but that one has to go through the delay/force hoops, while case-expressions give us enough leeway to bypass that.
Annoyingly, I think we'll still have to put that in. The problem is that the moment you abstract over the case branches (which we do in the compiler), you get strictness again 🤷 We can probably optimize a lot of it away, though.
Right... So do we normally use Scott-encoded booleans and only resort to built-in ones when dealing with builtins, which we then/immediately convert to Scott-encoded ones?
Yes.
which I assume we'll get to automatically just by giving up on Scott-encoding in favor of native data types?
Yes.
Which makes me wonder if there's a lot more pattern matching that we'd better inline during compilation instead of abstracting it out into a matcher.
We do inline it, but we can't do it until UPLC for annoying type abstraction reasons. We should write some test cases to check that things get as optimized as we like. I have some, but possibly not for this exact case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!
It seems like a lost opportunity to not be able to get rid of Builtin types which are sum of products such as booleans. Would it be possible to have a notion of predefined datatypes (such as the sop of booleans), and allow builtin functions to return them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the current builtins it seems it would be enough to return a Plutus Core term. Even a Plutus Core value seems enough. All the current (and future) builtins that deconstruct a value could be done with a case
. The costing of polymorphic functions should stay the same, as long as we require certain reasonable sanity on future built-ins. But yeah, it is a big change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current builtins
Uh huh, and how about future builtins? Will we allow builtins to case on SOPs or not? We'll have to decide when someone asks. If we don't, we're inconsistent. Making these kind of incrementally-sensible-but-globally-incoherent decisions is how we ended up with the current builtins :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we allow builtins to case on SOPs or not?
If we want to keep current builtins working, then we'd need casing on a SOP in IfThenElse
(even though technically IfThenElse
would be redundant as a builtin).
So currently we convert constants of built-in types to their respective Plutus data types via dedicated converter functions generated by the compiler. I think what Mauro proposes amounts to converting arguments of some predefined types into constants within builtins. Imagine a builtin taking a Bool
as an example, you'd have to to convert constr 0
to False
(or True
?) somehow. It would be natural to require such a converter alongside the definition of the predefined data type, but then we're at square one -- what's the point of pushing more context into builtins if you end up doing all the same things.
And the denotation of a builtin does take a Bool
or a ()
or whatever, it doesn't take some arbitrary constr 0
thing. Imagine a cryptographic function doing something different depending on a Bool
argument that it has, we will have to convert SoPs to the actual Haskell values.
Bool
is probably not too complicated though. Something like [a]
is going to be so much more complex.
So I'm not sure I understand how we'd go about getting rid of data types easily expressible in Plutus and actually do less without forbidding a predefined type to serve as the type of an argument of a builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this follows. Built-in booleans aren't being removed, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this follows. Built-in booleans aren't being removed, no?
They are not. And they'll have to stay in order for us to be able to validate old-style scripts regardless. But maybe with SoP we can generate more efficient code by using native Plutus booleans instead of the built-in ones -- that's what we're discussing here.
CIP-????/README.md
Outdated
|
||
#### Why is case not strict? | ||
|
||
This makes it different to all other Plutus Core terms, but we believe it is the expected behaviour: _no_ language has strict case destructuring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this opens up an interesting potential meta-issue. Currently we don't support basic short-circuiting boolean operations like &&
and ||
. If somebody wanted to get those, they'd have to use a separate toolchain for generating UPLC code and that toolchain would need to elaborate an expression like x || y
to something like
force (ifThenElse x (delay True) (delay y))
which is pretty complex and is awkward to integrate into the surface language in a seamless way, particularly if you're aiming for the general case. And isn't zero-cost!
However elaborating x || y
to
case x of
True -> True
False -> y
is fast and much more convenient and begs for being included in a DSL or even a standalone language with macros or two-stage evaluation or whatever. It might become hard for us to compete with that!
Realistically though, if it becomes a problem we too can special-case &&
and ||
and it's less likely someone will need lazy semantics for other stuff. Although in the blockchain context where every bit matters it's generally a good idea to choose tools giving you more control over performance and we've seen how far people are willing to go to improve upon costs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However elaborating x || y to is fast and much more convenient and begs for being included in a DSL or even a standalone language with macros or two-stage evaluation or whatever.
Note that even today elaborating it to the case expression will get evaluated lazily, just with more overhead. The problem is that our function calls are strict. Indeed, a macro system or something would fix that (and indeed you can easily do that today with some TH if you actually care ). I'm unsure how big a deal it is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving an approval just for this to become mergable, but I'd really like to see how the proposed changes affect evaluation times of validation scripts (as opposed to the nofib
benchmarks).
UPD: ah, I don't have write access, so my approval doesn't count.
If it would be helpful, SundaeSwap is happy to add you to our Contracts repos (again?) if you want some more real world use cases to test on. |
Right yes, I was going to do that and forgot! |
I would ♡ to see how Hydra contracts benchmark is affected by this proposal: https://hydra.family/head-protocol/benchmarks/transaction-cost I am afraid this might be complicated without the ledger interface though 🤔 |
I like the idea; feels like "what Data should have been to be begin with". However, I have few concerns:
|
Sadly I don't think this enables us to get rid of anything. In particular, we can't get rid of I think
To be honest, I'm surprised that those things are confusing. They are very different things: one is Plutus Core builtin integers; one is a constructor of a specific structured data type; and one is a constructor for Plutus Core datatypes. It doesn't seem more confusing than the difference between, say, the Haskell
I think the thing that's problematically unspecified is the translation of the script context into its representation for the script. And yes, I agree that's very problematic and has been a hole for a while. Part of the reason I haven't written the new version down here is that I don't know how the old version works :( I will however say that I don't think that this CIP makes the situation any worse. So while I'd like to clear that up as part of the ledger specification effort, I would be sad if we blocked this CIP on doing that.
This doesn't seem true to me? Can you expand on your argument here? I agree it's problematic that the translation is under-specified, but I think that mostly is a problem for someone trying to produce an alternative "ledger driver" for Plutus scripts, rather than anything else? |
3cb2cab
to
9ecd61d
Compare
9ecd61d
to
d744361
Compare
It is confusing because they do live at the same level on the same execution engine. Which means that, in practice, a script can actually accept any of the UPLC term as datum or redeemer. One doesn't have to use
It doesn't indeed. But that's something I'd love to see happening. Otherwise, it doesn't arguably make the situation much better outside of the Haskell sphere. I wish Plutus and the ledger were written in two different languages, so that we could better enforce (and document) interfaces at the components' boundaries.
That's what I meant by "it only works because". "it" referring to: being able to not specify the interface between plutus and the ledger. If the components were not both written in Haskell and were not both relying on the same libraries, how to represent a ScriptContext on-chain would have required an ambiguous specification.
Or, alternative compilers targeting UPLC that want to interface with the existing ledger. The ledger will provide script context to a script in one given unspecified format. So if you're producing some code that requires this script context, it becomes tricky. Again, I am simply raising my concerns here. It would be a miss opportunity to introduce this change, entirely rework how the script context is produced by the ledger, and still not document it. |
I don't think we disagree on the facts, but I think I disagree with your implication here. Even if the components had been written in different languages, that would not necessarily have meant we had to specify it better. We could equally well have ended up with implementation-defined behaviour. The thing that makes the difference is whether or not you write down the spec, not the languages IMO.
I'm not sure it's the compiler's problem so much as a problem for the user-space libraries that support working with the script context data, but I totally agree that it's problematic. I've expanded on the acceptance criteria to explicitly call out specification of the script context translation :) |
CIP-????/README.md
Outdated
|
||
### Discussion | ||
|
||
#### Why is case not strict? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!
It seems like a lost opportunity to not be able to get rid of Builtin types which are sum of products such as booleans. Would it be possible to have a notion of predefined datatypes (such as the sop of booleans), and allow builtin functions to return them?
don't worry @michele-nuzzi - CIP PR's can stay at |
Thanks for tagging me @michele-nuzzi! I have also read through the proposal before and had similar thoughts but didn't get around to write them down. Another thing that bugs me is that the current PlutusData implementation yields a very simple way to compare data for equivalence (equalsData) and makes analysis of structured data possible (I.e. case distinction for bytes, int, constr). This is currently not possible on the UPLC level (which in itself is slightly confusing) and would not be solved by this CIP as I see it? |
Status updateThis CIP really has two parts. The first part, adding SOPs to PLC, is I think uncontroversial. It allows significantly better code generation for those that want it and has basically no downsides. The second part is about using SOPs for the script context. This has unfortunately turned out to be more controversial. I had expected that it would be an improvement or at least not worse for everyone. However, there are a few real cases where it makes things worse. The most serious to my eye is the loss of the way to rapidly compare bits of the script context for equality and hash them. This was something of a "coincidental" feature, but the speed advantages are significant. Annoyingly, there is a very stark tradeoff, since for other functions that work with structured data, using SOPs is much faster. The ideal solution would be to provide a way to do these operations with SOPs. However, I haven't been able to find a way to do that. We can't even write a homogeneous equality function in untyped PLC because we (deliberately) don't have a way to do "type case" in UPLC. Perhaps a builtin could do it, but it would be wildly different from any builtin we have so far, and very difficult to cost. So we are on the horns of a dilemma: we can either forgo fast equality and hashing to get a significant speedup across the board; or forgo that speedup to keep our special cases. So where to go from here? My inclination is to split the CIP into two: SOPs, and script-context-as-SOP. Most of the technical work is in implementing SOPs, and I have been busy getting that in shape for the next HF. I think we can just accept the SOPs CIP. The script-context-as-SOP CIP is harder. At least it will need benchmarking of some realistic examples using the builtins that would be lost, e.g. the Hydra validator (other examples welcome!). I don't know what to do about a proposal that has significant costs as well as significant benefits. Probably we should not accept it until we have a way to avoid the costs, which requires some more ideas. I'd like some guidance from the editors (cc @rphair @KtorZ ) on what course of action they recommend. Comment responses
Plutus Tx scripts don't need to do it either, they can also just take a I think I need to rephrase the argument a bit, though: decoding the script context is not just a weird Plutus Tx quirk. Using a representation other than
I think this is a misunderstanding. Unlifting is not about converting
They are very different. One of them is native to the language, one of them is a builtin type with a whole suite of separate functions. They're not really very easy to confuse (in my biased opinion). To use your own analogy, you don't find JSON and Javascript classes confusing, even though they're both ways of representing structured data.
(covered the equalsData stuff above, I think) The ability to distinguish different terms in that way actually adds significant power to the language (it gives us a kind of "type case"), and makes the metatheory more complicated. So its omission is deliberate. |
To see how type-case can be problematic, suppose we had the following function:
Now I can write this:
which is pretty surprising and breaks parametricity.
|
Possible "fix": Remove ScriptContext entirely. I have never been satisfied with the current integration between the scripts and the ledger. |
Could I get a link to the branch that the code is on? Curious on vm changes so I can add it to Aiken. Edit: nvm I think it's mpj/products-non-binding. I see how the vm handles the new terms now. |
@michaelpj I think you make a good argument for splitting the CIP so we can accept the less disputable one straightaway. The technical considerations are beyond me so I'm recommending this on the general principle that it would give the community a solid point of reference while continuing to work on & debate the uncertain part. It also sounds like forking the proposal would yield one CIP satisfying @KtorZ in both developer & editor hats as per #455 (comment). 🤠 In the meantime taking this out of |
@rphair should I make two new PRs for that? Or I could reuse this PR for one of them. Either way I'll link to this as a place where discussions have happened. |
thanks for asking me @michaelpj although @KtorZ @Ryun1 @SebastienGllmt might also indicate their preference. Since this PR is where the discussion started, I think it's logical to use ("reuse") it as the basis for the uncontested portion, and create one new PR for the delta. Then this "uncontested" portion could be the baseline for the 2nd PR which would then be an update to a |
I think I'll just make it a completely separate CIP. It is logically separate after all. |
Pushed to just be SOPs, I think this can go back into final check, it was almost entirely deleting content. I realised I had the folder name wrong, sorry, so the comments have got a bit detached now, please squash merge this when you do! CIP for the script context stuff coming soon. |
Here it is: #507 |
Since we are splitting the CIP so that CIP 85 only implies the introduction of SOPs in UPLC and the As stated before I'm against the introduction of the possibility of representing structured data in multiple ways as it is much more error-prone than how working with raw data will ever be. An alternative solution, if we really want this new SOP representation then the I do like the idea of an ad-hoc construct for matching SOPs but I don't think is enough to justify the introduction of SOPs. An alternative solution, (assuming it is really needed, even if I'm sure every language provides support for a similar control flow) would be the introduction of a built-in function to select a given continuation based on an index. Here I call it it would take
It would work by using the constructor index as the index of the list of continuations to which apply the list of data used as fields. Even if messy I would still prefer it to have both SOPs and Data representations. |
I do share Michele's opinion here as also stated previously. I see no reason for I don't know (m)any people who write UPLC directly; pretty much everyone use a higher-level abstraction framework that generates UPLC code for them. Might it be PlutusTx or Aiken (or [choose-your-weapon]). I think that's a fair thing to ask compiler / framework developers to take care of. Given how the ecosystem has already developed around the use of Data in the transaction building process, it's also a good idea to keep the Data representation at this level; since any modification impacts literally everyone there. Yet, removing Having said that, I am more than happy with the CIP from an editor standpoint and I'll give it a (well-deserved!) checkmark ✔️. I'd like also now to deflect any further discussions regarding |
This exists (well, not the continuation version since builtins can't call functions) in the form of
There are a few reasons:
That is to say: I don't think we would introduce it now if we didn't have it already, but it's hard to remove when people rely on it. Maybe we can still do it, but we'll have to at least get #507 over the line first. |
I'm confident that none of the "alternative languages" is relying on The implementation assumes the passed data is The proposed implementation is "unsafe" from the typing perspective but that shouldn't bother us otherwise what's the point of a typed IR form of UPLC in the standard compilation process? I would be surprised if any of the other implementations would use something too different than the implementation based on continuations.
I'm all in for performances but not if it means introducing the possibility of unsafe contracts. In my opinion, some more data from contracts that are the results of other compilation processes would help the decision of introducing SOPs or not. In the event the additional data shows an actual improvement in performance, I'd love to support this proposal under the condition that it doesn't introduce additional ambiguity (aka. only SOPs, removing Data). |
I like SOP for its fast destructuring, but the problem is the blocker on missing out equals data and serialise_data. As a different compromise how about a new builtin that is similar to case. Say case_constr. Takes in a constant of type data that is a constr and any term. Then it applies all field arguments from constr plus the constr index to that term. Then that single term can handle constr index and whatever fields it has applies. Now I know this step is computationally more expensive. But since we have a lot of builtins that operate on data, I wonder if operating on data in a more efficient way might be the solution. Even converting script context to SOP does not change that we still need better ways to operate on data because of datum, redeemer, and script context still has data fields in datums, inputs, outputs, and redeemers fields. *on phone so I'll clean up this comment in a bit. |
If I understand correctly something like
|
Not quite way less code. I'll post the example in the new CIP PR since this was merged Edit: Posted in #507 |
* Sums-of-products * Address some comments * Address more comments * More on benchmarking * More acceptance criteria * Address more comments * More work * More * Update metadata * More benchmarks * More comments * Rename folder * Rework to just be SOPs
* Sums-of-products * Address some comments * Address more comments * More on benchmarking * More acceptance criteria * Address more comments * More work * More * Update metadata * More benchmarks * More comments * Rename folder * Rework to just be SOPs
This CIP proposes a substantial extension to Plutus Core to support sums-of-products, with the aim of realising some substantial performance improvements.
(rendered proposal in branch)