Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIP-0085? | Sums-of-products in Plutus Core #455

Merged
merged 13 commits into from
May 2, 2023

Conversation

michaelpj
Copy link
Contributor

@michaelpj michaelpj commented Jan 30, 2023

This CIP proposes a substantial extension to Plutus Core to support sums-of-products, with the aim of realising some substantial performance improvements.


(rendered proposal in branch)

Copy link
Collaborator

@rphair rphair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this mainly (and immediately) because any disputes or changes to this would be well above my level. The explanations are good enough that even at a dumb level I feel like I can see why this is important & worth the "costs" that are admitted.

The Path to Active suggests this is fait accompli, and although that in itself shouldn't be grounds for approving the CIP, the proposal establishes that the team is going ahead with this after having considered & tested enough alternatives & confirming that it's the best way forward.

Since it mentions interfacing with the Ledger I also want to ask @JaredCorduan if he & any others in that area could question & comment on the Ledger aspect of the proposal. 🤓

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
@rphair rphair changed the title Sums-of-products in Plutus Core CIP-???? | Sums-of-products in Plutus Core Jan 30, 2023
@michaelpj
Copy link
Contributor Author

The Path to Active suggests this is fait accompli, and although that in itself shouldn't be grounds for approving the CIP, the proposal establishes that the team is going ahead with this after having considered & tested enough alternatives & confirming that it's the best way forward.

That's not quite what I meant to convey! I wouldn't say it's a fait accompli - we have a prototype, but we had to have a prototype in order to know whether it was a good idea or not! We have yet to turn it into a production implementation, and it may be that we'll get input from other people that makes us reconsider things.

This is of course complicated in that it's a CIP by the currently listed reviewer of the Plutus CIPs i.e. me 😅 So yeah, it's the case that I currently think we should accept it.

Also, I guess I could have written the initial version of this CIP when we first had the idea, and then iterated from there. That would have been more open (there were at least two private design docs that preceded this), but I didn't want to do a proper proposal until I had a high-quality argument, which required a lot of performance work in the background (e.g. I actually spent a long time on multi-lambdas before eventually discarding them).

@JaredCorduan
Copy link
Contributor

JaredCorduan commented Jan 30, 2023

Since it mentions interfacing with the Ledger I also want to ask @JaredCorduan if he & any others in that area could question & comment on that aspect of the proposal.

Interesting details will probably emerge when we go to actually implement this, but as of right now I think that I understand the implications to ledger and I think that it looks like a great idea.

Copy link
Contributor

@ch1bo ch1bo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting proposal, some things are a bit over my level of understanding as well (operational semantics). So my first question is only on how other usages of Data will be affected?

CIP-????/README.md Outdated Show resolved Hide resolved
@rvcas
Copy link

rvcas commented Jan 30, 2023

Yes please

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved

### Discussion

#### Why is case not strict?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this allow us to make a faster ifThenElse than the builtins-based one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be surprised if the Scott-encoded boolean case expression was already faster than the builtin version. As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be surprised if the Scott-encoded boolean case expression was already faster than the builtin version.

Sure, but that one has to go through the delay/force hoops, while case-expressions give us enough leeway to bypass that.

As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!

Right... So do we normally use Scott-encoded booleans and only resort to built-in ones when dealing with builtins, which we then/immediately convert to Scott-encoded ones? Can't remember. If that's the case, then we should benefit a little from not Scott-encoding booleans (which I assume we'll get to automatically just by giving up on Scott-encoding in favor of native data types?) and not introducing any ifThenElse, but instead inlining the case at each call site, so that we can remove delay/force noise.

Which makes me wonder if there's a lot more pattern matching that we'd better inline during compilation instead of abstracting it out into a matcher.

Or am I saying nonsense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that one has to go through the delay/force hoops, while case-expressions give us enough leeway to bypass that.

Annoyingly, I think we'll still have to put that in. The problem is that the moment you abstract over the case branches (which we do in the compiler), you get strictness again 🤷 We can probably optimize a lot of it away, though.

Right... So do we normally use Scott-encoded booleans and only resort to built-in ones when dealing with builtins, which we then/immediately convert to Scott-encoded ones?

Yes.

which I assume we'll get to automatically just by giving up on Scott-encoding in favor of native data types?

Yes.

Which makes me wonder if there's a lot more pattern matching that we'd better inline during compilation instead of abstracting it out into a matcher.

We do inline it, but we can't do it until UPLC for annoying type abstraction reasons. We should write some test cases to check that things get as optimized as we like. I have some, but possibly not for this exact case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!

It seems like a lost opportunity to not be able to get rid of Builtin types which are sum of products such as booleans. Would it be possible to have a notion of predefined datatypes (such as the sop of booleans), and allow builtin functions to return them?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current builtins it seems it would be enough to return a Plutus Core term. Even a Plutus Core value seems enough. All the current (and future) builtins that deconstruct a value could be done with a case. The costing of polymorphic functions should stay the same, as long as we require certain reasonable sanity on future built-ins. But yeah, it is a big change...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current builtins

Uh huh, and how about future builtins? Will we allow builtins to case on SOPs or not? We'll have to decide when someone asks. If we don't, we're inconsistent. Making these kind of incrementally-sensible-but-globally-incoherent decisions is how we ended up with the current builtins :D

Copy link
Contributor

@effectfully effectfully Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we allow builtins to case on SOPs or not?

If we want to keep current builtins working, then we'd need casing on a SOP in IfThenElse (even though technically IfThenElse would be redundant as a builtin).

So currently we convert constants of built-in types to their respective Plutus data types via dedicated converter functions generated by the compiler. I think what Mauro proposes amounts to converting arguments of some predefined types into constants within builtins. Imagine a builtin taking a Bool as an example, you'd have to to convert constr 0 to False (or True?) somehow. It would be natural to require such a converter alongside the definition of the predefined data type, but then we're at square one -- what's the point of pushing more context into builtins if you end up doing all the same things.

And the denotation of a builtin does take a Bool or a () or whatever, it doesn't take some arbitrary constr 0 thing. Imagine a cryptographic function doing something different depending on a Bool argument that it has, we will have to convert SoPs to the actual Haskell values.

Bool is probably not too complicated though. Something like [a] is going to be so much more complex.

So I'm not sure I understand how we'd go about getting rid of data types easily expressible in Plutus and actually do less without forbidding a predefined type to serve as the type of an argument of a builtin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this follows. Built-in booleans aren't being removed, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this follows. Built-in booleans aren't being removed, no?

They are not. And they'll have to stay in order for us to be able to validate old-style scripts regardless. But maybe with SoP we can generate more efficient code by using native Plutus booleans instead of the built-in ones -- that's what we're discussing here.

CIP-????/README.md Outdated Show resolved Hide resolved

#### Why is case not strict?

This makes it different to all other Plutus Core terms, but we believe it is the expected behaviour: _no_ language has strict case destructuring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this opens up an interesting potential meta-issue. Currently we don't support basic short-circuiting boolean operations like && and ||. If somebody wanted to get those, they'd have to use a separate toolchain for generating UPLC code and that toolchain would need to elaborate an expression like x || y to something like

force (ifThenElse x (delay True) (delay y))

which is pretty complex and is awkward to integrate into the surface language in a seamless way, particularly if you're aiming for the general case. And isn't zero-cost!

However elaborating x || y to

case x of
  True  -> True
  False -> y

is fast and much more convenient and begs for being included in a DSL or even a standalone language with macros or two-stage evaluation or whatever. It might become hard for us to compete with that!

Realistically though, if it becomes a problem we too can special-case && and || and it's less likely someone will need lazy semantics for other stuff. Although in the blockchain context where every bit matters it's generally a good idea to choose tools giving you more control over performance and we've seen how far people are willing to go to improve upon costs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However elaborating x || y to is fast and much more convenient and begs for being included in a DSL or even a standalone language with macros or two-stage evaluation or whatever.

Note that even today elaborating it to the case expression will get evaluated lazily, just with more overhead. The problem is that our function calls are strict. Indeed, a macro system or something would fix that (and indeed you can easily do that today with some TH if you actually care :trollface: ). I'm unsure how big a deal it is...

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving an approval just for this to become mergable, but I'd really like to see how the proposed changes affect evaluation times of validation scripts (as opposed to the nofib benchmarks).

UPD: ah, I don't have write access, so my approval doesn't count.

@Quantumplation
Copy link
Contributor

If it would be helpful, SundaeSwap is happy to add you to our Contracts repos (again?) if you want some more real world use cases to test on.

@michaelpj
Copy link
Contributor Author

I'd really like to see how the proposed changes affect evaluation times of validation scripts

Right yes, I was going to do that and forgot!

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
@ghost
Copy link

ghost commented Jan 30, 2023

I would ♡ to see how Hydra contracts benchmark is affected by this proposal: https://hydra.family/head-protocol/benchmarks/transaction-cost
/cc @ch1bo

I am afraid this might be complicated without the ledger interface though 🤔

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved
@KtorZ
Copy link
Member

KtorZ commented Feb 1, 2023

I like the idea; feels like "what Data should have been to be begin with". However, I have few concerns:

  • The proposal only talks about adding new term constructors but say nothing about removing stuff and uncluttering the core language. I find the current Plutus Core language not so easy to work with and quite confusing for builders that seek on building tools. While the introduction of new terms is great, it also adds more rooms for errors. Now we have 3 ways to represent an integer (as a raw UPLC primitive, as a wrapped data I, or as a wrapped constr). If Data was initially introduced as a work-around in the early days of Plutus - as suggested in the proposal:

    The solution that we chose was to pick a single generic structured data type which would be used for data interchange between the ledger and scripts. This is the 'Data' type.

    Then the introduction of constr and case make Data (and all its associated builtins, ChooseData, IData, ConstrData, unConstrData etc...) redundant. Since this would already require a new version of Plutus, what about also "cleaning up" the language on the same occasion?

  • The proposal suggests that this change would be adopted by the ledger as well. Which raises a second concern regarding interoperability. At the moment, the on-chain format used between the Plutus VM and ledger is unspecified. One has to reverse-engineer Haskell code and rewind type-class instances to figure out how is the ledger constructing a script context. Some things even simply rely on conventions from Haskell's library (like the ordering of keys in a map) which I find (a) not practical to work with and (b) potentially harmful. So, if anything is done in the ledger to adopt the sums-of-products, may I strongly suggest to document this interface in a language-agnostic way (i.e. not Haskell). This is fundamentally a data-serialization format. (Note that this is part of what we're trying to achieve with CIP-0057). Right now it only works because the ledger and plutus happen to both be written in Haskell and rely on the same underlying libraries. I argue that there should be a clearer separation between this two components.

@michaelpj
Copy link
Contributor Author

@KtorZ

The proposal only talks about adding new term constructors but say nothing about removing stuff and uncluttering the core language.

Then the introduction of constr and case make Data (and all its associated builtins, ChooseData, IData, ConstrData, unConstrData etc...) redundant.

Sadly I don't think this enables us to get rid of anything. In particular, we can't get rid of Data because we still want it for datums and redeemers. See my comment above, I'll expand on this in the text.

I think Data is the only thing that this proposal even comes close to replacing.

Now we have 3 ways to represent an integer (as a raw UPLC primitive, as a wrapped data I, or as a wrapped constr).

To be honest, I'm surprised that those things are confusing. They are very different things: one is Plutus Core builtin integers; one is a constructor of a specific structured data type; and one is a constructor for Plutus Core datatypes. It doesn't seem more confusing than the difference between, say, the Haskell Integer and the TInteger constructor of the cborg Term type (using that example since Data was originally designed as a quasi-subset of CBOR).

The proposal suggests that this change would be adopted by the ledger as well. Which raises a second concern regarding interoperability. At the moment, the on-chain format used between the Plutus VM and ledger is unspecified.

I think the thing that's problematically unspecified is the translation of the script context into its representation for the script. And yes, I agree that's very problematic and has been a hole for a while. Part of the reason I haven't written the new version down here is that I don't know how the old version works :(

I will however say that I don't think that this CIP makes the situation any worse. So while I'd like to clear that up as part of the ledger specification effort, I would be sad if we blocked this CIP on doing that.

Right now it only works because the ledger and plutus happen to both be written in Haskell and rely on the same underlying libraries.

This doesn't seem true to me? Can you expand on your argument here? I agree it's problematic that the translation is under-specified, but I think that mostly is a problem for someone trying to produce an alternative "ledger driver" for Plutus scripts, rather than anything else?

@michaelpj michaelpj force-pushed the mpj/sums-of-products branch from 3cb2cab to 9ecd61d Compare February 1, 2023 15:56
@michaelpj michaelpj force-pushed the mpj/sums-of-products branch from 9ecd61d to d744361 Compare February 1, 2023 15:57
@KtorZ
Copy link
Member

KtorZ commented Feb 1, 2023

@michaelpj To be honest, I'm surprised that those things are confusing. They are very different things: one is Plutus Core builtin integers; one is a constructor of a specific structured data type; and one is a constructor for Plutus Core datatypes. It doesn't seem more confusing than the difference between, say, the Haskell Integer and the TInteger constructor of the cborg Term type (using that example since Data was originally designed as a quasi-subset of CBOR).

It is confusing because they do live at the same level on the same execution engine. Which means that, in practice, a script can actually accept any of the UPLC term as datum or redeemer. One doesn't have to use Data for that, and it just happens that the Haskell codebase only generates values (through ToData) that wraps terms as UPLC's Data. It's easy to violate the abstraction outside of the Haskell realm -- purposely or not.

@michaelpj I will however say that I don't think that this CIP makes the situation any worse. So while I'd like to clear that up as part of the ledger specification effort, I would be sad if we blocked this CIP on doing that.

It doesn't indeed. But that's something I'd love to see happening. Otherwise, it doesn't arguably make the situation much better outside of the Haskell sphere. I wish Plutus and the ledger were written in two different languages, so that we could better enforce (and document) interfaces at the components' boundaries.

@michaelpj This doesn't seem true to me? Can you expand on your argument here? I agree it's problematic that the translation is under-specified.

That's what I meant by "it only works because". "it" referring to: being able to not specify the interface between plutus and the ledger. If the components were not both written in Haskell and were not both relying on the same libraries, how to represent a ScriptContext on-chain would have required an ambiguous specification.

@michaelpj but I think that mostly is a problem for someone trying to produce an alternative "ledger driver" for Plutus scripts, rather than anything else?

Or, alternative compilers targeting UPLC that want to interface with the existing ledger. The ledger will provide script context to a script in one given unspecified format. So if you're producing some code that requires this script context, it becomes tricky. Again, I am simply raising my concerns here. It would be a miss opportunity to introduce this change, entirely rework how the script context is produced by the ledger, and still not document it.

@michaelpj
Copy link
Contributor Author

I wish Plutus and the ledger were written in two different languages, so that we could better enforce (and document) interfaces at the components' boundaries.

If the components were not both written in Haskell and were not both relying on the same libraries, how to represent a ScriptContext on-chain would have required an ambiguous specification.

I don't think we disagree on the facts, but I think I disagree with your implication here. Even if the components had been written in different languages, that would not necessarily have meant we had to specify it better. We could equally well have ended up with implementation-defined behaviour. The thing that makes the difference is whether or not you write down the spec, not the languages IMO.

Or, alternative compilers targeting UPLC that want to interface with the existing ledger. The ledger will provide script context to a script in one given unspecified format. So if you're producing some code that requires this script context, it becomes tricky.

I'm not sure it's the compiler's problem so much as a problem for the user-space libraries that support working with the script context data, but I totally agree that it's problematic.

I've expanded on the acceptance criteria to explicitly call out specification of the script context translation :)

CIP-????/README.md Outdated Show resolved Hide resolved
CIP-????/README.md Outdated Show resolved Hide resolved

### Discussion

#### Why is case not strict?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember, we mostly have the builtin one because we need an eliminator for builtin booleans, not because it's necessarily fast!

It seems like a lost opportunity to not be able to get rid of Builtin types which are sum of products such as booleans. Would it be possible to have a notion of predefined datatypes (such as the sop of booleans), and allow builtin functions to return them?

@rphair
Copy link
Collaborator

rphair commented Apr 7, 2023

don't worry @michele-nuzzi - CIP PR's can stay at Last Check for a while... until we're sure both authors & community are happy as it stands and/or that it represents a usable alternative that can be aimed for in acceptance & implementation.

@nielstron
Copy link
Contributor

Thanks for tagging me @michele-nuzzi! I have also read through the proposal before and had similar thoughts but didn't get around to write them down.

Another thing that bugs me is that the current PlutusData implementation yields a very simple way to compare data for equivalence (equalsData) and makes analysis of structured data possible (I.e. case distinction for bytes, int, constr). This is currently not possible on the UPLC level (which in itself is slightly confusing) and would not be solved by this CIP as I see it?

@michaelpj
Copy link
Contributor Author

Status update

This CIP really has two parts. The first part, adding SOPs to PLC, is I think uncontroversial. It allows significantly better code generation for those that want it and has basically no downsides.

The second part is about using SOPs for the script context. This has unfortunately turned out to be more controversial. I had expected that it would be an improvement or at least not worse for everyone. However, there are a few real cases where it makes things worse.

The most serious to my eye is the loss of the way to rapidly compare bits of the script context for equality and hash them. This was something of a "coincidental" feature, but the speed advantages are significant. Annoyingly, there is a very stark tradeoff, since for other functions that work with structured data, using SOPs is much faster.

The ideal solution would be to provide a way to do these operations with SOPs. However, I haven't been able to find a way to do that. We can't even write a homogeneous equality function in untyped PLC because we (deliberately) don't have a way to do "type case" in UPLC. Perhaps a builtin could do it, but it would be wildly different from any builtin we have so far, and very difficult to cost.

So we are on the horns of a dilemma: we can either forgo fast equality and hashing to get a significant speedup across the board; or forgo that speedup to keep our special cases.

So where to go from here? My inclination is to split the CIP into two: SOPs, and script-context-as-SOP. Most of the technical work is in implementing SOPs, and I have been busy getting that in shape for the next HF. I think we can just accept the SOPs CIP.

The script-context-as-SOP CIP is harder. At least it will need benchmarking of some realistic examples using the builtins that would be lost, e.g. the Hydra validator (other examples welcome!). I don't know what to do about a proposal that has significant costs as well as significant benefits. Probably we should not accept it until we have a way to avoid the costs, which requires some more ideas.

I'd like some guidance from the editors (cc @rphair @KtorZ ) on what course of action they recommend.

Comment responses

I would like to precise that only PlutusTX scripts keep decoding entire script contexts;

in my opinion, this is a non-issue since plutarch came out and any other language (plu-ts,Aiken,OpShin,Helios etx.) does not decode the script context other than the fields needed.

Plutus Tx scripts don't need to do it either, they can also just take a Data argument. This is a bit of a myth.

I think I need to rephrase the argument a bit, though: decoding the script context is not just a weird Plutus Tx quirk. Using a representation other than Data is much faster for operations in general (see benchmarks in this CIP, for example). Going through the builtin system multiple times for every pattern match is terrible. So regardless of how you produce your PLC script, you're going to have a faster script if you can process the script context as something other than Data... the problem is just getting it there.

unlifting is not more complicated than reading a json

I think this is a misunderstanding. Unlifting is not about converting Data to anything, it's about interpreting general PLC terms. I'll try and clarify the text to explain the problem more clearly.

I dislike the idea of supporting multiple ways to represent structured data.

They are very different. One of them is native to the language, one of them is a builtin type with a whole suite of separate functions. They're not really very easy to confuse (in my biased opinion). To use your own analogy, you don't find JSON and Javascript classes confusing, even though they're both ways of representing structured data.

Another thing that bugs me is that the current PlutusData implementation yields a very simple way to compare data for equivalence (equalsData) and makes analysis of structured data possible (I.e. case distinction for bytes, int, constr). This is currently not possible on the UPLC level (which in itself is slightly confusing)

(covered the equalsData stuff above, I think)

The ability to distinguish different terms in that way actually adds significant power to the language (it gives us a kind of "type case"), and makes the metatheory more complicated. So its omission is deliberate.

@michaelpj
Copy link
Contributor Author

To see how type-case can be problematic, suppose we had the following function:

isIntegerConstant :: forall a . a -> Bool

Now I can write this:

evilIdentity :: forall a . a -> a
evilIdentity x = if isIntegerConstant x then traceError "haha, no integers!" else x

which is pretty surprising and breaks parametricity.

@L-as
Copy link
Contributor

L-as commented Apr 11, 2023

Possible "fix":

Remove ScriptContext entirely. I have never been satisfied with the current integration between the scripts and the ledger.
Scripts have too much power, the ability to inspect anything is not good. This is also what causes the double satisfaction attack.
Taking this into account, if scripts instead returned a list of constraints, we wouldn't have to worry about which format ScriptContext is in at all. The constraints would of course use SoPs, as there isn't any need to check equalities of constraints.

@MicroProofs
Copy link
Contributor

MicroProofs commented Apr 11, 2023

Could I get a link to the branch that the code is on? Curious on vm changes so I can add it to Aiken.

Edit: nvm I think it's mpj/products-non-binding.

I see how the vm handles the new terms now.

@rphair
Copy link
Collaborator

rphair commented Apr 11, 2023

I'd like some guidance from the editors (cc @rphair @KtorZ ) on what course of action they recommend.

@michaelpj I think you make a good argument for splitting the CIP so we can accept the less disputable one straightaway. The technical considerations are beyond me so I'm recommending this on the general principle that it would give the community a solid point of reference while continuing to work on & debate the uncertain part.

It also sounds like forking the proposal would yield one CIP satisfying @KtorZ in both developer & editor hats as per #455 (comment). 🤠 In the meantime taking this out of Last Check at least until such a split is fully considered.

@rphair rphair removed the State: Last Check Review favourable with disputes resolved; staged for merging. label Apr 11, 2023
@michaelpj
Copy link
Contributor Author

@rphair should I make two new PRs for that? Or I could reuse this PR for one of them. Either way I'll link to this as a place where discussions have happened.

@rphair
Copy link
Collaborator

rphair commented Apr 17, 2023

thanks for asking me @michaelpj although @KtorZ @Ryun1 @SebastienGllmt might also indicate their preference.

Since this PR is where the discussion started, I think it's logical to use ("reuse") it as the basis for the uncontested portion, and create one new PR for the delta. Then this "uncontested" portion could be the baseline for the 2nd PR which would then be an update to a CIP-0085 merged in the near future.

@michaelpj
Copy link
Contributor Author

Then this "uncontested" portion could be the baseline for the 2nd PR which would then be an update to a CIP-0085 merged in the near future.

I think I'll just make it a completely separate CIP. It is logically separate after all.

@michaelpj
Copy link
Contributor Author

Pushed to just be SOPs, I think this can go back into final check, it was almost entirely deleting content.

I realised I had the folder name wrong, sorry, so the comments have got a bit detached now, please squash merge this when you do!

CIP for the script context stuff coming soon.

@rphair rphair added the State: Last Check Review favourable with disputes resolved; staged for merging. label Apr 20, 2023
@michaelpj
Copy link
Contributor Author

Here it is: #507

@michele-nuzzi
Copy link

Since we are splitting the CIP so that CIP 85 only implies the introduction of SOPs in UPLC and the case construct I would still propose an alternative solution.

As stated before I'm against the introduction of the possibility of representing structured data in multiple ways as it is much more error-prone than how working with raw data will ever be.

An alternative solution, if we really want this new SOP representation then the Data representation should not be present at all.

I do like the idea of an ad-hoc construct for matching SOPs but I don't think is enough to justify the introduction of SOPs.

An alternative solution, (assuming it is really needed, even if I'm sure every language provides support for a similar control flow) would be the introduction of a built-in function to select a given continuation based on an index.

Here I call it selector for lack of creativity.

it would take data as the first input and a list of continuations as the second (a list of functions, or lists of delayed terms should be integrated)

note I'm using "list" here but the builtin could also be used with an indefinite number of applications just like the case construct described in the proposal would work

It would work by using the constructor index as the index of the list of continuations to which apply the list of data used as fields.

Even if messy I would still prefer it to have both SOPs and Data representations.

@KtorZ
Copy link
Member

KtorZ commented May 2, 2023

@michele-nuzzi
As stated before I'm against the introduction of the possibility of representing structured data in multiple ways as it is much more error-prone than how working with raw data will ever be.

An alternative solution, if we really want this new SOP representation then the Data representation should not be present at all.

I do share Michele's opinion here as also stated previously. I see no reason for Data to still exists once we have term-level constructors for SOP? So far, the only argument I've really seen from @michaelpj for the use of Data regards the interfacing between on-chain and the outside world. Although, @michaelpj, I don't buy the argument that using a different representation on-chain than the one "in the transaction" makes it harder for script authors; It makes it (possibly) harder for tool / compiler writers only.

I don't know (m)any people who write UPLC directly; pretty much everyone use a higher-level abstraction framework that generates UPLC code for them. Might it be PlutusTx or Aiken (or [choose-your-weapon]). I think that's a fair thing to ask compiler / framework developers to take care of. Given how the ecosystem has already developed around the use of Data in the transaction building process, it's also a good idea to keep the Data representation at this level; since any modification impacts literally everyone there. Yet, removing Data from the Plutus VM entirely and having various compiler implementors be aware of that Data -> SOP transformation sounds totally reasonable. I think we can even all fit in a room if needs be 😊


Having said that, I am more than happy with the CIP from an editor standpoint and I'll give it a (well-deserved!) checkmark ✔️. I'd like also now to deflect any further discussions regarding Data vs SOP to #507 as these goes together. We can't "remove Data" while it's still used to encode the script context.

@rphair
Copy link
Collaborator

rphair commented May 2, 2023

@KtorZ sounds good... I'd been planning to elbow a "last check" on this PR into the agenda with #507 but it's good to see you've put it on Last Check officially in the agenda.

@rphair rphair merged commit e635f7d into cardano-foundation:master May 2, 2023
@rphair rphair removed the State: Last Check Review favourable with disputes resolved; staged for merging. label May 2, 2023
@michaelpj
Copy link
Contributor Author

An alternative solution, (assuming it is really needed, even if I'm sure every language provides support for a similar control flow) would be the introduction of a built-in function to select a given continuation based on an index.

This exists (well, not the continuation version since builtins can't call functions) in the form of chooseData. You can implement a version with continuations on top of this easily, there's one in the plutus-tx library. But it's slow. One of the primary reasons that we're doing this is performance - I feel like this is the thing that you are missing in this discussion, @michele-nuzzi .

I do share Michele's opinion here as also stated previously. I see no reason for Data to still exists once we have term-level constructors for SOP?

There are a few reasons:

  1. It requires CIP-0097? | Representing the Script Context as a SOP term #507, which has IMO blocking problems because it removes some fast operations that people rely on.
  2. You lose the ability to "type-case", which you can do on Data.
  3. The argument about it matching more closely with the external world.

That is to say: I don't think we would introduce it now if we didn't have it already, but it's hard to remove when people rely on it. Maybe we can still do it, but we'll have to at least get #507 over the line first.

@michele-nuzzi
Copy link

This exists (well, not the continuation version since builtins can't call functions) in the form of chooseData. You can implement a version with continuations on top of this easily, there's one in the plutus-tx library. But it's slow. One of the primary reasons that we're doing this is performance - I feel like this is the thing that you are missing in this discussion, @michele-nuzzi .

I'm confident that none of the "alternative languages" is relying on chooseData exactly because of its performance. I can talk for plu-ts and I know for sure this isn't the case as shown before in this discussion pointing to the plu-ts implementation.

The implementation assumes the passed data is Constr and automatically destructures it; checks the index and passes the list of data to the continuation.

The proposed implementation is "unsafe" from the typing perspective but that shouldn't bother us otherwise what's the point of a typed IR form of UPLC in the standard compilation process?

I would be surprised if any of the other implementations would use something too different than the implementation based on continuations.

One of the primary reasons that we're doing this is performance

I'm all in for performances but not if it means introducing the possibility of unsafe contracts.

In my opinion, some more data from contracts that are the results of other compilation processes would help the decision of introducing SOPs or not.

In the event the additional data shows an actual improvement in performance, I'd love to support this proposal under the condition that it doesn't introduce additional ambiguity (aka. only SOPs, removing Data).

@MicroProofs
Copy link
Contributor

MicroProofs commented May 2, 2023

I like SOP for its fast destructuring, but the problem is the blocker on missing out equals data and serialise_data. As a different compromise how about a new builtin that is similar to case. Say case_constr.

Takes in a constant of type data that is a constr and any term. Then it applies all field arguments from constr plus the constr index to that term. Then that single term can handle constr index and whatever fields it has applies.

Now I know this step is computationally more expensive. But since we have a lot of builtins that operate on data, I wonder if operating on data in a more efficient way might be the solution. Even converting script context to SOP does not change that we still need better ways to operate on data because of datum, redeemer, and script context still has data fields in datums, inputs, outputs, and redeemers fields.

*on phone so I'll clean up this comment in a bit.

@michele-nuzzi
Copy link

Takes in a constant of type data that is a constr and any term. Then it applies all field arguments from constr plus the constr index to that term. Then that single term can handle constr index and whatever fields it has applies.

If I understand correctly something like case_constr would be something like

(lam someConstr
  (lam someContinuation
    [
      (lam unConstrPair
        [
          [
            someContinuation
            [(force (force (builtin fstPair))) unConstrPair]
          ]
          [(force (force (builtin sndPair))) unConstrPair]
        ]
      )
      [(builtin unConstrData) someConstr]
    ]
  )
)

@MicroProofs
Copy link
Contributor

MicroProofs commented May 2, 2023

Takes in a constant of type data that is a constr and any term. Then it applies all field arguments from constr plus the constr index to that term. Then that single term can handle constr index and whatever fields it has applies.

If I understand correctly something like case_constr would be something like

(lam someConstr
  (lam someContinuation
    [
      (lam unConstrPair
        [
          [
            someContinuation
            [(force (force (builtin fstPair))) unConstrPair]
          ]
          [(force (force (builtin sndPair))) unConstrPair]
        ]
      )
      [(builtin unConstrData) someConstr]
    ]
  )
)

Not quite way less code. I'll post the example in the new CIP PR since this was merged

Edit: Posted in #507

Ryun1 pushed a commit to Ryun1/CIPs that referenced this pull request Jul 28, 2023
* Sums-of-products

* Address some comments

* Address more comments

* More on benchmarking

* More acceptance criteria

* Address more comments

* More work

* More

* Update metadata

* More benchmarks

* More comments

* Rename folder

* Rework to just be SOPs
Ryun1 pushed a commit to Ryun1/CIPs that referenced this pull request Nov 17, 2023
* Sums-of-products

* Address some comments

* Address more comments

* More on benchmarking

* More acceptance criteria

* Address more comments

* More work

* More

* Update metadata

* More benchmarks

* More comments

* Rename folder

* Rework to just be SOPs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Plutus Proposals belonging to the 'Plutus' category.
Projects
None yet
Development

Successfully merging this pull request may close these issues.