CIP-32: Inline datums #160

Merged on Mar 15, 2022 (4 commits)
129 changes: 129 additions & 0 deletions CIP-0032/README.md
@@ -0,0 +1,129 @@
---
CIP: 32
Title: Inline datums
Authors: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Comments-Summary: No comments
Comments-URI:
Status: Draft
Type: Standards Track
Created: 2021-11-29
License: CC-BY-4.0
---

# Inline datums

## Abstract

We propose to allow datums themselves to be attached to outputs instead of datum hashes.
This will allow much simpler communication of datum values between users.

## Motivation

Conceptually, datums are pieces of data that are attached to outputs.
However, in practice datums are implemented by attaching _hashes_ of datums to outputs, and requiring that the spending transaction provides the actual datum.

This is quite inconvenient for users.
Datums tend to represent the result of computation done by the party who creates the output, and as such there is almost no chance that the spending party will know the datum without communicating with the creating party.
That means the datum must either be communicated between the parties off-chain, or communicated on-chain by attaching it to the transaction that creates the output (which is also inconvenient, since the spending party must watch the whole chain to spot it).

It would be much more convenient to just put the _datum itself_ in an output, which is what we propose.

### Use cases

We expect that, provided we are able to bring the cost low enough, a large proportion of dapp developers will make use of this feature, as it will simplify their systems substantially.

## Specification

Transaction outputs are changed so that the datum field can contain either a hash or a datum (an "inline datum").

The min UTXO value for an output with an inline datum depends on the size of the datum, following the `coinsPerUTxOWord` protocol parameter.

When an output with an inline datum is spent, the spending transaction does not need to provide the datum itself.
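
The spending rule above can be sketched as follows. This is a minimal illustration, not the ledger implementation: the names `DatumHash`, `InlineDatum`, `hash_datum` and `resolve_datum` are hypothetical, datums are modelled as already-serialised bytes, and a 32-byte BLAKE2b digest is assumed as the datum hash.

```python
from dataclasses import dataclass
from hashlib import blake2b
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class DatumHash:
    digest: bytes  # 32-byte hash of the serialised datum


@dataclass(frozen=True)
class InlineDatum:
    data: bytes  # serialised datum stored directly in the output


# The datum field of an output: absent, a hash, or the datum itself.
OutputDatum = Optional[Union[DatumHash, InlineDatum]]


def hash_datum(data: bytes) -> bytes:
    """Hypothetical helper: 32-byte BLAKE2b over the serialised datum (an assumption)."""
    return blake2b(data, digest_size=32).digest()


def resolve_datum(datum: OutputDatum, supplied: Dict[bytes, bytes]) -> Optional[bytes]:
    """Return the datum of an output being spent.

    `supplied` maps datum hashes to the datums carried by the spending transaction.
    """
    if datum is None:
        return None
    if isinstance(datum, InlineDatum):
        return datum.data  # inline: the spending transaction supplies nothing
    data = supplied.get(datum.digest)
    if data is None:
        raise KeyError("spending transaction did not supply the datum")
    if hash_datum(data) != datum.digest:
        raise ValueError("supplied datum does not match the hash on the output")
    return data
```

With an inline datum, `resolve_datum(InlineDatum(d), {})` succeeds with an empty map, whereas the hash case fails unless the spender provides a matching datum.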

### Script context

Scripts are passed information about transactions via the script context.
The script context therefore needs to be augmented to contain information about inline datums.

Changing the script context will require a new Plutus language version in the ledger to support the new interface.

There are two changes in the new version of the interface:
- The datum field on transaction outputs can either be a hash or the actual datum.
- The datum field on transaction inputs can either be a hash or the actual datum.

Old versions of the language will retain the old interface, but the construction of the script context in the old interface is also changed as follows: if either an input or an output has an inline datum, we replace it by its hash, and put the datum in the existing datum-hash-to-datum mapping.
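
The translation for the old interface might look roughly like the sketch below. All names (`TxOut`, `legacy_view`, `hash_datum`) are hypothetical, datums are modelled as serialised bytes, and the 32-byte BLAKE2b hash is an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from hashlib import blake2b
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class DatumHash:
    digest: bytes


@dataclass(frozen=True)
class InlineDatum:
    data: bytes


@dataclass(frozen=True)
class TxOut:
    address: str
    lovelace: int
    datum: Optional[Union[DatumHash, InlineDatum]] = None


def hash_datum(data: bytes) -> bytes:
    """Hypothetical helper: 32-byte BLAKE2b over the serialised datum (an assumption)."""
    return blake2b(data, digest_size=32).digest()


def legacy_view(
    outs: List[TxOut], datum_map: Dict[bytes, bytes]
) -> Tuple[List[TxOut], Dict[bytes, bytes]]:
    """Build the old-interface view of a list of inputs or outputs.

    Every inline datum is replaced by its hash, and the datum itself is added
    to a copy of the existing datum-hash-to-datum mapping.
    """
    new_map = dict(datum_map)  # do not mutate the caller's mapping
    view = []
    for out in outs:
        if isinstance(out.datum, InlineDatum):
            digest = hash_datum(out.datum.data)
            new_map[digest] = out.datum.data
            out = replace(out, datum=DatumHash(digest))
        view.append(out)
    return view, new_map
```

A script written against the old interface then sees only hashes, and can still look up any datum through the mapping.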

### CDDL

The CDDL for transaction outputs will change as follows to reflect the new alternative.
```
transaction_output =
[ address
, amount : value
, ? datum : $hash32 / plutus_data
]
```
TODO: should there be a dedicated production for datum-hash-or-datum? Does it need to be tagged?
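
One way to see why tagging might matter: assuming `plutus_data` admits byte strings (as it does in the Alonzo CDDL), a 32-byte byte string satisfies both alternatives of the untagged rule. The sketch below works on already-decoded values; the tagged layout `[0, hash]` / `[1, datum]` and both function names are hypothetical, chosen only for illustration.

```python
from typing import Any, Tuple


def classify_untagged(value: Any) -> str:
    """Classify a decoded `datum` field under the untagged `$hash32 / plutus_data` rule."""
    if isinstance(value, bytes) and len(value) == 32:
        # Fits $hash32, but is also a valid byte-string datum.
        return "ambiguous"
    return "datum"


def classify_tagged(value: Tuple[int, Any]) -> str:
    """Classify a hypothetical tagged encoding: (0, hash32) or (1, plutus_data)."""
    tag, payload = value
    if tag == 0:
        if not (isinstance(payload, bytes) and len(payload) == 32):
            raise ValueError("tag 0 requires a 32-byte hash")
        return "hash"
    if tag == 1:
        return "datum"
    raise ValueError(f"unknown tag {tag}")
```

Under the tagged form, the same 32 bytes are classified unambiguously by the tag rather than by their shape.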

Review comment (Contributor):

A dedicated production is mainly useful because the Rust codebase and a few others are generated automatically from the CDDL definition. If you create a datum-hash-or-datum definition, every implementation gets a common struct that represents it.


## Rationale

The key idea of this proposal is simply to restore the conceptually straightforward situation where datums are attached to outputs.
Historically, this was the way that the EUTXO model was designed, and switching to datum hashes on outputs was done to avoid bloating UTXO entries, which at that time (pre-multiasset) were constant-size (see [1] page 7).

Now that we have variable-sized UTXO entries and the accounting to support them, we can restore inline datums.

Since inline datums change very little about the model apart from where data is stored, we don't need to worry about violating any of the other requirements of the ledger, but we do need to worry about the effect on the size of the UTXO set.

### UTXO set size

This proposal gives users a way to put much larger amounts of data into the UTXO set.
Won’t this lead to much worse UTXO set bloat?

The answer is that we already have a mechanism to discourage this, namely the minimum UTXO value.
If inline datums turn out to drive significantly increased space usage, then we may need to increase `coinsPerUTxOWord` in order to keep the UTXO size down.
That will be costly and inconvenient for users, but will still allow them to use inline datums where they are most useful and the cost is bearable.
Furthermore, we hope that we will in fact be able to _reduce_ `coinsPerUTxOWord` when the upcoming work on moving the UTXO mostly to on-disk storage is complete.

Review comment (Contributor):

Agreed. In addition, we should probably have a follow-up CIP that changes the tx fee function so that inline datums contribute less towards the tx fee than other fields (unlike other fields, they are just data that requires no extra computation to inline into the tx).

Review comment (Contributor):

They do contribute to block size though, which is also a very limited resource. Maybe the correct way of dealing with this is to split the tx fee into a memory component and a time component, with two constants that can be used to adjust them independently.

Review comment (Contributor Author, @michaelpj, Dec 7, 2021):

There are two things here:

- UTXO size deposits. This is mostly driven by memory constraints, so there is no reason to treat inline datums differently; we just need a single parameter based on size.
- Transaction size fee. This is really about processing time, and there is arguably a reason to pay more or less for different parts of the transaction. But that would be a fairly major change.


Another guard rail would be to enforce upper limits on the size of inline datums.
At the extreme, we could bound them to the size of a hash, which would guarantee no more space usage than today.
However, this is much worse for users, since it introduces a sharp discontinuity: an inline datum is entirely acceptable until it crosses the size threshold, at which point it is unacceptable, and there is no way to avoid this.
Generally we prefer to avoid such discontinuities in favour of gradually increasing costs.
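
The contrast between the two guard rails can be illustrated with a toy model. This is not the ledger's min UTXO formula: the 8-byte word size, the `base_words` overhead and the function names are assumptions, and the numbers in the usage note are made up.

```python
WORD_BYTES = 8  # assumed word size for this illustration


def datum_words(datum_bytes: int) -> int:
    """Number of words needed to hold the datum, rounded up."""
    return -(-datum_bytes // WORD_BYTES)


def min_utxo_linear(datum_bytes: int, coins_per_utxo_word: int, base_words: int) -> int:
    """Gradual cost: the required deposit grows with the size of the inline datum."""
    return coins_per_utxo_word * (base_words + datum_words(datum_bytes))


def accepted_under_cap(datum_bytes: int, cap_bytes: int = 32) -> bool:
    """Hard limit: a datum is acceptable up to the cap and rejected beyond it."""
    return datum_bytes <= cap_bytes
```

With hypothetical values `coins_per_utxo_word=1000` and `base_words=30`, growing a datum from 32 to 33 bytes raises the deposit from 34000 to 35000 under the linear model, while under a 32-byte cap the same change turns an acceptable output into a rejected one.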

In practice, what is implemented here may depend on whether the UTXO-on-disk work is completed by the time that this proposal is implemented.

### Other modes of specifying datums

We could deprecate the other methods of specifying datums (datum hashes or datum hashes+optional datums).
However, the other approaches also have some advantages.

- Transmission costs: creator pays versus consumer pays
  - Inline datums: creator pays
  - Datum hashes: consumer pays
  - Datum hashes+optional datums: both pay
- Min UTXO value costs
  - Inline datums: depends on data size
  - Datum hashes: fixed cost
  - Datum hashes+optional datums: fixed cost
- Privacy
  - Inline datums: datum is immediately public
  - Datum hashes: datum is not public until consumed
  - Datum hashes+optional datums: datum is immediately public, but only to chain-followers
- Communication of datums
  - Inline datums: easy on-chain communication
  - Datum hashes: off-chain communication necessary
  - Datum hashes+optional datums: complicated on-chain communication

Any one of these factors could be important to particular use cases, so it is good to retain the other options.

### Changing the script context

We don't strictly need to change the script context.
We could use the fall-back approach that we use for the old interface, namely continuing to pretend that datums are always hashes, and providing the mapping from hashes to datums.
This would be functionally equivalent.

The only advantage of changing it is that it's more faithful to the real representation of transactions, and therefore allows scripts to insist on inline datums if they want.

Review comment (Contributor):

For data-availability use cases, being able to enforce the use of inline datums is crucial, so I would definitely provide some way to enforce it (whether through its presence in the script context or some other way).

Review comment (Contributor):

This only makes a difference off-chain. IIRC validators can already require the datums for new outputs to be present in a transaction, so the only real difference between inline or not is whether to look it up in the transaction history or the UTXO. Making the inlining status visible to validators would simplify this lookup, at the cost of adding complexity to the on-chain code for everyone.

Review comment (Contributor Author):

> is whether to look it up in the transaction history or the UTXO

Which is a big deal! So I think Sebastian's point is relevant: people will want to be able to enforce this.

Review comment:

@WhatisRT Can you explain how to find datums trustlessly off-chain? I can't seem to find any answers online; this issue comes closest.

Review comment (Contributor):

Since the hash is published on-chain, there's no requirement for trustlessness here.

It's unclear whether this is desirable.
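
As an illustration of what "insisting on inline datums" could mean under the new interface, the sketch below checks that every output sent to a given script address carries its datum inline. It models the script context in Python with hypothetical names (`TxOut`, `outputs_to_script_are_inline`); a real validator would be written against the Plutus script context instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class DatumHash:
    digest: bytes


@dataclass(frozen=True)
class InlineDatum:
    data: bytes


@dataclass(frozen=True)
class TxOut:
    address: str
    datum: Optional[Union[DatumHash, InlineDatum]] = None


def outputs_to_script_are_inline(outputs: List[TxOut], script_address: str) -> bool:
    """True if every output paid to `script_address` has an inline datum."""
    own = [out for out in outputs if out.address == script_address]
    return all(isinstance(out.datum, InlineDatum) for out in own)
```

Under the fall-back approach every datum is presented as a hash, so a check of this kind cannot be expressed there.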

## References

[1]: Chakravarty, Manuel MT, et al. "The extended UTXO model."