Add doc/ledger-consensus-api.md #668
Open: erikd wants to merge 1 commit into master from erikd/ledger-consensus-api
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
# An Ideal API for Consensus and the Ledger | ||
|
||
This is written from the point of view of a consumer (i.e. `cardano-db-sync`) of data from the | ||
consensus and ledger layers. It describes the problems I, as the main dev on `db-sync`, see | ||
in what we have now and then gives top level details of what I think consensus and ledger | ||
should provide to a consumer like `db-sync`. | ||
|
||
### Problems with What we Have Now | ||
|
||
Consensus (along with the networking code) and ledger-specs are developed in two separate | ||
Git repositories and neither has a well thought out or evolved API. Instead they simply expose | ||
their internals. The different eras (Shelley, Allegra, Mary etc) all have their own data | ||
types which are unified using type families. Unfortunately, with type families, changes in the | ||
library can cause particularly obtuse error messages in client code. However useful type | ||
families might be for developing the code for the different eras, having to deal with the | ||
different types, even when unified using type families, only makes things more difficult for | ||
clients like `db-sync`. | ||
|
||
In order to populate its database, `db-sync` currently requires: | ||
|
||
* A `LocalChainSync` client to get blocks from the blockchain. | ||
* A ledger state to get information that is not recorded on chain (rewards, stake distribution | ||
etc). | ||
* A ledger state query to get block information that is not on the blockchain (block time stamp, | ||
epoch number, slot within epoch etc). | ||
* Data obtained as an aggregation query on data already in the database (sum of tx inputs, | ||
deposit value in a transaction etc). | ||
|
||
Database aggregation queries are not a problem for things like populating the epoch table, but | ||
they are a significant performance hit when calculating things like the deposit amount for | ||
every transaction. | ||
|
||
It should also be noted that because `db-sync` has to insert data into PostgreSQL, it will likely | ||
be the first Cardano component to hit issues where its performance cannot keep up with a new | ||
block arriving every 20 seconds. Anything that can reduce the amount of computation `db-sync` | ||
needs to do improves its performance. | ||
|
||
To summarise, the problems with the current approach: | ||
|
||
* There are four different mechanisms to get all the information needed by the database. | ||
* `db-sync` needs to maintain a copy of ledger state that is identical to the copy of the | ||
ledger state in the `node`. With `db-sync`, that means applying a block to an existing ledger | ||
state is done twice; once in the `node` and once in `db-sync`. The amount of code required to | ||
maintain ledger state is significant and basically duplicates code in `consensus`. | ||
* Some data that goes into the database (eg the `deposit` field of the `tx` table) must be | ||
calculated using a database query. This needs to be done for every transaction in every | ||
block and is not a cheap operation. | ||
|
||
|
||
### What Data is Needed? | ||
|
||
The data needed by `db-sync` but not actually part of the blockchain (not necessarily an | ||
exhaustive list): | ||
|
||
* UTC time stamp for each block (currently calculated with a local state query). | ||
* Epoch number and slot number within an epoch (currently calculated with a local state query). | ||
* The rewards for each epoch (extracted from the ledger state). | ||
* The stake distribution for each epoch (extracted from the ledger state). | ||
* Sum of the transaction input amounts (this is needed to calculate the deposit which currently | ||
requires a relatively complex database query). | ||
* Refunds of pool deposits after a pool is de-registered (currently not available). | ||
* Rewards (tagged as either good or orphaned) so that `db-sync` does not have to emulate what | ||
`ledger-specs` does (currently not available). | ||
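
To make the first two items concrete, here is a minimal sketch of how a slot number maps to an epoch number, a slot within the epoch and a UTC time stamp. It assumes a single fixed slot length and epoch size for the whole chain, which is a simplification (the real calculation has to account for parameters changing between eras). `ChainParams`, `SlotDetails` and `slotDetails` are hypothetical names used only for this illustration.

```haskell
import Data.Time (UTCTime (..), addUTCTime, fromGregorian)
import Data.Word (Word64)

-- Hypothetical, simplified chain parameters. Assumption: slot length and
-- epoch size never change, which does not hold across real era boundaries.
data ChainParams = ChainParams
  { cpSystemStart :: UTCTime
  , cpSlotLengthSecs :: Word64
  , cpEpochSize :: Word64
  }

data SlotDetails = SlotDetails
  { sdEpochNo :: Word64 -- Epoch the slot falls in
  , sdEpochSlot :: Word64 -- Slot within the epoch, starting at 0
  , sdTime :: UTCTime -- Slot number converted to UTCTime
  } deriving (Eq, Show)

-- Derive the block annotations from an absolute slot number.
slotDetails :: ChainParams -> Word64 -> SlotDetails
slotDetails cp slot =
  SlotDetails
    { sdEpochNo = slot `div` cpEpochSize cp
    , sdEpochSlot = slot `mod` cpEpochSize cp
    , sdTime = addUTCTime (fromIntegral (slot * cpSlotLengthSecs cp)) (cpSystemStart cp)
    }
```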
|
||
### An Ideal API for a `db-sync`-like Consumer | ||
|
||
The ideal API for a `db-sync`-like consumer would not require the consumer to maintain its | ||
own ledger state. Instead, the API would provide two things: | ||
|
||
* Enhanced or annotated blocks (these are not blocks as they would appear on the blockchain), | ||
with additional information like a UTC time stamp, current era, epoch number, slot within | ||
an epoch etc. These annotated blocks are era independent. | ||
* The transactions inside an annotated block would not be the blockchain version of | ||
transactions, but annotated transactions with things like the deposit value for each | ||
transaction. Like the annotated blocks, annotated transactions would also be era independent. | ||
* Ledger events notifying of things that are difficult to obtain just by looking at the current | ||
block. This would include things like: | ||
|
||
```
data LedgerEvent
  = LedgerNewEra !Era
  | LedgerNewEpoch !EpochNo
  | LedgerRewards !EpochNo !Rewards
  | LedgerStakeDist !EpochNo !StakeDist
  | LedgerBlock !AnnotatedBlock
```
|
||
Rewards and stake distribution could be calculated incrementally by the ledger and the partial | ||
results could be passed to `db-sync` as they become available. | ||
|
||
With regard to rewards, there are currently two sources: rewards from minting blocks and | ||
instantaneous rewards. Currently these are two separate data structures, but both are only | ||
distributed at epoch boundaries. In this new API, it would make sense to merge these two, with | ||
each separate reward being annotated by its source (eg pool owner, pool member, instantaneous | ||
treasury and instantaneous reserve payment). | ||
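
One possible shape for such a merged reward type is sketched below. All names (`RewardSource`, `Reward`, `mergeRewards`, `PoolRole`, `Pot`) are hypothetical, and addresses and amounts are represented as plain `String` and `Integer` purely to keep the example self-contained.

```haskell
-- Where a reward came from; one tag per source named in the text above.
data RewardSource
  = PoolOwner
  | PoolMember
  | InstantaneousTreasury
  | InstantaneousReserves
  deriving (Eq, Show)

data Reward = Reward
  { rwAddress :: String -- Stand-in for a stake address
  , rwSource :: RewardSource
  , rwAmount :: Integer -- Stand-in for Coin
  } deriving (Eq, Show)

-- Hypothetical shapes of the two separate structures that exist today.
data PoolRole = Owner | Member
data Pot = Treasury | Reserves

-- Merge both structures into a single list of tagged rewards.
mergeRewards :: [(String, PoolRole, Integer)] -> [(String, Pot, Integer)] -> [Reward]
mergeRewards pool instant =
  [Reward addr (fromRole role) amt | (addr, role, amt) <- pool]
    ++ [Reward addr (fromPot pot) amt | (addr, pot, amt) <- instant]
  where
    fromRole Owner = PoolOwner
    fromRole Member = PoolMember
    fromPot Treasury = InstantaneousTreasury
    fromPot Reserves = InstantaneousReserves

-- Total reward for one address across all sources.
rewardTotal :: String -> [Reward] -> Integer
rewardTotal addr rs = sum [rwAmount r | r <- rs, rwAddress r == addr]
```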
|
||
There would then be something like the `LocalChainSync` protocol that passes `LedgerEvent` | ||
objects over the connection rather than the current blockchain version of the blocks. | ||
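
From the consumer side, handling such a stream could look roughly like the fold below. This is only a sketch: the stand-in types (`Era`, `EpochNo`, `Rewards`, `StakeDist`, `AnnotatedBlock`) are placeholders for whatever the ledger would really provide, and `SyncState`, `handleEvent` and `runSync` are hypothetical names. A real consumer would write to the database instead of accumulating counters.

```haskell
import Data.List (foldl')
import Data.Word (Word64)

-- Placeholder stand-ins for the real types; all of them are assumptions.
data Era = Byron | Shelley | Allegra | Mary | Alonzo deriving (Eq, Show)
newtype EpochNo = EpochNo Word64 deriving (Eq, Show)
newtype Rewards = Rewards [(String, Integer)] deriving (Eq, Show)
newtype StakeDist = StakeDist [(String, Integer)] deriving (Eq, Show)
newtype AnnotatedBlock = AnnotatedBlock { abTxCount :: Int } deriving (Eq, Show)

data LedgerEvent
  = LedgerNewEra !Era
  | LedgerNewEpoch !EpochNo
  | LedgerRewards !EpochNo !Rewards
  | LedgerStakeDist !EpochNo !StakeDist
  | LedgerBlock !AnnotatedBlock

-- What this toy consumer tracks instead of a full ledger state.
data SyncState = SyncState
  { ssEra :: Maybe Era
  , ssEpoch :: Maybe EpochNo
  , ssBlocks :: Int
  , ssTxs :: Int
  , ssRewardTotal :: Integer
  } deriving (Eq, Show)

-- One event in, one updated state out; no ledger rules are re-run here.
handleEvent :: SyncState -> LedgerEvent -> SyncState
handleEvent st ev = case ev of
  LedgerNewEra era -> st { ssEra = Just era }
  LedgerNewEpoch epoch -> st { ssEpoch = Just epoch }
  LedgerRewards _ (Rewards rs) -> st { ssRewardTotal = ssRewardTotal st + sum (map snd rs) }
  LedgerStakeDist _ _ -> st
  LedgerBlock blk -> st { ssBlocks = ssBlocks st + 1, ssTxs = ssTxs st + abTxCount blk }

runSync :: [LedgerEvent] -> SyncState
runSync = foldl' handleEvent (SyncState Nothing Nothing 0 0 0)
```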
|
||
It is important to note that the `cardano-wallet` *also* maintains its own copy of ledger state and | ||
*also* uses the local state query mechanism. Providing this new API would mean that both `db-sync` | ||
and `cardano-wallet` can remove large chunks of relatively complex code (and probably tests) to | ||
replace it with the new `LedgerEvent` data from the node. | ||
|
||
Similarly, both the `cardano-wallet` and `db-sync` already have an era independent block type | ||
(for `db-sync` this is Shelley/Allegra/Mary/Alonzo only) with conversion functions to convert | ||
from the era specific block/tx types to the generic block/tx types. Having `ledger-specs` | ||
provide this generic block type would reduce the amount of code `cardano-wallet` and `db-sync` | ||
need to maintain. | ||
|
||
### Potential Problems Implementing the New API | ||
|
||
On startup of `db-sync` it needs to negotiate with the `node` on the most recent valid block. | ||
In the worst case, `db-sync` will need to sync from genesis or sync from some point for which | ||
the `node` does not have the required ledger state. This means that `node` will need to | ||
reconstruct the required ledger state. | ||
|
||
|
||
### Annotated Data Structures | ||
|
||
The idea here is that a single block type captures the super set of Byron, Shelley, Allegra, | ||
Mary, Alonzo and any future era. Fields that are not valid for all eras can be encoded as | ||
`Maybe`s or lists, so that fields that are not valid would be `Nothing` or `[]`. | ||
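
As an illustration of the `Maybe`/list encoding, the sketch below converts two made-up era specific transaction types into one generic type. The type and field names are hypothetical and much smaller than the real ledger types. The only era facts assumed are that a Shelley transaction has a time-to-live slot, and that a Mary transaction has an optional validity interval and can mint tokens.

```haskell
import Data.Word (Word64)

-- Toy era specific types (assumptions, not the real ledger types).
data ShelleyTx = ShelleyTx
  { stFee :: Integer
  , stTtl :: Word64
  }

data MaryTx = MaryTx
  { mtFee :: Integer
  , mtInvalidBefore :: Maybe Word64
  , mtInvalidHereafter :: Maybe Word64
  , mtMint :: [(String, Integer)]
  }

-- Superset type: fields missing in an era become Nothing or [].
data GenericTx = GenericTx
  { gtFee :: Integer
  , gtInvalidBefore :: Maybe Word64
  , gtInvalidHereafter :: Maybe Word64
  , gtMint :: [(String, Integer)]
  } deriving (Eq, Show)

-- Shelley has no lower validity bound and no minting.
fromShelleyTx :: ShelleyTx -> GenericTx
fromShelleyTx tx = GenericTx (stFee tx) Nothing (Just (stTtl tx)) []

fromMaryTx :: MaryTx -> GenericTx
fromMaryTx tx =
  GenericTx (mtFee tx) (mtInvalidBefore tx) (mtInvalidHereafter tx) (mtMint tx)
```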
|
||
There are two main data structures required, `AnnotatedBlock` and `AnnotatedTx` which can be | ||
sketched out as follows: | ||
|
||
```
data AnnotatedBlock = AnnotatedBlock
  { abEra :: !Era
  , abEpochNo :: !EpochNo
  , abSlotNo :: !SlotNo
  , abEpochSlot :: !Word -- The slot within the epoch (starts at 0 for first slot of each epoch)
  , abTimeStamp :: !UTCTime -- The slot number converted to UTCTime
  , abEpochSize :: !EpochSize -- Number of slots in current epoch
  -- All fields in the superset of all block types
  , abTxs :: [AnnotatedTx]
  }

data AnnotatedTx = AnnotatedTx
  { atInputSum :: !Coin -- Sum of the tx inputs
  , atOutSum :: !Coin -- Sum of the tx outputs
  , atFees :: !Coin -- The tx fees
  -- All fields in the superset of all tx types
  }
```
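
With the three `Coin` fields above, a consumer could derive the deposit without a database query. The sketch below uses a cut-down local copy of `AnnotatedTx` with only those fields (the fee field is called `atFees` here) and assumes the deposit is simply inputs minus outputs minus fees. That formula is a simplification: it ignores other value flows such as reward withdrawals, and a negative result would indicate a refund.

```haskell
newtype Coin = Coin { unCoin :: Integer } deriving (Eq, Show)

-- Cut-down copy of the annotated transaction, for illustration only.
data AnnotatedTx = AnnotatedTx
  { atInputSum :: !Coin -- Sum of the tx inputs
  , atOutSum :: !Coin -- Sum of the tx outputs
  , atFees :: !Coin -- The tx fees
  }

-- Assumption: deposit = inputs - outputs - fees (negative means a refund).
txDeposit :: AnnotatedTx -> Integer
txDeposit tx = unCoin (atInputSum tx) - unCoin (atOutSum tx) - unCoin (atFees tx)
```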
Computing this at Node may be impossible. The reason is that this requires the current UTxO before a Tx is applied, which won't be available to the Node when we request a block. This can only be available after db-sync has received the block and applied it, as a ledger event.
Let me explain with an example:
Node tip is at Slot 10000 and db-sync requests block 100. The ledger state at slot 10000 can be used to produce the time metadata for block 100, because the time horizon for time metadata is pretty wide. However it has no way to resolve the inputs for block 100: the UTxO at slot 10000 is useless for this. The UTxO at slot 100 is also not enough, because it is already stale when a first transaction is applied. This can only be done in the general case as ledger events.
My idea was that the ledger would calculate this. It gets the next block from the LocalChainSync protocol, applies it, and then returns a new ledger state, with an annotated block which includes annotations with the data specified in that document.
Unless the inputs have already been spent, they should still be part of the UTxO set before the block is applied. If they have already been spent at that time, the transaction is not valid anyway.
The TxOut may not exist yet, if it is created by a previous Tx on the same block. So the UTxO state before the block application is not enough in the general case. The only place where the correct UTxO is available is somewhere inside the `applyBlock`, at the point where a Tx is applied. The only way to make this available is with ledger events, which is, in my mind at least, metadata of a block application.
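The same-block dependency can be shown with a toy UTxO model. In the sketch below, every name (`TxIn`, `UTxO`, `Tx`, `applyTx`, `inputSums`) is hypothetical and the ledger rules are reduced to spending and creating outputs. The input sum of each transaction is resolved against the UTxO as it stands at the moment that transaction is applied, so the second transaction can spend an output created by the first.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

type TxIn = (String, Int) -- (tx id, output index)
type UTxO = Map.Map TxIn Integer -- unspent outputs and their values

data Tx = Tx
  { txId :: String
  , txInputs :: [TxIn]
  , txOutputs :: [Integer]
  }

-- Resolve the inputs against the UTxO at this point, then update the UTxO.
-- The input sum is Nothing if any input cannot be resolved.
applyTx :: UTxO -> Tx -> (UTxO, Maybe Integer)
applyTx utxo tx = (Map.union created remaining, inputSum)
  where
    inputSum = sum <$> traverse (`Map.lookup` utxo) (txInputs tx)
    remaining = foldr Map.delete utxo (txInputs tx)
    created = Map.fromList [((txId tx, ix), v) | (ix, v) <- zip [0 ..] (txOutputs tx)]

-- Thread the UTxO through all transactions of a block, in order.
inputSums :: UTxO -> [Tx] -> (UTxO, [Maybe Integer])
inputSums = mapAccumL applyTx
```

Resolving every transaction against the UTxO from before the block would fail for the second transaction in this scenario, which is the point made in the comment above.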
My idea is all predicated on there being a function like `applyBlock` but with a type signature where one of the constructors of `LedgerEvent` carries an `AnnotatedBlock` containing `AnnotatedTx`s.
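As a rough illustration of such a function (the signature proposed in this thread is not reproduced here, so everything below is an assumption), a toy version could return the new state together with the events produced while applying the block. `applyBlockWithEvents`, the cut-down types and the fixed `epochSize` are all hypothetical.

```haskell
import Data.Word (Word64)

newtype EpochNo = EpochNo Word64 deriving (Eq, Show)
newtype Block = Block { blockSlot :: Word64 }

data AnnotatedBlock = AnnotatedBlock
  { abEpochNo :: !EpochNo
  , abSlotNo :: !Word64
  } deriving (Eq, Show)

data LedgerEvent
  = LedgerNewEpoch !EpochNo
  | LedgerBlock !AnnotatedBlock
  deriving (Eq, Show)

data LedgerState = LedgerState
  { lsEpoch :: EpochNo
  , lsTipSlot :: Maybe Word64
  } deriving (Eq, Show)

-- Assumption for this toy model only: a fixed number of slots per epoch.
epochSize :: Word64
epochSize = 100

-- Apply a block and report what happened as events, instead of leaving
-- the consumer to diff two ledger states.
applyBlockWithEvents :: LedgerState -> Block -> Either String (LedgerState, [LedgerEvent])
applyBlockWithEvents st blk
  | maybe False (>= slot) (lsTipSlot st) = Left "block slot does not advance the tip"
  | otherwise = Right (st', epochEvents ++ [LedgerBlock (AnnotatedBlock epoch slot)])
  where
    slot = blockSlot blk
    epoch = EpochNo (slot `div` epochSize)
    epochEvents = [LedgerNewEpoch epoch | epoch /= lsEpoch st]
    st' = LedgerState { lsEpoch = epoch, lsTipSlot = Just slot }
```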
Yes this is exactly what we need eventually. 50% of db-sync code could be removed by a function like this. Unfortunately `applyBlock` is a black box for `db-sync` which only returns the `LedgerState` currently.