[RFC] Upgradeable contract components #726

Merged
merged 8 commits into from
May 31, 2019
360 changes: 360 additions & 0 deletions docs/rfc/rfc-9-upgradeable-contract-components.adoc
@@ -0,0 +1,360 @@
:toc: macro

= RFC 9: Upgrading contracts by separate components

:icons: font
:numbered:
toc::[]

== Background

Stakers in the Keep network use work contracts to perform work for customers.
Work contracts make the off-chain work of stakers
available to on-chain applications
using a system of rewards and punishments to enforce correct behavior.

Because work contracts must be able to penalize misbehaving stakers
by slashing their staked tokens,
it is important that they can be upgraded securely.
For an upgrade to be considered secure,
two things must be achieved:

1. stakers must authorize any contracts that could slash their stakes
2. once authorized, contract code must be immutable

Condition 1 is important to protect stakers
against errors and adversaries.
Upon staking their tokens,
each staker appoints an authorizer
who must approve of all contracts that could punish the staker.
This ensures that stakes cannot be stolen
even if the work contract upgrade scheme is thoroughly compromised,
as long as the individual authorizers reject bad work contracts.

Condition 2 is a practical requirement to make condition 1 meaningful.
If an authorized contract's code can be changed later,
the authorizations provide no protection.
This means that contracts can only be upgraded by deploying a new version,
which the stakers then authorize individually.
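
The two conditions can be illustrated with a minimal Python sketch.
It is a toy in-memory model, not the Keep staking contract:
the `StakingRegistry` class, its method names,
and the use of plain strings as addresses
are assumptions made for illustration.
Authorizations are keyed by contract address,
which is only meaningful if the code at an authorized address is immutable.

```python
class StakingRegistry:
    """Toy in-memory model of stake authorization (hypothetical, not Keep code)."""

    def __init__(self):
        self.stakes = {}         # staker -> staked token amount
        self.authorizers = {}    # staker -> authorizer appointed at staking time
        self.authorized = set()  # (staker, contract) pairs approved so far

    def stake(self, staker, amount, authorizer):
        self.stakes[staker] = amount
        self.authorizers[staker] = authorizer

    def authorize(self, caller, staker, contract):
        # Only the staker's appointed authorizer may approve a contract.
        if self.authorizers.get(staker) != caller:
            raise PermissionError("caller is not the staker's authorizer")
        self.authorized.add((staker, contract))

    def slash(self, contract, staker, amount):
        # A contract that was never approved cannot touch the stake.
        if (staker, contract) not in self.authorized:
            raise PermissionError("contract not authorized for this staker")
        penalty = min(amount, self.stakes[staker])
        self.stakes[staker] -= penalty
        return penalty
```

In this model a newly deployed work contract cannot slash anyone
until each staker's authorizer has approved its address,
regardless of how the upgrade scheme itself behaves.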

However, deploying a new contract on every upgrade
makes migrating between implementations difficult.
Common ways of upgrading contracts,
such as the "eternal storage" pattern
where the contract holds data in its nominal address
while delegating logic to a different, upgradeable contract address,
are incompatible with individual upgrade authorization.

=== Current Functionality

The current design for upgrading work contracts
is to deploy a new version on every upgrade,
and wait for stakers to start operating on it.
Older versions would by necessity keep existing
in parallel with newer ones,
and migrating state between versions
is difficult if not impossible.
For example, different versions of threshold BLS contracts,
even if the changes are as minor as adjusting hardcoded constants,
would create entirely separate random beacons.
It is possible that the difficulty of migration
would lead customers to prefer the old version with established state.

== Proposal

Customer continuity can be reconciled with staker sovereignty
by dividing work contracts into
_operator contracts_ which handle incentive-critical operations with stakers;
and _service contracts_ which use _operator contracts_
to provide services to customers.
A single _service contract_
can abstract over multiple different _operator contracts_,
permitting secure upgrades with reduced or minimal disruption to customers.

=== Goal

This RFC seeks to provide a method for upgrading contracts securely,
maintaining individual staker authorization for all contracts
so any contract touching staked tokens
must be pre-approved by the respective staker or their appointed agent.
The upgrade process should permit gradual improvements
without disrupting users or stakers,
and it should deal gracefully with major changes.

=== Implementation

For each service provided on the Keep network,
the components critical for correct incentives are identified
and separated from the rest of the service contract as an _operator contract_.
Operator contracts aim to be self-contained incentive-wise,
minimizing or eliminating the need for stakers to trust other contracts.

Customer applications use a _service contract_ to perform work.
The _service contract_ provides a unified interface to customers,
handling the version management of _operator contracts_
and any shared non-critical state between them.

==== Operator contract

The operator contracts handle all operations
that may have an impact on staked tokens.

Each operator contract is an independent "microservice",
keeping its own state on security-critical data.
The operator contracts provide simplified functionality
that is stripped to the minimum necessary
for security and correct incentives.

Each operator contract is associated with one or more service contracts
the operator contract provides functionality for.
Only these specified service contracts
may use the operator contract's interface.
When the interface is designed,
the service contract must be treated as untrusted
and the operator contract must maintain correctness
regardless of the service contract input.
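
A minimal sketch of this access rule, in Python rather than contract code.
The class name `GuardedOperatorContract`, the `request_work` method
and the specific checks are hypothetical;
the point is only that the caller is checked against a fixed set
and every argument is validated before any state changes.

```python
class GuardedOperatorContract:
    """Toy operator contract that treats its callers as untrusted (hypothetical)."""

    def __init__(self, service_contracts, min_payment):
        # The set of permitted service contracts is fixed at deployment.
        self.service_contracts = frozenset(service_contracts)
        self.min_payment = min_payment
        self.accepted = []  # requests that passed every check

    def request_work(self, caller, work_input, payment):
        # Reject callers that are not an associated service contract.
        if caller not in self.service_contracts:
            raise PermissionError("unknown service contract")
        # Validate every argument; never assume the caller is honest.
        if not isinstance(work_input, bytes) or len(work_input) == 0:
            raise ValueError("work input must be non-empty bytes")
        if not isinstance(payment, int) or payment < self.min_payment:
            raise ValueError("payment below minimum")
        self.accepted.append((caller, work_input, payment))
        return len(self.accepted) - 1  # identifier of the accepted request
```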

When an operator contract performs a service,
it is paid for the service provided.
The payment is distributed according to the operator contract's own rules.

==== Service contract

Service contracts use the basic functionality
performed by operator contracts
to provide useful services to customers.

Service contracts receive requests for services from customers
and divide each requested service into operator contract and service contract components.
Elements that are critical for security and incentives
are delegated to an operator contract,
while other parts of the work are performed in the service contract.

Service contracts can use
multiple different versions of operator contracts
to perform the operator contract functions.
The service contract keeps shared state which is not security-critical.

==== Operator contract upgrades

An operator contract is upgraded by deploying a new version,
and adding it to the available operator contracts of a service contract.
As stakers authorize the new operator contract,
the service contract can gradually migrate
to use the new operator contract over older versions.
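
This RFC does not fix a migration policy.
As one possible illustration,
the Python sketch below routes new work to each operator contract version
in proportion to the stake that has authorized it.
The function name `routing_shares` and the proportional rule are assumptions;
stake that has authorized several versions counts towards each of them.

```python
from fractions import Fraction


def routing_shares(authorized_stake):
    """Map each operator contract version to its share of new work.

    `authorized_stake` maps a version name to the total stake (an integer)
    whose authorizers have approved that version.
    """
    total = sum(authorized_stake.values())
    if total == 0:
        return {}  # nothing is authorized yet, so no work can be routed
    return {
        version: Fraction(stake, total)
        for version, stake in authorized_stake.items()
    }
```

Right after deployment the new version has no authorized stake and receives no work;
its share grows as authorizers approve it,
and the old version's share shrinks accordingly.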

Operator contracts can be upgraded
without losing service contract state,
but critical state is held within the operator contract
and cannot be migrated.

==== Service contract upgrades

Because operator contracts are designed
to be independent of service contract security and correctness,
there is much more leeway to upgrade service contracts.

Service contracts can be directly upgradeable,
e.g. with the eternal storage pattern;
they may be immutable,
only accepting new operator contract versions;
or they may implement a similar pattern
in the direction of customers,
unifying core functionality
between immutable public interface contracts.

Because operator contracts can serve multiple service contracts,
immutable service contracts can be upgraded
by deploying the new service contract version,
along with a new operator contract version
which can serve both the new and the old service contract.
The old service contract can migrate work onto the new operator contract,
remaining perfectly functional while the new service contract is spun up.

==== Staking contract upgrades

Staking contracts can be upgraded
by deploying a new version and waiting for stakers to migrate
by withdrawing their stakes on the old contract
and staking them again on the new one.
Migrating between staking contracts requires
waiting out the unstaking period
and bearing the associated opportunity cost,
but migrating a stake in partial amounts can mitigate the impact,
as overall network revenue is not expected to change.

Each operator contract needs to identify
which staking contracts it accepts.
When a new staking contract is deployed,
all operator contracts need to be upgraded
to a version recognizing the new staking contract,
either exclusively or in addition to the old one.
When a sufficient amount of time has elapsed
and stakers have had the opportunity to migrate,
support for the old staking contract can be dropped.
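
The transition can be sketched in Python as follows.
All names here (`StakingContract`, `OperatorVersion`, `migrate_stake`) are hypothetical,
and the unstaking period is deliberately left out of the model.
Each operator contract version lists the staking contracts it recognizes
and counts only the stake held in those.

```python
class StakingContract:
    """Toy staking contract holding per-staker balances (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.balances = {}  # staker -> staked amount


class OperatorVersion:
    """Toy operator contract version with a fixed list of staking contracts."""

    def __init__(self, name, accepted_staking_contracts):
        self.name = name
        self.accepted = tuple(accepted_staking_contracts)

    def eligible_stake(self, staker):
        # Only stake held in a recognized staking contract counts.
        return sum(sc.balances.get(staker, 0) for sc in self.accepted)


def migrate_stake(old, new, staker, amount):
    """Move part of a stake between staking contracts (unstaking delay ignored)."""
    if old.balances.get(staker, 0) < amount:
        raise ValueError("insufficient stake in the old contract")
    old.balances[staker] -= amount
    new.balances[staker] = new.balances.get(staker, 0) + amount
```

An operator version that accepts both staking contracts
sees the same total stake throughout a partial migration,
while a version that accepts only the new staking contract
sees only the portion already migrated.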

=== Limitations

Untrusted service contracts mean
that security-critical state must be kept in the operator contract.
If the network service has complex security needs,
the operator contract may have to implement most of the work logic.

Security-critical state cannot be migrated between operator contracts;
a new operator contract has to start from a blank slate.
Inherently long-running operations
present a limit to how rapidly the system can be upgraded.

The division of service contracts and operator contracts
doesn't map cleanly to situations
where the service to applications consists of providing keeps,
which are individual smart contracts deployed for individual groups of operators.

=== Example: Random Beacon

The random beacon generates random numbers in response to requests,
using BLS threshold signatures on some specific input.
The signatures are generated by signing groups
that have been created using random sortition
from all eligible and active stakers.
Rewards and punishments are used to incentivize correct behavior.

To split the random beacon into a service contract/operator contract design,
the security-critical elements need to be identified.

In this case the operator contract needs to handle
group creation and expiration,
BLS signature verification,
and incentives.

Handling entry requests and pricing;
determining the signing input for generating new entries;
calling callbacks;
and requesting the creation of new groups
are responsibilities that are not critical for beacon integrity
from the perspective of the stakers.
These can be performed by the service contract
without individual staker authorization of upgrades.

==== Operator contract

The operator contract for the random beacon
provides the following interface to the service contract:

`create_group(payment)`::
Create a new group when requested by the service contract,
selecting members using pseudorandom sortition,
and performing DKG.
The operator contract does not accept a sortition seed from the service contract,
but instead uses its own pseudorandom seed
to ensure that group composition cannot be manipulated.
`payment` must exceed a minimum amount
and is used to cover gas fees and to reward stakers.

`sign(entry_id, group_input, signing_input, payment)`::
Use `group_input` to select a signing group,
and generate a valid BLS threshold signature for `signing_input`.
Once generated, use `payment` to reward stakers.
`payment` must exceed a set minimum value
that covers necessary gas fees.
When the entry is created,
the operator contract calls the service contract with the new entry,
using `entry_id` to identify the entry.

Behind this interface,
the operator contract tracks its own groups, their members
and their threshold public keys.
The service contract trusts the operator contract
to only provide valid entries when given specific inputs.
Alternatively the operator contract could provide
the associated public key so the entry can be validated,
but even then the operator contract needs to be trusted
to provide a public key corresponding to a random valid group.
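
The shape of this interface can be sketched in Python.
This is a toy under loud assumptions:
a SHA-256 hash stands in for the BLS threshold signature,
DKG, group expiration, caller checks and reward distribution are omitted,
and `ToyBeaconOperator`, `RecordingService` and `MIN_PAYMENT` are hypothetical names.
Only the two calls and the two callbacks follow the interface described in this RFC.

```python
import hashlib


class RecordingService:
    """Stand-in service contract that records operator callbacks."""

    def __init__(self):
        self.n_groups = 0
        self.entries = {}

    def group_created(self, n_groups):
        self.n_groups = n_groups

    def entry_created(self, entry_id, entry):
        self.entries[entry_id] = entry


class ToyBeaconOperator:
    MIN_PAYMENT = 10  # hypothetical minimum; payments must exceed it

    def __init__(self, service, stakers, group_size, seed):
        self.service = service
        self.stakers = sorted(stakers)
        self.group_size = group_size  # must not exceed the number of stakers
        self.seed = seed    # internal bytes; never supplied by the service contract
        self.groups = []    # each group is a tuple of member names

    def _next_seed(self):
        self.seed = hashlib.sha256(self.seed).digest()
        return int.from_bytes(self.seed, "big")

    def create_group(self, payment):
        if payment <= self.MIN_PAYMENT:
            raise ValueError("payment does not exceed the minimum")
        pool = list(self.stakers)
        members = []
        for _ in range(self.group_size):
            # Sortition driven only by the operator contract's own seed.
            members.append(pool.pop(self._next_seed() % len(pool)))
        self.groups.append(tuple(members))
        self.service.group_created(len(self.groups))

    def sign(self, entry_id, group_input, signing_input, payment):
        if payment <= self.MIN_PAYMENT:
            raise ValueError("payment does not exceed the minimum")
        if not self.groups:
            raise RuntimeError("no active groups")
        group = self.groups[group_input % len(self.groups)]
        # Placeholder for a BLS threshold signature by the selected group.
        entry = hashlib.sha256(repr(group).encode() + signing_input).digest()
        self.service.entry_created(entry_id, entry)
```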

==== Service contract

The service contract for the random beacon
handles customer-facing features and ties the operator contracts together.
The interface of the service contract towards the operator contract is:

`group_created(n_groups)`::
The call to `create_group()` has finished
(successfully or unsuccessfully)
and expired groups have been removed.
The operator contract now has `n_groups` active.

`entry_created(entry_id, entry)`::
The previous call for the operator contract to `sign(entry_id, ...)`
completed successfully,
resulting in the new `entry`.

The service contract keeps a list of operator contracts
along with the number of active groups in each.

When receiving a request,
the service contract determines what values should be
the group selection input
and the signing input.
The group selection input is used to select an operator contract,
weighted by the number of active groups on each,
to serve the request.
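
One way to implement the weighted choice is to treat all active groups
as a single numbered range and find the operator contract
that owns the selected position.
The Python sketch below assumes the group selection input is a non-negative integer
and that the list of operator contracts has a fixed order;
the function name `select_operator_contract` is hypothetical.

```python
def select_operator_contract(group_counts, group_selection_input):
    """Pick an operator contract, weighted by its number of active groups.

    `group_counts` is an ordered list of (operator_contract, n_groups) pairs.
    """
    total = sum(n for _, n in group_counts)
    if total == 0:
        raise RuntimeError("no active groups on any operator contract")
    index = group_selection_input % total
    for contract, n in group_counts:
        if index < n:
            return contract
        index -= n  # skip past this contract's groups
```

With three groups on one contract and one group on another,
three out of every four consecutive input values select the first contract.
A contract with no active groups is never selected.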

When the operator contract is determined,
the group selection input and signing input are passed to it
along with an appropriate payment.
When the operator contract returns a valid entry with `entry_created(...)`,
the service contract stores it and calls the customer-specified callback.
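
The request and callback flow might look like the following Python sketch.
`ToyBeaconService` and `EchoOperator` are hypothetical,
and because Python has no notion of a message sender,
`entry_created` takes the calling operator contract as an explicit first argument,
which differs from the two-argument interface described above.

```python
class ToyBeaconService:
    """Toy service contract tracking pending requests (hypothetical)."""

    def __init__(self):
        self.next_entry_id = 0
        self.pending = {}  # entry_id -> (operator contract, customer callback)
        self.entries = {}  # entry_id -> stored entry

    def request_entry(self, operator, group_input, signing_input, payment, callback):
        entry_id = self.next_entry_id
        self.next_entry_id += 1
        # Record the request before calling out to the operator contract.
        self.pending[entry_id] = (operator, callback)
        operator.sign(entry_id, group_input, signing_input, payment)
        return entry_id

    def entry_created(self, caller, entry_id, entry):
        operator, callback = self.pending[entry_id]
        # Accept the entry only from the operator contract that was asked.
        if caller is not operator:
            raise PermissionError("entry from unexpected operator contract")
        del self.pending[entry_id]
        self.entries[entry_id] = entry
        callback(entry)


class EchoOperator:
    """Stub operator contract that answers immediately with a fake entry."""

    def __init__(self, service):
        self.service = service

    def sign(self, entry_id, group_input, signing_input, payment):
        self.service.entry_created(self, entry_id, b"entry:" + signing_input)
```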

If a new group should be created,
the service contract determines which operator contract should create one
(the most recent one, or a random one weighted by recency),
and calls `create_group()` on the selected operator contract
with an appropriate payment.
Once the operator contract has finished DKG and expired old groups,
it returns the new number of active groups using `group_created(n_groups)`.

Unlike the operator contract, which needs to maintain integrity
for arbitrary, malicious inputs,
the service contract relies heavily on trusting the operator contracts.
This is acceptable because the operator contracts are known, unchangeable code,
and the service contract only has access to what customers have paid for entries;
boycotting a compromised or malfunctioning service contract
and deploying a new one
is sufficient to mitigate attacks or errors.

== Future Work

An exact architecture for service contract upgrades
is not specified.
The service contract upgrade process should be resilient to minor compromise,
and relying on a global master key may be undesirable
because it is a single point of failure.

For greater assurance towards customers,
the service contracts could be made immutable
in a manner similar to the operator contracts.
When a customer uses a specific service contract to request an entry,
they could trust that only that service contract
and its associated operator contracts
will be involved in the generation of their entry.
However, this would reduce the ability to maintain
a global "canonical" chain of entries,
each linked to the previous ones.

Most currently envisioned services on the Keep network
consist of providing keeps to customer applications.
The operator contract/service contract split
is less than ideal for this use-case,
and the random beacon could be rearchitected to follow the same pattern.
Each keep is a separate contract which inherits its logic from its factory,
but applications communicate directly with the keeps.
The role of the service contract would be limited
to managing different versions of keep factories,
with the rest of its functionality being subsumed by the application.

[bibliography]
== Related Links

- https://github.com/keep-network/keep-core/issues/133[System upgrade handling]
- https://github.com/keep-network/keep-core/issues/725[Specify contract upgrade scheme]
- https://github.com/keep-network/keep-core/pull/446[RFC 4: Secure upgrades for contracts operating staked balances]