keep-network · pdyraga · May 31, 2019 · Mar 28, 2019 · Apr 8, 2019 · May 28, 2019
diff --git a/docs/rfc/rfc-9-upgradeable-contract-components.adoc b/docs/rfc/rfc-9-upgradeable-contract-components.adoc
@@ -0,0 +1,360 @@
+:toc: macro
+
+= RFC 9: Upgrading contracts by separate components
+
+:icons: font
+:numbered:
+toc::[]
+
+== Background
+
+Stakers in the Keep network use work contracts to perform work for customers.
+Work contracts make the off-chain work of stakers
+available to on-chain applications
+using a system of rewards and punishments to enforce correct behavior.
+
+Because work contracts must be able to penalize misbehaving stakers
+by slashing their staked tokens,
+it is important that they can be upgraded securely.
+For an upgrade to be considered secure,
+two things must be achieved:
+
+1. stakers must authorize any contracts that could slash their stakes
+2. once authorized, contract code must be immutable
+
+Condition 1 is important to protect stakers
+against errors and adversaries.
+Upon staking their tokens,
+each staker appoints an authorizer
+who must approve of all contracts that could punish the staker.
+This ensures that stakes cannot be stolen
+even if the work contract upgrade scheme is thoroughly compromised,
+as long as the individual authorizers reject bad work contracts.
+
+Condition 2 is a practical requirement to make condition 1 meaningful.
+If an authorized contract's code can be changed later,
+the authorizations provide no protection.
+This means that contracts can only be upgraded by deploying a new version,
+which the stakers then authorize individually.
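+
+The two conditions can be illustrated with a minimal sketch.
+It is a toy model in Python, not the actual staking contract:
+the class name `StakingContract`, its methods,
+and the string identifiers standing in for addresses
+are all hypothetical.
+
```python
class StakingContract:
    """Toy model: slashing is gated on per-staker authorization."""

    def __init__(self):
        self.stakes = {}         # staker -> staked amount
        self.authorizer_of = {}  # staker -> authorizer appointed when staking
        self.authorized = set()  # (authorizer, contract) pairs

    def stake(self, staker, amount, authorizer):
        self.stakes[staker] = self.stakes.get(staker, 0) + amount
        self.authorizer_of[staker] = authorizer

    def authorize(self, authorizer, contract):
        # Condition 1: the authorizer approves one specific contract.
        self.authorized.add((authorizer, contract))

    def slash(self, contract, staker, amount):
        # Condition 2 is assumed here: `contract` identifies immutable code,
        # so an approval cannot later apply to different logic.
        if (self.authorizer_of[staker], contract) not in self.authorized:
            raise PermissionError("contract not authorized for this staker")
        penalty = min(amount, self.stakes[staker])
        self.stakes[staker] -= penalty
        return penalty


staking = StakingContract()
staking.stake("alice", 100, authorizer="alice-authorizer")
staking.authorize("alice-authorizer", "operator-v1")
slashed = staking.slash("operator-v1", "alice", 30)  # permitted
# staking.slash("operator-v2", "alice", 30) would raise PermissionError
```
+
+In this sketch an unapproved upgrade (`operator-v2`) cannot touch the stake
+until the authorizer approves it explicitly.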
+
+However, deploying a new contract on every upgrade
+makes migrating between implementations difficult.
+Common ways of upgrading contracts,
+such as the "eternal storage" pattern
+where the contract holds data at its nominal address
+while delegating logic to a different, upgradeable contract address,
+are incompatible with individual upgrade authorization.
+
+=== Current Functionality
+
+The current design for upgrading work contracts
+is to deploy a new version on every upgrade,
+and wait for stakers to start operating on it.
+Older versions would by necessity keep existing
+in parallel with newer ones,
+and migrating state between versions
+is difficult if not impossible.
+For example, different versions of threshold BLS contracts,
+even if the changes are as minor as adjusting hardcoded constants,
+would create entirely separate random beacons.
+It is possible that the difficulty of migration
+would lead customers to prefer the old version with established state.
+
+== Proposal
+
+Customer continuity can be reconciled with staker sovereignty
+by dividing work contracts into
+_operator contracts_, which handle incentive-critical operations with stakers,
+and _service contracts_, which use operator contracts
+to provide services to customers.
+A single _service contract_
+can abstract over multiple different _operator contracts_,
+permitting secure upgrades with reduced or minimal disruption to customers.
+
+=== Goal
+
+This RFC seeks to provide a method for upgrading contracts securely,
+maintaining individual staker authorization for all contracts
+so any contract touching staked tokens
+must be pre-approved by the respective staker or their appointed agent.
+The upgrade process should permit gradual improvements
+without disrupting users or stakers,
+and it should deal gracefully with major changes.
+
+=== Implementation
+
+For each service provided on the Keep network,
+the components critical for correct incentives are identified
+and separated from the rest of the service contract as an _operator contract_.
+Operator contracts aim to be self-contained incentive-wise,
+minimizing or eliminating the need for stakers to trust other contracts.
+
+Customer applications use a _service contract_ to perform work.
+The _service contract_ provides a unified interface to customers,
+handling the version management of _operator contracts_
+and any shared non-critical state between them.
+
+==== Operator contract
+
+The operator contracts handle all operations
+that may have an impact on staked tokens.
+
+Each operator contract is an independent "microservice",
+keeping its own state on security-critical data.
+The operator contracts provide simplified functionality
+that is stripped to the minimum necessary
+for security and correct incentives.
+
+Each operator contract is associated with one or more service contracts
+that it provides functionality for.
+Only these specified service contracts
+may use the operator contract's interface.
+When the interface is designed,
+the service contract must be treated as untrusted
+and the operator contract must maintain correctness
+regardless of the service contract input.
+
+When an operator contract performs a service,
+it is paid for the service provided.
+The payment is distributed according to the operator contract's own rules.
+
+==== Service contract
+
+Service contracts use the basic functionality
+performed by operator contracts
+to provide useful services to customers.
+
+Service contracts receive requests for services from customers,
+and divide the requested work into operator contract and service contract components.
+Elements that are critical for security and incentives
+are delegated to an operator contract,
+while other parts of the work are performed in the service contract.
+
+Service contracts can use
+multiple different versions of operator contracts
+to perform the operator contract functions.
+The service contract keeps shared state which is not security-critical.
+
+==== Operator contract upgrades
+
+An operator contract is upgraded by deploying a new version,
+and adding it to the available operator contracts of a service contract.
+As stakers authorize the new operator contract,
+the service contract can gradually migrate
+to use the new operator contract over older versions.
+
+Operator contracts can be upgraded
+without losing service contract state,
+but critical state is held within the operator contract
+and cannot be migrated.
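+
+As a rough illustration of what survives an operator contract upgrade,
+the following Python sketch models both contract types as plain objects.
+The class and attribute names (`groups`, `entries`, `add_operator_contract`)
+are assumptions made for the example and are not specified by this RFC.
+
```python
class OperatorContract:
    """Immutable once deployed; owns its security-critical state."""

    def __init__(self, version):
        self.version = version
        self.groups = []  # critical state: starts empty in every new version


class ServiceContract:
    """Keeps shared, non-critical state across operator contract versions."""

    def __init__(self):
        self.operator_contracts = []  # every version still in use
        self.entries = []             # shared state, not security-critical

    def add_operator_contract(self, contract):
        # An upgrade only appends a new version; old versions stay available.
        self.operator_contracts.append(contract)


service = ServiceContract()
v1 = OperatorContract("v1")
v1.groups.append("group-a")
service.add_operator_contract(v1)
service.entries.append("entry-1")

# Upgrade: deploy v2 and register it with the same service contract.
v2 = OperatorContract("v2")
service.add_operator_contract(v2)
```
+
+After the upgrade the service contract still holds `entry-1`,
+while `v2` starts without any groups and `v1` keeps its own.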
+
+==== Service contract upgrades
+
+Because operator contracts are designed
+to be independent of service contract security and correctness,
+there is much more leeway to upgrade service contracts.
+
+Service contracts can be directly upgradeable,
+e.g. with the eternal storage pattern;
+they may be immutable,
+only accepting new operator contract versions;
+or they may implement a similar pattern
+in the direction of customers,
+unifying core functionality
+between immutable public interface contracts.
+
+Because operator contracts can serve multiple service contracts,
+immutable service contracts can be upgraded
+by deploying the new service contract version,
+along with a new operator contract version
+which can serve both the new and the old service contract.
+The old service contract can migrate work onto the new operator contract,
+remaining perfectly functional while the new service contract is spun up.
+
+==== Staking contract upgrades
+
+Staking contracts can be upgraded
+by deploying a new version and waiting for stakers to migrate
+by withdrawing their stakes on the old contract
+and staking them again on the new one.
+Migrating between staking contracts requires
+waiting out the unstaking period
+and suffering the associated opportunity cost,
+but staking partial amounts can mitigate the impact
+as overall network revenue is not expected to change.
+
+Each operator contract needs to identify
+which staking contracts it accepts.
+When a new staking contract is deployed,
+all operator contracts need to be upgraded
+to a version recognizing the new staking contract,
+either exclusively or in addition to the old one.
+When a sufficient amount of time has elapsed
+and stakers have had the opportunity to migrate,
+support for the old staking contract can be dropped.
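+
+The staged migration can be sketched as three successive operator contract versions.
+This is an illustrative Python model only;
+the `accepts` method and the identifiers `staking-old` and `staking-new`
+are hypothetical.
+
```python
class OperatorContract:
    """Toy model: the accepted staking contracts are fixed at deployment."""

    def __init__(self, version, accepted_staking_contracts):
        self.version = version
        # frozenset: the set cannot change after the contract is deployed
        self.accepted_staking_contracts = frozenset(accepted_staking_contracts)

    def accepts(self, staking_contract):
        return staking_contract in self.accepted_staking_contracts


# Before the staking upgrade: only the old staking contract is recognized.
v1 = OperatorContract("v1", ["staking-old"])
# Transition period: stakers on either staking contract can participate.
v2 = OperatorContract("v2", ["staking-old", "staking-new"])
# After stakers have had time to migrate: old support is dropped.
v3 = OperatorContract("v3", ["staking-new"])
```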
+
+=== Limitations
+
+Untrusted service contracts mean
+that security-critical state must be kept in the operator contract.
+If the network service has complex security needs,
+the operator contract may have to implement most of the work logic.
+
+Security-critical state cannot be migrated between operator contracts;
+a new operator contract has to start from a blank slate.
+Inherently long-running operations
+present a limit to how rapidly the system can be upgraded.
+
+The division of service contracts and operator contracts
+doesn't map cleanly to situations
+where the service to applications consists of providing keeps:
+individual smart contracts deployed for individual groups of operators.
+
+=== Example: Random Beacon
+
+The random beacon generates random numbers in response to requests,
+using BLS threshold signatures on some specific input.
+The signatures are generated by signing groups
+that have been created using random sortition
+from all eligible and active stakers.
+Rewards and punishments are used to incentivize correct behavior.
+
+To split the random beacon into a service contract-operator contract design,
+the security-critical elements need to be identified.
+
+In this case the operator contract needs to handle
+group creation and expiration,
+BLS signature verification,
+and incentives.
+
+Handling entry requests and pricing;
+determining the signing input for generating new entries;
+calling callbacks;
+and requesting the creation of new groups
+are responsibilities that are not critical for beacon integrity
+from the perspective of the stakers.
+These can be performed by the service contract
+without individual staker authorization of upgrades.
+
+==== Operator contract
+
+The operator contract for the random beacon
+provides the following interface to the service contract:
+
+`create_group(payment)`::
+Create a new group when requested by the service contract,
+selecting members using pseudorandom sortition,
+and performing DKG.
+The operator contract does not accept selection input from the service contract
+but instead uses its own pseudorandom seed,
+to ensure that group composition cannot be manipulated.
+`payment` must exceed a minimum amount
+and is used to cover gas fees and to reward stakers.
+
+`sign(entry_id, group_input, signing_input, payment)`::
+Use `group_input` to select a signing group,
+and generate a valid BLS threshold signature for `signing_input`.
+Once generated, use `payment` to reward stakers.
+`payment` must exceed a set minimum value
+that covers necessary gas fees.
+When the entry is created,
+the operator contract calls the service contract with the new entry,
+using `entry_id` to identify the entry.
+
+Behind this interface,
+the operator contract tracks its own groups, their members
+and their threshold public keys.
+The service contract trusts the operator contract
+to only provide valid entries when given specific inputs.
+Alternatively the operator contract could provide
+the associated public key so the entry can be validated,
+but even then the operator contract needs to be trusted
+to provide a public key corresponding to a random valid group.
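+
+The shape of this interface can be sketched in Python as follows.
+This is a simplified model under several assumptions:
+the minimum payments are made-up constants,
+a SHA-256 hash stands in for DKG and BLS threshold signing,
+the explicit `caller` argument stands in for the calling contract's address,
+and results are returned directly
+instead of being delivered through callbacks to the service contract.
+
```python
import hashlib


class BeaconOperatorContract:
    MIN_GROUP_PAYMENT = 10  # hypothetical minimum for create_group
    MIN_SIGN_PAYMENT = 1    # hypothetical minimum for sign

    def __init__(self, service_contracts, seed):
        self.service_contracts = frozenset(service_contracts)
        self._seed = seed   # internal seed, never supplied by callers
        self.groups = []    # security-critical state

    def _require_service_contract(self, caller):
        if caller not in self.service_contracts:
            raise PermissionError("caller is not an associated service contract")

    def create_group(self, caller, payment):
        self._require_service_contract(caller)
        if payment <= self.MIN_GROUP_PAYMENT:
            raise ValueError("payment must exceed the minimum")
        # Advance the contract's own seed; no caller input is mixed in,
        # so the caller cannot influence group composition.
        self._seed = hashlib.sha256(self._seed).digest()
        self.groups.append(self._seed.hex()[:8])  # placeholder for sortition + DKG
        return len(self.groups)  # stands in for group_created(n_groups)

    def sign(self, caller, entry_id, group_input, signing_input, payment):
        self._require_service_contract(caller)
        if payment <= self.MIN_SIGN_PAYMENT:
            raise ValueError("payment must exceed the minimum")
        if not self.groups:
            raise RuntimeError("no active groups")
        group = self.groups[group_input % len(self.groups)]
        # Placeholder only: a real entry would be a BLS threshold signature.
        entry = hashlib.sha256(group.encode() + signing_input).digest()
        return entry_id, entry  # stands in for entry_created(entry_id, entry)


oc = BeaconOperatorContract(["beacon-service"], seed=b"genesis")
n_groups = oc.create_group("beacon-service", payment=11)
entry_id, entry = oc.sign("beacon-service", 7, group_input=5,
                          signing_input=b"previous-entry", payment=2)
```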
+
+==== Service contract
+
+The service contract for the random beacon
+handles customer-facing features and ties the operator contracts together.
+The interface of the service contract towards the operator contract is:
+
+`group_created(n_groups)`::
+The call to `create_group()` has finished
+(successfully or unsuccessfully)
+and expired groups have been removed.
+The operator contract now has `n_groups` active groups.
+
+`entry_created(entry_id, entry)`::
+The previous call for the operator contract to `sign(entry_id, ...)`
+completed successfully,
+resulting in the new `entry`.
+
+The service contract keeps a list of operator contracts
+along with the number of active groups in each.
+
+When receiving a request,
+the service contract determines what values should be
+the group selection input
+and the signing input.
+The group selection input is used to select an operator contract,
+weighted by the number of active groups on each,
+to serve the request.
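+
+One way to implement this weighted selection is sketched below in Python.
+The function name `select_operator_contract` and the dictionary layout are assumptions;
+the RFC does not prescribe a particular algorithm.
+The sketch treats the group selection input as an integer
+and maps it onto the combined range of active groups.
+
```python
def select_operator_contract(group_counts, group_selection_input):
    """Pick an operator contract, weighted by its number of active groups.

    group_counts: dict mapping operator contract -> number of active groups
    group_selection_input: non-negative integer derived from the request
    """
    total = sum(group_counts.values())
    if total == 0:
        raise ValueError("no operator contract has active groups")
    index = group_selection_input % total
    # Walk the contracts in insertion order, consuming each one's share
    # of the index range until the index falls inside a contract's share.
    for contract, count in group_counts.items():
        if index < count:
            return contract
        index -= count


counts = {"operator-v1": 3, "operator-v2": 1}
picks = [select_operator_contract(counts, i) for i in range(4)]
# picks == ["operator-v1", "operator-v1", "operator-v1", "operator-v2"]
```
+
+A contract with three active groups is selected three times as often
+as one with a single group,
+and a contract with no active groups is never selected.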
+
+When the operator contract is determined,
+the group selection input and signing input are passed to it
+along with an appropriate payment.
+When the operator contract returns a valid entry with `entry_created(...)`,
+the service contract stores it and calls the customer-specified callback.
+
+If a new group should be created,
+the service contract determines which operator contract should create one
+(the most recent one, or a random one weighted by recency),
+and calls `create_group()` on the selected operator contract
+with an appropriate payment.
+Once the operator contract has finished DKG and expired old groups,
+it returns the new number of active groups using `group_created(n_groups)`.
+
+Unlike the operator contract which needs to maintain integrity
+for arbitrary, malicious inputs,
+the service contract relies heavily on trusting the operator contracts.
+This is acceptable because the operator contracts are known, unchangeable code,
+and the service contract only has access to what customers have paid for entries;
+boycotting a compromised or malfunctioning service contract
+and deploying a new one
+is sufficient to mitigate attacks or errors.
+
+== Future Work
+
+An exact architecture for service contract upgrades
+is not specified.
+The service contract upgrade process should be resilient to minor compromise,
+and relying on a global master key may be undesirable
+as a single point of failure.
+
+For greater assurance towards customers,
+the service contracts could be made immutable
+in a manner similar to the operator contracts.
+When a customer uses a specific service contract to request an entry,
+they could trust that only that service contract
+and its associated operator contracts
+will be involved in the generation of their entry.
+However, this would reduce the ability to maintain
+a global "canonical" chain of entries,
+each linked to the previous ones.
+
+Most currently envisioned services on the Keep network
+consist of providing keeps to customer applications.
+The operator contract/service contract split
+is less than ideal for this use case,
+and the random beacon could be rearchitected to follow the same pattern.
+Each keep is a separate contract which inherits its logic from its factory,
+but applications communicate directly with the keeps.
+The role of the service contract would be limited
+to managing different versions of keep factories,
+with the rest of its functionality being subsumed by the application.
+
+[bibliography]
+== Related Links
+
+- https://github.com/keep-network/keep-core/issues/133[System upgrade handling]
+- https://github.com/keep-network/keep-core/issues/725[Specify contract upgrade scheme]
+- https://github.com/keep-network/keep-core/pull/446[RFC 4: Secure upgrades for contracts operating staked balances]