A possible path for the future of how code on the EOS blockchain is structured and managed

arhag · May 17, 2021, 1:45pm

Below I describe my half-baked ideas for a possible path forward for structuring code on the EOS blockchain into layers and separating out the responsibilities of different designated groups within EOS so that they can focus on doing the best job they can in their limited domain.

Summary

Further break up the current blockchain code into layers. The native blockchain software as the first layer (e.g. running native code implemented in nodeos) still exists but I would like to see it become more general over time and move more of its functionality into layer 2 which consists of WebAssembly code managed on the blockchain itself. The system-managed contracts become layer 3 and all the other user-managed contracts fit into layer 4 and higher.
Integrate EdenOS into layer 3. If Eden proves itself, then let inflation funds be directed by Eden and let certain other privileges that currently belong to the block producers (BPs) move to a new group called the Committee which is made up of top elected representatives from Eden communities on EOS.
BPs now just become block proposers and block finalizers and their role is limited solely to ensuring proper operation of the layer 1 code of the blockchain (and in some cases layer 2 code, e.g. the pluggable consensus algorithm). They are no longer selected by voters directly. Instead, the Committee selects them and compensates them for a job well done.
New system-managed contracts are introduced as tooling to allow layer 4 and higher contracts to better mitigate against hacks and other attacks. One example is a way to track the source of recently received tokens and enforce a settlement time (typically 2 days) during which the tokens are restricted in terms of where they can be sent and the source of those tokens can freeze and even seize those tokens. Another example is a way to restrict smart contract upgrades to one of the following: require a time delay before the new code is deployed by the manager unilaterally; or allow instant update by the manager if a multisig of pre-selected code auditors approves that the new code matches the pre-committed intent of the smart contract, which does not change; or, in some cases, allow instant update by the manager unilaterally but block all further token movements into the contract’s control until the sender acknowledges the new version of the code as valid. This tooling will also induce the creation of marketplaces for code auditors and contract management groups (the ones given the freeze and seize powers over unsettled funds sourced from contracts in a particular cluster managed by the management group). The idea is that these entities will be separate from Eden members and representatives, though of course Eden members and representatives can serve double duty in these roles if their talents fit appropriately.

Layers of blockchain code

I think the blockchain code should eventually be structured into layers that build on top of each other:

Layer 1: Native blockchain software

This would be the code implemented natively (not involving WebAssembly code) made available through the binaries of the node software (e.g. nodeos) running the base layer EOSIO protocol. Everyone is expected to run a sufficiently up-to-date version of this software to be able to remain secure and keep in sync with the latest blocks of the EOS blockchain. Specially selected entities (block proposers and block finalizers) run this software with particular plugins enabled to facilitate the creation and finalization of blocks containing users’ transactions.

Layer 2: Core blockchain modules

This includes critical WebAssembly code to transform the barebones general computational platform of layer 1 into the particular blockchain that the community understands and loves with its various features and properties. Properties of the blockchain that people would take for granted would be defined and introduced at this layer, for example: the particular consensus algorithm; block proposer scheduling rules; transaction structure (and aspects of it like replay protection, expiration, how signatures are provided); basic foundations of an account and authentication/authorization system; introduction of the concept of a smart contract and how they are deployed and able to communicate with one another; and, foundations of computational resource tracking, enforcement, and management. It would also include code that allows a special root account the power to update these core blockchain modules (perhaps with enforced time delays and/or as an opt-in protocol change that requires action to be taken by node operators). It would also provide a mechanism for this root account to select a group of accounts to act as the block proposers and block finalizers of the network.

Layer 3: System-managed contracts

These would be the smart contracts (again as WebAssembly code) deployed on the blockchain which provide a lot of the flavor of the particular blockchain people use. For example, it would be where the definition of the core token of the blockchain exists or where concepts like DPoS exist. So this is basically nearly the same layer as what the community currently thinks of as “system contracts”; however some of the functionality that is currently in the current privileged system contract would instead belong in layer 2 in my design. Though the foundations of computational resource tracking, enforcement, and management would be defined in layer 2, layer 3 would include particular ways of using that that are apparent to the users of the blockchain (e.g. PowerUp). Though the foundations of account management and authentication/authorization would be introduced in layer 2, layer 3 would add a lot more functionality to it. This layer would also include useful tooling that allows for a more secure and decentralized way to deploy and upgrade smart contracts.

What I would like to see the EOS blockchain do with layer 3 in particular is to include EdenOS in this layer and, assuming Eden proves itself over time, to take several elected members from Eden communities and put them into a group known as the Committee. Personally, I think having only a single representative from each Eden community may, at least initially, make the size of this Committee too small depending on how many Eden communities there are on EOS. So perhaps it would be best if each Eden community added a few representatives to this Committee. The Committee would have the power to collectively, specifically with more than a two-thirds supermajority, act as the root account to: update the core blockchain modules, and select the group of entities to act as the block proposers and block finalizers of the network. Via the root account they would also have the power to update system-managed contracts at this layer.
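As a rough illustration of the "more than two-thirds" rule, the check can be written with integer arithmetic. The function name and the Committee sizes used here are hypothetical and only serve the example; the post does not fix a Committee size.

```python
def committee_approves(approvals: int, committee_size: int) -> bool:
    """Return True when approvals strictly exceed two-thirds of the Committee.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if committee_size <= 0:
        raise ValueError("committee must have at least one member")
    if not 0 <= approvals <= committee_size:
        raise ValueError("approvals must be between 0 and committee_size")
    # Integer arithmetic avoids floating-point rounding at the boundary:
    # approvals / committee_size > 2/3  <=>  3 * approvals > 2 * committee_size
    return 3 * approvals > 2 * committee_size
```

With a hypothetical 21-member Committee, 15 approvals would pass and 14 would not, since 14 is exactly two-thirds rather than more than two-thirds.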

So there would no longer be BP selection directly by the voters. The Committee would be responsible for selecting and updating the list of BPs to ensure they have chosen competent people to keep the network functioning optimally. The role of the chosen BPs would just be to keep the blockchain operating properly. They would no longer be responsible for upgrading system-managed contracts (or core blockchain modules); that would be the responsibility of the Committee. They would still be responsible for keeping their layer 1 node software up-to-date to make sure the network is performing securely and efficiently. They would also no longer be responsible for directing inflation funds. They would receive pay from the network to compensate them for their work and that pay schedule would be determined by the Committee. General flow of inflation funds for funding things that benefit the network (or if desired some directed to staked token holders) would be the responsibility of the broader Eden system (i.e. not just the Committee).

Layer 4 and higher: Other smart contracts

This layer would include other smart contracts that are deployed and managed by users in the blockchain community and are not the responsibility of the Committee to manage.

Regarding bugs and hacks, I believe that the Committee should limit its powers to fixing issues in code within layers 3 and lower. Note that bugs in layer 1 typically require a hard fork to fix, which requires all node operators to upgrade to new software prior to the Committee activating a protocol feature. This would also require cooperating with the block proposers and finalizers. Also, bugs can often be mitigated in the short term through soft forks, which would be carried out by the block proposers and finalizers alone, and it would be their responsibility to do so, likely under the direction of the Committee.

There are issues of practicality, scale, and fairness when trying to tackle bugs and hacks in contracts at layer 4 and higher. However, I believe a lot can be done to mitigate against hacks and other issues of contracts at layer 4 and higher by introducing new layer 3 contracts that the other contracts can opt into to increase their security (perhaps at the cost of a little more friction). I talk about that in some more detail in the next section.

System-managed contracts to assist other contracts in mitigating against hacks and other attacks

Several new contracts can be introduced to reduce the risk of a smart contract attack or to mitigate against the damage done in the case of a successful attack.

Token settlement contract

Many contracts need to deal with users’ tokens. The biggest concern for these contract managers is that a bug in their code leads to attackers illegitimately acquiring control of these tokens thus causing the legitimate users of this contract to lose their tokens.

The contract needs to apply arbitrary logic to change the ownership rights of tokens under its custody (potentially very frequently and in ways that must not fail). The sensible way of achieving this is through the deposit-and-spend pattern. In the deposit phase, users move their tokens fully into the control of the contract, but the contract notes within its internal state who sent the tokens so that it can appropriately track their ownership. In the spend phase, the user takes actions within the smart contract which change ownership of those tokens, and that merely requires changes to the contract’s internal state. If the user wants to take back custody of the tokens, they can use a withdraw action to make the contract transfer the appropriate amount of tokens to them (assuming they still own the amount requested).
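The deposit-and-spend pattern described above can be sketched as a toy model. This is not EOSIO contract code; the class and method names are hypothetical, and real contracts would track balances in on-chain tables and move tokens through a token contract.

```python
from collections import defaultdict


class CustodyContract:
    """Toy model of a contract that holds user tokens (names are illustrative)."""

    def __init__(self):
        self.custody = 0                  # tokens actually held by the contract
        self.balances = defaultdict(int)  # internal record of who owns what

    def deposit(self, user, amount):
        """Deposit phase: tokens move into custody and ownership is recorded."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.custody += amount
        self.balances[user] += amount

    def spend(self, sender, receiver, amount):
        """Spend phase: only the internal ownership record changes."""
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("insufficient internal balance")
        self.balances[sender] -= amount
        self.balances[receiver] += amount

    def withdraw(self, user, amount):
        """Withdraw: tokens leave custody, limited by the user's recorded balance."""
        if amount <= 0 or self.balances[user] < amount:
            raise ValueError("insufficient internal balance")
        self.balances[user] -= amount
        self.custody -= amount
        return amount
```

Note that `spend` never touches `custody`. That is what makes the pattern cheap and reliable, and it is also why a bug in the internal bookkeeping only becomes irreversible once `withdraw` is called.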

But this arbitrary logic may have flaws in it which are exploited by an attacker. This can cause the smart contract state to have ownership tracking that is not aligned with what it should have been according to the intent of the contract (as opposed to the reality of the code). Prior to withdrawing the tokens out of the custody of the smart contract, this is a bad but not yet critical situation since the smart contract could be updated by the managers to correct the mistakes in state (as well as the bug in the code to prevent that particular mistake from occurring again). However, if the attacker withdraws funds that the smart contract mistakenly believes are under their ownership, then under the current paradigm it is too late for the contract manager to do anything to get those tokens back to their rightful owners.

However, if there was an intentional delay added to withdrawing, then perhaps such mistakes could be detected early enough for the contract managers to freeze the functionality of the contract and take the time to correct the mistake while the appropriate amount of tokens still remained under the custody of the contract. The downside of this approach is that it adds friction to the user experience through the intentional delay on withdrawing. If the user wants to interact with one DeFi application with their tokens, then use the proceeds from that DeFi application to interact with another DeFi application, they would be forced to wait an arbitrary amount of time between the two steps (potentially 2 days to make such mitigations practically beneficial against attacks). Imagine if they wanted to chain together such flows across 5 different DeFi applications; what used to take them less than a minute would now take more than a week.
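The "more than a week" figure can be checked with a short calculation. The sketch assumes one delayed withdrawal between each consecutive pair of applications; the function name is hypothetical.

```python
from datetime import timedelta


def chained_delay(num_apps: int, withdraw_delay: timedelta) -> timedelta:
    """Total waiting time when every hop between applications needs one delayed withdrawal."""
    if num_apps < 1:
        raise ValueError("need at least one application")
    hops = num_apps - 1  # assumption: no wait is counted after the final application
    return hops * withdraw_delay


total = chained_delay(5, timedelta(days=2))  # 4 hops at 2 days each
```

Under that assumption, 5 applications with a 2-day delay give 8 days of waiting. Counting a final withdrawal from the last application as well would make it 10 days.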

An alternative strategy is to move this withdrawing logic to a separate contract, a token settlement contract, which enforces a settlement time on tokens received from a contract. Prior to that settlement, the tokens sourced from a contract are at risk of indefinite freeze and even seizure by an account designated by the source contract. This provides the contract manager the same capabilities they need to mitigate against attacks as they would have if they just added withdraw delays to their own contract. After settlement and assuming the tokens haven’t already been frozen, those tokens can no longer be seized or frozen.

However, this alone would not be sufficient to protect the contract manager’s interests unless there was a further restriction that non-frozen tokens prior to settlement were limited in where they could be transferred. It would actually be acceptable to transfer unsettled tokens back to the source contract. As a generalization of this, there could be a defined cluster of smart contracts that share a common management group, which is in charge of freezes and seizures of unsettled tokens sourced from any of the contracts within that cluster. Then unsettled tokens could be instantly moved between contracts in the same cluster. There would be an additional burden on the managers of each of the individual contracts within the cluster to work with other contract managers in the cluster to make things right in the event of a hack. The problem is that a compromise of the intended integrity of the state of one contract in a cluster can have cascading effects that compromise the intended integrity of the state of other contracts in the cluster. This means that making things right is not just a matter of the initially compromised contract adjusting its state and getting back control of the tokens that were withdrawn from it; it is also a matter of potentially adjusting the state of other contracts in that cluster that were impacted by the tainted tokens.
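The settlement rules described so far (a settlement time, freeze and seize powers for the cluster's management group, and free movement of unsettled tokens only within the source cluster) can be sketched as a small model. All names here are hypothetical, time is a plain integer number of seconds, and the model does not attempt to track tainted tokens once they have moved into another contract in the cluster.

```python
from dataclasses import dataclass

SETTLEMENT_SECONDS = 2 * 24 * 60 * 60  # the 2-day settlement time suggested in the post


@dataclass
class Lot:
    owner: str        # account currently holding the tokens
    source: str       # contract the tokens were withdrawn from
    amount: int
    received_at: int  # time of receipt, in seconds
    frozen: bool = False


class SettlementContract:
    """Toy model; cluster_of maps contract -> cluster id, manager_of maps cluster id -> manager."""

    def __init__(self, cluster_of, manager_of):
        self.cluster_of = dict(cluster_of)
        self.manager_of = dict(manager_of)
        self.lots = []

    def receive(self, owner, source, amount, now):
        if source not in self.cluster_of:
            raise ValueError("source contract is not registered in any cluster")
        lot = Lot(owner, source, amount, now)
        self.lots.append(lot)
        return lot

    def is_settled(self, lot, now):
        # A frozen lot never settles, which models an indefinite freeze.
        return not lot.frozen and now - lot.received_at >= SETTLEMENT_SECONDS

    def _require_manager(self, lot, caller):
        if self.manager_of[self.cluster_of[lot.source]] != caller:
            raise PermissionError("caller does not manage the source cluster")

    def freeze(self, lot, caller, now):
        self._require_manager(lot, caller)
        if self.is_settled(lot, now):
            raise PermissionError("settled tokens can no longer be frozen")
        lot.frozen = True

    def seize(self, lot, caller, now):
        """Remove the lot and return the amount to be handed back to the source contract."""
        self._require_manager(lot, caller)
        if self.is_settled(lot, now):
            raise PermissionError("settled tokens can no longer be seized")
        self.lots.remove(lot)
        return lot.amount

    def can_transfer(self, lot, destination, now):
        if lot.frozen:
            return False
        if self.is_settled(lot, now):
            return True
        # Unsettled tokens may only move to contracts in the source's own cluster
        # (which includes the source contract itself).
        return self.cluster_of.get(destination) == self.cluster_of[lot.source]
```

In this sketch, a lot received from a contract in cluster `c1` can move immediately to any other contract registered in `c1`, but must wait out the settlement time before it can go to an ordinary account or to a contract in a different cluster.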

Clearly that is a burden that many contract managers would not want to take on in general. There would likely be a limited set of peer contracts that a contract manager would be willing to be in a cluster with. If they choose a particularly badly written contract as a peer, they may be burdened with taking corrective actions regularly, which can be disruptive to their operations. And they may not have a choice to not comply in any particular circumstance since the common management of that cluster would have the ability to freeze their contract and seize the portion of tokens currently under the contract’s custody that were tainted and need to be returned to the initially compromised contract so that it can make things right with its users. Also, this means that adding a new contract into a cluster should require unanimous consent of all existing members of the cluster. But a contract can unilaterally decide to secede from a cluster it currently belongs to, after a short delay equal to the settlement time to ensure that it cannot run off with any tainted tokens before the common management of that cluster has a chance to freeze the tokens if necessary.
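The membership rules in the previous paragraph (unanimous consent to join, unilateral but delayed secession) could look roughly like this. The class is a hypothetical sketch, and it assumes the secession delay equals the settlement time.

```python
class Cluster:
    """Toy model of cluster membership (all names are illustrative assumptions)."""

    def __init__(self, founding_members, settlement_seconds):
        self.members = set(founding_members)
        self.settlement_seconds = settlement_seconds
        self.pending_exits = {}  # contract -> time its secession was announced

    def admit(self, candidate, approvals):
        """Add the candidate only if every current member has approved."""
        if candidate in self.members:
            raise ValueError("already a member")
        if set(approvals) >= self.members:  # approvals cover all existing members
            self.members.add(candidate)
            return True
        return False

    def announce_secession(self, contract, now):
        """A member may announce its exit unilaterally; the clock starts now."""
        if contract not in self.members:
            raise ValueError("not a member")
        self.pending_exits.setdefault(contract, now)

    def complete_secession(self, contract, now):
        """The exit takes effect only after the settlement time has elapsed."""
        announced = self.pending_exits.get(contract)
        if announced is None or now - announced < self.settlement_seconds:
            return False
        self.members.discard(contract)
        del self.pending_exits[contract]
        return True
```

The delay in `complete_secession` is what gives the common management group a window to freeze tainted tokens before the departing contract leaves its reach.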

Notice that contracts in separate clusters do not have to worry about this burden. They can assume that tokens sent to them are instantly irreversible. The downside is that it creates friction in terms of time delays for a user to move their tokens received from a contract in one cluster into another cluster (say to use with a contract that only exists within that cluster).

Finally, I think a marketplace can form for the common management groups that can be selected to manage a particular cluster. One particular management group may be selected to manage many different clusters. Top ranked management groups would be preferred and would be able to charge a higher fee. And none of this has to be the concern of the BPs, Committee, or any other groups delegated with the responsibility of managing code in layer 3 and below.

Contract upgrade management contract

A common problem contract developers face is that they need to be able to upgrade their contracts over time (to fix inevitable bugs or just to upgrade the functionality of their contracts) but that also gives them the power to change the intent behind their already deployed contracts that users are already using. Understandably, users worry about the developers or managers of the contract suddenly committing a malicious act and stealing their funds that they have trusted in the custody of the contract (which is often necessary to do in order to actually use the contract as was intended).

The common solution to this problem is to set permissions up so that smart contract changes can only be done through a multisig involving a few independent and trusted members in the community. This is a decent solution assuming that the contract managers have set up a deal with these multisig participants for them to be available to actually evaluate proposed updates to their code and approve (assuming it meets the pre-established rules that they have all publicly committed to) in a timely manner. Otherwise, it can be very disruptive (and perhaps even existentially so) to the operations of the contract.

I think it would be beneficial to create a contract to formalize this common solution and to provide a few different alternative paths to updating the smart contract that do not put the burden of all contract upgrades on the multisig of trusted members.

For example, perhaps the contract manager could unilaterally update the smart contract after a 14 day delay. This could provide their users sufficient time to realize a change to the code was scheduled and to remove their tokens from the custody of the existing contract if they did not like (or were not provided sufficient information about) the new contract. Normally, the contract manager could publish the source code that deterministically compiles to the new compiled WebAssembly code they intend to deploy well ahead of time. That would give trusted auditors (or even just users who want to check the code diff themselves) enough time to evaluate whether the change was benign and still matched the accepted intent of the smart contract that users expect. Then when users of that smart contract were alerted of a change to the smart contract scheduled for 14 days later, it would be relatively easy for them to see that the particular code update was already deemed by people they trust to be a benign (or even greatly beneficial) change. Otherwise, they would have sufficient time to remove their tokens from the existing smart contract. (The number of 14 days could be adjusted as appropriate to meet the needs of the particular smart contract, but whatever number it is, it should be configured ahead of time and any updates to that delay must respect the existing delay configured.)
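A minimal sketch of the delayed-upgrade path follows, including the rule that a change to the delay must itself wait out the currently configured delay. The class and method names are hypothetical, and SHA-256 of the compiled code stands in for whatever commitment the real contract would use.

```python
import hashlib

DAY = 24 * 60 * 60


class DelayedUpgrade:
    """Toy model of a time-delayed upgrade schedule (names are assumptions)."""

    def __init__(self, delay_seconds=14 * DAY):
        self.delay_seconds = delay_seconds
        self.pending_code = None   # (code_hash, ready_at)
        self.pending_delay = None  # (new_delay, ready_at)
        self.deployed_hash = None

    def schedule_code(self, wasm: bytes, now: int) -> str:
        """Publicly commit to the hash of the new code; users can audit it during the delay."""
        code_hash = hashlib.sha256(wasm).hexdigest()
        self.pending_code = (code_hash, now + self.delay_seconds)
        return code_hash

    def deploy(self, wasm: bytes, now: int) -> None:
        if self.pending_code is None:
            raise PermissionError("no upgrade has been scheduled")
        code_hash, ready_at = self.pending_code
        if hashlib.sha256(wasm).hexdigest() != code_hash:
            raise ValueError("code does not match the scheduled hash")
        if now < ready_at:
            raise PermissionError("upgrade delay has not elapsed")
        self.deployed_hash = code_hash
        self.pending_code = None

    def schedule_delay_change(self, new_delay: int, now: int) -> None:
        """Changing the delay must itself respect the currently configured delay."""
        self.pending_delay = (new_delay, now + self.delay_seconds)

    def apply_delay_change(self, now: int) -> None:
        if self.pending_delay is None:
            raise PermissionError("no delay change has been scheduled")
        new_delay, ready_at = self.pending_delay
        if now < ready_at:
            raise PermissionError("existing delay has not elapsed")
        self.delay_seconds = new_delay
        self.pending_delay = None
```

Committing to a hash at scheduling time is one way to let auditors compare the published source against exactly what will be deployed, which relies on the deterministic compilation mentioned above.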

But sometimes it is not acceptable to wait 14 days to update the smart contract. There may be a critical bug that needs to be dealt with immediately otherwise users will lose funds. In that case, several other options exist.

First, if the smart contract had a functionality to freeze all operations other than users withdrawing their tokens, the contract managers could use that assuming the vulnerability wasn’t already exploited. Then they could use the regular 14 day delay process to fix the bug while still allowing users to withdraw their tokens if they didn’t like the upcoming change (or just because they wanted to get access to their tokens so they can use another DeFi product that wasn’t frozen in the meantime). But this can be too disruptive to the operations of the contract.

Alternatively, if the contract manager could get the multisig of pre-selected trusted code auditors to approve, they could instantly update the smart contract code and patch the bug with no downtime. The auditors would be selected so that users had trust that they would only approve such instant updates if the code changes matched the already committed intent of the existing smart contract (typically described using natural language, say as a Ricardian contract).
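The auditor path can be sketched as a threshold check in which each approval is bound to one specific code hash, so an approval for one build cannot be reused for another. The class name and the idea of a numeric threshold are assumptions; the post only says "a multisig of pre-selected code auditors".

```python
class AuditorMultisig:
    """Toy model of auditor approval for instant upgrades (illustrative names)."""

    def __init__(self, auditors, threshold):
        self.auditors = set(auditors)
        if not 1 <= threshold <= len(self.auditors):
            raise ValueError("threshold must be between 1 and the number of auditors")
        self.threshold = threshold
        self.approvals = {}  # code_hash -> set of auditors who approved it

    def approve(self, auditor, code_hash):
        if auditor not in self.auditors:
            raise PermissionError("not a pre-selected auditor")
        # A set means a repeated approval by the same auditor counts only once.
        self.approvals.setdefault(code_hash, set()).add(auditor)

    def can_deploy_instantly(self, code_hash):
        return len(self.approvals.get(code_hash, set())) >= self.threshold
```

For example, with three auditors and a threshold of two, the manager could skip the time delay for a given build once any two auditors have approved that build's hash.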

Finally, some contracts may choose to enable a third option for themselves which allows them to instantly upgrade their contracts unilaterally. This allows them to deal with discovered bugs quickly with potentially very little to no downtime (though there may be some downtime for users as they figure out how to respond to the contract upgrade), and does not make them reliant on the time schedule of the pre-selected code auditors to do so. But it comes with the disadvantage that the users of that smart contract are at risk of a “rug pull”; in other words, the users would justifiably have a higher concern that the developers of this contract may become malicious and decide to steal the tokens in their custody. So enabling this third option is probably only sensible for contract managers who are very trusted by their users (or in an alternative scenario I will describe shortly). Even then, the contract upgrade management contract could provide an additional protection to the users beyond what the current status quo approach provides. And this additional protection makes this third option feasible even for contracts managed by untrusted people assuming the contract does not have custody of user tokens for longer than the duration of an atomic transaction.

The protection would ensure that users’ transactions interacting with this contract would only succeed if the sequence number of the contract deployment matched what was asserted in the transaction. This sequence number would increment each time the contract manager unilaterally and instantly updated their contract code. The client code constructing the transaction would always assert the last trusted sequence for that contract (which could be kept in blockchain state for redundancy/replication purposes if desired). So if the contract managers decided to update their contract to now try to steal all tokens sent to their contract, this protection would automatically cause all already in-flight signed transactions interacting with this contract to fail. Furthermore, if the client was attempting to interact with the new contract, it would inform the user that the contract had been updated and they would have to verify whether the new code was legitimate and approve it before they could proceed. This would force the user to notice a change occurred and encourage them to investigate within their trusted social network to see whether they should trust the new version of the contract or not.
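The sequence-number protection can be illustrated with a small model. The names are hypothetical; on a real chain the assertion would be part of the signed transaction and checked by the contract upgrade management contract before the target contract's actions run.

```python
class StaleSequenceError(Exception):
    """Raised when a transaction asserts a deployment sequence that is no longer current."""


class UpgradeableContract:
    def __init__(self):
        self.sequence = 0  # incremented on every unilateral instant upgrade

    def instant_upgrade(self):
        self.sequence += 1


def execute(contract, asserted_sequence, action):
    """Run the action only if the transaction asserted the current deployment sequence."""
    if asserted_sequence != contract.sequence:
        raise StaleSequenceError(
            f"transaction asserted sequence {asserted_sequence}, "
            f"contract is at {contract.sequence}"
        )
    return action()


class Client:
    """Remembers the last sequence the user explicitly trusted for each contract."""

    def __init__(self):
        self.trusted = {}

    def trust(self, name, sequence):
        self.trusted[name] = sequence

    def asserted_sequence(self, name):
        # Transactions always assert the last trusted sequence, never the live one.
        return self.trusted[name]
```

In this model, a transaction signed before `instant_upgrade()` carries the old sequence and fails in `execute`. The user has to call something like `trust()` with the new sequence, after reviewing the new code, before their client will build transactions that succeed again.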

Clearly using this protection makes using this third option somewhat disruptive for the users of the contract. Therefore, it would be desirable for the contract managers to use one of the prior two options normally instead. But this third option, if appropriate to enable for a particular contract (probably only if the contract does not maintain custody of user funds), could provide the contract manager another path to upgrading the contract that allows them to immediately upgrade the code to fix critical bugs without needing to depend on the timely participation of others.

tbfleming · May 16, 2021, 8:00pm

Breaking up the current native layer could happen gradually over time. Parts of nodeos could move from native to wasm, but this could initially appear as an implementation detail, invisible to the other nodes, enabling performance and compatibility testing. At some point, when most nodes have moved over, the newer version can start accepting on-chain wasm updates. Initially bug and performance fixes, but at some point an upgrade which forks away from non-upgraded nodes.

tbfleming · May 16, 2021, 8:22pm

My reply above is not an endorsement of changing the way producers are selected. I believe that should be a matter separate from improving the capabilities of the tech stack.

arhag · May 17, 2021, 1:45pm

Great point. While I do have personal views (and I will repeat those are my own personal views only) on how I think BP selection should be done, I think it is worth pointing out, as you did, that separating the current breakdown of code and responsibilities between the EOSIO protocol and the system contracts into layers 1 to 3, as I describe in this proposal, is a separate matter (really a technical one) from the other governance decisions suggested in this post (such as how BPs could be selected, or how to divide up their powers into different groups).