Lessons from the DAO incident

Published on: 23 June, 2016

By Sergio D. Lerner

One of our goals at RSK has been to focus on real-world applications of smart-contract technology. Although we are excited to see radical new approaches to very old trust problems, such as DAOs, we think that Bitcoin, the blockchain, smart wallets and crypto-assets hold enough disruptive potential that with these innovations alone we can achieve true financial inclusion. Most of what’s needed to succeed is to use simple and concise smart building blocks that are easy to analyze, to reason about, and to audit.

Last month we witnessed the creation of The DAO through a crowdfunding campaign that surpassed 200M USD. However, I doubt that The DAO’s authors were prepared to protect such an amount of money, as The DAO was, and still is, an untested experiment. In this post, I will show why the DAO experiment failed and what the main reasons for that failure were. Then we’ll analyze the RSK design and show how we’ve been working to minimize such risk factors. Our main aim at RSK has been to provide layered security, where the failure of one layer does not imply a total collapse. Ethereum will survive the DAO fall. It will even get stronger. But it suffered. I hope the lessons are learned.

The Flying Plane analogy

When most of us started using Bitcoin (either by buying bitcoins or by being paid in bitcoin), Bitcoin was a working experiment: all its functionality was already live. Bitcoin has several times been compared to a plane that must be repaired in mid-air. But Bitcoin is a plane that has been flying for seven years without landing, and that says a lot about its reliability. In contrast, when most people invested in the DAO, the code had never been executed. Would you trust your life to autopilot software that had never flown a real plane? Sadly, that’s what the DAO investors did. They let themselves be pushed by the hype and forgot to do due diligence. One can point out that most DAO investors put in very small amounts of money, so for them it was basically just a bet (I know highly qualified people who invested blindly). But there were several whales who invested millions in the DAO. So some of the blame for the DAO incident falls on its investors.

Ethereum VM Design

One of the design goals of Ethereum was to simplify the specification of the consensus layer. That’s a noble goal, as it makes it easier to re-implement the platform in different programming languages and under different constraints. But even though the minimum subset of instructions that enables Turing-complete smart contracts is below 10, Ethereum did not limit itself to such a minimal instruction set, for two reasons: (a) a minimal set reduces performance considerably and (b) it makes compiled code difficult to audit. So Ethereum has about 100 different opcodes. However, it seems that for the sake of minimization the CALL opcode was overloaded with two functions: calling a method of another contract, and sending ether. But the semantics of these two functions, and the contexts in which each of them is used, are very different. Lack of education about this difference was one of the factors that also led to the DAO hack. It is interesting to note that the VM already indirectly provides a means to send ether without calling any function, by creating a temporary contract and using the suicide opcode, albeit at a much higher gas cost. This option leads to the simple conclusion that the VM should offer a SEND opcode that does not call any code, reducing the complexity of the upper layers. One can argue that limiting the amount of gas offered for the call to 2300 gas has the side effect that no other CALL can be performed, so it’s safe. This argument is false if we consider that the VM may later undergo hard forks that reduce the cost of a CALL operation or allow contracts to pay for their own gas. So that solution is shortsighted, hides the real problem from the user and prevents future improvements. At RSK we’ve implemented a simple SEND opcode that does not call any code in the destination contract.
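
The difference between the two meanings of CALL can be sketched with a toy model. The Python below is not EVM code and not the RSK implementation; `Account`, `call_with_value` and `send_only` are hypothetical names chosen for illustration. It only shows that a CALL-like transfer hands control to the recipient, while a SEND-like transfer does not.

```python
# Toy model (not EVM code): contrasts a value transfer that runs recipient
# code with one that only moves the balance. All names are hypothetical.

class Account:
    def __init__(self, balance=0, on_receive=None):
        self.balance = balance
        self.on_receive = on_receive  # code run on a CALL-like transfer, if any


def call_with_value(sender, recipient, amount):
    """CALL-like transfer: moves value, then hands control to the recipient."""
    if sender.balance < amount:
        return False
    sender.balance -= amount
    recipient.balance += amount
    if recipient.on_receive is not None:
        recipient.on_receive(sender, recipient, amount)  # arbitrary code runs here
    return True


def send_only(sender, recipient, amount):
    """SEND-like transfer: moves value and never executes recipient code."""
    if sender.balance < amount:
        return False
    sender.balance -= amount
    recipient.balance += amount
    return True


executed = []  # records every time the recipient's code was run


def hook(sender, recipient, amount):
    executed.append(amount)


payer = Account(balance=10)
payee = Account(on_receive=hook)

call_with_value(payer, payee, 3)  # the recipient's hook runs
send_only(payer, payee, 4)        # the recipient's hook does not run
```

In this sketch the caller of `send_only` does not have to reason about what the recipient might do, which is the reduction in upper-layer complexity argued for above.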

One can argue that most contracts will work with crypto-assets rather than ether, so sending an asset will be a call to the asset issuing contract. In these cases, the notation will be different, and the user may expect side-effects. The SEND opcode helps, but it is by no means the only required fix.

Solidity

Dynamically typed languages are well known to be much more difficult to prove correct (or even to reason about) than statically typed languages. Also, the computational and memory costs of Ethereum smart contracts make the object code of dynamically typed languages much more expensive than statically typed code. Solidity is a statically typed language based on JavaScript. The choice of static typing seems at first glance to be a good one. However, it raises the question of why it was necessary to invent a new programming language from scratch, one that is not domain-specific and does not provide any special feature related to smart contracts. Aren’t existing languages good enough? Some previous smart-contract designs, such as QixCoin, were based on the emulation of a RISC processor core. Some newer designs emulate other processors: Codius runs sandboxed x86 code, and Bloq Ora runs Moxie. Emulating an existing core enables the programmer to use standard mainstream languages and tested compilers. Solidity is immature: for example, I have a contract that makes Solidity segfault, and the reason is unclear. I’ve also detected a case where Solidity generates bad code. As people are starting to audit Ethereum contracts by reading their source code instead of their bytecode, the question of the quality of the compiler (and the risk of a tampered compiler) has become extremely important.

Getting back to the DAO incident, let’s see the DAO source code. We’re not going to discuss the vulnerability that has already been extensively analyzed, but we’re going to look at the two lines that may have been exploited by the hacker:

a) if (_recipient.call.value(_amount)()) {

b) if (p.splitData[0].newDAO.createTokenProxy.value(fundsToBeMoved)(msg.sender) == false)

In the first line, the untyped contract _recipient is sent _amount ether. No method is specified. In the second line, a particular method is called (createTokenProxy), sending fundsToBeMoved ether. The Solidity 0.2.0 documentation warns that adding .value(x) or .gas(y) only (locally) sets the value and the amount of gas sent with the function call, and that only the parentheses at the end perform the actual call. If the .gas() postfix is missing, the whole amount of unused gas is offered to the external call, so contract recursion is allowed (but this is not specified in the documentation). I believe that the call notation is flawed because it is too unusual. Very few programming languages allow a reference to a method to be modified by such modifiers (one of them is Python), and this construct is not widely used. So it becomes unclear which method is actually called. At first glance, the method being called appears to be “value()” rather than “createTokenProxy”.
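
Since Python is one of the few languages where this style can be reproduced, a small sketch may help show why the notation is easy to misread. `ExternalFunction` is a hypothetical class, not part of Solidity or of any library; it only imitates the behaviour described in the documentation, where `.value(x)` and `.gas(y)` return a modified reference and only the final parentheses perform the call.

```python
class ExternalFunction:
    """Hypothetical stand-in for a reference to a contract method."""

    def __init__(self, name, value=0, gas=None):
        self.name = name
        self._value = value
        self._gas = gas  # None models "forward all remaining gas"

    def value(self, amount):
        # Returns a modified reference; nothing is called yet.
        return ExternalFunction(self.name, amount, self._gas)

    def gas(self, limit):
        # Returns a modified reference; nothing is called yet.
        return ExternalFunction(self.name, self._value, limit)

    def __call__(self, *args):
        # Only the final parentheses perform the call.
        return {"method": self.name, "value": self._value,
                "gas": self._gas, "args": args}


create_token_proxy = ExternalFunction("createTokenProxy")

pending = create_token_proxy.value(100)          # still just a reference
result = create_token_proxy.value(100)("0xabc")  # the call happens here
```

In the last line a reader skimming the code sees `value(100)` followed by an argument list, and has to know the convention to realize that `createTokenProxy` is the method invoked and that, with no `.gas()` modifier, `gas` stays at its "forward everything" default. The address `"0xabc"` is a placeholder.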

A lot of research has gone into creating useful languages that also help the programmer understand the code. Go, C# and Java are good examples of the right balance between human understanding, language expressiveness, and compact notation. Solidity does not seem to be one of them. However, I cannot recommend using Serpent instead, because there is less information and there are fewer examples.

Runtime help

It seems that the security of smart contracts was left entirely to the contract code itself. The VM does not provide any specific service to limit recursion, nor does the Solidity runtime have any semaphore to prevent it. The recursion bug used in the DAO hack had been known since 2014, so there was plenty of time to prepare. But it’s clear that these tools are still immature, and it will take years for high-quality tools to be developed. Decentralized development in Ethereum may have many benefits, but it has also had bad consequences: clear directives are missing, and security-related upgrades have been delayed indefinitely.

At RSK we’re planning to provide a Solidity runtime that prevents contract re-entry by default. Again, this does not solve the full problem, but it prevents most human mistakes.
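
The effect of such a semaphore can be sketched with a toy model. The Python below is not Solidity and does not describe the RSK runtime; `Vault`, `ReentrancyError` and `run_attack` are hypothetical names, and the vault deliberately pays out before clearing the caller's credit, which is the ordering mistake that makes recursion dangerous.

```python
class ReentrancyError(Exception):
    pass


class Vault:
    """Toy contract that pays out before updating its books (the buggy order)."""

    def __init__(self, guarded):
        self.guarded = guarded
        self.locked = False
        self.credit = {}
        self.funds = 0

    def deposit(self, who, amount):
        self.credit[who] = self.credit.get(who, 0) + amount
        self.funds += amount

    def withdraw(self, who, receive):
        if self.guarded:
            if self.locked:
                raise ReentrancyError("re-entry blocked")
            self.locked = True
        try:
            amount = self.credit.get(who, 0)
            if amount > 0 and self.funds >= amount:
                self.funds -= amount
                receive(amount)       # external call: recipient code runs here
                self.credit[who] = 0  # credit is cleared too late
        finally:
            if self.guarded:
                self.locked = False


def run_attack(guarded):
    vault = Vault(guarded)
    vault.deposit("victim", 90)
    vault.deposit("attacker", 10)
    stolen = []

    def receive(amount):
        stolen.append(amount)
        if vault.funds >= amount:
            try:
                vault.withdraw("attacker", receive)  # re-enter before credit is cleared
            except ReentrancyError:
                pass  # a guarded vault refuses the nested call

    vault.withdraw("attacker", receive)
    return sum(stolen), vault.funds
```

With these numbers, `run_attack(False)` lets the attacker drain all 100 units with a credit of only 10, while `run_attack(True)` limits the payout to the attacker's own 10. The guard does not fix the ordering bug; it only stops the recursion from exploiting it.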

Security Audits

Those who have worked on security audits know that no single audit can cover all possible vulnerabilities. Every security researcher or group tends to miss certain kinds of problems, depending on their experience. This is even truer when code must be audited from a completely new perspective (smart contracts), in a new language (Solidity) and against a new class of attacks (such as game-theoretic ones). The number and depth of security audits should be related to the amount of money the audited code must handle. For a new piece of code that is meant to handle hundreds of millions of dollars, several security audits, done by different teams, may be needed. Commissioning several audits is in fact what Ethereum did when it hired LeastAuthority, Dejavu and Coinspect for the platform audit. But The DAO creators did not, and the curators, who include some of the Ethereum founders, should have advised doing so.

Formalization

A security audit does not replace using formal models to design smart contracts and using static/dynamic verifiers to validate the correctness of the code. Formal methods always give stronger guarantees, even though the model may be difficult to get right. Also, domain-specific languages (DSLs) can prevent wide classes of bugs. The problem is that formalization is expensive, so it is often disregarded. That’s another lesson to be learned: secure programming comes at a cost.

We expect new formalization tools specifically for RSK and Ethereum to emerge. However, many tools already exist for traditional programming languages such as Java (e.g. JML, KeY). That’s one of the reasons RSK is building a parallel toolchain so that smart contracts can be programmed in Java, and we expect high-risk contracts to be developed using this toolchain.

Progressive Decentralization

In something as difficult as creating a DAO, neither the correctness of the code nor the dynamics and possible flaws of the voting system can be easily predicted. Organization voting is a complex human process, and as such it will require trial and error before it’s correctly formalized. A sensible approach is to start with a contract that can be easily and immediately upgraded by an (n,m) multi-signature held by a set of m notaries, where n is initially a simple majority, and to progressively increase n until complete notary consensus is required. Finally, once the contract has been fully tested in real-world cases, the multisig feature is automatically removed after some time. Maybe the DAO would have had less traction at the beginning, but in the long term this approach increases the chances of success.
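
A minimal sketch of such a schedule is shown below. The text above does not say how fast n should grow, so the linear increase, the notion of discrete steps and the names `required_signatures` and `can_upgrade` are all assumptions made for illustration.

```python
def required_signatures(m, step, total_steps):
    """Signatures needed out of m notaries at a given step of the schedule.

    Assumed schedule: start at a simple majority (m // 2 + 1) and rise
    linearly to m. Returns None once the schedule has ended, which models
    the multisig upgrade feature having been removed.
    """
    if m < 1 or total_steps < 1 or step < 0:
        raise ValueError("m and total_steps must be positive, step non-negative")
    if step > total_steps:
        return None
    majority = m // 2 + 1
    return majority + (m - majority) * step // total_steps


def can_upgrade(signers, notaries, step, total_steps):
    """True if enough distinct notaries signed to upgrade the contract."""
    needed = required_signatures(len(notaries), step, total_steps)
    if needed is None:
        return False  # upgrade path no longer exists
    return len(set(signers) & set(notaries)) >= needed


notaries = ["a", "b", "c", "d", "e"]
schedule = [required_signatures(len(notaries), s, 4) for s in range(6)]
```

For five notaries and four steps, `schedule` starts at 3 signatures, reaches 5 at the last step, and is `None` afterwards, at which point `can_upgrade` always returns `False`.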

RSK adopted this criterion for its hybrid consensus system: it requires notary acknowledgments at the beginning, but the number of notary acks required decreases as merge-mining engagement rises. We call this approach progressive decentralization, and it can be applied to both smart contracts and consensus systems.

The Disregard of the Risks

Many times in the past, when I handed over a security audit report, I received responses claiming that there was no such vulnerability because the attack conditions are never met, or that the attack is too difficult to perform, or that fixing the problem would have a high impact on usability. Reality tends to show that those arguments are weak. Latent vulnerabilities exist because software changes, so a dubious piece of code that cannot be triggered today can suddenly become vulnerable when the software is later updated. The complexity of an attack is often incorrectly estimated: the attacker weighs the profit against the work required much better than the victim, who does not know the attacker’s capabilities. The impact on usability may become irrelevant if the software is only usable for a short time until it is attacked and nobody ever tries to use it again. The recursive call problem is a clear case where the vulnerability was known and documented, but only a few people took it seriously.

Poor Documentation

Ethereum developers desperately need a specific site for the documentation of design patterns, common mistakes, misconceptions, and best security practices for smart-contract programming.

A resource area fully dedicated to smart contract security has been created and a number of researchers have already been invited to take part in establishing the standards. The mid-term plan is to create a platform for smart-contract hacking challenges and hosting hacking competitions (similar to CTF).

RSK Security Certifying Partners Program

At RSK we’re partnering with security companies to provide baseline security certification for smart contracts. These partnerships will enable startups to test their code against the most common smart-contract flaws, as a first layer of security checks prior to a full code audit. We invite all computer security companies interested in taking part in this partnership initiative to contact us at partners@rsk.co

Concluding Remarks

The smart contract field is still in its infancy and mistakes are inevitable. Until more tools and documentation are available, we should approach smart-contract programming with a defense-in-depth paradigm, adding as many layers of protection as possible to reduce the impact of vulnerabilities. Human errors are amplified by ambiguous language semantics and lack of documentation. We at RSK, and the ecosystem as a whole, are learning from the DAO incident, but at the same time we’re confirming that the RSK approach of using a mature, tested language toolset for contract development is the right path. We also encourage smart contract companies and platforms to perform security audits, and investors and users to carry out rigorous due diligence.