Table of Links
Abstract and 1. Introduction
-
Background
2.1 Ethereum Primer
2.2 Whitelisted Address Verification
2.3 Taint Analysis on Smart Contracts and 2.4 Threat Model
-
Motivating Example and Challenges
3.1 Motivating Example
3.2 Challenges
3.3 Limitations of Existing Tools
-
Design of AVVERIFIER and 4.1 Overview
4.2 Notations
4.3 Component#1: Code Grapher
4.4 Component#2: EVM Simulator
4.5 Component#3: Vulnerability Detector
-
Evaluation
5.1 Experimental Setup & Research Questions
5.2 RQ1: Effectiveness & Efficiency
5.3 RQ2: Characteristics of Real-world Vulnerable Contracts
5.4 RQ3: Real-time Detection
-
Discussion
6.1 Threats to Validity and 6.2 Limitations
6.3 Ethical Consideration
-
Related Work
-
Conclusion, Availability, and References
4.4 Component#2: EVM Simulator
Based upon the F passed by the Grapher, the Simulator is responsible for: 1) maintaining the data structures required by EVM, and 2) updating the taint information. Specifically, S is composed of all user-controllable variables, e.g., ORIGIN, CALLDATA, and BALANCE. For each opcode, the Simulator specifies a set of rules to update both data structures and taint information in a static way. Without loss of generality, EVM opcodes are classified into four categories: stack related, load & store related, call related, and control flow related.
4.4.1 Stack Related Opcodes
4.4.2 Load & Store Related Opcodes
As we mentioned in §2.1, memory and storage are maintained by EVM. Data in them are organized in different ways. Specifically, storage operates like a key-value dictionary, i.e., the values are retrieved by the given key [3]. As for the memory, it can be seen as a flat and contiguous array of bytes. Accessing to an item in memory is achieved by a calculated offset. However, in Ethereum smart contracts, a memory is always sparse. Thus, we are motivated by the method raised by He et al. [41], i.e., abstracting the memory area as a key-value pairs, where key is the offset and value is the corresponding data. To this end, data in memory and offset can be retrieved and indexed uniformly, like Sto[key] or Mem[o f f set].
Load Related Opcodes. This includes MLOAD and SLOAD, i.e., load from memory or storage, respectively. Both of them take a single argument, tar, i.e., the target address, and return ret, indicating the retrieved data. The taint rules are defined as:
That is to say, if any of the val or dest is tainted, the final stored val will be tainted. When maintaining the taint dependency relation, we only consider if val depends on dest. There does not exist an edge from val to val.
4.4.3 Call Related Opcodes
4.4.4 Control Flow Related Opcodes
According to the specification, some of them do not take arguments, like RETURN, STOP, REVERT, and INVALID. Therefore, no taint marks are propagated when simulating these instructions. The Simulator only performs the control flow simulation on them. For example, when the Simulator encounters the INVALID instruction, it halts the simulation of the current path and moves to the next one.
JUMP and JUMPI opcodes are crucial in handling taint propagation as they take the jump destination to explicitly alter the control flow, where JUMPI takes an argument as the condition. Because JUMP can be seen as a special edition of JUMPI, we demonstrate the handling on JUMPI in the following. Specifically, JUMPI takes two arguments, i.e., dest and cond, referring to the jump destination and the jump condition, respectively. It returns no value and directly performs the control flow jump. For each JUMPI, there are two following branches, fallthrough and jumpdest. The former one corresponds to the successive opcode, which is executed once the condition is not met, while the latter one is executed when the condition meets. According to the specification [66], dest should not rely on any arguments, i.e., determined statically during the compilation. Therefore, according to whether the cond is determined dynamically, the Simulator adopts the following rules:
-
If the cond is a concrete number, either fallthrough or jumpdest is chosen by the Simulator deterministically.
-
Otherwise, two paths are all feasible.
(a) If the ancestor of the cond is CALLDATALOAD, prioritize the jumpdest path, i.e., cond = True, to emulate attackers have successfully bypassed the examination.
(b) Otherwise, it takes two paths into consideration.
The branch prioritization in step 2(a) is heuristic. Specifically, once a condition is tainted from a source, it means that it is totally controllable for attackers, like L17 in Listing 3, i.e., attackers can always bypass the assertion by constructing the owner function. On bytecode level, there is a conditional jump at L17, one is to L18 associated with a met condition, another is not shown in Listing 3, indicating the require at L17 fails. Thus, the heuristic in step 2(a) prioritizes the branch to L18, i.e., attackers successfully bypass verifications. Compared to symbolic execution, which collects path conditions along the simulation and queries the back-end SMT solver to obtain a concrete set of solutions, the advantages of this heuristic are twofold. On the one hand, collecting path conditions and asking for solving is time- and resourceconsuming, which is the greatest bottleneck [26]. On the other hand, symbolic execution sometimes even cannot handle complex address verification logic. For example, when handling the L17 in Listing 3, the symbolic executor has to conduct an inter-contract analysis to IVisor to obtain the value of owner(), which cannot be completed within an acceptable time. Though step 2(b) may introduce some false positives, it is a tradeoff between the efficiency and effectiveness. In §5.2, it proves its effectiveness and practicability.
As for the taint propagation rules, we should note that no values are returned by these opcodes. Moreover, they have no side effects on both memory and storage. In summary, no taint marks are propagated through these opcodes.
Authors:
(1) Tianle Sun, Huazhong University of Science and Technology;
(2) Ningyu He, Peking University;
(3) Jiang Xiao, Huazhong University of Science and Technology;
(4) Yinliang Yue, Zhongguancun Laboratory;
(5) Xiapu Luo, The Hong Kong Polytechnic University;
(6) Haoyu Wang, Huazhong University of Science and Technology.