Table of Links
Abstract and 1. Introduction
-
Background
2.1 Ethereum Primer
2.2 Whitelisted Address Verification
2.3 Taint Analysis on Smart Contracts and 2.4 Threat Model
-
Motivating Example and Challenges
3.1 Motivating Example
3.2 Challenges
3.3 Limitations of Existing Tools
-
Design of AVVERIFIER and 4.1 Overview
4.2 Notations
4.3 Component#1: Code Grapher
4.4 Component#2: EVM Simulator
4.5 Component#3: Vulnerability Detector
-
Evaluation
5.1 Experimental Setup & Research Questions
5.2 RQ1: Effectiveness & Efficiency
5.3 RQ2: Characteristics of Real-world Vulnerable Contracts
5.4 RQ3: Real-time Detection
-
Discussion
6.1 Threats to Validity and 6.2 Limitations
6.3 Ethical Consideration
-
Related Work
-
Conclusion, Availability, and References
Considering the aim of implementing a lightweight and effective detector for address verification vulnerability, we exclude the analyzers that adopt dynamic analysis. The reasons for this decision are twofold. On the one hand, dynamic analysis requires a runtime environment for execution, which is resource-intensive and time-consuming, contrary to our lightweight goal. On the other hand, dynamic analysis identifies vulnerabilities with generated test cases and oracles. For intricate contracts, especially for inter-contract analysis, this method might not always cover all vulnerable paths, potentially resulting in false negatives. Consequently, static methods are considered, including pattern-based matching, symbolic execution, and taint analysis. To the best of our knowledge, no existing tools can be directly deployed to detect this vulnerability. Although we can extend them with the ability to detect the vulnerability, we observe some intrinsic limitations.
Pattern-based Matching. It relies on heuristic rules summarized by developers. Lots of preliminary tools adopt this method, identifying vulnerabilities according to opcode sequences [37], transaction histories [5], and call traces [23]. However, such a manual process is unsound and error-prone, which cannot even handle C1, i.e., identifying semantics of bytecode sequences. Consequently, the constant update of Solidity syntax [75] and compilation toolchain [30] make this kind of analyzers ineffective.
Symbolic Execution & Model Checking. Both techniques are widely adopted in identifying vulnerabilities in software analysis. Taking advantages of abstraction on programs, symbolic executors and model checkers can recover some semantic information to overcome C1. However, they are inherently limited by the efficiency issue, mainly brought by the path/state explosion. For example, when dealing with the example shown in Listing 3, both of them should not only traverse all possible functions through the entry to find the depositlike functions, but also get stuck from L16 to L18, where they would try all possible contracts. Because they cannot effectively and efficiently conduct the inter-contract or even inter-procedural analysis, C2 is the main challenge hiders the adoption of them. Moreover, they try to model the memory area in a precise way, i.e., considering every bit in the memory. Although such a precise modeling benefits to the soundness of analyzing results, it comes at the cost of efficiency. In the context of address verification vulnerability, we only need to focus on the propagation of address parameters in the memory area instead of concrete data.
Taint Analysis. Existing taint analyzers cannot be directly adopted to identify this vulnerability. Firstly, some of them rely on the source code [31]. In Ethereum, such contracts only account for less than 1%. Secondly, some of them are limited by their adopted methods when collecting the taint propagation information and tracking taint propagation. For instance, Sereum [66] uses dynamic analysis to monitor execution at runtime. Furthermore, Mythril [70] and Osiris [73] both adopt symbolic execution to statically collect information. Mythril tries to traverse all feasible paths, while Osiris is only intraprocedural. Ethainter [14] is a strong contender. However, it exhibits proficiency in addressing C1 and C2, but it suffers due to the intricacies of EVM’s linear memory model. Specifically, it cannot properly handle dynamic memory allocations and deallocations in EVM’s memory, which potentially compromises the accuracy of its taint tracking, especially when contracts execute complex memory operations or utilize memory for storing and manipulating intermediate data (like the three memory verification mechanisms mentioned in §2.2).
Our Key Idea. As aforementioned, symbolic execution and taint analysis can effectively recover the semantics and conduct the inter-procedural analysis on control flow and data flow dependencies to some extent. However, considering the soundness of the analysis, symbolic execution always suffers the efficiency issue. To this end, we decided to adopt static EVM simulation based taint analysis, which tracks the taint propagation through a static simulation on the bytecode. To reduce the false positives introduced by static simulation and filter out paths that can lead to exploitation, we design a threestage detector corresponding to the three principles mentioned in §3.1. Moreover, to overcome the memory issue suffered by Ethainter, we decide to design the memory model motivated by He et al. [41] to abstract the sparse linear memory into key-value pairs. Consequently, our design can address both C1 and C2. Furthermore, AVVERIFIER simulates stack and memory through a self-implemented EVM simulator to capture the propagation of addresses, ensuring efficiency for non-dynamic data. As for the dynamic memory parameters, AVVERIFIER conservatively treats them as symbols and only guarantees the stack balance to avoid the negative influence brought by enumerating all possible values.
Authors:
(1) Tianle Sun, Huazhong University of Science and Technology;
(2) Ningyu He, Peking University;
(3) Jiang Xiao, Huazhong University of Science and Technology;
(4) Yinliang Yue, Zhongguancun Laboratory;
(5) Xiapu Luo, The Hong Kong Polytechnic University;
(6) Haoyu Wang, Huazhong University of Science and Technology.