Authors:
(1) Diwen Xue, University of Michigan;
(2) Reethika Ramesh, University of Michigan;
(3) Arham Jain, University of Michigan;
(4) Michalis Kallitsis, Merit Network, Inc.;
(5) J. Alex Halderman, University of Michigan;
(6) Jedidiah R. Crandall, Arizona State University/Breakpointing Bad;
(7) Roya Ensafi, University of Michigan.
Table of Links
Abstract and 1 Introduction
2 Background & Related Work
3 Challenges in Real-world VPN Detection
4 Adversary Model and Deployment
5 Ethics, Privacy, and Responsible Disclosure
6 Identifying Fingerprintable Features and 6.1 Opcode-based Fingerprinting
6.2 ACK-based Fingerprinting
6.3 Active Server Fingerprinting
6.4 Constructing Filters and Probers
7 Fine-tuning for Deployment and 7.1 ACK Fingerprint Thresholds
7.2 Choice of Observation Window N
7.3 Effects of Packet Loss
7.4 Server Churn for Asynchronous Probing
7.5 Probe UDP and Obfuscated OpenVPN Servers
8 Real-world Deployment Setup
9 Evaluation & Findings and 9.1 Results for control VPN flows
9.2 Results for all flows
10 Discussion and Mitigations
11 Conclusion
12 Acknowledgement and References
Appendix
6 Identifying Fingerprintable Features
In this section, we identify three features that fingerprint OpenVPN, exploiting byte patterns, packet lengths, and server behavior, respectively.
6.1 Opcode-based Fingerprinting
As shown in Figure 3, each OpenVPN packet has a header of 24 bits in TCP mode or 8 bits in UDP mode, which is not part of the encrypted payload. Each OpenVPN header starts with an opcode that specifies the message type of the current packet and a key ID that refers to a (new) TLS session. The opcode field can take over 10 defined values, corresponding to message types transmitted during different communication stages. A typical OpenVPN session starts with the client sending a Client Reset packet. The server then responds with a Server Reset packet, and a TLS handshake follows. OpenVPN packets that carry TLS ciphertexts have P_Control as their message type. Since OpenVPN can run over UDP but has to provide a reliable channel for TLS, each P_Control packet is explicitly acknowledged by P_ACK packets. Finally, actual payloads are transmitted as P_Data packets. Figure 1 illustrates this packet exchange with opcode annotations.
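As a rough illustration of the header layout described above, the following Python sketch extracts the opcode and key ID from the first bytes of a packet. It assumes the usual OpenVPN layout, in which the opcode occupies the upper 5 bits and the key ID the lower 3 bits of the opcode byte, and in which TCP mode prepends a 16-bit packet length. The function name `parse_header` and the partial `OPCODE_NAMES` table are illustrative choices, not part of the paper.

```python
import struct

# Partial opcode table (assumed from OpenVPN's protocol constants; the paper
# refers to these message types as Client Reset, Server Reset, P_Control,
# P_ACK, and P_Data).
OPCODE_NAMES = {
    4: "P_CONTROL_V1",
    5: "P_ACK_V1",
    6: "P_DATA_V1",
    7: "P_CONTROL_HARD_RESET_CLIENT_V2",
    8: "P_CONTROL_HARD_RESET_SERVER_V2",
}


def parse_header(packet: bytes, tcp: bool):
    """Return (opcode, key_id, length) for an unobfuscated OpenVPN packet.

    In TCP mode the header is 24 bits: a 16-bit length, then the opcode byte.
    In UDP mode the header is only the 8-bit opcode byte, so length is None.
    """
    offset = 2 if tcp else 0
    if len(packet) <= offset:
        raise ValueError("packet too short to contain an OpenVPN header")
    length = struct.unpack("!H", packet[:2])[0] if tcp else None
    opcode_byte = packet[offset]
    opcode = opcode_byte >> 3      # upper 5 bits
    key_id = opcode_byte & 0x07    # lower 3 bits
    return opcode, key_id, length


# A TCP-mode Client Reset: the third byte is 0x38 (opcode 7, key ID 0).
tcp_packet = bytes([0x00, 0x0E, 0x38]) + bytes(13)
opcode, key_id, length = parse_header(tcp_packet, tcp=True)
print(OPCODE_NAMES.get(opcode, "unknown"), key_id, length)
```

The same function applied to a UDP-mode packet reads the opcode byte at offset 0 instead of offset 2.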
A packet field that takes only a small, fixed set of values can be easy to fingerprint, and such fields have been exploited before against other protocols [1]. We fingerprint OpenVPN’s handshake sequence by analyzing the opcode byte of each of the first N packets of a flow (the threshold N is explored in Section 7.2). Algorithm 1 shows the process of opcode fingerprinting, with Opcode referring to the sequence of N opcode values found in the first N packets of a given flow. Briefly, the filter flags a flow if the number of different opcodes observed is consistent with the protocol and the Client and Server Resets are not seen again once the handshake is completed.
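The filter logic summarized above can be sketched in Python as follows. This is a minimal sketch based only on the prose description, not a transcription of Algorithm 1: the function name `flags_as_openvpn`, the bounds `min_unique` and `max_unique`, and the rule that the handshake counts as "completed" once `min_unique` distinct values have been seen are all assumptions made for illustration.

```python
def flags_as_openvpn(opcodes, min_unique=4, max_unique=10):
    """Heuristically decide whether a sequence of per-packet opcode values
    looks like an OpenVPN handshake.

    opcodes: the opcode byte (or its obfuscated value) of the first N packets.
    min_unique / max_unique: assumed bounds on how many distinct values a
    genuine session should produce (hypothetical defaults).
    """
    if len(opcodes) < 2:
        return False

    # Treat the first two packets as the Client Reset and Server Reset.
    client_reset, server_reset = opcodes[0], opcodes[1]
    if client_reset == server_reset:
        return False

    seen = {client_reset, server_reset}
    for op in opcodes[2:]:
        # Assumption: once enough distinct opcodes have appeared, the
        # handshake is under way and a reset value must not reappear.
        if op in (client_reset, server_reset) and len(seen) >= min_unique:
            return False
        seen.add(op)

    # The number of distinct opcodes must be consistent with the protocol.
    return min_unique <= len(seen) <= max_unique


# Client Reset, Server Reset, then control/ACK exchange, then data packets.
session = [7, 8, 4, 5, 4, 5, 4, 4, 5, 6, 6, 6]
print(flags_as_openvpn(session))
```

Because the check compares values only against each other and never against constants from the specification, it behaves identically if every opcode byte is replaced by some other fixed value.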
Previous work and existing open-source DPIs [23, 29, 37, 75] statically match opcode values and packet sizes based on the protocol specification. In contrast, we propose to dynamically capture the variation in opcode values that reflects the establishment of an OpenVPN session. Notably, our heuristics do not require exact matching of opcode values or packet lengths (e.g., they do not require the third byte of the first packet to be 0x38), so they remain effective against XOR-obfuscated flows. The XOR obfuscation masks packet payloads so that the opcode bytes are altered. However, according to the specification [36], when it reverses the packet as one of the obfuscation steps, it excludes the first character of the buffer (where the opcode byte is located) from the reversal, as shown in Figure 4. As such, the opcode byte is always XOR-ed with the same byte of the XOR key, and identical opcodes map to the same value after obfuscation. This behavior is preserved when Tunnelblick […]. Because they consider only the number of unique opcodes seen so far, our heuristics are more flexible and target various XOR-based obfuscations of OpenVPN.
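To make the last point concrete, the sketch below uses a deliberately simplified model of XOR obfuscation, not the actual patch: it reverses every byte except the first and then XORs the buffer with a repeating key. The helper names (`reverse_except_first`, `xor_mask`, `obfuscate`), the key, and the payloads are hypothetical, and any additional steps of the real obfuscation are omitted. The sketch only shows that, under this model, the first byte is always combined with the same key byte, so equal opcode bytes stay equal and the count of distinct values is unchanged.

```python
def reverse_except_first(buf: bytes) -> bytes:
    """Reverse the buffer but leave its first byte in place."""
    return buf[:1] + buf[1:][::-1]


def xor_mask(buf: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key (key must be non-empty)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(buf))


def obfuscate(buf: bytes, key: bytes) -> bytes:
    """Simplified obfuscation model: partial reversal, then key masking."""
    return xor_mask(reverse_except_first(buf), key)


key = b"secret"  # hypothetical XOR key

# Opcode byte followed by placeholder payloads of different lengths.
packets = [
    bytes([0x38]) + b"client reset",
    bytes([0x40]) + b"server reset payload",
    bytes([0x20]) + b"tls",
    bytes([0x20]) + b"more tls bytes",
    bytes([0x28]) + b"ack",
]

plain_first = [p[0] for p in packets]
obf_first = [obfuscate(p, key)[0] for p in packets]

# The values change, but equal opcodes still map to equal bytes.
print([hex(b) for b in plain_first])
print([hex(b) for b in obf_first])
print(len(set(plain_first)) == len(set(obf_first)))
```

A filter that matches the literal value 0x38 would miss the obfuscated flow, while one that only counts distinct first-byte values sees the same pattern in both.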