Authors:
(1) Md Mainuddin, Department of Computer Science, Florida State University, Tallahassee, FL 32306 ([email protected]);
(2) Zhenhai Duan, Department of Computer Science, Florida State University, Tallahassee, FL 32306 ([email protected]);
(3) Yingfei Dong, Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822, USA ([email protected]).
Table of Links
Abstract and 1. Introduction
2. Related Work
3. Background on Autoencoder and SPRT and 3.1. Autoencoder
3.2. Sequential Probability Ratio Test
4. Design of CUMAD and 4.1. Network Model
4.2. CUMAD: Cumulative Anomaly Detection
5. Evaluation Studies and 5.1. Dataset, Features, and CUMAD System Setup
5.2. Performance Results
6. Conclusions and References
Abstract: IoT devices fundamentally lack built-in security mechanisms to protect themselves from security attacks. Existing work on improving IoT security mostly focuses on detecting anomalous behaviors of IoT devices. However, these existing anomaly detection schemes may trigger an overwhelmingly large number of false alerts, rendering them unusable in detecting compromised IoT devices. In this paper we develop an effective and efficient framework, named CUMAD, to detect compromised IoT devices. Instead of directly relying on individual anomalous events, CUMAD aims to accumulate sufficient evidence in detecting compromised IoT devices, by integrating an autoencoder-based anomaly detection subsystem with a sequential probability ratio test (SPRT)-based sequential hypothesis testing subsystem. CUMAD can effectively reduce the number of false alerts in detecting compromised IoT devices, and moreover, it can detect compromised IoT devices quickly. Our evaluation studies based on the public-domain N-BaIoT dataset show that CUMAD can on average reduce the false positive rate from about 3.57% using only the autoencoder-based anomaly detection scheme to about 0.5%; in addition, CUMAD can detect compromised IoT devices quickly, with fewer than 5 observations on average.
1. Introduction
In recent years, Internet of Things (IoT) devices have been increasingly integrated into our daily lives and our society, with notable example environments such as smart homes, healthcare, transportation, and the power grid. On one hand, this rapid development helps to improve the quality and efficiency of our daily lives. On the other hand, this same development also poses potentially unprecedented security and privacy challenges for the Internet, given that most of these IoT devices are low-cost systems with limited computation, memory, and energy resources. These devices often lack proper built-in security mechanisms to protect themselves and are vulnerable to various security attacks.
Many security attacks targeting or based on IoT devices have been reported in the past [1]. In response to the growing problems of IoT security, government agencies such as US NIST have developed many recommendations that manufacturers should adopt to mitigate the security risks associated with IoT devices [2]. In addition, many research efforts have been carried out to improve IoT security, including both proactive approaches to enhancing security mechanisms of IoT devices and more reactive solutions to monitor IoT device behaviors to detect rogue or infected IoT devices [3].
Although some of the recommendations, for example, avoiding default common credentials, are relatively easy to incorporate into IoT device manufacturing and certainly help mitigate IoT security risks, IoT devices are still fundamentally vulnerable to security attacks. As low-cost systems, IoT devices inherently lack the resources to support advanced security mechanisms. In addition, from the perspectives of both manufacturers and users, the objectives of IoT device usability and security often conflict, which discourages the adoption of advanced security mechanisms in IoT devices.
Given these constraints on deploying advanced security mechanisms on IoT devices, network-based solutions have attracted considerable research effort in recent years [3]. In particular, many machine learning (ML) based methods have been developed to detect anomalous network behaviors of IoT devices [3]. (In this paper we use the term ML to refer to both traditional machine learning algorithms such as SVM and deep learning (DL) algorithms such as RNN.) However, most existing solutions only target the problem of anomaly detection in IoT devices [4], instead of detecting compromised IoT devices. Although detecting individual anomalies is of critical importance in certain application domains [5], we note that these solutions may not translate directly into the detection of compromised IoT devices. Given the large amount of network traffic, even a small false positive rate of an anomaly detection method can often translate into a large number of false alerts, rendering the detection method unusable for detecting compromised IoT devices in real-world deployments.
In this paper we develop an effective and efficient framework to detect compromised IoT devices, named CUMAD (cumulative anomaly detection). In essence, CUMAD integrates an autoencoder-based anomaly detection subsystem with a sequential probability ratio test (SPRT)-based sequential hypothesis testing subsystem [6], [7]. In CUMAD, the normal behavior of each IoT device is learnt and modeled by an autoencoder. During training, an autoencoder model learns a latent space representation of the training data. More importantly, by its nature an autoencoder excels at reconstructing inputs that are similar to the data used in training the model, but performs poorly when the new data is very different from the training data, which manifests as large reconstruction errors. Although autoencoders have mainly been used for dimensionality reduction and feature learning in the past, in recent years they have also attracted considerable interest for anomaly detection in many different application domains.
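To make the reconstruction-error idea concrete, the following is a minimal sketch and not the model used in CUMAD. It assumes a hand-fixed linear encoder/decoder with a one-dimensional bottleneck standing in for a trained autoencoder, and an anomaly threshold of the mean plus k standard deviations of the reconstruction errors on benign data. The names `encode`, `decode`, `fit_threshold`, and `is_anomalous`, the value of `k`, and the sample points are all hypothetical.

```python
import statistics

# Hypothetical stand-in for a trained autoencoder: a fixed linear map with a
# 1-D bottleneck along the direction (1, 1)/sqrt(2). A real autoencoder would
# learn its weights from benign traffic features instead.
W = (2 ** -0.5, 2 ** -0.5)

def encode(x):
    """Project a 2-D input onto the 1-D latent space."""
    return x[0] * W[0] + x[1] * W[1]

def decode(z):
    """Map a latent value back to the 2-D input space."""
    return (z * W[0], z * W[1])

def reconstruction_error(x):
    """Mean squared error between an input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(benign, k=3.0):
    """Assumed rule: mean + k * std of the errors observed on benign data."""
    errors = [reconstruction_error(x) for x in benign]
    return statistics.mean(errors) + k * statistics.pstdev(errors)

def is_anomalous(x, threshold):
    """Flag an observation whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x) > threshold

# Hypothetical benign samples that lie close to the line y = x.
benign = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.05), (4.0, 3.9), (5.0, 5.1)]
threshold = fit_threshold(benign)

print(is_anomalous((3.0, 3.0), threshold))  # similar to benign data: False
print(is_anomalous((5.0, 0.5), threshold))  # far from benign data: True
```

Inputs that resemble the benign data are reconstructed almost exactly, while an input far from it yields a large error and is flagged as an individual anomaly.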
Instead of focusing on individual anomalous events detected by the autoencoder, CUMAD aims to accumulate sufficient evidence to determine whether an IoT device has been compromised. In CUMAD, the output of the autoencoder-based anomaly detection subsystem is fed into an SPRT-based sequential hypothesis testing subsystem. Unlike traditional probability ratio test methods that require a pre-defined fixed number of observations to reach a decision, SPRT works in an online manner and updates as observations arrive sequentially. SPRT reaches a conclusion as soon as sufficient evidence has been observed. Therefore, SPRT can make a decision quickly (and consequently, CUMAD can detect compromised IoT devices quickly).
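The sequential test can be sketched as follows. This is a minimal illustration of Wald's SPRT over binary anomaly flags, not the configuration used in CUMAD. It assumes each observation is 1 (flagged as anomalous) or 0 (not flagged). The parameters `theta0` and `theta1` (the probability of a flag for a normal and for a compromised device) and the desired error rates `alpha` and `beta` are hypothetical values chosen for illustration.

```python
import math

def sprt(observations, theta0=0.05, theta1=0.9, alpha=0.01, beta=0.01):
    """Wald's SPRT on a stream of 0/1 anomaly flags.

    H0: device is normal      (flag probability theta0)
    H1: device is compromised (flag probability theta1)
    Returns (decision, number of observations used).
    """
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this bound
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this bound
    llr = 0.0                              # cumulative log-likelihood ratio
    n = 0
    for n, flagged in enumerate(observations, start=1):
        if flagged:
            llr += math.log(theta1 / theta0)
        else:
            llr += math.log((1 - theta1) / (1 - theta0))
        if llr >= upper:
            return "compromised", n
        if llr <= lower:
            return "normal", n
    return "undecided", n                  # not enough evidence yet

print(sprt([1]))           # ('undecided', 1): one anomaly alone is not enough
print(sprt([1, 1]))        # ('compromised', 2)
print(sprt([0, 0, 0]))     # ('normal', 3)
print(sprt([1, 0, 1, 1]))  # ('compromised', 4)
```

Under these assumed parameters a single flagged observation never triggers an alert, which is how accumulating evidence suppresses isolated false positives, while repeated flags cross the upper bound after only a few observations.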
In this paper we develop the CUMAD framework and evaluate its performance using the public-domain IoT dataset N-BaIoT [8], which contains both benign and attack (Mirai and Bashlite) traffic of IoT devices. Our evaluation studies show that CUMAD greatly reduces the false positive rate in detecting compromised IoT devices. For example, compared to the simple autoencoder-based anomaly detection system, CUMAD on average reduces the false positive rate from about 3.57% to about 0.5%, roughly a 7-fold reduction. In addition, CUMAD can detect a compromised IoT device quickly, with fewer than 5 sequential observations on average. We note that although both autoencoders and SPRT have been used to develop anomaly detection systems before, to our knowledge we are the first to integrate the two techniques to detect compromised IoT devices, rather than using them separately for anomaly detection. In addition, we are the first to introduce the notion of cumulative anomaly in detecting compromised IoT devices (see Section 2 for more details).
The remainder of the paper is organized as follows. In Section 2 we discuss related work. We present the background on autoencoder and SPRT in Section 3. We describe the design of CUMAD in Section 4, and evaluate its performance in Section 5. We conclude the paper in Section 6.
This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.