Authors:
(1) Md Mainuddin, Department of Computer Science, Florida State University, Tallahassee, FL 32306 ([email protected]);
(2) Zhenhai Duan, Department of Computer Science, Florida State University, Tallahassee, FL 32306 ([email protected]);
(3) Yingfei Dong, Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822, USA ([email protected]).
Table of Links
Abstract and 1. Introduction
2. Related Work
3. Background on Autoencoder and SPRT and 3.1. Autoencoder
3.2. Sequential Probability Ratio Test
4. Design of CUMAD and 4.1. Network Model
4.2. CUMAD: Cumulative Anomaly Detection
5. Evaluation Studies and 5.1. Dataset, Features, and CUMAD System Setup
5.2. Performance Results
6. Conclusions and References
The problem of anomaly detection has been studied in many different application domains, and many techniques have been proposed based on statistical inference, data mining, signal processing, and, more recently, machine learning, among others. We note that in the anomaly detection literature, anomalies have been classified into three categories: point anomalies, contextual anomalies, and collective anomalies [5]. However, all three are concerned with the detection of individual anomalous events, which differ from the cumulative anomaly we consider in this paper. In cumulative anomaly detection, we are concerned with the cause of anomalous events (for example, a compromised IoT device) rather than with the individual anomalous events themselves. As a consequence, we need to accumulate sufficient evidence (individual anomalous events) to reach a conclusion (for example, whether an IoT device is compromised).
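As a concrete illustration of this evidence-accumulation idea, the following sketch applies Wald's sequential probability ratio test (SPRT, covered in Section 3.2) to a stream of binary anomaly indicators. The event probabilities `p0`, `p1` and the error targets `alpha`, `beta` are illustrative values, not parameters from this paper.

```python
import math

# Hypotheses on the per-observation anomaly probability:
#   H0 (device benign):      P(anomalous event) = p0
#   H1 (device compromised): P(anomalous event) = p1 > p0
p0, p1 = 0.05, 0.6
alpha, beta = 0.01, 0.01            # target false-alarm / miss rates

# Wald's stopping boundaries on the cumulative log-likelihood ratio.
A = math.log((1 - beta) / alpha)    # conclude "compromised" at or above A
B = math.log(beta / (1 - alpha))    # conclude "benign" at or below B

def sprt(events):
    """events: iterable of 0/1 anomaly indicators from a detector."""
    llr, n = 0.0, 0
    for x in events:
        n += 1
        # Add the log-likelihood ratio of this observation.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= A:
            return "compromised", n
        if llr <= B:
            return "benign", n
    return "undecided", n
```

A run dominated by anomalous events crosses the upper boundary after only a few observations, while a clean run drifts down to the lower boundary; either way, a decision is reached from accumulated evidence rather than from any single event.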
Given the importance of improving IoT security, many security attack detection techniques have been proposed, including various ML-based solutions [3], [9]. However, some of them require training data containing both benign and attack traffic, and therefore cannot detect new security attacks. Others developed anomaly-detection-based schemes to detect anomalous traffic originating from IoT devices. However, as discussed in Section 1, such schemes often trigger a large number of false alerts, rendering them unusable for detecting compromised IoT devices in real-world deployments.
In [10], Gelenbe and Nakip developed an online scheme, CDIS, to detect compromised IoT devices based on auto-associative learning. However, the design of CDIS was tailored to the Mirai botnet and may not be effective in detecting other types of compromised IoT devices. In addition, CDIS still targeted only individual anomalous events, rather than the cumulative anomaly detection we perform in this paper. The authors of [11] developed DÏoT, a federated self-learning-based scheme to detect compromised IoT devices, in which local security gateways communicate with a remote IoT Security Service to build a more comprehensive normal-traffic model of IoT devices. To further reduce the false alerts generated by the aggregated anomaly detection model, a window-based scheme was adopted, in which an anomaly alarm is triggered only if the fraction of anomalous packets in a window exceeds a pre-defined threshold. In [8], Meidan et al. presented N-BaIoT, an autoencoder-based anomaly detection system for detecting compromised IoT devices. N-BaIoT also tried to reduce the number of false alerts triggered by the pure anomaly detection system, using a window-based scheme with a majority vote to reach a decision.
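The window-based filtering idea shared by these systems can be sketched generically as follows; the window size and threshold here are hypothetical illustrative values, not the parameters used by DÏoT or N-BaIoT.

```python
from collections import deque

def windowed_alarm(flags, window=10, threshold=0.5):
    """Generic window-based alert filter (illustrative): raise an alarm
    only when the fraction of per-packet anomaly flags inside a full
    sliding window exceeds `threshold`, suppressing isolated false
    positives from the underlying anomaly detector."""
    buf = deque(maxlen=window)          # holds the most recent `window` flags
    alarms = []
    for f in flags:
        buf.append(f)
        # Alarm only once the window is full and mostly anomalous.
        alarms.append(len(buf) == window and sum(buf) / window > threshold)
    return alarms
```

A lone anomalous packet in an otherwise clean window never trips the alarm; a sustained burst of anomalous packets does.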
3. Background on Autoencoder and SPRT
In this section we provide the necessary background on autoencoders and the sequential probability ratio test (SPRT) for understanding the development of the proposed CUMAD framework. We refer interested readers to [6] and [7], respectively, for a detailed treatment of these two topics.
3.1. Autoencoder
An autoencoder is an unsupervised neural network that aims to reconstruct its input at the output. Figure 1 illustrates a simple standard (undercomplete) autoencoder.
An autoencoder can be considered as consisting of two components: an encoder f and a decoder g. Given input data x, the encoder function f maps x to a latent-space representation, or code h, that is, h = f(x). Using the corresponding code h as the input, the decoder function g tries to reconstruct the original input x at its output x′, that is, x′ = g(h). Combining the encoder and decoder functions, we have x′ = g(f(x)). Let L(x, x′) be the reconstruction error, that is, the difference between x and x′. The autoencoder aims to minimize L(x, x′). We note that there are different definitions of L(x, x′); one of the most common is the mean squared error (MSE). We also note that in the example autoencoder of Figure 1, both the encoder and decoder have only one hidden layer. This is only for illustration purposes; in reality they can have many hidden layers, depending on the specific application requirements.
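A minimal sketch of this encoder/decoder structure, using a one-layer linear autoencoder trained by gradient descent on the MSE loss; the toy data, dimensions, and training settings are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in 4 dimensions lying on a 2-dimensional
# subspace, so a 2-unit code h can represent them well.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 4))

We = rng.normal(scale=0.3, size=(4, 2))   # encoder weights: h = f(x) = x @ We
Wd = rng.normal(scale=0.3, size=(2, 4))   # decoder weights: x' = g(h) = h @ Wd

def mse(A, B):
    """Mean squared error, a common choice for L(x, x')."""
    return float(np.mean((A - B) ** 2))

err0 = mse(X, X @ We @ Wd)                # reconstruction error before training
lr = 0.02
for _ in range(5000):
    H = X @ We                            # code h = f(x)
    E = H @ Wd - X                        # residual x' - x
    gWd = H.T @ E / len(X)                # gradient of the loss w.r.t. Wd
    gWe = X.T @ (E @ Wd.T) / len(X)       # gradient of the loss w.r.t. We
    Wd -= lr * gWd
    We -= lr * gWe

err1 = mse(X, X @ We @ Wd)                # error after training: much smaller
```

Real autoencoders add nonlinear activations and multiple hidden layers, but the objective is the same: make x′ = g(f(x)) close to x.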
Autoencoders have traditionally been used in applications of dimensionality reduction and feature learning, by focusing on the compressed code of an autoencoder, which holds the latent-space representation of the original data. At the same time, autoencoders possess a few desirable properties that make them an attractive candidate for anomaly detection. For example, an autoencoder is able to extract the salient features of the original data and remove dependencies within it. More importantly, an autoencoder can only learn the properties or distributions of the data it has seen during the training stage, that is, the data points in the training dataset. It excels at reconstructing data that are similar to the training data, but performs poorly, in terms of the reconstruction error L(x, x′), on data that are very different from the training data.
This is an appealing property of autoencoders for anomaly detection. For example, in the context of detecting compromised IoT devices, we can establish the normal behavioral model of an IoT device by training an autoencoder on benign network traffic collected before the device is compromised. We can then continuously monitor the IoT device by passing its network traffic through the trained model. If the reconstruction error is no greater than a pre-specified threshold, we consider the corresponding network traffic benign; if the reconstruction error exceeds the threshold, we flag the traffic as anomalous.
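This threshold-based detection procedure can be sketched as follows. For brevity, the sketch uses a closed-form linear autoencoder (an SVD projection, equivalent to PCA) as a stand-in for a trained model; the synthetic "traffic features" and the 99th-percentile threshold are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Benign "traffic features": 3-dimensional points near a 1-dimensional
# subspace, mimicking structured normal behavior.
benign = np.outer(rng.normal(size=300), [1.0, 2.0, -1.0])
benign += rng.normal(scale=0.05, size=benign.shape)

# Closed-form linear autoencoder: encode by projecting onto the top
# principal direction, decode by mapping back.
mean = benign.mean(axis=0)
_, _, Vt = np.linalg.svd(benign - mean, full_matrices=False)
W = Vt[:1].T                          # 3x1 code direction

def recon_error(x):
    h = (x - mean) @ W                # encode: h = f(x)
    xr = h @ W.T + mean               # decode: x' = g(h)
    return np.sum((x - xr) ** 2, axis=-1)

# Pre-specified threshold: here, the 99th percentile of the
# reconstruction errors observed on benign training traffic.
tau = np.percentile(recon_error(benign), 99)

# Off-subspace points reconstruct poorly and exceed the threshold.
anomalous = rng.normal(loc=3.0, size=(50, 3))
flagged = recon_error(anomalous) > tau
```

With a trained nonlinear autoencoder, the procedure is identical: only `recon_error` changes.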
This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.