Why Log Semantics Matter More Than Sequence Data In Detecting Anomalies

Table of links

Abstract

1 Introduction

2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

2.2 Supervised vs. Unsupervised

2.3 Information within Log Data

2.4 Fixed-Window Grouping

2.5 Related Works

3 A Configurable Transformer-based Anomaly Detection Approach

3.1 Problem Formulation

3.2 Log Parsing and Log Embedding

3.3 Positional & Temporal Encoding

3.4 Model Structure

3.5 Supervised Binary Classification

4 Experimental Setup

4.1 Datasets

4.2 Evaluation Metrics

4.3 Generating Log Sequences of Varying Lengths

4.4 Implementation Details and Experimental Environment

5 Experimental Results

5.1 RQ1: How does our proposed anomaly detection model perform compared to the baselines?

5.2 RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?

5.3 RQ3: How much do the different types of information individually contribute to anomaly detection?

6 Discussion

7 Threats to validity

8 Conclusions and References

6 Discussion

We discuss the lessons learned from our experimental results.

Semantic information contributes to anomaly detection

The findings of this study confirm the efficacy of utilizing semantic information within log messages for log-based anomaly detection. Recent studies show classical machine learning models and simple log representation (vectorization) techniques can outperform complex DL counterparts [7, 23]. In these simple approaches, log events within log data are substituted with event IDs or tokens, and semantic information is lost. However, according to our experimental results, the semantic information is valuable for subsequent models to distinguish anomalies, while the event occurrence information is also prominent.
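To make the distinction concrete, the following minimal sketch shows what an event-occurrence representation keeps and what it discards. The event IDs, vocabulary, and function name are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
from collections import Counter


def event_count_vector(event_ids, vocabulary):
    """Count how often each known event ID occurs in a log sequence.

    The result depends only on which events occur and how often,
    not on their order or on the wording of their templates.
    """
    counts = Counter(event_ids)
    return [counts.get(event_id, 0) for event_id in vocabulary]


# Hypothetical event IDs assigned after log parsing.
vocabulary = ["E1", "E2", "E3"]
seq_a = ["E1", "E2", "E2", "E3"]
seq_b = ["E3", "E2", "E1", "E2"]  # same events as seq_a, different order

vec_a = event_count_vector(seq_a, vocabulary)
vec_b = event_count_vector(seq_b, vocabulary)
print(vec_a, vec_b)
```

Both sequences map to the same vector, so a model that consumes only such counts cannot use ordering, and because each template is reduced to an opaque ID, it cannot use the template text either.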

We call for future contributions of new, high-quality datasets that can be combined with our flexible approach to evaluate the influence of different components in logs for anomaly detection

The results of our study confirm the findings of recent works [16, 23]. Most anomalies may not be associated with sequential information within log sequences. Instead, the occurrence of certain log templates and the semantics within log templates contribute to the anomalies. This finding highlights the importance of employing new datasets to validate the recent designs of DL models (e.g., LSTM [10], Transformer [11]). Moreover, our flexible approach can be used off-the-shelf with the new datasets to evaluate the influences of different components and contribute to high-quality anomaly detection that leverages the full capacity of logs.

The publicly available log datasets that are well-annotated for anomaly detection are limited, which greatly hinders the evaluation and development of anomaly detection approaches that have practical impacts. Except for the HDFS dataset, whose anomaly annotations are session-based, the existing public datasets contain annotations for each log entry within log data, which implies the anomalies are only associated with certain specific log events or associated parameters within the events. Under this setting, the causality or sequential information that may imply anomalous behaviors is ignored.

7 Threats to validity

We have identified the following threats to the validity of our findings:

Construct Validity

In our proposed anomaly detection method, we adopt the Drain parser to parse the log data. Although the Drain parser performs well and generates relatively accurate parsing results, parsing errors still exist. Such errors may affect the generation of log event embeddings (i.e., logs from the same log event may receive different embeddings) and thus the performance of the anomaly detection model. To mitigate this threat, we pass extra regular expressions for each dataset to the parser. These regular expressions help the parser filter known dynamic fields in log messages and thus achieve more accurate results.
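The idea of filtering known dynamic fields can be sketched as a standalone preprocessing step. The patterns, placeholders, and sample messages below are assumptions made for illustration; the paper does not list its actual expressions, and this sketch does not use the Drain API.

```python
import re

# Hypothetical masking rules (assumed, not taken from the paper).
# Order matters: specific patterns run before the generic number rule.
MASKING_PATTERNS = [
    (re.compile(r"blk_-?\d+"), "<BLK>"),                            # HDFS-style block IDs
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),  # IPv4 address, optional port
    (re.compile(r"\b\d+\b"), "<NUM>"),                              # remaining standalone integers
]


def mask_dynamic_fields(message):
    """Replace known dynamic fields with placeholders before parsing."""
    for pattern, placeholder in MASKING_PATTERNS:
        message = pattern.sub(placeholder, message)
    return message


first = "Received block blk_-1608999687919862906 of size 91178 from /10.250.19.102"
second = "Received block blk_7503483334202473044 of size 67108864 from /10.251.215.16"
print(mask_dynamic_fields(first))
print(mask_dynamic_fields(second))
```

After masking, the two messages become identical, which makes it more likely that a parser assigns them to the same log event and that they therefore receive the same embedding.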

Internal Validity

There are various hyperparameters involved in our proposed anomaly detection model and experiment settings: 1) In the process of generating samples for both training and test sets, we define minimum and maximum lengths, along with step sizes, to generate log sequences of varying lengths. We do not have prior knowledge about the range of sequence lengths in which anomalies may reside. However, we set these parameters according to the common practices of previous studies, which adopt fixed-length grouping. 2) The Transformer-based anomaly detection model entails numerous hyperparameters, such as the number of transformer layers, the number of attention heads, and the size of the fully-connected layer. As the number of combinations is huge, we were not able to perform a grid search. However, we referred to the settings of similar models and experimented with different combinations of hyperparameters, selecting the best-performing combination accordingly.
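One way to picture the role of the minimum length, maximum length, and step size is the sketch below. It is an assumed procedure that cycles through the candidate window lengths over a chronologically ordered log; the function name and the cycling rule are hypothetical, and the paper's own generation procedure is described in its Section 4.3.

```python
def split_into_varying_windows(events, min_len, max_len, step):
    """Split an ordered list of log events into consecutive windows whose
    lengths cycle through min_len, min_len + step, ..., up to max_len.

    Assumed illustration only; the last window may be shorter than requested.
    """
    if min_len <= 0 or step <= 0 or max_len < min_len:
        raise ValueError("require 0 < min_len <= max_len and step > 0")
    lengths = list(range(min_len, max_len + 1, step))
    windows = []
    start = 0
    index = 0
    while start < len(events):
        length = lengths[index % len(lengths)]
        windows.append(events[start:start + length])
        start += length
        index += 1
    return windows


# Ten placeholder events, window lengths cycling through 2, 3, 4.
windows = split_into_varying_windows(list(range(10)), min_len=2, max_len=4, step=1)
print(windows)
```

Changing any of the three parameters changes which events end up in the same sample, which is why these settings are listed as a threat to internal validity.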

External Validity

In this study, we conducted experiments on four public log datasets for anomaly detection. Some findings and conclusions obtained from our experimental results are constrained to the studied datasets. However, the studied datasets are the most widely used ones for evaluating log-based anomaly detection models and have become the de facto standard for evaluation. As the annotation of log datasets demands a lot of human effort, there are only a few publicly available datasets for log-based anomaly detection tasks. The studied datasets are representative, so the findings shed light on prevalent challenges in anomaly detection.

Reliability

The reliability of our findings may be influenced by the reproducibility of results, as variations in dataset preprocessing, hyperparameter tuning, and log parsing configurations across different implementations could lead to discrepancies. To mitigate this threat, we adhered to widely used preprocessing procedures and hyperparameter settings, which are detailed in the paper. However, even minor differences in experimental setups or parser configurations may yield divergent outcomes, potentially impacting the consistency of the model’s performance across independent studies.

8 Conclusions and References

The existing log-based anomaly detection approaches have used different types of information within log data. However, it remains unclear how these different types of information contribute to the identification of anomalies. In this study, we first propose a Transformer-based anomaly detection model, with which we conduct experiments with different input feature combinations to understand the role of different information in detecting anomalies within log sequences. The experimental results demonstrate that our proposed approach achieves competitive and more stable performance compared to simple machine learning models when handling log sequences of varying lengths. With the proposed model and the studied datasets, we find that sequential and temporal information do not contribute to the overall performance of anomaly detection when the event occurrence information is present. The event occurrence information is the most prominent feature for identifying anomalies, while the inclusion of semantic information from log templates is helpful for anomaly detection models. Our results and findings generally confirm those of recent empirical studies and indicate the deficiency of using the existing public datasets to evaluate anomaly detection methods, especially deep learning models. Our work highlights the need to utilize new datasets that contain different types of anomalies and align more closely with real-world systems to evaluate anomaly detection models. Our flexible approach can be readily applied to such new datasets to evaluate the influences of different components and enhance anomaly detection by leveraging the full capacity of log information.

:::info
Supplementary information: The source code of the proposed method is publicly available in our supplementary material package 1.

:::

Acknowledgements

We would like to gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2021-03900) and the Fonds de recherche du Québec – Nature et technologies (FRQNT, 326866) for their funding support for this work.

References

[1] He, S., Zhu, J., He, P., Lyu, M.R.: Experience report: System log analysis for anomaly detection. In: 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pp. 207–218 (2016). IEEE

[2] Oliner, A., Ganapathi, A., Xu, W.: Advances and challenges in log analysis. Communications of the ACM 55(2), 55–61 (2012)

[3] He, S., He, P., Chen, Z., Yang, T., Su, Y., Lyu, M.R.: A survey on automated log analysis for reliability engineering. ACM computing surveys (CSUR) 54(6), 1–37 (2021)

[4] Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., Lyu, M.R.: Tools and benchmarks for automated log parsing. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 121–130 (2019). IEEE

[5] Chen, Z., Liu, J., Gu, W., Su, Y., Lyu, M.R.: Experience report: Deep learning-based system log analysis for anomaly detection. arXiv preprint arXiv:2107.05908 (2021)

[6] Nedelkoski, S., Bogatinovski, J., Acker, A., Cardoso, J., Kao, O.: Self-attentive classification-based anomaly detection in unstructured logs. In: 2020 IEEE International Conference on Data Mining (ICDM), pp. 1196–1201 (2020). IEEE

[7] Wu, X., Li, H., Khomh, F.: On the effectiveness of log representation for log-based anomaly detection. Empirical Software Engineering 28(6), 137 (2023)

[8] Xu, W., Huang, L., Fox, A., Patterson, D., Jordan, M.I.: Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 117–132 (2009)

[9] Lou, J.-G., Fu, Q., Yang, S., Xu, Y., Li, J.: Mining invariants from console logs for system problem detection. In: 2010 USENIX Annual Technical Conference (USENIX ATC 10) (2010)

Footnote 1: https://github.com/mooselab/suppmaterial-CfgTransAnomalyDetector

[10] Du, M., Li, F., Zheng, G., Srikumar, V.: Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1285–1298 (2017)

[11] Le, V.-H., Zhang, H.: Log-based anomaly detection without log parsing. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 492–504 (2021). IEEE

[12] Guo, H., Yang, J., Liu, J., Bai, J., Wang, B., Li, Z., Zheng, T., Zhang, B., Peng, J., Tian, Q.: Logformer: A pre-train and tuning pipeline for log anomaly detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 135–143 (2024)

[13] He, S., Lin, Q., Lou, J.-G., Zhang, H., Lyu, M.R., Zhang, D.: Identifying impactful service system problems via log analysis. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 60–70 (2018)

[14] Farzad, A., Gulliver, T.A.: Unsupervised log message anomaly detection. ICT Express 6(3), 229–237 (2020)

[15] Le, V.-H., Zhang, H.: Log-based anomaly detection with deep learning: how far are we? In: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), pp. 1356–1367 (2022). IEEE

[16] Landauer, M., Skopik, F., Wurzenberger, M.: A critical review of common log data sets used for evaluation of sequence-based anomaly detection techniques. Proceedings of the ACM on Software Engineering 1(FSE), 1354–1375 (2024)

[17] Zhu, J., He, S., He, P., Liu, J., Lyu, M.R.: Loghub: A large collection of system log datasets for ai-driven log analytics. In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp. 355–366 (2023). IEEE

[18] Bodik, P., Goldszmidt, M., Fox, A., Woodard, D.B., Andersen, H.: Fingerprinting the datacenter: automated classification of performance crises. In: Proceedings of the 5th European Conference on Computer Systems, pp. 111–124 (2010)

[19] Chen, M., Zheng, A.X., Lloyd, J., Jordan, M.I., Brewer, E.: Failure diagnosis using decision trees. In: International Conference on Autonomic Computing, 2004. Proceedings., pp. 36–43 (2004). IEEE

[20] Liang, Y., Zhang, Y., Xiong, H., Sahoo, R.: Failure prediction in ibm bluegene/l event logs. In: Seventh IEEE International Conference on Data Mining (ICDM 2007), pp. 583–588 (2007). IEEE

[21] Guo, H., Yuan, S., Wu, X.: Logbert: Log anomaly detection via bert. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2021). IEEE

[22] Lin, Q., Zhang, H., Lou, J.-G., Zhang, Y., Chen, X.: Log clustering based problem identification for online service systems. In: Proceedings of the 38th International Conference on Software Engineering Companion, pp. 102–111 (2016)

[23] Yu, B., Yao, J., Fu, Q., Zhong, Z., Xie, H., Wu, Y., Ma, Y., He, P.: Deep learning or classical machine learning? an empirical study on log-based anomaly detection. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, pp. 1–13 (2024)

[24] He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: An online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 33–40 (2017). IEEE

[25] Reimers, N., Gurevych, I.: Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019)

[26] Hugging Face: all-MiniLM-L6-v2 Model. Accessed: April 8, 2024. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

[27] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

[28] Irie, K., Zeyer, A., Schlüter, R., Ney, H.: Language modeling with deep transformers. arXiv preprint arXiv:1905.04226 (2019)

[29] Haviv, A., Ram, O., Press, O., Izsak, P., Levy, O.: Transformer language models without positional encodings still learn positional information. In: Goldberg, Y., Kozareva, Z., Zhang, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 1382–1390. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (2022). https://doi.org/10.18653/v1/2022.findings-emnlp.99. https://aclanthology.org/2022.findings-emnlp.99

[30] Kazemi, S.M., Goel, R., Eghbali, S., Ramanan, J., Sahota, J., Thakur, S., Wu, S., Smyth, C., Poupart, P., Brubaker, M.: Time2vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321 (2019)

[31] Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

[32] Oliner, A., Stearley, J.: What supercomputers say: A study of five system logs. In: 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’07), pp. 575–584 (2007). IEEE

:::info
Authors:

Xingfang Wu
Heng Li
Foutse Khomh

:::

:::info
This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
