Authors:
(1) Mononito Goswami, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA ([email protected]);
(2) Konrad Szafer, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;
(3) Arjun Choudhry, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;
(4) Yifu Cai, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA;
(5) Shuo Li, University of Pennsylvania, Philadelphia, USA;
(6) Artur Dubrawski, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA.
Table of Links
- Abstract and 1. Introduction
- Related Work
- Methodology
- Experimental Setup and Results
- Conclusion and Future Work
- Acknowledgments, Reproducibility Statement, Impact Statement, and References
Abstract
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time series, called the Time Series Pile, and systematically tackle time series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. We also present several interesting empirical observations about large pre-trained time series models. The pre-trained models (AutonLab/MOMENT-1-large) and the Time Series Pile (AutonLab/Timeseries-PILE) are available at https://huggingface.co/AutonLab.
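For readers who want to try the released artifacts, the snippet below sketches one way to fetch them from the Hugging Face Hub. The `momentfm` package and its `MOMENTPipeline.from_pretrained` / `init` interface are assumptions based on the public release, not something specified in this paper.

```python
# Illustrative sketch (assumed interfaces): download the Time Series Pile and
# load the pre-trained MOMENT-1-large model from the Hugging Face Hub.
# Requires: pip install huggingface_hub momentfm
from huggingface_hub import snapshot_download
from momentfm import MOMENTPipeline  # interface assumed from the open-source release

# The Time Series Pile is hosted as a dataset repository; snapshot_download
# fetches the raw files into a local cache directory.
pile_dir = snapshot_download(repo_id="AutonLab/Timeseries-PILE", repo_type="dataset")

# Load MOMENT-1-large; "reconstruction" is one of the task heads exposed by the
# released package (used, e.g., for imputation and anomaly detection).
model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={"task_name": "reconstruction"},
)
model.init()  # assumed helper that materializes the task-specific head
print(pile_dir)
```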
1. Introduction
Time series analysis is an important field encompassing a wide range of applications, from forecasting weather patterns (Schneider & Dickinson, 1974) and detecting irregular heartbeats using electrocardiograms (Goswami et al., 2021), to identifying anomalous software deployments (Xu et al., 2018). Due to its significant practical value and the unique challenges that modeling time series data poses, time series analysis continues to receive substantial interest from academia and industry alike. However, modeling such data typically requires substantial domain expertise, time, and task-specific design.
Large pre-trained language (Touvron et al., 2023; Devlin et al., 2019; Chung et al., 2022), vision (Li et al., 2023a), and video (Day et al., 2023) models typically perform well on a variety of tasks on data from diverse domains, with little or no supervision, and they can be specialized to perform well on specific tasks. We unlock these key capabilities for time series data and release the first family of open-source large pre-trained time series models, which we call MOMENT. The models in this family (1) serve as a building block for diverse time series analysis tasks (e.g., forecasting, classification, anomaly detection, and imputation), (2) are effective out-of-the-box, i.e., with no (or few) task-specific exemplars (enabling, e.g., zero-shot forecasting and few-shot classification), and (3) are tunable using in-distribution and task-specific data to improve performance.
MOMENT is a family of high-capacity transformer models, pre-trained using a masked time series prediction task on large amounts of time series data drawn from diverse domains. Below we summarize our key contributions.
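To make the pre-training objective concrete, below is a minimal PyTorch sketch of masked time series prediction: fixed-length patches are embedded, a random subset is replaced by a learnable mask embedding, and a Transformer encoder is trained to reconstruct the masked patches. This is a simplified stand-in, not MOMENT's exact architecture, patching, or masking scheme.

```python
import torch
import torch.nn as nn

class MaskedPatchModel(nn.Module):
    """Toy masked time series prediction: embed fixed-length patches, replace a
    random subset with a learnable [MASK] embedding, encode with a Transformer,
    and reconstruct the original patch values."""
    def __init__(self, patch_len=8, d_model=64, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, x, mask_ratio=0.3):
        # x: (batch, length) univariate series; length divisible by patch_len
        patches = x.unfold(1, self.patch_len, self.patch_len)   # (B, P, patch_len)
        tokens = self.embed(patches)                             # (B, P, d_model)
        mask = torch.rand(tokens.shape[:2], device=x.device) < mask_ratio
        tokens[mask] = self.mask_token                           # hide masked patches
        recon = self.head(self.encoder(tokens))                  # (B, P, patch_len)
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2)[mask].mean()

model = MaskedPatchModel()
loss = model(torch.randn(4, 64))   # 4 random series of length 64
loss.backward()
```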
C1: Pre-training data. A key limiting factor for pre-training large time series models from scratch has been the lack of a large, cohesive public time series data repository (Zhou et al., 2023; Gruver et al., 2023; Jin et al., 2023; Ekambaram et al., 2024; Cao et al., 2023). Therefore, we compiled the Time Series Pile, a large collection of publicly available data from diverse domains, ranging from healthcare to engineering to finance. The Time Series Pile comprises over 5 public time series databases, drawn from several diverse domains, for pre-training and evaluation (Tab. 11).
C2: Multi-dataset pre-training. Unlike text and images, which have largely consistent sampling rates and numbers of channels, time series frequently vary in their temporal resolution, number of channels[1], length, and amplitude, and sometimes have missing values. As a result, large-scale mixed-dataset pre-training is largely unexplored: most methods are instead trained on a single dataset and transferred across datasets, with only modest success (Wu et al., 2023; Oreshkin et al., 2021; Narwariya et al., 2020).
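As an illustration of one common recipe for coping with such heterogeneity (treating each channel as an independent univariate series, and cropping or padding every sample to a fixed length with an observation mask), consider the hypothetical helper below; it is not the paper's exact pre-processing pipeline.

```python
import numpy as np

def to_fixed_length_univariate(series_list, target_len=512):
    """Illustrative pre-processing for heterogeneous collections: treat each
    channel of a multivariate series as an independent univariate sample,
    left-pad short series, crop long ones, and return an observation mask so
    padded or missing values can be ignored downstream.
    (A common recipe for mixed-dataset training, not the paper's pipeline.)"""
    samples, masks = [], []
    for series in series_list:                       # series: (length, n_channels)
        series = np.atleast_2d(np.asarray(series, dtype=np.float32))
        if series.shape[0] < series.shape[1]:
            series = series.T                        # heuristic: ensure (length, channels)
        for channel in series.T:                     # channel-independence
            x = channel[-target_len:]                # crop to the last target_len steps
            mask = ~np.isnan(x)                      # missing values -> mask = 0
            x = np.nan_to_num(x)
            pad = target_len - len(x)
            samples.append(np.pad(x, (pad, 0)))      # left-pad to target_len
            masks.append(np.pad(mask, (pad, 0)))
    return np.stack(samples), np.stack(masks).astype(np.float32)

X, M = to_fixed_length_univariate(
    [np.random.randn(300, 3), np.random.randn(900, 1)]  # different lengths / channels
)
print(X.shape, M.shape)   # (4, 512) (4, 512)
```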
C3: Evaluation. Holistic benchmarks to evaluate time series foundation models on diverse datasets and tasks are in their nascent stages. Recent studies (Goswami et al., 2023b) have highlighted the importance of well-defined benchmarks and large-scale experimentation to accurately assess the impact and effectiveness of novel methodologies. To evaluate MOMENT, we build on the multi-task time series modeling benchmark first proposed by Wu et al. (2023) and extend it along multiple dimensions. For each of the 5 time series modeling tasks, namely short- and long-horizon forecasting, classification, anomaly detection, and imputation, we evaluate MOMENT against (1) both state-of-the-art deep learning and statistical baselines, on (2) more task-specific datasets, (3) using multiple evaluation metrics, and (4) exclusively in limited supervision settings (e.g., zero-shot imputation, linear probing for forecasting, unsupervised representation learning for classification).
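As one concrete instance of these limited-supervision protocols, the sketch below evaluates classification with a frozen encoder: embeddings are extracted without any fine-tuning, and only a lightweight SVM is fit on them. The `toy_encode` stand-in (simple summary statistics) is purely illustrative and is not MOMENT's embedding interface.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_frozen_embeddings(encode_fn, X_train, y_train, X_test, y_test):
    """Limited-supervision classification protocol: the pre-trained encoder
    (encode_fn) stays frozen; only a lightweight SVM is fit on its embeddings."""
    Z_train, Z_test = encode_fn(X_train), encode_fn(X_test)
    clf = SVC().fit(Z_train, y_train)
    return accuracy_score(y_test, clf.predict(Z_test))

# Toy stand-in for a frozen encoder: per-series summary statistics.
toy_encode = lambda X: np.stack([X.mean(1), X.std(1), X.min(1), X.max(1)], axis=1)

rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(64, 128)), rng.normal(size=(32, 128))
y_tr, y_te = rng.integers(0, 2, 64), rng.integers(0, 2, 32)
print(evaluate_frozen_embeddings(toy_encode, X_tr, y_tr, X_te, y_te))
```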
Finally, we explore various properties of these pre-trained time series models. In particular, we study whether MOMENT is aware of intuitive time series characteristics such as frequency and trend, and the impact of initialization, model size scaling, and cross-modal transfer.
[1] Temporal resolution refers to the sampling frequency of a time series (e.g., hourly, daily); a channel is a single univariate time series in multivariate data (Ekambaram et al., 2024).