Authors:
(1) Mononito Goswami, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA ([email protected]);
(2) Konrad Szafer, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;
(3) Arjun Choudhry, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;
(4) Yifu Cai, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA;
(5) Shuo Li, University of Pennsylvania, Philadelphia, USA;
(6) Artur Dubrawski, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA.
Table of Links
- Abstract and 1. Introduction
- Related Work
- Methodology
- Experimental Setup and Results
- Conclusion and Future Work
- Acknowledgments, Reproducibility Statement, Impact Statement, and References
Abstract
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis. Pre-training large models on time series data is challenging due to (1) the absence of a large and cohesive public time series repository, and (2) diverse time series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time series, called the Time Series Pile, and systematically tackle time series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. We also present several interesting empirical observations about large pre-trained time series models. The pre-trained models (AutonLab/MOMENT-1-large) and the Time Series Pile (AutonLab/Timeseries-PILE) are available at https://huggingface.co/AutonLab.
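For readers who want to try the released artifacts, the snippet below sketches one way to fetch them from the Hugging Face Hub. The `momentfm` package and its `MOMENTPipeline.from_pretrained` / `init` interface are assumptions based on the public release, not something specified in this paper.

```python
# Illustrative sketch (assumed interfaces): download the Time Series Pile and
# load the pre-trained MOMENT-1-large model from the Hugging Face Hub.
# Requires: pip install huggingface_hub momentfm
from huggingface_hub import snapshot_download
from momentfm import MOMENTPipeline  # interface assumed from the open-source release

# The Time Series Pile is hosted as a dataset repository; snapshot_download
# fetches the raw files into a local cache directory.
pile_dir = snapshot_download(repo_id="AutonLab/Timeseries-PILE", repo_type="dataset")

# Load MOMENT-1-large; "reconstruction" is one of the task heads exposed by the
# released package (used, e.g., for imputation and anomaly detection).
model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={"task_name": "reconstruction"},
)
model.init()  # assumed helper that materializes the task-specific head
print(pile_dir)
```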
1. Introduction
Time series analysis is an important field encompassing a wide range of applications, from forecasting weather patterns (Schneider & Dickinson, 1974) and detecting irregular heartbeats using electrocardiograms (Goswami et al., 2021), to identifying anomalous software deployments (Xu et al., 2018). Due to its significant practical value and the unique challenges that modeling time series data poses, time series analysis continues to receive substantial interest from academia and industry alike. However, modeling such data typically requires substantial domain expertise, time, and task-specific design.
Large pre-trained language (Touvron et al., 2023; Devlin et al., 2019; Chung et al., 2022), vision (Li et al., 2023a), and video (Day et al., 2023) models typically perform well on a variety of tasks on data from diverse domains, with little or no supervision, and they can be specialized to perform well on specific tasks. We unlock these key capabilities for time series data and release the first family of open-source large pre-trained time series models, which we call MOMENT. The models in this family (1) serve as a building block for diverse time series analysis tasks (e.g., forecasting, classification, anomaly detection, and imputation), (2) are effective out-of-the-box, i.e., with no (or few) task-specific exemplars (enabling, e.g., zero-shot forecasting and few-shot classification), and (3) are tunable using in-distribution and task-specific data to improve performance.
MOMENT is a family of high-capacity transformer models, pre-trained using a masked time series prediction task on large amounts of time series data drawn from diverse domains. Below we summarize our key contributions.
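To make the pre-training objective concrete, below is a minimal PyTorch sketch of masked time series prediction: fixed-length patches are embedded, a random subset is replaced by a learnable mask embedding, and a Transformer encoder is trained to reconstruct the masked patches. This is a simplified stand-in, not MOMENT's exact architecture, patching, or masking scheme.

```python
import torch
import torch.nn as nn

class MaskedPatchModel(nn.Module):
    """Toy masked time series prediction: embed fixed-length patches, replace a
    random subset with a learnable [MASK] embedding, encode with a Transformer,
    and reconstruct the original patch values."""
    def __init__(self, patch_len=8, d_model=64, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, x, mask_ratio=0.3):
        # x: (batch, length) univariate series; length divisible by patch_len
        patches = x.unfold(1, self.patch_len, self.patch_len)   # (B, P, patch_len)
        tokens = self.embed(patches)                             # (B, P, d_model)
        mask = torch.rand(tokens.shape[:2], device=x.device) < mask_ratio
        tokens[mask] = self.mask_token                           # hide masked patches
        recon = self.head(self.encoder(tokens))                  # (B, P, patch_len)
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2)[mask].mean()

model = MaskedPatchModel()
loss = model(torch.randn(4, 64))   # 4 random series of length 64
loss.backward()
```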
C1: Pre-training data. A key limiting factor for pre-training large time series models from scratch has been the lack of a large, cohesive public time series data repository (Zhou et al., 2023; Gruver et al., 2023; Jin et al., 2023; Ekambaram et al., 2024; Cao et al., 2023). Therefore, we compiled the Time Series Pile, a large collection of publicly available data from diverse domains, ranging from healthcare to engineering to finance. The Time Series Pile comprises over 5 public time series databases, drawn from several diverse domains, for pre-training and evaluation (Tab. 11).
C2: Multi-dataset pre-training. Unlike text and images, which have largely consistent sampling rates and numbers of channels, time series frequently vary in their temporal resolution, number of channels[1], length, and amplitude, and sometimes have missing values. As a result, large-scale mixed-dataset pre-training is largely unexplored: most methods are instead trained on a single dataset and transferred across datasets, with only modest success (Wu et al., 2023; Oreshkin et al., 2021; Narwariya et al., 2020).
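As an illustration of one common recipe for coping with such heterogeneity (treating each channel as an independent univariate series, and cropping or padding every sample to a fixed length with an observation mask), consider the hypothetical helper below; it is not the paper's exact pre-processing pipeline.

```python
import numpy as np

def to_fixed_length_univariate(series_list, target_len=512):
    """Illustrative pre-processing for heterogeneous collections: treat each
    channel of a multivariate series as an independent univariate sample,
    left-pad short series, crop long ones, and return an observation mask so
    padded or missing values can be ignored downstream.
    (A common recipe for mixed-dataset training, not the paper's pipeline.)"""
    samples, masks = [], []
    for series in series_list:                       # series: (length, n_channels)
        series = np.atleast_2d(np.asarray(series, dtype=np.float32))
        if series.shape[0] < series.shape[1]:
            series = series.T                        # heuristic: ensure (length, channels)
        for channel in series.T:                     # channel-independence
            x = channel[-target_len:]                # crop to the last target_len steps
            mask = ~np.isnan(x)                      # missing values -> mask = 0
            x = np.nan_to_num(x)
            pad = target_len - len(x)
            samples.append(np.pad(x, (pad, 0)))      # left-pad to target_len
            masks.append(np.pad(mask, (pad, 0)))
    return np.stack(samples), np.stack(masks).astype(np.float32)

X, M = to_fixed_length_univariate(
    [np.random.randn(300, 3), np.random.randn(900, 1)]  # different lengths / channels
)
print(X.shape, M.shape)   # (4, 512) (4, 512)
```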
C3: Evaluation. Holistic benchmarks to evaluate time series foundation models on diverse datasets and tasks are in their nascent stages. Recent studies (Goswami et al., 2023b) have highlighted the importance of well-defined benchmarks and large-scale experimentation to accurately assess the impact and effectiveness of novel methodologies. To evaluate MOMENT, we build on the multi-task time series modeling benchmark first proposed by Wu et al. (2023) and extend it along multiple dimensions. For each of the 5 time series modeling tasks, namely short- and long-horizon forecasting, classification, anomaly detection, and imputation, we evaluate MOMENT against (1) both state-of-the-art deep learning and statistical baselines, on (2) more task-specific datasets, (3) using multiple evaluation metrics, and (4) exclusively in limited supervision settings (e.g., zero-shot imputation, linear probing for forecasting, unsupervised representation learning for classification).
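As one concrete instance of these limited-supervision protocols, the sketch below evaluates classification with a frozen encoder: embeddings are extracted without any fine-tuning, and only a lightweight SVM is fit on them. The `toy_encode` stand-in (simple summary statistics) is purely illustrative and is not MOMENT's embedding interface.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_frozen_embeddings(encode_fn, X_train, y_train, X_test, y_test):
    """Limited-supervision classification protocol: the pre-trained encoder
    (encode_fn) stays frozen; only a lightweight SVM is fit on its embeddings."""
    Z_train, Z_test = encode_fn(X_train), encode_fn(X_test)
    clf = SVC().fit(Z_train, y_train)
    return accuracy_score(y_test, clf.predict(Z_test))

# Toy stand-in for a frozen encoder: per-series summary statistics.
toy_encode = lambda X: np.stack([X.mean(1), X.std(1), X.min(1), X.max(1)], axis=1)

rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(64, 128)), rng.normal(size=(32, 128))
y_tr, y_te = rng.integers(0, 2, 64), rng.integers(0, 2, 32)
print(evaluate_frozen_embeddings(toy_encode, X_tr, y_tr, X_te, y_te))
```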
Finally, we explore various properties of these pre-trained time series models. In particular, we study whether MOMENT is aware of intuitive time series characteristics such as frequency and trend, and the impact of initialization, model size scaling, and cross-modal transfer.
[1] Temporal resolution refers to the sampling frequency of a time series (e.g., hourly, daily); a channel is a single univariate time series in multivariate data (Ekambaram et al., 2024).