Authors:
(1) Vijay Ekambaram, IBM Research;
(2) Arindam Jati, IBM Research;
(3) Nam H. Nguyen, IBM Research;
(4) Pankaj Dayama, IBM Research;
(5) Chandra Reddy, IBM Research;
(6) Wesley M. Gifford, IBM Research;
(7) Jayant Kalagnanam, IBM Research.
Editor’s note: this is part 3 of 5 of a study detailing the development of a tiny, fast AI model that delivers excellent accuracy. Read the rest below.
3 TTM Workflows
TTM operates in two stages: pre-training and fine-tuning (Figure 1(a)).
3.1 Pre-training Workflow
Multi-Resolution Pre-training via TTM Backbone
Most of the pre-training happens in the TTM backbone. The primary challenge with the proposed pre-training technique is that the pre-training data is diverse and spans multiple resolutions. There are two main options: pre-training a separate model for each resolution type, or pre-training a single model on data from all resolutions collectively. While it is common to train one model per resolution type to sidestep the difficulty of learning diverse seasonal patterns, doing so shrinks the training data available for each resolution, since public data is limited. This motivated us to pre-train a single model using datasets from all resolutions. To achieve this, we propose the following three enhancements.
Data Augmentation via Downsampling: A significant challenge in TS pre-training is the scarcity of public datasets at specific resolutions. To overcome this, we employ a downsampling technique on high-resolution datasets, generating multiple datasets at lower resolutions. For example, from a one-second resolution dataset, we derive datasets at minute and hour resolutions. Note that the original high-resolution dataset remains in the pool of pre-training datasets. This methodology significantly augments the number of datasets available at each resolution, which greatly improves model performance (Section 4.5).
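To make the augmentation concrete, here is a minimal sketch of how a high-resolution series can be mean-pooled down to coarser resolutions while keeping the original in the pre-training pool. The function name `downsample` and the pooling scheme are illustrative assumptions, not the authors' actual pre-processing code.

```python
# Hypothetical sketch of downsampling-based data augmentation (assumed mean pooling).
import numpy as np

def downsample(series: np.ndarray, factor: int) -> np.ndarray:
    """Mean-pool a 1-D series by `factor` to emulate a coarser resolution."""
    usable = (len(series) // factor) * factor      # drop the ragged tail
    return series[:usable].reshape(-1, factor).mean(axis=1)

# e.g., derive minute- and hour-level datasets from a one-second series
second_series = np.random.randn(3 * 3600)          # 3 hours of 1-second data
minute_series = downsample(second_series, 60)      # 1-minute resolution
hour_series   = downsample(second_series, 3600)    # 1-hour resolution

# The original high-resolution series stays in the pre-training pool
pretrain_pool = {"1s": second_series, "1min": minute_series, "1h": hour_series}
```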
Resolution Prefix Tuning: This technique explicitly learns a new patch embedding, conditioned on the input resolution type, and prepends it to the input data as a prefix (see Figure 1(b)). Similar to the concept of prefix tuning [Li and Liang, 2021], this gives the model an explicit signal about the resolution type, enabling resolution-conditioned modeling. First, we map every resolution to a unique integer, which is passed through an embedding layer to project it to the hidden dimension hf. We then expand the embedding across all channels to obtain a representation of shape c×1×hf. This module is optional in the TTM backbone and is particularly beneficial when the context length (sl) is short, since automatically detecting the resolution from a short history is challenging. Explicitly fusing the resolution information as a prefix therefore enhances the model's ability to learn effectively across resolutions.
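The sketch below illustrates one way such a resolution prefix could be implemented, assuming a PyTorch backbone whose patch embeddings have shape (batch, channels, num_patches, hf). The module name `ResolutionPrefix` and the resolution-to-integer mapping are assumptions for illustration only.

```python
# Illustrative resolution prefix: one learnable embedding per resolution type,
# expanded across channels and prepended to the patch sequence.
import torch
import torch.nn as nn

class ResolutionPrefix(nn.Module):
    def __init__(self, num_resolutions: int, hf: int):
        super().__init__()
        # e.g., 0 = second, 1 = minute, 2 = hour (assumed mapping)
        self.embed = nn.Embedding(num_resolutions, hf)

    def forward(self, patches: torch.Tensor, resolution_id: torch.Tensor) -> torch.Tensor:
        # patches: (b, c, n, hf); resolution_id: (b,)
        b, c, _, hf = patches.shape
        prefix = self.embed(resolution_id)                      # (b, hf)
        prefix = prefix.view(b, 1, 1, hf).expand(b, c, 1, hf)   # (b, c, 1, hf)
        # prepend the resolution embedding as an extra "patch"
        return torch.cat([prefix, patches], dim=2)              # (b, c, n + 1, hf)

# usage: resolution id 1 might denote minute-level data
prefix_module = ResolutionPrefix(num_resolutions=8, hf=64)
patches = torch.randn(4, 3, 16, 64)                             # b=4, c=3, n=16, hf=64
out = prefix_module(patches, torch.tensor([1, 1, 1, 1]))
print(out.shape)                                                # torch.Size([4, 3, 17, 64])
```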
3.2 Fine-tuning Workflow
In the fine-tuning workflow, we deal with data from the target domain that has no overlap with the pre-training datasets. We have three options: (a) in Zero-shot forecasting, we directly evaluate the pre-trained model on the test part of the target data; (b) in Few-shot forecasting, we use only a tiny portion (5-10%) of the train part of the target data to quickly update the pre-trained weights of the decoder and head, and then evaluate on the test part; (c) in Full-shot forecasting, we fine-tune the pre-trained weights of the decoder and head on the entire train part of the target data, and then evaluate on the test part.
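A hedged sketch of these three options is given below. It assumes a model object exposing `backbone`, `decoder`, and `head` submodules and takes the last few-shot fraction of the train split; these names and the sampling strategy are assumptions for illustration.

```python
# Sketch of zero-/few-/full-shot setup: freeze the backbone, train decoder + head.
import torch

def select_fewshot_subset(train_set, fraction: float = 0.05):
    """Keep a small slice (5-10%) of the train split for few-shot fine-tuning."""
    n = max(1, int(len(train_set) * fraction))
    return torch.utils.data.Subset(train_set, range(len(train_set) - n, len(train_set)))

def configure_for_finetuning(model, mode: str):
    if mode == "zero_shot":
        model.eval()          # no weight updates; evaluate the pre-trained model directly
        return []
    # few-shot / full-shot: freeze the backbone, update only decoder and head
    for p in model.backbone.parameters():
        p.requires_grad = False
    return list(model.decoder.parameters()) + list(model.head.parameters())
```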
The backbone is completely frozen during fine-tuning and still operates in a channel-independent, univariate fashion. The TTM decoder, however, can be fine-tuned in a channel-mixing (multivariate) or channel-independent (univariate) way, depending on the nature of the target data. If pure multivariate modeling is needed, the channel-mixer block in all the TSMixer components of the decoder (see Figure 1(b)) is enabled to explicitly capture correlations across channels. The forecast head and reverse normalization perform the same operations as in the pre-training stage. Fine-tuning optimizes the same forecasting objective with an MSE loss. This multi-level design ensures that the backbone excels at channel-independent pre-training, enabling effective temporal-correlation modeling across diverse datasets, while the decoder handles target-data-specific tasks such as channel-correlation modeling and fine-tuning. In addition, if the target data has exogenous variables, an exogenous mixer block is applied to the actual forecasts, as explained next.
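For clarity, the sketch below shows one fine-tuning step under this design: the frozen channel-independent backbone, an optional channel-mixer toggle in the decoder, and the MSE objective. The flag `enable_channel_mixing` and the submodule names are hypothetical stand-ins, not the authors' actual API.

```python
# Illustrative decoder-only fine-tuning step with an optional channel mixer.
import torch
import torch.nn as nn

def finetune_step(model, batch_x, batch_y, optimizer, multivariate: bool):
    # enable cross-channel mixing only when the target data is multivariate
    if hasattr(model.decoder, "enable_channel_mixing"):
        model.decoder.enable_channel_mixing = multivariate

    with torch.no_grad():                          # frozen, channel-independent backbone
        features = model.backbone(batch_x)
    forecast = model.head(model.decoder(features)) # trainable decoder + forecast head

    loss = nn.functional.mse_loss(forecast, batch_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```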