Why Training on Time Series Beats Fine-Tuning LLMs for Time Series Tasks


Table of Links

Abstract and 1. Introduction

  2. Related Work

  3. Methodology

  4. Experimental Setup and Results

  5. Conclusion and Future Work

Acknowledgments

Reproducibility statement

Impact statement, and References

Transformers and patching for time series modeling. There is a growing body of work utilizing transformers for various time series analysis tasks (Wen et al., 2023). One issue with applying transformers to time series data is the complexity of the self-attention mechanism, which grows quadratically with the number of input tokens (i.e., the length of the time series) (Li et al., 2019). Nie et al. (2023) demonstrated that treating time series sub-sequences (or patches) as tokens, instead of individual time points, is a simple, efficient, and effective mechanism for learning useful representations for forecasting. Drawing inspiration from this prior work, we build on top of a transformer architecture that takes disjoint time series sub-sequences (patches) as input.
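
As a rough illustration of the patching idea above, the sketch below splits a univariate series into disjoint patches and embeds each patch as one token; the patch length, model width, and module names are illustrative choices, not the architecture used in this work.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split a univariate time series into disjoint patches and embed each patch
    as a single token, so self-attention scales with (seq_len / patch_len)^2
    rather than seq_len^2. Patch length and model width are illustrative."""

    def __init__(self, patch_len: int = 8, d_model: int = 64):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)  # one linear projection per patch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len), where seq_len is assumed to be a multiple of patch_len
        batch, seq_len = x.shape
        patches = x.reshape(batch, seq_len // self.patch_len, self.patch_len)
        return self.proj(patches)  # (batch, num_patches, d_model)

# Example: a 512-step series becomes 64 patch tokens, each of width 64.
tokens = PatchEmbedding()(torch.randn(2, 512))
print(tokens.shape)  # torch.Size([2, 64, 64])
```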

Masked Representation Learning. Masked pre-training is a widely used self-supervised learning task in which a model learns to accurately reconstruct masked portions of its input. Masked language modeling (Devlin et al., 2019; Raffel et al., 2020) and masked image modeling (Xie et al., 2022; Li et al., 2023b) have been successfully used to learn models from vast quantities of unlabeled data that generalize to a variety of downstream tasks.

For time series data, prior work has primarily focused on contrastive representation learning (Yue et al., 2022; Eldele et al., 2021; Franceschi et al., 2019). However, contrastive learning relies on data augmentation, which is both subjective and data-dependent. In contrast, some studies mask portions of time series using zeros and learn a model to reconstruct them (Nie et al., 2023; Zerveas et al., 2021; Dong et al., 2023; Li et al., 2023c).

Representation learning via masking is well-suited to all the downstream tasks we care about, especially forecasting and imputation, as they are instances of the masked reconstruction problem. Due to its simplicity and success in vision and language domains, we use the masked prediction task to pretrain our model, using a special embedding (see [MASK] in Fig. 3) to mask time series patches instead of zeros.
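
For concreteness, here is a minimal sketch of the masked patch prediction task described above: a randomly chosen subset of patch embeddings is replaced with a learnable [MASK] vector (rather than zeros), and the model is trained to reconstruct the original patch values. The masking ratio, encoder, and reconstruction head are illustrative placeholders, not the exact components from Fig. 3.

```python
import torch
import torch.nn as nn

def masked_pretrain_step(series, embed, encoder, head, mask_token, mask_ratio=0.3):
    """One masked-reconstruction step: embed disjoint patches, replace a random
    subset of patch embeddings with a learnable [MASK] vector (not zeros),
    encode, and compute reconstruction loss only on the masked patches.
    All components are illustrative stand-ins for the paper's architecture."""
    patches = series.reshape(series.shape[0], -1, embed.in_features)   # (B, N, patch_len)
    tokens = embed(patches)                                            # (B, N, d_model)
    mask = torch.rand(tokens.shape[:2]) < mask_ratio                   # (B, N) boolean mask
    tokens = torch.where(mask.unsqueeze(-1), mask_token.expand_as(tokens), tokens)
    recon = head(encoder(tokens))                                      # (B, N, patch_len)
    return ((recon - patches) ** 2)[mask].mean()                       # loss on masked patches only

# Illustrative components: patch length 8, model width 64.
embed = nn.Linear(8, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(64, 8)
mask_token = nn.Parameter(torch.zeros(64))
loss = masked_pretrain_step(torch.randn(2, 512), embed, encoder, head, mask_token)
loss.backward()
```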

Cross-modal transfer learning using language models. Lu et al. (2022) first showed that transformers pre-trained on text data (LLMs) can effectively solve sequence modeling tasks in other modalities. Subsequently, Shen et al. (2023) introduced ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pre-trained model to diverse modalities by adapting to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pre-training modality, and then fine-tunes the pre-trained model on the embedded data, exploiting the knowledge shared across modalities. Some recent studies have leveraged this inherent ability of language pre-trained transformers to “reprogram” LLMs for time series analysis using parameter-efficient fine-tuning and suitable tokenization strategies (Zhou et al., 2023; Gruver et al., 2023; Jin et al., 2023; Cao et al., 2023; Ekambaram et al., 2024). However, some of these models (Jin et al., 2023; Gruver et al., 2023), with billions of parameters, demand significant memory and computational resources to perform well. We complement this line of research with three empirical observations (Sec 4.3): we show that (1) transformers trained on time series can also model sequences across modalities, (2) during pre-training, randomly initializing weights leads to lower pre-training loss than initializing with language modeling weights, and (3) models pre-trained on time series outperform LLM-based models such as (Zhou et al., 2023; Jin et al., 2023) on many tasks and datasets.

Figure 2. Time Series Pile data splits. To avoid data contamination, we carefully partition all datasets into disjoint train, validation, and test splits. We adhere to the predefined splits provided by the creators of each dataset. In cases where such splits are unavailable, we randomly sample 60% of the data for training, 10% for validation, and 30% for testing. We only use the training splits of all datasets for pre-training.
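
The split rule in the caption can be summarized with a small, hypothetical helper: keep a dataset's predefined splits when its creators provide them, otherwise randomly sample 60% / 10% / 30% for train / validation / test. The function below is an illustration, not the authors' released code.

```python
import random

def make_splits(examples, predefined=None, seed=0):
    """Return (train, val, test) splits for one dataset.
    If the dataset's creators provide splits, keep them; otherwise randomly
    sample 60% / 10% / 30% for train / validation / test. Only the train split
    is used for pre-training. (Illustrative helper, not the authors' code.)"""
    if predefined is not None:
        return predefined  # (train, val, test) as released by the dataset creators
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_train, n_val = int(0.6 * len(idx)), int(0.1 * len(idx))
    train = [examples[i] for i in idx[:n_train]]
    val = [examples[i] for i in idx[n_train:n_train + n_val]]
    test = [examples[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = make_splits(list(range(100)))
print(len(train), len(val), len(test))  # 60 10 30
```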

Unanswered Questions. To the best of our knowledge, two questions remain largely unanswered in prior work on time series modeling. First, all existing time series models are (pre-)trained and fine-tuned on individual datasets (Nie et al., 2023; Yue et al., 2022; Wu et al., 2023; Zhou et al., 2023), and the benefits (or drawbacks) of large-scale multi-dataset pre-training remain unexplored (Wen et al., 2023). Second, there is very limited work on time series modeling in limited-supervision settings, such as zero-shot forecasting (Oreshkin et al., 2021) or few-shot classification (Narwariya et al., 2020). In our work, we consider both of these questions and show that pre-training a model of sufficient capacity on a large corpus of unlabeled time series data can in fact enable it to make reasonably accurate predictions in limited-supervision settings.

Authors:

(1) Mononito Goswami, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA ([email protected]);

(2) Konrad Szafer, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;

(3) Arjun Choudhry, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, with equal contribution, order decided using a random generator;

(4) Yifu Cai, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA;

(5) Shuo Li, University of Pennsylvania, Philadelphia, USA;

(6) Artur Dubrawski, Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA.

