Meta Details GEM Ads Model Using LLM-Scale Training, Hybrid Parallelism, and Knowledge Transfer

News Room
Last updated: 2025/12/22 at 3:48 AM
News Room Published 22 December 2025

Meta released details about its Generative Ads Model (GEM), a foundation model designed to improve ads recommendation across its platforms. The model addresses core challenges in recommendation systems (RecSys) by processing billions of daily user-ad interactions where meaningful signals such as clicks and conversions are very sparse. GEM tackles the complexity of learning from diverse ads data including advertiser goals, creative formats, measurement signals, and user behaviors across multiple delivery channels.

The company built the system using three approaches: model scaling with advanced architecture, post-training techniques for knowledge transfer, and enhanced training infrastructure that uses thousands of GPUs with advanced parallelism to support the computational demands of large-scale foundation model training.

Source: GEM Architecture

Meta re-engineered its training stack to support GEM at a scale comparable to modern large language models. The company employs multi-dimensional parallelism strategies tailored to different model components. Dense model parts use Hybrid Sharded Distributed Parallel (HSDP) to optimize memory usage and reduce communication costs across thousands of GPUs. Sparse components, primarily large embedding tables for user and item features, use a two-dimensional approach combining data parallelism and model parallelism.
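The hybrid sharding idea for dense parameters can be sketched in a few lines. This is a minimal illustration, not Meta's implementation: it assumes the usual meaning of hybrid sharding, where parameters are split across the GPUs of one group and the groups hold replicas of one another. The names `Placement`, `hsdp_layout`, and `param_bytes_per_gpu`, and the sizes used (16 GPUs, groups of 8, one billion 2-byte parameters), are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    rank: int           # global GPU rank
    replica_group: int  # which replica of the dense model this GPU belongs to
    shard_index: int    # which slice of the parameters this GPU holds


def hsdp_layout(world_size: int, shard_group_size: int) -> list[Placement]:
    """Assign each rank a (replica group, shard index) pair.

    Parameters are sharded across the GPUs inside a group, and the
    groups are replicas of one another.
    """
    if world_size % shard_group_size != 0:
        raise ValueError("world_size must be a multiple of shard_group_size")
    return [
        Placement(rank, rank // shard_group_size, rank % shard_group_size)
        for rank in range(world_size)
    ]


def param_bytes_per_gpu(num_params: int, bytes_per_param: int,
                        shard_group_size: int) -> float:
    """Parameter memory per GPU when parameters are split across one group."""
    return num_params * bytes_per_param / shard_group_size


# Hypothetical cluster: 16 GPUs arranged as 2 replica groups of 8 shards each.
layout = hsdp_layout(world_size=16, shard_group_size=8)
replicated = param_bytes_per_gpu(10**9, 2, shard_group_size=1)
hybrid = param_bytes_per_gpu(10**9, 2, shard_group_size=8)

print(layout[9])            # Placement(rank=9, replica_group=1, shard_index=1)
print(replicated / hybrid)  # 8.0
```

In this layout the heavier parameter-gathering traffic stays inside a shard group, while gradient synchronization across groups involves only ranks that hold the same shard index. That is the general trade-off between memory usage and communication cost the paragraph above describes.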

Meta implemented several GPU-level optimizations to reduce training bottlenecks. These include a custom in-house GPU kernel designed for variable-length user sequences, graph-level compilation in PyTorch 2.0 that automates activation checkpointing and operator fusion, and memory compression techniques such as FP8 quantization for activations.
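To see why variable-length user sequences merit special handling, compare a padded batch with a flattened "jagged" layout (values plus offsets). The article does not describe how Meta's kernel stores sequences, so the layout and the helper names `to_jagged`, `padded_size`, and `sequence_at` below are assumptions used only for illustration.

```python
def to_jagged(sequences):
    """Flatten variable-length sequences into (values, offsets).

    Sequence i occupies values[offsets[i]:offsets[i + 1]].
    """
    values, offsets = [], [0]
    for seq in sequences:
        values.extend(seq)
        offsets.append(len(values))
    return values, offsets


def padded_size(sequences):
    """Number of slots a dense, right-padded batch would allocate."""
    return len(sequences) * max((len(s) for s in sequences), default=0)


def sequence_at(values, offsets, i):
    """Recover sequence i from the flattened layout."""
    return values[offsets[i]:offsets[i + 1]]


# Hypothetical per-user event IDs with very different history lengths.
histories = [[11, 12, 13, 14, 15, 16], [21], [31, 32]]
values, offsets = to_jagged(histories)

print(offsets)                               # [0, 6, 7, 9]
print(len(values), padded_size(histories))   # 9 18
print(sequence_at(values, offsets, 2))       # [31, 32]
```

Here the padded batch allocates twice as many slots as there are real events. A kernel that works directly on a flattened layout avoids spending memory and compute on that padding.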

The team developed GPU communication collectives in NCCLX, Meta’s fork of NVIDIA’s NCCL, that run without using Streaming Multiprocessor (SM) resources, which eliminates contention between communication and compute workloads. Meta also reduced job startup time by 5x through optimizations to trainer initialization, data reader setup, and checkpointing. PyTorch 2.0 compilation time decreased by 7x via caching strategies, increasing the proportion of training time spent processing new data.
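A small worked calculation shows how startup and compilation overheads affect the share of time spent on training data. Only the 5x and 7x reduction factors come from the article. The baseline figures (a 10-hour job with 1 hour of startup and 0.7 hours of compilation) and the function name `productive_fraction` are hypothetical.

```python
def productive_fraction(job_hours: float, startup_hours: float,
                        compile_hours: float) -> float:
    """Fraction of a job's wall-clock time left for processing training data."""
    overhead = startup_hours + compile_hours
    if overhead > job_hours:
        raise ValueError("overhead exceeds job length")
    return (job_hours - overhead) / job_hours


# Hypothetical baseline overheads for a 10-hour job.
before = productive_fraction(10.0, startup_hours=1.0, compile_hours=0.7)

# Apply the reported reductions: startup 5x faster, compilation 7x faster.
after = productive_fraction(10.0, startup_hours=1.0 / 5, compile_hours=0.7 / 7)

print(f"{before:.2f}")  # 0.83
print(f"{after:.2f}")   # 0.97
```

Under these assumed numbers the productive share rises from 83% to 97%. The gain matters most for jobs that restart often, because each restart pays the overhead again.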

The system optimizes GPU efficiency across the model lifecycle. During exploration, lightweight model variants support over half of all experiments at lower cost compared to full-sized models. Meta performs continuous online training to refresh the foundation models and shares traffic between training and post-training knowledge generation to reduce computational demand.

Meta designed GEM to transfer knowledge to hundreds of user-facing vertical models that serve ads across its platforms. The company employs two transfer strategies to translate the foundation model’s capabilities into measurable gains.

Direct transfer enables GEM to pass knowledge to major vertical models within the same data spaces where GEM was trained. Hierarchical transfer distills knowledge from GEM into domain-specific foundation models, which then teach vertical models.

The approaches use knowledge distillation, representation learning, and parameter sharing to maximize transfer efficiency across Meta’s ad model ecosystem.
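Knowledge distillation for a click-prediction model can be sketched as a blended loss in which the student learns from both the observed label and the teacher's predicted probability. This is a generic textbook formulation, not the loss Meta reports using. The names `bce` and `distillation_loss` and the mixing weight `alpha` are assumptions.

```python
import math


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy; target may be a hard label or a soft probability."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(label: int, student_p: float, teacher_p: float,
                      alpha: float = 0.5) -> float:
    """Blend the loss on the observed label with the loss against the teacher.

    alpha = 0 ignores the teacher; alpha = 1 trains on the teacher alone.
    """
    return (1.0 - alpha) * bce(label, student_p) + alpha * bce(teacher_p, student_p)


# Hypothetical impression: no click observed, teacher predicts 30% click probability.
label, teacher_p = 0, 0.3

# With alpha = 1 the loss is lowest when the student matches the teacher.
matches_teacher = distillation_loss(label, 0.3, teacher_p, alpha=1.0)
ignores_teacher = distillation_loss(label, 0.05, teacher_p, alpha=1.0)
print(matches_teacher < ignores_teacher)  # True
```

In a hierarchical setup the same kind of loss could be applied at each stage, with a domain-specific foundation model acting as the student of GEM and then as the teacher of a vertical model. The soft teacher probability carries more information than a sparse 0/1 click label, which is one reason distillation suits ads data.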

Swapnil Amin, former director at Tesla, commented that GEM

feels like the shift we all knew was coming — a model that actually learns creative, context, and user intent together instead of stitching pieces after the fact. 

He highlighted the 23x effective FLOPs jump as

the part that changes the economics.

Sri.P, a senior product manager at Microsoft, sees potential applications for advertisers, stating:

This is a game changer for marketers/advertisers! I can see it potentially saving small businesses a lot of money since they won’t have to experiment with marketing strategies and can instead rely on intelligent models to make the most of their ad spend

Meta envisions foundation models for ads recommendation systems developing a deeper understanding of user preferences and intent, with the aim of making interactions feel more personal. For advertisers, the company positions this as a way to enable one-to-one connections at scale.
