Google Cloud Run Now Offers Serverless GPUs for AI and Batch Processing

News Room
Published 9 June 2025 (last updated 11:37 AM)

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, its serverless runtime. With this enhancement, Google Cloud aims to provide a powerful, yet remarkably cost-efficient, environment for a wide range of GPU-accelerated use cases, particularly in AI inference and batch processing.

In a company blog post, Google highlights that developers favor Cloud Run for its simplicity, flexibility, and scalability. With the addition of GPU support, it now extends its core benefits to GPU resources:

  • Pay-per-second billing: Users are now charged only for the GPU resources they consume, down to the second – thus minimizing waste.
  • Scale to zero: Cloud Run automatically scales GPU instances down to zero when inactive, eliminating idle costs – particularly beneficial for sporadic or unpredictable workloads.
  • Rapid startup and scaling: Instances with GPUs and drivers can start up in under 5 seconds, enabling applications to respond to demand very quickly.
  • Full streaming support: Built-in support for HTTP and WebSocket streaming allows for interactive applications, such as real-time LLM responses.

Dave Salvator, director of accelerated computing products at NVIDIA, commented:

Serverless GPU acceleration represents a major advancement in making cutting-edge AI computing more accessible. With seamless access to NVIDIA L4 GPUs, developers can now bring AI applications to production faster and more cost-effectively than ever before.

A significant barrier to entry has been removed, as NVIDIA L4 GPU support on Cloud Run is now available to all users with no quota request required. Developers can enable GPU support via a simple command-line flag (--gpu 1) or by checking a box in the Google Cloud console.
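
As an illustration, enabling a GPU at deploy time might look like the sketch below. The service name, container image, region, and resource values are placeholders, and the full flag set should be verified against the current gcloud documentation; only the --gpu flag is taken directly from the announcement.

    # Deploy a container with one NVIDIA L4 GPU attached.
    # Service name, image, region, and resource values are illustrative.
    gcloud run deploy my-inference-service \
      --image us-docker.pkg.dev/my-project/my-repo/llm-server:latest \
      --region us-central1 \
      --gpu 1 \
      --gpu-type nvidia-l4 \
      --cpu 4 \
      --memory 16Gi \
      --no-cpu-throttling

With scale to zero in effect, such a service incurs GPU charges only while instances are starting up or serving requests.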

Cloud Run with GPU support is production-ready and covered by Cloud Run’s Service Level Agreement (SLA) for reliability and uptime. Zonal redundancy is enabled by default for resilience; it can be turned off in exchange for lower pricing, in which case failover during a zonal outage is handled on a best-effort basis.

The general availability of GPU support on Cloud Run has also sparked a discussion within the developer community regarding its competitive implications, particularly in relation to other major cloud providers. Rubén del Campo, a principal software engineer at ZenRows, highlighted Google’s move as something “AWS should have built years ago: serverless GPU compute that actually works.”

His perspective highlights a perceived “massive gap in AWS Lambda’s capabilities,” specifically citing Lambda’s 15-minute timeout and CPU-only compute as prohibitive for modern AI workloads, such as Stable Diffusion inference, model fine-tuning, or real-time video analysis. “Try running Stable Diffusion inference, fine-tuning a model, or processing video with AI in Lambda. You can’t,” he commented, emphasizing that Cloud Run GPUs make such tasks “trivial with serverless GPUs that scale to zero.”

While Cloud Run GPUs offer compelling features, some users on a Hacker News thread have raised concerns regarding the lack of hard billing limits, which could lead to unexpected costs. Cloud Run allows setting maximum instance limits, but it doesn’t provide an actual dollar-based spending cap.
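
As a practical guardrail under that constraint, one can cap how many GPU instances a service may scale to; a minimal sketch follows (the service name and region are placeholders):

    # Cap the service at 3 concurrent instances to bound the worst-case burn rate
    gcloud run services update my-inference-service --region us-central1 --max-instances 3

This limits the maximum spend per hour rather than total spend, so Cloud Billing budget alerts remain the main tool for cost monitoring.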

In addition, comparisons on the same Hacker News thread also indicate that other providers like Runpod.io may offer more competitive pricing for similar GPU instances. For example, some users have pointed out that Runpod’s hourly rates for L4, A100, and H100 GPUs can be significantly lower than Google’s, even when considering Google’s per-second billing.

Beyond real-time inference, Google has also announced the availability of GPUs on Cloud Run jobs (currently in private preview), unlocking new use cases for batch processing and asynchronous tasks. Cloud Run GPUs are currently available in five Google Cloud regions: us-central1 (Iowa, USA), europe-west1 (Belgium), europe-west4 (Netherlands), asia-southeast1 (Singapore), and asia-south1 (Mumbai, India). Additional regions are planned.

Lastly, the company states that developers can start building with Cloud Run GPUs by leveraging the official documentation, quickstarts, and best practices for optimizing model loading.
