Google Cloud Run Adds GPU Support for AI and Batch Workloads

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, a major step forward for its serverless platform. The update aims to give developers a cost-effective, scalable option for GPU-powered tasks, especially AI inference and batch processing. It addresses the rising need for accessible, production-ready GPU resources in the cloud while preserving the key features that have made Cloud Run popular with developers.

A standout feature of this release is its pay-per-second billing model, which charges users only for the GPU resources they actually consume, reducing waste and aligning costs closely with workload demand. Cloud Run can also scale GPU instances down to zero when they are idle, avoiding unnecessary expense. This makes it well suited to irregular or unpredictable workloads.
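
To make the pay-per-use argument concrete, the short sketch below compares billed GPU-seconds for a bursty workload against an always-on instance. The per-second rate and the traffic pattern are purely illustrative placeholders, not Google Cloud's actual pricing.

```python
# Illustrative comparison of scale-to-zero, per-second GPU billing versus an
# always-on GPU instance. All numbers below (rate, traffic pattern) are
# hypothetical placeholders, not actual Cloud Run pricing.

HYPOTHETICAL_RATE_PER_GPU_SECOND = 0.0002  # placeholder value in USD

# Suppose the service is busy for 90 minutes spread across a 24-hour day.
busy_seconds_per_day = 90 * 60
seconds_per_day = 24 * 60 * 60

pay_per_use_cost = busy_seconds_per_day * HYPOTHETICAL_RATE_PER_GPU_SECOND
always_on_cost = seconds_per_day * HYPOTHETICAL_RATE_PER_GPU_SECOND

print(f"Pay-per-second (scale to zero): ${pay_per_use_cost:.2f}/day")
print(f"Always-on GPU instance:         ${always_on_cost:.2f}/day")
# With this traffic pattern, billing only for busy seconds works out to
# roughly 16x less than keeping a GPU instance running around the clock.
```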

Another advantage is fast startup times, with GPU-enabled instances launching in under five seconds. This quick response is essential for applications that must adapt to changing demand or deliver real-time output, such as interactive AI services or live data processing. Cloud Run also supports HTTP and WebSocket streaming, making it well-suited for real-time applications, including those powered by large language models (LLMs).
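
As a concrete illustration of HTTP streaming on Cloud Run, here is a minimal Flask sketch that sends output to the client chunk by chunk; the generate_tokens() helper is a hypothetical stand-in for GPU-backed LLM inference, not part of any Google API.

```python
# Minimal HTTP streaming sketch for Cloud Run. generate_tokens() is a
# hypothetical placeholder for real GPU-backed LLM inference.
import os
from flask import Flask, Response, stream_with_context

app = Flask(__name__)

def generate_tokens(prompt: str):
    # Placeholder: a real service would yield tokens from a model
    # running on the attached NVIDIA L4 GPU.
    for token in ("Streaming", "partial", "results", "as", "they", "arrive."):
        yield token + " "

@app.route("/generate")
def generate():
    prompt = "example prompt"
    # Returning a streamed Response sends chunks to the client as soon as
    # they are produced instead of buffering the full reply.
    return Response(stream_with_context(generate_tokens(prompt)),
                    mimetype="text/plain")

if __name__ == "__main__":
    # Cloud Run provides the listening port via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```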

NVIDIA has praised the move, noting that serverless GPU access lowers the entry barrier for AI development. Developers can easily enable NVIDIA L4 GPUs through a command-line flag or a checkbox in the Google Cloud console. There’s no need for quota requests, so GPU resources are available to all users instantly.
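
Once a service is deployed with a GPU attached, a quick sanity check from inside the container confirms that the device is visible. The snippet below assumes the container image bundles PyTorch, which Cloud Run itself does not require; any CUDA-aware library would do.

```python
# Sanity check that the attached GPU is visible inside the container.
# Assumes the container image bundles PyTorch (an assumption of this sketch).
import torch

if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU visible; check the service's GPU configuration.")
```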

Cloud Run with GPU support is backed by Google Cloud’s Service Level Agreement (SLA), providing the reliability and uptime needed for production workloads. The service includes zonal redundancy by default for improved resilience, while also offering a more affordable option for users willing to accept best-effort failover during zonal outages.

The launch has sparked comparisons with other cloud platforms. Observers point out that Google is addressing a gap left open by competitors like AWS Lambda, which still only supports CPU-based compute and enforces a 15-minute execution time limit. This restricts Lambda’s usefulness for modern AI tasks like model fine-tuning or real-time video processing. In contrast, Cloud Run’s GPU support allows these jobs to run efficiently with automatic scaling.

Still, not all feedback has been positive. Some users have raised concerns about unexpected costs, since Cloud Run does not yet offer hard billing limits expressed in dollar amounts. While it is possible to cap the number of instances, there is currently no built-in way to cap total spending, which can make budgeting harder. Others have noted that alternative services such as Runpod.io may offer lower prices for comparable GPU resources.

In addition to real-time inference, Google has introduced GPU support for Cloud Run jobs, currently in private preview. This opens the door to more use cases involving batch processing and asynchronous tasks, further extending the platform’s potential.
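
For the jobs use case, a common pattern is to split a batch across parallel tasks using the task index and count that Cloud Run jobs expose as environment variables. In the sketch below, run_inference() is a hypothetical stand-in for GPU-backed processing, and the input list is placeholder data.

```python
# Sketch of a Cloud Run job task that processes its share of a batch.
# CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT are set by Cloud Run jobs;
# run_inference() is a hypothetical stand-in for GPU-backed work.
import os

def run_inference(item: str) -> str:
    return item.upper()  # placeholder for real model inference

def main() -> None:
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    items = [f"record-{i}" for i in range(100)]  # stand-in for real input data
    # Each parallel task handles an interleaved slice of the batch.
    for item in items[task_index::task_count]:
        print(run_inference(item))

if __name__ == "__main__":
    main()
```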

At launch, Cloud Run GPUs are available in five regions: Iowa (us-central1), Belgium (europe-west1), Netherlands (europe-west4), Singapore (asia-southeast1), and Mumbai (asia-south1). More regions are expected to follow. Developers are encouraged to consult Google’s official documentation for best practices and optimization tips.

In conclusion, the addition of serverless GPU support to Google Cloud Run is a strategic upgrade that enhances its appeal for AI and batch processing workloads. It offers developers a scalable, flexible, and production-ready environment for running GPU-accelerated tasks.
