Google Cloud Run Adds GPU Support for AI and Batch Workloads

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, a major step forward for its serverless platform. The update gives developers a cost-effective, scalable way to run GPU-powered tasks, especially AI inference and batch processing, and addresses the growing demand for accessible, production-ready GPU resources in the cloud while preserving the features that have made Cloud Run popular with developers.

A standout feature of the release is its pay-per-second billing model, which charges only for the GPU resources actually consumed and keeps costs closely aligned with workload demand. Cloud Run can also scale GPU instances down to zero when idle, avoiding charges entirely during quiet periods, which makes the service well suited to irregular or unpredictable workloads.

Another advantage is fast startup times, with GPU-enabled instances launching in under five seconds. This quick response is essential for applications that must adapt to changing demand or deliver real-time output, such as interactive AI services or live data processing. Cloud Run also supports HTTP and WebSocket streaming, making it well-suited for real-time applications, including those powered by large language models (LLMs).
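
To make the streaming model concrete, here is a minimal sketch of a chunked-streaming HTTP service of the kind Cloud Run can host. It uses only the Python standard library; the token generator is a placeholder standing in for real LLM inference, and every name in it is illustrative:

    # Minimal sketch of a token-streaming HTTP endpoint, standard library only.
    # fake_llm_tokens() is a placeholder for real model inference.
    import os
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    def fake_llm_tokens():
        # Stand-in for an LLM that produces output incrementally.
        for token in ["Streaming ", "tokens ", "as ", "they ", "arrive.\n"]:
            time.sleep(0.1)
            yield token

    class StreamHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Transfer-Encoding", "chunked")
            self.end_headers()
            for token in fake_llm_tokens():
                chunk = token.encode("utf-8")
                # Chunked framing: hex size, CRLF, payload, CRLF.
                self.wfile.write(f"{len(chunk):X}\r\n".encode() + chunk + b"\r\n")
                self.wfile.flush()
            self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the stream

    if __name__ == "__main__":
        # Cloud Run tells the container which port to listen on via PORT.
        port = int(os.environ.get("PORT", "8080"))
        ThreadingHTTPServer(("", port), StreamHandler).serve_forever()

A client that reads the response incrementally (for example, curl with the -N flag) sees tokens as they are produced rather than as one final payload; containerized, the same process serves traffic on the port Cloud Run injects through the PORT environment variable.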

NVIDIA has praised the move, noting that serverless GPU access lowers the entry barrier for AI development. Developers can easily enable NVIDIA L4 GPUs through a command-line flag or a checkbox in the Google Cloud console. There’s no need for quota requests, so GPU resources are available to all users instantly.
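
As an illustrative sketch, a GPU-backed deployment might look like the following. The service and image names are placeholders, and the exact flag names and resource minimums should be confirmed against the current gcloud documentation:

    # Placeholder service and image names. GPU services also require
    # generous CPU/memory allocations and always-allocated CPU.
    gcloud run deploy inference-service \
      --image=us-docker.pkg.dev/my-project/my-repo/llm-server:latest \
      --region=europe-west1 \
      --gpu=1 \
      --gpu-type=nvidia-l4 \
      --cpu=4 \
      --memory=16Gi \
      --no-cpu-throttling

The same settings are also exposed in the Google Cloud console, where attaching a GPU is a single checkbox on the service configuration page.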

Cloud Run with GPU support is backed by Google Cloud’s Service Level Agreement (SLA), providing the reliability and uptime needed for production workloads. The service includes zonal redundancy by default for improved resilience, while also offering a more affordable option for users willing to accept best-effort failover during zonal outages.
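
Based on the announcement, the cheaper best-effort option appears to be controlled per service by turning GPU zonal redundancy off; a hedged sketch, again with a hypothetical service name:

    # Opt a (hypothetical) service out of GPU zonal redundancy in exchange
    # for lower pricing; failover during a zonal outage becomes best-effort.
    gcloud run services update inference-service \
      --no-gpu-zonal-redundancy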

The launch has sparked comparisons with other cloud platforms. Observers point out that Google is addressing a gap left open by competitors like AWS Lambda, which still only supports CPU-based compute and enforces a 15-minute execution time limit. This restricts Lambda’s usefulness for modern AI tasks like model fine-tuning or real-time video processing. In contrast, Cloud Run’s GPU support allows these jobs to run efficiently with automatic scaling.

Still, not all feedback has been positive. Some users have expressed concern about unexpected costs, since Cloud Run does not yet offer hard billing limits expressed in dollar amounts. While it is possible to cap the number of instances, there is currently no built-in way to cap total spending, which can complicate budgeting. Others have noted that alternative services such as Runpod.io may offer lower prices for comparable GPU resources.
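
Until dollar-based caps arrive, the instance limit mentioned above is the closest built-in guardrail. A minimal sketch, assuming a hypothetical service name:

    # Bound worst-case concurrency (and therefore worst-case GPU spend)
    # for a hypothetical service; a capacity ceiling, not a dollar cap.
    gcloud run services update inference-service \
      --max-instances=3

This bounds the worst case to three instances' worth of per-second GPU charges, but it limits capacity rather than spending itself, so budget alerts on the billing account remain advisable.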

In addition to real-time inference, Google has introduced GPU support for Cloud Run jobs, currently in private preview. This opens the door to more use cases involving batch processing and asynchronous tasks, further extending the platform’s potential.

At launch, Cloud Run GPUs are available in five regions: Iowa (us-central1), Belgium (europe-west1), Netherlands (europe-west4), Singapore (asia-southeast1), and Mumbai (asia-south1). More regions are expected to follow. Developers are encouraged to consult Google’s official documentation for best practices and optimization tips.

In conclusion, the addition of serverless GPU support to Google Cloud Run is a strategic upgrade that enhances its appeal for AI and batch processing workloads. It offers developers a scalable, flexible, and production-ready environment for running GPU-accelerated tasks.
