Google Cloud Run Adds GPU Support for AI and Batch Workloads

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, a major step forward for its serverless platform. The update aims to give developers a cost-effective, scalable option for GPU-powered tasks, especially AI inference and batch processing. It addresses the rising need for accessible, production-ready GPU resources in the cloud while preserving the key features that have made Cloud Run popular with developers.

A standout feature of this release is pay-per-second billing, which charges users only for the GPU resources they actually consume and keeps costs closely aligned with workload demand. Cloud Run can also scale GPU instances down to zero when idle, avoiding charges for unused capacity, which makes it a good fit for irregular or unpredictable workloads.

Another advantage is fast startup times, with GPU-enabled instances launching in under five seconds. This quick response is essential for applications that must adapt to changing demand or deliver real-time output, such as interactive AI services or live data processing. Cloud Run also supports HTTP and WebSocket streaming, making it well-suited for real-time applications, including those powered by large language models (LLMs).
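To make the streaming point concrete, the sketch below shows the shape of an HTTP endpoint that flushes output to the client incrementally, the pattern an LLM-backed service on Cloud Run would follow. It is illustrative only: Flask is an arbitrary framework choice, and `generate_tokens` is a hypothetical stand-in for a real GPU-backed model call.

```python
from flask import Flask, Response

app = Flask(__name__)

def generate_tokens(prompt: str):
    # Hypothetical stand-in for a GPU-backed model call; here we
    # simply yield words one at a time to demonstrate incremental output.
    for word in f"Echoing your prompt: {prompt}".split():
        yield word + " "

@app.route("/generate/<prompt>")
def generate(prompt: str):
    # Returning a generator makes Flask stream chunks to the client
    # as they are produced, so users see output before the full
    # response is ready.
    return Response(generate_tokens(prompt), mimetype="text/plain")

if __name__ == "__main__":
    # Cloud Run sends traffic to port 8080 by default.
    app.run(host="0.0.0.0", port=8080)
```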

NVIDIA has praised the move, noting that serverless GPU access lowers the entry barrier for AI development. Developers can easily enable NVIDIA L4 GPUs through a command-line flag or a checkbox in the Google Cloud console. There’s no need for quota requests, so GPU resources are available to all users instantly.
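For illustration, enabling a GPU at deploy time amounts to adding the GPU flags to an ordinary deployment command. The following is a minimal sketch: the service name and image path are placeholders, and the CPU and memory values reflect the minimums Google documents for GPU-enabled services, so consult the official documentation for current requirements.

```bash
# Deploy a Cloud Run service with one NVIDIA L4 GPU attached.
# Service name and image path are placeholders.
# GPU services require CPU to be always allocated (--no-cpu-throttling)
# and, per Google's docs, at least 4 vCPUs and 16 GiB of memory.
gcloud run deploy my-inference-service \
  --image us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 4 \
  --memory 16Gi \
  --no-cpu-throttling \
  --max-instances 3
```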

Cloud Run with GPU support is backed by Google Cloud’s Service Level Agreement (SLA), providing the reliability and uptime needed for production workloads. The service includes zonal redundancy by default for improved resilience, while also offering a more affordable option for users willing to accept best-effort failover during zonal outages.
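Assuming the redundancy opt-out is exposed as a deploy-time setting, as Google's launch material describes (the `--no-gpu-zonal-redundancy` spelling here follows the gcloud documentation and should be verified against the current CLI), choosing the cheaper best-effort option would look roughly like this:

```bash
# Trade zonal redundancy for a lower GPU price; failover during a
# zonal outage becomes best-effort. Flag name per Google's docs.
gcloud run services update my-inference-service \
  --region us-central1 \
  --no-gpu-zonal-redundancy
```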

The launch has sparked comparisons with other cloud platforms. Observers point out that Google is addressing a gap left open by competitors such as AWS Lambda, which still supports only CPU-based compute and enforces a 15-minute execution time limit. Those constraints limit Lambda's usefulness for modern AI tasks such as model fine-tuning or real-time video processing. In contrast, Cloud Run's GPU support allows these jobs to run efficiently with automatic scaling.

Still, not all feedback has been positive. Some users have raised concerns about unexpected costs, since Cloud Run does not yet offer hard billing limits expressed in dollar amounts: while it is possible to cap the number of instances, there is currently no built-in way to cap total spending, which can make budgeting harder. Others have noted that alternative services such as Runpod.io may offer lower prices for comparable GPU resources.

In addition to real-time inference, Google has introduced GPU support for Cloud Run jobs, currently in private preview. This opens the door to more use cases involving batch processing and asynchronous tasks, further extending the platform’s potential.

At launch, Cloud Run GPUs are available in five regions: Iowa (us-central1), Belgium (europe-west1), Netherlands (europe-west4), Singapore (asia-southeast1), and Mumbai (asia-south1). More regions are expected to follow. Developers are encouraged to consult Google’s official documentation for best practices and optimization tips.

In conclusion, the addition of serverless GPU support to Google Cloud Run is a strategic upgrade that enhances its appeal for AI and batch processing workloads. It offers developers a scalable, flexible, and production-ready environment for running GPU-accelerated tasks.
