Google Cloud Run Adds GPU Support for AI and Batch Workloads

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, a major step forward for its serverless platform. The update aims to give developers a cost-effective, scalable option for GPU-powered tasks, especially AI inference and batch processing. It addresses the rising need for accessible, production-ready GPU resources in the cloud while preserving the key features that have made Cloud Run popular with developers.

A standout feature of this release is its pay-per-second billing model, which charges users only for the GPU resources they actually use. This reduces waste and keeps costs closely aligned with actual workload demand. Cloud Run can also scale GPU instances down to zero when idle, avoiding unnecessary expense, which makes it well suited to irregular or unpredictable workloads.
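To make the billing model concrete, here is a minimal sketch comparing per-second billing with scale-to-zero against an always-on GPU over the same window. The rate is a made-up placeholder, not a published Cloud Run price, and the function names are hypothetical. The model also ignores CPU and memory charges and any idle time an instance spends before it scales down.

```python
import math

# ASSUMPTION: illustrative rate only, not a real Cloud Run price.
# Replace it with the figure from the current pricing page.
ASSUMED_GPU_RATE_PER_SECOND = 0.0002  # USD per GPU-second (hypothetical)


def billed_cost(burst_seconds, rate_per_second=ASSUMED_GPU_RATE_PER_SECOND):
    """GPU cost when each burst is billed per second and nothing is billed at zero instances.

    Each burst duration is rounded up to a whole second before billing.
    """
    billed_seconds = sum(math.ceil(seconds) for seconds in burst_seconds)
    return billed_seconds * rate_per_second


def always_on_cost(window_seconds, rate_per_second=ASSUMED_GPU_RATE_PER_SECOND):
    """GPU cost if one instance stays up for the whole window."""
    return window_seconds * rate_per_second


# Three short bursts of traffic within a one-hour window.
bursts = [30, 45.5, 120]
print(f"per-second, scale-to-zero: ${billed_cost(bursts):.4f}")
print(f"always-on for one hour:    ${always_on_cost(3600):.4f}")
```

The gap between the two figures grows as the workload becomes more sporadic, which is the case the article describes as a good fit.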

Another advantage is fast startup times, with GPU-enabled instances launching in under five seconds. This quick response is essential for applications that must adapt to changing demand or deliver real-time output, such as interactive AI services or live data processing. Cloud Run also supports HTTP and WebSocket streaming, making it well-suited for real-time applications, including those powered by large language models (LLMs).

NVIDIA has praised the move, noting that serverless GPU access lowers the entry barrier for AI development. Developers can easily enable NVIDIA L4 GPUs through a command-line flag or a checkbox in the Google Cloud console. There’s no need for quota requests, so GPU resources are available to all users instantly.
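As a rough illustration of the command-line route, the sketch below assembles a `gcloud run deploy` invocation that requests one L4 GPU. The `--gpu` and `--gpu-type` flags reflect the gcloud CLI as generally documented, but treat them as an assumption and confirm with `gcloud run deploy --help`. The service name, image path, and helper function are hypothetical placeholders, and a real deployment may need additional settings such as CPU and memory.

```python
import shlex


def build_deploy_command(service, image, region="us-central1",
                         gpu_count=1, gpu_type="nvidia-l4"):
    """Return the argv list for a GPU-enabled Cloud Run deployment.

    ASSUMPTION: flag names follow the gcloud CLI as commonly documented;
    verify them against your installed gcloud version.
    """
    return [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--region", region,
        "--gpu", str(gpu_count),
        "--gpu-type", gpu_type,
    ]


# Hypothetical service name and image path, for illustration only.
command = build_deploy_command(
    "my-inference-service",
    "us-docker.pkg.dev/my-project/my-repo/my-image:latest",
)

# Print the command for review instead of running it.
print(shlex.join(command))
```

Building the argument list separately from executing it makes the command easy to inspect or log before anything is deployed. It could then be passed to `subprocess.run(command, check=True)`.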

Cloud Run with GPU support is backed by Google Cloud’s Service Level Agreement (SLA), providing the reliability and uptime needed for production workloads. The service includes zonal redundancy by default for improved resilience, while also offering a more affordable option for users willing to accept best-effort failover during zonal outages.

The launch has sparked comparisons with other cloud platforms. Observers point out that Google is addressing a gap left open by competitors like AWS Lambda, which still supports only CPU-based compute and enforces a 15-minute execution time limit. This restricts Lambda’s usefulness for modern AI tasks like model fine-tuning or real-time video processing. In contrast, Cloud Run’s GPU support allows these jobs to run efficiently with automatic scaling.

Still, not all feedback has been positive. Some users have expressed concern about unexpected costs, since Cloud Run doesn’t yet offer hard billing limits based on dollar amounts. While it’s possible to set instance limits, there’s currently no built-in way to cap total spending, which can make budgeting harder. Others have noted that alternative services like Runpod.io may offer lower prices for comparable GPU resources.
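An instance limit can still serve as an indirect spending guard. The sketch below works backward from a budget to the largest instance limit whose worst case (every instance busy for the whole period) stays within it. The rate is a hypothetical placeholder rather than a real price, the function name is illustrative, and the calculation covers GPU time only, so CPU, memory, and networking charges would come on top.

```python
import math


def max_instances_for_budget(budget_usd, rate_per_instance_second, window_seconds):
    """Largest instance limit whose worst-case GPU cost fits within the budget.

    Worst case assumes every allowed instance runs for the entire window.
    """
    if rate_per_instance_second <= 0 or window_seconds <= 0:
        raise ValueError("rate and window must be positive")
    cost_per_instance = rate_per_instance_second * window_seconds
    return math.floor(budget_usd / cost_per_instance)


# ASSUMPTION: illustrative rate only, not a published Cloud Run price.
assumed_rate = 0.0002            # USD per instance-second (hypothetical)
thirty_days = 30 * 24 * 3600     # seconds in a 30-day period

limit = max_instances_for_budget(2000, assumed_rate, thirty_days)
print(f"instance limit that keeps worst-case GPU spend under $2000: {limit}")
```

A result of zero means the budget cannot cover even one instance running continuously. In that case the limit alone is not a sufficient control, and budget alerts or scheduled shutdowns would be needed as well.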

In addition to real-time inference, Google has introduced GPU support for Cloud Run jobs, currently in private preview. This opens the door to more use cases involving batch processing and asynchronous tasks, further extending the platform’s potential.

At launch, Cloud Run GPUs are available in five regions: Iowa (us-central1), Belgium (europe-west1), Netherlands (europe-west4), Singapore (asia-southeast1), and Mumbai (asia-south1). More regions are expected to follow. Developers are encouraged to consult Google’s official documentation for best practices and optimization tips.

In conclusion, the addition of serverless GPU support to Google Cloud Run is a strategic upgrade that enhances its appeal for AI and batch processing workloads. It offers developers a scalable, flexible, and production-ready environment for running GPU-accelerated tasks.
