Google Cloud Run Adds GPU Support for AI and Batch Workloads

Google Cloud has announced the general availability of NVIDIA GPU support for Cloud Run, a major step forward for its serverless platform. The update aims to give developers a cost-effective, scalable option for GPU-powered tasks, especially AI inference and batch processing, and addresses the growing need for accessible, production-ready GPU resources in the cloud while preserving the features that have made Cloud Run popular with developers.

A standout feature of the release is its pay-per-second billing model, which charges users only for the GPU resources they actually consume, reducing waste and aligning costs with workload demand. Cloud Run can also scale GPU instances down to zero when idle, avoiding charges for unused capacity, which makes the service well suited to irregular or unpredictable workloads.

Another advantage is fast startup times, with GPU-enabled instances launching in under five seconds. This quick response is essential for applications that must adapt to changing demand or deliver real-time output, such as interactive AI services or live data processing. Cloud Run also supports HTTP and WebSocket streaming, making it well-suited for real-time applications, including those powered by large language models (LLMs).
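To make the streaming point concrete, the sketch below shows a minimal token-streaming HTTP endpoint of the kind Cloud Run can now serve. It is only an illustration: FastAPI and uvicorn are assumed, and the generate() function is a stand-in for a real GPU-backed model call; none of these names come from Google's announcement.

```python
import os
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def generate(prompt: str):
    """Stand-in for a GPU-backed LLM; yields tokens as they are produced."""
    for token in ["This ", "is ", "a ", "streamed ", "reply."]:
        time.sleep(0.1)  # simulate per-token latency
        yield token


@app.get("/stream")
def stream(prompt: str = "") -> StreamingResponse:
    # Tokens are flushed to the client as they arrive, rather than after the
    # full completion finishes; this is the pattern Cloud Run's HTTP streaming
    # support enables for LLM-style responses.
    return StreamingResponse(generate(prompt), media_type="text/plain")


if __name__ == "__main__":
    import uvicorn

    # Cloud Run tells the container which port to listen on via $PORT.
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

A client that reads the response incrementally, rather than waiting for the full body, will see tokens appear as they are generated.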

NVIDIA has praised the move, noting that serverless GPU access lowers the entry barrier for AI development. Developers can easily enable NVIDIA L4 GPUs through a command-line flag or a checkbox in the Google Cloud console. There’s no need for quota requests, so GPU resources are available to all users instantly.
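As a rough illustration of what enabling the GPU looks like from the command line, here is a small deployment script that shells out to the gcloud CLI. The flag names and resource values reflect how the L4 GPU option is described in Google's announcement, but treat them as assumptions and verify them against the current Cloud Run documentation; the service name and image path are placeholders.

```python
"""Hypothetical sketch of scripting a GPU-enabled Cloud Run deployment."""
import subprocess


def deploy_gpu_service(service: str, image: str, region: str = "us-central1") -> None:
    """Deploy a Cloud Run service with one NVIDIA L4 GPU that scales to zero."""
    subprocess.run(
        [
            "gcloud", "run", "deploy", service,
            f"--image={image}",
            f"--region={region}",
            "--gpu=1",               # one GPU per instance
            "--gpu-type=nvidia-l4",  # the GPU type offered at launch
            "--cpu=4",               # GPU instances need a larger CPU allocation
            "--memory=16Gi",         # ...and more memory than the Cloud Run defaults
            "--min-instances=0",     # allow scale-to-zero when the service is idle
        ],
        check=True,
    )


if __name__ == "__main__":
    # Placeholder image path; replace with a real container image.
    deploy_gpu_service("llm-inference", "us-docker.pkg.dev/PROJECT/repo/llm-server:latest")
```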

Cloud Run with GPU support is backed by Google Cloud’s Service Level Agreement (SLA), providing the reliability and uptime needed for production workloads. The service includes zonal redundancy by default for improved resilience, while also offering a more affordable option for users willing to accept best-effort failover during zonal outages.

The launch has sparked comparisons with other cloud platforms. Observers point out that Google is filling a gap left by competitors such as AWS Lambda, which still supports only CPU-based compute and enforces a 15-minute execution time limit, restricting its usefulness for modern AI tasks like model fine-tuning or real-time video processing. Cloud Run's GPU support, in contrast, lets these jobs run efficiently with automatic scaling.

Still, not all feedback has been positive. Some users have expressed concern about unexpected costs, since Cloud Run doesn't yet offer hard billing limits set in dollar amounts. While it's possible to cap the number of instances, there is currently no built-in way to cap total spending, which can make budgeting harder. Others have noted that alternative services like Runpod.io may offer lower prices for comparable GPU resources.

In addition to real-time inference, Google has introduced GPU support for Cloud Run jobs, currently in private preview. This opens the door to more use cases involving batch processing and asynchronous tasks, further extending the platform’s potential.

At launch, Cloud Run GPUs are available in five regions: Iowa (us-central1), Belgium (europe-west1), Netherlands (europe-west4), Singapore (asia-southeast1), and Mumbai (asia-south1). More regions are expected to follow. Developers are encouraged to consult Google’s official documentation for best practices and optimization tips.

In conclusion, the addition of serverless GPU support to Google Cloud Run is a strategic upgrade that enhances its appeal for AI and batch processing workloads. It offers developers a scalable, flexible, and production-ready environment for running GPU-accelerated tasks.
