Distributed ffmpeg transcoding with NVENC, EF Core and Azure SQL

I went to spin up an Azure GPU VM to transcode video at scale. The platform is orbsen-vid, a video hosting service I have been building for course creators. It uploads MP4s to Azure Blob Storage, transcodes them to adaptive-bitrate HLS with ffmpeg, and serves the result through a CDN. The transcoding part is the expensive bit.

I picked a VM family, checked the sizes, clicked through the regions. Every NC, NV, and ND series VM came back: no capacity available. East US, West Europe, UK South, all of them. The AI bubble has cleaned out every GPU-capable VM in Azure's consumer tiers. My guess is that the hyperscalers are keeping them for their own inference workloads.

So I looked at what was already sitting under my desk.

An RTX 4060 Ti in a Windows box running WSL2

My development machine is a Windows desktop with an RTX 4060 Ti. It runs WSL2. That GPU is fully accessible from WSL via NVIDIA's CUDA for WSL drivers. ffmpeg can see it. NVENC works on it. And it was sitting there at 0% utilisation while I was looking at Azure pricing.

The question was whether two workers could pull from my transcode queue at the same time: the Hetzner production server, which also does the web serving, and the WSL machine for heavy encoding. The answer was yes, but only if the job queue lived in shared state both machines could reach.

From SQLite to Azure SQL Database

Up to this point, the app used SQLite. One file, one server, simple. The problem with SQLite for a distributed queue is that the database is a single local file, and its file locking is not reliable over network filesystems, so two machines cannot safely write to it. The moment I wanted the Windows box to claim jobs from the queue, SQLite was the wrong shape.

I migrated the app to Azure SQL Database: same Azure subscription, existing logical server, new database. The EF Core migration took about an hour. The connection string went into the systemd override.conf on Hetzner and into a gitignored appsettings.Production.json on WSL. Neither is in source control. Neither should be.

The clever part is the concurrency model. Two workers picking jobs from the same queue will race. EF Core has a [ConcurrencyCheck] attribute that makes it include the original column value in the WHERE clause of the UPDATE. If worker A claims a row a millisecond before worker B, worker B's UPDATE matches zero rows, EF throws a DbUpdateConcurrencyException, and the worker backs off and tries the next pending video. No locks. No queuing middleware. Forty lines of code.
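The claim pattern can be sketched outside EF Core. What follows is a minimal Python illustration of the same idea, not the app's actual code: the UPDATE carries the status value the worker originally read, and a row count of zero means another worker got there first. The jobs table, its columns, and the function names are hypothetical, and sqlite3 is used only so the sketch runs without a database server.

```python
import sqlite3


def try_claim(conn, job_id, seen_status, worker):
    """Claim job_id only if its status is still what this worker read earlier."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'Processing', claimed_by = ? "
        "WHERE id = ? AND status = ?",  # the original value guards the write
        (worker, job_id, seen_status),
    )
    conn.commit()
    # Zero affected rows is the case EF Core reports as
    # DbUpdateConcurrencyException: someone else changed the row first.
    return cur.rowcount == 1


def claim_next_job(conn, worker):
    """Return the id of a job claimed for this worker, or None if none are pending."""
    while True:
        row = conn.execute(
            "SELECT id, status FROM jobs WHERE status = 'Pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, seen_status = row
        if try_claim(conn, job_id, seen_status, worker):
            return job_id
        # Lost the race on this row; loop and try the next pending one.
```

The database does the arbitration here. Each UPDATE is atomic, so exactly one of two competing workers sees a row count of one, and the loser simply moves on.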

NVENC: the GPU does the work the CPU used to

With UseGpu: true in config, the transcode command switches from libx264 to h264_nvenc and adds -hwaccel cuda to the ffmpeg arguments. The first time it worked I watched the CPU drop from 100% to 30% and the NVIDIA GPU encode counter climb to 36%.

Then I pushed it. Two concurrent jobs: 57% GPU encode, 92% on the 3D pipeline. Three concurrent jobs: 71% GPU encode, still comfortable. The GPU has headroom to spare.

One mistake I made: I passed -hwaccel_device {slotIndex} to target a specific GPU per concurrent slot, a pattern that makes sense on a multi-GPU machine. On a single GPU, slot index 1 is an invalid device. Every job on slot 1 failed with CUDA_ERROR_INVALID_DEVICE. The fix was one line: always pass -hwaccel_device 0. NVENC handles multiple concurrent encode sessions on a single device without any special configuration, up to the session limit the driver enforces on consumer GeForce cards.
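As a rough sketch of the switch described above, here is how the argument list might be assembled. This is Python for illustration, not the app's code. The function name, the AAC audio codec, and the single-rendition HLS output are assumptions (a real adaptive-bitrate ladder needs more options). Only the libx264 / h264_nvenc switch, -hwaccel cuda, and the fixed -hwaccel_device 0 come from the setup described here.

```python
def build_transcode_args(src, dst, use_gpu):
    """Build an ffmpeg argument list for a single HLS rendition."""
    args = ["ffmpeg", "-y"]
    if use_gpu:
        # Decode-side options must come before the -i they apply to.
        # The device index is always 0: with one GPU, a per-slot index
        # would point slot 1 at a device that does not exist.
        args += ["-hwaccel", "cuda", "-hwaccel_device", "0"]
    args += ["-i", src]
    args += ["-c:v", "h264_nvenc" if use_gpu else "libx264"]
    args += ["-c:a", "aac", "-f", "hls", dst]
    return args
```

The resulting list can be handed to subprocess.run as-is. Concurrency comes from starting several ffmpeg processes against the same device, not from varying the device index.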

A shared CLI for both machines

The monitoring CLI, orbsen-vid transcode list, now shows which machine claimed each job. You see [ubuntu-4gb-nbg1-1] for Hetzner and [THEDEVPC] for the WSL machine, alongside the video owner's email and elapsed time.

The CLI reads its connection string from appsettings.Production.json on WSL and from the systemd environment on Hetzner. Same config stack as the web app.
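The lookup order can be sketched like this. It is a Python illustration of the layering, not the CLI's code: an environment variable wins if present, otherwise the JSON file is read. The ConnectionStrings__Name form is the standard .NET convention for environment variables; the connection name "Default" and the function name are hypothetical.

```python
import json
import os


def load_connection_string(json_path, name="Default", env=os.environ):
    """Return the connection string from the environment, else from a JSON file."""
    # Environment first, mirroring how .NET layers env vars over JSON settings.
    value = env.get(f"ConnectionStrings__{name}")
    if value:
        return value
    try:
        with open(json_path, encoding="utf-8") as fh:
            settings = json.load(fh)
    except FileNotFoundError:
        return None
    return settings.get("ConnectionStrings", {}).get(name)
```

On Hetzner the systemd override supplies the environment variable; on WSL the gitignored JSON file is the fallback.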

What this cost

The database tier upgrade from Basic (5 DTUs) to Standard S1 (20 DTUs) became necessary once I pushed to two concurrent workers. The Basic tier throttled connections under load and the jobs started failing with cancellation errors. Standard S1 is about fifteen pounds a month more. Right now Azure credits cover it.

The WSL encoder costs nothing. The hardware was already bought. The electricity is negligible compared to running a cloud VM around the clock.

The lesson

The AI bubble made Azure GPU capacity temporarily unavailable, which turned out to be useful. It forced me to look at what I already had. The answer was sitting under the desk, idle, waiting to be useful.

Not every problem needs a cloud solution. Sometimes the right infrastructure is already in the room.

— Mícheál.