How We Replaced Goroutines With a Database Queue for Video Transcription

In our last post about transcription, we showed how we added automatic video transcription with whisper.cpp. The approach was simple: after each upload, fire a goroutine that acquires a semaphore, runs whisper, and updates the database. It worked — until it didn’t.

Three problems pushed us to replace the goroutine-based approach with a database-backed queue.

What went wrong

Jobs lost on restart. The semaphore and the goroutines waiting on it lived in memory. Deploy the app, and any video mid-transcription or waiting in the queue simply vanishes. The video shows “Transcribing…” forever, but the goroutine that was doing the work is gone.

Queued jobs were invisible. When a video was uploaded, its transcript_status stayed at none until the goroutine actually acquired the semaphore and set it to processing. If two videos were uploaded back-to-back, the second one sat in memory with no indication to the user that transcription was queued. The library showed nothing — no “pending” badge, no progress, just silence.

No stuck job recovery. If the server crashed mid-transcription, the video’s status stayed at processing permanently. No goroutine would come back to retry it. The only fix was manually updating the database.

All three problems stem from the same root cause: the queue was in memory. The database knew nothing about pending work. The fix was obvious — move the queue into the database.

Why not Redis or RabbitMQ

The standard answer to “I need a job queue” is to reach for a dedicated queue: Redis with Sidekiq, RabbitMQ, SQS, or a dozen other options. Each adds a service to deploy, monitor, and keep running.

SendRec runs on a single server. We already have PostgreSQL. The transcription throughput is low — a handful of jobs per hour at most, each taking 30-60 seconds. We don’t need sub-millisecond dequeue latency or millions of messages per second.

PostgreSQL has FOR UPDATE SKIP LOCKED, which is purpose-built for exactly this pattern. A worker claims a row atomically, processes it, and updates the status. Other workers (or future iterations of the same worker) skip locked rows and grab the next one. It’s a job queue in three SQL keywords.

The decision was simple: no new dependencies, no new failure modes, no new things to monitor. Just SQL.

The status machine

The existing transcript_status column already tracked none, processing, ready, and failed. We added one value — pending — and a timestamp column for stuck detection:

ALTER TABLE videos DROP CONSTRAINT IF EXISTS videos_transcript_status_check;
ALTER TABLE videos ADD CONSTRAINT videos_transcript_status_check
    CHECK (transcript_status IN ('none', 'pending', 'processing', 'ready', 'failed'));

ALTER TABLE videos ADD COLUMN transcript_started_at TIMESTAMPTZ;

CREATE INDEX idx_videos_transcript_pending ON videos (transcript_status)
    WHERE transcript_status IN ('pending', 'processing');

The flow is: none → pending → processing → ready or failed.

Every call site that previously fired a goroutine now calls a single function:

func EnqueueTranscription(ctx context.Context, db database.DBTX, videoID string) error {
    _, err := db.Exec(ctx,
        `UPDATE videos SET transcript_status = 'pending', updated_at = now()
         WHERE id = $1 AND status != 'deleted'`,
        videoID,
    )
    return err
}

One UPDATE, no goroutine, no semaphore. The video immediately shows as “pending” in the UI. The actual transcription happens later, when the worker picks it up.

The worker

A background goroutine polls the database every 5 seconds. Each iteration does one thing: try to claim the next pending job.

func StartTranscriptionWorker(ctx context.Context, db database.DBTX,
    storage ObjectStorage, interval time.Duration) {
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                processNextTranscription(ctx, db, storage)
            }
        }
    }()
}

The claim query is the interesting part. It uses a subquery with FOR UPDATE SKIP LOCKED to atomically select and update a single row:

UPDATE videos
SET transcript_status = 'processing',
    transcript_started_at = now(),
    updated_at = now()
WHERE id = (
    SELECT id FROM videos
    WHERE transcript_status = 'pending' AND status != 'deleted'
    ORDER BY updated_at ASC LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, file_key, user_id, share_token

FOR UPDATE locks the selected row. SKIP LOCKED means that if another transaction already holds the lock, skip that row and grab the next one. Combined with LIMIT 1, this guarantees exclusive claims: no two workers can ever process the same video at the same time. (Overall delivery is at-least-once — a job interrupted by a crash gets reset and retried — but it is never concurrent.)

In practice we only run one worker, but the pattern scales to multiple workers without code changes — just start more instances.

Compare this to the old approach:

// Before: in-memory semaphore
var transcriptionSemaphore = make(chan struct{}, 1)

func TranscribeVideo(ctx context.Context, ...) {
    select {
    case transcriptionSemaphore <- struct{}{}:
        defer func() { <-transcriptionSemaphore }()
    case <-ctx.Done():
        return
    }
    // ... transcribe
}

The semaphore did concurrency control but nothing else. No persistence, no visibility, no recovery. The database query does all three.

Stuck job recovery

The worker’s first action on each tick — before claiming a new job — is to reset stuck jobs:

db.Exec(ctx,
    `UPDATE videos
     SET transcript_status = 'pending',
         transcript_started_at = NULL,
         updated_at = now()
     WHERE transcript_status = 'processing'
       AND (transcript_started_at < now() - INTERVAL '10 minutes'
            OR transcript_started_at IS NULL)`,
)

Any video that’s been in processing for more than 10 minutes gets reset to pending so the worker will pick it up again.

The OR transcript_started_at IS NULL clause is there for a subtle reason. When we deployed the migration, existing videos that were stuck in processing from before the migration had a NULL transcript_started_at (the column didn’t exist yet). In SQL, NULL < timestamp evaluates to NULL, not true — so the time comparison alone wouldn’t catch them. This is the kind of thing you only discover when you deploy to a database with real data.
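The three-valued logic is easy to verify by hand in psql (a standalone demonstration, not part of the migration):

```sql
-- Any comparison with NULL yields NULL, and WHERE treats NULL as
-- "don't match" — so a NULL timestamp fails both directions of the test:
SELECT NULL::timestamptz < now() - INTERVAL '10 minutes';  -- NULL, not true
SELECT NULL::timestamptz > now() - INTERVAL '10 minutes';  -- also NULL
```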

When whisper isn’t available

The old approach silently skipped transcription when whisper wasn’t installed. This was fine for a fire-and-forget goroutine, but with a persistent queue it creates an infinite loop: the worker claims a job, finds whisper is missing, resets to pending, claims it again 5 seconds later, forever.

The fix: when whisper is unavailable, mark the video as failed instead of pending. The user sees a “Retry transcript” button in the library if they install whisper later. No infinite loop.

if !isTranscriptionAvailable() {
    db.Exec(ctx,
        `UPDATE videos SET transcript_status = 'failed',
             transcript_started_at = NULL, updated_at = now()
         WHERE id = $1`, videoID)
    return
}

What the user sees

The biggest UX improvement is visibility. Before, users had no idea their video was waiting for transcription. Now the library shows distinct states:

  • “Pending transcription…” — the video is in the queue, waiting its turn
  • “Transcribing…” — the worker is actively processing this video
  • “Retry transcript” — transcription failed, click to re-enqueue

The watch page works the same way. If a viewer arrives while transcription is queued, they see “Transcription queued…” instead of an empty space. A polling loop checks the API every 10 seconds and reloads when the transcript is ready — or stops polling if transcription failed, instead of spinning forever.

Upload two videos quickly, and you can see the queue in action: the first shows “Transcribing…” while the second shows “Pending transcription…” Both update automatically when their turn comes.

What changed in the codebase

The migration was a single SQL file. The worker is 70 lines of Go. The EnqueueTranscription function is 7 lines. The old semaphore, the goroutine spawn in the upload handler, and the inline status transitions were all deleted.

Four call sites — upload, retranscribe, webcam compositing, and video trimming — all collapsed into the same one-line call: EnqueueTranscription(ctx, db, videoID). The worker handles everything else.

No new tables. No new services. No new dependencies. Just a new column value, a timestamp, and a polling loop.

Try it

SendRec is open source (AGPL-3.0) and self-hostable. The transcription queue is live at app.sendrec.eu. The worker code is in transcribe_worker.go and the migration is in 000016_add_transcript_pending_status.up.sql.