Case Study

AI Video Detection System

Computer vision pipeline for automated video analysis. Sub-second inference, GPU-optimized, CI/CD pipeline.

Live in production · Computer Vision · PyTorch · ONNX · FFmpeg · GPU inference

The brief

The client generated hours of video footage daily and needed automated detection of specific objects and events. Off-the-shelf cloud APIs (AWS Rekognition, Google Vision) didn't recognize the niche categories they cared about, and at their volume the per-frame pricing was a non-starter.

Constraints: it had to run on a single mid-tier GPU machine, process footage in near-real time, and be reasonably accurate without a research-team-sized training budget.

The approach

Started with the right question: "what's the smallest model that gives 90% of the accuracy of the perfect model?" Tested a couple of strong open-source pretrained backbones, fine-tuned each one on ~500 labeled frames, and picked the one that fit the inference budget.
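The selection rule can be sketched as below. This is a minimal illustration, not the project's actual code: it assumes each fine-tuned candidate has been scored on a held-out set and timed on the target GPU, treats the best candidate's accuracy as a stand-in for "the perfect model", and uses measured latency as the proxy for "smallest". `Candidate` and `pick_smallest_sufficient` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # held-out accuracy after fine-tuning, 0..1
    latency_ms: float  # measured per-frame inference latency on the target GPU


def pick_smallest_sufficient(
    candidates: Sequence[Candidate],
    latency_budget_ms: float,
    fraction: float = 0.9,
) -> Optional[Candidate]:
    """Return the fastest candidate that fits the latency budget and reaches
    `fraction` of the best accuracy seen (best candidate = proxy for "perfect")."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = max(c.accuracy for c in candidates)
    eligible = [
        c for c in candidates
        if c.latency_ms <= latency_budget_ms and c.accuracy >= fraction * best
    ]
    if not eligible:
        return None  # nothing is both fast enough and accurate enough
    return min(eligible, key=lambda c: c.latency_ms)
```

With three hypothetical candidates (accuracy 0.92 at 40 ms, 0.88 at 12 ms, 0.75 at 5 ms) and a 30 ms budget, the rule picks the 0.88 model: the 40 ms one misses the budget and the 0.75 one falls below 90% of 0.92.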

Built the pipeline as a stream of small stages — FFmpeg decode → frame sampling → model inference → result aggregation → notification. Each stage is a separate process, communicating via Unix pipes. If one stage stalls, you can replace it without touching the rest.
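A sketch of what one such stage could look like: a frame-sampling step that reads raw rgb24 frames from its stdin pipe and forwards every Nth one on stdout. The function names, the rgb24 format, and the fixed-stride sampling policy are assumptions for illustration. The FFmpeg flags in the comment (`-f rawvideo`, `-pix_fmt rgb24`, `pipe:1`) are standard, but the downstream script names are hypothetical.

```python
from typing import BinaryIO, Iterator

# Hypothetical wiring, one OS process per stage connected by Unix pipes:
#   ffmpeg -loglevel error -i input.mp4 -f rawvideo -pix_fmt rgb24 pipe:1 \
#     | python sample_stage.py 1280 720 15 | python infer_stage.py ...
# Inside sample_stage.py the call would be:
#   sample_stage(sys.stdin.buffer, sys.stdout.buffer, width, height, every_n)
# width/height must match what FFmpeg emits (source size, or a -vf scale=W:H).


def read_frames(stream: BinaryIO, frame_bytes: int) -> Iterator[bytes]:
    """Yield fixed-size raw frames until EOF; a trailing partial frame is dropped."""
    while True:
        buf = bytearray()
        while len(buf) < frame_bytes:
            chunk = stream.read(frame_bytes - len(buf))
            if not chunk:
                return  # EOF: discard any incomplete frame
            buf.extend(chunk)
        yield bytes(buf)


def sample_stage(src: BinaryIO, dst: BinaryIO, width: int, height: int, every_n: int) -> int:
    """Copy every `every_n`-th rgb24 frame from src to dst; return frames written."""
    frame_bytes = width * height * 3  # rgb24 = 3 bytes per pixel
    written = 0
    for i, frame in enumerate(read_frames(src, frame_bytes)):
        if i % every_n == 0:
            dst.write(frame)
            written += 1
    dst.flush()  # hand frames to the next stage promptly
    return written
```

Because each stage only sees bytes on stdin and stdout, it can be tested in isolation with in-memory streams and swapped out without touching its neighbours.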

Used ONNX Runtime instead of raw PyTorch for inference: 3-4× faster on the same hardware, a simpler deploy artifact, and no Python in the hot path.
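One way to check a speedup like this on your own hardware is a median-latency harness that treats each backend as a plain callable. The harness is a hypothetical sketch; the commented-out wrappers use real ONNX Runtime calls (`InferenceSession`, `get_inputs`, `run`), but the model file name and input handling are placeholders.

```python
import statistics
import time
from typing import Callable, Sequence

# Hypothetical backend wrappers (not executed here):
#   sess = onnxruntime.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
#   name = sess.get_inputs()[0].name
#   ort_fn = lambda x: sess.run(None, {name: x})
#   torch_fn = a function that runs the model under torch.inference_mode() and then
#              calls torch.cuda.synchronize(), so asynchronous GPU work is timed too


def median_latency_ms(fn: Callable[[object], object], inputs: Sequence[object], warmup: int = 5) -> float:
    """Median wall-clock latency of fn(x) over `inputs`, in milliseconds."""
    for x in inputs[:warmup]:
        fn(x)  # warm-up: first calls pay for lazy init, CUDA context, kernel selection
    timings = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

The median is used rather than the mean so that occasional stalls (garbage collection, other processes on the GPU) don't dominate the comparison.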

CI/CD with model versioning: every model checkpoint gets a unique ID, and the production system writes that ID alongside every detection, so any result can be traced back to the exact weights that produced it. This is critical for debugging false positives.
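A minimal sketch of the versioning idea, assuming the model ID is a content hash of the checkpoint file (the case study does not say how IDs were generated) and that detections are logged as JSON lines. `model_id`, `Detection`, and its fields are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


def model_id(checkpoint: Path, prefix_len: int = 12) -> str:
    """Content-addressed ID: first `prefix_len` hex chars of the checkpoint's SHA-256.
    Identical bytes give the same ID; any retrain or re-export gives a new one."""
    h = hashlib.sha256()
    with checkpoint.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()[:prefix_len]


@dataclass(frozen=True)
class Detection:
    video: str
    timestamp_s: float
    label: str
    score: float
    model_id: str  # stamped on every record so results trace back to exact weights


def to_json_line(det: Detection) -> str:
    """Serialize one detection as a single JSON line for an append-only log."""
    return json.dumps(asdict(det), sort_keys=True)
```

A content hash has the advantage that the ID can be recomputed from the deployed artifact itself, so a log entry can be matched to a checkpoint even if the registry metadata is lost.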

Results

Tech stack

What this means for you

The pattern — pick the smallest model that gives 90% of the accuracy you need, then build the pipeline around it — applies to any AI/ML deployment. Most teams overspec the model and underspec the infrastructure around it; the reverse pays off faster.

If you're doing >100K inferences/month and using a cloud vision API, the math on a self-hosted equivalent is usually worth a 30-minute conversation.
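The break-even arithmetic fits in a few lines. Every number is a placeholder for your own quotes: `price_per_1k` is the cloud API's price per 1,000 calls, and `self_hosted_monthly` is assumed to cover amortized hardware, power, and maintenance time. For video, the call count is the number of sampled frames, not the number of videos, which is why volume grows quickly.

```python
def monthly_cloud_cost(inferences_per_month: int, price_per_1k: float) -> float:
    """Per-call pricing: cost scales linearly with volume."""
    return inferences_per_month / 1000.0 * price_per_1k


def break_even_volume(self_hosted_monthly: float, price_per_1k: float) -> float:
    """Monthly volume above which a fixed-cost self-hosted setup is cheaper."""
    return self_hosted_monthly / price_per_1k * 1000.0
```

For example, with a hypothetical $300/month self-hosted cost and a hypothetical $1.00 per 1,000 calls, `break_even_volume(300.0, 1.0)` gives 300,000 inferences per month.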

Want something like this for your business?

Start with a free 30-min call — figure out fit before money changes hands.