Case Study

Multi-Bot Trading Framework

Two production crypto trading bots running 24/7 on Kraken + Alpaca, backed by a 3-year backtest harness and weekly walk-forward analysis.

Live in production | 2 bots, 24/7 | Backtest-validated | Python · SQLite · systemd

The brief

The owner ran a single trading bot, monitored by hand, with ad-hoc state files and no backtest discipline. They wanted to scale to multiple bots across venues without doubling the babysitting load.

Hard constraints: zero tolerance for downtime on live capital, no manual restarts, no silent failures, and live execution that reproduces backtest results within ±2%.

The approach

Started with a hard separation between strategy logic, execution, and infrastructure. Each bot is a single Python process with a typed state file in SQLite, never JSON in production. State writes are atomic (write to .tmp, then rename), so a crash never leaves a half-written state file behind.
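A minimal sketch of the write-to-.tmp-then-rename pattern, assuming the whole state fits in one small SQLite file that is rebuilt on each save. The BotState fields, the table layout, and the save_state / load_state names are hypothetical, and a plain dataclass stands in for the pydantic model to keep the example stdlib-only.

```python
import os
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BotState:
    # Hypothetical fields; the real schema is not described in this case study.
    bot_id: str
    position_qty: float
    last_order_id: str


def save_state(path: str, state: BotState) -> None:
    """Write the state to a fresh SQLite file, then swap it into place."""
    tmp_path = path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # leftover from a crash during an earlier save
    conn = sqlite3.connect(tmp_path)
    try:
        conn.execute(
            "CREATE TABLE state ("
            "bot_id TEXT PRIMARY KEY, position_qty REAL, last_order_id TEXT)"
        )
        conn.execute(
            "INSERT INTO state VALUES (:bot_id, :position_qty, :last_order_id)",
            asdict(state),
        )
        conn.commit()
    finally:
        conn.close()
    # os.replace is atomic on POSIX when both paths are on the same filesystem,
    # so readers see either the old file or the new one, never a partial write.
    os.replace(tmp_path, path)


def load_state(path: str) -> BotState:
    """Read the single state row back into a typed object."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT bot_id, position_qty, last_order_id FROM state"
        ).fetchone()
    finally:
        conn.close()
    return BotState(*row)
```

If the process dies before os.replace runs, the previous state file is untouched and only the .tmp file is lost. A fuller version would likely also fsync the containing directory so the rename itself survives a power loss.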

All bots share a common watchdog daemon that runs systemd-style health checks every 30s. If a bot misses two heartbeats, the watchdog restarts it via systemctl and sends a Telegram alert with the last 40 log lines. No silent zombies.
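The watchdog rule can be sketched roughly as follows. The 30s interval, the two-missed-heartbeats threshold, and the 40-line log tail come from the description above; the function names, the unit name, and the way heartbeats are timestamped are assumptions, and the alert callback is a placeholder for whatever actually posts to Telegram.

```python
import subprocess
from collections import deque
from typing import Callable

HEARTBEAT_INTERVAL_S = 30  # health check cadence from the text
MAX_MISSED = 2             # restart after two missed heartbeats
ALERT_LOG_LINES = 40       # alert carries the last 40 log lines


def missed_heartbeats(last_beat: float, now: float) -> int:
    """Whole heartbeat intervals elapsed since the last recorded beat."""
    return max(0, int((now - last_beat) // HEARTBEAT_INTERVAL_S))


def tail(log_path: str, n: int = ALERT_LOG_LINES) -> list[str]:
    """Return the last n lines of a log file."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))


def restart_unit(unit: str) -> None:
    # Needs enough privileges to manage the unit.
    subprocess.run(["systemctl", "restart", unit], check=True)


def check_bot(
    unit: str,
    last_beat: float,
    now: float,
    log_path: str,
    restart: Callable[[str], None] = restart_unit,
    alert: Callable[[str], None] = print,  # placeholder for a Telegram sender
) -> bool:
    """Restart the bot and alert if it has missed too many heartbeats."""
    if missed_heartbeats(last_beat, now) < MAX_MISSED:
        return False
    restart(unit)
    alert(f"{unit} restarted by watchdog\n" + "".join(tail(log_path)))
    return True
```

Passing restart and alert in as callables keeps the decision logic testable without touching systemd or the network.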

Backtest-vs-live drift was the killer risk. Built a walk-forward analysis (WFA) harness that runs every Friday on the prior week's data — if drift exceeds 2% on any bot, the dashboard flags it and trading auto-pauses until reviewed.
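One way the weekly drift gate might look, as a sketch only. The case study does not say how drift is measured, so this assumes it is the relative gap between live and backtest PnL for the same week. WeeklyResult, drift, and bots_to_pause are hypothetical names.

```python
from dataclasses import dataclass

DRIFT_LIMIT = 0.02  # the 2% threshold from the text


@dataclass(frozen=True)
class WeeklyResult:
    bot_id: str
    backtest_pnl: float  # PnL the backtest predicts for the week
    live_pnl: float      # PnL actually realised over the same week


def drift(result: WeeklyResult) -> float:
    """Relative gap between live and backtest PnL (an assumed definition)."""
    if result.backtest_pnl == 0:
        # No baseline to compare against: any live PnL counts as unbounded drift.
        return 0.0 if result.live_pnl == 0 else float("inf")
    gap = abs(result.live_pnl - result.backtest_pnl)
    return gap / abs(result.backtest_pnl)


def bots_to_pause(results: list[WeeklyResult]) -> list[str]:
    """IDs of bots whose drift exceeds the limit and should be flagged."""
    return [r.bot_id for r in results if drift(r) > DRIFT_LIMIT]
```

The returned list would feed both the dashboard flag and the auto-pause, so a single calculation drives both actions.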

Per-coin tuned configs (step %, rungs, budget) come from WFA mode analysis, not from intuition. Each parameter has provenance — you can trace any live setting back to a specific backtest run.
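Parameter provenance could be modeled along these lines. This is an illustrative sketch: the three parameter names follow the text (step %, rungs, budget), while the TunedParam and CoinConfig classes and the shape of the run identifier are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunedParam:
    value: float
    backtest_run_id: str  # hypothetical ID of the WFA run that produced the value


@dataclass(frozen=True)
class CoinConfig:
    symbol: str
    step_pct: TunedParam
    rungs: TunedParam
    budget: TunedParam

    def provenance(self) -> dict[str, str]:
        """Map each tuned parameter to the backtest run it came from."""
        names = ("step_pct", "rungs", "budget")
        return {name: getattr(self, name).backtest_run_id for name in names}
```

Storing the run ID next to the value, instead of in a separate changelog, means a setting cannot be changed without also stating where the new value came from.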

Results

Zero unscheduled downtime on the live bots since deployment.
Backtest fidelity within ±1.5% verified against multi-week samples of live grid fills.
~2 hours/week of monitoring time, down from ~12 hours pre-framework.
Caught and prevented 3 silent capital-loss scenarios in the first quarter (precision bug, broker reconcile failure, ML gate shadow mode) before they cost real money.

Tech stack

Python 3.12 with type hints + pydantic for state validation
SQLite with atomic write pattern (tmp + rename)
systemd for process supervision
Custom watchdog with Telegram + ntfy alerting
Cron-driven WFA backtest pipeline
Internal admin dashboard (Flask) with health bar + per-bot drilldown

What this means for you

The same pattern works for any system where uptime matters and you want reproducible behavior from research to production: order processing, inventory rebalancing, lead routing, content publishing pipelines.

Trading is just the highest-stakes version. The discipline transfers.

Want something like this for your business?

Start with a free 30-min call — figure out fit before money changes hands.

Book free 30-min call
Or start with the $349 audit