Detection note

Detection architecture and the 2014 / 2018 bot busts

Published 21 May 2026 · Updated 28 May 2026 · Reviewed by Raul Moriarty

Reverse-engineered notes on BetOnline's security stack from outside — behavioural fingerprinting and play-pattern analysis at a lower offline cadence than GGPoker, aggressive collusion graphs on the regulatory-exposure side, a smaller human-review queue, and the documented 2014 and 2018 bot cleanups.

Summary

BetOnline appears to run the four-layer detection model common to large operators (behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, human review), but on a smaller budget. The system is more reactive than proactive.
Collusion and multi-accounting are detected aggressively because they carry direct regulatory and financial exposure. Solver-anchored single-account bots usually only surface on review triggers: large withdrawals, formal complaints, anomalous long-sample winrates.
Two documented cleanups: a 2014 sweep of a single ring caught after forum-led pressure, and a larger 2018 action with refunds issued to affected players. Both were human-review-driven batch decisions, not realtime detection firing.
HUD policy is permissive in practice. Holdem Manager 3 and PokerTracker 4 run unobstructed, screen names are stable, and the long-horizon data-mined HUD attack that died at GGPoker still works here.
The bot-account lifetime distribution is bimodal: most accounts run for months or years uncaught; a minority are caught in batched human-review waves. The right account-risk model is bursty, not a single stationary detection probability.

What counts as cheating in BetOnline's terms

BetOnline's terms of service prohibit the categories that most operators prohibit, and each category maps to a different signal stack, false-positive budget, and consequence path. The categories matter because the operator does not spend equal effort on all of them: regulatory exposure sets the priority, and player-experience harm counts only indirectly.

Category	Operator priority	Detection difficulty	Typical signal
Collusion / chip dumping	Highest (regulatory + financial)	Medium	Account graph + suspicious hand sequences
Multi-accounting	High	Low–Medium	Device fingerprint + crypto-wallet join + KYC
Botting (single account)	Medium	Medium-High at smaller scale	Behavioural fingerprint + play-pattern + review
Real-time assistance (RTA)	Medium	High	Statistical play-pattern over volume
Bot farms	High (caught via collusion graph)	Low–Medium	Shared fingerprint across many accounts
Ghosting in MTTs	Medium (spikes around Sunday flagship)	High	Win-rate vs known-skill baseline + IP joins

Collusion sits at the top because dumping rings drain the recreational pool and trigger regulatory complaints from licensed jurisdictions. Bot farms get caught primarily through the collusion graph, not the behavioural layer — when one operator runs fifty bot accounts on a single fingerprint, the graph layer makes the bots a side-effect of catching a farm. Single-account bots running carefully are not on the same priority tier.

The four-layer detection model at a smaller operator

The structure of the stack at BetOnline is the same as at GGPoker or PokerStars. The difference is budget — how often each layer runs, how big its queue is, how much compute and human time it consumes per night.

Layer 1: Behavioural fingerprinting. Client telemetry on input timing, mouse-path geometry on desktop, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds a per-session behavioural score. The Chico client collects this at lower fidelity than GGPoker — fewer signals, less aggressive instrumentation. Naive constant-latency bots still get flagged; carefully shaped behavioural noise passes consistently.

Layer 2: Statistical play-pattern analysis. Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute. At a smaller operator the cadence is slower — anecdotally weekly to monthly rather than nightly — and the per-account risk score decays before it converts to action unless human review escalates.

Layer 3: Anti-collusion graph models. Account graphs joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlation within hands. On BetOnline specifically, deposit method matters more than at fiat-only rooms because shared crypto wallets are a strong join key. This is where the operator spends; it catches the high-impact multi-accounting and chip-dumping cases that hurt the room financially.

Layer 4: Human review. The decisive layer. Reviewers consult model output and read hand history, chat logs, sit-out behaviour, session patterns. Volume is the differentiator at BetOnline — fewer reviewers, longer queue, slower cycle. Most bot bans here are signed off by a person, often after a triggering event has moved the account up the queue.

The asynchronous weighting matters. Layer 1 fires continuously and mostly stays below threshold. Layer 2 produces a slowly-decaying per-account score. Layer 3 fires event-driven on graph changes. Layer 4 is the bottleneck, and its queue is prioritised by combined risk score, expected revenue impact, recent withdrawal activity, and — uniquely visible at BetOnline — by external pressure events such as forum complaints, media coverage, and public refund demands.

2014 and 2018: the documented bot cleanups

Two public events anchor the empirical picture. Both were batched human-review actions, both were triggered by external pressure rather than the system firing in real time, and both led to refunds — the strongest public signal of an operator admitting detection lagged reality.

The 2014 incident involved a single botting ring detected after a series of TwoPlusTwo forum threads documenting suspiciously similar play patterns across multiple accounts. BetOnline acknowledged the issue, banned the accounts, and processed limited refunds. The scale was small — a handful of accounts — but the timeline, weeks of forum pressure before action, revealed the reactive cadence of the review queue.

The 2018 action was larger. Multiple bot rings were caught in a coordinated sweep after extended forum and media coverage. The cleanup ran over several weeks and ended in account closures, balance confiscations, and refunds to opponents who had played significant volume against the offending accounts. The operator published no detection-system internals (none ever do), but the pattern matched 2014: the behavioural and play-pattern signals had likely been visible for months; the queue advanced under external pressure.

Two structural inferences follow. The per-account detection probability inside a quiet stretch is meaningfully lower than the adversarial-classification literature would predict for a stationary classifier. And the per-account detection probability inside a sweep is meaningfully higher — accounts running an obvious play-pattern signal that had been ignored for months get caught in a single batch action.

Signal weights and observable failure modes

Exact signal weights are operator-confidential. The relative weights below are inferred from the observable pattern of which accounts get caught, in what sequence, and after what triggering events.

Signal	Layer	Relative weight	Naive failure mode
Action-timing variance < population	L1	Medium-High	Constant-latency action emission
Mouse-path linearity on desktop	L1	Medium	Straight-line cursor on every action
Idle behaviour too uniform	L1	Low-Medium	No tab-switch, no chat, no micro-movement
VPIP/PFR at population mass, low variance	L2	High	Pure GTO baseline, no human-noise overlay
Bet sizing clustered on exact pot fractions	L2	High	Solver output without sizing perturbation
Win rate persistently outside skill envelope	L2	Very High	Sustained high winrate, no human sessions
Shared crypto wallet across accounts	L3	Very High	Bot farm funded from one BTC/ETH address
Shared device fingerprint across accounts	L3	Very High	Bot farm on one IP / device
Large first withdrawal after long quiet	L3+L4	High	Patient grind, then big-bang cashout
External forum complaint or media coverage	L4	Very High (event-driven)	Account named in a public thread
Zero outgoing chat over 5k+ hands	L4	Medium	Bot never says "nh"

The signal pattern that gets accounts caught is consistent across 2014 and 2018: an L2 statistical-outlier score that accumulated quietly for months, plus an L4 triggering event — usually external — that pushed the account from "flagged in the long tail" to "actioned." The accounts that survive stay near population distributions on L1 and L2 and avoid the L4 triggers. That is not a checklist; it is a description of where the detection frontier sits empirically.

Action-timing fingerprints

Action-timing distributions are the most-discussed and worst-implemented signal in the bot literature. A naive implementation fires at constant intervals or with uniform noise around a centroid — both statistically catastrophic.

Real human action-timing distributions are log-normal in shape, with heavy right tails, and the location parameter conditions on decision difficulty. A snap-fold on a trash hand resolves in 600–1200ms. A routine flop continuation-bet on a clean board lands in 1.5–4 seconds. A boundary river call against a triple-barrel sits in the 5–30 second range. Distraction events — phone notification, conversation, bathroom break — produce an independent 8–25 second tail at roughly 3% per action. The shape of the distribution is the fingerprint, not the mean.

# Schematic: behaviourally-shaped action timing (not the production code)
def sample_action_delay(decision_difficulty, action_type):
    mu_base = {
        'fold_trivial':   math.log(0.9),
        'cbet_routine':   math.log(2.4),
        'check_routine':  math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision':math.log(12.0),
    }[action_type]
    mu = mu_base + 0.7 * decision_difficulty
    sigma = 0.35 + 0.55 * decision_difficulty
    delay = random.lognormvariate(mu, sigma)
    if random.random() < 0.03:           # ~3% distraction tail
        delay += random.uniform(8, 25)
    return max(0.25, delay)              # humans cannot react under 250ms

Production systems condition on more state — stack depth, opponent action sequence, position, multiway versus heads-up, table count, a session-alertness parameter that drifts down over long sessions to mimic fatigue. The right behaviour is not "add noise"; it is "draw from a distribution whose shape matches the population, conditioned on state."

Anti-detection as adversarial classification

The standard mistake is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule, never play more than eight hours. Wrong frame. Detection is an adversarial classifier: the operator builds a model that separates bot behaviour from human behaviour, and the bot's task is to produce a behaviour distribution the classifier cannot separate from the population while preserving EV.

The formal literature begins with Dalvi et al. (2004), Adversarial Classification, KDD, and Lowd & Meek (2005), Adversarial Learning, KDD. An attacker chooses an action distribution that maximises expected utility under a classifier whose decision boundary it can probe but not fully observe. The modern adversarial-ML lineage (Goodfellow et al. 2014 onward) extends this with neural classifiers and gradient-based probing.

Three operational consequences. The classifier's decision boundary is non-stationary — operators retrain, and behaviour that passed in BetOnline's quiet stretch may not pass during a sweep, so the right risk model is a non-stationary detection probability. The population baseline is the reference, not "looking human" — the target is statistical indistinguishability from the pool's specific bet-sizing and timing distributions. And EV versus detection is the right optimisation — pure-GTO output maximises EV per hand but is more separable from the population, so it gets caught faster; the right operating point is EV-maximising under a budgeted detection probability over the account's expected lifetime, with the lifetime model accounting for BetOnline's bursty enforcement.

References

Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). arXiv:1905.10311.
Moravčík et al., 2017. DeepStack: Expert-level AI in heads-up no-limit poker. Science 356. arXiv:1701.01724.
Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus).
Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD.
Lowd & Meek, 2005. Adversarial Learning. KDD.
Heinrich & Silver, 2016. Deep RL from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.
