Detection architecture and the 2014 / 2018 bot busts
Notes on BetOnline's security stack, reverse-engineered from the outside: behavioural fingerprinting and play-pattern analysis at a slower offline cadence than GGPoker, aggressive collusion graphs driven by regulatory exposure, a smaller human-review queue, and the documented 2014 and 2018 bot cleanups.
Summary
- BetOnline runs the same four-layer detection model as every operator — behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, human review — but at a smaller budget. The system is more reactive than proactive.
- Collusion and multi-accounting are detected aggressively because they carry direct regulatory and financial exposure. Solver-anchored single-account bots usually only surface on review triggers: large withdrawals, formal complaints, anomalous long-sample winrates.
- Two documented cleanups: a 2014 sweep of a single ring caught after forum-led pressure, and a larger 2018 action with refunds issued to affected players. Both were human-review-driven batch decisions, not realtime detection firing.
- HUD policy is permissive in practice. Holdem Manager 3 and PokerTracker 4 run unobstructed, screen names are stable, and the long-horizon data-mined HUD attack that died at GGPoker still works here.
- The bot-account lifetime distribution is bimodal: most accounts run for months or years uncaught; a minority are caught in batched human-review waves. The right account-risk model is bursty, not a single stationary detection probability.
What counts as cheating in BetOnline's terms
BetOnline's terms of service prohibit the same categories as every other operator, with each category mapping to a different signal stack, false-positive budget, and consequence path. The categories matter because the operator does not spend equal effort on all of them — regulatory exposure determines priority, not player-experience harm directly.
| Category | Operator priority | Detection difficulty | Typical signal |
|---|---|---|---|
| Collusion / chip dumping | Highest (regulatory + financial) | Medium | Account graph + suspicious hand sequences |
| Multi-accounting | High | Low–Medium | Device fingerprint + crypto-wallet join + KYC |
| Botting (single account) | Medium | Medium–High at smaller scale | Behavioural fingerprint + play-pattern + review |
| Real-time assistance (RTA) | Medium | High | Statistical play-pattern over volume |
| Bot farms | High (caught via collusion graph) | Low–Medium | Shared fingerprint across many accounts |
| Ghosting in MTTs | Medium (spikes around Sunday flagship) | High | Win-rate vs known-skill baseline + IP joins |
Collusion sits at the top because dumping rings drain the recreational pool and trigger regulatory complaints from licensed jurisdictions. Bot farms are caught primarily through the collusion graph, not the behavioural layer: when one person runs fifty bot accounts on a single fingerprint, the bots are caught as a side effect of the graph layer catching the farm. Carefully run single-account bots do not sit on the same priority tier.
The four-layer detection model at a smaller operator
The structure of the stack at BetOnline is the same as at GGPoker or PokerStars. The difference is budget — how often each layer runs, how big its queue is, how much compute and human time it consumes per night.
Layer 1: Behavioural fingerprinting. Client telemetry on input timing, mouse-path geometry on desktop, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds a per-session behavioural score. The Chico client collects this at lower fidelity than GGPoker — fewer signals, less aggressive instrumentation. Naive constant-latency bots still get flagged; carefully shaped behavioural noise passes consistently.
Layer 2: Statistical play-pattern analysis. Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute. At a smaller operator the cadence is slower — anecdotally weekly to monthly rather than nightly — and the per-account risk score decays before it converts to action unless human review escalates.
Layer 3: Anti-collusion graph models. Account graphs joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlation within hands. On BetOnline specifically, deposit method matters more than at fiat-only rooms because shared crypto wallets are a strong join key. This is where the operator spends; it catches the high-impact multi-accounting and chip-dumping cases that hurt the room financially.
Layer 4: Human review. The decisive layer. Reviewers consult model output and read hand history, chat logs, sit-out behaviour, session patterns. Volume is the differentiator at BetOnline — fewer reviewers, longer queue, slower cycle. Most bot bans here are signed off by a person, often after a triggering event has moved the account up the queue.
The layers run asynchronously, and that shapes how they combine. Layer 1 fires continuously and mostly stays below threshold. Layer 2 produces a slowly decaying per-account score. Layer 3 fires on graph changes. Layer 4 is the bottleneck. Its queue is prioritised by combined risk score, expected revenue impact, recent withdrawal activity, and (uniquely visible at BetOnline) external pressure events such as forum complaints, media coverage, and public refund demands.
2014 and 2018: the documented bot cleanups
Two public events anchor the empirical picture. Both were batched human-review actions, both were triggered by external pressure rather than the system firing in real time, and both led to refunds — the strongest public signal of an operator admitting detection lagged reality.
The 2014 incident involved a single botting ring detected after a series of TwoPlusTwo forum threads documented suspiciously similar play patterns across multiple accounts. BetOnline acknowledged the issue, banned the accounts, and processed limited refunds. The scale was small, only a handful of accounts. The timeline was the revealing part: weeks of forum pressure passed before any action, which exposed the reactive cadence of the review queue.
The 2018 action was larger. Multiple bot rings were caught in a coordinated sweep after extended forum and media coverage. The cleanup ran over several weeks and ended in account closures, balance confiscations, and refunds to opponents who had played significant volume against the offending accounts. The operator published no detection-system internals (none ever do), but the pattern matched 2014: the behavioural and play-pattern signals had likely been visible for months; the queue advanced under external pressure.
Two structural inferences follow. The per-account detection probability inside a quiet stretch is meaningfully lower than the adversarial-classification literature would predict for a stationary classifier. And the per-account detection probability inside a sweep is meaningfully higher — accounts running an obvious play-pattern signal that had been ignored for months get caught in a single batch action.
Signal weights and observable failure modes
Exact signal weights are operator-confidential. The relative weights below are inferred from the observable pattern of which accounts get caught, in what sequence, and after what triggering events.
| Signal | Layer | Relative weight | Naive failure mode |
|---|---|---|---|
| Action-timing variance < population | L1 | Medium–High | Constant-latency action emission |
| Mouse-path linearity on desktop | L1 | Medium | Straight-line cursor on every action |
| Idle behaviour too uniform | L1 | Low–Medium | No tab-switch, no chat, no micro-movement |
| VPIP/PFR at population mass, low variance | L2 | High | Pure GTO baseline, no human-noise overlay |
| Bet sizing clustered on exact pot fractions | L2 | High | Solver output without sizing perturbation |
| Win rate persistently outside skill envelope | L2 | Very High | Sustained high winrate, no human sessions |
| Shared crypto wallet across accounts | L3 | Very High | Bot farm funded from one BTC/ETH address |
| Shared device fingerprint across accounts | L3 | Very High | Bot farm on one IP / device |
| Large first withdrawal after long quiet | L3+L4 | High | Patient grind, then big-bang cashout |
| External forum complaint or media coverage | L4 | Very High (event-driven) | Account named in a public thread |
| Zero outgoing chat over 5k+ hands | L4 | Medium | Bot never says "nh" |
The signal pattern that gets accounts caught is consistent across 2014 and 2018: an L2 statistical-outlier score that accumulated quietly for months, plus an L4 triggering event — usually external — that pushed the account from "flagged in the long tail" to "actioned." The accounts that survive stay near population distributions on L1 and L2 and avoid the L4 triggers. That is not a checklist; it is a description of where the detection frontier sits empirically.
Action-timing fingerprints
Action-timing distributions are the most-discussed and worst-implemented signal in the bot literature. A naive implementation fires at constant intervals or with uniform noise around a centroid — both statistically catastrophic.
Real human action-timing distributions are log-normal in shape, with heavy right tails, and the location parameter conditions on decision difficulty. A snap-fold on a trash hand resolves in 600–1200ms. A routine flop continuation-bet on a clean board lands in 1.5–4 seconds. A boundary river call against a triple-barrel sits in the 5–30 second range. Distraction events — phone notification, conversation, bathroom break — produce an independent 8–25 second tail at roughly 3% per action. The shape of the distribution is the fingerprint, not the mean.
# Schematic: behaviourally-shaped action timing (not the production code)
def sample_action_delay(decision_difficulty, action_type):
mu_base = {
'fold_trivial': math.log(0.9),
'cbet_routine': math.log(2.4),
'check_routine': math.log(1.6),
'river_boundary': math.log(8.5),
'all_in_decision':math.log(12.0),
}[action_type]
mu = mu_base + 0.7 * decision_difficulty
sigma = 0.35 + 0.55 * decision_difficulty
delay = random.lognormvariate(mu, sigma)
if random.random() < 0.03: # ~3% distraction tail
delay += random.uniform(8, 25)
return max(0.25, delay) # humans cannot react under 250ms
Production systems condition on more state — stack depth, opponent action sequence, position, multiway versus heads-up, table count, a session-alertness parameter that drifts down over long sessions to mimic fatigue. The right behaviour is not "add noise"; it is "draw from a distribution whose shape matches the population, conditioned on state."
Anti-detection as adversarial classification
The standard mistake is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule, never play more than eight hours. Wrong frame. Detection is an adversarial classifier: the operator builds a model that separates bot behaviour from human behaviour, and the bot's task is to produce a behaviour distribution the classifier cannot separate from the population while preserving EV.
The formal literature begins with Dalvi et al. (2004), Adversarial Classification, KDD, and Lowd & Meek (2005), Adversarial Learning, KDD. An attacker chooses an action distribution that maximises expected utility under a classifier whose decision boundary it can probe but not fully observe. The modern adversarial-ML lineage (Goodfellow et al. 2014 onward) extends this with neural classifiers and gradient-based probing.
Three operational consequences. The classifier's decision boundary is non-stationary — operators retrain, and behaviour that passed in BetOnline's quiet stretch may not pass during a sweep, so the right risk model is a non-stationary detection probability. The population baseline is the reference, not "looking human" — the target is statistical indistinguishability from the pool's specific bet-sizing and timing distributions. And EV versus detection is the right optimisation — pure-GTO output maximises EV per hand but is more separable from the population, so it gets caught faster; the right operating point is EV-maximising under a budgeted detection probability over the account's expected lifetime, with the lifetime model accounting for BetOnline's bursty enforcement.
References
- Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). arXiv:1905.10311.
- Moravčík et al., 2017. DeepStack: Expert-level AI in heads-up no-limit poker. Science 356. arXiv:1701.01724.
- Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus).
- Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD.
- Lowd & Meek, 2005. Adversarial Learning. KDD.
- Heinrich & Silver, 2016. Deep RL from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.
The companion notes cover the state of bots and "hacks" and whether it is safe to play here.
Fifteen years across software engineering, business development, and online poker technology. Notes here are revised when the field changes, not on a schedule.