Stripe Radar false positives — how to tune the rules
- Radar defaults are tuned for card testing, not for your specific business — most operators over-block.
- Tuning requires data: false positive rate, actual chargeback rate, decline-to-block ratio.
- Custom rules beat tuning thresholds — specific merchant patterns outperform generic Radar sensitivity knobs.
Stripe Radar is a good fraud tool, but its defaults are too aggressive for most multi-brand operators. If you have never touched the rules, you are probably over-blocking good customers and leaving 2-4% of authorizations on the floor with no chargeback benefit to show for it.
Here is the tuning playbook that actually works, based on Radar configurations we have run across 80+ merchant accounts.
1. Baseline your current Radar performance
Pull 30 days of Radar data. Calculate: block rate (% of charges blocked), chargeback rate on unblocked charges, dispute rate on blocked charges that eventually got through (via retry, manual review, or customer contact). If block rate > 3% and chargeback rate < 0.5%, you are over-blocking.
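A minimal sketch of this baseline calculation. The `Charge` record and the 3% / 0.5% heuristic thresholds come straight from the text; the function and field names are ours, not Stripe's:

```python
from dataclasses import dataclass

@dataclass
class Charge:
    blocked: bool
    charged_back: bool  # only meaningful for unblocked charges

def baseline_metrics(charges):
    """Compute the 30-day baseline: block rate and chargeback
    rate on the charges Radar let through."""
    total = len(charges)
    blocked = sum(1 for c in charges if c.blocked)
    unblocked = [c for c in charges if not c.blocked]
    chargebacks = sum(1 for c in unblocked if c.charged_back)
    block_rate = blocked / total
    chargeback_rate = chargebacks / len(unblocked) if unblocked else 0.0
    # Heuristic from the article: block rate > 3% with chargeback
    # rate < 0.5% suggests over-blocking.
    over_blocking = block_rate > 0.03 and chargeback_rate < 0.005
    return block_rate, chargeback_rate, over_blocking
```

Run it against an export of 30 days of charges; the third return value is the over-blocking flag.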
2. Customer evidence of over-blocking
Search support tickets for "declined," "blocked," "rejected." Count false positive complaints. If you have more than one per 500 orders, Radar is over-blocking your customer base.
3. Identify the over-blocking rule
Radar has built-in rules plus custom rules. Go to Dashboard → Radar → Rules. Sort by block count. The top 3-5 rules account for 80%+ of blocks. These are your candidates for tuning.
4. Default rules that commonly over-block
(a) "Block if CVC check fails" — overly strict; many legitimate customers mis-type CVC or use Apple Pay tokens that confuse matching.
(b) "Block if high risk" — fires when Radar's risk score exceeds 75; that threshold is often too aggressive for high-value customers.
(c) "Review if IP country does not match card country" — standard travel triggers this; ~30% false positive rate.
(d) "Block if multiple distinct emails used the same card in the past 7 days" — shared family or office cards trigger this frequently.
5. Replacing default rules with custom
Instead of "block if risk score > 75," try "review if risk score > 75 AND amount > $X" where X is your 90th percentile order value. High-risk score + typical amount = usually fine. High-risk score + unusually large amount = actually risky.
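The replacement rule's logic, sketched as a plain function (the 75 threshold and 90th-percentile condition are the article's; the names are ours, and this simulates the decision locally rather than using Radar's rule syntax):

```python
def amount_conditioned_action(risk_score, amount, p90_amount):
    """Replacement for a flat 'block if risk score > 75' rule:
    a high score alone is tolerated; only a high score on an
    unusually large order (above the 90th-percentile order
    value) goes to review."""
    if risk_score > 75 and amount > p90_amount:
        return "review"
    return "allow"
```

High score + typical amount stays allowed; high score + outsized amount gets reviewed instead of hard-blocked.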
6. Velocity rules that work
(a) "Block if same card used 5+ times in 1 hour across multiple emails" — catches card testing.
(b) "Review if same email attempted 5+ distinct cards in 30 minutes" — catches card stuffing.
(c) "Block if same IP address attempted 10+ charges in 5 minutes" — catches botting.
These are surgical. They catch actual fraud patterns without hitting legitimate customers.
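A sketch of rule (a) as a sliding-window counter, to make the logic concrete. The 5-use / 1-hour numbers are the article's; the class and its field names are ours (real Radar velocity rules are written in the dashboard, not in application code):

```python
from collections import defaultdict, deque

class VelocityGuard:
    """Card-testing rule (a): block when the same card is used
    5+ times within one hour across multiple distinct emails."""
    def __init__(self, max_uses=5, window_seconds=3600):
        self.max_uses = max_uses
        self.window = window_seconds
        self.events = defaultdict(deque)  # card -> deque of (ts, email)

    def check(self, card, email, ts):
        q = self.events[card]
        # Drop events outside the sliding window.
        while q and ts - q[0][0] > self.window:
            q.popleft()
        q.append((ts, email))
        emails = {e for _, e in q}
        # Multiple emails on one card is the card-testing signature;
        # one customer reusing their own card never trips this.
        if len(q) >= self.max_uses and len(emails) > 1:
            return "block"
        return "allow"
```

Note the `len(emails) > 1` condition: that is what keeps a single legitimate customer retrying their own card out of the blast radius.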
7. Descriptor-based rules
If you run multiple brands, rules can vary per brand: "Review if amount > $1500 on brand A" vs "Review if amount > $250 on brand B," with thresholds set from each brand's historical fraud rate. Radar has no native per-brand concept — you tag the brand on the charge (e.g. via metadata) and build the conditions yourself in the rule editor.
8. Allow-listing
Known good customers (100+ orders, 0 disputes) should be on an allow-list that bypasses Radar evaluation. Stripe lets you create customer allow-lists via metadata. Reduces false positives on your best customers.
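A local sketch of the allow-list bypass. The 100-order / zero-dispute criterion is the article's; the dict shape, function names, and the injected `radar_decision` callback are hypothetical stand-ins, not Stripe API objects:

```python
def qualifies_for_allow_list(customer):
    """Article's criterion: 100+ orders and zero disputes."""
    return customer["orders"] >= 100 and customer["disputes"] == 0

def evaluate(charge, allow_list, radar_decision):
    """Bypass Radar evaluation entirely for allow-listed
    customers; everyone else goes through the normal rules."""
    if charge["customer_id"] in allow_list:
        return "allow"
    return radar_decision(charge)
```

The point is ordering: the allow-list check runs before any scoring, so your best customers never see a false positive from a mistuned rule.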
9. 3DS step-up as a tuning tool
Instead of blocking high-risk charges, step them up to 3DS. Liability shift on fraud, customer completes if legitimate. "Require 3DS if risk score > 65" is often better than "Block if risk score > 65."
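The step-up decision in miniature, using the article's threshold of 65 (function name is ours; in practice this lives in a Radar "Request 3DS" rule, not application code):

```python
def radar_action(risk_score, threshold=65):
    """Step-up instead of block: above the threshold, require
    3DS rather than declining outright. Fraud liability shifts
    to the issuer; a legitimate customer simply completes the
    challenge and the sale survives."""
    return "require_3ds" if risk_score > threshold else "allow"
```

Compared to a hard block at the same threshold, the failure mode for a false positive becomes "extra friction" instead of "lost sale."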
10. Test changes safely
Radar has a "preview" mode for new rules — runs them in shadow without actually blocking. Run 7-14 days in preview, compare to production, promote rules that reduce false positives without increasing chargebacks.
11. BIN-level tuning
Certain BINs have structurally higher Radar false positive rates (corporate cards, prepaid, some international). Add BIN-specific rules to relax scoring for BINs that historically approved despite high Radar score.
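One way to sketch BIN-level relaxation: a per-BIN score offset applied before thresholds. The BIN prefixes and offsets below are made up for illustration — derive real ones from your own approval history:

```python
# Hypothetical offsets for BIN prefixes that historically approved
# cleanly despite high Radar scores (corporate, prepaid, etc.).
BIN_SCORE_OFFSET = {
    "552433": -15,  # illustrative corporate-card BIN, not real data
    "400022": -10,  # illustrative prepaid BIN, not real data
}

def adjusted_score(risk_score, card_number):
    """Relax the effective score for trusted BINs; all other
    BINs pass through unchanged."""
    bin6 = card_number[:6]
    return risk_score + BIN_SCORE_OFFSET.get(bin6, 0)
```

A charge scoring 80 on a trusted corporate BIN drops below a 75 block threshold; the same score on an unknown BIN still trips it.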
12. Multi-brand Radar deduplication
If you run 8 brands on 8 Stripe accounts, each account has its own Radar instance. Fraud patterns on brand A do not inform brand B. This is a structural limit that orchestration layers solve by aggregating fraud intelligence across rails. See Radar vs Signifyd vs Kount.
Tuning workflow
- Baseline 30-day Radar metrics.
- Identify top 3-5 blocking rules.
- Write custom replacements; run in preview.
- Measure false positive reduction vs chargeback impact.
- Promote rules, remove old defaults.
- Review monthly; fraud patterns shift.
Common mistakes
(a) Turning off all defaults at once — exposes you to card testing.
(b) Not measuring false positives — assuming every block was fraud.
(c) Not reviewing monthly — fraud tactics evolve and rules go stale.
(d) Building rules emotionally after one bad chargeback.
Radar Premium
Radar for Fraud Teams ($0.07/charge vs $0.05 standard) unlocks the custom rules this playbook depends on. It is required for the tuning described here at any serious scale, and worth the cost for operators doing $100k+/month.
The orchestration alternative
At multi-brand scale, operators layer Signifyd, Kount, or Sift over Radar for cross-rail intelligence. Or use orchestration to route by rail-specific fraud profile. See pricing for the orchestrated fraud stack or apply for a Radar tuning audit on your current volume.
13. The shadow testing discipline
Preview (shadow) mode on new rules is underused. Ship the new rule in preview, monitor for 7-14 days, and compare preview blocks against production blocks on the same charges. If preview would have blocked 100 where your current production rule blocked 150 — and the chargeback rate on the 50 charges it would have let through is low — the new rule is better. Promote it. This is how you tune without breaking things.
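The promote/reject arithmetic from the worked example, as a function. The 0.5% chargeback tolerance is our assumption, echoing the baseline heuristic earlier in the article; tune it to your own risk appetite:

```python
def promote_preview_rule(prod_blocks, preview_blocks,
                         chargebacks_on_difference,
                         max_chargeback_rate=0.005):
    """Promote the preview rule only if it blocks fewer charges
    than production AND the charges it would newly approve have
    an acceptably low chargeback rate."""
    extra_approved = prod_blocks - preview_blocks
    if extra_approved <= 0:
        return False  # new rule blocks as much or more: no FP win
    return chargebacks_on_difference / extra_approved <= max_chargeback_rate
```

In the article's example (150 production blocks, 100 preview blocks, clean chargeback record on the 50-charge difference) this returns a promote decision.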
14. Radar Lists as segmentation tool
Radar Lists let you create reusable collections — allow-list of known customers, block-list of known-bad IPs, country groups, etc. Instead of hardcoding values in rules, reference lists. Easier to maintain, easier to audit. Most operators never use Lists and end up with fragmented rule logic.
15. Network Tokens and fraud intersection
Network tokenized cards have lower fraud rates than PAN-stored cards because the token is device-bound. Your Radar rules should be less aggressive on network-token transactions. Add a condition: "Do not apply rule X if network_token is true."
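What that condition looks like as logic, sketched locally (the `network_token` field and function name are our stand-ins; in Radar itself this is a condition added to the rule, not application code):

```python
def cvc_rule(charge):
    """Gate an aggressive rule behind a network-token check:
    tokenized charges skip it, everything else is evaluated
    normally (here, the CVC-failure block from section 4)."""
    if charge.get("network_token"):
        return "allow"  # token is device-bound; relax this rule
    return "block" if charge["cvc_check"] == "fail" else "allow"
```

The same guard pattern applies to any rule you want to relax for tokenized traffic, not just CVC checks.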