Even with perfect waits, stable locators, and independent test data, some test failures are genuinely beyond your control: a brief network blip, a transient server error, a CDN cache miss, or a garbage-collection pause in the browser. These rare, unreproducible failures — true flakes — need a management strategy that does not mask real defects. The solution is intelligent retry with tracking: retry failed tests automatically, but record the retry rate and investigate any test that frequently needs retries.
Retry Mechanisms — Controlled Retries Without Masking Defects
The pytest-rerunfailures plugin provides configurable retry logic that re-runs failed tests a specified number of times before marking them as truly failed.
# ── pytest-rerunfailures ──
# Install: pip install pytest-rerunfailures
# Run with 2 retries for all tests:
# pytest --reruns 2
# Run with 2 retries and 5-second delay between retries:
# pytest --reruns 2 --reruns-delay 5
# Mark specific tests for retry (when only some are flaky):
# @pytest.mark.flaky(reruns=3, reruns_delay=2)
# def test_payment_gateway():
#     ...
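Conceptually, --reruns N --reruns-delay S just re-invokes a failing test up to N extra times, pausing between attempts. A minimal hand-rolled sketch of that loop (run_with_reruns is an illustrative name, not plugin API — the real plugin does this at pytest's test-report level):

```python
# Sketch of the rerun loop: one first run plus up to `reruns` retries,
# sleeping `delay` seconds between failed attempts. Illustrative only.
import time

def run_with_reruns(test_fn, reruns=2, delay=0.0):
    """Return (passed, attempts_used) for a callable that raises on failure."""
    for attempt in range(1, reruns + 2):  # attempts 1 .. reruns + 1
        try:
            test_fn()
            return True, attempt
        except AssertionError:
            if attempt <= reruns and delay:
                time.sleep(delay)
    return False, reruns + 1
```

A test that fails twice and then passes reports success on its third attempt — exactly the behavior a true flake shows under --reruns 2.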
# ── Flake tracking dashboard ──
# Track retries over time to identify and fix chronic flakes
import json
from datetime import datetime
from pathlib import Path

class FlakeTracker:
    FLAKE_LOG = Path("reports/flake_log.json")

    @classmethod
    def record_retry(cls, test_name, attempt, passed, error_msg=""):
        cls.FLAKE_LOG.parent.mkdir(parents=True, exist_ok=True)
        entry = {
            "timestamp": datetime.now().isoformat(),
            "test": test_name,
            "attempt": attempt,
            "passed": passed,
            "error": error_msg[:200] if error_msg else "",
        }
        # Append to log
        existing = []
        if cls.FLAKE_LOG.exists():
            existing = json.loads(cls.FLAKE_LOG.read_text())
        existing.append(entry)
        cls.FLAKE_LOG.write_text(json.dumps(existing, indent=2))

    @classmethod
    def get_top_flakes(cls, top_n=10):
        if not cls.FLAKE_LOG.exists():
            return []
        entries = json.loads(cls.FLAKE_LOG.read_text())
        # Count retries per test
        retry_counts = {}
        for e in entries:
            if e["attempt"] > 1:  # Only count retries, not first attempts
                retry_counts[e["test"]] = retry_counts.get(e["test"], 0) + 1
        # Sort by frequency
        sorted_flakes = sorted(retry_counts.items(), key=lambda x: -x[1])
        return sorted_flakes[:top_n]
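A tracker only helps if something feeds it retry events. One way to wire this up automatically is a conftest.py hook; this sketch assumes pytest-rerunfailures marks a retried attempt's report with outcome "rerun" (verify against your plugin version), and uses a standalone append_flake_entry helper that mirrors record_retry:

```python
# conftest.py — sketch: append an entry to the flake log every time
# pytest-rerunfailures reruns a test. Assumes rerun reports carry
# outcome == "rerun"; check your plugin version before relying on this.
import json
from datetime import datetime
from pathlib import Path

FLAKE_LOG = Path("reports/flake_log.json")

def append_flake_entry(log_path, test_name, outcome):
    """Append one retry record to the JSON flake log."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append({
        "timestamp": datetime.now().isoformat(),
        "test": test_name,
        "outcome": outcome,
    })
    log_path.write_text(json.dumps(entries, indent=2))

def pytest_runtest_logreport(report):
    # A "rerun" outcome means this attempt failed and will be retried
    if report.when == "call" and report.outcome == "rerun":
        append_flake_entry(FLAKE_LOG, report.nodeid, report.outcome)
```

Because the hook fires on every test report, the log fills itself in as the suite runs — no per-test instrumentation required.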
# ── Flake management strategy ──
FLAKE_STRATEGY = [
    {
        "rule": "Retry threshold: maximum 2 retries per test",
        "why": "More than 2 retries masks real defects and slows the suite",
    },
    {
        "rule": "Track every retry in a flake log",
        "why": "Visibility into which tests flake and how often",
    },
    {
        "rule": "Review top flakers weekly",
        "why": "Tests that retry > 3 times/week need root cause investigation",
    },
    {
        "rule": "Set a team flake budget: < 2% of total test runs",
        "why": "If > 2% of runs involve retries, the suite has systemic issues",
    },
    {
        "rule": "Quarantine chronic flakes after investigation",
        "why": "Move persistently flaky tests to a separate run; fix or delete them",
    },
    {
        "rule": "Never add retries to a new test",
        "why": "New tests should pass reliably. If a new test is flaky, fix it immediately",
    },
]
print("Flake Management Strategy")
print("=" * 60)
for rule in FLAKE_STRATEGY:
    print(f"\n  Rule: {rule['rule']}")
    print(f"  Why:  {rule['why']}")

print("\n\npytest-rerunfailures Commands:")
print("  pytest --reruns 2                             # Retry all failures twice")
print("  pytest --reruns 2 --reruns-delay 5            # 5s delay between retries")
print("  @pytest.mark.flaky(reruns=3, reruns_delay=2)  # Per-test retry config")
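The 2% flake-budget rule above reduces to simple arithmetic over run counts; a small sketch (function names are illustrative, not from any library):

```python
# Sketch: check the team flake budget from CI run counts.
# flake_rate / within_flake_budget are illustrative names.
def flake_rate(total_runs, runs_with_retries):
    """Fraction of test runs that needed at least one retry."""
    if total_runs == 0:
        return 0.0
    return runs_with_retries / total_runs

def within_flake_budget(total_runs, runs_with_retries, budget=0.02):
    """True if the retry fraction stays under the team's flake budget."""
    return flake_rate(total_runs, runs_with_retries) <= budget
```

For example, 15 retried runs out of 1000 is a 1.5% rate, inside the 2% budget; 30 out of 1000 is 3%, which should trigger a systemic investigation rather than more retries.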
Common Mistakes
Mistake 1 — Applying a high retry count to every test
❌ Wrong: pytest --reruns 5 globally — masks real defects and runs each failing test up to six times, multiplying execution time.
✅ Correct: pytest --reruns 2 globally (conservative), with @pytest.mark.flaky(reruns=3) only on specific tests known to be affected by external transient conditions (payment gateway, third-party API).
Mistake 2 — Not tracking or reviewing retry data
❌ Wrong: Retries are configured, CI is green, nobody looks at how many retries are occurring.
✅ Correct: A weekly flake review where the team examines the top 5 most-retried tests, investigates root causes, and either fixes the underlying issue or quarantines the test.
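The weekly review itself can be a short query over the flake log. This sketch assumes entries in the shape FlakeTracker writes (timestamp/test/attempt keys) and restricts the count to the last seven days; top_flakes_last_week is an illustrative name:

```python
# Sketch: top retried tests within the weekly review window.
# Entry shape matches the FlakeTracker log: timestamp, test, attempt.
from datetime import datetime, timedelta

def top_flakes_last_week(entries, now=None, top_n=5):
    """Count retries (attempt > 1) recorded in the last 7 days, per test."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=7)
    counts = {}
    for e in entries:
        if datetime.fromisoformat(e["timestamp"]) < cutoff:
            continue  # older than the review window
        if e["attempt"] > 1:  # retries only, not first attempts
            counts[e["test"]] = counts.get(e["test"], 0) + 1
    return sorted(counts.items(), key=lambda x: -x[1])[:top_n]
```

Feeding the result into the weekly meeting gives the team a ranked, time-boxed list instead of an ever-growing historical tally.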