Retry Mechanisms and Flake Management — Handling the Unavoidable

Even with perfect waits, stable locators, and independent test data, some test failures are genuinely beyond your control: a brief network blip, a transient server error, a CDN cache miss, or a garbage collection pause in the browser. These rare, unreproducible failures — true flakes — need a management strategy that does not mask real defects. The solution is intelligent retry with tracking: retry failed tests automatically, but track the retry rate and investigate any test that needs retries frequently.

Retry Mechanisms — Controlled Retries Without Masking Defects

The pytest-rerunfailures plugin provides configurable retry logic that re-runs failed tests a specified number of times before marking them as truly failed.

# ── pytest-rerunfailures ──
# Install: pip install pytest-rerunfailures

# Run with 2 retries for all tests:
# pytest --reruns 2

# Run with 2 retries and 5-second delay between retries:
# pytest --reruns 2 --reruns-delay 5

# Mark specific tests for retry (when only some are flaky):
# @pytest.mark.flaky(reruns=3, reruns_delay=2)
# def test_payment_gateway():
#     ...

# ── Flake tracking dashboard ──
# Track retries over time to identify and fix chronic flakes

import json
from datetime import datetime
from pathlib import Path


class FlakeTracker:
    FLAKE_LOG = Path("reports/flake_log.json")

    @classmethod
    def record_retry(cls, test_name, attempt, passed, error_msg=""):
        cls.FLAKE_LOG.parent.mkdir(parents=True, exist_ok=True)

        entry = {
            "timestamp": datetime.now().isoformat(),
            "test": test_name,
            "attempt": attempt,
            "passed": passed,
            "error": error_msg[:200] if error_msg else "",
        }

        # Append to log (read-modify-write is not atomic; unsafe under
        # parallel runs such as pytest-xdist)
        existing = []
        if cls.FLAKE_LOG.exists():
            existing = json.loads(cls.FLAKE_LOG.read_text())
        existing.append(entry)
        cls.FLAKE_LOG.write_text(json.dumps(existing, indent=2))

    @classmethod
    def get_top_flakes(cls, top_n=10):
        if not cls.FLAKE_LOG.exists():
            return []
        entries = json.loads(cls.FLAKE_LOG.read_text())
        # Count retries per test
        retry_counts = {}
        for e in entries:
            if e["attempt"] > 1:  # Only count retries, not first attempts
                retry_counts[e["test"]] = retry_counts.get(e["test"], 0) + 1
        # Sort by frequency
        sorted_flakes = sorted(retry_counts.items(), key=lambda x: -x[1])
        return sorted_flakes[:top_n]


# ── Flake management strategy ──
FLAKE_STRATEGY = [
    {
        "rule": "Retry threshold: maximum 2 retries per test",
        "why": "More than 2 retries masks real defects and slows the suite",
    },
    {
        "rule": "Track every retry in a flake log",
        "why": "Visibility into which tests flake and how often",
    },
    {
        "rule": "Review top flakers weekly",
        "why": "Tests that retry > 3 times/week need root cause investigation",
    },
    {
        "rule": "Set a team flake budget: < 2% of total test runs",
        "why": "If > 2% of runs involve retries, the suite has systemic issues",
    },
    {
        "rule": "Quarantine chronic flakes after investigation",
        "why": "Move persistently flaky tests to a separate run; fix or delete them",
    },
    {
        "rule": "Never add retries to a new test",
        "why": "New tests should pass reliably. If a new test is flaky, fix it immediately",
    },
]

print("Flake Management Strategy")
print("=" * 60)
for rule in FLAKE_STRATEGY:
    print(f"\n  Rule: {rule['rule']}")
    print(f"  Why:  {rule['why']}")

print("\n\npytest-rerunfailures Commands:")
print("  pytest --reruns 2                           # Retry all failures twice")
print("  pytest --reruns 2 --reruns-delay 5           # 5s delay between retries")
print("  @pytest.mark.flaky(reruns=3, reruns_delay=2) # Per-test retry config")
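The FlakeTracker above records entries, but nothing feeds it yet. A minimal conftest.py sketch of the wiring, assuming pytest-rerunfailures marks a retried attempt's report outcome as "rerun" (the `record_rerun` helper is a standalone stand-in for `FlakeTracker.record_retry`, using the same log schema):

```python
# conftest.py (sketch) — feed rerun events into the flake log.
# Assumption: pytest-rerunfailures sets report.outcome to "rerun" for
# attempts that will be retried; only the final failure stays "failed".
import json
from datetime import datetime
from pathlib import Path

FLAKE_LOG = Path("reports/flake_log.json")


def record_rerun(test_name, error_msg=""):
    # Standalone stand-in for FlakeTracker.record_retry (same schema).
    FLAKE_LOG.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(FLAKE_LOG.read_text()) if FLAKE_LOG.exists() else []
    entries.append({
        "timestamp": datetime.now().isoformat(),
        "test": test_name,
        "attempt": 2,  # a "rerun" report means at least one retry happened
        "passed": False,
        "error": error_msg[:200],
    })
    FLAKE_LOG.write_text(json.dumps(entries, indent=2))


def pytest_runtest_logreport(report):
    # Called once per test phase; only the "call" phase matters here.
    if report.when == "call" and report.outcome == "rerun":
        record_rerun(report.nodeid, str(report.longrepr or ""))
```

In a real suite you would call `FlakeTracker.record_retry` instead of the local helper; the hook itself is the only extra piece needed.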

Note: The distinction between a flaky test and a real failure is: a flaky test passes on retry without any code changes. If a test fails, is retried, and passes — the failure was transient (network, timing, resource contention). If it fails on all retries, the failure is real and should be investigated. pytest-rerunfailures reports these differently: “rerun” for transient failures, “failed” for persistent ones. Use this distinction in your CI dashboard to separate noise from signal.

Tip: Implement a “flake budget” — a team-level metric that tracks the percentage of test runs that involve retries. Set a threshold (e.g., less than 2%). If the budget is exceeded, the team prioritises flake investigation over new test development. This creates accountability: flaky tests are not ignored — they consume a measurable budget that the team monitors.
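The budget check is simple arithmetic over CI run counts; a minimal sketch (the 2% threshold matches the strategy above, and the sample numbers are illustrative):

```python
def flake_budget_status(total_runs, runs_with_retries, budget_pct=2.0):
    """Return the retry rate (%) and whether it is within the flake budget."""
    rate = 100.0 * runs_with_retries / total_runs
    return rate, rate <= budget_pct


# Example: 1000 CI runs this week, 35 involved at least one retry.
rate, within_budget = flake_budget_status(1000, 35)
print(f"Retry rate: {rate:.1f}%; within budget: {within_budget}")
# → Retry rate: 3.5%; within budget: False
```

A rate above the threshold is the trigger for the team to shift effort from new tests to flake investigation.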

Warning: Retries are a management strategy, not a fix. A test that retries 3 times and passes on the fourth attempt is not a healthy test — it is a test with a timing bug, a shared-state issue, or an environment dependency that should be investigated and fixed. Use retries to keep the CI pipeline green while you investigate. Never use retries as a permanent substitute for proper synchronisation and test isolation.

Common Mistakes

Mistake 1 — Applying a high retry count to every test

❌ Wrong: pytest --reruns 5 globally — masks real defects, and a consistently failing test now runs six times (one original attempt plus five reruns) before it is reported as failed.

✅ Correct: pytest --reruns 2 globally (conservative), with @pytest.mark.flaky(reruns=3) only on specific tests known to be affected by external transient conditions (payment gateway, third-party API).

Mistake 2 — Not tracking or reviewing retry data

❌ Wrong: Retries are configured, CI is green, nobody looks at how many retries are occurring.

✅ Correct: A weekly flake review where the team examines the top 5 most-retried tests, investigates root causes, and either fixes the underlying issue or quarantines the test.
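The weekly review can start from the flake log itself; the counting below mirrors `FlakeTracker.get_top_flakes` on a hand-written sample (the test names are hypothetical):

```python
from collections import Counter

# Sample flake-log entries (same schema the tracker writes).
sample_log = [
    {"test": "test_payment_gateway", "attempt": 2, "passed": True},
    {"test": "test_payment_gateway", "attempt": 2, "passed": True},
    {"test": "test_payment_gateway", "attempt": 3, "passed": True},
    {"test": "test_search_results", "attempt": 2, "passed": True},
    {"test": "test_login", "attempt": 1, "passed": True},  # first attempt, not a retry
]

# Count only retries (attempt > 1), then list the most frequent offenders.
retry_counts = Counter(e["test"] for e in sample_log if e["attempt"] > 1)
for test, count in retry_counts.most_common(5):
    print(f"{count:3d} retries  {test}")
```

The top of that list is the agenda for the weekly review: fix the root cause, or quarantine the test per the strategy above.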

🧠 Test Yourself

A test has been retrying 4-5 times per week for the past month. Each time it passes on the second attempt. What should the team do?