Why Tests Are Flaky — The Timing Gap Between Selenium and the Browser

Your test works perfectly on your laptop. You push it to CI, and it fails. You run it again — it passes. You run it five more times — it fails twice. Welcome to the world of flaky tests. The number one cause of Selenium test flakiness is not bad locators or wrong assertions — it is timing. Selenium sends commands to the browser at machine speed, but the browser needs time to load pages, render elements, execute JavaScript, and complete AJAX requests. When Selenium tries to click a button before the browser has finished rendering it, the test fails with a NoSuchElementException — even though the button will appear 200 milliseconds later.

The Race Condition — Selenium vs the Browser

Understanding why flakiness happens is the first step to eliminating it. The root cause is always the same: Selenium acts before the browser is ready.

# The timing problem visualised

RACE_CONDITION = """
Timeline of a page load:

  0ms    Selenium sends: driver.get("https://app.com/dashboard")
  50ms   Browser receives HTTP response (HTML)
  100ms  Browser parses HTML, starts building DOM
  200ms  CSS files loaded, layout computed
  350ms  JavaScript bundles loaded
  500ms  React/Angular renders components into DOM
  600ms  AJAX calls fired for dynamic data
  800ms  AJAX responses received
  900ms  Components re-render with data
  1000ms Page is FULLY ready for interaction

  But Selenium sends the NEXT command at ~50ms:
  50ms   Selenium: driver.find_element(By.ID, "dashboard-chart")
         → NoSuchElementException! The element does not exist yet.
         → The page is still loading at 50ms.

  The element WILL exist at 900ms, but Selenium did not wait.
"""

# Three types of timing problems
TIMING_PROBLEMS = [
    {
        "problem": "Element not yet in DOM",
        "cause": "JavaScript framework has not rendered the component yet",
        "error": "NoSuchElementException",
        "example": "React component renders after an API call completes",
        "fix": "Wait for element presence or visibility",
    },
    {
        "problem": "Element in DOM but not visible / clickable",
        "cause": "CSS animation in progress, overlay covering element, loading spinner",
        "error": "ElementNotInteractableException or ElementClickInterceptedException",
        "example": "Modal fade-in animation takes 300ms; click sent at 100ms",
        "fix": "Wait for element to be clickable, not just present",
    },
    {
        "problem": "Element is stale (reference invalidated)",
        "cause": "Page re-rendered after element was found; DOM reference is outdated",
        "error": "StaleElementReferenceException",
        "example": "Found element, then AJAX updated the list, replacing the DOM node",
        "fix": "Re-find the element after the page update, or wait for stability",
    },
]

print(RACE_CONDITION)

print("\nThree Types of Timing Problems:")
print("=" * 65)
for tp in TIMING_PROBLEMS:
    print(f"\n  Problem: {tp['problem']}")
    print(f"  Cause:   {tp['cause']}")
    print(f"  Error:   {tp['error']}")
    print(f"  Example: {tp['example']}")
    print(f"  Fix:     {tp['fix']}")
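The race condition in the timeline above can be reproduced without a real browser. The sketch below is a simulation with hypothetical names (`FakePage`, `wait_until_found`): the "page" only contains the element 0.9 seconds after load, so an immediate lookup fails exactly the way Selenium's does at 50ms, while a polling wait — the same strategy Selenium's `WebDriverWait.until` uses internally — finds it once it exists.

```python
import time

class NoSuchElementException(Exception):
    """Stand-in for Selenium's exception of the same name."""

class FakePage:
    """Hypothetical page: 'dashboard-chart' enters the DOM 0.9s after load."""
    def __init__(self):
        self.loaded_at = time.monotonic()

    def find_element(self, element_id):
        if element_id == "dashboard-chart" and time.monotonic() - self.loaded_at >= 0.9:
            return f"<div id='{element_id}'>"
        raise NoSuchElementException(element_id)

def wait_until_found(page, element_id, timeout=10.0, poll=0.1):
    """Poll until the element exists: the core idea behind WebDriverWait.until."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return page.find_element(element_id)
        except NoSuchElementException:
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)

page = FakePage()
try:
    page.find_element("dashboard-chart")   # acts immediately, like Selenium at ~50ms
except NoSuchElementException:
    print("Immediate lookup: NoSuchElementException")

el = wait_until_found(page, "dashboard-chart")  # returns once the element exists
print("Polling wait found:", el)
```

In real Selenium the polling loop is provided for you: `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "dashboard-chart")))`.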

Note: The three Selenium exceptions — NoSuchElementException, ElementNotInteractableException, and StaleElementReferenceException — are all symptoms of the same root cause: a timing mismatch between Selenium and the browser. The element either does not exist yet (not rendered), exists but is not ready (animating, covered, disabled), or existed but was replaced by a re-render. All three are solved by the same approach: waiting for the correct condition before acting.

Tip: When a test fails intermittently, add a time.sleep(30) temporarily. If the test passes consistently with the sleep, the failure is timing-related — replace the sleep with the appropriate explicit wait. If it still fails with the sleep, the problem is not timing — it is a locator, data, or environment issue. This diagnostic trick quickly distinguishes timing bugs from real defects.

Warning: Flaky tests are not “minor inconveniences.” A test suite with a 5% flake rate across 200 tests means 10 random failures per run. Teams quickly learn to ignore failures, re-run the suite, and hope for green. This erosion of trust is catastrophic — when a real defect causes a failure, it is dismissed as “probably just flaky” and escapes to production. Eliminating flakiness is not optional; it is a prerequisite for a trustworthy test suite.
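The warning's arithmetic can be taken one step further: a 5% flake rate does not just produce ten random failures per run, it makes a fully green run vanishingly rare, which is exactly why "just re-run it" becomes the team habit.

```python
# 200 tests, each with a 5% chance of flaking on any given run
expected_failures = 200 * 0.05   # = 10 random failures per run

# Probability that ONE run comes back fully green: every test must pass
p_all_green = 0.95 ** 200        # ~0.000035, roughly 1 run in 29,000

print(f"Expected failures per run: {expected_failures:.0f}")
print(f"Chance of an all-green run: {p_all_green:.6f}")
```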

Common Mistakes

Mistake 1 — Adding time.sleep() to fix every timing issue

❌ Wrong: time.sleep(3) after every navigation and click, turning a 30-second test into a 3-minute test.

✅ Correct: Using explicit waits that return as soon as the condition is met — typically in milliseconds instead of seconds — while still tolerating slow environments up to a configurable timeout.
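The difference is easy to measure. The sketch below simulates the two strategies with a hypothetical `wait_for` helper (in real Selenium this role is played by `WebDriverWait`): the "element" becomes available 200ms in, the explicit wait returns almost immediately after that, while a fixed `time.sleep(3)` would burn the full three seconds every single time.

```python
import time

def wait_for(condition, timeout=10.0, poll=0.05):
    """Return as soon as condition() is truthy; raise after timeout.
    This mirrors the behaviour of an explicit wait (WebDriverWait in Selenium)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)
    raise TimeoutError("condition not met within timeout")

ready_at = time.monotonic() + 0.2                    # element "appears" after 200ms
element_present = lambda: time.monotonic() >= ready_at

start = time.monotonic()
wait_for(element_present)                            # returns at ~200ms, not at the 10s timeout
elapsed = time.monotonic() - start
print(f"Explicit wait returned after {elapsed:.2f}s; a fixed sleep would take 3.00s")
```

The timeout is a ceiling for slow environments, not a cost paid on every run: fast machines proceed the instant the condition holds.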

Mistake 2 — Blaming “flaky infrastructure” instead of fixing synchronisation

❌ Wrong: “The test is flaky because CI is slow. It works on my machine.”

✅ Correct: “The test has a timing bug — it does not wait for the element to be clickable before clicking. It passes on my fast laptop but fails on the slower CI server. Adding an explicit wait fixes both environments.”
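The "clickable, not just present" distinction from the mistake above can be simulated. The `FakeModal` below is hypothetical: it is in the DOM from the start, but a fade-in overlay intercepts clicks for the first 300ms, which is precisely the situation Selenium's `EC.element_to_be_clickable` condition exists for.

```python
import time

class ElementClickInterceptedException(Exception):
    """Stand-in for Selenium's exception: another element would receive the click."""

class FakeModal:
    """Hypothetical modal: present in the DOM immediately, but a fade-in
    overlay intercepts clicks for the first 0.3 seconds."""
    def __init__(self):
        self.shown_at = time.monotonic()

    @property
    def clickable(self):
        return time.monotonic() - self.shown_at >= 0.3   # animation finished

    def click(self):
        if not self.clickable:
            raise ElementClickInterceptedException("overlay would receive the click")
        return "clicked"

modal = FakeModal()

# Waiting only for presence is not enough; the click is intercepted:
try:
    modal.click()
except ElementClickInterceptedException as e:
    print("Too early:", e)

# Waiting for the *clickable* condition (EC.element_to_be_clickable in Selenium):
while not modal.clickable:
    time.sleep(0.05)
print(modal.click())
```

On a fast laptop the gap between "present" and "clickable" may never be observed, which is exactly why this bug surfaces only on slower CI machines.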

🧠 Test Yourself

A Selenium test throws ElementClickInterceptedException intermittently. What is the most likely cause?