Code Quality for Test Code — Linting, Reviews, Refactoring and Technical Debt

Test code that is not reviewed, linted, or refactored decays just like production code — except faster, because test suites grow rapidly and rarely get the same engineering attention. An SDET treats test code as a first-class software product: it has coding standards, automated linting, mandatory code reviews, regular refactoring, and tracked technical debt. This discipline is what separates a framework that lasts years from one that is rewritten every six months.

Engineering Practices for Test Code Quality

The same practices that keep production code healthy apply to test code — with some test-specific additions.

# ── PRACTICE 1: Linting and formatting ──

LINTING_CONFIG = {
    "Python": {
        "linter": "ruff or flake8",
        "formatter": "black or ruff format",
        "config": "pyproject.toml with max-line-length=120",
        "ci": "Run on every PR; block merge on violations",
    },
    "TypeScript/JavaScript": {
        "linter": "ESLint with eslint-plugin-cypress (plugin:cypress/recommended)",
        "formatter": "Prettier",
        "config": ".eslintrc.json with cypress rules enabled",
        "ci": "Run on every PR; block merge on violations",
    },
}
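
Concretely, the Python config entry above might look like the sketch below. Note that ruff's key is `line-length` (flake8 uses `max-line-length` in its own config file); the rule selection is an assumption to tailor to your team's standards:

```toml
[tool.ruff]
line-length = 120

[tool.ruff.lint]
# E/F: pycodestyle + pyflakes basics; PT: flake8-pytest-style checks
# that catch common test-code smells (bad fixtures, assert misuse)
select = ["E", "F", "PT"]
```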


# ── PRACTICE 2: Code review checklist for test code ──

REVIEW_CHECKLIST = [
    "Does the test name describe the expected behaviour? (test_valid_login_shows_dashboard)",
    "Does the test have a single assertion focus? (one behaviour per test)",
    "Are locators using stable selectors? (data-cy, data-testid — not fragile classes)",
    "Are there no hardcoded waits? (time.sleep / cy.wait(ms) without justification)",
    "Is test data generated uniquely? (no shared accounts, timestamps in emails)",
    "Are page objects used? (no raw find_element / cy.get in tests)",
    "Are assertions in the test, not in page objects?",
    "Is the test independent? (can run alone, in any order, in parallel)",
    "Are custom commands / page objects documented with docstrings?",
    "Is the test readable without consulting the page object code?",
]
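
Checklist item 5 in practice: a small helper (the name and domain are illustrative) that guarantees no two tests ever share an account, keeping tests order-independent and safe to parallelise:

```python
import uuid

def unique_email(prefix: str = "qa") -> str:
    """Build a collision-free address so parallel runs never reuse accounts."""
    return f"{prefix}+{uuid.uuid4().hex[:8]}@example.test"

# Two calls, two distinct accounts: the tests stay independent
first, second = unique_email(), unique_email()
assert first != second
```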


# ── PRACTICE 3: Refactoring triggers ──

REFACTORING_TRIGGERS = [
    {
        "smell": "Duplicated setup code across 5+ test files",
        "refactoring": "Extract into a custom command, fixture, or beforeEach hook",
    },
    {
        "smell": "Page object with 200+ lines",
        "refactoring": "Split into smaller page objects by page section or component",
    },
    {
        "smell": "Test file with 500+ lines",
        "refactoring": "Split into separate files by feature area or user flow",
    },
    {
        "smell": "Same locator used in 3+ page objects",
        "refactoring": "Extract into a shared component class (NavBar, Footer)",
    },
    {
        "smell": "Flaky test fixed by adding sleep/retry instead of proper wait",
        "refactoring": "Replace with explicit wait, intercept, or DOM assertion",
    },
    {
        "smell": "Test that takes 60+ seconds to execute",
        "refactoring": "Replace UI setup with API calls; split into smaller tests",
    },
]
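
The first trigger in practice: the same login setup, copied across five files, collapsed into one helper. In pytest this would typically be a fixture; the names and the fake token here are illustrative:

```python
def make_logged_in_session(email: str = "qa+shared@example.test") -> dict:
    """One shared setup helper replacing the setup lines duplicated
    across test files. A real suite would call the login API here."""
    return {"email": email, "token": "fake-token", "authenticated": True}

# Every test now gets its session from one place; a change to the
# login flow means editing one function, not five files
session = make_logged_in_session()
assert session["authenticated"] is True
```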


# ── PRACTICE 4: Technical debt tracking ──

TECH_DEBT_CATEGORIES = [
    {
        "category": "Flaky tests",
        "metric": "Flake rate (% of runs that pass only after a retry)",
        "target": "< 2%",
        "action": "Fix top 5 flakiest tests each sprint",
    },
    {
        "category": "Slow tests",
        "metric": "Tests exceeding 30 seconds",
        "target": "< 5% of tests",
        "action": "Optimise setup with API calls; split long tests",
    },
    {
        "category": "Skipped tests",
        "metric": "Count of @skip / @pending tests",
        "target": "< 3% of total tests",
        "action": "Fix or delete; skipped tests provide no value and rot silently",
    },
    {
        "category": "Suppressed linting rules",
        "metric": "Count of eslint-disable / noqa comments",
        "target": "Decrease each sprint",
        "action": "Fix the underlying issue rather than suppressing the warning",
    },
    {
        "category": "Outdated dependencies",
        "metric": "Cypress / Selenium / plugin versions behind latest",
        "target": "Within 2 minor versions of latest",
        "action": "Quarterly dependency update with test verification",
    },
]
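
The targets above can be checked mechanically. A sketch that computes the headline debt metrics from a list of result records (the record shape is an assumption; adapt it to your runner's report format):

```python
def debt_metrics(results):
    """Compute debt percentages from records shaped like
    {"name": str, "passed_on_retry": bool, "skipped": bool, "duration_s": float}."""
    total = len(results)
    flaky = sum(r["passed_on_retry"] for r in results)
    skipped = sum(r["skipped"] for r in results)
    slow = sum(r["duration_s"] > 30 for r in results)
    return {
        "flake_rate_pct": round(100 * flaky / total, 1),
        "skipped_pct": round(100 * skipped / total, 1),
        "slow_pct": round(100 * slow / total, 1),
    }

runs = [
    {"name": "t1", "passed_on_retry": False, "skipped": False, "duration_s": 4},
    {"name": "t2", "passed_on_retry": True,  "skipped": False, "duration_s": 12},
    {"name": "t3", "passed_on_retry": False, "skipped": True,  "duration_s": 0},
    {"name": "t4", "passed_on_retry": False, "skipped": False, "duration_s": 45},
]
metrics = debt_metrics(runs)
```

Feeding these numbers into the sprint report turns "we should clean up the tests" into concrete, trackable figures.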

print("Code Review Checklist for Test Code:")
for i, item in enumerate(REVIEW_CHECKLIST, 1):
    print(f"  {i:>2}. {item}")

print("\n\nRefactoring Triggers:")
for trigger in REFACTORING_TRIGGERS:
    print(f"\n  Smell: {trigger['smell']}")
    print(f"  Fix:   {trigger['refactoring']}")

print("\n\nTechnical Debt Categories:")
for cat in TECH_DEBT_CATEGORIES:
    print(f"\n  {cat['category']} — Target: {cat['target']}")
    print(f"    Metric: {cat['metric']}")
    print(f"    Action: {cat['action']}")

Note: Code reviews for test code should be as rigorous as reviews for production code. A bad locator, a missing wait, or an assertion hidden in a page object can silently break dozens of tests. The review checklist above covers the ten most common test code quality issues — use it as a PR template for test code changes. Teams that skip test code reviews consistently end up with a framework that passes on the reviewer’s machine but fails in CI, contains duplicated logic, and is impossible for new team members to understand.

Tip: Track test technical debt as a sprint metric alongside product technical debt. Report flake rate, slow test count, skipped test count, and suppressed linting rules in retrospectives. Making these metrics visible creates accountability: the team sees the debt accumulating and can prioritise cleanup before it becomes a crisis. A dashboard showing “12 skipped tests, 3.5% flake rate, 8 tests over 60 seconds” is more actionable than a vague “we should clean up the tests.”

Warning: Skipped tests (@pytest.mark.skip, it.skip(), @Disabled) are the most dangerous form of test debt. They silently rot: the feature they tested may have changed, new bugs may have been introduced in their coverage area, and nobody notices because skipped tests produce no output. If a test is skipped for more than two sprints, it should be either fixed or deleted. A deleted test is honest — it tells the team there is no coverage. A skipped test is deceptive — it suggests coverage exists when it does not.

Common Mistakes

Mistake 1 — Not reviewing test code in pull requests

❌ Wrong: PR reviews focus only on production code; test code changes are approved without examination.

✅ Correct: Every test code change is reviewed against the 10-point checklist. Bad test code that passes review becomes permanent technical debt.

Mistake 2 — Treating skipped tests as “will fix later”

❌ Wrong: 30 skipped tests accumulate over 6 months — nobody remembers why they were skipped or whether they are still relevant.

✅ Correct: Every skipped test has a Jira ticket with a fix target date. If the date passes without a fix, the test is deleted and the coverage gap is documented.
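
The fix-target-date policy can be enforced in code. A sketch of a helper that refuses to honour a skip once the ticket's deadline has passed (the ticket format and policy are assumptions; in pytest this would wrap pytest.mark.skipif):

```python
from datetime import date

def skip_allowed(ticket: str, fix_by: str, today=None) -> bool:
    """True while the skip is inside its agreed window.

    Every skip carries a ticket reference and an ISO fix-by date; once
    the date passes, the test must run again (and fail loudly) or be
    deleted, rather than continuing to rot silently.
    """
    today = today or date.today()
    return today <= date.fromisoformat(fix_by)

# Within the window the skip is honoured; past the deadline it expires
assert skip_allowed("QA-123", "2099-01-01", today=date(2025, 6, 1)) is True
assert skip_allowed("QA-123", "2025-01-01", today=date(2025, 6, 1)) is False
```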

🧠 Test Yourself

A test suite has 200 tests, 15 of which are skipped with @skip. What is the correct approach?