A test that logs in with “standard_user” and asserts that six products are displayed verifies one scenario. But what about locked_out_user, problem_user, and performance_glitch_user? Writing a separate test function for each user duplicates the same logic with different data. Data-driven testing solves this by separating test logic from test data — the same test function runs multiple times with different inputs from an external source (JSON, CSV, YAML, or a database). This is how professional frameworks achieve high coverage with minimal code.
Data-Driven Testing with pytest Parameterisation
pytest’s @pytest.mark.parametrize decorator is the primary mechanism for data-driven testing in Python Selenium frameworks. It runs the same test function once per data set, generating individual test results for each.
import pytest
import json
import csv
from pathlib import Path
# ── Method 1: Inline parameterisation (small data sets) ──
@pytest.mark.parametrize("username, password, expected_url", [
    ("standard_user", "secret_sauce", "inventory"),
    ("locked_out_user", "secret_sauce", None),  # expects error
    ("problem_user", "secret_sauce", "inventory"),
])
def test_login_variants(browser, username, password, expected_url):
    login_page = LoginPage(browser).open()
    if expected_url:
        login_page.login(username, password)
        assert expected_url in browser.current_url
    else:
        login_page.login_expecting_error(username, password)
        assert "error" in login_page.get_error_message().lower()
# ── Method 2: JSON file (structured data) ──
# test_data/login_scenarios.json:
# [
#   {"username": "standard_user", "password": "secret_sauce",
#    "should_succeed": true, "expected_products": 6},
#   {"username": "locked_out_user", "password": "secret_sauce",
#    "should_succeed": false, "error_contains": "locked out"},
#   {"username": "standard_user", "password": "wrong",
#    "should_succeed": false, "error_contains": "do not match"}
# ]
def load_json_data(filename):
    path = Path(__file__).parent.parent / "test_data" / filename
    with open(path) as f:
        return json.load(f)

def load_login_scenarios():
    data = load_json_data("login_scenarios.json")
    # Return pytest.param entries so each scenario gets a readable test ID
    return [
        pytest.param(
            d["username"], d["password"], d["should_succeed"],
            d.get("expected_products"), d.get("error_contains"),
            id=f"{d['username']}-{'pass' if d['should_succeed'] else 'fail'}"
        )
        for d in data
    ]
@pytest.mark.parametrize(
    "username, password, should_succeed, expected_products, error_contains",
    load_login_scenarios()
)
def test_login_from_json(browser, username, password,
                         should_succeed, expected_products, error_contains):
    login_page = LoginPage(browser).open()
    if should_succeed:
        inventory = login_page.login(username, password)
        assert inventory.get_product_count() == expected_products
    else:
        login_page.login_expecting_error(username, password)
        assert error_contains in login_page.get_error_message().lower()
# ── Method 3: CSV file (tabular data) ──
def load_csv_data(filename):
    path = Path(__file__).parent.parent / "test_data" / filename
    with open(path, newline="") as f:  # newline="" is the idiomatic csv open
        reader = csv.DictReader(f)
        return list(reader)
# test_data/products.csv:
# product_name,expected_price
# Sauce Labs Backpack,$29.99
# Sauce Labs Bike Light,$9.99
# Sauce Labs Bolt T-Shirt,$15.99
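The type-coercion caveat for CSV matters here: csv.DictReader yields every cell as a string, so a price like "$29.99" must be parsed before it can be compared numerically. A minimal sketch, assuming the products.csv layout above (the parse_price helper and inline sample are illustrative, not part of the framework):

```python
import csv
import io

# Inline sample mirroring test_data/products.csv (illustrative only)
SAMPLE_CSV = """product_name,expected_price
Sauce Labs Backpack,$29.99
Sauce Labs Bike Light,$9.99
Sauce Labs Bolt T-Shirt,$15.99
"""

def parse_price(raw):
    # CSV cells arrive as strings; strip the currency symbol and coerce
    return float(raw.lstrip("$"))

def load_products(text):
    reader = csv.DictReader(io.StringIO(text))
    return [(row["product_name"], parse_price(row["expected_price"]))
            for row in reader]
```

Each resulting tuple can then feed @pytest.mark.parametrize the same way load_login_scenarios() does for JSON.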
# ── Data-driven strategy summary ──
DATA_SOURCES = [
    {
        "source": "Inline (@pytest.mark.parametrize)",
        "best_for": "2-5 data sets; data is simple and stable",
        "pros": "Visible in test file; no external files to manage",
        "cons": "Clutters test file if data sets are large",
    },
    {
        "source": "JSON files",
        "best_for": "Structured, nested data; complex test scenarios",
        "pros": "Supports nested objects; easy to read and edit",
        "cons": "No comments allowed in JSON; verbose for simple cases",
    },
    {
        "source": "CSV files",
        "best_for": "Tabular data; many rows of similar structure",
        "pros": "Editable in Excel/Sheets; easy for non-technical team members",
        "cons": "Flat structure only; no nested data; type coercion needed",
    },
    {
        "source": "YAML files",
        "best_for": "Complex config + data; human-readable",
        "pros": "Supports comments, nested structures, multiple types",
        "cons": "Requires PyYAML dependency; indentation-sensitive",
    },
    {
        "source": "Database / API",
        "best_for": "Dynamic data; large-scale data-driven suites",
        "pros": "Data managed centrally; supports complex queries",
        "cons": "Infrastructure dependency; slower than file-based",
    },
]

print("Data-Driven Testing Sources")
print("=" * 60)
for ds in DATA_SOURCES:
    print(f"\n{ds['source']}")
    print(f"  Best for: {ds['best_for']}")
    print(f"  Pros: {ds['pros']}")
    print(f"  Cons: {ds['cons']}")
The id parameter in pytest.param(..., id="standard_user-pass") controls how the test appears in reports. Without IDs, parameterised tests show as test_login[0], test_login[1] — meaningless when debugging a failure. With descriptive IDs, they show as test_login[standard_user-pass], test_login[locked_out-fail] — immediately telling you which data set failed. Always provide meaningful IDs for parameterised tests.
Keep data files in a dedicated test_data/ folder. CSVs are editable in Excel or Google Sheets, making data contribution accessible to the entire team. The automation engineer writes the test logic; the domain expert populates the data. This division of labour is one of the highest-value outcomes of data-driven testing.
Common Mistakes
Mistake 1 — Duplicating test functions instead of parameterising
❌ Wrong: test_login_standard, test_login_locked, test_login_problem — three functions with identical logic and different data.
✅ Correct: One test_login function parameterised with three data sets. Logic is written once; data varies per run.
Mistake 2 — Hardcoding test data in page object methods
❌ Wrong: LoginPage.login() has a default value username="admin" — page objects should not contain test data.
✅ Correct: Test data is passed as parameters from the test layer (Layer 4) or loaded from files. Page objects are data-agnostic.
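The correct pattern can be sketched as follows; the driver wrapper methods (type, click) and the selectors are hypothetical stand-ins rather than Selenium's real API, but the point is that the page object takes no default credentials:

```python
class LoginPage:
    """Encapsulates login actions only; owns no usernames or passwords."""

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        # No default arguments: credentials must come from the test
        # layer or a data file, never from the page object itself.
        self.driver.type("#user-name", username)
        self.driver.type("#password", password)
        self.driver.click("#login-button")
        return self
```

Because the page object is data-agnostic, the same login() serves every parameterised scenario, from standard_user to locked_out_user.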