A test that logs in with “standard_user” and asserts that six products are displayed verifies one scenario. But what about locked_out_user, problem_user, and performance_glitch_user? Writing a separate test function for each user duplicates the same logic with different data. Data-driven testing solves this by separating test logic from test data — the same test function runs multiple times with different inputs from an external source (JSON, CSV, YAML, or a database). This is how professional frameworks achieve high coverage with minimal code.
Data-Driven Testing with pytest Parameterisation
pytest’s @pytest.mark.parametrize decorator is the primary mechanism for data-driven testing in Python Selenium frameworks. It runs the same test function once per data set, generating individual test results for each.
import pytest
import json
import csv
from pathlib import Path
# ── Method 1: Inline parameterisation (small data sets) ──
@pytest.mark.parametrize("username, password, expected_url", [
    ("standard_user", "secret_sauce", "inventory"),
    ("locked_out_user", "secret_sauce", None),  # expects error
    ("problem_user", "secret_sauce", "inventory"),
])
def test_login_variants(browser, username, password, expected_url):
    login_page = LoginPage(browser).open()
    if expected_url:
        login_page.login(username, password)
        assert expected_url in browser.current_url
    else:
        login_page.login_expecting_error(username, password)
        assert "error" in login_page.get_error_message().lower()
# ── Method 2: JSON file (structured data) ──
# test_data/login_scenarios.json:
# [
#   {"username": "standard_user", "password": "secret_sauce",
#    "should_succeed": true, "expected_products": 6},
#   {"username": "locked_out_user", "password": "secret_sauce",
#    "should_succeed": false, "error_contains": "locked out"},
#   {"username": "standard_user", "password": "wrong",
#    "should_succeed": false, "error_contains": "do not match"}
# ]
def load_json_data(filename):
    path = Path(__file__).parent.parent / "test_data" / filename
    with open(path) as f:
        return json.load(f)

def load_login_scenarios():
    data = load_json_data("login_scenarios.json")
    # Return pytest.param entries so each scenario gets a readable test ID
    return [
        pytest.param(
            d["username"], d["password"], d["should_succeed"],
            d.get("expected_products"), d.get("error_contains"),
            id=f"{d['username']}-{'pass' if d['should_succeed'] else 'fail'}"
        )
        for d in data
    ]
@pytest.mark.parametrize(
    "username, password, should_succeed, expected_products, error_contains",
    load_login_scenarios()
)
def test_login_from_json(browser, username, password,
                         should_succeed, expected_products, error_contains):
    login_page = LoginPage(browser).open()
    if should_succeed:
        inventory = login_page.login(username, password)
        assert inventory.get_product_count() == expected_products
    else:
        login_page.login_expecting_error(username, password)
        assert error_contains in login_page.get_error_message().lower()
# ── Method 3: CSV file (tabular data) ──
def load_csv_data(filename):
    path = Path(__file__).parent.parent / "test_data" / filename
    with open(path, newline="") as f:  # newline="" is the idiomatic csv open
        reader = csv.DictReader(f)
        return list(reader)
# test_data/products.csv:
# product_name,expected_price
# Sauce Labs Backpack,$29.99
# Sauce Labs Bike Light,$9.99
# Sauce Labs Bolt T-Shirt,$15.99
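The type-coercion caveat for CSV matters here: csv.DictReader yields every cell as a string, so a price like "$29.99" must be parsed before it can be compared numerically. A minimal sketch, assuming the products.csv layout above (the parse_price helper and inline sample are illustrative, not part of the framework):

```python
import csv
import io

# Inline sample mirroring test_data/products.csv (illustrative only)
SAMPLE_CSV = """product_name,expected_price
Sauce Labs Backpack,$29.99
Sauce Labs Bike Light,$9.99
Sauce Labs Bolt T-Shirt,$15.99
"""

def parse_price(raw):
    # CSV cells arrive as strings; strip the currency symbol and coerce
    return float(raw.lstrip("$"))

def load_products(text):
    reader = csv.DictReader(io.StringIO(text))
    return [(row["product_name"], parse_price(row["expected_price"]))
            for row in reader]
```

Each resulting tuple can then feed @pytest.mark.parametrize the same way load_login_scenarios() does for JSON.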
# ── Data-driven strategy summary ──
DATA_SOURCES = [
    {
        "source": "Inline (@pytest.mark.parametrize)",
        "best_for": "2-5 data sets; data is simple and stable",
        "pros": "Visible in test file; no external files to manage",
        "cons": "Clutters test file if data sets are large",
    },
    {
        "source": "JSON files",
        "best_for": "Structured, nested data; complex test scenarios",
        "pros": "Supports nested objects; easy to read and edit",
        "cons": "No comments allowed in JSON; verbose for simple cases",
    },
    {
        "source": "CSV files",
        "best_for": "Tabular data; many rows of similar structure",
        "pros": "Editable in Excel/Sheets; easy for non-technical team members",
        "cons": "Flat structure only; no nested data; type coercion needed",
    },
    {
        "source": "YAML files",
        "best_for": "Complex config + data; human-readable",
        "pros": "Supports comments, nested structures, multiple types",
        "cons": "Requires PyYAML dependency; indentation-sensitive",
    },
    {
        "source": "Database / API",
        "best_for": "Dynamic data; large-scale data-driven suites",
        "pros": "Data managed centrally; supports complex queries",
        "cons": "Infrastructure dependency; slower than file-based",
    },
]

print("Data-Driven Testing Sources")
print("=" * 60)
for ds in DATA_SOURCES:
    print(f"\n{ds['source']}")
    print(f"  Best for: {ds['best_for']}")
    print(f"  Pros: {ds['pros']}")
    print(f"  Cons: {ds['cons']}")
The id parameter in pytest.param(..., id="standard_user-pass") controls how the test appears in reports. Without IDs, parameterised tests show as test_login[0], test_login[1] — meaningless when debugging a failure. With descriptive IDs, they show as test_login[standard_user-pass], test_login[locked_out-fail] — immediately telling you which data set failed. Always provide meaningful IDs for parameterised tests.
Keep data files in a dedicated test_data/ folder. CSVs are editable in Excel or Google Sheets, making data contribution accessible to the entire team. The automation engineer writes the test logic; the domain expert populates the data. This division of labour is one of the highest-value outcomes of data-driven testing.
Common Mistakes
Mistake 1 — Duplicating test functions instead of parameterising
❌ Wrong: test_login_standard, test_login_locked, test_login_problem — three functions with identical logic and different data.
✅ Correct: One test_login function parameterised with three data sets. Logic is written once; data varies per run.
Mistake 2 — Hardcoding test data in page object methods
❌ Wrong: LoginPage.login() has a default value username="admin" — page objects should not contain test data.
✅ Correct: Test data is passed as parameters from the test layer (Layer 4) or loaded from files. Page objects are data-agnostic.
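The correct pattern can be sketched as follows; the driver wrapper methods (type, click) and the selectors are hypothetical stand-ins rather than Selenium's real API, but the point is that the page object takes no default credentials:

```python
class LoginPage:
    """Encapsulates login actions only; owns no usernames or passwords."""

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        # No default arguments: credentials must come from the test
        # layer or a data file, never from the page object itself.
        self.driver.type("#user-name", username)
        self.driver.type("#password", password)
        self.driver.click("#login-button")
        return self
```

Because the page object is data-agnostic, the same login() serves every parameterised scenario, from standard_user to locked_out_user.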