Production quality is what your users actually experience once software is live. Traditional testing focuses on pre-release activities, but modern teams also use production signals such as uptime, error rates, and latency to understand whether the system is "good enough." Service-level objectives (SLOs) make this explicit.
From Testing Activities to Quality Outcomes
Running many tests does not guarantee a reliable service. SLOs define target levels for key user-centric metrics, such as request success rate or page load time, over a rolling window. They connect engineering work to user expectations and business risk.
Example SLOs
- 99.9% of checkout requests succeed over 30 days.
- 99% of homepage loads complete within 1.5 seconds.
- 99.5% of API calls respond without server error.
Error budget = 1 - SLO target (e.g., 0.1% allowed failures for a 99.9% SLO).
Error budgets express how much unreliability you are willing to tolerate within a period. They help teams balance feature delivery and reliability work by providing a shared, quantitative view of risk.
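The error-budget arithmetic above can be sketched in a few lines. This is an illustrative example, not any particular monitoring tool's API; the function names and the request counts are assumptions.

```python
# Error-budget arithmetic for a success-rate SLO over a fixed window.
# All names and numbers here are illustrative assumptions.

def error_budget(slo_target: float, total_requests: int) -> int:
    """How many failed requests the budget allows in the window."""
    # Error budget = 1 - SLO target (e.g., 0.001 for a 99.9% SLO).
    return int(total_requests * (1 - slo_target))

def budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget(slo_target, total)
    return (budget - failed) / budget

# With 1,000,000 checkout requests in 30 days, a 99.9% SLO allows
# 1,000 failures; 400 observed failures leaves 60% of the budget.
print(error_budget(0.999, 1_000_000))           # 1000
print(budget_remaining(0.999, 1_000_000, 400))  # 0.6
```

A team that has spent most of its budget might pause risky releases; a team with a large surplus has quantitative cover to ship faster.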
QA's Role in Production Quality
QA engineers can help define meaningful SLOs, interpret production charts, and connect incidents back to gaps in test design or environments. This shifts the role from gatekeeping to partnership in ongoing quality improvement.
Common Mistakes
Mistake 1: Treating SLOs as purely an SRE concern
Quality is cross-functional.
✗ Wrong: Assuming testers have no role once code reaches production.
✓ Correct: Use SLOs and incident data to refine tests and test environments.
Mistake 2: Defining SLOs only in technical terms
Users feel outcomes, not implementation details.
✗ Wrong: Focusing solely on CPU usage or internal queue sizes.
✓ Correct: Anchor SLOs in user-visible behaviours like success rates and latency.
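A user-anchored SLO check can be sketched directly against a sample of request records. The record fields (`status`, `latency_s`) and thresholds below are assumptions for illustration, not a specific telemetry schema.

```python
# Checking user-visible SLOs (success rate, latency) against request
# records. Field names and thresholds are illustrative assumptions.

def success_rate(records) -> float:
    """Fraction of requests that completed without a server error."""
    ok = sum(1 for r in records if r["status"] < 500)
    return ok / len(records)

def latency_slo_met(records, threshold_s: float, target: float) -> bool:
    """True if at least `target` of requests finish within `threshold_s`."""
    fast = sum(1 for r in records if r["latency_s"] <= threshold_s)
    return fast / len(records) >= target

records = [
    {"status": 200, "latency_s": 0.8},
    {"status": 200, "latency_s": 1.2},
    {"status": 503, "latency_s": 2.4},
    {"status": 200, "latency_s": 0.5},
]
print(success_rate(records))                # 0.75
print(latency_slo_met(records, 1.5, 0.99))  # False (only 3 of 4 within 1.5 s)
```

Note that both checks are expressed in terms a user would recognise (did the request succeed, how long did it take), not internal metrics like CPU or queue depth.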