Once you see that performance is not meeting expectations, the next step is to identify exactly where time and resources are being spent. Metrics, logs and distributed traces work together to reveal bottlenecks.
Using Metrics, Logs and Traces to Find Bottlenecks
System metrics (CPU, memory, disk I/O, network, database connections) indicate which components are under pressure, while application logs and traces show which requests or operations are slow. Distributed tracing tools let you break down end-to-end latency into segments such as API gateway, services and database calls.
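As a rough illustration of breaking end-to-end latency into segments, the sketch below sums span durations per segment for a single traced request and reports each segment's share. The span data, the segment names and the `latency_share` helper are all hypothetical. It also assumes the durations are non-overlapping self-times, which real nested spans usually are not.

```python
from collections import defaultdict

# Hypothetical spans from one traced request: (segment name, self-time in ms).
# Assumption: durations do not overlap, so they can simply be summed.
spans = [
    ("api_gateway", 40.0),
    ("checkout_service", 110.0),
    ("database", 700.0),
    ("checkout_service", 150.0),
]

def latency_share(spans):
    """Return (segment, share of total time) pairs, largest share first."""
    totals = defaultdict(float)
    for segment, duration_ms in spans:
        totals[segment] += duration_ms
    total = sum(totals.values())
    if total == 0:
        return []
    shares = [(segment, ms / total) for segment, ms in totals.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)

for segment, share in latency_share(spans):
    print(f"{segment}: {share:.0%}")
```

With this sample data the database segment dominates, which is the kind of signal that tells you where to look next.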
Example bottleneck investigation:
- Symptom: P95 latency > 1s on /checkout under load
- Metrics: DB CPU at 90%, connection pool saturation
- Traces: 70% of time spent in "CalculateDiscounts" query
- Logs: warnings about slow queries and missing index on discount_rules table
This structured approach turns vague "the system is slow" reports into specific, actionable findings.
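The symptom in the example investigation is a P95 threshold, so it helps to see how such a check could be computed. This is a minimal sketch using the nearest-rank method. The latency samples, `percentile` and `THRESHOLD_MS` are illustrative assumptions, and monitoring tools may use a different percentile definition.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical /checkout latencies in milliseconds from a load test.
latencies_ms = [220, 250, 240, 310, 280, 1450, 260, 300, 1200, 270,
                230, 290, 1600, 255, 265, 245, 275, 285, 295, 305]

THRESHOLD_MS = 1000  # the 1s limit from the example symptom

p95 = percentile(latencies_ms, 95)
if p95 > THRESHOLD_MS:
    print(f"P95 {p95} ms exceeds {THRESHOLD_MS} ms: check metrics and traces next")
```

Most requests in this sample finish in about 300 ms, so an average would hide the slow tail that the P95 exposes.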
Common Mistakes
Mistake 1: Jumping straight to code changes without data
This risks targeting the wrong area.
Wrong: Guessing where the bottleneck is.
Correct: Use metrics and traces to confirm before optimising.
Mistake 2: Ignoring external dependencies
This hides critical contributors to latency.
Wrong: Forgetting about payment gateways, third-party APIs or caches.
Correct: Include external calls in your tracing and monitoring strategy.
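One lightweight way to bring external calls into your timing data is to wrap them in a timer. The sketch below uses only the Python standard library. `timed_call`, `timings` and `charge_card` are hypothetical names, and the sleep stands in for a real payment gateway request. In a production system a tracing library would normally record this as a span instead.

```python
import time
from contextlib import contextmanager

# Collected timings: list of (dependency name, elapsed seconds).
timings = []

@contextmanager
def timed_call(dependency):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((dependency, time.perf_counter() - start))

def charge_card():
    """Hypothetical stand-in for a payment gateway request."""
    time.sleep(0.05)
    return "ok"

with timed_call("payment_gateway"):
    charge_card()

for dependency, elapsed in timings:
    print(f"{dependency}: {elapsed * 1000:.0f} ms")
```

Because the timing is recorded in a `finally` block, failed or timed-out calls are measured as well, and those are often the ones that matter most.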