Monitoring transforms production from a black box into an observable system. When a seller reports “my listing isn’t showing up,” Application Insights’ distributed tracing lets you find every request that touched that listing — the create command, the publish event, the search queries — and identify exactly where the failure occurred, with timestamps, error messages, and SQL queries. Custom metrics (listing_published, contact_request_sent) let you track business health in addition to technical health.
Monitoring and Alerting
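// ── Application Insights custom metrics ───────────────────────────────────
// Install (ASP.NET Core API):  dotnet add package Microsoft.ApplicationInsights.AspNetCore
// Install (worker services):   dotnet add package Microsoft.ApplicationInsights.WorkerService
// In Program.cs:
//   builder.Services.AddApplicationInsightsTelemetry();

using System.Collections.Generic;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;
using MediatR;
using Microsoft.ApplicationInsights;

// In domain event handlers, track business events:
public class TrackListingPublishedHandler
    : INotificationHandler<ListingPublishedEvent>
{
    private readonly TelemetryClient _telemetry;

    // TelemetryClient is registered by AddApplicationInsightsTelemetry() and injected here.
    public TrackListingPublishedHandler(TelemetryClient telemetry) => _telemetry = telemetry;

    public Task Handle(ListingPublishedEvent evt, CancellationToken ct)
    {
        _telemetry.TrackEvent("ListingPublished", new Dictionary<string, string>
        {
            ["listingId"] = evt.ListingId.ToString(),
            ["category"] = evt.Category.ToString(),
            ["city"] = evt.City,
            // Invariant culture keeps the decimal separator stable across servers.
            ["price"] = evt.Price.Amount.ToString("F2", CultureInfo.InvariantCulture),
        });
        // Pre-aggregated metric: one data point per published listing.
        _telemetry.GetMetric("listings.published.count").TrackValue(1);
        return Task.CompletedTask;
    }
}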
// In the search query handler, track search patterns:
public class TrackSearchQueryHandler
    : INotificationHandler<SearchExecutedEvent>
{
    private readonly TelemetryClient _telemetry;

    // TelemetryClient is supplied by dependency injection.
    public TrackSearchQueryHandler(TelemetryClient telemetry) => _telemetry = telemetry;

    public Task Handle(SearchExecutedEvent evt, CancellationToken ct)
    {
        _telemetry.TrackEvent("ListingSearched", new Dictionary<string, string>
        {
            ["keyword"] = evt.Keyword ?? "(none)",
            ["category"] = evt.Category?.ToString() ?? "(all)",
            ["city"] = evt.City ?? "(all)",
            ["resultCount"] = evt.ResultCount.ToString(),
            // bool.ToString() yields "True"/"False" (capitalised); match accordingly in KQL.
            ["hasResults"] = (evt.ResultCount > 0).ToString(),
        });
        return Task.CompletedTask;
    }
}
// ── Azure Monitor alert rules (via Azure Portal or Bicep) ─────────────────
// Alert 1: High API error rate
// Condition: requests/failed > 5% of total requests for 5 minutes
// Action: Send email + PagerDuty webhook
// Severity: 2 (Warning)
// Alert 2: Database DTU saturation
// Condition: dtu_consumption_percent > 80% for 10 minutes
// Action: Trigger scale-up to GP_Gen5_4 (via Azure Automation runbook)
// Severity: 3 (Informational)
// Alert 3: SignalR disconnection spike
// Condition: signalr/connection_count drops > 20% in 2 minutes
// Action: Investigation email
// Severity: 3 (Informational)
// ── Structured log queries in Azure Monitor Logs (KQL) ────────────────────
// Find all operations for a specific listing
// (assumes listingId is logged as a structured property on each trace):
//   traces
//   | where tostring(customDimensions.listingId) == "abc-123-guid"
//   | order by timestamp desc
//   | project timestamp, message, severityLevel, customDimensions
// Find all failed contact requests in the last hour
// (=~ is case-insensitive, so it matches both "true" and .NET's "True"):
//   customEvents
//   | where name == "ContactRequestSent"
//   | where timestamp > ago(1h)
//   | where tostring(customDimensions.failed) =~ "true"
//   | summarize count() by bin(timestamp, 5m)
// Find slow search queries (> 500 ms; duration is recorded in milliseconds):
//   dependencies
//   | where type == "SQL"
//   | where duration > 500
//   | where data contains "Listings"
//   | order by duration desc
//   | project timestamp, duration, data
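The failed-contact-request query above filters on a `failed` custom dimension of a `ContactRequestSent` event, but this section does not show the handler that emits it. A minimal sketch of such a handler follows, written in the same style as the two handlers above. The `ContactRequestSentEvent` record, its properties, and the `failureReason` dimension are hypothetical and only illustrate the shape the query expects.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using MediatR;
using Microsoft.ApplicationInsights;

// Hypothetical event shape (assumption): the real domain event is not shown in this section.
public sealed record ContactRequestSentEvent(Guid ListingId, bool Failed, string? FailureReason)
    : INotification;

public class TrackContactRequestHandler
    : INotificationHandler<ContactRequestSentEvent>
{
    private readonly TelemetryClient _telemetry;

    public TrackContactRequestHandler(TelemetryClient telemetry) => _telemetry = telemetry;

    public Task Handle(ContactRequestSentEvent evt, CancellationToken ct)
    {
        _telemetry.TrackEvent("ContactRequestSent", new Dictionary<string, string>
        {
            ["listingId"] = evt.ListingId.ToString(),
            // Written in lowercase on purpose: bool.ToString() would produce "True",
            // which a case-sensitive KQL comparison against "true" does not match.
            ["failed"] = evt.Failed ? "true" : "false",
            ["failureReason"] = evt.FailureReason ?? "(none)", // hypothetical dimension
        });
        return Task.CompletedTask;
    }
}
```

Emitting the flag as an explicit lowercase string keeps the KQL filter simple and avoids depending on how .NET formats booleans.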
Custom events (for example TrackEvent("ListingPublished")) give you business-level monitoring alongside technical monitoring. The technical dashboard shows requests per second, error rates, and response times. The business dashboard shows listings published per day, contact requests per hour, and premium upgrades per week. When a business metric drops unexpectedly (say, listings published falls by 50%), it may indicate a bug that the technical metrics don’t capture: the API returns 200 but the domain event is never fired. Business metrics are the ground truth.

Configure an availability test that calls /api/health from multiple global regions every 5 minutes. If the health check fails from any region, an alert fires before any user reports an issue. Also configure the test to alert if the endpoint takes more than 3 seconds to respond, because latency spikes often precede full outages. Testing from multiple regions catches regional Azure issues that affect only some users.

At high traffic volumes, sampling keeps telemetry volume and cost under control, but exceptions and business events should never be dropped. Make sure TrackException and TrackEvent telemetry is excluded from sampling; by default, adaptive sampling can thin out these types as well. Regular request telemetry can be sampled. Adaptive sampling is controlled through ApplicationInsightsServiceOptions: EnableAdaptiveSampling = true.
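The health endpoint and the sampling behaviour described above could be wired up in Program.cs roughly as follows. This is a sketch for the classic 2.x Application Insights SDK, not a definitive configuration: the limit of 5 telemetry items per second is an illustrative value, and the health check registers no custom checks.

```csharp
using Microsoft.ApplicationInsights.AspNetCore.Extensions;
using Microsoft.ApplicationInsights.Extensibility;

var builder = WebApplication.CreateBuilder(args);

// Turn off the built-in adaptive sampling so a customised one can be added below.
builder.Services.AddApplicationInsightsTelemetry(new ApplicationInsightsServiceOptions
{
    EnableAdaptiveSampling = false,
});

// Re-add adaptive sampling, excluding custom events and exceptions so that
// business events and failures are always retained. Requests, dependencies,
// and traces are still sampled.
builder.Services.Configure<TelemetryConfiguration>(config =>
{
    var chain = config.DefaultTelemetrySink.TelemetryProcessorChainBuilder;
    chain.UseAdaptiveSampling(
        maxTelemetryItemsPerSecond: 5,          // illustrative value (assumption)
        excludedTypes: "Event;Exception");
    chain.Build();
});

// Endpoint probed by the availability test.
builder.Services.AddHealthChecks();

var app = builder.Build();
app.MapHealthChecks("/api/health");
app.Run();
```

Metrics recorded through GetMetric(...).TrackValue(...) are pre-aggregated in the SDK, so they are not affected by sampling either way.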
Common Mistakes
Mistake 1 — Only technical monitoring, no business metrics (invisible business failures)
❌ Wrong — monitoring only HTTP error rates; publishing a listing returns 200 but the event isn’t dispatched; sellers never notified; no alert.
✅ Correct — TrackEvent for all business operations; alert when listing_published drops significantly from baseline.
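To make "drops significantly from baseline" concrete, the sketch below compares today's count with the average of previous days. It only illustrates the rule; in practice the comparison would normally live in an Azure Monitor alert query rather than application code. The `BaselineCheck` class, its parameter names, and the 50% default are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BaselineCheck
{
    // Returns true when todayCount is below (1 - dropFraction) of the
    // average of the previous daily counts.
    public static bool HasDropped(
        IReadOnlyList<int> previousDailyCounts,
        int todayCount,
        double dropFraction = 0.5)
    {
        // Without history there is no baseline to compare against.
        if (previousDailyCounts.Count == 0) return false;

        double baseline = previousDailyCounts.Average();
        if (baseline <= 0) return false;

        return todayCount < baseline * (1 - dropFraction);
    }
}
```

For example, with previous daily counts of 100, 120, and 80 the baseline is 100, so a day with 40 published listings counts as a drop while a day with 60 does not.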
Mistake 2 — Alerting on every error (alert fatigue)
❌ Wrong — alert on any single 500 error; 3 alerts per hour for transient timeouts; team ignores alerts after first week.
✅ Correct — alert on error rate threshold (>5% for 5+ minutes); transient errors don’t page; sustained issues do.
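The difference between alerting on a single error and alerting on a sustained error rate can be sketched as a small rule. This is an illustration of the idea, not how Azure Monitor evaluates alerts internally: `MinuteBucket` and `ErrorRateRule` are hypothetical names, and the sketch assumes one bucket per minute and requires every minute in the window to exceed the threshold.

```csharp
using System.Collections.Generic;
using System.Linq;

// One minute of request counts (hypothetical type).
public readonly record struct MinuteBucket(int TotalRequests, int FailedRequests);

public static class ErrorRateRule
{
    // Returns true only when each of the most recent `windowMinutes` buckets
    // has a failure rate above `threshold`. A single bad minute does not alert.
    public static bool ShouldAlert(
        IReadOnlyList<MinuteBucket> buckets,
        double threshold = 0.05,
        int windowMinutes = 5)
    {
        // Not enough data yet to call the problem sustained.
        if (buckets.Count < windowMinutes) return false;

        return buckets
            .Skip(buckets.Count - windowMinutes)
            .All(b => b.TotalRequests > 0
                      && (double)b.FailedRequests / b.TotalRequests > threshold);
    }
}
```

A lone spike of timeouts in one minute leaves the rule silent, while five consecutive minutes above 5% trips it, which is the behaviour the correct approach above asks for.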