Overview

The Alert Analytics page provides deep insights into your incident patterns. Identify when incidents occur, analyze response time distributions, and discover which alert sources generate the most noise.

  • Incident Heatmap: Visualize when incidents occur by day and hour
  • Response Distributions: Understand MTTA and MTTR patterns
  • Top Sources: Identify the noisiest alert sources
  • Recurring Alerts: Find candidates for tuning or suppression

Key Metrics

The page displays four primary metrics:
Metric           Description
Total Incidents  Total incidents in the period with change percentage
MTTA             Mean Time to Acknowledge with trend indicator
MTTR             Mean Time to Resolve with trend indicator
Escalation Rate  Percentage of incidents requiring escalation

Incident Heatmap

The heatmap visualizes incident frequency by day of week and hour:

Reading the Heatmap

Color        Incident Level
Gray         No incidents
Light Green  Low (< 25% of max)
Amber        Moderate (25-50% of max)
Orange       High (50-75% of max)
Red          Very High (> 75% of max)
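As an illustration, the color thresholds above can be sketched as a small helper. This is a hypothetical reconstruction of the documented bucketing, not the product's actual implementation; in particular, how the exact 25/50/75% boundaries are handled is an assumption here.

```python
def heatmap_color(count: int, max_count: int) -> str:
    """Map a cell's incident count to the color levels documented above."""
    if count == 0:
        return "gray"          # No incidents
    frac = count / max_count
    if frac < 0.25:
        return "light-green"   # Low
    if frac <= 0.50:
        return "amber"         # Moderate (boundary handling assumed)
    if frac <= 0.75:
        return "orange"        # High
    return "red"               # Very High
```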

What to Look For

Identify when most incidents occur. Common patterns:
  • Business hours spikes (user-driven issues)
  • Off-hours spikes (batch jobs, maintenance windows)
  • Early morning (overnight job failures)
  • Monday spikes may indicate weekend accumulation
  • Friday afternoon spikes may suggest deployment issues
  • Weekend patterns show after-hours coverage needs
Use heatmap data to:
  • Optimize on-call schedules for high-incident periods
  • Schedule maintenance during quiet hours
  • Plan deployments to avoid peak incident times
Hover over any cell to see the exact incident count for that day and hour.

MTTA Distribution

The MTTA (Mean Time to Acknowledge) distribution chart shows how long it takes to acknowledge incidents:
Bucket     Performance
0-2 min    Excellent
2-5 min    Good
5-15 min   Acceptable
15-30 min  Needs improvement
30+ min    Critical - review alerting
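The same buckets amount to a simple classification by upper edge. A minimal sketch using the table's thresholds (the edge values are from the table; inclusive/exclusive boundary handling is an assumption):

```python
import bisect

# Bucket upper edges in minutes and labels, taken from the table above.
MTTA_EDGES = [2, 5, 15, 30]
MTTA_LABELS = ["Excellent", "Good", "Acceptable", "Needs improvement", "Critical"]

def mtta_rating(minutes: float) -> str:
    """Classify an acknowledgment time into the documented performance buckets."""
    return MTTA_LABELS[bisect.bisect_right(MTTA_EDGES, minutes)]
```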

Key Statistics

  • Average — Mean acknowledgment time
  • P95 — 95th percentile (95% of incidents acknowledged faster than this)
A high P95 with low average suggests occasional slow responses. Check for patterns in those slow acknowledgments.
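To see how a low average and a high P95 can coexist, consider this sketch. The percentile definition follows the page's wording ("95% of incidents acknowledged faster than this"); the exact percentile method the product uses is an assumption.

```python
def ack_stats(ack_minutes: list[float]) -> tuple[float, float]:
    """Return (average, P95) for a list of acknowledgment times in minutes."""
    ordered = sorted(ack_minutes)
    avg = sum(ordered) / len(ordered)
    # Pick the value with ~95% of observations below it.
    idx = min(int(0.95 * len(ordered)), len(ordered) - 1)
    return avg, ordered[idx]

# Nineteen 2-minute acknowledgments plus one 60-minute outlier:
avg, p95 = ack_stats([2.0] * 19 + [60.0])
# Low average (4.9 min) but high P95 (60 min): occasional slow responses.
```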

MTTR Distribution

The MTTR (Mean Time to Resolve) distribution shows resolution time patterns:
Bucket      Typical Severity
0-30 min    Quick fixes, auto-resolved
30-60 min   Standard incidents
1-4 hours   Complex issues
4-24 hours  Major incidents
24+ hours   Extended outages

Analyzing MTTR

Two peaks (e.g., 15 min and 4 hours) often indicate:
  • Quick-fix incidents vs. complex investigations
  • Different severity levels requiring different effort
  • Opportunity to improve runbooks for the longer category
A long tail of high-MTTR incidents may indicate:
  • Missing documentation for complex issues
  • Need for escalation path improvements
  • Knowledge gaps in the on-call team
If most incidents resolve in a similar amount of time, this suggests:
  • Well-defined playbooks
  • Consistent incident complexity
  • Effective knowledge sharing

Top Alert Sources

This section ranks alert sources by incident volume:
Column   Description
Rank     Position by incident count
Source   Alert source/integration name
Count    Number of incidents
Percent  Share of total incidents
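The ranking is straightforward to reproduce from raw incident data. A sketch (source names here are hypothetical examples, not a fixed list):

```python
from collections import Counter

def rank_sources(incident_sources: list[str], limit: int = 5) -> list[dict]:
    """Rank alert sources by incident count, with each source's share of total."""
    counts = Counter(incident_sources)
    total = len(incident_sources)
    return [
        {"rank": i + 1, "source": src, "count": n,
         "percent": round(100 * n / total, 1)}
        for i, (src, n) in enumerate(counts.most_common(limit))
    ]

# Example: one source generating 60% of incidents clearly needs tuning.
rows = rank_sources(["prometheus"] * 6 + ["cloudwatch"] * 3 + ["pingdom"])
```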

Using This Data

  1. Identify Noisy Sources: Sources generating > 20% of alerts may need tuning
  2. Review Alert Quality: High-volume, low-severity sources are candidates for alert threshold adjustment, suppression rules, or consolidation
  3. Balance Coverage: Ensure critical services have adequate alerting relative to their importance

Top Recurring Alerts

This table shows the most frequently triggered alerts:
Column       Description
Alert Title  The alert name and service
Count        Number of occurrences
Trend        Increasing, stable, or decreasing
Volume       Percentage of total alerts
MTTA/MTTR    Average response times for this alert
Severities   Breakdown by severity level

Group by Service Toggle

Enable “Group by Service” to aggregate alerts by service, useful for:
  • Identifying services generating the most alerts
  • Service-level noise analysis
  • Capacity planning

Taking Action

From the recurring alerts view, you can:
  1. View Incidents — Filter incident list to this alert
  2. Create Suppress Rule — Set up an alert rule to reduce noise
Before suppressing alerts, ensure they’re truly noise and not indicators of underlying issues.

Best Practices

Schedule weekly reviews of top recurring alerts to identify tuning opportunities.
Establish targets based on severity:
  • Critical: MTTA < 5 min, MTTR < 1 hour
  • High: MTTA < 10 min, MTTR < 4 hours
  • Medium: MTTA < 30 min, MTTR < 24 hours
Align on-call schedules with peak incident hours identified in the heatmap.
Maintain documentation for each major alert source including expected volume and response procedures.

Troubleshooting

If no data appears:
  • Verify incidents exist in the selected date range
  • Check that incidents have proper timestamps
  • Ensure incidents are assigned to your tenant
If MTTA values look wrong or are missing:
  • Incidents may be auto-acknowledged
  • Check that acknowledgment is being recorded properly
If alert sources are missing:
  • Review integration configurations
  • Verify incidents have source information
  • Check that the integration is properly configured to send source data