
Overview

The Service Health page provides visibility into the reliability of your services. Track health scores, uptime percentages, and incident patterns for each service to identify areas needing attention.

Key capabilities:
  • Health Scores: Composite reliability scores for each service
  • Uptime Tracking: Monitor service availability percentages
  • Incident Impact: See which services have the most incidents
  • MTTR by Service: Compare resolution times across services

Summary Statistics

Four metrics provide an overview of service health:
  • Avg Health Score: Average health score across all services
  • Healthy Services: Services with health score ≥ 80
  • Degraded Services: Services with health score 50-79
  • Critical Services: Services with health score < 50
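These four tiles follow directly from the per-service scores and the thresholds above. A minimal sketch in Python (the function name and result keys are illustrative, not the product's API):

```python
def summarize(scores):
    """Compute the four summary statistics from a list of health scores."""
    return {
        "avg_health_score": round(sum(scores) / len(scores), 1),
        "healthy": sum(1 for s in scores if s >= 80),
        "degraded": sum(1 for s in scores if 50 <= s < 80),
        "critical": sum(1 for s in scores if s < 50),
    }

summarize([95, 72, 45, 88])
# → {'avg_health_score': 75.0, 'healthy': 2, 'degraded': 1, 'critical': 1}
```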

Health Score Calculation

The health score (0-100) combines multiple factors:
  • Uptime (40%): Percentage of time without incidents
  • Incident Count (30%): Lower is better
  • MTTR (20%): Faster resolution improves score
  • Severity Mix (10%): Critical incidents have more impact
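The published weights combine the four factors into one number. How each factor is normalized to a 0-100 scale is not documented here, so this sketch assumes the normalization has already been done:

```python
# Weights from the table above; per-factor normalization is an assumption.
WEIGHTS = {"uptime": 0.40, "incident_count": 0.30, "mttr": 0.20, "severity_mix": 0.10}

def health_score(factors):
    """Weighted sum of factor scores, each already normalized to 0-100."""
    return round(sum(WEIGHTS[name] * score for name, score in factors.items()), 1)

health_score({"uptime": 99.0, "incident_count": 80.0, "mttr": 70.0, "severity_mix": 90.0})
# → 86.6
```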

Health Status Levels

  • Healthy (80-100, 🟢 Green): Service is performing well
  • Degraded (50-79, 🟡 Amber): Service needs attention
  • Critical (0-49, 🔴 Red): Service requires immediate action
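The status thresholds map directly to a small classifier; a sketch:

```python
def health_status(score):
    """Map a 0-100 health score to its status badge."""
    if score >= 80:
        return "Healthy"   # 🟢 performing well
    if score >= 50:
        return "Degraded"  # 🟡 needs attention
    return "Critical"      # 🔴 immediate action required

[health_status(s) for s in (92, 63, 41)]
# → ['Healthy', 'Degraded', 'Critical']
```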

Service Cards

Each service displays a detailed card with:

Card Information

  • Header: Service name and health status badge
  • Health Score: Visual progress bar with numeric score
  • Metrics: Uptime %, incident count, average MTTR

Understanding the Metrics

The health score is the overall reliability indicator:
  • 90-100 — Excellent reliability
  • 80-89 — Good, minor issues
  • 70-79 — Degraded, needs attention
  • 50-69 — Significant problems
  • < 50 — Critical, requires immediate action

Using Service Health Data

Prioritizing Improvements

1. Identify Critical Services: Start with any services showing “Critical” status.
2. Review Degraded Services: Plan improvements for “Degraded” services.
3. Investigate Root Causes: For unhealthy services, analyze:
  • Recurring incident patterns
  • Common failure modes
  • Resource constraints
4. Track Improvement: Monitor health scores over time to verify fixes.

Comparing Services

Compare services with similar functions:
  • Why does API Service A have 95% uptime while API Service B has 99%?
  • What practices from healthy services can be adopted?
Pay extra attention to services that:
  • Support revenue-generating features
  • Are dependencies for many other services
  • Have external SLA commitments
New services may naturally have lower scores:
  • Track improvement trajectory
  • Ensure adequate monitoring is in place
  • Document expected stabilization timeline

Improving Service Health

Quick Wins

  • Tune noisy alerts that don’t require action
  • Fix recurring issues identified in postmortems
  • Implement preventive monitoring
  • Create and maintain runbooks
  • Improve logging and observability
  • Cross-train team members
  • Add redundancy for single points of failure
  • Implement graceful degradation
  • Improve deployment practices

Long-term Improvements

For persistently unhealthy services:
  • Evaluate technical debt
  • Consider refactoring or rewriting
  • Review dependencies and failure domains
Health issues may indicate:
  • Insufficient resources
  • Scaling limitations
  • Need for performance optimization
To reduce change-related failures:
  • Implement better change management
  • Improve deployment practices
  • Enhance pre-production testing

Best Practices

Establish health score targets based on service criticality:
  • Customer-facing critical: 95+
  • Internal critical: 90+
  • Non-critical: 80+
Include service health in:
  • Weekly team standups
  • Monthly reliability reviews
  • Quarterly planning
If a healthy service suddenly becomes degraded:
  • Check for recent deployments
  • Review infrastructure changes
  • Look for external factors (dependencies, traffic)
Don’t over-invest in already-healthy services. Focus improvement effort on degraded and critical services.
Maintain documentation explaining:
  • Expected health baselines
  • Known limitations
  • Improvement roadmaps

Troubleshooting

A service is not appearing:
  • Verify incidents exist with this service name
  • Check service tagging in integrations
  • Ensure consistent service naming across alerts
A health score looks incorrect:
  • Review incident data for the service
  • Check that all incidents are properly attributed
  • Verify the calculation period matches expectations
Uptime or MTTR shows no data:
  • Check whether incident durations are being recorded
  • Verify resolution times are being set
  • Review how uptime is calculated for your setup
Too many fragmented service names:
  • Review alert naming conventions
  • Consolidate similar service names
  • Consider service grouping strategies
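When reviewing how uptime is calculated, a common definition is the percentage of the reporting window not covered by incident downtime. A simplified sketch, assuming each incident has start and end timestamps (the field layout is illustrative; overlapping incidents are not merged here):

```python
from datetime import datetime

def uptime_pct(incidents, window_start, window_end):
    """Uptime = 1 - (summed incident downtime / reporting window), as a %."""
    window = (window_end - window_start).total_seconds()
    downtime = sum(
        (min(end, window_end) - max(start, window_start)).total_seconds()
        for start, end in incidents
    )
    return round(100 * (1 - downtime / window), 2)

# One 3-hour incident in a 30-day window
uptime_pct(
    [(datetime(2024, 1, 10, 2), datetime(2024, 1, 10, 5))],
    datetime(2024, 1, 1),
    datetime(2024, 1, 31),
)
# → 99.58
```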

Service Naming Best Practices

Consistent service naming improves analytics accuracy:
  • Environment prefix (e.g. prod-api, staging-api): Separate production metrics
  • Team ownership (e.g. payments-gateway): Easy team attribution
  • Functional grouping (e.g. auth-service, auth-cache): Group related services
Establish service naming conventions and document them. Inconsistent naming creates fragmented analytics.
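A documented convention is easiest to enforce mechanically, for example in a CI check on alert configuration. A sketch validator for the hyphenated lowercase pattern used in the examples above (the exact rule is an assumption; adapt it to your own convention):

```python
import re

# Lowercase words separated by single hyphens, e.g. "prod-api", "payments-gateway"
SERVICE_NAME = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")

def valid_service_name(name):
    """True if the name follows the hyphenated lowercase convention."""
    return bool(SERVICE_NAME.fullmatch(name))

[valid_service_name(n) for n in ("prod-api", "auth-cache", "Prod_API")]
# → [True, True, False]
```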