Trace Graph

Trace Graph renders a live, interactive service-dependency map built from your trace data. Each node represents a service, and edges show the call relationships between them. Use it to understand your architecture, spot bottlenecks, and identify failing dependencies at a glance.

Getting Started

Navigate to Traces → Trace Graph in the sidebar. The graph populates automatically from trace spans observed in the selected time range.

Reading the Graph

Nodes

Each node represents a service (identified by the service.name span attribute). Nodes are color-coded by health:

Color            Meaning
Green            Healthy: low error rate
Yellow / Orange  Warning: elevated error rate
Red              Critical: high error rate
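
The color coding can be pictured as a threshold check on each service's error rate. The sketch below is illustrative only: the 1% and 5% thresholds and the `health_color` function are assumptions made for this example, not values or APIs documented for Trace Graph.

```python
# Hypothetical thresholds: the actual cut-offs used by Trace Graph are not stated here.
WARNING_THRESHOLD = 0.01   # assumed: 1% of requests failing
CRITICAL_THRESHOLD = 0.05  # assumed: 5% of requests failing


def health_color(error_count: int, request_count: int) -> str:
    """Classify a service as green, yellow, or red from its error rate."""
    if request_count == 0:
        return "green"  # no traffic observed, so nothing to flag
    error_rate = error_count / request_count
    if error_rate >= CRITICAL_THRESHOLD:
        return "red"
    if error_rate >= WARNING_THRESHOLD:
        return "yellow"
    return "green"


print(health_color(0, 1000))   # green
print(health_color(20, 1000))  # yellow
print(health_color(80, 1000))  # red
```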

Edges

Directed arrows between nodes indicate that one service calls another. The arrow direction follows the request flow (caller → callee).
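
One way to think about how such edges could be derived from trace data: whenever a span's parent belongs to a different service, the parent's service is the caller and the span's service is the callee. The following is a minimal sketch under that assumption. The span fields (`span_id`, `parent_id`, `service`) and the `build_edges` helper are hypothetical and simplified; they do not describe how Trace Graph is implemented.

```python
from collections import Counter

# Hypothetical, simplified spans: real trace data carries many more fields.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payments"},
    {"span_id": "d", "parent_id": "b", "service": "checkout"},  # same-service child
]


def build_edges(spans):
    """Count caller -> callee pairs where a span's parent is in another service."""
    service_by_span = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for span in spans:
        parent_service = service_by_span.get(span["parent_id"])
        # Skip root spans (no parent) and calls that stay inside one service.
        if parent_service is not None and parent_service != span["service"]:
            edges[(parent_service, span["service"])] += 1
    return edges


for (caller, callee), calls in sorted(build_edges(spans).items()):
    print(f"{caller} -> {callee}: {calls}")
```

Spans whose parent is in the same service are skipped, so only cross-service calls become edges.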

Filters

The left-hand sidebar provides several filter controls:

  • Environment — Filter by deployment environment (e.g. production, staging).
  • Cluster — Filter by Kubernetes cluster.
  • Health status — Show only Healthy, Warning, or Critical services.
  • Service search — Type to find a specific service by name.
  • Service checkboxes — Toggle individual services on or off to simplify the view.
  • Hide services with no connections — Remove isolated nodes from the graph.
  • Compare with previous period — Enable comparison stats to see how metrics have changed.
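
To make the effect of two of these filters concrete, the sketch below applies a health-status filter and the "hide services with no connections" option to a small in-memory graph. The data layout and the `visible_services` function are assumptions made for illustration; the sidebar itself requires no code.

```python
def visible_services(services, edges, hide_isolated=True, health=None):
    """Return the service names that remain after applying two filters."""
    # A service counts as connected if it appears on either end of any edge.
    connected = {name for edge in edges for name in edge}
    result = []
    for name, status in services.items():
        if hide_isolated and name not in connected:
            continue
        if health is not None and status != health:
            continue
        result.append(name)
    return sorted(result)


# Hypothetical services and call edges.
services = {
    "frontend": "Healthy",
    "checkout": "Warning",
    "payments": "Critical",
    "batch-job": "Healthy",  # never calls or is called by another service
}
edges = [("frontend", "checkout"), ("checkout", "payments")]

print(visible_services(services, edges))
print(visible_services(services, edges, health="Critical"))
```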

Time Range

Use the time picker in the sidebar to change the observation window (maximum 24 hours). A shorter window shows recent call patterns; a longer window provides a broader picture.

Interacting with the Graph

  • Pan and zoom — Use the mouse wheel to zoom and drag to pan. The toolbar buttons (Zoom In, Zoom Out, Fit View) are also available.
  • Click a node — Opens a detail panel showing the service's error rate, latency percentiles, and throughput.
  • Hover an edge — Displays the request rate and error rate for that specific call path.
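
The two edge metrics can be understood as simple aggregates over the calls observed on one call path during the selected window. This sketch assumes each call is a record with a boolean `error` field; the record shape and the `edge_stats` helper are hypothetical.

```python
def edge_stats(calls, window_seconds):
    """Summarize one edge: requests per second and the fraction that failed."""
    total = len(calls)
    errors = sum(1 for call in calls if call["error"])
    return {
        "request_rate": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
    }


# 300 hypothetical calls over a 15-minute (900-second) window, 30 of them failing.
calls = [{"error": False}] * 270 + [{"error": True}] * 30
stats = edge_stats(calls, window_seconds=900)
print(f"{stats['request_rate']:.2f} req/s, {stats['error_rate']:.1%} errors")
# prints "0.33 req/s, 10.0% errors"
```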

Use Cases

  • Architecture discovery — Visualize how microservices interact without reading code or config files.
  • Incident triage — During an outage, identify which upstream or downstream dependency is the root cause.
  • Performance analysis — Find hot paths where multiple services converge and latency compounds.
  • Dependency auditing — Verify that services only communicate with expected dependencies.
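
Dependency auditing, for instance, amounts to comparing the edges you observe against a list of allowed call paths. The sketch below assumes you have copied the observed caller and callee pairs out of the graph by hand. The allowlist format and the `unexpected_edges` function are hypothetical, and no export feature is implied.

```python
# Hypothetical allowlist: each caller maps to the callees it is expected to reach.
expected = {
    "frontend": {"checkout"},
    "checkout": {"payments"},
}


def unexpected_edges(observed, expected):
    """Return observed caller -> callee pairs that the allowlist does not permit."""
    return sorted(
        (caller, callee)
        for caller, callee in observed
        if callee not in expected.get(caller, set())
    )


observed = [
    ("frontend", "checkout"),
    ("checkout", "payments"),
    ("frontend", "payments"),  # bypasses checkout, not on the allowlist
]
print(unexpected_edges(observed, expected))
```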

Best Practices

  • Use a short time range (15 minutes) for real-time incident investigation.
  • Use a longer range (1–24 hours) for architecture review and dependency mapping.
  • Filter by environment (and by cluster, if you run several) to avoid mixing production and staging traffic.

Support

If you need assistance or have any questions, please reach out to us through: