Advanced Monitoring of Leapwork Performance with Datadog and Grafana Integration

This guide explains how to correlate Leapwork Performance test metrics with infrastructure telemetry from tools such as Datadog and Grafana. Use it to identify the root cause of slow runs, error spikes, and throughput limits faster than test results alone allow.

Why advanced monitoring matters

Leapwork Performance shows what happened during a performance test. Advanced monitoring tools such as Datadog and Grafana help explain why it happened.

Leapwork Performance publishes performance test metrics such as:

  • Latency

  • Throughput

  • Errors

  • Requests per second

  • Status-code breakdown

  • Step-wise performance

  • Run-level summaries

Monitoring tools add system and application telemetry such as:

  • CPU, memory, disk, and network usage

  • Container and pod health

  • Database performance

  • Cache behavior

  • Application logs

  • Traces and dependency timings

  • Infrastructure events and restarts

When you combine these two views, you can correlate user-facing performance with backend and infrastructure behavior.

Correlation overview

Leapwork Performance publishes performance test metrics, while Datadog or Grafana provides system and application telemetry. Correlating both views helps you identify the real cause of latency, errors, or throughput bottlenecks.

Correlation dashboard

Align Leapwork Performance run metrics with infrastructure metrics on the same timeline. This helps you quickly identify whether performance degradation matches backend resource saturation.

Step-level troubleshooting view

Step-level visibility helps you identify which specific request or transaction caused the overall run to slow down or fail.

Geo comparison view

Regional comparison helps you detect location-specific issues such as CDN delays, routing problems, or regional infrastructure instability.

What advanced monitoring solves

Detect the real reason behind slow runs

A run may show higher latency, but the test result alone does not explain whether the cause was:

  • CPU saturation

  • Database contention

  • Memory pressure

  • Autoscaling delay

  • Network instability

  • Third-party dependency issues

With correlated monitoring data, you can identify the actual bottleneck.

Explain error spikes

Leapwork Performance can show that errors increased during a run. Monitoring helps you determine whether those errors were caused by:

  • Authentication failures

  • Application exceptions

  • Backend service failures

  • API gateway or load balancer errors

  • Dependency outages

Understand throughput limits

If virtual users increase but throughput stops growing, monitoring helps explain whether the system hit:

  • Thread pool exhaustion

  • Connection pool exhaustion

  • Rate limiting

  • CPU bottlenecks

  • Storage or database limits

Find regional or environment-specific issues

Monitoring makes it possible to compare:

  • One geo against another

  • One environment against another

  • One deployment against another

This helps you detect issues that are invisible in aggregated run-level data.

Speed up root cause analysis

Instead of manually checking multiple tools after a failed test, use shared tags and aligned timelines to investigate the failure in one place.

How correlation works

1. Leapwork Performance publishes performance metrics

Examples:

  • lp.run_results.latency.p95_ms

  • lp.run_results.errors

  • lp.run_results.requests_sent

  • lp.run_results.timeseries.requests_per_second

2. Leapwork Performance publishes context tags

Examples:

  • run_id

  • run_name

  • timeline_name

  • project_name

  • step_id

  • step_title

  • geo_location

3. Monitoring systems contain server and app telemetry

Examples:

  • CPU and memory metrics

  • Pod and container metrics

  • Logs

  • Traces

  • Database response times

  • Cache and queue metrics

4. Align both datasets

Correlate the data using:

  • The same time window

  • The same environment

  • The same service

  • The same region

  • Shared run or step context where applicable

5. Investigate patterns across both views

For example:

  • Leapwork Performance P95 spikes at the same time as database latency

  • Leapwork Performance error count increases at the same time as HTTP 500 errors appear in the application logs

  • Throughput flattens at the same time CPU reaches saturation
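
The alignment and comparison described in steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not part of Leapwork Performance, Datadog, or Grafana, it assumes both series were already exported as (timestamp, value) pairs, and the sample values and the helper names align_by_bucket and pearson are hypothetical.

```python
from math import sqrt

def align_by_bucket(series_a, series_b, bucket_seconds=60):
    """Pair values from two (timestamp, value) series that fall in the same time bucket."""
    def bucketize(series):
        buckets = {}
        for ts, value in series:
            buckets.setdefault(ts // bucket_seconds, []).append(value)
        # Average all points that land in the same bucket.
        return {b: sum(v) / len(v) for b, v in buckets.items()}

    a, b = bucketize(series_a), bucketize(series_b)
    shared = sorted(a.keys() & b.keys())
    return [(a[k], b[k]) for k in shared]

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two aligned points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data: run P95 latency (ms) and database latency (ms),
# with timestamps in seconds since the start of the run.
p95_latency = [(0, 210), (60, 220), (120, 480), (180, 950), (240, 400)]
db_latency = [(5, 12), (65, 13), (125, 40), (185, 95), (245, 30)]

aligned = align_by_bucket(p95_latency, db_latency)
r = pearson(aligned)
print(f"aligned points: {len(aligned)}, correlation: {r:.2f}")
```

A coefficient close to 1 only shows that the two series moved together in the same time window. Treat it as a pointer to where to look next in logs and traces, not as proof of the cause.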

Example use cases

Slow checkout flow during a run

Leapwork Performance shows:

  • P95 and P99 increased for checkout-related steps

Monitoring shows:

  • Database CPU increased sharply

  • Payment dependency latency also increased

Outcome: The team identifies that the slowdown came from backend pressure (database CPU saturation and a slow payment dependency), not from the test script.

Error spike in one API step

Leapwork Performance shows:

  • Step POST /api/order has a sudden increase in errors

Monitoring shows:

  • Application logs contain HTTP 500 exceptions for the same time window

  • A downstream inventory service was timing out

Outcome: The issue is traced to a backend dependency instead of the frontend workflow.

Throughput stopped growing under load

Leapwork Performance shows:

  • Virtual users increased

  • Requests per second flattened

  • Latency increased

Monitoring shows:

  • Application CPU was stable

  • Database connection pool was exhausted

Outcome: The throughput limit is traced to the exhausted database connection pool rather than application CPU.

One geography is consistently slower

Leapwork Performance shows:

  • One geo_location has higher latency and more errors

Monitoring shows:

  • A specific regional cluster has pod restarts

  • Network latency is elevated only in that region

Outcome: Teams isolate the issue to a regional infrastructure problem.

Regression after a release

Leapwork Performance shows:

  • The latest run has worse P95 and P99 than previous runs

Monitoring shows:

  • Cache hit ratio dropped after deployment

  • Backend response time increased

Outcome: The team identifies a release-related regression and rolls back or fixes the affected service.

Benefits

Faster root cause analysis

You can move from "the run is slow" to "the database was saturated during the run" much faster.

Better collaboration across teams

QA, performance engineers, developers, and infrastructure teams can work from the same evidence instead of separate tools and assumptions.

Higher release confidence

You can validate not only that a run passed or failed, but also whether the system remained healthy under load.

Clearer SLA and SLO tracking

Run metrics can be tied to operational telemetry to understand whether user-facing performance targets were truly met.

Reduced troubleshooting effort

Instead of manually checking many dashboards and logs after every issue, shared metrics and tags make investigation more direct.

Better capacity planning

By correlating throughput, latency, and resource usage, you can identify where scaling is needed before production issues occur.

Summary

Leapwork Performance tells you what users experienced during a performance run. Advanced monitoring systems tell you what the system was doing at the same time.

Correlation between these two views helps you:

  • Identify bottlenecks faster

  • Explain failures more accurately

  • Compare runs with system health

  • Make better release and scaling decisions

Investigate common problems using monitoring data

This section describes common problems and how to investigate them using the metrics Leapwork Performance publishes to Datadog or Grafana.

1. "My run finished, but the application felt slow. Which steps were the problem?"

Metrics to use:

  • lp.run_results.latency.p95_ms

  • lp.run_results.latency.p99_ms

  • lp.run_results.latency.avg_ms

Tags to use:

  • run_id

  • step_id

  • step_title

How to investigate:

  1. Filter the dashboard by the target run_id.

  2. Group the data by step_title.

  3. Sort by P95 or P99.

What this helps you identify:

  • Which specific step had the worst latency

  • Whether the slowness came from one request or many steps

Recommended visualization:

  • Datadog: query table or top list

  • Grafana: table panel grouped by step_id and step_title
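
The filter, group, and sort steps above can also be reproduced offline on exported data. The following Python sketch assumes the step-level P95 values were exported as a list of records; the record layout, the sample values, and the function name slowest_steps are hypothetical and only mirror the tags named in this guide.

```python
from collections import defaultdict

# Hypothetical exported samples; field names mirror the tags in this guide.
samples = [
    {"run_id": "run-a", "step_title": "GET /home", "p95_ms": 180},
    {"run_id": "run-a", "step_title": "POST /api/order", "p95_ms": 920},
    {"run_id": "run-a", "step_title": "POST /api/order", "p95_ms": 1100},
    {"run_id": "run-a", "step_title": "GET /cart", "p95_ms": 310},
    {"run_id": "run-b", "step_title": "GET /home", "p95_ms": 5000},
]

def slowest_steps(samples, run_id):
    """Return (step_title, worst P95) pairs for one run, slowest first."""
    worst = defaultdict(float)
    for s in samples:
        if s["run_id"] != run_id:
            continue  # Step 1: keep only the target run.
        # Step 2: group by step_title, keeping the highest P95 seen.
        worst[s["step_title"]] = max(worst[s["step_title"]], s["p95_ms"])
    # Step 3: sort descending by P95.
    return sorted(worst.items(), key=lambda item: item[1], reverse=True)

ranking = slowest_steps(samples, "run-a")
for title, p95 in ranking:
    print(f"{title}: {p95} ms")
```

The sketch keeps the worst P95 per step so that a single bad interval is not averaged away. Swap max for an average if you prefer to match an avg-based dashboard query.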

2. "One run is much slower than another. What changed?"

Metrics to use:

  • lp.run_results.run.latency.avg_ms

  • lp.run_results.run.latency.p95_ms

  • lp.run_results.run.latency.p99_ms

  • lp.run_results.run.peak_throughput

  • lp.run_results.run.error_rate_pct

Tags to use:

  • run_name

  • timeline_name

  • project_name

  • run_id

How to investigate:

  1. Compare multiple runs by run_name or run_id.

  2. Plot latency trends across runs.

  3. Compare peak throughput and error rate.

What this helps you identify:

  • Whether latency increased with load

  • Whether a newer run introduced a regression

Recommended visualization:

  • Datadog: bar chart or timeseries grouped by run_name

  • Grafana: Graphite timeseries queries filtered by run_name
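
A run-to-run comparison can be reduced to percentage changes per metric. The Python sketch below is a minimal illustration that assumes the run-level values were copied into dictionaries; the key names, the 10 percent tolerance, and the sample numbers are hypothetical.

```python
def percent_change(baseline, candidate):
    """Relative change from baseline to candidate, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0

def compare_runs(baseline, candidate, tolerance_pct=10.0):
    """Return metrics where the candidate run is worse by more than tolerance_pct."""
    # For latency and error rate, higher is worse; for throughput, lower is worse.
    higher_is_worse = {
        "p95_ms": True,
        "p99_ms": True,
        "error_rate_pct": True,
        "peak_throughput": False,
    }
    regressions = {}
    for metric, worse_when_higher in higher_is_worse.items():
        change = percent_change(baseline[metric], candidate[metric])
        got_worse = change > tolerance_pct if worse_when_higher else change < -tolerance_pct
        if got_worse:
            regressions[metric] = round(change, 1)
    return regressions

# Hypothetical run-level summaries for two runs of the same timeline.
previous_run = {"p95_ms": 400, "p99_ms": 700, "error_rate_pct": 0.5, "peak_throughput": 250}
latest_run = {"p95_ms": 520, "p99_ms": 720, "error_rate_pct": 0.5, "peak_throughput": 200}

regressions = compare_runs(previous_run, latest_run)
print(regressions)
```

Note that a baseline of zero (for example, an error rate of 0) has no defined percentage change, so the sketch raises an error in that case rather than guessing.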

3. "Which steps are failing the most?"

Metrics to use:

  • lp.run_results.errors

  • lp.run_results.status_code.count

Tags to use:

  • step_id

  • step_title

  • status_code

  • run_id

How to investigate:

  1. Filter by run_id.

  2. Group by step_title.

  3. Break down by status_code.

What this helps you identify:

  • Which step is most unstable

  • Whether the issue is due to client errors, auth errors, or server failures

Recommended visualization:

  • Datadog: grouped query table

  • Grafana: step-wise error table plus status-code table

4. "Are we getting authentication failures or backend failures?"

Metrics to use:

  • lp.run_results.status_code.count

Tags to use:

  • status_code

  • step_title

  • timeline_name

How to investigate:

  1. Filter for specific status codes:

    • 401 / 403 for authentication or authorization problems

    • 500 / 502 / 503 for backend or infrastructure failures

  2. Group by step_title.

What this helps you identify:

  • Whether the issue is authentication-related

  • Whether the failures are coming from the backend

Recommended visualization:

  • Datadog: status-code breakdown table

  • Grafana: Graphite queries filtered by status_code
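
The status-code split above can be expressed as a small classifier. This Python sketch is illustrative only: the row layout and sample counts are hypothetical, and treating every 5xx code as a backend failure is an assumption that generalizes the 500 / 502 / 503 examples listed above.

```python
from collections import Counter

def classify_status(code):
    """Map an HTTP status code to a failure category."""
    if code in (401, 403):
        return "auth"
    if 500 <= code <= 599:
        return "backend"  # Assumption: all 5xx codes count as backend failures.
    if 400 <= code <= 499:
        return "client"
    return "ok"

def failures_by_step(rows):
    """Sum failing request counts per (step_title, category)."""
    totals = Counter()
    for step_title, status_code, count in rows:
        category = classify_status(status_code)
        if category != "ok":
            totals[(step_title, category)] += count
    return totals

# Hypothetical (step_title, status_code, count) rows from a status-code table.
rows = [
    ("POST /login", 401, 12),
    ("POST /login", 200, 488),
    ("POST /api/order", 503, 40),
    ("POST /api/order", 500, 5),
    ("GET /cart", 404, 3),
]

totals = failures_by_step(rows)
for (step_title, category), count in totals.most_common():
    print(f"{step_title}: {count} {category} failures")
```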

5. "Did the system degrade as load increased during the run?"

Metrics to use:

  • lp.run_results.timeseries.requests_per_second

  • lp.run_results.timeseries.latency.avg_ms

  • lp.run_results.run.peak_load

Tags to use:

  • track_item_id

  • timeline_name

  • run_id

How to investigate:

  1. Plot requests per second over time.

  2. Plot average latency over time.

  3. Compare both as load rises.

What this helps you identify:

  • Whether the application degrades gradually or sharply under load

  • Whether latency spikes correlate with throughput increases

Recommended visualization:

  • Datadog: separate timeseries widgets

  • Grafana: Graphite time series panels
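
One way to make "degrades under load" concrete is to look for the first load step where throughput stops growing while latency keeps climbing. The Python sketch below assumes the data was summarized per load step; the thresholds, the sample values, and the function name find_saturation_point are hypothetical.

```python
def find_saturation_point(points, min_rps_gain_pct=5.0, min_latency_gain_pct=20.0):
    """Return the virtual-user count at which throughput flattens while latency rises.

    points: list of (virtual_users, requests_per_second, avg_latency_ms),
    ordered by increasing load. Returns None if no such step is found.
    """
    for prev, curr in zip(points, points[1:]):
        _, prev_rps, prev_latency = prev
        users, rps, latency = curr
        rps_gain = (rps - prev_rps) / prev_rps * 100.0
        latency_gain = (latency - prev_latency) / prev_latency * 100.0
        if rps_gain < min_rps_gain_pct and latency_gain > min_latency_gain_pct:
            return users
    return None

# Hypothetical load steps: (virtual users, requests per second, average latency in ms).
load_steps = [
    (50, 100, 120),
    (100, 198, 125),
    (150, 290, 140),
    (200, 300, 210),
    (250, 302, 390),
]

saturation_users = find_saturation_point(load_steps)
print(f"throughput flattened at about {saturation_users} virtual users")
```

In this sample, throughput grows by less than 5 percent between 150 and 200 virtual users while latency rises by 50 percent, so 200 is reported as the saturation point.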

6. "Did one geography perform worse than another?"

Metrics to use:

  • lp.run_results.latency.avg_ms

  • lp.run_results.latency.p95_ms

  • lp.run_results.errors

Tags to use:

  • geo_location

  • track_item_id

  • step_title

How to investigate:

  1. Group by geo_location.

  2. Compare latency and errors between regions.

What this helps you identify:

  • Whether one geography had higher latency

  • Whether a region had more errors than the others

Recommended visualization:

  • Datadog: split graph or grouped table by geo_location

  • Grafana: grouped queries by Graphite tags
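
A simple way to decide whether a region is "worse" is to compare it against the median of all regions. The Python sketch below assumes one P95 value per geo_location has already been collected; the region names, the values, and the 1.5 factor are hypothetical.

```python
from statistics import median

def slow_regions(p95_by_geo, factor=1.5):
    """Return regions whose P95 latency exceeds the cross-region median by `factor`."""
    baseline = median(p95_by_geo.values())
    return sorted(geo for geo, p95 in p95_by_geo.items() if p95 > baseline * factor)

# Hypothetical P95 latency (ms) per geo_location for one run.
p95_by_geo = {"eu-west": 310, "us-east": 295, "ap-south": 780, "us-west": 320}

outliers = slow_regions(p95_by_geo)
print("regions to investigate:", outliers)
```

The median is used instead of the mean so that one very slow region does not raise the baseline it is compared against.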

7. "Why did the run stop meeting SLA or SLO expectations?"

Metrics to use:

  • lp.run_results.run.latency.p95_ms

  • lp.run_results.run.error_rate_pct

  • lp.run_results.run.peak_throughput

Tags to use:

  • run_id

  • run_name

  • timeline_name

How to investigate:

  1. Define target thresholds such as:

    • P95 under a specific value

    • Error rate below a target percentage

  2. Highlight or alert on runs that cross those thresholds.

What this helps you identify:

  • Which runs violated the target

  • Whether the issue came from latency, errors, or both

Recommended visualization:

  • Datadog: monitors and stat widgets

  • Grafana: stat panels with thresholds
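
The threshold check described above can be sketched as follows. This is a minimal Python illustration, not a Datadog monitor or Grafana alert definition; the target values, the run summaries, and the names Targets and violations are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Targets:
    max_p95_ms: float
    max_error_rate_pct: float

def violations(run, targets):
    """Return the reasons a run-level summary missed its targets (empty if none)."""
    reasons = []
    if run["p95_ms"] > targets.max_p95_ms:
        reasons.append("latency")
    if run["error_rate_pct"] > targets.max_error_rate_pct:
        reasons.append("errors")
    return reasons

# Hypothetical targets and run-level summaries.
targets = Targets(max_p95_ms=500, max_error_rate_pct=1.0)
runs = {
    "run-a": {"p95_ms": 420, "error_rate_pct": 0.2},
    "run-b": {"p95_ms": 640, "error_rate_pct": 0.4},
    "run-c": {"p95_ms": 710, "error_rate_pct": 3.5},
}

report = {run_id: violations(summary, targets) for run_id, summary in runs.items()}
for run_id, reasons in report.items():
    print(run_id, "OK" if not reasons else "violated: " + ", ".join(reasons))
```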

8. "Which asset or request transferred the most data?"

Metrics to use:

  • lp.run_results.bytes_received

  • lp.run_results.bytes_sent

Tags to use:

  • step_id

  • step_title

  • run_id

How to investigate:

  1. Filter by the target run_id.

  2. Group by step_title.

  3. Sort descending by bytes received or sent.

What this helps you identify:

  • Which requests have the heaviest payloads

  • Whether payload size may be contributing to latency

Recommended visualization:

  • Datadog: query table

  • Grafana: table by step

9. "Was the run actually executing the expected amount of work?"

Metrics to use:

  • lp.run_results.track_item.total_sequences_run

  • lp.run_results.requests_sent

  • lp.run_results.track_item.vum_used

  • lp.run_results.track_item.current_virtual_users

Tags to use:

  • run_id

  • timeline_name

  • track_item_id

How to investigate:

  1. Compare actual completed sequences with the expected workload.

  2. Check whether lp.run_results.track_item.current_virtual_users and lp.run_results.track_item.vum_used followed the planned load profile.

What this helps you identify:

  • Whether the run under-executed

  • Whether the generated load profile matched expectations

Recommended visualization:

  • Datadog: stat widgets

  • Grafana: summary panels
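
The comparison between completed and expected work can be reduced to a ratio with a tolerance. The Python sketch below is illustrative; the 5 percent tolerance and the sample counts are hypothetical, and the expected sequence count is assumed to come from your own test plan.

```python
def execution_ratio(actual_sequences, expected_sequences):
    """Fraction of the planned sequences that the run actually completed."""
    if expected_sequences <= 0:
        raise ValueError("expected_sequences must be positive")
    return actual_sequences / expected_sequences

def under_executed(actual_sequences, expected_sequences, tolerance=0.05):
    """True if the run completed noticeably fewer sequences than planned."""
    return execution_ratio(actual_sequences, expected_sequences) < 1.0 - tolerance

# Hypothetical values: total sequences run versus the planned workload.
print(under_executed(900, 1000))  # 90% of the plan, outside the 5% tolerance
print(under_executed(980, 1000))  # 98% of the plan, within the 5% tolerance
```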

10. "Which exact run should I investigate in monitoring?"

Tags to use:

  • run_id

  • run_name

  • timeline_name

  • project_name

  • company_name

How to investigate:

  1. Copy the run ID from Leapwork Performance.

  2. Filter Datadog or Grafana using the run_id.

  3. Use run_name for more human-readable dashboards.

What this helps you identify:

  • The exact run without mixing data from other executions

  • A clean entry point for debugging one specific result set

Datadog

Datadog is especially useful for:

  • Query tables

  • Top lists

  • Tag-driven filtering

  • Alerting and monitors

Grafana

Grafana is especially useful for:

  • Graphite timeseries exploration

  • Flexible dashboard layouts

  • Operational monitoring views

  • Visual comparison dashboards

Executive dashboard

Recommended widgets:

  • Run P95 latency

  • Run P99 latency

  • Error rate

  • Peak throughput

  • Peak load

Troubleshooting dashboard

Recommended widgets:

  • Step-wise P95 latency

  • Step-wise errors

  • Status-code breakdown

  • Bytes received and sent

  • Requests per second over time

  • Average latency over time

Geo comparison dashboard

Recommended widgets:

  • Latency by geo_location

  • Errors by geo_location

  • Throughput by geo_location

Build a Datadog dashboard

Use this plan to create a ready-to-use Datadog dashboard.

Dashboard name: Leapwork Performance Dashboard

Filters: Add dashboard filters for the following tags:

  • timeline_name

  • run_name

  • run_status

  • project_name

  • company_name

  • geo_location

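Widgets:

The {*} scope in the queries below matches every tag value. For a widget to respect the dashboard filters listed above, reference the matching template variables in its scope, for example avg:lp.run_results.run.latency.p95_ms{$run_name,$timeline_name}.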

1. Run P95 latency

avg:lp.run_results.run.latency.p95_ms{*}

2. Run P99 latency

avg:lp.run_results.run.latency.p99_ms{*}

3. Run average latency

avg:lp.run_results.run.latency.avg_ms{*}

4. Run peak throughput

avg:lp.run_results.run.peak_throughput{*}

5. Run error rate

avg:lp.run_results.run.error_rate_pct{*}

6. Step P95 latency by step title

avg:lp.run_results.latency.p95_ms{*} by {step_title}

7. Requests per second over time

avg:lp.run_results.timeseries.requests_per_second{*} by {track_item_id}

8. Average latency over time

avg:lp.run_results.timeseries.latency.avg_ms{*} by {track_item_id}

9. Status code counts

sum:lp.run_results.status_code.count{*} by {status_code}

10. Bytes received by step

avg:lp.run_results.bytes_received{*} by {step_title}
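
If you generate dashboards from scripts, the widget queries above can be assembled from their parts. The Python sketch below only builds query strings in the same shape as the queries in this section; it does not call the Datadog API, and the function name datadog_query is hypothetical.

```python
def datadog_query(metric, aggregator="avg", filters=None, group_by=None):
    """Build a metric query string such as 'avg:metric{tag:value} by {tag}'."""
    if filters:
        scope = ",".join(f"{tag}:{value}" for tag, value in filters.items())
    else:
        scope = "*"  # No filters: match all tag values.
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query

# Widget 6 from the list above.
print(datadog_query("lp.run_results.latency.p95_ms", group_by=["step_title"]))

# The same metric restricted to a single run (run_id value reused from the Grafana examples in this guide).
print(datadog_query(
    "lp.run_results.latency.p95_ms",
    filters={"run_id": "u5t4tc5h"},
    group_by=["step_title"],
))
```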

Build a Grafana dashboard

Use this plan to create a ready-to-use Grafana dashboard.

Dashboard name: Leapwork Performance Dashboard

Data source: Use the Graphite data source.

Panels:

1. Run P95 latency

seriesByTag('name=lp.run_results.run.latency.p95_ms')

2. Run P99 latency

seriesByTag('name=lp.run_results.run.latency.p99_ms')

3. Run average latency

seriesByTag('name=lp.run_results.run.latency.avg_ms')

4. Peak throughput

seriesByTag('name=lp.run_results.run.peak_throughput')

5. Error rate

seriesByTag('name=lp.run_results.run.error_rate_pct')

6. Requests per second over time

seriesByTag('name=lp.run_results.timeseries.requests_per_second')

7. Average latency over time

seriesByTag('name=lp.run_results.timeseries.latency.avg_ms')

8. Step P95 latency

seriesByTag('name=lp.run_results.latency.p95_ms')

9. Status code breakdown

seriesByTag('name=lp.run_results.status_code.count')

10. Per-step bytes received

seriesByTag('name=lp.run_results.bytes_received')

Filtered query examples:

Filter to one run only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','run_id=u5t4tc5h')

Filter to one timeline only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','timeline_name=new_timeline')

Filter to one run name only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','run_name=new_timeline_24042026083945')
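
The filtered queries above all follow the same pattern: a name expression followed by one expression per tag. The Python sketch below builds such strings for use in scripted dashboards; it does not query Graphite, the function name series_by_tag is hypothetical, and it assumes tag values contain no single quotes.

```python
def series_by_tag(metric, **tags):
    """Build a Graphite seriesByTag() expression for a metric and optional tag filters."""
    expressions = [f"name={metric}"] + [f"{tag}={value}" for tag, value in tags.items()]
    quoted = ",".join(f"'{expression}'" for expression in expressions)
    return f"seriesByTag({quoted})"

# Reproduces the "one run only" example above.
print(series_by_tag("lp.run_results.run.latency.p95_ms", run_id="u5t4tc5h"))

# Several tag filters can be combined in one expression.
print(series_by_tag(
    "lp.run_results.run.latency.p95_ms",
    run_id="u5t4tc5h",
    timeline_name="new_timeline",
))
```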