Advanced Monitoring of Leapwork Performance with Datadog and Grafana Integration

This guide explains how to correlate Leapwork Performance test metrics with infrastructure telemetry from tools such as Datadog and Grafana. Use it to identify the root cause of slow runs, error spikes, and throughput limits faster than test results alone allow.

Why advanced monitoring matters

Leapwork Performance shows what happened during a performance test. Advanced monitoring tools such as Datadog and Grafana help explain why it happened.

Leapwork Performance publishes performance test metrics such as:

Latency
Throughput
Errors
Requests per second
Status-code breakdown
Step-wise performance
Run-level summaries

Monitoring tools add system and application telemetry such as:

CPU, memory, disk, and network usage
Container and pod health
Database performance
Cache behavior
Application logs
Traces and dependency timings
Infrastructure events and restarts

When you combine these two views, you can correlate user-facing performance with backend and infrastructure behavior.

Correlation overview

Leapwork Performance supplies the test metrics, and Datadog or Grafana supplies the system and application telemetry. The following views show how to line up the two to find the real cause of latency, errors, or throughput bottlenecks.

Correlation dashboard

Align Leapwork Performance run metrics with infrastructure metrics on the same timeline. This shows at a glance whether a performance degradation coincides with backend resource saturation.

Step-level troubleshooting view

Step-level visibility helps you identify which specific request or transaction caused the overall run to slow down or fail.

Geo comparison view

Regional comparison helps you detect location-specific issues such as CDN delays, routing problems, or regional infrastructure instability.

What advanced monitoring solves

Detect the real reason behind slow runs

A run may show higher latency, but the test result alone does not explain whether the cause was:

CPU saturation
Database contention
Memory pressure
Autoscaling delay
Network instability
Third-party dependency issues

With correlated monitoring data, you can identify the actual bottleneck.

Explain error spikes

Leapwork Performance can show that errors increased during a run. Monitoring helps you determine whether those errors were caused by:

Authentication failures
Application exceptions
Backend service failures
API gateway or load balancer errors
Dependency outages

Understand throughput limits

If virtual users increase but throughput stops growing, monitoring helps explain whether the system hit:

Thread pool exhaustion
Connection pool exhaustion
Rate limiting
CPU bottlenecks
Storage or database limits

Find regional or environment-specific issues

Monitoring makes it possible to compare:

One geo against another
One environment against another
One deployment against another

This helps you detect issues that are invisible in aggregated run-level data.

Speed up root cause analysis

Instead of manually checking multiple tools after a failed test, use shared tags and aligned timelines to investigate the failure in one place.

How correlation works

1. Leapwork Performance publishes performance metrics

Leapwork Performance publishes metrics such as:

lp.run_results.latency.p95_ms
lp.run_results.errors
lp.run_results.requests_sent
lp.run_results.timeseries.requests_per_second

2. Leapwork Performance publishes context tags

Examples:

run_id
run_name
timeline_name
project_name
step_id
step_title
geo_location

3. Monitoring systems contain server and app telemetry

Examples:

CPU and memory metrics
Pod and container metrics
Logs
Traces
Database response times
Cache and queue metrics

4. Align both datasets

Correlate the data using:

The same time window
The same environment
The same service
The same region
Shared run or step context where applicable

5. Investigate patterns across both views

For example:

Leapwork Performance P95 spikes at the same time as database latency
Leapwork Performance error count increases at the same time as 500-error logs
Throughput flattens at the same time CPU reaches saturation
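
The alignment in steps 4 and 5 can be sketched offline in Python. This is a minimal illustration, assuming both series have already been exported as (unix_seconds, value) pairs; the function names, the 10-second bucket width, and the sample values are hypothetical and are not part of Leapwork Performance, Datadog, or Grafana.

```python
from math import sqrt

def bucket(samples, width_s=10):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    sums = {}
    for ts, value in samples:
        key = int(ts // width_s) * width_s
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)

def correlate(test_samples, infra_samples, width_s=10):
    """Correlate two series over the time buckets they have in common."""
    a, b = bucket(test_samples, width_s), bucket(infra_samples, width_s)
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        return None  # not enough overlap to draw a conclusion
    return pearson([a[k] for k in shared], [b[k] for k in shared])

# Hypothetical exports: run P95 latency and database latency, both in ms.
p95_ms = [(0, 200), (10, 210), (20, 480), (30, 900)]
db_latency_ms = [(2, 5), (12, 6), (22, 30), (33, 70)]
score = correlate(p95_ms, db_latency_ms)
```

A score close to 1 suggests that the two series rise and fall together in the shared time window. It does not prove causation, so treat it as a pointer to where to look next.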

Example use cases

Slow checkout flow during a run

Leapwork Performance shows:

P95 and P99 increased for checkout-related steps

Monitoring shows:

Database CPU increased sharply
Payment dependency latency also increased

Outcome: The team identifies that the slowdown was caused by pressure on the database and the payment dependency, not by the test script.

Error spike in one API step

Leapwork Performance shows:

Step POST /api/order has a sudden increase in errors

Monitoring shows:

Application logs contain 500 exceptions for the same time window
A downstream inventory service was timing out

Outcome: The issue is traced to the downstream inventory service rather than the order step itself.

Throughput stopped growing under load

Leapwork Performance shows:

Virtual users increased
Requests per second flattened
Latency increased

Monitoring shows:

Application CPU was stable
Database connection pool was exhausted

Outcome: The throughput limit is traced to an exhausted database connection pool rather than application CPU.

One geography is consistently slower

Leapwork Performance shows:

One geo_location has higher latency and more errors

Monitoring shows:

A specific regional cluster has pod restarts
Network latency is elevated only in that region

Outcome: Teams isolate the issue to a regional infrastructure problem.

Regression after a release

Leapwork Performance shows:

The latest run has worse P95 and P99 than previous runs

Monitoring shows:

Cache hit ratio dropped after deployment
Backend response time increased

Outcome: The team identifies a release-related regression and rolls back or fixes the affected service.

Benefits

Faster root cause analysis

You can move from "the run is slow" to "the database was saturated during the run" much faster.

Better collaboration across teams

QA, performance engineers, developers, and infrastructure teams can work from the same evidence instead of separate tools and assumptions.

Higher release confidence

You can validate not only that a run passed or failed, but also whether the system remained healthy under load.

Clearer SLA and SLO tracking

Run metrics can be tied to operational telemetry to understand whether user-facing performance targets were truly met.

Reduced troubleshooting effort

Instead of manually checking many dashboards and logs after every issue, shared metrics and tags make investigation more direct.

Better capacity planning

By correlating throughput, latency, and resource usage, you can identify where scaling is needed before production issues occur.

Summary

Leapwork Performance tells you what users experienced during a performance run. Advanced monitoring systems tell you what the system was doing at the same time.

Correlation between these two views helps you:

Identify bottlenecks faster
Explain failures more accurately
Compare runs with system health
Make better release and scaling decisions

Investigate common problems using monitoring data

This section describes common problems and how to investigate them using the metrics Leapwork Performance publishes to Datadog or Grafana.

1. "My run finished, but the application felt slow. Which steps were the problem?"

Metrics to use:

lp.run_results.latency.p95_ms
lp.run_results.latency.p99_ms
lp.run_results.latency.avg_ms

Tags to use:

run_id
step_id
step_title

How to investigate:

Filter the dashboard by the target run_id.
Group the data by step_title.
Sort by P95 or P99.

What this helps you identify:

Which specific step had the worst latency
Whether the slowness came from one request or many steps

Recommended visualization:

Datadog: query table or top list
Grafana: table panel grouped by step_id and step_title
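
The filter, group, and sort steps for this problem can also be reproduced on exported data. The sketch below assumes each sample is a dictionary with run_id, step_title, and p95_ms keys; that shape and the function name slowest_steps are hypothetical, chosen only to mirror the tags and metric listed above.

```python
from collections import defaultdict

def slowest_steps(samples, run_id, top=3):
    """Return (step_title, worst P95 in ms) for one run, slowest first."""
    worst = defaultdict(float)
    for s in samples:
        if s["run_id"] != run_id:
            continue  # keep only the target run
        worst[s["step_title"]] = max(worst[s["step_title"]], s["p95_ms"])
    return sorted(worst.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical samples exported from the monitoring tool.
samples = [
    {"run_id": "r1", "step_title": "GET /home", "p95_ms": 120.0},
    {"run_id": "r1", "step_title": "POST /api/order", "p95_ms": 940.0},
    {"run_id": "r1", "step_title": "POST /api/order", "p95_ms": 610.0},
    {"run_id": "r2", "step_title": "GET /home", "p95_ms": 2000.0},
]
ranking = slowest_steps(samples, "r1")
```

Taking the worst P95 per step is one possible choice; averaging per step would smooth out short spikes and may hide the request that actually caused the slowdown.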

2. "One run is much slower than another. What changed?"

Metrics to use:

lp.run_results.run.latency.avg_ms
lp.run_results.run.latency.p95_ms
lp.run_results.run.latency.p99_ms
lp.run_results.run.peak_throughput
lp.run_results.run.error_rate_pct

Tags to use:

run_name
timeline_name
project_name
run_id

How to investigate:

Compare multiple runs by run_name or run_id.
Plot latency trends across runs.
Compare peak throughput and error rate.

What this helps you identify:

Whether latency increased with load
Whether a newer run introduced a regression

Recommended visualization:

Datadog: bar chart or timeseries grouped by run_name
Grafana: Graphite timeseries queries filtered by run_name
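
To put a number on "what changed" between two runs, the run-level values can be compared as percent changes. This is a small sketch under the assumption that the run-level metrics for each run have been copied into a dictionary; the dictionary keys, the sample values, and the percent_change helper are hypothetical.

```python
def percent_change(baseline, candidate):
    """Percent change per metric from the baseline run to the candidate run."""
    changes = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is None or old == 0:
            continue  # a relative change cannot be computed
        changes[name] = round((new - old) * 100 / old, 1)
    return changes

# Hypothetical run-level values for an older and a newer run.
baseline = {"latency.p95_ms": 400.0, "latency.p99_ms": 800.0, "peak_throughput": 250.0}
candidate = {"latency.p95_ms": 500.0, "latency.p99_ms": 1000.0, "peak_throughput": 245.0}
delta = percent_change(baseline, candidate)
```

A large latency increase with roughly unchanged peak throughput, as in this sample data, points toward a regression rather than a difference in applied load.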

3. "Which steps are failing the most?"

Metrics to use:

lp.run_results.errors
lp.run_results.status_code.count

Tags to use:

step_id
step_title
status_code
run_id

How to investigate:

Filter by run_id.
Group by step_title.
Break down by status_code.

What this helps you identify:

Which step is most unstable
Whether the issue is due to client errors, auth errors, or server failures

Recommended visualization:

Datadog: grouped query table
Grafana: step-wise error table plus status-code table

4. "Are we getting authentication failures or backend failures?"

Metrics to use:

lp.run_results.status_code.count

Tags to use:

status_code
step_title
timeline_name

How to investigate:

Filter for specific status codes:
- 401 / 403 for authentication or authorization problems
- 500 / 502 / 503 for backend or infrastructure failures
Group by step_title.

What this helps you identify:

Whether the issue is authentication-related
Whether the failures are coming from the backend

Recommended visualization:

Datadog: status-code breakdown table
Grafana: Graphite queries filtered by status_code
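
The status-code split described above can be expressed as a small classifier. The sketch assumes the status-code counts have been exported as (step_title, status_code, count) rows; the row shape and the function names are hypothetical, while the code groups match the ones listed in this section.

```python
from collections import Counter

AUTH_CODES = {401, 403}          # authentication or authorization problems
BACKEND_CODES = {500, 502, 503}  # backend or infrastructure failures

def classify(status_code):
    """Map an HTTP status code to the failure group used in this guide."""
    if status_code in AUTH_CODES:
        return "auth"
    if status_code in BACKEND_CODES:
        return "backend"
    return "other"

def failures_by_step(rows):
    """Sum counts per step and failure group from (step, code, count) rows."""
    totals = {}
    for step, code, count in rows:
        totals.setdefault(step, Counter())[classify(code)] += count
    return totals

# Hypothetical export of lp.run_results.status_code.count grouped by step.
rows = [
    ("POST /login", 401, 12),
    ("POST /login", 200, 300),
    ("POST /api/order", 503, 7),
    ("POST /api/order", 500, 3),
]
summary = failures_by_step(rows)
```

In this sample data, the login step fails on authentication while the order step fails on the backend, which are two different investigations.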

5. "Did the system degrade as load increased during the run?"

Metrics to use:

lp.run_results.timeseries.requests_per_second
lp.run_results.timeseries.latency.avg_ms
lp.run_results.run.peak_load

Tags to use:

track_item_id
timeline_name
run_id

How to investigate:

Plot requests per second over time.
Plot average latency over time.
Compare both as load rises.

What this helps you identify:

Whether the application degrades gradually or sharply under load
Whether latency spikes correlate with throughput increases

Recommended visualization:

Datadog: separate timeseries widgets
Grafana: Graphite time series panels
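
One way to locate the point of degradation is to look for the first sample where throughput stops growing while latency keeps climbing. The sketch below assumes a time-ordered list of (requests_per_second, avg_latency_ms) pairs; the function name and both thresholds (5% throughput gain, 20% latency gain) are hypothetical starting values, not Leapwork Performance defaults.

```python
def first_degradation(points, rps_gain=0.05, latency_gain=0.20):
    """Return the index of the first sample where throughput stalls while latency climbs.

    points: time-ordered list of (requests_per_second, avg_latency_ms).
    Returns None when no such sample exists.
    """
    for i in range(1, len(points)):
        prev_rps, prev_lat = points[i - 1]
        rps, lat = points[i]
        if prev_rps <= 0 or prev_lat <= 0:
            continue  # relative change is undefined for this pair
        stalled = (rps - prev_rps) / prev_rps < rps_gain
        slower = (lat - prev_lat) / prev_lat > latency_gain
        if stalled and slower:
            return i
    return None

# Hypothetical ramp: throughput grows, then flattens as latency jumps.
points = [(100, 80), (200, 85), (300, 95), (305, 180), (300, 400)]
knee = first_degradation(points)
```

The returned index marks the timestamp to inspect in the infrastructure dashboards, for example for CPU saturation or connection pool exhaustion.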

6. "Did one geography perform worse than another?"

Metrics to use:

lp.run_results.latency.avg_ms
lp.run_results.latency.p95_ms
lp.run_results.errors

Tags to use:

geo_location
track_item_id
step_title

How to investigate:

Group by geo_location.
Compare latency and errors between regions.

What this helps you identify:

Whether one geography had higher latency
Whether a region had more errors than the others

Recommended visualization:

Datadog: split graph or grouped table by geo_location
Grafana: Graphite queries grouped by the geo_location tag
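
The regional comparison can be reduced to a simple outlier check. The sketch assumes P95 latency has already been grouped by geo_location into a dictionary; the region names, the 1.5x factor, and the slow_regions helper are hypothetical.

```python
from statistics import median

def slow_regions(p95_by_geo, factor=1.5):
    """Return geo_location values whose P95 exceeds factor times the median region."""
    baseline = median(p95_by_geo.values())
    return sorted(geo for geo, value in p95_by_geo.items() if value > factor * baseline)

# Hypothetical P95 latency in ms per geo_location.
p95_by_geo = {"eu-west": 310.0, "us-east": 295.0, "ap-south": 720.0}
outliers = slow_regions(p95_by_geo)
```

Using the median as the baseline keeps a single slow region from raising the reference value that it is compared against.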

7. "Why did the run stop meeting SLA or SLO expectations?"

Metrics to use:

lp.run_results.run.latency.p95_ms
lp.run_results.run.error_rate_pct
lp.run_results.run.peak_throughput

Tags to use:

run_id
run_name
timeline_name

How to investigate:

Define target thresholds such as:
- P95 under a specific value
- Error rate below a target percentage
Highlight or alert on runs that cross those thresholds.

What this helps you identify:

Which runs violated the target
Whether the issue came from latency, errors, or both

Recommended visualization:

Datadog: monitors and stat widgets
Grafana: stat panels with thresholds
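
The threshold check described above can be written as a short function that also reports whether latency, errors, or both caused the violation. The dictionary keys, the target values, and the slo_violations helper are hypothetical; real targets come from your own SLA or SLO definitions.

```python
def slo_violations(run, max_p95_ms, max_error_rate_pct):
    """Return which targets a run missed: 'latency', 'errors', both, or none."""
    reasons = []
    if run["p95_ms"] > max_p95_ms:
        reasons.append("latency")
    if run["error_rate_pct"] > max_error_rate_pct:
        reasons.append("errors")
    return reasons

# Hypothetical run-level values and targets.
run = {"p95_ms": 650.0, "error_rate_pct": 0.4}
missed = slo_violations(run, max_p95_ms=500.0, max_error_rate_pct=1.0)
```

An empty list means the run met both targets.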

8. "Which asset or request transferred the most data?"

Metrics to use:

lp.run_results.bytes_received
lp.run_results.bytes_sent

Tags to use:

step_id
step_title
run_id

How to investigate:

Filter by the target run_id.
Group by step_title.
Sort descending by bytes received or sent.

What this helps you identify:

Which requests have the heaviest payloads
Whether payload size may be contributing to latency

Recommended visualization:

Datadog: query table
Grafana: table by step

9. "Was the run actually executing the expected amount of work?"

Metrics to use:

lp.run_results.track_item.total_sequences_run
lp.run_results.requests_sent
lp.run_results.track_item.vum_used
lp.run_results.track_item.current_virtual_users

Tags to use:

run_id
timeline_name
track_item_id

How to investigate:

Compare actual completed sequences with the expected workload.
Check whether the virtual user count and other load-related values behaved as planned.

What this helps you identify:

Whether the run under-executed
Whether the generated load profile matched expectations

Recommended visualization:

Datadog: stat widgets
Grafana: summary panels
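
The comparison between expected and completed work can be expressed as a shortfall percentage. This sketch assumes you know the planned number of sequences from your test design; the function name and the sample numbers are hypothetical.

```python
def workload_shortfall_pct(expected_sequences, actual_sequences):
    """Percentage of planned sequences that did not run (0 when the plan was met)."""
    if expected_sequences <= 0:
        raise ValueError("expected_sequences must be positive")
    shortfall = max(expected_sequences - actual_sequences, 0)
    return shortfall * 100 / expected_sequences

# Hypothetical plan of 1000 sequences with 850 completed.
gap = workload_shortfall_pct(1000, 850)
```

A large shortfall suggests that latency and error figures from the run describe a lighter load than intended and should be read with that in mind.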

10. "Which exact run should I investigate in monitoring?"

Tags to use:

run_id
run_name
timeline_name
project_name
company_name

How to investigate:

Copy the run ID from Leapwork Performance.
Filter Datadog or Grafana using the run_id.
Use run_name for more human-readable dashboards.

What this helps you identify:

The exact run without mixing data from other executions
A clean entry point for debugging one specific result set

Recommended monitoring use cases by tool

Datadog

Datadog is especially useful for:

Query tables
Top lists
Tag-driven filtering
Alerting and monitors

Grafana

Grafana is especially useful for:

Graphite timeseries exploration
Flexible dashboard layouts
Operational monitoring views
Visual comparison dashboards

Recommended dashboard types

Executive dashboard

Recommended widgets:

Run P95 latency
Run P99 latency
Error rate
Peak throughput
Peak load

Troubleshooting dashboard

Recommended widgets:

Step-wise P95 latency
Step-wise errors
Status-code breakdown
Bytes received and sent
Requests per second over time
Average latency over time

Geo comparison dashboard

Recommended widgets:

Latency by geo_location
Errors by geo_location
Throughput by geo_location

Build a Datadog dashboard

Use this plan to create a ready-to-use Datadog dashboard.

Dashboard name: Leapwork Performance Dashboard

Filters:

Add dashboard filters for:

timeline_name
run_name
run_status
project_name
company_name
geo_location

Widgets:

1. Run P95 latency

avg:lp.run_results.run.latency.p95_ms{*}

2. Run P99 latency

avg:lp.run_results.run.latency.p99_ms{*}

3. Run average latency

avg:lp.run_results.run.latency.avg_ms{*}

4. Run peak throughput

avg:lp.run_results.run.peak_throughput{*}

5. Run error rate

avg:lp.run_results.run.error_rate_pct{*}

6. Step P95 latency by step title

avg:lp.run_results.latency.p95_ms{*} by {step_title}

7. Requests per second over time

avg:lp.run_results.timeseries.requests_per_second{*} by {track_item_id}

8. Average latency over time

avg:lp.run_results.timeseries.latency.avg_ms{*} by {track_item_id}

9. Status code counts

sum:lp.run_results.status_code.count{*} by {status_code}

10. Bytes received by step

avg:lp.run_results.bytes_received{*} by {step_title}
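
The widget queries above all use the {*} scope. When you want to generate the same queries for one specific run or timeline, a small helper can assemble the query text. This is a sketch only: the function name datadog_query and its parameters are hypothetical, it does not call Datadog, and it assumes tag values that need no escaping. The output follows the aggregator:metric{tag:value} by {tag} form used in the widgets above.

```python
def datadog_query(metric, aggregator="avg", scope=None, group_by=None):
    """Build a metric query string such as 'avg:metric{tag:value} by {tag}'."""
    if scope:
        scope_text = ",".join(f"{tag}:{value}" for tag, value in scope.items())
    else:
        scope_text = "*"  # no filter: match every series of the metric
    query = f"{aggregator}:{metric}{{{scope_text}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query

# Step P95 latency for a single run, split by step title.
step_query = datadog_query(
    "lp.run_results.latency.p95_ms",
    scope={"run_id": "u5t4tc5h"},
    group_by=["step_title"],
)
```

Paste the resulting string into the widget's query editor, or adapt the helper to whatever dashboard-as-code workflow you already use.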

Build a Grafana dashboard

Use this plan to create a ready-to-use Grafana dashboard.

Dashboard name: Leapwork Performance Dashboard

Data source: Use the Graphite data source.

Panels:

1. Run P95 latency

seriesByTag('name=lp.run_results.run.latency.p95_ms')

2. Run P99 latency

seriesByTag('name=lp.run_results.run.latency.p99_ms')

3. Run average latency

seriesByTag('name=lp.run_results.run.latency.avg_ms')

4. Peak throughput

seriesByTag('name=lp.run_results.run.peak_throughput')

5. Error rate

seriesByTag('name=lp.run_results.run.error_rate_pct')

6. Requests per second over time

seriesByTag('name=lp.run_results.timeseries.requests_per_second')

7. Average latency over time

seriesByTag('name=lp.run_results.timeseries.latency.avg_ms')

8. Step P95 latency

seriesByTag('name=lp.run_results.latency.p95_ms')

9. Status code breakdown

seriesByTag('name=lp.run_results.status_code.count')

10. Per-step bytes received

seriesByTag('name=lp.run_results.bytes_received')

Filtered query examples:

Filter to one run only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','run_id=u5t4tc5h')

Filter to one timeline only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','timeline_name=new_timeline')

Filter to one run name only:

seriesByTag('name=lp.run_results.run.latency.p95_ms','run_name=new_timeline_24042026083945')
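
The filtered queries above follow one pattern, so they can be generated rather than typed by hand. The helper names below are hypothetical, and the sketch assumes tag values that contain no quotes or commas. The grouping variant relies on Graphite's groupByTags function, which assumes a Graphite backend with tag support; check that your data source accepts it before depending on it.

```python
def series_by_tag(metric, **tags):
    """Build a seriesByTag(...) expression for a metric name and tag filters."""
    expressions = [f"name={metric}"] + [f"{tag}={value}" for tag, value in tags.items()]
    return "seriesByTag(" + ",".join(f"'{e}'" for e in expressions) + ")"

def group_by_tags(series_expr, aggregate, *tags):
    """Wrap a series expression in groupByTags(series, 'aggregate', 'tag', ...)."""
    args = [series_expr, f"'{aggregate}'"] + [f"'{tag}'" for tag in tags]
    return "groupByTags(" + ",".join(args) + ")"

# One run only, matching the first filtered example above.
one_run = series_by_tag("lp.run_results.run.latency.p95_ms", run_id="u5t4tc5h")

# Step P95 latency averaged per geo_location.
per_geo = group_by_tags(
    series_by_tag("lp.run_results.latency.p95_ms"), "avg", "geo_location"
)
```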