This guide explains how to correlate Leapwork Performance test metrics with infrastructure telemetry from tools such as Datadog and Grafana. Use it to identify the root cause of slow runs, error spikes, and throughput limits faster than test results alone allow.
Why advanced monitoring matters
Leapwork Performance shows what happened during a performance test. Advanced monitoring tools such as Datadog and Grafana help explain why it happened.
Leapwork Performance publishes performance test metrics such as:
- Latency
- Throughput
- Errors
- Requests per second
- Status-code breakdown
- Step-wise performance
- Run-level summaries
Monitoring tools add system and application telemetry such as:
- CPU, memory, disk, and network usage
- Container and pod health
- Database performance
- Cache behavior
- Application logs
- Traces and dependency timings
- Infrastructure events and restarts
When you combine these two views, you can correlate user-facing performance with backend and infrastructure behavior.
Correlation overview
Leapwork Performance publishes performance test metrics, while Datadog or Grafana provides system and application telemetry. Correlating both views helps you identify the real cause of latency, errors, or throughput bottlenecks.
Correlation dashboard
Align Leapwork Performance run metrics with infrastructure metrics on the same timeline. This helps you quickly identify whether performance degradation matches backend resource saturation.
Step-level troubleshooting view
Step-level visibility helps you identify which specific request or transaction caused the overall run to slow down or fail.
Geo comparison view
Regional comparison helps you detect location-specific issues such as CDN delays, routing problems, or regional infrastructure instability.
What advanced monitoring solves
Detect the real reason behind slow runs
A run may show higher latency, but the test result alone does not explain whether the cause was:
- CPU saturation
- Database contention
- Memory pressure
- Autoscaling delay
- Network instability
- Third-party dependency issues
With correlated monitoring data, you can identify the actual bottleneck.
Explain error spikes
Leapwork Performance can show that errors increased during a run. Monitoring helps you determine whether those errors were caused by:
- Authentication failures
- Application exceptions
- Backend service failures
- API gateway or load balancer errors
- Dependency outages
Understand throughput limits
If virtual users increase but throughput stops growing, monitoring helps explain whether the system hit:
- Thread pool exhaustion
- Connection pool exhaustion
- Rate limiting
- CPU bottlenecks
- Storage or database limits
Find regional or environment-specific issues
Monitoring makes it possible to compare:
- One geo against another
- One environment against another
- One deployment against another
This helps you detect issues that are invisible in aggregated run-level data.
Speed up root cause analysis
Instead of manually checking multiple tools after a failed test, use shared tags and aligned timelines to investigate the failure in one place.
How correlation works
1. Leapwork Performance publishes performance metrics
Leapwork Performance publishes metrics such as:
- lp.run_results.latency.p95_ms
- lp.run_results.errors
- lp.run_results.requests_sent
- lp.run_results.timeseries.requests_per_second
2. Leapwork Performance publishes context tags
Examples:
- run_id
- run_name
- timeline_name
- project_name
- step_id
- step_title
- geo_location
3. Monitoring systems contain server and app telemetry
Examples:
- CPU and memory metrics
- Pod and container metrics
- Logs
- Traces
- Database response times
- Cache and queue metrics
4. Align both datasets
Correlate the data using:
- The same time window
- The same environment
- The same service
- The same region
- Shared run or step context where applicable
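Aligning both datasets on the same time window usually means resampling them onto shared time buckets, because the two tools rarely report at identical timestamps. The following is a minimal Python sketch, assuming each series has been exported as (Unix timestamp in seconds, value) pairs; the function names and the 10-second bucket width are illustrative choices, not part of Leapwork Performance, Datadog, or Grafana.

```python
from collections import defaultdict


def bucket_series(points, bucket_s=10):
    """Average (timestamp, value) points into fixed-width time buckets keyed by bucket start."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        start = int(ts // bucket_s) * bucket_s  # start time of the bucket this point falls in
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sums}


def align(lp_points, infra_points, bucket_s=10):
    """Return (bucket_start, lp_value, infra_value) rows for buckets present in both series."""
    lp = bucket_series(lp_points, bucket_s)
    infra = bucket_series(infra_points, bucket_s)
    shared = sorted(lp.keys() & infra.keys())
    return [(t, lp[t], infra[t]) for t in shared]


if __name__ == "__main__":
    p95_ms = [(1000, 120.0), (1004, 130.0), (1012, 300.0), (1025, 310.0)]  # hypothetical samples
    db_cpu_pct = [(1001, 40.0), (1011, 92.0), (1019, 95.0)]                # hypothetical samples
    for row in align(p95_ms, db_cpu_pct):
        print(row)
```

Buckets that exist in only one series are dropped rather than interpolated, which keeps the comparison honest when one tool has gaps.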
5. Investigate patterns across both views
For example:
- Leapwork Performance P95 spikes at the same time as database latency
- Leapwork Performance error count increases at the same time as 500-error logs
- Throughput flattens at the same time CPU reaches saturation
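To check a suspected pattern numerically, you can compute a correlation coefficient over series that are already aligned to the same time buckets. This sketch uses the Pearson coefficient, for example on Leapwork Performance P95 per bucket versus database latency per bucket; the sample values are made up. A coefficient close to 1 tells you where to look first, but it does not prove that one metric caused the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two aligned, equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned series with at least two points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot co-move with anything
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    p95_ms = [120, 125, 300, 310, 140]   # hypothetical Leapwork P95 per bucket
    db_latency_ms = [8, 9, 40, 42, 11]   # hypothetical database latency per bucket
    print(f"correlation: {pearson(p95_ms, db_latency_ms):.2f}")
```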
Example use cases
Slow checkout flow during a run
Leapwork Performance shows:
- P95 and P99 increased for checkout-related steps
Monitoring shows:
- Database CPU increased sharply
- Payment dependency latency also increased
Outcome: The team identifies that the slowdown was not caused by the test script, but by backend dependency pressure.
Error spike in one API step
Leapwork Performance shows:
- Step POST /api/order had a sudden increase in errors
Monitoring shows:
- Application logs contain HTTP 500 errors in the same time window
- A downstream inventory service was timing out
Outcome: The issue is traced to a backend dependency instead of the frontend workflow.
Throughput stopped growing under load
Leapwork Performance shows:
-
- Virtual users increased
- Requests per second flattened
- Latency increased
Monitoring shows:
- Application CPU was stable
- Database connection pool was exhausted
Outcome: The throughput limit is identified as a database bottleneck rather than app CPU.
One geography is consistently slower
Leapwork Performance shows:
- One geo_location has higher latency and more errors
Monitoring shows:
- A specific regional cluster has pod restarts
- Network latency is elevated only in that region
Outcome: Teams isolate the issue to a regional infrastructure problem.
Regression after a release
Leapwork Performance shows:
- The latest run has worse P95 and P99 than previous runs
Monitoring shows:
- Cache hit ratio dropped after deployment
- Backend response time increased
Outcome: The team identifies a release-related regression and rolls back or fixes the affected service.
Benefits
Faster root cause analysis
You can move from "the run is slow" to "the database was saturated during the run" much faster.
Better collaboration across teams
QA, performance engineers, developers, and infrastructure teams can work from the same evidence instead of separate tools and assumptions.
Higher release confidence
You can validate not only that a run passed or failed, but also whether the system remained healthy under load.
Clearer SLA and SLO tracking
Run metrics can be tied to operational telemetry to understand whether user-facing performance targets were truly met.
Reduced troubleshooting effort
Instead of manually checking many dashboards and logs after every issue, shared metrics and tags make investigation more direct.
Better capacity planning
By correlating throughput, latency, and resource usage, you can identify where scaling is needed before production issues occur.
Summary
Leapwork Performance tells you what users experienced during a performance run. Advanced monitoring systems tell you what the system was doing at the same time.
Correlation between these two views helps you:
- Identify bottlenecks faster
- Explain failures more accurately
- Compare runs with system health
- Make better release and scaling decisions
Investigate common problems using monitoring data
This section describes common problems and how to investigate them using the metrics Leapwork Performance publishes to Datadog or Grafana.
1. "My run finished, but the application felt slow. Which steps were the problem?"
Metrics to use:
- lp.run_results.latency.p95_ms
- lp.run_results.latency.p99_ms
- lp.run_results.latency.avg_ms
Tags to use:
- run_id
- step_id
- step_title
How to investigate:
- Filter the dashboard by the target run_id.
- Group the data by step_title.
- Sort by P95 or P99.
What this helps you identify:
- Which specific step had the worst latency
- Whether the slowness came from one request or many steps
Recommended visualization:
- Datadog: query table or top list
- Grafana: table panel grouped by step_id and step_title
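The filter, group, and sort steps above can be reproduced offline on exported data. This is a minimal sketch, assuming per-step values have been exported as dictionaries with run_id, step_title, p95_ms, and p99_ms keys; that record shape, the function name, and the sample rows are hypothetical.

```python
def slowest_steps(rows, run_id, metric="p95_ms", top=5):
    """Return up to `top` steps of one run, slowest first by the given percentile key."""
    in_run = [r for r in rows if r["run_id"] == run_id]  # filter by the target run_id
    return sorted(in_run, key=lambda r: r[metric], reverse=True)[:top]


if __name__ == "__main__":
    rows = [  # hypothetical per-step values
        {"run_id": "u5t4tc5h", "step_title": "GET /home", "p95_ms": 180, "p99_ms": 240},
        {"run_id": "u5t4tc5h", "step_title": "POST /api/order", "p95_ms": 950, "p99_ms": 1400},
        {"run_id": "u5t4tc5h", "step_title": "GET /cart", "p95_ms": 310, "p99_ms": 520},
        {"run_id": "another_run", "step_title": "GET /home", "p95_ms": 5000, "p99_ms": 6000},
    ]
    for row in slowest_steps(rows, "u5t4tc5h", top=2):
        print(row["step_title"], row["p95_ms"])
```

If one step dominates the top of the list by a wide margin, the slowness likely came from one request; similar values across many steps point to a broader problem.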
2. "One run is much slower than another. What changed?"
Metrics to use:
- lp.run_results.run.latency.avg_ms
- lp.run_results.run.latency.p95_ms
- lp.run_results.run.latency.p99_ms
- lp.run_results.run.peak_throughput
- lp.run_results.run.error_rate_pct
Tags to use:
- run_name
- timeline_name
- project_name
- run_id
How to investigate:
- Compare multiple runs by run_name or run_id.
- Plot latency trends across runs.
- Compare peak throughput and error rate.
What this helps you identify:
- Whether latency increased with load
- Whether a newer run introduced a regression
Recommended visualization:
- Datadog: bar chart or timeseries grouped by run_name
- Grafana: Graphite timeseries queries filtered by run_name
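To decide whether a newer run introduced a regression, you can express each run-level metric as a percentage change against a baseline run. This is a sketch under stated assumptions: both runs are given as dictionaries keyed by metric-name suffixes, the 10% tolerance is an arbitrary example rather than a Leapwork setting, and only peak throughput is treated as "higher is better".

```python
def compare_runs(baseline, candidate, tolerance_pct=10.0):
    """Return {metric: (percent_change, regressed)} for metrics present in both runs.

    Latency and error-rate metrics regress when they grow; peak throughput
    regresses when it shrinks.
    """
    higher_is_better = {"peak_throughput"}
    report = {}
    for metric, base in baseline.items():
        if metric not in candidate or base == 0:
            continue  # no counterpart, or change is undefined against a zero baseline
        change = (candidate[metric] - base) / base * 100.0
        worse_by = -change if metric in higher_is_better else change
        report[metric] = (round(change, 1), worse_by > tolerance_pct)
    return report


if __name__ == "__main__":
    baseline = {"latency.p95_ms": 400, "error_rate_pct": 1.0, "peak_throughput": 250}   # hypothetical
    candidate = {"latency.p95_ms": 520, "error_rate_pct": 1.05, "peak_throughput": 240}  # hypothetical
    for metric, (change, regressed) in compare_runs(baseline, candidate).items():
        print(f"{metric}: {change:+.1f}%{'  REGRESSION' if regressed else ''}")
```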
3. "Which steps are failing the most?"
Metrics to use:
- lp.run_results.errors
- lp.run_results.status_code.count
Tags to use:
- step_id
- step_title
- status_code
- run_id
How to investigate:
- Filter by run_id.
- Group by step_title.
- Break down by status_code.
What this helps you identify:
- Which step is most unstable
- Whether the issue is due to client errors, auth errors, or server failures
Recommended visualization:
- Datadog: grouped query table
- Grafana: step-wise error table plus status-code table
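The same filter, group, and break-down steps can be sketched in Python over exported status_code.count values. This assumes each record carries run_id, step_title, status_code, and count keys (a hypothetical export shape), and it treats any status code of 400 or above as a failure, which may not match exactly how lp.run_results.errors counts errors.

```python
from collections import Counter


def errors_by_step(rows, run_id):
    """Count 4xx/5xx responses per (step_title, status_code) for one run, most frequent first."""
    counts = Counter()
    for r in rows:
        code = int(r["status_code"])  # tag values may arrive as strings
        if r["run_id"] == run_id and code >= 400:
            counts[(r["step_title"], code)] += r["count"]
    return counts.most_common()


def most_unstable_step(rows, run_id):
    """Step title with the highest total error count in the run, or None if there are no errors."""
    totals = Counter()
    for (step, _code), n in errors_by_step(rows, run_id):
        totals[step] += n
    return totals.most_common(1)[0][0] if totals else None
```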
4. "Are we getting authentication failures or backend failures?"
Metrics to use:
- lp.run_results.status_code.count
Tags to use:
- status_code
- step_title
- timeline_name
How to investigate:
- Filter for specific status codes:
  - 401/403 for authentication or authorization problems
  - 500/502/503 for backend or infrastructure failures
- Group by step_title.
What this helps you identify:
- Whether the issue is authentication-related
- Whether the failures are coming from the backend
Recommended visualization:
- Datadog: status-code breakdown table
- Grafana: Graphite queries filtered by status_code
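The status-code groups above can be turned into a quick failure profile. This sketch assumes you already have a mapping from status code to count (for example, summed lp.run_results.status_code.count values for one step); the category names and the fallback buckets for other 4xx and 5xx codes are illustrative choices.

```python
from collections import Counter

AUTH_CODES = {401, 403}          # authentication or authorization problems
BACKEND_CODES = {500, 502, 503}  # backend or infrastructure failures


def classify_status(code):
    """Bucket an HTTP status code into auth, backend, other_client, other_server, or ok."""
    code = int(code)
    if code in AUTH_CODES:
        return "auth"
    if code in BACKEND_CODES:
        return "backend"
    if 400 <= code < 500:
        return "other_client"
    if code >= 500:
        return "other_server"
    return "ok"


def failure_profile(counts_by_code):
    """Sum a {status_code: count} mapping into the categories returned by classify_status."""
    profile = Counter()
    for code, n in counts_by_code.items():
        profile[classify_status(code)] += n
    return dict(profile)


if __name__ == "__main__":
    print(failure_profile({"200": 900, "401": 20, "503": 40}))  # hypothetical counts
```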
5. "Did the system degrade as load increased during the run?"
Metrics to use:
- lp.run_results.timeseries.requests_per_second
- lp.run_results.timeseries.latency.avg_ms
- lp.run_results.run.peak_load
Tags to use:
- track_item_id
- timeline_name
- run_id
How to investigate:
- Plot requests per second over time.
- Plot average latency over time.
- Compare both as load rises.
What this helps you identify:
- Whether the application degrades gradually or sharply under load
- Whether latency spikes correlate with throughput increases
Recommended visualization:
- Datadog: separate timeseries widgets
- Grafana: Graphite time series panels
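When load rises but requests per second stop following it, you can flag the point where scaling breaks down. The sketch below assumes (virtual users, requests per second) pairs ordered by rising load, for example lp.run_results.track_item.current_virtual_users and lp.run_results.timeseries.requests_per_second sampled at the same times. It compares the throughput gained per added user in each interval with the per-user throughput of the first sample; the 20% threshold is an arbitrary starting point, not a Leapwork default.

```python
def find_plateau(samples, min_efficiency=0.2):
    """Return the virtual-user level where throughput stops scaling, or None.

    samples: (virtual_users, requests_per_second) pairs ordered by rising load.
    """
    if len(samples) < 2:
        return None
    base_users, base_rps = samples[0]
    if base_users <= 0:
        raise ValueError("first sample needs at least one virtual user")
    per_user = base_rps / base_users  # throughput per user at low load
    for (u0, r0), (u1, r1) in zip(samples, samples[1:]):
        if u1 <= u0:
            continue  # load did not rise in this interval
        gain = (r1 - r0) / (u1 - u0)  # extra requests/s per extra user
        if gain < min_efficiency * per_user:
            return u0  # throughput flattens from this load level onward
    return None


if __name__ == "__main__":
    ramp = [(10, 50), (20, 100), (40, 195), (60, 205), (80, 206)]  # hypothetical ramp-up
    print("throughput plateau starts at", find_plateau(ramp), "virtual users")
```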
6. "Did one geography perform worse than another?"
Metrics to use:
- lp.run_results.latency.avg_ms
- lp.run_results.latency.p95_ms
- lp.run_results.errors
Tags to use:
- geo_location
- track_item_id
- step_title
How to investigate:
- Group by geo_location.
- Compare latency and errors between regions.
What this helps you identify:
- Whether one geography had higher latency
- Whether a region had more errors than the others
Recommended visualization:
- Datadog: split graph or grouped table by geo_location
- Grafana: Graphite queries grouped by the geo_location tag
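A simple way to make the regional comparison explicit is to flag any geo whose P95 is far above the typical value of the other geos. This sketch assumes a mapping from geo_location value to P95 in milliseconds; the region names and the 1.5x ratio are hypothetical, and the same function works unchanged on per-geo error counts.

```python
from statistics import median


def slow_geos(p95_by_geo, ratio=1.5):
    """Return geos whose value exceeds `ratio` times the median value of the other geos."""
    flagged = []
    for geo, p95 in p95_by_geo.items():
        others = [v for g, v in p95_by_geo.items() if g != geo]
        if others and p95 > ratio * median(others):
            flagged.append(geo)
    return sorted(flagged)


if __name__ == "__main__":
    print(slow_geos({"us-east": 200, "eu-west": 220, "ap-south": 640}))  # hypothetical P95 values
```

Comparing each geo with the median of the others, rather than with the overall mean, keeps a single slow region from inflating its own baseline.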
7. "Why did the run stop meeting SLA or SLO expectations?"
Metrics to use:
- lp.run_results.run.latency.p95_ms
- lp.run_results.run.error_rate_pct
- lp.run_results.run.peak_throughput
Tags to use:
- run_id
- run_name
- timeline_name
How to investigate:
- Define target thresholds such as:
  - P95 under a specific value
  - Error rate below a target percentage
- Highlight or alert on runs that cross those thresholds.
What this helps you identify:
- Which runs violated the target
- Whether the issue came from latency, errors, or both
Recommended visualization:
- Datadog: monitors and stat widgets
- Grafana: stat panels with thresholds
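Before you configure monitors or stat-panel thresholds, you can check past runs against your targets to see which ones would have triggered. This is a minimal sketch, assuming each run is a dictionary with run_name, latency.p95_ms, and error_rate_pct keys; the 500 ms and 1% targets in the usage line are placeholders for your own SLA or SLO values.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    p95_ms: float          # run P95 must stay under this value
    error_rate_pct: float  # run error rate must stay under this percentage


def sla_violations(runs, targets):
    """Map run_name -> violated targets ('latency', 'errors'); compliant runs are omitted."""
    result = {}
    for run in runs:
        broken = []
        if run["latency.p95_ms"] >= targets.p95_ms:
            broken.append("latency")
        if run["error_rate_pct"] >= targets.error_rate_pct:
            broken.append("errors")
        if broken:
            result[run["run_name"]] = broken
    return result


if __name__ == "__main__":
    runs = [  # hypothetical run-level values
        {"run_name": "run_a", "latency.p95_ms": 450, "error_rate_pct": 0.5},
        {"run_name": "run_b", "latency.p95_ms": 900, "error_rate_pct": 3.0},
    ]
    print(sla_violations(runs, Targets(p95_ms=500, error_rate_pct=1.0)))
```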
8. "Which asset or request transferred the most data?"
Metrics to use:
- lp.run_results.bytes_received
- lp.run_results.bytes_sent
Tags to use:
- step_id
- step_title
- run_id
How to investigate:
- Filter by the target run_id.
- Group by step_title.
- Sort descending by bytes received or sent.
What this helps you identify:
- Which requests have the heaviest payloads
- Whether payload size may be contributing to latency
Recommended visualization:
- Datadog: query table
- Grafana: table by step
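The filter, group, and sort steps for payload size can be sketched the same way. This assumes exported records with run_id, step_title, bytes_received, and bytes_sent keys (a hypothetical shape); it ranks steps by total bytes in both directions and reports KiB so the largest transfers are easy to spot.

```python
def heaviest_steps(rows, run_id, top=3):
    """Rank one run's steps by total bytes transferred (received + sent), reported in KiB."""
    totals = {}
    for r in rows:
        if r["run_id"] != run_id:
            continue  # keep only the target run
        step = r["step_title"]
        totals[step] = totals.get(step, 0) + r["bytes_received"] + r["bytes_sent"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(step, round(total / 1024, 1)) for step, total in ranked[:top]]


if __name__ == "__main__":
    rows = [  # hypothetical per-step byte counts
        {"run_id": "r1", "step_title": "GET /catalog", "bytes_received": 2_097_152, "bytes_sent": 512},
        {"run_id": "r1", "step_title": "GET /home", "bytes_received": 10_240, "bytes_sent": 1_024},
    ]
    print(heaviest_steps(rows, "r1"))
```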
9. "Was the run actually executing the expected amount of work?"
Metrics to use:
- lp.run_results.track_item.total_sequences_run
- lp.run_results.requests_sent
- lp.run_results.track_item.vum_used
- lp.run_results.track_item.current_virtual_users
Tags to use:
- run_id
- timeline_name
- track_item_id
How to investigate:
- Compare actual completed sequences with the expected workload.
- Check whether the virtual-user and load values behaved as planned.
What this helps you identify:
- Whether the run under-executed
- Whether the generated load profile matched expectations
Recommended visualization:
- Datadog: stat widgets
- Grafana: summary panels
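Comparing completed sequences with the expected workload comes down to a completion ratio. This sketch assumes you can state the expected number of sequences yourself, for example as planned virtual users multiplied by sequences per user (a hypothetical planning formula, since your timeline configuration may define the workload differently), and compares it with lp.run_results.track_item.total_sequences_run. The 5% tolerance is an arbitrary example.

```python
def workload_check(expected_sequences, actual_sequences, tolerance_pct=5.0):
    """Return (completion percent, status) for completed versus planned sequences."""
    if expected_sequences <= 0:
        raise ValueError("expected_sequences must be positive")
    completion = actual_sequences / expected_sequences * 100.0
    if completion < 100.0 - tolerance_pct:
        status = "under-executed"
    elif completion > 100.0 + tolerance_pct:
        status = "over-executed"
    else:
        status = "as planned"
    return round(completion, 1), status


if __name__ == "__main__":
    planned = 50 * 20  # hypothetical: 50 virtual users x 20 sequences each
    print(workload_check(planned, 870))
```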
10. "Which exact run should I investigate in monitoring?"
Tags to use:
- run_id
- run_name
- timeline_name
- project_name
- company_name
How to investigate:
- Copy the run ID from Leapwork Performance.
- Filter Datadog or Grafana using the run_id.
- Use run_name for more human-readable dashboards.
What this helps you identify:
- The exact run, without mixing in data from other executions
- A clean entry point for debugging one specific result set
Recommended monitoring use cases by tool
Datadog
Datadog is especially useful for:
- Query tables
- Top lists
- Tag-driven filtering
- Alerting and monitors
Grafana
Grafana is especially useful for:
- Graphite timeseries exploration
- Flexible dashboard layouts
- Operational monitoring views
- Visual comparison dashboards
Recommended dashboard types
Executive dashboard
Recommended widgets:
- Run P95 latency
- Run P99 latency
- Error rate
- Peak throughput
- Peak load
Troubleshooting dashboard
Recommended widgets:
- Step-wise P95 latency
- Step-wise errors
- Status-code breakdown
- Bytes received and sent
- Requests per second over time
- Average latency over time
Geo comparison dashboard
Recommended widgets:
- Latency by geo_location
- Errors by geo_location
- Throughput by geo_location
Build a Datadog dashboard
Use this plan to create a ready-to-use Datadog dashboard.
Dashboard name: Leapwork Performance Dashboard
Filters:
Add dashboard filters for:
- timeline_name
- run_name
- run_status
- project_name
- company_name
- geo_location
Widgets:
1. Run P95 latency
avg:lp.run_results.run.latency.p95_ms{*}
2. Run P99 latency
avg:lp.run_results.run.latency.p99_ms{*}
3. Run average latency
avg:lp.run_results.run.latency.avg_ms{*}
4. Run peak throughput
avg:lp.run_results.run.peak_throughput{*}
5. Run error rate
avg:lp.run_results.run.error_rate_pct{*}
6. Step P95 latency by step title
avg:lp.run_results.latency.p95_ms{*} by {step_title}
7. Requests per second over time
avg:lp.run_results.timeseries.requests_per_second{*} by {track_item_id}
8. Average latency over time
avg:lp.run_results.timeseries.latency.avg_ms{*} by {track_item_id}
9. Status code counts
sum:lp.run_results.status_code.count{*} by {status_code}
10. Bytes received by step
avg:lp.run_results.bytes_received{*} by {step_title}
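The widget queries above use {*}, which matches every series. In Datadog, a dashboard template variable generally filters a widget only when the widget query references it as $name in its scope, so one option is to generate the queries with the dashboard filters built in. The Python sketch below only builds query strings and does not call any Datadog API; it assumes template variables named after the tags in the filter list, and the helper name is hypothetical. If a metric does not carry one of those tags, leave that variable out for that widget.

```python
DEFAULT_VARS = ("timeline_name", "run_name", "run_status",
                "project_name", "company_name", "geo_location")


def widget_query(aggregator, metric, group_by=None, variables=DEFAULT_VARS):
    """Build a Datadog metric query whose scope references dashboard template variables."""
    scope = ",".join(f"${name}" for name in variables) or "*"  # fall back to all series
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += f" by {{{group_by}}}"
    return query


if __name__ == "__main__":
    print(widget_query("avg", "lp.run_results.run.latency.p95_ms"))
    print(widget_query("avg", "lp.run_results.latency.p95_ms", group_by="step_title"))
    print(widget_query("sum", "lp.run_results.status_code.count", "status_code", ("run_name",)))
```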
Build a Grafana dashboard
Use this plan to create a ready-to-use Grafana dashboard.
Dashboard name: Leapwork Performance Dashboard
Data source: Use the Graphite data source.
Panels:
1. Run P95 latency
seriesByTag('name=lp.run_results.run.latency.p95_ms')
2. Run P99 latency
seriesByTag('name=lp.run_results.run.latency.p99_ms')
3. Run average latency
seriesByTag('name=lp.run_results.run.latency.avg_ms')
4. Peak throughput
seriesByTag('name=lp.run_results.run.peak_throughput')
5. Error rate
seriesByTag('name=lp.run_results.run.error_rate_pct')
6. Requests per second over time
seriesByTag('name=lp.run_results.timeseries.requests_per_second')
7. Average latency over time
seriesByTag('name=lp.run_results.timeseries.latency.avg_ms')
8. Step P95 latency
seriesByTag('name=lp.run_results.latency.p95_ms')
9. Status code breakdown
seriesByTag('name=lp.run_results.status_code.count')
10. Per-step bytes received
seriesByTag('name=lp.run_results.bytes_received')
Filtered query examples:
Filter to one run only:
seriesByTag('name=lp.run_results.run.latency.p95_ms','run_id=u5t4tc5h')
Filter to one timeline only:
seriesByTag('name=lp.run_results.run.latency.p95_ms','timeline_name=new_timeline')
Filter to one run name only:
seriesByTag('name=lp.run_results.run.latency.p95_ms','run_name=new_timeline_24042026083945')
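If you create many filtered panels, a small helper keeps the seriesByTag() expressions consistent with the examples above. This Python sketch only assembles strings, and the helper name is hypothetical. Keyword arguments become exact-match tag filters; positional arguments are passed through unchanged so you can use Graphite's other tag operators, such as != and =~. Values are inserted verbatim, so they must not contain single quotes.

```python
def series_by_tag(metric, *raw_exprs, **filters):
    """Build a Graphite seriesByTag() call for one metric name plus optional tag filters."""
    exprs = [f"name={metric}", *raw_exprs]                          # raw expressions kept as-is
    exprs += [f"{tag}={value}" for tag, value in filters.items()]  # exact-match filters
    return "seriesByTag(" + ",".join(f"'{e}'" for e in exprs) + ")"


if __name__ == "__main__":
    print(series_by_tag("lp.run_results.run.latency.p95_ms", run_id="u5t4tc5h"))
    print(series_by_tag("lp.run_results.run.latency.p95_ms",
                        timeline_name="new_timeline",
                        run_name="new_timeline_24042026083945"))
```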