What to Measure When Your CI Pipeline Is Slow but Your Tests Still Look Healthy

A CI pipeline can feel slow for all the wrong reasons. The test suite may still be green, flaky failures may be rare, and individual tests may not have changed much, yet builds keep taking longer. If the dashboard only shows total duration and pass rate, it is easy to misread the problem as “tests are healthy, so the slowdown must be external.” In practice, slow pipelines usually come from a mix of queue time, environment setup, dependency work, test execution time, and post-test steps that nobody is watching closely enough.

If you need to measure CI pipeline slowdown with confidence, the first step is to stop treating the pipeline as one number. A healthy-looking test suite can hide slow checkout steps, overloaded runners, image pulls, artifact uploads, and serial jobs that quietly dominate end-to-end runtime.

The useful question is not “are the tests slow?” but “which part of the pipeline is consuming time, and is that time predictable?”

This article breaks down the measurements that matter, how to instrument them, and how to interpret the signals without overreacting to noise. It is written for engineering managers, DevOps engineers, QA leads, and SREs who need to find real pipeline bottlenecks instead of guessing.

Start by splitting pipeline time into measurable buckets

A slow build rarely has a single cause. Separate the pipeline into components that can be tracked independently.

1) Queue time

Queue time is the time a job spends waiting before a runner starts it. This is often the first place to look because it is invisible to test results. If tests pass and runtime looks stable once they begin, but builds wait for 3, 5, or 15 minutes before starting, your problem is capacity or scheduling, not test logic.

Track:

Time from job creation to runner pickup
Time by runner pool, environment, branch, and time of day
Queue depth and runner utilization over time

Queue time often points to one of these issues:

Not enough runners for peak demand
Concurrency limits on the CI platform
Long-running jobs hogging executors
A mix of job types competing for the same pool

2) Checkout and dependency restore

Source checkout, git submodule fetches, package restore, and cache hydration can become a major part of runtime, especially in large monorepos or image-heavy pipelines.

Measure:

Git clone or fetch duration
Dependency install duration
Cache hit rate and cache restore time
Artifact download duration between stages

If dependency restoration is slower but tests are not, the test suite is innocent. The bottleneck is usually cache quality, network access, package registry latency, or bloated dependency graphs.

3) Environment setup and provisioning

Environment setup includes container startup, VM boot time, database initialization, browser installation, service startup, and test data seeding. These steps can be stable for months and then silently drift upward as images grow or environments become more complex.

Track:

Container startup time
VM or ephemeral environment provisioning time
Service readiness time
Test fixture creation time
Browser and driver startup time for UI tests

4) Test execution time

This is the part most teams watch first, but it is only one slice of the full pipeline.

Measure:

Per-suite runtime
Per-file or per-spec runtime
Median and p95 test duration
Time spent waiting on explicit or implicit waits
Serial vs parallel execution time

More broadly, execution time is only one property of an automated test system, not the whole system.

5) Post-test work

Many teams forget that the pipeline is not finished when the last test passes. Reporting, artifact upload, log collection, coverage merging, container cleanup, and notifications can all add nontrivial time.

Measure:

Artifact upload duration
Coverage report generation time
Log compression and upload time
Cleanup and teardown time
Slack, webhook, or release note publishing time

Build a timing model before you optimize anything

The simplest way to analyze CI runtime is to treat each job as a timeline with named phases. A useful model looks like this:

queued
started
checkout completed
dependencies ready
environment ready
tests started
tests completed
artifacts uploaded
job finished

If you can emit timestamps for each phase, you can answer almost every performance question with data instead of intuition.

Example of a minimal CI timing record

{
  "pipeline_id": "build-18421",
  "queued_at": "2026-06-10T08:14:03Z",
  "started_at": "2026-06-10T08:18:41Z",
  "checkout_done_at": "2026-06-10T08:19:20Z",
  "env_ready_at": "2026-06-10T08:22:05Z",
  "tests_done_at": "2026-06-10T08:29:50Z",
  "finished_at": "2026-06-10T08:31:02Z"
}

From this, you can derive:

queue time
checkout time
environment setup time
test execution time
post-test time
end-to-end duration

Once the phases are explicit, you can compare the same job across branches, weekdays, runner pools, or release trains.
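
The derivation above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the field names mirror the sample record, while the type and function names (TimingRecord, derivePhases) are hypothetical. Because the sample record has no separate dependency timestamp, dependency work is folded into environment setup here.

```typescript
// Field names mirror the sample record above; type and function names are illustrative.
interface TimingRecord {
  pipeline_id: string;
  queued_at: string;
  started_at: string;
  checkout_done_at: string;
  env_ready_at: string;
  tests_done_at: string;
  finished_at: string;
}

interface PhaseSeconds {
  queue: number;
  checkout: number;
  envSetup: number; // includes dependency work, since the record has no separate timestamp
  tests: number;
  postTest: number;
  endToEnd: number;
}

// Seconds elapsed between two ISO 8601 timestamps.
function secondsBetween(from: string, to: string): number {
  return (Date.parse(to) - Date.parse(from)) / 1000;
}

function derivePhases(r: TimingRecord): PhaseSeconds {
  return {
    queue: secondsBetween(r.queued_at, r.started_at),
    checkout: secondsBetween(r.started_at, r.checkout_done_at),
    envSetup: secondsBetween(r.checkout_done_at, r.env_ready_at),
    tests: secondsBetween(r.env_ready_at, r.tests_done_at),
    postTest: secondsBetween(r.tests_done_at, r.finished_at),
    endToEnd: secondsBetween(r.queued_at, r.finished_at),
  };
}

const sampleRecord: TimingRecord = {
  pipeline_id: "build-18421",
  queued_at: "2026-06-10T08:14:03Z",
  started_at: "2026-06-10T08:18:41Z",
  checkout_done_at: "2026-06-10T08:19:20Z",
  env_ready_at: "2026-06-10T08:22:05Z",
  tests_done_at: "2026-06-10T08:29:50Z",
  finished_at: "2026-06-10T08:31:02Z",
};

const phases = derivePhases(sampleRecord);
console.log(phases);
```

For the sample record this gives 278 seconds of queue time and 465 seconds of test execution out of 1019 seconds end to end, so more than a quarter of the run passed before any work started.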

What to measure when tests look healthy

If pass rate is steady and test durations per spec are not changing dramatically, focus on the following metrics.

Queue metrics

These show whether the pipeline is waiting for capacity.

Average queue time
p95 queue time
Maximum queue time during working hours
Queue time by runner label or pool
Jobs delayed by priority or branch policy

Interpretation tips:

Rising average and p95 queue times with stable test runtime usually indicate capacity pressure.
Queue spikes at predictable times often mean you need more concurrency during business hours, not a faster test suite.
A few long-running jobs can starve the entire fleet if scheduling is not isolated.
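
One way to check whether queue spikes follow a predictable daily pattern is to bucket queue time by hour. The sketch below assumes you already have one sample per job with a creation timestamp and a measured queue duration; the QueueSample shape and the meanQueueByHour name are hypothetical.

```typescript
interface QueueSample {
  queuedAt: string;     // ISO 8601 job creation time
  queueSeconds: number; // time from job creation to runner pickup
}

// Mean queue time per UTC hour of day (0-23).
function meanQueueByHour(samples: QueueSample[]): Record<number, number> {
  const totals: Record<number, { sum: number; count: number }> = {};
  for (const s of samples) {
    const hour = new Date(s.queuedAt).getUTCHours();
    const bucket = totals[hour] || { sum: 0, count: 0 };
    bucket.sum += s.queueSeconds;
    bucket.count += 1;
    totals[hour] = bucket;
  }
  const means: Record<number, number> = {};
  for (const key of Object.keys(totals)) {
    const hour = Number(key);
    means[hour] = totals[hour].sum / totals[hour].count;
  }
  return means;
}

const queueByHour = meanQueueByHour([
  { queuedAt: "2026-06-10T09:05:00Z", queueSeconds: 300 },
  { queuedAt: "2026-06-10T09:40:00Z", queueSeconds: 500 },
  { queuedAt: "2026-06-10T14:10:00Z", queueSeconds: 60 },
]);
console.log(queueByHour);
```

Bucketing in UTC is a simplification; if your team works in one time zone, converting to local hours makes the business-hours pattern easier to read.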

Runner utilization and saturation

Measure how busy your executors are.

CPU utilization per runner
Memory pressure
Disk I/O wait
Concurrent jobs per runner
Idle time between jobs

High utilization with growing queue time usually means the system is saturated. Low utilization with high queue time can indicate misconfiguration, locked resources, or poor job matching.
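
That rule of thumb can be written down as a small classifier. This is only a sketch: the PoolSnapshot shape is hypothetical, and the thresholds (80 percent utilization, 120 seconds of p95 queue time) are illustrative assumptions that you would tune to your own fleet.

```typescript
type PoolDiagnosis = "saturated" | "check-scheduling" | "ok";

interface PoolSnapshot {
  utilization: number;     // fraction of executor time spent busy, 0 to 1
  p95QueueSeconds: number; // p95 wait before runner pickup
}

// Threshold defaults are illustrative assumptions, not recommended values.
function diagnosePool(
  snapshot: PoolSnapshot,
  highUtilization = 0.8,
  highQueueSeconds = 120
): PoolDiagnosis {
  // Short queues mean the pool is keeping up, however busy it is.
  if (snapshot.p95QueueSeconds < highQueueSeconds) return "ok";
  // Long queues on busy runners point to capacity; on idle runners, to
  // misconfiguration, locked resources, or poor job matching.
  return snapshot.utilization >= highUtilization ? "saturated" : "check-scheduling";
}

console.log(diagnosePool({ utilization: 0.92, p95QueueSeconds: 600 }));
console.log(diagnosePool({ utilization: 0.3, p95QueueSeconds: 600 }));
```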

Setup overhead

This is one of the most common hidden costs in CI.

Average time to create a fresh environment
Package restore duration
Container image pull time
Browser install time
Database migration time
Service health-check wait time

If setup overhead grows while test runtime stays flat, the solution may be prebuilt images, better caching, smaller images, or a more stable test fixture strategy.

Test execution distribution, not just averages

Averages hide the most useful information.

Track:

Median test suite time
p90 and p95 suite time
Longest specs or test classes
Standard deviation over a rolling window
Time spent in explicit waits and retries

A suite that averages 8 minutes but sometimes takes 16 minutes is more operationally painful than a suite that reliably takes 10 minutes. Variability is a performance problem because it makes planning harder and keeps teams waiting on uncertain feedback.
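
To make the comparison concrete, here is a small sketch that computes the median, p95, and standard deviation for two made-up series of suite durations. The sample numbers are invented to match the example above, and the percentile uses the nearest-rank method, which is one of several common definitions.

```typescript
// Nearest-rank percentile: the smallest value with at least p percent of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Population standard deviation of the samples.
function populationStdDev(values: number[]): number {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
  return Math.sqrt(variance);
}

// Suite durations in minutes (illustrative data).
const spikySuite = [6, 6, 6, 6, 16];      // averages 8, sometimes takes 16
const steadySuite = [10, 10, 10, 10, 10]; // reliably takes 10

console.log(percentile(spikySuite, 50), percentile(spikySuite, 95), populationStdDev(spikySuite));
console.log(percentile(steadySuite, 50), percentile(steadySuite, 95), populationStdDev(steadySuite));
```

The spiky suite has the lower mean, but its p95 is 16 minutes against 10 for the steady suite, which is the number people actually feel when they are waiting on a merge.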

Stage-to-stage handoff time

Sometimes the delay is not in the stage itself, but in the gap between stages.

Measure:

Time between test end and artifact upload start
Time between one job and the next in a workflow
Time waiting for downstream dependencies, approvals, or fan-in jobs

These gaps often show up in multi-stage pipelines where the CI system is fine, but the workflow design is not.
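
Handoff gaps are easy to compute once each stage reports its own start and finish time. The sketch below assumes strictly sequential stages and a hypothetical StageRun shape; parallel or fan-in workflows would need the dependency graph as well.

```typescript
interface StageRun {
  name: string;
  startedAt: string;  // ISO 8601
  finishedAt: string; // ISO 8601
}

interface Handoff {
  from: string;
  to: string;
  gapSeconds: number;
}

// Gap between each stage finishing and the next one starting.
// Assumes stages run one after another (no parallel branches).
function handoffGaps(stages: StageRun[]): Handoff[] {
  const ordered = stages
    .slice()
    .sort((a, b) => Date.parse(a.startedAt) - Date.parse(b.startedAt));
  const gaps: Handoff[] = [];
  for (let i = 1; i < ordered.length; i++) {
    const prev = ordered[i - 1];
    const next = ordered[i];
    gaps.push({
      from: prev.name,
      to: next.name,
      gapSeconds: (Date.parse(next.startedAt) - Date.parse(prev.finishedAt)) / 1000,
    });
  }
  return gaps;
}

const stageGaps = handoffGaps([
  { name: "build", startedAt: "2026-06-10T08:00:00Z", finishedAt: "2026-06-10T08:05:00Z" },
  { name: "test", startedAt: "2026-06-10T08:07:30Z", finishedAt: "2026-06-10T08:15:00Z" },
  { name: "upload", startedAt: "2026-06-10T08:15:10Z", finishedAt: "2026-06-10T08:16:00Z" },
]);
console.log(stageGaps);
```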

A practical way to attribute slowdown

A simple attribution process prevents guesswork.

Step 1: Compare the same job over time

Use a weekly or daily baseline for each phase. Compare the same job type, not just the overall pipeline, because different branches or test suites may not be comparable.

Questions to ask:

Did queue time change?
Did environment setup change?
Did test execution change?
Did post-test work change?
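
Those four questions amount to a per-phase difference between a baseline and the current period. A minimal sketch, assuming you already have a representative value (for example the median) per phase; the phase names and numbers are illustrative.

```typescript
type PhaseName = "queue" | "setup" | "tests" | "postTest";
type PhaseTimes = Record<PhaseName, number>; // seconds per phase

interface PhaseDelta {
  phase: PhaseName;
  deltaSeconds: number;
}

// Per-phase change from baseline to current, largest increase first.
function phaseDeltas(baseline: PhaseTimes, current: PhaseTimes): PhaseDelta[] {
  const names: PhaseName[] = ["queue", "setup", "tests", "postTest"];
  return names
    .map((phase) => ({ phase, deltaSeconds: current[phase] - baseline[phase] }))
    .sort((a, b) => b.deltaSeconds - a.deltaSeconds);
}

// Illustrative medians for the same job type in two different weeks.
const lastMonth: PhaseTimes = { queue: 60, setup: 120, tests: 480, postTest: 45 };
const thisWeek: PhaseTimes = { queue: 300, setup: 150, tests: 485, postTest: 50 };

const rankedDeltas = phaseDeltas(lastMonth, thisWeek);
console.log(rankedDeltas);
```

In this made-up data the run is 280 seconds slower overall, and 240 of those seconds are queue time, while test execution moved by only 5 seconds.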

Step 2: Compare successful runs to successful runs

Do not let failed builds distort the analysis. A failure can shorten a run and make a job look “faster” even when the system is slower overall. Compare:

success to success
branch to branch
runner pool to runner pool
small change to small change
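
A cohort filter like the following keeps failed runs out of the comparison. The CiRun shape, pool names, and durations are hypothetical; the point is that only successful runs from the same branch and runner pool feed the median.

```typescript
interface CiRun {
  status: "success" | "failed";
  branch: string;
  runnerPool: string;
  durationSeconds: number;
}

function medianOf(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median duration of successful runs in one branch and runner pool.
// Returns null when the cohort is empty.
function medianSuccessDuration(
  runs: CiRun[],
  branch: string,
  runnerPool: string
): number | null {
  const cohort = runs
    .filter((r) => r.status === "success" && r.branch === branch && r.runnerPool === runnerPool)
    .map((r) => r.durationSeconds);
  return cohort.length === 0 ? null : medianOf(cohort);
}

const runHistory: CiRun[] = [
  { status: "success", branch: "main", runnerPool: "linux-large", durationSeconds: 600 },
  { status: "success", branch: "main", runnerPool: "linux-large", durationSeconds: 620 },
  { status: "failed", branch: "main", runnerPool: "linux-large", durationSeconds: 90 },
  { status: "success", branch: "feature/x", runnerPool: "linux-small", durationSeconds: 300 },
];

console.log(medianSuccessDuration(runHistory, "main", "linux-large"));
```

Without the status filter, the 90-second failed run would pull the median for this cohort down from 610 to 600 seconds, and a mean would drop much further.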

Step 3: Break down by job class

Different job classes usually slow down for different reasons.

Unit tests, often CPU-bound and sensitive to dependency restore
Integration tests, often environment-bound and database-heavy
UI tests, often dominated by browser startup, waits, and app readiness
Packaging jobs, often I/O-bound or artifact-heavy

If only UI jobs are slow, do not waste time tuning unit test parallelism. If every job is slow, inspect runner saturation and shared infrastructure first.

Step 4: Look at percentile drift

The p95 is often more important than the average in CI. A small number of slow runs can block merges, extend release windows, and create a perception that the entire system is unreliable.

When teams say “the pipeline feels slow,” they are often describing variance, not mean runtime.

Why tests can look healthy while builds get slower

A healthy test suite can coexist with a slow pipeline for several reasons.

The test logic is stable, but infrastructure is not

Maybe the tests run in the same amount of time, but runner images have grown, dependency downloads have increased, or external services have become slower. The test suite passes, but the build is still slower because the environment is doing more work around it.

Cached work stopped caching well

A subtle cache key change, dependency lockfile churn, or container image invalidation can cause repeated cold starts. Test reports will still look good because the tests themselves are fine, but the pipeline pays the cost every time.

Parallelism is hiding in the wrong place

Tests may be parallelized inside a job, while the pipeline remains serial across jobs. In that case, each individual test can be healthy, but the full workflow is still bottlenecked by staged dependencies or fan-in sequencing.

A few steps dominate the critical path

If your pipeline has 20 steps and 2 of them account for 80 percent of total time, you need critical-path analysis, not average step analysis. Shortening non-critical steps will not change build time much.
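
Critical-path analysis can be sketched as a longest-path search over the step dependency graph. The example below assumes the graph is acyclic and that every dependency name refers to a known step; the PipelineStep shape and the step durations are invented for illustration.

```typescript
interface PipelineStep {
  name: string;
  seconds: number;
  needs: string[]; // names of steps that must finish first
}

interface PathResult {
  seconds: number;
  path: string[];
}

// Longest chain of dependent steps, which bounds end-to-end time.
// Assumes an acyclic graph where every name in `needs` exists.
function criticalPath(steps: PipelineStep[]): PathResult {
  const byName: Record<string, PipelineStep> = {};
  for (const s of steps) byName[s.name] = s;
  const memo: Record<string, PathResult> = {};

  function longestTo(name: string): PathResult {
    if (memo[name]) return memo[name];
    const step = byName[name];
    let best: PathResult = { seconds: 0, path: [] };
    for (const dep of step.needs) {
      const candidate = longestTo(dep);
      if (candidate.seconds > best.seconds) best = candidate;
    }
    const result: PathResult = {
      seconds: best.seconds + step.seconds,
      path: best.path.concat(name),
    };
    memo[name] = result;
    return result;
  }

  let overall: PathResult = { seconds: 0, path: [] };
  for (const s of steps) {
    const r = longestTo(s.name);
    if (r.seconds > overall.seconds) overall = r;
  }
  return overall;
}

const critical = criticalPath([
  { name: "checkout", seconds: 40, needs: [] },
  { name: "lint", seconds: 60, needs: ["checkout"] },
  { name: "unit", seconds: 300, needs: ["checkout"] },
  { name: "ui", seconds: 900, needs: ["checkout"] },
  { name: "report", seconds: 120, needs: ["lint", "unit", "ui"] },
]);
console.log(critical.path.join(" > "), critical.seconds);
```

In this example the critical path is checkout, ui, report at 1060 seconds. Making lint or unit faster changes nothing until the ui step drops below them.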

External calls are slowing the build

Package registries, container registries, artifact stores, license checks, and internal APIs can all add latency. The tests still pass, but the surrounding ecosystem is slower than it used to be.

Instrumentation ideas that actually help

You do not need a full observability platform to start measuring. You do need consistent timestamps and enough metadata to group runs meaningfully.

Add timing to each phase in CI logs

For script-driven pipelines, print timestamps before and after major steps.

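# Record epoch seconds at each phase boundary (date -Is is GNU date syntax).
start=$(date +%s)
echo "[checkout] $(date -Is)"
git fetch --depth=1 origin main
checkout_done=$(date +%s)
echo "[deps] $(date -Is)"
npm ci
install_done=$(date +%s)
# Emit key=value lines that a log parser can pick up and graph later.
echo "checkout_seconds=$((checkout_done - start))"
echo "deps_seconds=$((install_done - checkout_done))"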

This is crude, but it is enough to separate checkout from dependency work in a way that can be graphed later.

Emit test timing per spec or file

For browser or integration tests, collect per-spec duration so slow paths become visible.

import { test } from '@playwright/test';

test('checkout flow', async ({ page }) => {
  const started = Date.now();
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Place order' }).click();
  console.log(`checkout_flow_ms=${Date.now() - started}`);
});

If a specific test keeps drifting, the problem may be an app change, a new wait condition, or a shared environment dependency.

Track cache hit rate

A cache that exists but misses often is just extra complexity. Record cache hits and misses per job type, especially for dependencies, browser binaries, and build artifacts.
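
If each job logs whether its cache restore hit or missed, the hit rate per job type is a short aggregation. The CacheEvent shape is an assumption about what your logs contain, not a format any particular CI system emits.

```typescript
interface CacheEvent {
  jobType: string;
  hit: boolean;
}

// Fraction of cache restores that hit, per job type (0 to 1).
function cacheHitRates(events: CacheEvent[]): Record<string, number> {
  const counts: Record<string, { hits: number; total: number }> = {};
  for (const e of events) {
    const c = counts[e.jobType] || { hits: 0, total: 0 };
    if (e.hit) c.hits += 1;
    c.total += 1;
    counts[e.jobType] = c;
  }
  const rates: Record<string, number> = {};
  for (const jobType of Object.keys(counts)) {
    rates[jobType] = counts[jobType].hits / counts[jobType].total;
  }
  return rates;
}

const hitRates = cacheHitRates([
  { jobType: "unit", hit: true },
  { jobType: "unit", hit: true },
  { jobType: "unit", hit: false },
  { jobType: "unit", hit: true },
  { jobType: "ui", hit: false },
  { jobType: "ui", hit: false },
]);
console.log(hitRates);
```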

Tag runs with environment metadata

When comparing timing data, capture:

branch name
commit SHA
runner type
OS image version
container image digest
dependency lockfile hash
test suite name

Without metadata, you cannot tell whether the slowdown came from code, infra, or job placement.
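
Once runs carry this metadata, comparing a fast run with a slow one can start with a plain field-by-field diff. The RunMetadata shape follows the list above; the field names and all sample values, including the digests, are placeholders.

```typescript
interface RunMetadata {
  branch: string;
  commitSha: string;
  runnerType: string;
  osImageVersion: string;
  containerImageDigest: string;
  lockfileHash: string;
  suite: string;
}

// Names of the metadata fields that differ between two runs.
function changedFields(a: RunMetadata, b: RunMetadata): (keyof RunMetadata)[] {
  const keys = Object.keys(a) as (keyof RunMetadata)[];
  return keys.filter((k) => a[k] !== b[k]);
}

// Placeholder values: a re-run of the same commit that landed on a newer image.
const fastRun: RunMetadata = {
  branch: "main",
  commitSha: "abc1234",
  runnerType: "linux-large",
  osImageVersion: "2026.05",
  containerImageDigest: "sha256:1111",
  lockfileHash: "lock-aaaa",
  suite: "integration",
};
const slowRun: RunMetadata = {
  branch: "main",
  commitSha: "abc1234",
  runnerType: "linux-large",
  osImageVersion: "2026.06",
  containerImageDigest: "sha256:2222",
  lockfileHash: "lock-aaaa",
  suite: "integration",
};

const differing = changedFields(fastRun, slowRun);
console.log(differing);
```

Here only the OS image version and container digest differ, so the code and dependencies can be ruled out before anyone starts reading test logs.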

How to tell the difference between test slowness and pipeline slowness

A useful separation is this:

Test slowness means individual specs, suites, or test phases are taking longer once execution begins.
Pipeline slowness means everything around tests, before and after execution, is taking longer.

Indicators of test slowness

More time inside assertions, waits, retries, or API polling
Longer per-spec durations
Browser sessions taking longer because the app under test is slower
Higher failure rates that trigger automatic reruns

Indicators of pipeline slowness

Longer time in queue
Slower checkout or dependency install
More time waiting for container or VM readiness
Longer artifact upload or report generation
No obvious change in per-test runtime, but a higher total build time

If you only look at final status and test counts, you can miss the entire class of pipeline problems.

Common bottlenecks and what they usually mean

Long queue times

Likely causes:

Not enough runners
Jobs are larger than expected
Shared runner pool with mixed priorities
Autoscaling lag

What to check next:

Capacity trend over time
Runner start latency
Job scheduling rules
Heavy jobs that should be isolated

Slow dependency restore

Likely causes:

Cold caches
Large lockfiles or dependency trees
Registry latency
Too many language ecosystems in one job

What to check next:

Cache key design
Image pre-baking
Proxy or mirror performance
Dependency pruning

Slow browser test startup

Likely causes:

Browser binary install each run
Slow container image pulls
Improperly warmed test environments
App readiness waits that are longer than necessary

What to check next:

Reusable base images
Health checks
Headless browser startup logs
Network and DNS delays

Slow artifact upload

Likely causes:

Large logs or coverage files
Compression overhead
Slow object storage
Excessive artifacts from every job

What to check next:

Artifact retention policy
File sizes
Whether all artifacts are necessary on every run

A lightweight CI runtime analysis template

If you need a practical checklist, use this structure for each job type:

Queue time
- median
- p95
- max
Preparation time
- checkout
- dependency install
- environment startup
Execution time
- total suite duration
- longest specs
- retry time
Post-processing time
- artifacts
- reports
- cleanup
Variability
- standard deviation
- p95 minus median
- run-to-run drift

Once you have these numbers, annotate them with runner type and branch class. This is enough to identify most pipeline bottlenecks without building a complex observability stack first.

When to optimize the tests, and when not to

You should optimize test execution time when:

The execution phase is the dominant contributor to build time
A few long tests dominate the critical path
Parallel execution can shorten feedback without adding too much complexity
Retry logic is masking real test inefficiency

You should not start with test optimization when:

Queue time is growing faster than execution time
Environment setup is the majority of the runtime
Artifacts and cleanup are the biggest post-test cost
The same tests are fast locally but slow only in CI

A common mistake is to rewrite tests to save 30 seconds when the pipeline spends 8 minutes waiting for a runner.

A sample GitHub Actions timing pattern

If your CI system is flexible enough, log the major phase boundaries directly in the workflow.

name: ci
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Mark start
        run: echo "started_at=$(date -Is)"
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test

This example is intentionally simple. The point is not the exact syntax but the habit of instrumenting phases so that duration is visible outside the job summary.

Building a decision tree for slow builds

When a build slows down, ask these questions in order:

Is the slowdown in queue time or execution time?
If queue time, is it tied to a runner pool, time window, or branch policy?
If execution time, is the added time before tests, during tests, or after tests?
If during tests, is it a specific suite, specific file, or broad drift?
If before or after tests, what changed in dependencies, images, artifacts, or cleanup?

This order matters because it keeps you from optimizing the wrong layer first.
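
The same ordering can be encoded as a small function that takes the added time per layer and returns where to look first. This is a sketch of the questions above, not a complete diagnostic: the SlowdownDeltas shape is hypothetical, and ties are resolved in the order the questions are asked.

```typescript
interface SlowdownDeltas {
  queue: number;       // added seconds waiting for a runner
  beforeTests: number; // added seconds in checkout, dependencies, setup
  duringTests: number; // added seconds in test execution
  afterTests: number;  // added seconds in artifacts, reports, cleanup
}

type SlowLayer = "queue" | "before-tests" | "during-tests" | "after-tests";

// Walks the questions in order: queue versus execution first,
// then which part of execution gained the most time.
function slowestLayer(d: SlowdownDeltas): SlowLayer {
  const execution = d.beforeTests + d.duringTests + d.afterTests;
  if (d.queue >= execution) return "queue";
  if (d.duringTests >= d.beforeTests && d.duringTests >= d.afterTests) {
    return "during-tests";
  }
  return d.beforeTests >= d.afterTests ? "before-tests" : "after-tests";
}

console.log(slowestLayer({ queue: 480, beforeTests: 20, duringTests: 30, afterTests: 5 }));
console.log(slowestLayer({ queue: 10, beforeTests: 200, duringTests: 15, afterTests: 30 }));
```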

If you cannot separate queue time from runtime, you do not have a CI performance problem yet; you have a measurement problem.

Metrics that are useful for managers, SREs, and QA leads

Managers need trends and operational risk. SREs need signal, saturation, and stability. QA leads need to tell test design issues apart from infrastructure issues. All three groups benefit from the same core numbers, but they should read them differently.

For engineering managers

Focus on:

median and p95 end-to-end pipeline duration
queue time by team or repo
stability of critical path time
regression windows after platform changes
build completion latency during release windows

These help answer whether the delivery process is becoming less predictable.

For SREs and DevOps engineers

Focus on:

runner utilization
autoscaling lag
I/O wait and memory pressure
cache hit rate
job distribution across pools
external dependency latency

These help find whether the platform is overloaded, misconfigured, or sensitive to environmental changes.

For QA leads

Focus on:

per-suite timing drift
slow specs and unstable waits
environment readiness
test retries
how often UI or integration tests dominate the critical path

These help distinguish test design issues from infrastructure issues.

A practical closing rule

If your pipeline is slow but the tests still look healthy, assume the problem is not in the pass/fail signal. The useful work is to measure where time accumulates, compare those measurements over consistent cohorts, and optimize the slowest slice of the critical path.

Most teams do not need more opinions about why CI is slow. They need a timing model, a few phase-level metrics, and a way to compare builds that are actually comparable. Once you can separate queue time, setup overhead, test execution time, and environment delays, the real bottleneck usually becomes obvious.

That is the core of effective CI runtime analysis: not chasing every slow build, but learning which kind of slowness you are actually seeing.
