A CI pipeline can feel slow for all the wrong reasons. The test suite may still be green, flaky failures may be rare, and individual tests may not have changed much, yet builds keep taking longer. If the dashboard only shows total duration and pass rate, it is easy to misread the problem as “tests are healthy, so the slowdown must be external.” In practice, slow pipelines usually come from a mix of queue time, environment setup, dependency work, test execution time, and post-test steps that nobody is watching closely enough.

If you need to measure CI pipeline slowdown with confidence, the first step is to stop treating the pipeline as one number. A healthy-looking test suite can hide slow checkout steps, overloaded runners, image pulls, artifact uploads, and serial jobs that quietly dominate end-to-end runtime.

The useful question is not “are the tests slow?” but “which part of the pipeline is consuming time, and is that time predictable?”

This article breaks down the measurements that matter, how to instrument them, and how to interpret the signals without overreacting to noise. It is written for engineering managers, DevOps engineers, QA leads, and SREs who need to find real pipeline bottlenecks instead of guessing.

Start by splitting pipeline time into measurable buckets

A slow build rarely has a single cause. You need to separate the pipeline into components that can be tracked independently.

1) Queue time

Queue time is the time a job spends waiting before a runner starts it. This is often the first place to look because it is invisible to test results. If tests pass and runtime looks stable once they begin, but builds wait for 3, 5, or 15 minutes before starting, your problem is capacity or scheduling, not test logic.

Track:

  • Time from job creation to runner pickup
  • Time by runner pool, environment, branch, and time of day
  • Queue depth and runner utilization over time

Queue time often points to one of these issues:

  • Not enough runners for peak demand
  • Concurrency limits on the CI platform
  • Long-running jobs hogging executors
  • A mix of job types competing for the same pool

2) Checkout and dependency restore

Source checkout, git submodule fetches, package restore, and cache hydration can become a major part of runtime, especially in large monorepos or image-heavy pipelines.

Measure:

  • Git clone or fetch duration
  • Dependency install duration
  • Cache hit rate and cache restore time
  • Artifact download duration between stages

If dependency restoration is slower but tests are not, the test suite is innocent. The bottleneck is usually cache quality, network access, package registry latency, or bloated dependency graphs.

3) Environment setup and provisioning

Environment setup includes container startup, VM boot time, database initialization, browser installation, service startup, and test data seeding. These steps can be stable for months and then silently drift upward as images grow or environments become more complex.

Track:

  • Container startup time
  • VM or ephemeral environment provisioning time
  • Service readiness time
  • Test fixture creation time
  • Browser and driver startup time for UI tests

4) Test execution time

This is the part most teams watch first, but it is only one slice of the full pipeline.

Measure:

  • Per-suite runtime
  • Per-file or per-spec runtime
  • Median and p95 test duration
  • Time spent waiting on explicit or implicit waits
  • Serial vs parallel execution time

For context on the broader discipline, general references on software testing and test automation are a useful reminder that execution time is only one property of an automated test system, not the whole system.

5) Post-test work

Many teams forget that the pipeline is not finished when the last test passes. Reporting, artifact upload, log collection, coverage merging, container cleanup, and notifications can all add nontrivial time.

Measure:

  • Artifact upload duration
  • Coverage report generation time
  • Log compression and upload time
  • Cleanup and teardown time
  • Slack, webhook, or release note publishing time

Build a timing model before you optimize anything

The simplest way to analyze CI runtime is to treat each job as a timeline with named phases. A useful model looks like this:

  • queued
  • started
  • checkout completed
  • dependencies ready
  • environment ready
  • tests started
  • tests completed
  • artifacts uploaded
  • job finished

If you can emit timestamps for each phase, you can answer almost every performance question with data instead of intuition.

Example of a minimal CI timing record

{
  "pipeline_id": "build-18421",
  "queued_at": "2026-06-10T08:14:03Z",
  "started_at": "2026-06-10T08:18:41Z",
  "checkout_done_at": "2026-06-10T08:19:20Z",
  "env_ready_at": "2026-06-10T08:22:05Z",
  "tests_done_at": "2026-06-10T08:29:50Z",
  "finished_at": "2026-06-10T08:31:02Z"
}

From this, you can derive:

  • queue time
  • checkout time
  • environment setup time
  • test execution time
  • post-test time
  • end-to-end duration

Once the phases are explicit, you can compare the same job across branches, weekdays, runner pools, or release trains.
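As a minimal sketch, the derivation could look like the following. The field names mirror the example record; the `TimingRecord` type and the `phaseDurations` and `secondsBetween` functions are hypothetical names introduced for illustration. The sketch also assumes that dependency work is folded into environment setup, because the example record has no separate timestamp for it.

```typescript
// Hypothetical record shape; field names mirror the example record above.
interface TimingRecord {
  pipeline_id: string;
  queued_at: string;
  started_at: string;
  checkout_done_at: string;
  env_ready_at: string;
  tests_done_at: string;
  finished_at: string;
}

// Seconds elapsed between two ISO 8601 timestamps.
function secondsBetween(from: string, to: string): number {
  return (Date.parse(to) - Date.parse(from)) / 1000;
}

// Derive one duration per phase. Dependency work is counted as setup here
// because the example record has no "dependencies ready" timestamp.
function phaseDurations(r: TimingRecord) {
  return {
    queueSeconds: secondsBetween(r.queued_at, r.started_at),
    checkoutSeconds: secondsBetween(r.started_at, r.checkout_done_at),
    setupSeconds: secondsBetween(r.checkout_done_at, r.env_ready_at),
    testSeconds: secondsBetween(r.env_ready_at, r.tests_done_at),
    postTestSeconds: secondsBetween(r.tests_done_at, r.finished_at),
    totalSeconds: secondsBetween(r.queued_at, r.finished_at),
  };
}

const sample: TimingRecord = {
  pipeline_id: "build-18421",
  queued_at: "2026-06-10T08:14:03Z",
  started_at: "2026-06-10T08:18:41Z",
  checkout_done_at: "2026-06-10T08:19:20Z",
  env_ready_at: "2026-06-10T08:22:05Z",
  tests_done_at: "2026-06-10T08:29:50Z",
  finished_at: "2026-06-10T08:31:02Z",
};

console.log(phaseDurations(sample));
```

For the example record, this gives 278 seconds of queue time against 465 seconds of test execution, out of 1019 seconds end to end. More than a quarter of the run passed before a runner picked the job up.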

What to measure when tests look healthy

If pass rate is steady and test durations per spec are not changing dramatically, focus on the following metrics.

Queue metrics

These show whether the pipeline is waiting for capacity.

  • Average queue time
  • p95 queue time
  • Maximum queue time during working hours
  • Queue time by runner label or pool
  • Jobs delayed by priority or branch policy

Interpretation tips:

  • Rising average and p95 queue times with stable test runtime usually indicate capacity pressure.
  • Queue spikes at predictable times often mean you need more concurrency during business hours, not a faster test suite.
  • A few long-running jobs can starve the entire fleet if scheduling is not isolated.
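To spot queue spikes at predictable times, one option is to bucket queue samples by hour of day and look at the p95 per bucket. The sketch below assumes you can export one sample per job with a creation timestamp and a queue duration; `QueueSample`, `percentile`, and `p95QueueByHour` are hypothetical names, the sample data is made up, and the percentile uses the simple nearest-rank method.

```typescript
interface QueueSample {
  createdAt: string;    // ISO 8601 timestamp of job creation
  queueSeconds: number; // wait between job creation and runner pickup
}

// Nearest-rank percentile; returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Group samples by UTC hour of creation and report p95 queue time per hour.
function p95QueueByHour(samples: QueueSample[]): Map<number, number> {
  const buckets = new Map<number, number[]>();
  for (const s of samples) {
    const hour = new Date(s.createdAt).getUTCHours();
    const bucket = buckets.get(hour) ?? [];
    bucket.push(s.queueSeconds);
    buckets.set(hour, bucket);
  }
  const result = new Map<number, number>();
  buckets.forEach((values, hour) => result.set(hour, percentile(values, 95)));
  return result;
}

// Made-up samples: a busy morning hour and a quiet overnight hour.
const queueSamples: QueueSample[] = [
  { createdAt: "2026-06-10T09:05:00Z", queueSeconds: 300 },
  { createdAt: "2026-06-10T09:20:00Z", queueSeconds: 400 },
  { createdAt: "2026-06-10T09:45:00Z", queueSeconds: 500 },
  { createdAt: "2026-06-10T02:10:00Z", queueSeconds: 5 },
  { createdAt: "2026-06-10T02:40:00Z", queueSeconds: 10 },
];

p95QueueByHour(queueSamples).forEach((p95, hour) =>
  console.log(`hour=${hour} p95_queue_seconds=${p95}`)
);
```

The same grouping works for runner pool or branch instead of hour, as long as each sample carries that label.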

Runner utilization and saturation

Measure how busy your executors are.

  • CPU utilization per runner
  • Memory pressure
  • Disk I/O wait
  • Concurrent jobs per runner
  • Idle time between jobs

High utilization with growing queue time usually means the system is saturated. Low utilization with high queue time can indicate misconfiguration, locked resources, or poor job matching.
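That rule of thumb can be written down as a small classifier. This is only a sketch: the `diagnoseQueue` function, its labels, and both threshold defaults are assumptions chosen for illustration, not recommended values.

```typescript
type QueueDiagnosis =
  | "no-queue-pressure"
  | "saturated"
  | "misconfigured-or-blocked";

interface Thresholds {
  highUtilization: number;  // fraction of runner capacity in use, 0..1
  highQueueSeconds: number; // p95 queue time you consider painful
}

// Illustrative defaults only; tune them to your own fleet.
const DEFAULT_THRESHOLDS: Thresholds = {
  highUtilization: 0.8,
  highQueueSeconds: 120,
};

function diagnoseQueue(
  utilization: number,
  p95QueueSeconds: number,
  t: Thresholds = DEFAULT_THRESHOLDS
): QueueDiagnosis {
  // Short queues need no further diagnosis, whatever the utilization.
  if (p95QueueSeconds < t.highQueueSeconds) return "no-queue-pressure";
  // Long queues with busy runners point to saturation; long queues with
  // idle runners point to misconfiguration, locks, or poor job matching.
  return utilization >= t.highUtilization
    ? "saturated"
    : "misconfigured-or-blocked";
}

console.log(diagnoseQueue(0.92, 600)); // busy fleet, long queue
console.log(diagnoseQueue(0.3, 600));  // idle fleet, long queue
```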

Setup overhead

This is one of the most common hidden costs in CI.

  • Average time to create a fresh environment
  • Package restore duration
  • Container image pull time
  • Browser install time
  • Database migration time
  • Service health-check wait time

If setup overhead grows while test runtime stays flat, the solution may be prebuilt images, better caching, smaller images, or a more stable test fixture strategy.

Test execution distribution, not just averages

Averages hide the most useful information.

Track:

  • Median test suite time
  • p90 and p95 suite time
  • Longest specs or test classes
  • Standard deviation over a rolling window
  • Time spent in explicit waits and retries

A suite that averages 8 minutes but sometimes takes 16 minutes is more operationally painful than a suite that reliably takes 10 minutes. Variability is a performance problem because it makes planning harder and keeps teams waiting on uncertain feedback.
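A small sketch makes the contrast concrete. The durations below are made up to match the scenario in the previous paragraph; `mean`, `percentile` (nearest-rank), and `stddev` (population standard deviation) are hypothetical helper names.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile; returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Population standard deviation.
function stddev(values: number[]): number {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) * (v - m))));
}

// Made-up suite durations in minutes over ten runs.
const spiky = [6, 6, 6, 6, 6, 6, 6, 6, 16, 16];        // averages 8
const steady = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]; // averages 10

for (const [label, runs] of [["spiky", spiky], ["steady", steady]] as [string, number[]][]) {
  console.log(
    `${label}: mean=${mean(runs)} median=${percentile(runs, 50)} ` +
      `p95=${percentile(runs, 95)} stddev=${stddev(runs)}`
  );
}
```

With this data the spiky suite has the lower mean (8 versus 10 minutes) but a p95 of 16 minutes and a standard deviation of 4, while the steady suite has a p95 of 10 and no spread at all.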

Stage-to-stage handoff time

Sometimes the delay is not in the stage itself, but in the gap between stages.

Measure:

  • Time between test end and artifact upload start
  • Time between one job and the next in a workflow
  • Time waiting for downstream dependencies, approvals, or fan-in jobs

These gaps often show up in multi-stage pipelines where the CI system is fine, but the workflow design is not.
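If each stage reports a start and finish timestamp, the gaps can be computed directly. The sketch below assumes stages run one after another; `StageSpan`, `Handoff`, and `handoffGaps` are hypothetical names, and the stage data is invented.

```typescript
interface StageSpan {
  stage: string;
  startedAt: string;  // ISO 8601
  finishedAt: string; // ISO 8601
}

interface Handoff {
  from: string;
  to: string;
  gapSeconds: number;
}

// Sort stages by start time, then measure the idle time between one stage
// finishing and the next one starting. Assumes sequential stages.
function handoffGaps(stages: StageSpan[]): Handoff[] {
  const ordered = stages
    .slice()
    .sort((a, b) => Date.parse(a.startedAt) - Date.parse(b.startedAt));
  const gaps: Handoff[] = [];
  for (let i = 1; i < ordered.length; i++) {
    const prev = ordered[i - 1] as StageSpan;
    const next = ordered[i] as StageSpan;
    gaps.push({
      from: prev.stage,
      to: next.stage,
      gapSeconds:
        (Date.parse(next.startedAt) - Date.parse(prev.finishedAt)) / 1000,
    });
  }
  return gaps;
}

// Invented workflow: the stages themselves are fine, the handoffs are not.
const stages: StageSpan[] = [
  { stage: "build", startedAt: "2026-06-10T08:00:00Z", finishedAt: "2026-06-10T08:05:00Z" },
  { stage: "test", startedAt: "2026-06-10T08:05:30Z", finishedAt: "2026-06-10T08:15:00Z" },
  { stage: "upload", startedAt: "2026-06-10T08:17:00Z", finishedAt: "2026-06-10T08:18:00Z" },
];

console.log(handoffGaps(stages));
```

Here the build-to-test handoff costs 30 seconds and the test-to-upload handoff costs 120 seconds, time that no per-stage duration chart would show.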

A practical way to attribute slowdown

A simple attribution process prevents guesswork.

Step 1: Compare the same job over time

Use a weekly or daily baseline for each phase. Compare the same job type, not just the overall pipeline, because different branches or test suites may not be comparable.

Questions to ask:

  • Did queue time change?
  • Did environment setup change?
  • Did test execution change?
  • Did post-test work change?
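One way to answer these questions is to subtract a baseline from the current numbers phase by phase and rank the differences. In this sketch the phase names, the `attributeSlowdown` function, and all numbers are illustrative assumptions; in practice each value would be something like the weekly median for that phase.

```typescript
type Phase = "queue" | "checkout" | "setup" | "tests" | "postTest";
type PhaseSeconds = Record<Phase, number>;

const PHASES: Phase[] = ["queue", "checkout", "setup", "tests", "postTest"];

interface PhaseDelta {
  phase: Phase;
  deltaSeconds: number;  // current minus baseline
  shareOfChange: number; // fraction of the total change, 0 if nothing changed
}

// Rank phases by how much time they added relative to the baseline.
function attributeSlowdown(
  baseline: PhaseSeconds,
  current: PhaseSeconds
): PhaseDelta[] {
  const deltas = PHASES.map((phase) => ({
    phase,
    deltaSeconds: current[phase] - baseline[phase],
  }));
  const totalChange = deltas.reduce((sum, d) => sum + d.deltaSeconds, 0);
  return deltas
    .map((d) => ({
      ...d,
      shareOfChange: totalChange === 0 ? 0 : d.deltaSeconds / totalChange,
    }))
    .sort((a, b) => b.deltaSeconds - a.deltaSeconds);
}

// Invented medians, in seconds, for the same job type in two weeks.
const baseline: PhaseSeconds = { queue: 60, checkout: 40, setup: 160, tests: 460, postTest: 70 };
const current: PhaseSeconds = { queue: 280, checkout: 40, setup: 170, tests: 465, postTest: 75 };

for (const d of attributeSlowdown(baseline, current)) {
  console.log(`${d.phase}: +${d.deltaSeconds}s (${(d.shareOfChange * 100).toFixed(1)}% of change)`);
}
```

In this invented case the job got 240 seconds slower, and 220 of those seconds are queue time. Test execution contributed 5.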

Step 2: Compare successful runs to successful runs

Do not let failed builds distort the analysis. A failure can shorten a run and make a job look “faster” even when the system is slower overall. Compare:

  • success to success
  • branch to branch
  • runner pool to runner pool
  • small change to small change
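A sketch of that cohort filter is shown below. The `RunSummary` shape, the `comparableDurations` and `median` functions, and the run data are all hypothetical; the point is only that the filter runs before any statistic is computed.

```typescript
interface RunSummary {
  job: string;
  outcome: "success" | "failure" | "cancelled";
  branchClass: string; // e.g. "main" or "feature"
  runnerPool: string;
  totalSeconds: number;
}

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] as number;
  return sorted.length % 2 === 1
    ? upper
    : ((sorted[mid - 1] as number) + upper) / 2;
}

// Keep only successful runs from the same job, branch class, and runner pool.
function comparableDurations(
  runs: RunSummary[],
  job: string,
  branchClass: string,
  runnerPool: string
): number[] {
  return runs
    .filter(
      (r) =>
        r.outcome === "success" &&
        r.job === job &&
        r.branchClass === branchClass &&
        r.runnerPool === runnerPool
    )
    .map((r) => r.totalSeconds);
}

// Invented runs: two fast failures and one run from a different pool.
const runs: RunSummary[] = [
  { job: "ui-tests", outcome: "success", branchClass: "main", runnerPool: "linux-large", totalSeconds: 600 },
  { job: "ui-tests", outcome: "failure", branchClass: "main", runnerPool: "linux-large", totalSeconds: 90 },
  { job: "ui-tests", outcome: "success", branchClass: "main", runnerPool: "linux-large", totalSeconds: 620 },
  { job: "ui-tests", outcome: "failure", branchClass: "main", runnerPool: "linux-large", totalSeconds: 120 },
  { job: "ui-tests", outcome: "success", branchClass: "main", runnerPool: "linux-large", totalSeconds: 640 },
  { job: "ui-tests", outcome: "success", branchClass: "main", runnerPool: "linux-small", totalSeconds: 900 },
];

const cohort = comparableDurations(runs, "ui-tests", "main", "linux-large");
console.log(`cohort median: ${median(cohort)}s`);
console.log(`unfiltered median: ${median(runs.map((r) => r.totalSeconds))}s`);
```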

Step 3: Break down by job class

Different pipeline jobs usually slow down for different reasons.

  • Unit tests, often CPU-bound and sensitive to dependency restore
  • Integration tests, often environment-bound and database-heavy
  • UI tests, often dominated by browser startup, waits, and app readiness
  • Packaging jobs, often I/O-bound or artifact-heavy

If only UI jobs are slow, do not waste time tuning unit test parallelism. If every job is slow, inspect runner saturation and shared infrastructure first.

Step 4: Look at percentile drift

The p95 is often more important than the average in CI. A small number of slow runs can block merges, extend release windows, and create a perception that the entire system is unreliable.

When teams say “the pipeline feels slow,” they are often describing variance, not mean runtime.
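The following sketch shows how a mean can stay flat while the p95 drifts. The durations are invented, and `percentileDrift` is a hypothetical helper that compares a baseline window to a current window using a nearest-rank percentile.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile; returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Change in mean and p95 between two windows of run durations.
function percentileDrift(baselineRuns: number[], currentRuns: number[]) {
  return {
    meanDelta: mean(currentRuns) - mean(baselineRuns),
    p95Delta: percentile(currentRuns, 95) - percentile(baselineRuns, 95),
  };
}

// Invented build durations in minutes for two ten-run windows.
const lastMonth = [10, 10, 10, 10, 10, 10, 10, 10, 10, 12];
const thisMonth = [9, 9, 9, 9, 9, 9, 9, 9, 9, 21];

console.log(percentileDrift(lastMonth, thisMonth));
```

Both windows have the same mean, so `meanDelta` is 0, yet `p95Delta` is 9 minutes. A dashboard that plots only the average would report no change.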

Why tests can look healthy while builds get slower

A healthy test suite can coexist with a slow pipeline for several reasons.

The test logic is stable, but infrastructure is not

Maybe the tests run in the same amount of time, but runner images have grown, dependency downloads have increased, or external services have become slower. The test suite passes, but the build is still slower because the environment is doing more work around it.

Cached work stopped caching well

A subtle cache key change, dependency lockfile churn, or container image invalidation can cause repeated cold starts. Test reports will still look good because the tests themselves are fine, but the pipeline pays the cost every time.

Parallelism is hiding in the wrong place

Tests may be parallelized inside a job, while the pipeline remains serial across jobs. In that case, each individual test can be healthy, but the full workflow is still bottlenecked by staged dependencies or fan-in sequencing.

A few steps dominate the critical path

If your pipeline has 20 steps and 2 of them account for 80 percent of total time, you need critical-path analysis, not average step analysis. Shortening non-critical steps will not change build time much.
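Critical-path analysis amounts to finding the longest chain of dependent steps. The sketch below does this with a memoized depth-first search over a step graph; the `Step` shape, the `criticalPath` function, and the example pipeline are assumptions for illustration, and the code assumes the graph has no cycles.

```typescript
interface Step {
  name: string;
  seconds: number;
  dependsOn: string[]; // names of steps that must finish first
}

interface PathResult {
  totalSeconds: number;
  path: string[];
}

// Longest dependency chain through the pipeline. Assumes an acyclic graph.
function criticalPath(steps: Step[]): PathResult {
  const byName = new Map<string, Step>();
  for (const s of steps) byName.set(s.name, s);
  const memo = new Map<string, PathResult>();

  // Longest chain that ends with the named step, including its own duration.
  function longestTo(name: string): PathResult {
    const cached = memo.get(name);
    if (cached) return cached;
    const step = byName.get(name);
    if (!step) throw new Error(`unknown step: ${name}`);
    let best: PathResult | undefined;
    for (const dep of step.dependsOn) {
      const candidate = longestTo(dep);
      if (!best || candidate.totalSeconds > best.totalSeconds) best = candidate;
    }
    const result: PathResult = {
      totalSeconds: (best ? best.totalSeconds : 0) + step.seconds,
      path: (best ? best.path : []).concat(name),
    };
    memo.set(name, result);
    return result;
  }

  let overall: PathResult = { totalSeconds: 0, path: [] };
  for (const s of steps) {
    const r = longestTo(s.name);
    if (r.totalSeconds > overall.totalSeconds) overall = r;
  }
  return overall;
}

// Invented pipeline: lint, unit, and build run in parallel after checkout.
const steps: Step[] = [
  { name: "checkout", seconds: 40, dependsOn: [] },
  { name: "lint", seconds: 60, dependsOn: ["checkout"] },
  { name: "unit", seconds: 300, dependsOn: ["checkout"] },
  { name: "build", seconds: 200, dependsOn: ["checkout"] },
  { name: "e2e", seconds: 480, dependsOn: ["build"] },
  { name: "report", seconds: 70, dependsOn: ["lint", "unit", "e2e"] },
];

const critical = criticalPath(steps);
console.log(`${critical.path.join(" -> ")} = ${critical.totalSeconds}s`);
```

In this example the critical path is checkout, build, e2e, report at 790 seconds. The 300-second unit step is not on it, so making unit tests faster would not shorten the build at all.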

External calls are slowing the build

Package registries, container registries, artifact stores, license checks, and internal APIs can all add latency. The tests still pass, but the surrounding ecosystem is slower than it used to be.

Instrumentation ideas that actually help

You do not need a full observability platform to start measuring. You do need consistent timestamps and enough metadata to group runs meaningfully.

Add timing to each phase in CI logs

For script-driven pipelines, print timestamps before and after major steps.

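# Record epoch seconds at each phase boundary and log a readable timestamp.
# Note: `date -Is` is GNU date syntax (ISO 8601, seconds precision).
start=$(date +%s)
echo "[checkout] $(date -Is)"
git fetch --depth=1 origin main
checkout_done=$(date +%s)
echo "[deps] $(date -Is)"
npm ci
install_done=$(date +%s)
# Emit key=value lines that a log parser can pick up later.
echo "checkout_seconds=$((checkout_done - start))"
echo "deps_seconds=$((install_done - checkout_done))"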

This is crude, but it is enough to separate checkout from dependency work in a way that can be graphed later.

Emit test timing per spec or file

For browser or integration tests, collect per-spec duration so slow paths become visible.

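import { test } from '@playwright/test';

// Assumes `baseURL` is set in the Playwright config so the relative path resolves.
test('checkout flow', async ({ page }) => {
  const started = Date.now();
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Place order' }).click();
  // Log a greppable key=value line with the elapsed wall-clock time.
  console.log(`checkout_flow_ms=${Date.now() - started}`);
});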

If a specific test keeps drifting, the problem may be an app change, a new wait condition, or a shared environment dependency.

Track cache hit rate

A cache that exists but misses often is just extra complexity. Record cache hits and misses per job type, especially for dependencies, browser binaries, and build artifacts.
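If each job logs whether a cache lookup hit or missed, the hit rate per job type and cache is a simple aggregation. The `CacheLookup` shape, the `cacheHitRates` function, and the sample events below are hypothetical.

```typescript
interface CacheLookup {
  jobType: string; // e.g. "unit" or "ui"
  cache: string;   // e.g. "npm" or "browser-binaries"
  hit: boolean;
}

interface CacheSummary {
  key: string; // "<jobType>/<cache>"
  hits: number;
  misses: number;
  hitRate: number; // hits / (hits + misses)
}

// Aggregate lookups per job type and cache, worst hit rate first.
function cacheHitRates(lookups: CacheLookup[]): CacheSummary[] {
  const byKey = new Map<string, CacheSummary>();
  for (const l of lookups) {
    const key = `${l.jobType}/${l.cache}`;
    const row = byKey.get(key) ?? { key, hits: 0, misses: 0, hitRate: 0 };
    if (l.hit) row.hits += 1;
    else row.misses += 1;
    row.hitRate = row.hits / (row.hits + row.misses);
    byKey.set(key, row);
  }
  const rows: CacheSummary[] = [];
  byKey.forEach((row) => rows.push(row));
  return rows.sort((a, b) => a.hitRate - b.hitRate);
}

// Invented events: the npm cache mostly works, the browser cache never hits.
const lookups: CacheLookup[] = [
  { jobType: "unit", cache: "npm", hit: true },
  { jobType: "unit", cache: "npm", hit: true },
  { jobType: "unit", cache: "npm", hit: false },
  { jobType: "unit", cache: "npm", hit: true },
  { jobType: "ui", cache: "browser-binaries", hit: false },
  { jobType: "ui", cache: "browser-binaries", hit: false },
];

for (const row of cacheHitRates(lookups)) {
  console.log(`${row.key}: ${(row.hitRate * 100).toFixed(0)}% hit rate`);
}
```

Sorting by lowest hit rate first puts the caches that are costing the most at the top of the report.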

Tag runs with environment metadata

When comparing timing data, capture:

  • branch name
  • commit SHA
  • runner type
  • OS image version
  • container image digest
  • dependency lockfile hash
  • test suite name

Without metadata, you cannot tell whether the slowdown came from code, infra, or job placement.

How to tell the difference between test slowness and pipeline slowness

A useful separation is this:

  • Test slowness means individual specs, suites, or test phases are taking longer once execution begins.
  • Pipeline slowness means everything around tests, before and after execution, is taking longer.

Indicators of test slowness

  • More time inside assertions, waits, retries, or API polling
  • Longer per-spec durations
  • Browser sessions taking longer because the app under test is slower
  • Higher failure rates that trigger automatic reruns

Indicators of pipeline slowness

  • Longer time in queue
  • Slower checkout or dependency install
  • More time waiting for container or VM readiness
  • Longer artifact upload or report generation
  • No obvious change in per-test runtime, but a higher total build time

If you only look at final status and test counts, you can miss the entire class of pipeline problems.

Common bottlenecks and what they usually mean

Long queue times

Likely causes:

  • Not enough runners
  • Jobs are larger than expected
  • Shared runner pool with mixed priorities
  • Autoscaling lag

What to check next:

  • Capacity trend over time
  • Runner start latency
  • Job scheduling rules
  • Heavy jobs that should be isolated

Slow dependency restore

Likely causes:

  • Cold caches
  • Large lockfiles or dependency trees
  • Registry latency
  • Too many language ecosystems in one job

What to check next:

  • Cache key design
  • Image pre-baking
  • Proxy or mirror performance
  • Dependency pruning

Slow browser test startup

Likely causes:

  • Browser binary install each run
  • Slow container image pulls
  • Improperly warmed test environments
  • App readiness waits that are longer than necessary

What to check next:

  • Reusable base images
  • Health checks
  • Headless browser startup logs
  • Network and DNS delays

Slow artifact upload

Likely causes:

  • Large logs or coverage files
  • Compression overhead
  • Slow object storage
  • Excessive artifacts from every job

What to check next:

  • Artifact retention policy
  • File sizes
  • Whether all artifacts are necessary on every run

A lightweight CI runtime analysis template

If you need a practical checklist, use this structure for each job type:

  1. Queue time
    • median
    • p95
    • max
  2. Preparation time
    • checkout
    • dependency install
    • environment startup
  3. Execution time
    • total suite duration
    • longest specs
    • retry time
  4. Post-processing time
    • artifacts
    • reports
    • cleanup
  5. Variability
    • standard deviation
    • p95 minus median
    • run-to-run drift

Once you have these numbers, annotate them with runner type and branch class. This is enough to identify most pipeline bottlenecks without building a complex observability stack first.

When to optimize the tests, and when not to

You should optimize test execution time when:

  • The execution phase is the dominant contributor to build time
  • A few long tests dominate the critical path
  • Parallel execution can shorten feedback without adding too much complexity
  • Retry logic is masking real test inefficiency

You should not start with test optimization when:

  • Queue time is growing faster than execution time
  • Environment setup is the majority of the runtime
  • Artifacts and cleanup are the biggest post-test cost
  • The same tests are fast locally but slow only in CI

A common mistake is to rewrite tests to save 30 seconds when the pipeline spends 8 minutes waiting for a runner.

A sample GitHub Actions timing pattern

If your CI system is flexible enough, log the major phase boundaries directly in the workflow.

name: ci
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Mark start
        run: echo "started_at=$(date -Is)"
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test

This example is intentionally simple. The point is not the exact syntax but the habit of instrumenting phases so that duration is visible outside the job summary.

Building a decision tree for slow builds

When a build slows down, ask these questions in order:

  1. Is the slowdown in queue time or execution time?
  2. If queue time, is it tied to a runner pool, time window, or branch policy?
  3. If execution time, is the added time before tests, during tests, or after tests?
  4. If during tests, is it a specific suite, specific file, or broad drift?
  5. If before or after tests, what changed in dependencies, images, artifacts, or cleanup?

This order matters because it keeps you from optimizing the wrong layer first.
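The same questions can be encoded as a small function so that triage follows the same order every time. This is a sketch under assumptions: the `SlowdownBreakdown` inputs, the `nextStep` function, its return strings, and the 0.5 cut-off for "one suite dominates" are all invented for illustration.

```typescript
interface SlowdownBreakdown {
  queueDelta: number;        // added seconds waiting for a runner
  beforeTestsDelta: number;  // added seconds in checkout, dependencies, setup
  duringTestsDelta: number;  // added seconds while tests run
  afterTestsDelta: number;   // added seconds in artifacts, reports, cleanup
  largestSuiteShare: number; // fraction (0..1) of duringTestsDelta from one suite
}

function nextStep(b: SlowdownBreakdown): string {
  const executionDelta =
    b.beforeTestsDelta + b.duringTestsDelta + b.afterTestsDelta;

  // Questions 1 and 2: queue time versus execution time.
  if (b.queueDelta >= executionDelta) {
    return "queue: check runner pool, time window, and branch policy";
  }

  // Question 3: is the added time before, during, or after tests?
  const largest = Math.max(
    b.beforeTestsDelta,
    b.duringTestsDelta,
    b.afterTestsDelta
  );

  // Question 4: a specific suite or file, or broad drift?
  if (largest === b.duringTestsDelta) {
    return b.largestSuiteShare >= 0.5 // assumed cut-off
      ? "tests: inspect the dominant suite or file"
      : "tests: broad drift across suites";
  }

  // Question 5: what changed around the tests?
  return largest === b.beforeTestsDelta
    ? "before tests: check dependencies and images"
    : "after tests: check artifacts and cleanup";
}

console.log(
  nextStep({
    queueDelta: 220,
    beforeTestsDelta: 10,
    duringTestsDelta: 5,
    afterTestsDelta: 5,
    largestSuiteShare: 0,
  })
);
```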

If you cannot separate queue time from runtime, you do not have a CI performance problem yet; you have a measurement problem.

Metrics that are useful for managers and SREs

Managers need trends and operational risk. SREs need signal, saturation, and stability. Both groups benefit from the same core numbers, but they should read them differently.

For engineering managers

Focus on:

  • median and p95 end-to-end pipeline duration
  • queue time by team or repo
  • stability of critical path time
  • regression windows after platform changes
  • build completion latency during release windows

These help answer whether the delivery process is becoming less predictable.

For SREs and DevOps engineers

Focus on:

  • runner utilization
  • autoscaling lag
  • I/O wait and memory pressure
  • cache hit rate
  • job distribution across pools
  • external dependency latency

These help find whether the platform is overloaded, misconfigured, or sensitive to environmental changes.

For QA leads

Focus on:

  • per-suite timing drift
  • slow specs and unstable waits
  • environment readiness
  • test retries
  • how often UI or integration tests dominate the critical path

These help distinguish test design issues from infrastructure issues.

A practical closing rule

If your pipeline is slow but the tests still look healthy, assume the problem is not in the pass/fail signal. The useful work is to measure where time accumulates, compare those measurements over consistent cohorts, and optimize the slowest slice of the critical path.

Most teams do not need more opinions about why CI is slow. They need a timing model, a few phase-level metrics, and a way to compare builds that are actually comparable. Once you can separate queue time, setup overhead, test execution time, and environment delays, the real bottleneck usually becomes obvious.

That is the core of effective CI runtime analysis: not chasing every slow build, but learning which kind of slowness you are actually seeing.
