June 24, 2026
How to Debug Browser Tests That Pass Locally but Fail in Headless CI
A practical troubleshooting guide for browser tests that pass locally but fail in headless CI, with a decision tree, common root causes, and fixes for timing, viewport, and environment mismatches.
Browser tests that pass on a developer laptop but fail in headless CI are one of the most common, and most frustrating, forms of test flakiness. The code is the same, the test data is the same, and yet the result changes the moment the browser runs without a visible UI inside a container or CI runner.
The gap usually comes from subtle differences in timing, rendering, viewport size, font availability, networking, authentication state, or how the test runner waits for elements. When a failure only appears in CI, the instinct is often to retry until it goes green. That hides the symptom, not the cause.
This guide gives you a practical way to triage those failures quickly. It focuses on the most common reasons browser tests fail in headless CI, how to isolate the mismatch between local and CI, and how to decide whether the fix belongs in the test, the application, or the pipeline.
If a test only passes when a human is watching it, that is usually a signal that the test depends on visual or timing behavior it never explicitly modeled.
Start with a fast decision tree
When a test passes locally but fails in CI, do not begin by rewriting selectors or adding random waits. Start with a narrow triage path.
1. Is the failure deterministic in CI?
Run the same test several times in the same CI environment.
- Fails every time: this is likely an environment or setup mismatch.
- Fails intermittently: this is more likely timing, concurrency, or state leakage.
- Only fails on one branch or one runner type: inspect runner differences first.
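To make this first question concrete, the outcomes of repeated CI runs can be classified mechanically. The following is a minimal sketch; `classifyRuns` and its verdict labels are hypothetical names chosen for illustration, not part of any test runner.

```typescript
// Hypothetical helper: classify repeated CI runs of a single test.
type RunOutcome = 'pass' | 'fail';
type Verdict = 'stable-pass' | 'deterministic-failure' | 'intermittent-failure';

function classifyRuns(outcomes: RunOutcome[]): Verdict {
  if (outcomes.length === 0) {
    throw new Error('Need at least one run to classify');
  }
  const failures = outcomes.filter(o => o === 'fail').length;
  if (failures === 0) return 'stable-pass';
  // Every run failed: look at environment or setup first.
  if (failures === outcomes.length) return 'deterministic-failure';
  // Mixed results: look at timing, concurrency, or leaked state first.
  return 'intermittent-failure';
}

console.log(classifyRuns(['fail', 'fail', 'fail']));
console.log(classifyRuns(['pass', 'fail', 'pass']));
```

With Playwright, the repeated runs themselves could come from something like `npx playwright test --repeat-each=5`; how the outcomes are collected into an array is left open here.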
2. Does the failure reproduce in headless mode locally?
Run the browser locally with the same headless setting used in CI.
- If it reproduces, you have a true headless issue, often related to rendering, viewport, or timing.
- If it does not reproduce, compare the rest of the environment, especially browser version, OS, fonts, network, and container settings.
3. Does the failure disappear when you slow the test down?
Temporarily add explicit instrumentation, screenshots, and logs. Do not add blanket sleeps as a permanent fix.
- If slowing the test helps, the problem is usually a wait condition, animation, async rendering, or data readiness issue.
- If slowing does not help, focus on layout, auth, environment parity, or hidden browser differences.
4. Is the DOM actually ready, or only visually present?
Many modern apps render content in stages. A button may be present in the DOM but still disabled, offscreen, overlapped, or replaced by a skeleton loader.
5. Does the test depend on the exact viewport or pixel layout?
If an element moves, wraps, collapses, or becomes hidden in CI, the problem may not be timing at all. It may be the browser rendering at a different size than your local machine.
The highest-probability causes of local versus CI mismatch
Most failures fit into a small set of categories. You can usually debug them by asking which layer differs between local and CI.
Timing issues
Timing is the most common culprit. The test may assert too early, before the page has finished loading data, before a React effect has settled, or before the browser has completed a repaint.
Symptoms include:
- "element not found" even though the element appears in screenshots
- "click intercepted" or "element is not clickable"
- assertions that pass only after retries
- race conditions around navigation, API responses, or websocket-driven UI updates
The fix is usually to wait for the right condition, not just a fixed delay. For example, wait for a network response, a visible text change, or a specific state in the DOM.
Playwright example
await page.goto('https://app.example.test/dashboard');
// Register the wait before clicking so a fast response cannot be missed.
const summaryResponse = page.waitForResponse(
  response => response.url().includes('/api/summary') && response.status() === 200
);
await page.getByRole('button', { name: 'Refresh' }).click();
await summaryResponse;
await expect(page.getByText('Summary ready')).toBeVisible();
The important part is that the test waits for the application outcome, not just the click action.
Viewport differences
Local browsers often run at a larger window size than CI, while headless browsers may default to a smaller viewport. Responsive layouts can change meaningfully across breakpoints.
Symptoms include:
- menu items collapse into a hamburger menu
- buttons move below the fold
- text wraps and shifts neighboring controls
- sticky headers cover targets after scrolling
- locators based on position break when the layout changes
Always make the viewport explicit in the test configuration, and make it match what the test expects.
Playwright example
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    viewport: { width: 1440, height: 900 },
    headless: true
  }
});
If your application is responsive, do not treat one viewport as universal truth. Instead, write tests for each meaningful breakpoint.
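One way to cover several breakpoints is to generate one runner project per viewport. The sketch below builds entries in the shape that Playwright's `projects` config option accepts; the breakpoint names and sizes are assumptions for illustration, not values from any real application.

```typescript
// Hypothetical breakpoint table; names and dimensions are illustrative assumptions.
interface Breakpoint {
  name: string;
  width: number;
  height: number;
}

const breakpoints: Breakpoint[] = [
  { name: 'mobile', width: 390, height: 844 },
  { name: 'tablet', width: 820, height: 1180 },
  { name: 'desktop', width: 1440, height: 900 },
];

// Build one project entry per breakpoint, each with an explicit viewport.
function buildProjects(points: Breakpoint[]) {
  return points.map(p => ({
    name: p.name,
    use: { viewport: { width: p.width, height: p.height } },
  }));
}

const projects = buildProjects(breakpoints);
console.log(projects.map(p => `${p.name}: ${p.use.viewport.width}x${p.use.viewport.height}`));
```

The resulting array could be passed as `projects` in the runner config, and a single breakpoint could then be run with `npx playwright test --project=mobile`.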
Environment parity issues
Environment parity means the local and CI environments behave closely enough that test results are comparable. When parity breaks, tests start failing for reasons unrelated to product behavior.
Common parity gaps include:
- different browser versions
- different operating systems or window managers
- missing fonts or locale packages
- different time zones and system clocks
- container memory limits
- GPU or sandbox restrictions
- proxy, DNS, or certificate differences
A locally installed browser on macOS is not the same execution environment as Chromium inside a Linux container. When tests depend on rendering precision or browser internals, those differences matter.
State leakage
A test may pass alone and fail in a suite because state leaks from a previous test, browser context, or shared backend fixture.
Examples include:
- reused cookies or localStorage
- stale database rows
- feature flags changed by a previous test
- shared test accounts with conflicting sessions
- backend data seeded inconsistently across runs
If a test only fails in the full suite, run it alone, then add its neighboring tests back a few at a time. If the result changes depending on which tests ran first, suspect state leakage or ordering dependencies.
Selector fragility
Selectors that depend on layout, exact text, or CSS structure are more likely to fail when the UI is rendered under different conditions.
Weak patterns include:
- deep CSS selectors tied to DOM structure
- XPath that matches the third list item or nth button
- text selectors that change with localization or feature flags
- locating by visual position instead of semantic role
Prefer stable locators based on roles, labels, test IDs, or accessible names. Test automation becomes much more maintainable when locators follow product semantics instead of page structure.
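The weak patterns listed above can be spotted with a rough heuristic. This is only a sketch: `findBrittleness` and its three regular expressions are hypothetical, and they will miss cases or flag selectors that are fine in context.

```typescript
// Hypothetical lint helper: flag selector patterns that tend to break when layout changes.
const brittlePatterns: { label: string; pattern: RegExp }[] = [
  { label: 'positional pseudo-class', pattern: /:nth-(child|of-type)\(/ },
  { label: 'indexed XPath step', pattern: /\[\d+\]/ },
  // Three or more child combinators in a row suggest a selector tied to DOM depth.
  { label: 'deep descendant chain', pattern: /(\s*>\s*[^>]+){3,}/ },
];

function findBrittleness(selector: string): string[] {
  return brittlePatterns
    .filter(p => p.pattern.test(selector))
    .map(p => p.label);
}

console.log(findBrittleness('div.list > ul > li:nth-child(3) > button'));
console.log(findBrittleness('//ul/li[3]/button'));
console.log(findBrittleness('[data-testid="save"]'));
```

A check like this is better suited to a review aid than a hard gate, since some positional selectors are deliberate.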
A practical triage workflow
Use this sequence to avoid guessing.
Step 1, capture evidence in CI
When a test fails in CI, collect as much artifact data as possible:
- screenshot at failure time
- DOM snapshot or HTML dump
- browser console logs
- network failures
- video, if your runner supports it
- trace or HAR files when available
A screenshot often answers the first question, which is whether the failure is a missing element, a layout shift, or a completely different page state.
Playwright example
import { test } from '@playwright/test';

test('checkout flow', async ({ page }) => {
  await page.goto('/checkout');
  // Save a full-page screenshot where a CI job can collect it as an artifact.
  await page.screenshot({ path: 'artifacts/checkout.png', fullPage: true });
});
Step 2, reproduce locally in CI-like mode
Match the CI environment as closely as practical.
- run headless
- use the same browser version
- use the same viewport
- use a Linux container if CI is Linux
- use the same test data and config
A local GUI browser is useful for interactive debugging, but it can hide issues like missing fonts, disabled GPU paths, or timing differences in paint and layout.
Step 3, compare the browser session, not just the code
Ask what changed between local and CI:
- browser flags
- environment variables
- secrets or tokens
- network latency
- backend base URL
- locale and timezone
- browser storage state
When needed, print the runtime context early in the test.
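console.log({
  viewport: page.viewportSize(),
  userAgent: await page.evaluate(() => navigator.userAgent),
  // Read the timezone inside the page; the Node process running the test may use a different one.
  timezone: await page.evaluate(() => Intl.DateTimeFormat().resolvedOptions().timeZone)
});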
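Once both environments print a snapshot like this, the two can be compared mechanically instead of by eye. The following sketch assumes a flat snapshot object; `diffSnapshots`, the field names, and the sample values are all hypothetical.

```typescript
// Hypothetical snapshot shape: a flat map of runtime facts captured in each environment.
type EnvSnapshot = Record<string, string | number | boolean>;

// Return one line per key whose value differs between local and CI.
function diffSnapshots(local: EnvSnapshot, ci: EnvSnapshot): string[] {
  const keys = Object.keys({ ...local, ...ci }).sort();
  const differences: string[] = [];
  for (const key of keys) {
    if (local[key] !== ci[key]) {
      differences.push(`${key}: local=${String(local[key])} ci=${String(ci[key])}`);
    }
  }
  return differences;
}

// Sample values are made up for illustration.
const localRun: EnvSnapshot = { headless: false, viewport: '1728x1000', timezone: 'Europe/Berlin', browserMajor: 126 };
const ciRun: EnvSnapshot = { headless: true, viewport: '1280x720', timezone: 'UTC', browserMajor: 126 };

console.log(diffSnapshots(localRun, ciRun));
```

Keys that match, such as the browser major version in this sample, drop out of the report, which narrows the search to the differences that remain.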
Step 4, isolate the failure surface
Disable unrelated steps until the failure becomes small and obvious.
For example:
- remove test dependencies on login by using a pre-authenticated state
- replace live API calls with a controlled test fixture
- run one spec file instead of the full suite
- run one browser instead of a matrix
- bypass animations or nonessential visual transitions temporarily
Step 5, decide whether the bug is in the test or the app
Not every CI-only failure is a bad test. Sometimes the application really does break in headless conditions because of timing or layout assumptions.
Ask these questions:
- Would a real user on a small screen experience this issue?
- Is the app relying on a fixed screen size to function?
- Is the test waiting on a signal the app never guarantees?
- Is the failure caused by a missing accessibility state, such as a disabled button that is still clickable in the DOM?
If the issue reflects a real user path, fix the product behavior. If the issue reflects an assumption the test made without modeling it explicitly, fix the test.
Common failure patterns and what they usually mean
1. The element is present, but the click fails
This often means the element is covered, disabled, offscreen, or still animating into place. Headless runs can be faster than visible runs, which means a click may happen during a transition window.
What to check:
- is there a loading overlay?
- is the target obscured by a sticky header?
- does the element move after the page scrolls?
- is the click happening before the button becomes enabled?
Use a wait for visibility and enabled state, not just presence in the DOM.
2. Assertions pass locally, fail in CI because text wraps differently
This is usually a viewport or font issue. Different available fonts can change line breaks, which changes element height and positions.
What to check:
- default browser fonts in the container
- font fallback behavior
- device scale factor
- line-height and width constraints
Avoid tests that depend on exact text placement unless that is the thing you are actually validating.
3. Login works locally, but CI gets redirected or logged out
This often points to cookie, session, or certificate differences.
What to check:
- secure cookie flags and HTTPS configuration
- domain and path mismatch for cookies
- cross-site auth behavior in headless mode
- token expiry due to slower CI startup
- third-party cookie restrictions in the browser version used by CI
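The first two checks can be approximated in code by comparing a cookie's attributes against the URL that CI actually loads. This is a simplified sketch of the matching rules, not a full implementation of browser cookie handling; `cookieProblems`, the cookie values, and the CI host name are hypothetical, and browsers may treat localhost as an exception to the HTTPS rule.

```typescript
// Cookie fields follow the common shape (name, domain, path, secure); values are made up.
interface CookieInfo {
  name: string;
  domain: string;
  path: string;
  secure: boolean;
}

function cookieProblems(cookie: CookieInfo, pageUrl: string): string[] {
  const url = new URL(pageUrl);
  const problems: string[] = [];

  if (cookie.secure && url.protocol !== 'https:') {
    problems.push('secure cookie on a non-HTTPS URL');
  }

  // A leading dot marks a domain cookie that also covers subdomains.
  const host = url.hostname;
  const isDomainCookie = cookie.domain.charAt(0) === '.';
  const bare = isDomainCookie ? cookie.domain.slice(1) : cookie.domain;
  const suffix = '.' + bare;
  const isSubdomain = host.length > suffix.length && host.slice(-suffix.length) === suffix;
  if (host !== bare && !(isDomainCookie && isSubdomain)) {
    problems.push(`cookie domain ${cookie.domain} does not cover ${host}`);
  }

  // The cookie path must be a whole-segment prefix of the request path.
  const p = url.pathname;
  const pathOk = p.indexOf(cookie.path) === 0 &&
    (cookie.path.slice(-1) === '/' || p.length === cookie.path.length || p.charAt(cookie.path.length) === '/');
  if (!pathOk) {
    problems.push(`cookie path ${cookie.path} does not cover ${p}`);
  }
  return problems;
}

const session: CookieInfo = { name: 'session', domain: 'app.example.test', path: '/', secure: true };
console.log(cookieProblems(session, 'https://app.example.test/dashboard'));
console.log(cookieProblems(session, 'http://ci-runner.example.test:3000/dashboard'));
```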
4. The page loads, but data is missing
This usually means the app is not waiting for the backend the same way in CI. It can also indicate network access problems, incorrect test data, or failed requests hidden by a retry mechanism.
What to check:
- API calls in the network log
- response codes and payloads
- CORS and proxy behavior
- test environment seeding
- cached responses or stale service worker data
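Failures hidden by a retry mechanism are easy to miss because the final request succeeds. A small sketch of how a recorded network log could be scanned for them follows; the `RequestRecord` shape and `urlsRecoveredByRetry` are hypothetical, and how the records are collected depends on the runner.

```typescript
// Hypothetical log entry; status is null when the request never received a response.
interface RequestRecord {
  url: string;
  status: number | null;
}

// Return URLs that failed at least once and also succeeded at least once.
function urlsRecoveredByRetry(records: RequestRecord[]): string[] {
  const seen: Record<string, { failed: boolean; succeeded: boolean }> = {};
  for (const r of records) {
    const entry = seen[r.url] || (seen[r.url] = { failed: false, succeeded: false });
    if (r.status !== null && r.status < 400) {
      entry.succeeded = true;
    } else {
      entry.failed = true;
    }
  }
  return Object.keys(seen).filter(url => seen[url].failed && seen[url].succeeded);
}

// Sample log, made up for illustration.
const networkLog: RequestRecord[] = [
  { url: '/api/summary', status: 503 },
  { url: '/api/summary', status: 200 },
  { url: '/api/cart', status: 200 },
];

console.log(urlsRecoveredByRetry(networkLog));
```

A URL that appears in this list is a candidate for the "works locally, slow or flaky in CI" category even when the test's final assertion passes.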
5. The suite passes alone, fails in parallel
This is usually shared state or resource contention.
What to check:
- shared user accounts
- fixed filenames in upload tests
- database rows reused by multiple workers
- test data collisions
- backend rate limits
If you parallelize browser tests, make the data strategy parallel-safe before optimizing the runtime.
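A simple way to avoid shared-account collisions is to give each parallel worker its own account. The sketch below assumes the runner exposes a zero-based index per parallel worker (Playwright provides one as `testInfo.parallelIndex`); the account pool and `accountForWorker` are hypothetical.

```typescript
// Hypothetical account pool; the addresses are placeholders, not real accounts.
const accountPool = [
  'qa-user-0@example.test',
  'qa-user-1@example.test',
  'qa-user-2@example.test',
];

// Map each parallel worker to a dedicated account and fail loudly when the pool is too small.
function accountForWorker(pool: string[], parallelIndex: number): string {
  if (parallelIndex < 0 || parallelIndex >= pool.length) {
    throw new Error(`No dedicated account for worker ${parallelIndex}; pool has ${pool.length}`);
  }
  return pool[parallelIndex];
}

console.log(accountForWorker(accountPool, 1));
```

Throwing when the pool runs out is a deliberate choice here: silently reusing an account would reintroduce the session conflicts the pool exists to prevent.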
Make the headless environment less mysterious
The more observable your CI browser session is, the faster you can debug it.
Turn on tracing and screenshots
For modern browser runners, traces often outperform raw logs because they show DOM snapshots, actions, timing, and network activity together.
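As a sketch of what this can look like with Playwright, the `use` section below keeps traces, screenshots, and videos only for failing tests. The object is written without imports so it stands alone; in a real `playwright.config.ts` it would be the default export, optionally wrapped in `defineConfig`.

```typescript
// Sketch of Playwright `use` options that keep artifacts only when a test fails.
// In playwright.config.ts this object would be the file's default export.
const config = {
  use: {
    trace: 'retain-on-failure',     // record a trace, keep it only for failed tests
    screenshot: 'only-on-failure',  // capture a screenshot when a test fails
    video: 'retain-on-failure',     // record video, keep it only for failed tests
  },
};

console.log(config.use);
```

A saved trace can then be opened with `npx playwright show-trace` followed by the path to the trace archive.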
Capture browser console errors
Console errors can reveal failed script loads, missing assets, or runtime exceptions that do not fail the test directly.
Selenium Python example
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
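options = Options()
options.add_argument('--headless=new')
options.add_argument('--window-size=1440,900')
# Ask Chrome to record every console level rather than relying on the default.
options.set_capability('goog:loggingPrefs', {'browser': 'ALL'})

driver = webdriver.Chrome(options=options)
try:
    # Load a page first so there is console output to collect.
    driver.get('https://app.example.test/dashboard')
    for entry in driver.get_log('browser'):
        print(entry)
finally:
    driver.quit()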
Freeze time when the app depends on dates
Tests can fail when CI runs in a different timezone or at a different date boundary.
Common pain points include:
- date pickers that depend on local timezone
- invoice or billing cutoffs
- “today” labels that change at midnight
- relative time strings, such as “in 2 days”
Set timezone explicitly in CI and test data, or mock time when the behavior is not the feature under test.
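The date-boundary problem is easy to demonstrate: the same instant falls on different calendar dates depending on the timezone. The sketch below uses only the standard `toLocaleDateString` API; `calendarDateIn` and the sample instant are chosen for illustration.

```typescript
// Format an instant as YYYY-MM-DD in an explicit IANA timezone.
function calendarDateIn(instant: Date, timeZone: string): string {
  // en-US with 2-digit month and day yields MM/DD/YYYY, which is reordered below.
  const formatted = instant.toLocaleDateString('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  });
  const [month, day, year] = formatted.split('/');
  return `${year}-${month}-${day}`;
}

// A late-evening UTC instant: still June 24 in UTC, already June 25 in Tokyo.
const instant = new Date('2026-06-24T23:30:00Z');
console.log(calendarDateIn(instant, 'UTC'));
console.log(calendarDateIn(instant, 'Asia/Tokyo'));
```

A "today" label computed from this instant would differ between a UTC runner and a laptop in Tokyo. With Playwright, the browser-side timezone can be pinned through the `timezoneId` option, alongside the `TZ` variable for the runner process.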
Make network behavior explicit
If the page depends on live APIs, intercept or stub them where appropriate, or at least assert on the responses.
await page.route('**/api/cart', async route => {
  await route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({ items: [] })
  });
});
This helps distinguish application failures from external service instability.
Fix the root causes, not just the symptoms
Replace sleeps with condition-based waits
A fixed sleep can make a test look stable while increasing overall runtime and masking real races.
Use waits for specific state transitions:
- element is visible
- button is enabled
- request completes
- spinner disappears
- text changes to the expected value
This is especially important in continuous integration, where machine performance varies and timing is less predictable than on a developer machine.
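For conditions the test runner does not cover out of the box, a condition-based wait can be written as a small polling loop with a deadline. This is a generic sketch; `waitUntil` and its option names are hypothetical, and runners such as Playwright already ship equivalents (web-first assertions and `expect.poll`) that are preferable when they fit.

```typescript
interface WaitOptions {
  timeoutMs: number;
  intervalMs: number;
}

// Poll a condition until it returns true or the timeout elapses.
// Resolves with the number of checks performed; rejects on timeout.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  options: WaitOptions
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  while (Date.now() - start < options.timeoutMs) {
    attempts += 1;
    if (await condition()) return attempts;
    await new Promise<void>(resolve => setTimeout(resolve, options.intervalMs));
  }
  throw new Error(`Condition not met within ${options.timeoutMs} ms`);
}

// Demo: a flag that flips after 50 ms stands in for "spinner disappeared".
let spinnerGone = false;
setTimeout(() => { spinnerGone = true; }, 50);

waitUntil(() => spinnerGone, { timeoutMs: 1000, intervalMs: 10 })
  .then(attempts => console.log(`condition met after ${attempts} checks`));
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails with a clear message when it never does.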
Standardize the browser matrix
Do not let local and CI drift across browser versions without noticing.
Good practices include:
- pin browser and driver versions where practical
- use the same container image for local debugging and CI runs
- record the exact runner image in test logs
- verify the browser major version in a startup check
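The startup check in the last item can be a few lines of code. The sketch below parses a dotted version string and compares the major component; `assertBrowserMajor` is a hypothetical helper and the version number is only an example. With Playwright, the string could come from `browser.version()`.

```typescript
// Extract the major component from a dotted version string such as "126.0.6478.36".
function majorVersion(version: string): number {
  const major = parseInt(version.split('.')[0], 10);
  if (isNaN(major)) {
    throw new Error(`Cannot parse browser version: ${version}`);
  }
  return major;
}

// Fail fast when the running browser is not the major version the suite was pinned to.
function assertBrowserMajor(actualVersion: string, expectedMajor: number): void {
  const actual = majorVersion(actualVersion);
  if (actual !== expectedMajor) {
    throw new Error(`Expected browser major ${expectedMajor}, got ${actual} (${actualVersion})`);
  }
}

// Example version string, made up for illustration.
assertBrowserMajor('126.0.6478.36', 126);
console.log(majorVersion('126.0.6478.36'));
```

Running this once before the suite turns silent version drift into an immediate, readable failure.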
Use stable, semantic locators
Prefer locators that reflect user-facing intent.
- role plus accessible name
- label text
- data-testid for non-user-visible controls
- form control associations
This makes your tests less sensitive to layout and CSS changes. It also encourages better accessibility, which usually improves automation stability too.
Control test data carefully
A CI-only failure often comes from data assumptions rather than browser behavior.
Questions to answer:
- Is the account already used by another test?
- Are there enough fixtures for all parallel workers?
- Does the setup create a unique identifier per run?
- Are cleanup steps guaranteed to execute after failure?
If the test creates records, make the created data unique and easy to trace back to the run ID.
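One possible naming scheme embeds the run ID and worker index in every record the test creates. The sketch below is hypothetical: `createNamer` and the sample run ID are illustrative, and in GitHub Actions the run ID could be taken from the `GITHUB_RUN_ID` environment variable.

```typescript
// Return a generator of record names that are unique per run, per worker, and per call.
function createNamer(prefix: string, runId: string, parallelIndex: number): () => string {
  let sequence = 0;
  return () => {
    sequence += 1;
    return `${prefix}-${runId}-w${parallelIndex}-${sequence}`;
  };
}

// Sample run ID and worker index, made up for illustration.
const nextOrderName = createNamer('order', 'run-8841', 2);
console.log(nextOrderName());
console.log(nextOrderName());
```

Because the run ID is part of the name, leftover rows from a crashed run can be found and cleaned up by prefix.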
Audit animations and transitions
Animations are a frequent source of timing drift. A button that looks settled in a local visible browser may still be mid-transition in headless mode when the test clicks it.
If animations are not part of what you are testing, reduce or disable them in the test environment.
* {
  transition-duration: 0ms !important;
  animation-duration: 0ms !important;
}
Use this carefully. Do not mask a true product issue that users will feel. For example, if a transition causes an actual interaction bug, you should fix the UI behavior, not just suppress the animation in tests.
A debugging checklist you can reuse
When a browser test fails only in headless CI, check these in order:
- Reproduce in local headless mode.
- Confirm the browser version matches CI.
- Set the viewport explicitly.
- Capture screenshots, console logs, and traces.
- Check whether the test waits for the right application state.
- Inspect for overlays, animations, and responsive layout changes.
- Compare auth, cookies, and storage state.
- Verify the backend data seed and test user isolation.
- Look for hidden network failures or API retries.
- Run the test in isolation and then in the full suite.
The fastest path to a fix is usually to identify which layer changed (browser, app, data, or runner) before you change the test code.
When to change the test, and when to change the app
This is the decision that saves the most time in the long run.
Change the test when:
- the locator is brittle or tied to DOM structure
- the test assumes a visible state without waiting for it
- the test depends on arbitrary timing instead of a real event
- the test is using the wrong viewport for the scenario
- the test shares mutable state with other tests
Change the app when:
- the UI is not accessible or semantically testable
- the product depends on a browser-specific quirk
- the page breaks at a realistic viewport size
- a loading state allows interaction before readiness
- the app does not expose stable cues for automation, such as disabled states or meaningful labels
Good browser automation usually requires cooperation from the application. The app should provide stable hooks for user intent, and the test should observe those hooks instead of guessing at visual timing.
A minimal CI configuration pattern
A predictable CI setup is part of the solution. Keep the test runtime close to what the test expects.
GitHub Actions example
name: browser-tests
on: [push, pull_request]
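jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Install the browsers Playwright needs on a fresh runner.
      - run: npx playwright install --with-deps
      # Playwright runs headless by default, so no headless flag is passed.
      - run: npx playwright test
        env:
          CI: true
          TZ: UTC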
This does not solve every mismatch, but it makes the test environment easier to reason about. Explicit timezone settings and a deterministic install step remove two common sources of surprise.
A final mental model for headless CI debugging
Think of the problem as a comparison between two executions of the same test under different constraints. Local and CI are not identical, even if the code is. The browser may render differently, the network may be slower, the viewport may be smaller, and the suite may run in a much more constrained environment.
The goal is not to make CI look exactly like a developer laptop. The goal is to reduce the number of uncontrolled differences so that a failure means something real.
If you can answer these four questions, you are usually close to the root cause:
- What is different between local and CI?
- Which difference matters to this test?
- Is the test waiting on the wrong signal?
- Is the application exposing a stable, user-visible state that the test can depend on?
Once you start debugging browser tests with that model, the failures become less mysterious. The browser did not become random; it just revealed assumptions your local setup had been hiding.
Related concepts worth keeping in mind
If you want to go deeper into why these problems happen, it helps to understand the basics of software testing and how browsers behave inside automated workflows. Headless browser execution is only one part of a larger automation stack, and the more layers you can make deterministic, the less time you spend chasing flaky results.
The recurring theme is simple, even if the symptoms are not: when browser tests fail in headless CI, the mismatch is usually about timing, environment parity, or assumptions about how the page becomes ready. Start by making those assumptions visible, then fix the narrowest layer that owns the problem.