Why Browser Tests Pass in Preview but Fail After Deployment: A Practical Debugging Guide

Browser tests that pass in preview and fail after deployment are usually not random. They are a symptom that preview and production are not actually equivalent, even if they look close enough from a product demo perspective. The gap can come from config mismatch, CDN behavior, feature flags, injected scripts, authentication differences, or build-time assumptions that only show up once the app is served in its real environment.

If you have ever seen a test suite green in a preview URL and then watched the same flow fail minutes later in production, you have already met environment drift. This guide focuses on how to debug that gap systematically, so you can stop guessing and start isolating the exact layer that changed.

The key question is not “why did the test fail?” It is “what changed between preview and deployed reality that made the same browser path behave differently?”

What usually changes between preview and production

Preview environments are often built for speed and convenience. Production is built for scale, reliability, and real users. Those goals are related, but they are not identical. That difference creates drift across several layers.

1. Configuration values

A browser test may pass because the app reads a dev-friendly configuration in preview, then fails in production when it reads a different API base URL, payment provider key, analytics script ID, or auth callback URL.

Common examples include:

API endpoints that point to mock or staging services in preview
environment variables that differ across deployment targets
runtime config loaded from a JSON file or <script> tag
feature toggles that are defaulted differently in production

2. Build-time versus runtime behavior

Some frameworks bake values into the bundle during build, while others read them at runtime. If the preview build and production deployment do not use the same pipeline, you can end up with subtly different client behavior.

For example, a route may work in preview because the app is built with a fallback API path, then break after deployment because the production build inlines a different path or strips a debug-only branch.

3. Injected scripts and third-party tags

Production often carries additional JavaScript from analytics, A/B testing, tag managers, consent tools, chat widgets, error monitoring, or session replay. Any of these can change load order, alter the DOM, intercept events, or create timing differences.

Preview environments often omit these tags entirely, or load them with test-safe settings. That means a selector, timeout, or navigation pattern that works in preview can fail once extra scripts appear in production.

4. Caching and CDN behavior

Caching is one of the biggest sources of “works in preview, fails after release” bugs. Production may sit behind a CDN, reverse proxy, service worker, or browser cache configuration that preview does not reproduce.

That can affect:

HTML served from cache versus freshly rendered HTML
stale JavaScript bundles after a release
asset hash mismatches
API responses cached at an unexpected layer
service worker registration and offline behavior

5. Feature flags and rollout targeting

Feature flags are useful, but they can make testing deceptive. Your preview environment may have every feature enabled by default, while production uses targeted rollout rules, user segmentation, or percentage-based exposure. The test passes in preview because the target path exists there, but fails after deployment because the user is routed to a different experience.

6. Auth, cookies, and domain boundaries

Preview environments often use relaxed cookies, local test accounts, or simplified identity flows. Production may add SameSite restrictions, stricter cookie domains, SSO redirects, CSRF checks, or MFA steps. Browser tests can pass when the auth state is local and predictable, then fail after deployment when the browser’s cookie or redirect behavior changes.

Start debugging with a versioned comparison

When a browser test fails after deployment, the first move is not to rerun it ten times. The first move is to compare the environments with precision.

Create a short checklist for the exact release in question:

commit SHA or build number
preview environment URL
production URL
deployment timestamp
browser version used by the test runner
test data or account used
feature flag state
config values injected at build time and runtime

If you can, attach all of these values to the test run as metadata. The more you can tie a failure to a specific release artifact, the easier it becomes to reason about drift.

A practical rule is simple:

If you cannot answer “what exact build did this browser test run against?”, you are debugging with incomplete evidence.
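One way to make that rule enforceable is to reject a test report whose release metadata is incomplete. The sketch below covers most of the checklist above. The `ReleaseMetadata` shape and its field names are assumptions for illustration, not part of any test runner.

```typescript
// Hypothetical metadata shape; field names are illustrative only.
interface ReleaseMetadata {
  commitSha: string;
  previewUrl: string;
  productionUrl: string;
  deployedAt: string; // ISO 8601 timestamp
  browserVersion: string;
  testAccount: string;
  flags: Record<string, boolean>;
}

// Returns the names of fields that are missing or empty, so a run with
// incomplete evidence can be flagged before anyone starts debugging it.
function missingMetadata(meta: Partial<ReleaseMetadata>): string[] {
  const required: (keyof ReleaseMetadata)[] = [
    "commitSha",
    "previewUrl",
    "productionUrl",
    "deployedAt",
    "browserVersion",
    "testAccount",
    "flags",
  ];
  return required.filter((key) => {
    const value = meta[key];
    return value === undefined || value === "";
  });
}
```

A run that reports a non-empty list is one where the question "what exact build did this run against?" cannot yet be answered.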

Reproduce the failure in production, not just in preview

Preview passing is useful, but it is not proof. You need to reproduce the failure in the deployed environment if the bug only appears there.

That means running the same browser flow against production with the same:

browser family and version
viewport size
locale and time zone
user permissions
feature flags
network conditions, if relevant

For Playwright, this kind of reproduction usually starts with trace capture and explicit base URL selection.

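import { test, expect } from '@playwright/test';

// Point the run at the deployed app and record a trace for every test,
// so a production-only failure leaves evidence behind.
test.use({ baseURL: 'https://app.example.com', trace: 'on' });

test('checkout flow', async ({ page }) => {
  // Relative paths resolve against baseURL.
  await page.goto('/checkout');
  await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
});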

If this fails in production but passes in preview, you are no longer chasing a flaky test. You are chasing an environmental mismatch.
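To run the identical test file against both environments, the base URL can be resolved from a single switch instead of being edited by hand. This is a minimal sketch: the environment names, the preview URL, and the `TARGET_ENV` variable mentioned in the note below are all assumptions.

```typescript
// Hypothetical environment names and URLs; substitute your own.
const BASE_URLS = new Map<string, string>([
  ["preview", "https://preview.example.com"],
  ["production", "https://app.example.com"],
]);

// Resolves the base URL for a run and fails loudly on an unknown target,
// so a typo never silently falls back to the wrong environment.
function resolveBaseURL(target: string | undefined): string {
  const url = target === undefined ? undefined : BASE_URLS.get(target);
  if (url === undefined) {
    throw new Error(`Unknown target environment: ${String(target)}`);
  }
  return url;
}
```

In a Playwright config this could feed the `baseURL` option, for example `use: { baseURL: resolveBaseURL(process.env.TARGET_ENV) }`, so the only difference between the two runs is one variable.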

Check the DOM, not just the UI screenshot

Visual similarity does not guarantee DOM similarity. A production-only script may inject banners, consent overlays, or wrappers that shift the layout or capture clicks. A button may still look visible while being obscured or disabled by a layer above it.

When a click fails, inspect the DOM at failure time and answer these questions:

Is the element present?
Is it visible according to the browser, or just present in HTML?
Is it covered by another element?
Has its text changed due to localization or experiment targeting?
Is the element inside an iframe or shadow root in production only?

Useful debugging steps include taking a DOM snapshot, recording the accessibility tree, and checking whether your locator depends on fragile text or positional assumptions.

Common selector drift patterns

text=Save becomes text=Save changes in production
a button gets wrapped by a consent modal overlay
a list item order changes after real data is loaded
a CSS class is hashed differently because the build pipeline differs
a shadow DOM component is lazy-loaded after a production-only bundle split

A more resilient locator often helps, but resilient locators do not fix environmental drift by themselves. They only make the failure easier to interpret.

Investigate config mismatch first

Config mismatch is one of the fastest ways to get browser tests that pass in preview but fail after deployment. It is also one of the easiest issues to miss because the app still “works” in a narrow sense.

Look at these config layers:

Build-time config

Examples:

Vite, Webpack, Next.js, or Nuxt environment variables
compile-time feature gates
tree-shaken branches only present in one build mode

Runtime config

Examples:

/config.json
a server-rendered config object
environment-specific bootstrap scripts
flags fetched from a remote config service

Backend service config

Examples:

CORS allowlists
API gateway routes
database connection strings
OAuth redirect URIs
webhook secrets

If production uses a different API host, an auth callback path, or stricter CORS rules, the browser flow may fail after a redirect or AJAX request even though the preview test passes against a mock.

A useful debugging habit is to log non-secret config values to the browser console in non-production builds, or to record them in a secure internal trace. The goal is not to expose secrets; it is to confirm that the app is reading the expected values for the current release.
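Once the effective config of each environment has been captured, a plain key-by-key diff is often enough to spot the mismatch. The sketch below assumes both configs have already been collected as flat objects of non-secret values; the `Config` type and the key names in the usage note are hypothetical.

```typescript
type ConfigValue = string | number | boolean;
type Config = Record<string, ConfigValue>;

interface ConfigDiff {
  key: string;
  preview: ConfigValue | undefined;
  production: ConfigValue | undefined;
}

// Lists every key whose value differs between the two environments,
// including keys that exist on one side only (reported as undefined).
function diffConfig(preview: Config, production: Config): ConfigDiff[] {
  const keys = Array.from(
    new Set(Object.keys(preview).concat(Object.keys(production))),
  ).sort();
  return keys
    .filter((key) => preview[key] !== production[key])
    .map((key) => ({
      key,
      preview: preview[key],
      production: production[key],
    }));
}
```

For example, diffing `{ API_BASE_URL: "https://mock.example.com" }` against `{ API_BASE_URL: "https://api.example.com", CONSENT_BANNER: true }` reports both keys, which points at the mock API and the consent banner as candidates before any test code is touched.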

Compare network behavior, not just page rendering

A browser test can pass visually while network requests silently diverge.

Open the network panel or collect request logs and inspect:

request URLs
response status codes
redirects
payload shape differences
cache headers
CORS failures
mixed content issues
request timing differences

Some bugs only show up after deployment because production adds a CDN, WAF, rate limit, or auth gateway. A request that was instant in preview may now take longer or get transformed.

What to look for in the browser test runner

If you use Playwright, capture failed requests and console messages.

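// Log requests that never completed (DNS failure, blocked, aborted).
// HTTP error responses such as 404 still complete, so they do not fire this event.
page.on('requestfailed', request => {
  console.log('failed request', request.url(), request.failure()?.errorText);
});

// Surface browser console errors in the test output.
page.on('console', msg => {
  if (msg.type() === 'error') console.log('console error:', msg.text());
});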

This kind of logging is often enough to reveal that a production-only script is 404ing, an API call is being blocked, or a JSON response differs from what the test assumed.
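If request logs are collected from both environments, they can be compared mechanically rather than by eye. The following is a sketch under assumptions: the `RequestRecord` shape is hypothetical, and the records are presumed to have been gathered from the runner's request and response events with the host stripped so that paths are comparable.

```typescript
// Hypothetical record shape for one observed request.
interface RequestRecord {
  method: string;
  path: string; // URL path without the host
  status: number;
}

interface RequestDiff {
  request: string;
  preview: number | undefined;
  production: number | undefined;
}

// Indexes records by "METHOD path"; a later duplicate overwrites an earlier one.
function indexByRequest(records: RequestRecord[]): Map<string, number> {
  const index = new Map<string, number>();
  records.forEach((r) => index.set(`${r.method} ${r.path}`, r.status));
  return index;
}

// Returns requests whose status differs between environments, plus requests
// seen in only one of them (the other side is reported as undefined).
function diffRequests(
  preview: RequestRecord[],
  production: RequestRecord[],
): RequestDiff[] {
  const a = indexByRequest(preview);
  const b = indexByRequest(production);
  const keys = Array.from(
    new Set(Array.from(a.keys()).concat(Array.from(b.keys()))),
  ).sort();
  return keys
    .filter((key) => a.get(key) !== b.get(key))
    .map((key) => ({ request: key, preview: a.get(key), production: b.get(key) }));
}
```

A request that returns 200 in preview and 403 in production, or that exists only in production, is usually a better lead than the assertion that eventually failed.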

Don’t ignore caching and release propagation

Caching problems are especially common when the preview environment serves fresh assets while production serves cached or partially cached assets.

Consider these failure modes:

new HTML points to old JS chunk names
a CDN edge node serves a stale page shell
the browser keeps an old service worker and replays stale routes
API responses are cached longer than intended
release propagation is not finished when the test runs

If browser tests run immediately after deployment, add a release-readiness check for asset availability and cache invalidation. Many teams treat deployment success and release readiness as the same thing, but they are not.

A good debugging step is to bypass caches where possible, then compare behavior again. If the test passes after a hard refresh or with caching disabled, the issue is likely in the caching layer, not the UI code itself.
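A release-readiness check for the "new HTML points to old JS chunk names" case can be as small as comparing the assets the served HTML references against the assets the release actually deployed. This sketch assumes you can obtain the deployed asset list from somewhere such as a build manifest; the function names are hypothetical, and the regex is a smoke-check shortcut rather than a real HTML parser.

```typescript
// Extracts URLs from <script src="..."> and <link href="..."> tags.
// Only double-quoted attributes are matched; this is not a full HTML parser.
function referencedAssets(html: string): string[] {
  const pattern = /<(?:script[^>]*\ssrc|link[^>]*\shref)="([^"]+)"/g;
  const assets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    assets.push(match[1]);
  }
  return assets;
}

// Returns assets the HTML points to that are not in the deployed list,
// which suggests a stale page shell or an unfinished rollout.
function missingAssets(html: string, deployed: string[]): string[] {
  const available = new Set(deployed);
  return referencedAssets(html).filter((asset) => !available.has(asset));
}
```

Running this against the HTML fetched from production right after a deploy, and waiting until the result is empty, separates "deployment finished" from "release is actually ready to test".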

Validate feature flags and experiment targeting

Feature flags can make a preview environment look stable while production routes users into multiple possible states.

Check whether the failing path depends on:

user role
region
account age
cookie presence
percentage rollout
experiment assignment
remote config fetched after page load

A browser test that authenticates with a generic test account may not match the real production cohort. If the feature is behind an experiment, the same URL can lead to different DOM structures depending on assignment.

This is why release debugging should include explicit flag snapshots. At minimum, record the state of the flags that affect the tested path.

A green preview test is only useful if the same feature set exists in the deployed environment for the same user context.
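Percentage rollouts are a common reason a test account lands outside the feature it is supposed to test. The sketch below shows the general idea of deterministic bucketing with an FNV-1a-style hash over the flag name and user ID. It is an illustration only: real flag services use their own hashing and targeting rules, and `isInRollout` is a hypothetical helper.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Illustrative only;
// a real flag service will not produce the same buckets.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Maps a (flag, user) pair to a stable bucket from 0 to 99 and compares it
// with the rollout percentage. The same user always gets the same answer.
function isInRollout(flag: string, userId: string, percentage: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < percentage;
}
```

Because assignment is stable per user, a test account that falls outside a 10% rollout will fall outside it on every run. Retrying does not help; the account has to be explicitly targeted, or the flag state has to be recorded with the test result.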

Look for injected scripts and tag manager side effects

Production-only scripts are a classic source of environment drift. They can:

delay load events
move focus unexpectedly
create cookie consent overlays
alter scroll position
modify forms before submission
add event listeners that interfere with automation

The most useful comparison is often a simple one: inspect the loaded scripts in preview and in production. If production loads tag manager containers, third-party widgets, or consent tooling that preview does not, you have found an obvious source of behavioral drift.
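That comparison can be reduced to a diff of script origins. The sketch assumes the script URLs for each environment have already been collected as absolute URLs, for example from the page's script elements or from the network log. The function name and the `firstParty` parameter are hypothetical.

```typescript
interface OriginDiff {
  onlyInPreview: string[];
  onlyInProduction: string[];
}

// Reduces script URLs to their origins and reports the origins that only
// one environment loads. First-party origins differ by design, so they
// are passed in and ignored.
function scriptOriginDiff(
  previewScripts: string[],
  productionScripts: string[],
  firstParty: string[],
): OriginDiff {
  const origins = (urls: string[]): Set<string> =>
    new Set(
      urls
        .map((url) => new URL(url).origin)
        .filter((origin) => firstParty.indexOf(origin) === -1),
    );
  const preview = origins(previewScripts);
  const production = origins(productionScripts);
  return {
    onlyInPreview: Array.from(preview).filter((o) => !production.has(o)).sort(),
    onlyInProduction: Array.from(production).filter((o) => !preview.has(o)).sort(),
  };
}
```

Every entry in `onlyInProduction` is a script source that the preview run never exercised, which makes it a natural first suspect when a click or load event behaves differently.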

A practical tactic is to create a minimal production-safe test account and run the browser flow with third-party scripts disabled by consent state or environment controls, if your app supports it. That can help determine whether the failure is in your application code or in the surrounding production script ecosystem.

Verify browser and environment parity in CI

Many teams call a deployment “tested” even though the browser suite ran in a different browser version, a different viewport, or a different container image than anything close to what production users actually run.

The closer your CI browser is to real deployment conditions, the fewer surprises you will get later. This does not mean duplicating all of production, but it does mean standardizing the variables that matter most.

Useful parity checks

same browser major version as your supported users, or at least same family
same base OS image for test containers
same time zone and locale if formatting matters
same cookie policy and secure context assumptions
same build artifacts deployed to preview and production, if feasible
same environment variable naming and injection path

The point of continuous integration is not just automated execution; it is repeated validation against a controlled release pipeline.

Use a staged debugging workflow

When the failure is urgent, use a layered approach instead of trying everything at once.

Step 1: Confirm the failure is reproducible

Run the same test against production and preview with identical inputs.

Step 2: Freeze variables

Lock browser version, user account, locale, viewport, and test data.

Step 3: Compare logs

Check console output, failed requests, redirects, and DOM state.

Step 4: Diff config and flags

Compare build-time config, runtime config, and feature flag state.

Step 5: Inspect injection and caching

Look for third-party scripts, CDN behavior, and service worker differences.

Step 6: Narrow the surface area

Run the smallest version of the flow that still fails. If checkout fails, try product page, then cart, then payment step. If login fails, isolate the auth redirect chain.

This workflow sounds basic, but it prevents the most expensive mistake in release debugging, which is changing multiple variables before you know which one mattered.

Example: a login flow that passes in preview and fails after deployment

Imagine a login test that clicks “Sign in,” enters credentials, and expects the dashboard.

It passes in preview because:

preview uses a mock auth endpoint
no consent banner is shown
cookies are not restricted by cross-site policies
the preview app uses a debug route that bypasses MFA

After deployment, the same flow fails because:

production redirects through the identity provider
a consent overlay intercepts the first click
the auth cookie is marked SameSite=Lax and the redirect path does not preserve state
the dashboard route is gated behind a feature flag the test account does not have

The test is not flaky. It is doing its job by revealing that preview and production are not the same system.

The fix might be in your app, your auth configuration, your flag targeting, or your test strategy. The point is to localize the discrepancy instead of masking it with a longer timeout.

When a longer wait is the wrong fix

It is tempting to add explicit waits whenever a production test fails. Sometimes a wait is appropriate, especially if production scripts or network calls legitimately take longer. But a wait should be the last step, not the first.

Avoid using waits to paper over these issues:

wrong selector or brittle locator
stale cached bundle
missing config value
blocked network request
environment-specific redirect
element covered by a modal or injected banner

A timeout increase can make a bad test pass, but it does not restore production parity.

Build a release checklist for parity

If this problem has happened more than once, turn the debugging lessons into a release checklist.

A useful checklist might include:

preview and production use the same build artifact
config values are validated before deploy
feature flags are documented for each release
CDN invalidation is confirmed
browser automation runs against a post-deploy smoke target
third-party scripts are accounted for in production test runs
auth flows are validated with production-like cookies and redirects
console and network failures are captured in test artifacts

This is less about making every environment identical and more about making differences explicit, intentional, and testable.

Final takeaways

When browser tests pass in preview but fail after deployment, the root cause is usually environment drift, not a mysterious browser problem. Preview environments are often simplified, while production adds real config, real caching, real scripts, real flags, and real auth boundaries.

The fastest way to debug is to compare the two environments like a release engineer, not like a hopeful tester. Verify the artifact, config, flags, network, caching, and injected scripts. Then reproduce the failure with the same browser context and user state that production actually sees.

General background on software testing and test automation can help frame the problem, but the real value comes from making parity measurable in your own delivery pipeline.

If you treat preview as a hint rather than a guarantee, you will debug faster, avoid false confidence, and ship releases with fewer surprises.
