July 1, 2026
Why Browser Tests Pass in Preview but Fail After Deployment: A Practical Debugging Guide
A practical guide to browser tests that pass in preview but fail after deployment, with debugging steps for environment drift, config mismatch, caching, feature flags, injected scripts, and production parity.
Browser tests that pass in preview and fail after deployment are usually not random. They are a symptom that preview and production are not actually equivalent, even if they look close enough from a product demo perspective. The gap can come from config mismatch, CDN behavior, feature flags, injected scripts, authentication differences, or build-time assumptions that only show up once the app is served in its real environment.
If you have ever seen a test suite green in a preview URL and then watched the same flow fail minutes later in production, you have already met environment drift. This guide focuses on how to debug that gap systematically, so you can stop guessing and start isolating the exact layer that changed.
The key question is not “why did the test fail?” It is “what changed between preview and deployed reality that made the same browser path behave differently?”
What usually changes between preview and production
Preview environments are often built for speed and convenience. Production is built for scale, reliability, and real users. Those goals are related, but they are not identical. That difference creates drift across several layers.
1. Configuration values
A browser test may pass because the app reads a dev-friendly configuration in preview, then fails in production when it reads a different API base URL, payment provider key, analytics script ID, or auth callback URL.
Common examples include:
- API endpoints that point to mock or staging services in preview
- environment variables that differ across deployment targets
- runtime config loaded from a JSON file or <script> tag
- feature toggles that are defaulted differently in production
2. Build-time versus runtime behavior
Some frameworks bake values into the bundle during build, while others read them at runtime. If the preview build and production deployment do not use the same pipeline, you can end up with subtly different client behavior.
For example, a route may work in preview because the app is built with a fallback API path, then break after deployment because the production build inlines a different path or strips a debug-only branch.
3. Injected scripts and third-party tags
Production often carries additional JavaScript from analytics, A/B testing, tag managers, consent tools, chat widgets, error monitoring, or session replay. Any of these can change load order, alter the DOM, intercept events, or create timing differences.
Preview environments often omit these tags entirely, or load them with test-safe settings. That means a selector, timeout, or navigation pattern that works in preview can fail once extra scripts appear in production.
4. Caching and CDN behavior
Caching is one of the biggest sources of “works in preview, fails after release” bugs. Production may sit behind a CDN, reverse proxy, service worker, or browser cache configuration that preview does not reproduce.
That can affect:
- HTML served from cache versus freshly rendered HTML
- stale JavaScript bundles after a release
- asset hash mismatches
- API responses cached at an unexpected layer
- service worker registration and offline behavior
5. Feature flags and rollout targeting
Feature flags are useful, but they can make testing deceptive. Your preview environment may have every feature enabled by default, while production uses targeted rollout rules, user segmentation, or percentage-based exposure. The test passes in preview because the target path exists there, but fails after deployment because the user is routed to a different experience.
6. Auth, cookies, and domain boundaries
Preview environments often use relaxed cookies, local test accounts, or simplified identity flows. Production may add SameSite restrictions, stricter cookie domains, SSO redirects, CSRF checks, or MFA steps. Browser tests can pass when the auth state is local and predictable, then fail after deployment when the browser’s cookie or redirect behavior changes.
Start debugging with a versioned comparison
When a browser test fails after deployment, the first move is not to rerun it ten times. The first move is to compare the environments with precision.
Create a short checklist for the exact release in question:
- commit SHA or build number
- preview environment URL
- production URL
- deployment timestamp
- browser version used by the test runner
- test data or account used
- feature flag state
- config values injected at build time and runtime
If you can, attach all of these values to the test run as metadata. The more you can tie a failure to a specific release artifact, the easier it becomes to reason about drift.
A practical rule is simple:
If you cannot answer “what exact build did this browser test run against?”, you are debugging with incomplete evidence.
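One way to make that rule enforceable is to record the checklist as structured metadata and flag any run that is missing fields. The sketch below is a minimal illustration, not a framework feature: the ReleaseMetadata shape and the missingFields helper are hypothetical names chosen for this example, and the field names should be adapted to your own pipeline.

```typescript
// Hypothetical shape mirroring the checklist above; not part of any test framework.
interface ReleaseMetadata {
  commitSha?: string;
  previewUrl?: string;
  productionUrl?: string;
  deployedAt?: string; // ISO timestamp of the deployment
  browserVersion?: string;
  testAccount?: string;
  featureFlags?: Record<string, boolean>;
  config?: Record<string, string>;
}

const REQUIRED_FIELDS: (keyof ReleaseMetadata)[] = [
  'commitSha', 'previewUrl', 'productionUrl', 'deployedAt',
  'browserVersion', 'testAccount', 'featureFlags', 'config',
];

// Returns the checklist fields that are absent or empty for this run.
function missingFields(meta: ReleaseMetadata): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = meta[field];
    return value === undefined || value === '';
  });
}
```

A test run could call missingFields before it starts and attach the result to its report, so that a failure with incomplete evidence is visible as such.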
Reproduce the failure in production, not just in preview
Preview passing is useful, but it is not proof. You need to reproduce the failure in the deployed environment if the bug only appears there.
That means running the same browser flow against production with the same:
- browser family and version
- viewport size
- locale and time zone
- user permissions
- feature flags
- network conditions, if relevant
For Playwright, this kind of reproduction usually starts with trace capture and explicit base URL selection.
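import { test, expect } from '@playwright/test';

// Point this run at the deployed site and record a trace for later inspection.
test.use({ baseURL: 'https://app.example.com', trace: 'on' });

test('checkout flow', async ({ page }) => {
  // The path is resolved against the baseURL set above.
  await page.goto('/checkout');
  await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
});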
If this fails in production but passes in preview, you are no longer chasing a flaky test. You are chasing an environmental mismatch.
Check the DOM, not just the UI screenshot
Visual similarity does not guarantee DOM similarity. A production-only script may inject banners, consent overlays, or wrappers that shift the layout or capture clicks. A button may still look visible while being obscured or disabled by a layer above it.
When a click fails, inspect the DOM at failure time and answer these questions:
- Is the element present?
- Is it visible according to the browser, or just present in HTML?
- Is it covered by another element?
- Has its text changed due to localization or experiment targeting?
- Is the element inside an iframe or shadow root in production only?
Useful debugging steps include taking a DOM snapshot, recording the accessibility tree, and checking whether your locator depends on fragile text or positional assumptions.
Common selector drift patterns
- text=Save becomes text=Save changes in production
- a button gets wrapped by a consent modal overlay
- a list item order changes after real data is loaded
- a CSS class is hashed differently because the build pipeline differs
- a shadow DOM component is lazy-loaded after a production-only bundle split
A more resilient locator often helps, but resilient locators do not fix environmental drift by themselves. They only make the failure easier to interpret.
Investigate config mismatch first
Config mismatch is one of the fastest ways to get browser tests that pass in preview but fail after deployment. It is also one of the easiest issues to miss because the app still “works” in a narrow sense.
Look at these config layers:
Build-time config
Examples:
- Vite, Webpack, Next.js, or Nuxt environment variables
- compile-time feature gates
- tree-shaken branches only present in one build mode
Runtime config
Examples:
- /config.json
- a server-rendered config object
- environment-specific bootstrap scripts
- flags fetched from a remote config service
Backend service config
Examples:
- CORS allowlists
- API gateway routes
- database connection strings
- OAuth redirect URIs
- webhook secrets
If production uses a different API host, an auth callback path, or stricter CORS rules, the browser flow may fail after a redirect or AJAX request even though the preview test passes against a mock.
A useful debugging habit is to print the non-secret config values to the browser console, or to record them in a secure internal trace. The goal is not to expose secrets; it is to confirm that the app is reading the expected values for the current release.
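Once both environments expose their non-secret values, the comparison itself can be mechanical. The following is a minimal sketch, assuming you have already collected one flat snapshot per environment (for example from a runtime config file) with secrets excluded; ConfigSnapshot and diffConfig are hypothetical names used only for illustration.

```typescript
// Flat, non-secret config values captured from one environment (assumed shape).
type ConfigValue = string | number | boolean;
type ConfigSnapshot = Record<string, ConfigValue>;

interface ConfigDifference {
  key: string;
  preview: ConfigValue | undefined;
  production: ConfigValue | undefined;
}

// Lists every key whose value differs, including keys present on only one side.
function diffConfig(preview: ConfigSnapshot, production: ConfigSnapshot): ConfigDifference[] {
  const allKeys = Object.keys(preview).concat(Object.keys(production));
  const uniqueKeys = allKeys.filter((key, index) => allKeys.indexOf(key) === index).sort();
  return uniqueKeys
    .filter((key) => preview[key] !== production[key])
    .map((key) => ({ key, preview: preview[key], production: production[key] }));
}
```

An empty result does not prove parity, since it only covers the values you chose to snapshot, but a non-empty result gives you a concrete list of differences to explain.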
Compare network behavior, not just page rendering
A browser test can pass visually while network requests silently diverge.
Open the network panel or collect request logs and inspect:
- request URLs
- response status codes
- redirects
- payload shape differences
- cache headers
- CORS failures
- mixed content issues
- request timing differences
Some bugs only show up after deployment because production adds a CDN, WAF, rate limit, or auth gateway. A request that was instant in preview may now take longer or get transformed.
What to look for in the browser test runner
If you use Playwright, capture failed requests and console messages.
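// Log requests that fail at the network level, with the browser's error text.
page.on('requestfailed', request => {
  console.log('failed request', request.url(), request.failure()?.errorText);
});
// Surface browser console errors in the test output.
page.on('console', msg => {
  if (msg.type() === 'error') console.log('console error:', msg.text());
});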
This kind of logging is often enough to reveal that a production-only script is 404ing, an API call is being blocked, or a JSON response differs from what the test assumed.
Don’t ignore caching and release propagation
Caching problems are especially common when the preview environment serves fresh assets while production serves cached or partially cached assets.
Consider these failure modes:
- new HTML points to old JS chunk names
- a CDN edge node serves a stale page shell
- the browser keeps an old service worker and replays stale routes
- API responses are cached longer than intended
- release propagation is not finished when the test runs
If browser tests run immediately after deployment, add a release-readiness check for asset availability and cache invalidation. Many teams treat deployment success and release readiness as the same thing, but they are not.
A good debugging step is to bypass caches where possible, then compare behavior again. If the test passes with a hard refresh or cache disablement, your issue is likely in the caching layer, not the UI code itself.
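The first failure mode in the list above, HTML that references script files the deployment does not serve, can be checked before the browser suite starts. The sketch below is illustrative only: extractScriptUrls and findMissingAssets are hypothetical helpers, the regex is a simplification of real HTML parsing, and the list of available assets is assumed to come from your own deploy manifest or asset checks.

```typescript
// Simplified extraction: a regex is enough for a sketch, but a real HTML parser is more robust.
function extractScriptUrls(html: string): string[] {
  const pattern = /<script\b[^>]*\bsrc=["']([^"']+)["']/gi;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Returns script URLs referenced by the HTML that are not in the list of assets known to be served.
function findMissingAssets(html: string, availableAssets: string[]): string[] {
  return extractScriptUrls(html).filter((url) => availableAssets.indexOf(url) === -1);
}
```

If findMissingAssets returns anything for the freshly deployed page shell, the release is probably still propagating or a cache is serving a mismatched shell, and running the browser suite at that moment will mostly measure that mismatch.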
Validate feature flags and experiment targeting
Feature flags can make a preview environment look stable while production routes users into multiple possible states.
Check whether the failing path depends on:
- user role
- region
- account age
- cookie presence
- percentage rollout
- experiment assignment
- remote config fetched after page load
A browser test that authenticates with a generic test account may not match the real production cohort. If the feature is behind an experiment, the same URL can lead to different DOM structures depending on assignment.
This is why release debugging should include explicit flag snapshots. At minimum, record the state of the flags that affect the tested path.
A green preview test is only useful if the same feature set exists in the deployed environment for the same user context.
Look for injected scripts and tag manager side effects
Production-only scripts are a classic source of environment drift. They can:
- delay load events
- move focus unexpectedly
- create cookie consent overlays
- alter scroll position
- modify forms before submission
- add event listeners that interfere with automation
The most useful comparison is often a simple one: inspect the loaded scripts in preview and in production. If production loads tag manager containers, third-party widgets, or consent tooling that preview does not, you have found an obvious source of behavioral drift.
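That comparison can be reduced to a set difference over script hosts. The sketch below assumes you have already collected the src attribute of every script element on the same page in each environment, for instance from your browser automation tool; PageScripts, thirdPartyHosts, and productionOnlyHosts are hypothetical names for this example.

```typescript
interface PageScripts {
  pageUrl: string;      // URL of the page the scripts were collected from
  scriptSrcs: string[]; // raw src attribute values, absolute or relative
}

// Hosts other than the page's own host that serve scripts on this page.
function thirdPartyHosts(page: PageScripts): string[] {
  const ownHost = new URL(page.pageUrl).host;
  const hosts: string[] = [];
  for (const src of page.scriptSrcs) {
    // Relative and protocol-relative srcs are resolved against the page URL.
    const host = new URL(src, page.pageUrl).host;
    if (host !== ownHost && hosts.indexOf(host) === -1) hosts.push(host);
  }
  return hosts.sort();
}

// Third-party script hosts that appear in production but not in preview.
function productionOnlyHosts(preview: PageScripts, production: PageScripts): string[] {
  const previewHosts = thirdPartyHosts(preview);
  return thirdPartyHosts(production).filter((host) => previewHosts.indexOf(host) === -1);
}
```

Comparing hosts rather than full URLs keeps the output short and avoids noise from hashed file names. It will not catch a script that is served from the same host in both environments but configured differently.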
A practical tactic is to create a minimal production-safe test account and run the browser flow with third-party scripts disabled by consent state or environment controls, if your app supports it. That can help determine whether the failure is in your application code or in the surrounding production script ecosystem.
Verify browser and environment parity in CI
Many teams call a deployment “tested” even though the browser suite ran in a different browser version, a different viewport, or a different container image than anything resembling production.
The closer your CI browser is to real deployment conditions, the fewer surprises you will get later. This does not mean duplicating all of production, but it does mean standardizing the variables that matter most.
Useful parity checks
- same browser major version as your supported users, or at least same family
- same base OS image for test containers
- same time zone and locale if formatting matters
- same cookie policy and secure context assumptions
- same build artifacts deployed to preview and production, if feasible
- same environment variable naming and injection path
Continuous integration matters here for more than automated execution: the point is repeated validation against a controlled release pipeline.
Use a staged debugging workflow
When the failure is urgent, use a layered approach instead of trying everything at once.
Step 1: Confirm the failure is reproducible
Run the same test against production and preview with identical inputs.
Step 2: Freeze variables
Lock browser version, user account, locale, viewport, and test data.
Step 3: Compare logs
Check console output, failed requests, redirects, and DOM state.
Step 4: Diff config and flags
Compare build-time config, runtime config, and feature flag state.
Step 5: Inspect injection and caching
Look for third-party scripts, CDN behavior, and service worker differences.
Step 6: Narrow the surface area
Run the smallest version of the flow that still fails. If checkout fails, try product page, then cart, then payment step. If login fails, isolate the auth redirect chain.
This workflow sounds basic, but it prevents the most expensive mistake in release debugging, which is changing multiple variables before you know which one mattered.
Example: a login flow that passes in preview and fails after deployment
Imagine a login test that clicks “Sign in,” enters credentials, and expects the dashboard.
It passes in preview because:
- preview uses a mock auth endpoint
- no consent banner is shown
- cookies are not restricted by cross-site policies
- the preview app uses a debug route that bypasses MFA
After deployment, the same flow fails because:
- production redirects through the identity provider
- a consent overlay intercepts the first click
- the auth cookie is marked SameSite=Lax and the redirect path does not preserve state
- the dashboard route is gated behind a feature flag the test account does not have
The test is not flaky. It is doing its job by revealing that preview and production are not the same system.
The fix might be in your app, your auth configuration, your flag targeting, or your test strategy. The point is to localize the discrepancy instead of masking it with a longer timeout.
When a longer wait is the wrong fix
It is tempting to add explicit waits whenever a production test fails. Sometimes a wait is appropriate, especially if production scripts or network calls legitimately take longer. But a wait should be the last step, not the first.
Avoid using waits to paper over these issues:
- wrong selector or brittle locator
- stale cached bundle
- missing config value
- blocked network request
- environment-specific redirect
- element covered by a modal or injected banner
A timeout increase can make a bad test pass, but it does not restore production parity.
Build a release checklist for parity
If this problem has happened more than once, turn the debugging lessons into a release checklist.
A useful checklist might include:
- preview and production use the same build artifact
- config values are validated before deploy
- feature flags are documented for each release
- CDN invalidation is confirmed
- browser automation runs against a post-deploy smoke target
- third-party scripts are accounted for in production test runs
- auth flows are validated with production-like cookies and redirects
- console and network failures are captured in test artifacts
This is less about making every environment identical and more about making differences explicit, intentional, and testable.
Final takeaways
When browser tests pass in preview but fail after deployment, the root cause is usually environment drift, not a mysterious browser problem. Preview environments are often simplified, while production adds real config, real caching, real scripts, real flags, and real auth boundaries.
The fastest way to debug is to compare the two environments like a release engineer, not like a hopeful tester. Verify the artifact, config, flags, network, caching, and injected scripts. Then reproduce the failure with the same browser context and user state that production actually sees.
For broader background on the discipline behind this work, the general definition of software testing and test automation can help frame the problem, but the real value comes from making parity measurable in your own delivery pipeline.
If you treat preview as a hint rather than a guarantee, you will debug faster, avoid false confidence, and ship releases with fewer surprises.