Modern frontends spend a lot of time in motion. Buttons pulse, cards skeleton-load, lists fade in, menus animate, and route transitions overlap with network activity. That is good product design, but it creates a nasty class of test failures: the app is functionally correct, yet the test clicks too early, reads too soon, or locates the wrong element while the UI is still changing.

If you are trying to stabilize flaky e2e tests, the problem is often not the test framework itself. The problem is the mismatch between what the user sees and what the automation layer thinks is ready. A human naturally waits for the spinner to disappear or the card to stop shimmering. A script, unless you teach it otherwise, may act on the DOM at the worst possible moment.

This article focuses on the UI states that most commonly create false failures in end-to-end tests, especially skeleton screens, animated loading states, and micro-interactions. It covers the failure modes, how to diagnose them, and how to build tests that are resilient without becoming slow or overly abstract.

The goal is not to make every test wait longer. The goal is to make each test wait for the right signal.

Why these UI states create flaky E2E tests

A flaky test is one that passes and fails intermittently without a corresponding product change. In browser testing, flakiness often comes from timing, animation, re-rendering, or unstable selectors. The Wikipedia entry on test automation gives the broad definition, but the practical version is simpler: a test is flaky when the automation races the UI.

Animated loading states and micro-interactions make that race more likely because they introduce several temporary states between “page loaded” and “page usable”:

  • the DOM is present, but data is not
  • text exists, but is hidden behind placeholders
  • elements are clickable visually, but blocked by overlay layers
  • the layout is stable enough for screenshots, but not yet stable enough for hit testing
  • the control exists, but its label or position changes during animation

These states are especially common in apps built with React, Vue, Angular, Next.js, Remix, or any SPA that hydrates on the client after the initial HTML is rendered.

The main sources of false failures

Skeleton screens

Skeleton screens are placeholder shapes that mimic the final layout while data loads. They are often styled with animated shimmer effects. Testing skeleton screens is tricky because the placeholder can remain in the DOM after the real content is inserted, or both can coexist for a short period during a transition.

Common failure patterns:

  • the test clicks the placeholder, not the real card
  • the selector matches both skeleton and real item text
  • an assertion reads the placeholder count before the data arrives
  • a visual snapshot captures the shimmer animation, creating noisy diffs

Animated loading states

Spinners, progress bars, shimmer gradients, and fade transitions often mean the page is technically interactive before it is logically ready. A button may be enabled, but the data underneath it is still loading.

Common failure patterns:

  • clicking a button before async state resolves
  • asserting against text that appears only after an animation completes
  • using fixed sleeps that are either too short or too long
  • racing request completion versus DOM update

Micro-interactions

Micro-interactions are small UI motions: hover reveals, accordions, collapsing navs, toast messages, dropdown transitions, and expanding panels. They are easy for users but dangerous for automation, because the target element can change state between the pointer move, the visibility check, and the click.

Common failure patterns:

  • a submenu disappears between hover and click
  • a tooltip covers the target
  • a button moves under the cursor because layout shifts
  • a click lands on the wrong position due to animation

First principle, test readiness, not just visibility

Many test suites still rely on “element exists” or “element is visible” as the only readiness check. That is rarely enough for modern UI.

Instead, think in layers:

  1. Network readiness, the data request is complete or a known server response has arrived
  2. DOM readiness, the final element is present and the skeleton is gone
  3. Interaction readiness, the element is stable, visible, enabled, and not covered
  4. Semantic readiness, the right content is present, not just a placeholder

A test should move forward only when the layer it depends on is true.

Visibility is necessary, not sufficient.
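
One way to make the layers concrete is to treat them as an ordered checklist. The sketch below is illustrative only: the ReadinessSnapshot shape and the firstUnmetLayer name are assumptions made for this example, not part of any test framework, and how each flag gets populated is left to your own helpers.

```typescript
// Hypothetical snapshot of what a test has observed so far.
interface ReadinessSnapshot {
  responseArrived: boolean; // network layer: the expected response came back
  skeletonCount: number;    // DOM layer: placeholders still attached
  contentPresent: boolean;  // DOM layer: final element attached
  visible: boolean;         // interaction layer
  enabled: boolean;         // interaction layer
  stable: boolean;          // interaction layer: position no longer changing
  covered: boolean;         // interaction layer: an overlay would take the click
  hasFinalText: boolean;    // semantic layer: real content, not a placeholder
}

type Layer = 'network' | 'dom' | 'interaction' | 'semantic';

// Returns the first layer that is not yet satisfied, or null when all are.
function firstUnmetLayer(s: ReadinessSnapshot): Layer | null {
  if (!s.responseArrived) return 'network';
  if (s.skeletonCount > 0 || !s.contentPresent) return 'dom';
  if (!s.visible || !s.enabled || !s.stable || s.covered) return 'interaction';
  if (!s.hasFinalText) return 'semantic';
  return null;
}
```

Reporting the first unmet layer in a failure message tends to be more useful than a bare timeout, because it says which kind of wait was missing.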

Stabilization tactics that actually work

1) Wait on a product signal, not a time delay

The worst fix for animation flakiness is sleep(3000) or wait(3000). Fixed delays hide the problem instead of solving it, and they slow down every passing run.

Prefer an explicit signal the application already knows, such as:

  • loading spinner removed
  • skeleton container detached
  • API response finished
  • route state settled
  • content root has the final data attribute

Example in Playwright:

import { test, expect } from '@playwright/test';

test('waits for data to replace skeleton', async ({ page }) => {
  await page.goto('/dashboard');

  await expect(page.locator('[data-testid="dashboard-skeleton"]')).toBeHidden();
  await expect(page.locator('[data-testid="dashboard-content"]')).toBeVisible();
  await expect(page.getByText('Revenue')).toBeVisible();
});

This is better than waiting for the page itself, because it waits for the application state that matters.

2) Use stable locators, not animation-prone selectors

Skeleton screens and animated containers often reuse the same CSS classes as the final element tree. If your locator targets a class that is attached to both the placeholder and the real content, it can hit the wrong node.

Prefer selectors that are stable across UI states:

  • data-testid
  • role-based locators with accessible names
  • text only when the text is unique and final
  • structure combined with a semantic attribute

Examples:

typescript

await page.getByRole('button', { name: 'Save changes' }).click();
await page.locator('[data-testid="user-card"]').click();

Avoid selectors like:

typescript

await page.locator('.card .button.primary').click();

Class chains are often the first thing to change when a frontend framework re-renders during animation.

3) Assert that the skeleton is gone before the real assertion

If a test verifies list content, first verify that the placeholder state has ended.

typescript

await expect(page.locator('[data-testid="results-skeleton"]')).toHaveCount(0);
await expect(page.locator('[data-testid="results-list"]')).toBeVisible();
await expect(page.getByText('Acme Analytics')).toBeVisible();

This pattern is especially useful when testing skeleton screens, where a placeholder and the final content may overlap for a few hundred milliseconds.

4) Prefer assertions that poll over assertions that inspect once

Modern test runners usually retry assertions for a short period. Use that behavior.

For example, in Playwright, expect(locator).toBeVisible() re-checks the locator until the assertion passes or its timeout expires. That makes it more reliable than a one-off DOM inspection.

Bad pattern:

typescript

const visible = await page.locator('[data-testid="checkout-total"]').isVisible();
expect(visible).toBe(true);

Better pattern:

typescript

await expect(page.locator('[data-testid="checkout-total"]').getByText('$42.00')).toBeVisible();

The second version gives the framework room to absorb transient states.
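
To show what a polling assertion does under the hood, here is a minimal hand-rolled sketch. The pollUntil name and its default timings are assumptions for illustration. In a Playwright suite the built-in retrying assertions (and helpers such as expect.poll) are usually the better choice; a helper like this is mainly useful in runners that lack them.

```typescript
interface PollOptions {
  timeoutMs?: number;  // give up after this long
  intervalMs?: number; // pause between checks
}

// Re-runs `check` until it returns true, or throws once the timeout elapses.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 }: PollOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!(await check())) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Yield before the next attempt instead of inspecting once and failing.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Possible usage with a Playwright page (not executed here):
//   await pollUntil(async () =>
//     (await page.locator('[data-testid="results-skeleton"]').count()) === 0);
```

The difference from a fixed sleep is that the wait ends as soon as the condition holds, so passing runs stay fast and only failing runs pay the full timeout.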

5) Disable or reduce animation in test environments

If the animation itself is not part of what you are validating, reduce it in test runs.

This can be done with CSS overrides, test-only flags, or browser preferences. A simple pattern is to expose a data-test-mode attribute at the root and remove non-essential transitions.

<html data-test-mode="true">

html[data-test-mode="true"] *,
html[data-test-mode="true"] *::before,
html[data-test-mode="true"] *::after {
  animation-duration: 0ms !important;
  animation-delay: 0ms !important;
  transition-duration: 0ms !important;
  scroll-behavior: auto !important;
}

This is not always appropriate, especially if you are explicitly testing motion behavior or visual regressions. But for functional e2e coverage, it can remove a large source of noise.
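
If you cannot change the app's stylesheet, the same rules can be injected from the test side. The sketch below assumes a hypothetical buildNoMotionCss helper; the Playwright calls in the trailing comment (page.addStyleTag and page.emulateMedia) are real APIs, but whether they are enough depends on how your app implements motion.

```typescript
// Builds a stylesheet that zeroes out animations and transitions.
// `scope` is an optional selector prefix; an empty string applies the rules globally.
function buildNoMotionCss(scope: string = ''): string {
  const prefix = scope ? `${scope} ` : '';
  const selectors = ['*', '*::before', '*::after']
    .map((s) => `${prefix}${s}`)
    .join(',\n');
  return `${selectors} {
  animation-duration: 0ms !important;
  animation-delay: 0ms !important;
  transition-duration: 0ms !important;
  scroll-behavior: auto !important;
}`;
}

// Possible usage in a Playwright test (not executed here):
//   await page.goto('/dashboard');
//   await page.addStyleTag({ content: buildNoMotionCss() });
//   // Only helps if the app honors prefers-reduced-motion:
//   await page.emulateMedia({ reducedMotion: 'reduce' });
```

An injected style tag belongs to the current document, so it has to be added again after each full navigation. JavaScript-driven animations are not affected by CSS overrides and need a separate test-mode switch.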

6) Block on the network only when it maps to user readiness

Network interception can be very useful, but it is easy to overuse. Waiting for every request to finish can make tests brittle if the app preloads analytics, avatars, or feature flags after the page is usable.

Good use cases for request waits:

  • a specific API response is required before the UI can render
  • a mutation must succeed before a confirmation appears
  • a route transition is driven by a known request

Example:

typescript

await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/orders') && resp.status() === 200),
  page.getByRole('button', { name: 'Load orders' }).click(),
]);

Avoid making your entire suite depend on full network idle, because many modern apps never truly go idle.
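
When several tests wait on specific responses, a small predicate factory keeps the matching rules in one place. This is a sketch under assumptions: matchesResponse and its option names are made up for the example, and ResponseLike is a structural subset of Playwright's Response (url, status, and request().method()) so the function can be exercised without a browser.

```typescript
// Structural subset of Playwright's Response that the predicate needs.
interface ResponseLike {
  url(): string;
  status(): number;
  request(): { method(): string };
}

interface ResponseMatch {
  pathIncludes: string; // substring of the URL that identifies the endpoint
  method?: string;      // assumed default: GET
  status?: number;      // assumed default: 200
}

// Returns a predicate that accepts only the one response the UI depends on.
function matchesResponse({ pathIncludes, method = 'GET', status = 200 }: ResponseMatch) {
  return (resp: ResponseLike): boolean =>
    resp.url().includes(pathIncludes) &&
    resp.request().method() === method &&
    resp.status() === status;
}

// Possible usage with Playwright (not executed here):
//   await Promise.all([
//     page.waitForResponse(matchesResponse({ pathIncludes: '/api/orders' })),
//     page.getByRole('button', { name: 'Load orders' }).click(),
//   ]);
```

Matching on the method as well as the URL avoids resolving on a preflight or an unrelated call to the same path.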

7) Guard against layout shifts before clicking

A button that moves after the pointer enters it can cause missed clicks or accidental clicks on nearby elements. This often happens when hover states expand menus or when content pushes the target down during lazy loading.

Use test runner actionability checks and explicit stability conditions. Playwright’s actionability model is documented in its locator and auto-waiting docs.

If your runner does not protect you well enough, assert the location is stable before action, or wait for the animation class to disappear.

typescript

const saveButton = page.getByRole('button', { name: 'Save' });
await expect(saveButton).toBeEnabled();
await expect(saveButton).toBeVisible();
await saveButton.click();
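
For runners without a built-in stability check, an explicit position check can be sketched as follows. The waitForStableBox name and its timings are assumptions for illustration; Measurable is a structural subset of Playwright's Locator (its boundingBox method), so the helper also accepts a real locator. Playwright's own click already waits for a stable bounding box, so there this is mostly useful before non-action steps such as reading geometry.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Anything that can report its current bounding box, or null when not rendered.
interface Measurable {
  boundingBox(): Promise<Box | null>;
}

const sameBox = (a: Box, b: Box): boolean =>
  a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;

// Resolves once two consecutive samples report the same box; throws on timeout.
async function waitForStableBox(
  target: Measurable,
  { intervalMs = 50, timeoutMs = 2000 }: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<Box> {
  const deadline = Date.now() + timeoutMs;
  let previous = await target.boundingBox();
  while (Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const current = await target.boundingBox();
    if (previous && current && sameBox(previous, current)) return current;
    previous = current;
  }
  throw new Error(`Element position did not settle within ${timeoutMs} ms`);
}
```

Two identical samples are a heuristic, not a guarantee: an element that pauses between two animation phases can still move afterwards, so keep the sampling interval shorter than the shortest pause you expect.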

8) Separate functional tests from animation verification

Do not ask one test to verify everything. A test that is supposed to check saving a form should not also be responsible for proving that a fade transition lasted 180 ms.

Split concerns like this:

  • functional e2e test, verifies the user can complete the workflow
  • visual regression test, verifies the final rendered state looks correct
  • motion-specific test, verifies animation timing or transitions if the motion itself matters

This separation reduces false failures and makes root cause analysis easier.

A practical workflow for debugging flaky UI-state tests

When a test fails around an animated state, inspect the sequence of events instead of re-running immediately.

Step 1, identify whether the failure is a locator problem or a timing problem

Ask these questions:

  • Did the selector match the wrong element?
  • Did the element exist but not yet become visible?
  • Was the element visible but blocked by an overlay?
  • Did the DOM re-render and detach the node after it was found?

Step 2, inspect the DOM at the moment of failure

Use trace tools, screenshots, or debug logs. If you are using Playwright, trace viewer is often the fastest way to see whether you clicked a skeleton, an overlay, or the final control.

Step 3, annotate the app with test-friendly state hooks

A simple convention like data-testid="products-loaded" or data-state="ready" can be worth more than a large amount of test-side guessing.

Step 4, observe whether the test is waiting for the right thing

A test can wait correctly and still wait for the wrong thing. For example, waiting for networkidle before clicking a submit button may be unnecessary if the button is already enabled and the form is ready. On the other hand, waiting only for button visibility may be insufficient if the button is visually present during a loading skeleton.

Concrete patterns by UI state

Skeleton screens

Do this:

  • wait for the skeleton container to disappear
  • use a ready-state marker on the content wrapper
  • assert that the final text is present and the placeholder is gone

Avoid this:

  • clicking the first matching card in a list that contains both skeleton and real card markup
  • relying on a fixed delay after page load
  • using text selectors that also match placeholder labels

Example:

typescript

await expect(page.locator('[data-testid="product-skeleton"]')).toHaveCount(0);
await expect(page.getByRole('heading', { name: 'Products' })).toBeVisible();

Loading spinners

Do this:

  • wait for the spinner to be hidden or removed
  • bind the spinner to a known request or state transition
  • ensure the tested control is enabled only when the operation is ready

Avoid this:

  • assuming the disappearance of the spinner is enough when other async work still continues
  • clicking through overlays that remain in the DOM but are transparent

Hover menus and dropdown micro-interactions

Do this:

  • hover, then wait for menu visibility
  • keep pointer interactions close to the actual user path
  • verify the menu is not closing due to a layout shift

Avoid this:

  • chaining hover and click without an intermediate assertion
  • relying on coordinates instead of locators

Example:

typescript

const menu = page.getByRole('menu', { name: 'Account' });
await page.getByRole('button', { name: 'Account' }).hover();
await expect(menu).toBeVisible();
await menu.getByRole('menuitem', { name: 'Settings' }).click();

Toasts and transient banners

Do this:

  • assert on the message text if the toast is part of the behavior
  • wait for the toast to appear before interacting with something underneath it
  • give the test a deterministic exit condition, such as toBeHidden()

Avoid this:

  • assuming a toast will remain visible long enough for a screenshot
  • testing both the message and the animation speed in the same flow

What to change in the app code, not just the test

Test stability is a shared responsibility. A brittle test often points to an app that does not expose its state clearly enough for automation.

Useful application changes include:

  • adding data-testid attributes to stable interactive nodes
  • using aria-busy="true" while content is loading
  • making skeletons and content mutually exclusive in the DOM when possible
  • disabling pointer events on overlays that should not intercept clicks
  • keeping button labels stable across loading and loaded states
  • exposing explicit ready flags in the frontend state tree

If you can make the UI easier to understand for a user, you often make it easier for the test as well.
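
On the application side, a few of these changes can come from one small state-to-markup mapping. The sketch below is framework-agnostic and hypothetical: LoadStatus, stateAttributes, and visibleRegion are names invented for this example, and the data-state attribute follows the convention suggested earlier rather than any standard.

```typescript
type LoadStatus = 'idle' | 'loading' | 'ready' | 'error';

// Attributes to spread onto the content wrapper so tests and assistive
// technology can both read the loading state.
function stateAttributes(status: LoadStatus): Record<string, string> {
  return {
    'aria-busy': status === 'loading' ? 'true' : 'false',
    'data-state': status,
  };
}

// Keeps skeleton and content mutually exclusive: at most one region renders.
function visibleRegion(status: LoadStatus): 'skeleton' | 'content' | 'error' | 'none' {
  switch (status) {
    case 'loading':
      return 'skeleton';
    case 'ready':
      return 'content';
    case 'error':
      return 'error';
    default:
      return 'none';
  }
}

// Possible test-side check with Playwright (not executed here):
//   await expect(page.locator('[data-testid="products"]'))
//     .toHaveAttribute('data-state', 'ready');
```

Deriving both the attributes and the rendered region from the same status value means the hook the test waits on cannot drift away from what the user sees.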

A simple readiness contract for frontend teams

A useful team practice is to define readiness contracts for important pages. For example:

  • the dashboard is ready when the main content root has loaded and the skeleton count is zero
  • checkout is ready when the payment summary is visible and the submit button is enabled
  • search is ready when the result count is rendered and the loading overlay is absent

That gives SDETs a concrete target and gives frontend engineers a shared language for state transitions.
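
A contract like this can live in code as plain data that both teams review. The following is a sketch under assumptions: the type names, the unmetConditions helper, and the payment-summary and submit-order selectors are hypothetical, and the probe function stands in for whatever your runner uses to inspect an element.

```typescript
type Expectation = 'visible' | 'absent' | 'enabled';

interface Condition {
  selector: string;
  expectation: Expectation;
}

// What a probe reports about a selector at one moment in time.
interface ElementState {
  count: number;
  visible: boolean;
  enabled: boolean;
}

// Hypothetical contracts; selectors are placeholders for your own hooks.
const contracts: Record<string, Condition[]> = {
  dashboard: [
    { selector: '[data-testid="dashboard-content"]', expectation: 'visible' },
    { selector: '[data-testid="dashboard-skeleton"]', expectation: 'absent' },
  ],
  checkout: [
    { selector: '[data-testid="payment-summary"]', expectation: 'visible' },
    { selector: '[data-testid="submit-order"]', expectation: 'enabled' },
  ],
};

function isMet(state: ElementState, expectation: Expectation): boolean {
  switch (expectation) {
    case 'absent':
      return state.count === 0;
    case 'visible':
      return state.count > 0 && state.visible;
    case 'enabled':
      return state.count > 0 && state.visible && state.enabled;
  }
}

// Returns the conditions that do not hold yet; an empty array means "ready".
function unmetConditions(
  conditions: Condition[],
  probe: (selector: string) => ElementState,
): Condition[] {
  return conditions.filter((c) => !isMet(probe(c.selector), c.expectation));
}
```

Because the contract is data, a failing wait can list exactly which condition was still unmet, which is usually enough to tell a locator problem from a timing problem.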

When to use visual checks instead of interaction checks

If the failure is about animation timing or layout polish, a functional E2E test is often the wrong tool. Use visual testing when the thing that matters is the final appearance, spacing, or transient motion state.

Use E2E interaction tests when you need to validate that the user can complete a path. Use screenshot or DOM snapshot tools when you need to validate structure, geometry, or final rendering. Mixing them can create tests that are too broad to be stable.

This distinction is one reason software testing teams often separate functional automation from visual regression in their test strategy.

If your failures mostly happen in CI, look at the environment as well as the code:

  • run browsers with consistent viewport sizes
  • avoid CPU starvation from too many parallel workers
  • collect traces or video on retry and failure
  • keep test data deterministic
  • make sure your CSS overrides for test mode are loaded in CI too
  • use the same browser family in local and pipeline runs when possible
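
Several of these points map onto runner configuration. The sketch below shows how they might look as Playwright settings; the e2eConfig function name and every value in it (worker count, retries, viewport size) are illustrative assumptions to tune for your own pipeline, while the option names themselves are standard Playwright config fields.

```typescript
// Returns settings intended for playwright.config.ts; values are illustrative.
function e2eConfig(isCI: boolean) {
  return {
    // Fewer parallel workers in CI to limit CPU starvation on shared runners.
    workers: isCI ? 2 : undefined,
    // One retry in CI so a trace can be captured for the failing attempt.
    retries: isCI ? 1 : 0,
    use: {
      // Same viewport locally and in CI so layout-dependent steps behave alike.
      viewport: { width: 1280, height: 720 },
      trace: 'on-first-retry' as const,
      video: 'retain-on-failure' as const,
    },
  };
}

// In playwright.config.ts (not executed here):
//   import { defineConfig } from '@playwright/test';
//   export default defineConfig(e2eConfig(Boolean(process.env.CI)));
```

Keeping the CI-specific values behind a single flag makes it easy to reproduce the pipeline settings on a local machine when a failure only shows up there.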

A sample GitHub Actions job might look like this:

name: e2e

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run test:e2e

For teams using Selenium, the same principles apply, even if the syntax differs. The browser still needs the right readiness signal, and the locator still needs to point to the final element, not the placeholder.

A minimal Selenium pattern for waiting on a skeleton to disappear

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is assumed to be an already-created WebDriver instance.
wait = WebDriverWait(driver, 10)

# Wait for the skeleton to be hidden or removed, then for the results to be visible.
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, '[data-testid="skeleton"]')))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '[data-testid="results"]')))

This is not fancy, but it is often enough to remove the most obvious race conditions.

Tradeoffs, because stability has a cost

Every stabilization tactic has a tradeoff:

  • More explicit waits can make tests slower if they are too broad
  • More data-testid hooks add markup maintenance, but reduce selector ambiguity
  • Disabling animations in test mode can hide animation bugs, so keep separate coverage if motion matters
  • Waiting on network responses can tightly couple tests to implementation details if overused
  • Separating tests by concern increases test count, but usually decreases debug time

The best teams pick the smallest stabilization change that removes the failure mode.

A decision guide for flaky UI-state failures

Use this quick decision tree:

  • If the test clicks too early, wait for a readiness signal
  • If the test finds the wrong element, fix the locator strategy
  • If the test passes locally but fails in CI, inspect timing and CPU contention
  • If the test fails during a transition, wait for the animation or disable it in test mode
  • If the test is trying to validate motion, move that check to a visual or motion-specific test

Where tools like Endtest can fit

For teams that want more stable flow-based automation with less hand-written timing logic, Endtest is one possible alternative to evaluate. It is an agentic AI test automation platform with self-healing tests, so when a locator breaks because the DOM changes, it can try nearby stable context and continue the run. That will not replace good UI-state design, but it can reduce some maintenance pressure in large suites.

If you are curious about the mechanics, the self-healing documentation explains how broken locators are recovered inside the platform.

Closing thoughts

To stabilize flaky e2e tests in apps with skeleton screens, loading animations, and micro-interactions, focus on the state transitions the user actually experiences. Treat readiness as a contract, not a guess. Make locators stable, wait on meaningful signals, and separate functional validation from motion validation.

The core idea is simple: modern UI is not just a page, it is a sequence of temporary states. Once your tests acknowledge those states instead of fighting them, the false failures usually start dropping, and the waits that remain are easy to explain and easy to maintain.