Debugging Flaky Tests: A Practical Guide

A test that fails randomly on unchanged code teaches the team to rerun CI instead of investigating, and that habit is how real failures go unnoticed. This guide covers the five root causes of flakiness in Playwright: timing issues, test pollution, shared test data, network dependencies, and element order instability, plus a step-by-step debugging workflow using the trace viewer to find which cause you're dealing with.

Why flaky tests are worse than no tests

When a test fails consistently, you fix it. When a test fails randomly, the team starts ignoring red builds. "It's probably just the flaky login test" becomes the standard response to CI failures. Eventually a real bug slips through because nobody took the red build seriously.

Flaky tests erode trust in your entire test suite. That's why fixing them is worth the time, even when the test itself isn't critical.

The most common causes

Before debugging, know what you're looking for. Flaky tests almost always come from one of five places.

Timing issues. The most common cause by far. The test tries to interact with an element before it's ready: before it appears, before it's enabled, before an animation finishes. The test passes when the page loads fast and fails when it loads slow.

Test pollution. One test leaves behind state that breaks the next test. A created record, a leftover cookie, a modified localStorage value. Tests that pass alone but fail in a suite are almost always this.

Shared test data. Two tests run in parallel and both try to use or modify the same record. One wins, one fails.

Network dependencies. A test makes a real API call that occasionally times out or returns unexpected data.

Element order instability. A test assumes elements appear in a specific order (first row, second button) but the order isn't guaranteed.

Start with the Playwright trace viewer

Before changing any code, reproduce the failure and capture a trace. The trace viewer is Playwright's most powerful debugging tool: it records every action, network request, and DOM snapshot during a test run.

Enable tracing in playwright.config.ts:

export default defineConfig({
  use: {
    trace: 'on-first-retry',  // capture trace when a test fails and retries
  },
  retries: 1,  // retry once so the trace gets captured
});

Run the tests, then open the report:

npx playwright test
npx playwright show-report

Click on a failed test. The trace view shows a timeline of every action with before/after screenshots. You can see exactly which step failed, what the page looked like at that moment, and what network requests were in flight.

This alone resolves about half of flaky test investigations without any guesswork.

Fix timing issues

Timing problems look like this in the error output:

Error: locator.click: Timeout 30000ms exceeded.
waiting for getByRole('button', { name: 'Submit' })

Or:

Error: expect(locator).toBeVisible()
Received: hidden

The instinct is to add a wait. The wrong fix:

// Bad — guessing how long to wait
await page.waitForTimeout(2000);
await page.getByRole('button', { name: 'Submit' }).click();

This makes the test slower and still flaky. Sometimes 2 seconds isn't enough.

The right fix: wait for the specific condition that needs to be true before the action.

// Wait for a loading indicator to disappear
await page.getByTestId('loading-spinner').waitFor({ state: 'hidden' });
await page.getByRole('button', { name: 'Submit' }).click();

// Wait for a button to become enabled
await page.getByRole('button', { name: 'Submit' }).waitFor({ state: 'visible' });
await expect(page.getByRole('button', { name: 'Submit' })).toBeEnabled();
await page.getByRole('button', { name: 'Submit' }).click();

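// Wait for a network request to finish: start listening before the action
// that triggers the request, otherwise a fast response can be missed
const responsePromise = page.waitForResponse(resp =>
  resp.url().includes('/api/items') && resp.status() === 200
);
await page.getByRole('button', { name: 'Submit' }).click();
await responsePromise;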

Playwright's built-in auto-waiting handles most cases automatically. When auto-waiting isn't enough, wait for the specific thing, not a fixed duration.

Fix test pollution

If tests pass individually but fail when run together, the problem is almost certainly state leaking between tests.

Check for these sources of pollution:

Browser storage. If one test writes to localStorage or sessionStorage and another test reads from it, you have pollution. Playwright Test creates a fresh browser context for each test by default, so storage normally starts empty. It leaks when tests share a page or context (for example, one created in beforeAll) or load the same saved storageState.

// Clear storage before each test in the file
test.beforeEach(async ({ page }) => {
  await page.goto('https://lab.becomeqa.com');
  await page.evaluate(() => {
    localStorage.clear();
    sessionStorage.clear();
  });
});

Database state. If your tests create records and don't clean them up, tests that run after them see unexpected data.

test.afterEach(async ({ request }) => {
  // Delete the test record created during the test
  await request.delete('https://lab.becomeqa.com/api/items/test-item-id');
});

Global test state. If you use global variables in your test files to share data between tests, don't. Each test should be self-contained.
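
One way to avoid module-level variables is to build the data inside each test from a small factory. The following is a minimal sketch, not a prescribed pattern: createTestUser and the TestUser fields are hypothetical names chosen for illustration.

```typescript
interface TestUser {
  email: string;
  roles: string[];
}

// Hypothetical factory: every call returns a fresh object with its own
// roles array, so one test's mutations cannot leak into another test.
function createTestUser(overrides: Partial<TestUser> = {}): TestUser {
  return { email: 'user@example.com', roles: ['viewer'], ...overrides };
}

// Inside a test body, instead of reading a shared variable:
const adminUser = createTestUser({ roles: ['admin'] });
```

Because each test calls the factory itself, the order in which tests run no longer matters for this data.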

Run your tests with --repeat-each=3 to see if they're stable when repeated. A test that fails on the second run is leaking state.

npx playwright test --repeat-each=3 tests/login.spec.ts

Fix parallel execution conflicts

Playwright runs tests in parallel by default across multiple workers. If two tests try to modify the same record or use the same user account simultaneously, they conflict.

The fix depends on the situation:

Use unique test data per test. Instead of always using admin@becomeqa.com, generate a unique identifier for each test run:

const uniqueId = Date.now();
const testEmail = `test-${uniqueId}@example.com`;
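
Date.now() on its own can repeat when two parallel workers start a test in the same millisecond. A minimal sketch of a more collision-resistant variant follows, assuming a hypothetical helper named uniqueEmail (not part of Playwright). It combines the timestamp with a per-process counter and a random suffix.

```typescript
// Hypothetical helper, not a Playwright API.
let counter = 0; // increments on every call, so IDs never repeat within one worker process

function uniqueEmail(prefix = 'test'): string {
  const time = Date.now().toString(36);
  // The random suffix lowers the odds of a clash between separate worker processes.
  const random = Math.random().toString(36).slice(2, 8);
  counter += 1;
  return `${prefix}-${time}-${counter}-${random}@example.com`;
}

const testEmail = uniqueEmail();
```

The counter guarantees uniqueness only inside one process; across workers the random suffix makes a collision unlikely rather than impossible.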

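Isolate conflicting tests. Group tests that conflict into the same file and mark that file as serial:

// At the top of the file
test.describe.configure({ mode: 'serial' });

This runs the tests in the file one after another in the same worker, so they cannot collide with each other. Two caveats: if one test fails, the remaining tests in the file are skipped, and tests in other files still run in parallel alongside them.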

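Use separate test data per worker. Playwright exposes a workerIndex on testInfo (the second argument of a test function) and on workerInfo (the second-position info argument of a worker-scoped fixture):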

const workerEmail = `test-worker-${workerInfo.workerIndex}@example.com`;

Use retries carefully

Playwright supports automatic retries for flaky tests:

// playwright.config.ts
export default defineConfig({
  retries: process.env.CI ? 2 : 0,
});

Retries in CI mask problems rather than fix them, but they're a practical tool when you have genuine infrastructure flakiness (network timeouts, CI machine variance) rather than bugs in your test code.

The rule: retries are acceptable for infrastructure flakiness. They're not acceptable as a substitute for fixing real timing or isolation problems.

Setting retries: 3 without investigating why tests fail is how you end up with a suite where every failing test runs up to four times and you still have no test you actually trust.

Quarantine persistently flaky tests

If a test is flaky and you can't fix it immediately, quarantine it. Don't leave it in the main suite failing randomly.

test.skip('checkout flow completes successfully', async ({ page }) => {
  // Flaky due to payment API timeouts — tracked in JIRA-1234
  // TODO: mock the payment API response instead of hitting the real one
});

A skipped test with a comment is infinitely better than a flaky test that trains the team to ignore red builds.
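
The TODO in that comment could be addressed along the following lines. This is a sketch under stated assumptions: the URL pattern **/api/payments and the response body are hypothetical, and the two small interfaces only mirror the parts of Playwright's page.route and route.fulfill that the helper uses, so the block compiles on its own. In a real spec you would pass the page fixture, which should satisfy PageLike structurally.

```typescript
// Minimal shapes mirroring the Playwright calls used below (page.route, route.fulfill).
interface RouteLike {
  fulfill(options: { status: number; contentType: string; body: string }): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Hypothetical helper: answer payment requests locally instead of calling the real API.
async function mockPaymentApi(page: PageLike): Promise<void> {
  await page.route('**/api/payments', async (route) => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'succeeded' }), // assumed response shape
    });
  });
}
```

Call await mockPaymentApi(page) before navigating, so the route is registered before the page sends its first payment request.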

A systematic debugging workflow

When you hit a flaky test, work through this order:

1. Capture the trace: run with retries: 1 and trace: 'on-first-retry', look at the exact failure point

2. Run it 10 times: npx playwright test --repeat-each=10 tests/your.spec.ts, see how often it fails

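3. Run it in isolation: npx playwright test tests/your.spec.ts -g "test title", if it passes alone but fails in the full suite, it's test pollution

4. Run it headed: npx playwright test --headed, optionally with launchOptions: { slowMo: 500 } under use in playwright.config.ts, and watch it fail in slow motion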

5. Check the network tab in the trace: are requests failing or timing out?

6. Add explicit waits for the specific condition that needs to be true before the failing action

7. Check for shared state: what does the test before it do?

Most flaky tests are solved at step 3 or step 6.

FAQ

How do I know if a test is genuinely flaky vs. caught a real bug?

Run it 10 times on the same commit. If it fails 2 out of 10, it's flaky. If it fails 10 out of 10, it caught a bug.
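
That rule of thumb can be written down as a small function. This is an illustrative sketch, with classifyRuns and the Verdict labels as hypothetical names; the pass/fail list would come from however you record the repeated runs.

```typescript
type Verdict = 'stable' | 'flaky' | 'consistent failure';

// results[i] is true when run i passed; all runs must come from the same commit.
function classifyRuns(results: boolean[]): Verdict {
  if (results.length === 0) throw new Error('need at least one run');
  const failures = results.filter((passed) => !passed).length;
  if (failures === 0) return 'stable';
  if (failures === results.length) return 'consistent failure';
  return 'flaky';
}

// 2 failures out of 10 runs
const runs = [true, true, false, true, true, true, false, true, true, true];
console.log(classifyRuns(runs)); // prints "flaky"
```

A 'stable' verdict only means no failure showed up in this sample; a test that fails once in a hundred runs needs more repetitions to surface.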

My test only fails in CI, never locally. Why?

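CI machines are usually slower and have less memory, so timing issues that are invisible locally show up under load. To approximate that locally, run with more workers than your machine handles comfortably (--workers) together with --repeat-each. Slowing Playwright down with slowMo, which is a launchOptions setting rather than a CLI flag, gives the page more time and tends to hide timing bugs instead of exposing them. Also check whether CI uses a different base URL or environment variables.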

Should I use test.fixme or test.skip for known flaky tests?

Neither one runs the test. test.skip excludes the test, typically because it doesn't apply in some configuration. test.fixme also excludes it, but signals that the test is broken and someone intends to fix it. Running a test that is expected to fail, and reporting an error if it starts passing, is what test.fail does, and that suits consistently failing tests rather than flaky ones. For known flaky tests that need fixing, test.fixme is the more honest choice.

The trace shows the element was visible but the click still failed. What happened?

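The element was visible but probably covered by another element (a modal, a tooltip, a sticky header). Visibility checks don't detect overlap: isVisible() only tells you the element is rendered, and expect(locator).toBeInViewport() only tells you it is within the viewport. Look in the trace's call log for a message that another element intercepts pointer events, then dismiss the overlay or wait for it to go away. If the element sits under a sticky header, scrolling first may help: await locator.scrollIntoViewIfNeeded().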
