Visual Regression Testing with Playwright: toHaveScreenshot Without Applitools

toHaveScreenshot() creates a baseline screenshot on first run and fails on any subsequent run where the pixel difference exceeds a threshold. The most common first CI failure has nothing to do with a real regression: baselines generated on a developer's macOS machine don't match what Linux CI renders, because font hinting and sub-pixel rendering differ between platforms. This article covers threshold configuration, masking dynamic content like timestamps and user avatars, generating Linux-compatible baselines with Playwright's Docker image, and when a commercial tool like Applitools is worth the cost over the built-in approach.

What visual regression testing actually is

A visual regression test captures a screenshot of a page or element, stores it as a baseline, and then compares every future run against that baseline pixel by pixel. If the difference exceeds a configurable threshold, the test fails and shows you exactly which pixels changed.
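
As a toy model of that comparison, the sketch below counts differing pixels between two tiny grayscale grids and fails once the count exceeds a budget. It is not Playwright's comparator, which works on decoded PNGs with a perceptual color metric; the function names and the grayscale representation are assumptions made for illustration.

```typescript
// Toy model: an "image" is a grid of grayscale values (0-255).
type Image = number[][];

// Count pixels whose values differ between baseline and actual.
function countDiffPixels(baseline: Image, actual: Image): number {
  if (baseline.length !== actual.length) {
    throw new Error('Image heights differ');
  }
  let diff = 0;
  for (let y = 0; y < baseline.length; y++) {
    const baseRow = baseline[y];
    const actualRow = actual[y];
    if (baseRow.length !== actualRow.length) {
      throw new Error('Image widths differ');
    }
    for (let x = 0; x < baseRow.length; x++) {
      if (baseRow[x] !== actualRow[x]) diff++;
    }
  }
  return diff;
}

// The check passes only while the diff stays within the allowed budget.
function matchesBaseline(baseline: Image, actual: Image, maxDiffPixels: number = 0): boolean {
  return countDiffPixels(baseline, actual) <= maxDiffPixels;
}

const baseline: Image = [
  [255, 255, 255],
  [255, 0, 255],
  [255, 255, 255],
];
// Same image with one pixel changed, e.g. by a CSS regression.
const actual: Image = [
  [255, 255, 255],
  [255, 0, 255],
  [255, 255, 0],
];

console.log(countDiffPixels(baseline, actual));    // 1
console.log(matchesBaseline(baseline, actual));    // false: the budget is 0
console.log(matchesBaseline(baseline, actual, 1)); // true: one pixel is allowed
```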

The distinction from a standard screenshot is important. Taking a screenshot with page.screenshot() just saves a file. It never fails. It tells you nothing about whether the page looks correct. Visual regression testing requires a reference (the agreed-upon "this is what it should look like" image) and an automated comparison against that reference on every run.

The appeal is real. You catch layout regressions that no functional assertion would ever surface: a CSS change that shifts a modal five pixels to the left, a z-index bug that hides a dropdown behind a banner, a dark-mode implementation that accidentally inverts a logo. These are the kinds of bugs that make it through code review because reviewers focus on logic, not pixels.

The challenge is also real. Screenshots are sensitive. A one-pixel anti-aliasing difference between macOS and Linux, a dynamic timestamp on the page, an ad that rotates content: all of these will generate false failures. Managing that noise is most of the practical work in visual regression testing.

`toHaveScreenshot()`: the built-in assertion

Playwright's visual assertion is expect(locator).toHaveScreenshot() or expect(page).toHaveScreenshot(). The page form captures the visible viewport by default (pass fullPage: true to capture the whole scrollable page). The locator form captures only that element.

// tests/visual/homepage.spec.ts
import { test, expect } from '@playwright/test';

test('homepage matches baseline', async ({ page }) => {
  await page.goto('https://lab.becomeqa.com');

  // Page-level screenshot: the visible viewport unless you pass { fullPage: true }
  await expect(page).toHaveScreenshot('homepage.png');
});

test('login button matches baseline', async ({ page }) => {
  await page.goto('https://lab.becomeqa.com');

  // Screenshot scoped to a specific element
  const loginButton = page.getByRole('button', { name: 'Login' });
  await expect(loginButton).toHaveScreenshot('login-button.png');
});

The name argument ('homepage.png') is optional. If you omit it, Playwright generates a name automatically from the test title and a counter. Providing an explicit name makes the baseline files easier to find and understand when you're reviewing them later.

On the first run, there is no baseline to compare against. Playwright creates one.

Generating baseline screenshots on the first run

Run your tests the first time and you will see failures like this:

Error: A snapshot doesn't exist at tests/visual/homepage.spec.ts-snapshots/homepage-chromium-darwin.png, writing actual.

This is expected. Playwright is telling you it wrote the baseline file and asking you to review and commit it. The test fails on first run by design. Playwright won't silently create a baseline without you knowing about it.

After the first run, your project will have a snapshots directory:

tests/
  visual/
    homepage.spec.ts
    homepage.spec.ts-snapshots/
      homepage-chromium-darwin.png
      login-button-chromium-darwin.png

Review those images. If they look correct, commit them to your repository. They are now the baseline. Every subsequent test run compares against these committed files.

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  // Where snapshot files are stored. Defaults to next to the spec file.
  snapshotDir: './tests/__snapshots__',
  use: {
    baseURL: 'https://lab.becomeqa.com',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});

You can centralize all snapshots under a single directory with snapshotDir in your config, which some teams prefer for cleaner repository organization.

Updating baselines with `--update-snapshots`

The app changes. The design changes. When a visual change is intentional, you need to update the baseline. Run:

npx playwright test --update-snapshots

This overwrites all existing snapshots with fresh screenshots. Every test that ran will now have its current state as the new baseline.

If you only want to update snapshots for one test file:

npx playwright test tests/visual/homepage.spec.ts --update-snapshots

Or for a specific test by name:

npx playwright test --update-snapshots -g "homepage matches baseline"

Treat --update-snapshots with the same caution you give to git push --force. Running it carelessly will overwrite legitimate baselines with broken states. Always review the updated images before committing them. In CI, the flag should never be set automatically. It should only run in response to a deliberate developer action.
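
One way to make that rule explicit is to set Playwright's updateSnapshots config option from the environment. The sketch below assumes your CI provider sets a CI environment variable (GitHub Actions does); snapshotModeFor is a hypothetical helper name, and the env object is passed in so the logic can be checked in isolation.

```typescript
// A subset of the values accepted by Playwright's `updateSnapshots` config option.
type UpdateSnapshotsMode = 'all' | 'missing' | 'none';

// Hypothetical helper: never write baselines on CI, keep the default locally.
function snapshotModeFor(env: Record<string, string | undefined>): UpdateSnapshotsMode {
  // 'none'    : never write snapshot files, so a missing baseline is a plain failure
  // 'missing' : write only baselines that do not exist yet (Playwright's default)
  return env['CI'] ? 'none' : 'missing';
}

// In playwright.config.ts this would be wired up roughly as:
//   export default defineConfig({ updateSnapshots: snapshotModeFor(process.env) });

console.log(snapshotModeFor({ CI: 'true' })); // none
console.log(snapshotModeFor({}));             // missing
```

Passing --update-snapshots on the command line still overrides the config value, so treat this as a guard against accidents, not as access control.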

After updating, you'll commit the changed .png files to your repository. Your diff in code review will show the before and after images, which is exactly the right place to catch unintended visual changes.

Configuring comparison thresholds

Pixel-perfect comparison works beautifully in a controlled environment and generates constant noise everywhere else. Playwright gives you three threshold options to manage sensitivity.

test('product card matches baseline', async ({ page }) => {
  await page.goto('https://lab.becomeqa.com/products');

  const productCard = page.locator('.product-card').first();

  await expect(productCard).toHaveScreenshot('product-card.png', {
    // Maximum number of pixels that are allowed to differ
    maxDiffPixels: 100,

    // Maximum ratio of differing pixels (0–1). 0.01 = 1% of all pixels
    maxDiffPixelRatio: 0.01,

    // Per-pixel color difference threshold (0–1). Higher = more tolerant
    threshold: 0.2,
  });
});

threshold: how different a single pixel's color must be to count as "different." The default is 0.2, which absorbs minor anti-aliasing and sub-pixel rendering differences. Bump it to 0.3 or 0.4 on components with lots of curves or gradients where rendering varies slightly across platforms.
maxDiffPixels: an absolute count of pixels that may differ. Use this for small, bounded components where you know a few pixels might vary (icon rendering, border radius) but a 50-pixel shift should always fail.
maxDiffPixelRatio: the fraction of total pixels that may differ, from 0 to 1. Use this for full-page screenshots where the total pixel count is large. On a 1920x1080 page (about 2.07 million pixels), maxDiffPixels: 100 is extremely strict, while maxDiffPixelRatio: 0.001 allows roughly 2,000 differing pixels.
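
To make the arithmetic concrete, here is a small sketch that turns the two pixel budgets into absolute counts for a given image size. effectivePixelBudget is a hypothetical helper, and the rule that the smaller budget wins when both options are set is an assumption about Playwright's behavior that is worth verifying against the version you use.

```typescript
interface DiffOptions {
  maxDiffPixels?: number;
  maxDiffPixelRatio?: number;
}

// Hypothetical helper: how many pixels may differ for an image of this size.
function effectivePixelBudget(width: number, height: number, opts: DiffOptions): number {
  const byRatio =
    opts.maxDiffPixelRatio !== undefined
      ? Math.floor(width * height * opts.maxDiffPixelRatio)
      : undefined;
  const byCount = opts.maxDiffPixels;

  // Assumption: with both options set, the smaller budget applies.
  if (byRatio !== undefined && byCount !== undefined) {
    return Math.min(byRatio, byCount);
  }
  // With neither option set, no pixel may differ.
  return byRatio ?? byCount ?? 0;
}

// Full-page 1920x1080 screenshot: 2,073,600 pixels in total.
console.log(effectivePixelBudget(1920, 1080, { maxDiffPixelRatio: 0.001 })); // 2073
console.log(effectivePixelBudget(1920, 1080, { maxDiffPixels: 100 }));       // 100

// Small 200x100 component: a 1% ratio allows 200 pixels.
console.log(effectivePixelBudget(200, 100, { maxDiffPixelRatio: 0.01 }));    // 200
```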

You can set defaults in playwright.config.ts so you don't repeat the same thresholds in every test:

// playwright.config.ts
export default defineConfig({
  expect: {
    toHaveScreenshot: {
      maxDiffPixels: 100,
      threshold: 0.2,
    },
  },
});

Individual tests can still override these values if they need different sensitivity.

Masking dynamic content

Dynamic content is the biggest source of false failures in visual regression tests. A timestamp that updates every second, a user avatar pulled from a CDN, a rotating banner ad: any of these will generate a diff on every single run.

Playwright's mask option accepts an array of locators. Those regions are painted over with a solid color before the comparison is made.

test('dashboard matches baseline', async ({ page }) => {
  await page.goto('https://lab.becomeqa.com/dashboard');

  await expect(page).toHaveScreenshot('dashboard.png', {
    mask: [
      // Mask the "Last updated" timestamp in the header
      page.locator('[data-testid="last-updated-timestamp"]'),

      // Mask the user avatar. Different for every user.
      page.locator('[data-testid="user-avatar"]'),

      // Mask any third-party ad containers
      page.locator('.ad-container'),
    ],
    // Customize the mask color (default is a magenta overlay)
    maskColor: '#FF00FF',
  });
});

The masked regions show up in the comparison as a solid block of color. The comparison still runs across the entire screenshot. The masked areas just always match themselves because both the actual and expected screenshots have the same mask applied.
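
The effect is easy to see in a toy model. The sketch below uses small number grids in place of real screenshots and a sentinel value in place of the mask color; applyMask and countDiffPixels are hypothetical helpers, not Playwright APIs.

```typescript
type Image = number[][];

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Stand-in for the solid mask color.
const MASK_VALUE = -1;

// Return a copy of the image with the rectangle painted in the mask value.
function applyMask(image: Image, rect: Rect): Image {
  return image.map((row, y) =>
    row.map((value, x) => {
      const inside =
        x >= rect.x && x < rect.x + rect.width &&
        y >= rect.y && y < rect.y + rect.height;
      return inside ? MASK_VALUE : value;
    }),
  );
}

// Count positions where the two grids hold different values.
function countDiffPixels(a: Image, b: Image): number {
  let diff = 0;
  a.forEach((row, y) =>
    row.forEach((value, x) => {
      if (value !== b[y][x]) diff++;
    }),
  );
  return diff;
}

// Two runs of the same page: only the top-right 2x1 "timestamp" differs.
const baseline: Image = [
  [1, 1, 7, 7],
  [1, 1, 1, 1],
];
const actual: Image = [
  [1, 1, 9, 3],
  [1, 1, 1, 1],
];
const timestampRegion: Rect = { x: 2, y: 0, width: 2, height: 1 };

// Unmasked, the timestamp produces a diff on every run.
console.log(countDiffPixels(baseline, actual)); // 2

// Masked, both images carry the same block, so the region always matches.
console.log(
  countDiffPixels(applyMask(baseline, timestampRegion), applyMask(actual, timestampRegion)),
); // 0
```

A change outside the masked rectangle would still be counted, which is why masking narrows the comparison without switching it off.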

Add data-testid attributes to dynamic content specifically so they can be reliably masked in visual tests. Selecting by class name works, but class names change. A data-testid="user-avatar" is stable and clearly communicates its purpose to anyone reading the test.

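For animations, toHaveScreenshot() already defaults to animations: 'disabled', which stops CSS animations, CSS transitions, and Web Animations before the screenshot is taken. Setting it explicitly documents the intent: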

test('animated hero section matches baseline', async ({ page }) => {
  await page.goto('https://lab.becomeqa.com');

  await expect(page).toHaveScreenshot('hero.png', {
    animations: 'disabled',
  });
});

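With animations disabled, finite animations and transitions are fast-forwarded to their end state and infinite animations are reset to their initial state, which makes animated components deterministic. JavaScript-driven animations that bypass CSS and the Web Animations API are not affected. For those, wait for a concrete signal that the animation has finished (for example, an element reaching its final state) before the assertion. waitForLoadState('networkidle') only tracks network traffic, so it is not a reliable signal that an animation has finished.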

Snapshot naming and cross-platform organization

Look at the snapshot filename Playwright generates: homepage-chromium-darwin.png. The browser and operating system are embedded in the name. This is not an accident.

The same page rendered in Chromium on macOS versus Chromium on Linux produces subtly different pixels. Font hinting, sub-pixel rendering, and small differences in how the OS composites graphics mean you cannot share a single baseline image across platforms. Playwright handles this by creating separate baselines for each browser/OS combination.

tests/__snapshots__/
  homepage.spec.ts/
    homepage-chromium-darwin.png   (macOS Chrome)
    homepage-chromium-linux.png    (Linux Chrome)
    homepage-firefox-linux.png     (Linux Firefox)
    homepage-webkit-darwin.png     (macOS Safari)

You control the naming pattern through the snapshotPathTemplate option in playwright.config.ts:

// playwright.config.ts
export default defineConfig({
  snapshotPathTemplate:
    '{snapshotDir}/{testFileDir}/{testFileName}-snapshots/{arg}-{projectName}-{platform}{ext}',
});

The available tokens are:

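{arg}: the name you passed to toHaveScreenshot(), without the extension
{ext}: the file extension, including the leading dot (e.g., .png)
{projectName}: the project name from your config (e.g., chromium, firefox)
{platform}: the OS, as reported by Node's process.platform (darwin, linux, win32)
{testFileDir}: the spec file's directory, relative to testDir
{testFileName}: the spec file name, including its extension (e.g., homepage.spec.ts)
{snapshotDir}: the base snapshot directory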

Keep {platform} in the template. Removing it and trying to share one baseline across OSes is the most common mistake teams make when first setting up visual tests, and it generates constant false failures in CI.
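
To see what a template expands to, and why dropping {platform} causes collisions, here is a minimal sketch of the substitution. resolveSnapshotPath is a hypothetical helper and the token values are made up for illustration; Playwright's own resolution also sanitizes names and normalizes paths.

```typescript
// Token values for one screenshot. The concrete values are illustrative.
const tokens: Record<string, string> = {
  snapshotDir: 'tests/__snapshots__',
  testFileDir: 'visual',
  testFileName: 'homepage.spec.ts',
  arg: 'homepage',
  projectName: 'chromium',
  platform: 'linux',
  ext: '.png',
};

// Hypothetical helper: substitute {token} placeholders; unknown tokens are left as-is.
function resolveSnapshotPath(template: string, values: Record<string, string>): string {
  return template.replace(
    /\{(\w+)\}/g,
    (placeholder: string, name: string) => values[name] ?? placeholder,
  );
}

const template =
  '{snapshotDir}/{testFileDir}/{testFileName}-snapshots/{arg}-{projectName}-{platform}{ext}';

console.log(resolveSnapshotPath(template, tokens));
// tests/__snapshots__/visual/homepage.spec.ts-snapshots/homepage-chromium-linux.png

// Without {platform}, macOS and Linux runs resolve to the same file:
const shared = '{snapshotDir}/{arg}-{projectName}{ext}';
console.log(resolveSnapshotPath(shared, { ...tokens, platform: 'darwin' }));
console.log(resolveSnapshotPath(shared, { ...tokens, platform: 'linux' }));
// both print: tests/__snapshots__/homepage-chromium.png
```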

Running visual tests in CI

Running visual tests in CI reveals the cross-OS problem immediately. Your baselines were generated on a developer's macOS machine. Your CI pipeline runs on Linux. The snapshots don't match.

The cleanest solution is to generate baselines inside the same Docker container your CI uses. Playwright provides official Docker images:

# .github/workflows/visual-tests.yml
name: Visual Tests

on: [push, pull_request]

jobs:
  visual:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.44.0-jammy

    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run visual tests
        run: npx playwright test tests/visual/

      - name: Upload diff report on failure
        uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-visual-report
          path: playwright-report/
          retention-days: 7

When tests fail in CI, the uploaded report contains the actual screenshot, the expected baseline, and a diff image that highlights exactly which pixels changed. This assumes the HTML reporter is enabled in your config, since that is what writes playwright-report/. This is how you distinguish a real visual regression from an environment mismatch.

For generating Linux baselines from a macOS machine without switching to Linux yourself, run the Playwright Docker container locally:

# Generate Linux-compatible baselines from your Mac
# (--ipc=host is recommended for Chromium, which can run out of memory without it)
docker run --rm --ipc=host \
  -v "$(pwd):/work" \
  -w /work \
  mcr.microsoft.com/playwright:v1.44.0-jammy \
  npx playwright test tests/visual/ --update-snapshots

This writes new *-linux.png snapshot files that will match what CI produces. Commit those files and the CI failures caused by platform differences go away.
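
The image tag should track the @playwright/test version installed in your project, because the image ships the browser builds for exactly that release. As a small sketch of keeping the two in sync, the hypothetical helper below derives the image reference from an exact version string; the jammy default simply mirrors the tag used above.

```typescript
// Hypothetical helper: build the Docker image reference for an exact Playwright version.
// The "jammy" (Ubuntu 22.04) default mirrors the tag used in this article.
function playwrightImage(version: string, distro: string = 'jammy'): string {
  // Reject ranges such as "^1.44.0": they do not say which version is actually installed.
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Expected an exact x.y.z version, got "${version}"`);
  }
  return `mcr.microsoft.com/playwright:v${version}-${distro}`;
}

console.log(playwrightImage('1.44.0'));
// mcr.microsoft.com/playwright:v1.44.0-jammy
```

A script could read the installed version from node_modules/@playwright/test/package.json and pass it in, so that a dependency upgrade and the image bump land in the same commit.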

A common CI workflow pattern is to run visual tests in a separate job or project, gated on the functional test suite passing first. Visual tests are slower than functional tests and their failures are noisier, so keeping them in a dedicated pipeline step prevents them from blocking fast feedback on functional regressions:

// playwright.config.ts
export default defineConfig({
  projects: [
    // Functional tests run first
    {
      name: 'functional',
      testMatch: 'tests/functional/**/*.spec.ts',
    },
    // Visual tests run after functional
    {
      name: 'visual',
      testMatch: 'tests/visual/**/*.spec.ts',
      dependencies: ['functional'],
    },
  ],
});

Playwright built-in vs Applitools and Percy

Playwright's built-in visual testing covers a lot of ground. But commercial tools like Applitools Eyes and Percy exist for reasons worth understanding.

The core limitation of the built-in approach is snapshot management. Every baseline image lives in your repository. A project with 50 visual tests, 3 browsers, and 2 platforms generates 300 PNG files. Add more test cases and the repo grows. Reviewing visual changes in a pull request means looking at image diffs in GitHub's interface, which works but is not great for large images or subtle changes.

Applitools and Percy solve this with cloud storage for baselines, dedicated review UIs designed for visual diffs, intelligent AI-based comparison that understands layout shifts vs. content changes, and team workflows for approving or rejecting visual changes.

The trade-off is straightforward:

| | Playwright built-in | Applitools / Percy |
|---|---|---|
| Cost | Free | Paid (free tier available) |
| Setup | Minutes | Minutes + API key |
| Baseline storage | Git repository | Cloud |
| Diff review UI | Playwright HTML report | Dedicated cloud UI |
| AI comparison | No | Yes (Applitools) |
| Cross-browser baselines | Separate files per browser/OS | Unified with normalization |
| CI snapshots | Requires Docker image matching | Handled by the service |

For a solo project or small team, the built-in approach is the right starting point. It's free, it's fast to set up, and it handles the common cases well. The Docker workflow manages the cross-OS problem adequately once you've done it once.

For larger teams, especially ones where multiple people need to review and approve visual changes, the workflow friction of managing PNG files in git and reviewing diffs in GitHub becomes real. That's when a dedicated service starts to justify its cost. You're paying for the review workflow as much as for the comparison technology.

Applitools also offers a Playwright SDK in which eyes.check() calls take the place of toHaveScreenshot(). Navigation and locators stay the same, so switching mostly means adding the SDK's setup and teardown and changing the assertion calls, not rewriting tests.

FAQ

How do I run only visual tests without running the full suite?

Use a --grep flag or organize visual tests into their own directory and point Playwright at it: npx playwright test tests/visual/. If you're using projects in your config, npx playwright test --project=visual runs just the visual project.

My snapshots keep failing because of a loading spinner that sometimes appears. What should I do?

Wait for the spinner to disappear before asserting: await page.locator('[data-testid="loading-spinner"]').waitFor({ state: 'hidden' }). If the wait itself is flaky, give it a more generous timeout instead of a tight one. Masking the spinner is an alternative only when it sits apart from the content under test: a mask hides the spinner's own region, but not the content that is still loading behind or around it.

Can I use toHaveScreenshot() for mobile viewport testing?

Yes. Set the viewport in your project config or in the test itself: await page.setViewportSize({ width: 375, height: 812 }). Playwright will treat mobile and desktop screenshots as separate baselines if they're captured in separate tests or projects.

How many visual tests should I write?

Fewer than you think. Visual tests are best reserved for components and pages where the visual output is genuinely part of the specification: a design system's button states, a data visualization, a PDF export preview. Trying to cover every page visually creates a maintenance burden that teams usually abandon within a few months.

Can I snapshot a component in isolation without navigating to a page?

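Not with a standard Playwright test, which drives full pages in a browser. Playwright does have an experimental component testing mode (the @playwright/experimental-ct-* packages) that mounts a single component in a real browser, and toHaveScreenshot() works on the mounted component. For a dedicated component-level workflow, Storybook paired with Chromatic or with Percy's Storybook integration is the more common choice. Playwright visual tests work best at the integration level: real pages in a real browser.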
