Exploratory Testing: The Skill AI Cannot Replace

Scripted tests check behavior that someone thought to specify. Exploratory testing finds the bugs nobody specified. For example: selecting 200 catalog items, navigating away, then returning and selecting again submits changes to 400 items, because no one defined what happens to hidden selection state. This guide covers how session-based test management structures exploration without scripting it, which heuristics point toward problems, and three tour techniques that consistently surface failures the automation suite can't find.

What exploratory testing actually is

The definition that holds up best comes from James Bach and Michael Bolton: exploratory testing is simultaneous learning, test design, and execution. You're not doing these things sequentially. You're doing them all at once, in real time.

With scripted testing, someone writes a test case first, then a tester executes it later, often days or weeks apart. The person designing the test and the person executing it may not even be the same human. In exploratory testing, there's no gap between those activities. What you discover in the first five minutes shapes where you go in the next five. The test evolves as you learn.

This is not a lesser form of testing. It's a different cognitive mode entirely. Scripted testing excels at verification: confirming that known behavior still holds. Exploratory testing excels at discovery: finding behavior nobody thought to write a script for.

A concrete example: your team ships a new bulk-edit feature for a product catalog. The scripted tests cover selecting items, applying a change, confirming the save. During an exploratory session, you select 200 items, start editing, then navigate away before saving. When you come back and select items again, the previous selection is still active in a hidden state. Submitting the form now applies changes to 400 items. No one specified that scenario. No script checked for it. Your curiosity caught it.
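To make the failure mode concrete, here is a minimal sketch of how a bug like this can arise. The `BulkEditForm` class and its methods are hypothetical, invented for illustration; the real feature could be built very differently. The point is only that selection state which outlives the view gets merged with the next selection.

```python
class BulkEditForm:
    """Hypothetical model of a bulk-edit form whose selection outlives the view."""

    def __init__(self):
        self._selected = []  # hidden state: survives navigation

    def select(self, item_ids):
        # Adds to whatever is already selected instead of replacing it.
        self._selected.extend(item_ids)

    def navigate_away(self):
        # The bug: the view goes away, but the selection is never cleared.
        pass

    def submit(self):
        # Applies the change to every item still held in hidden state.
        affected = list(self._selected)
        self._selected.clear()
        return affected


form = BulkEditForm()
form.select(range(0, 200))    # first selection, never saved
form.navigate_away()
form.select(range(200, 400))  # the user believes only these are selected
affected = form.submit()
print(len(affected))          # 400
```

A scripted test that selects, edits, and saves in one pass never exercises `navigate_away`, so it passes against this code every time.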

Exploratory testing is not opposed to scripted testing. A healthy test strategy uses both. Scripted tests give you regression coverage at speed. Exploratory testing finds what the scripts didn't know to look for.

Why scripted and exploratory testing solve different problems

If you send the same tester to execute a 50-step regression script every sprint, they stop seeing the application. They're reading a list and ticking boxes. Attention drops to nearly zero, and bug detection drops with it, unless the bug happens to intersect exactly with a step on the script.

Exploratory testing demands full attention. You're forming hypotheses, trying to disprove them, and reacting to what you find. "This dropdown loads slowly. I wonder what happens if I submit the form before it finishes populating." That thought doesn't appear in any test case document. It appears because you're thinking while you're testing.

The distinction matters for how you plan your testing work. When you inherit a stable feature with good automation coverage, scripted regression is the right tool. When you're looking at a newly built feature, a third-party integration, or anything that touches multiple parts of the system in non-obvious ways, exploratory testing finds what scripted testing misses.

Neither approach is superior. Choosing the right one, or the right combination, is the skill.

Session-Based Test Management: structure without rigidity

Unstructured exploration is where the "clicking around randomly" criticism has some truth. If you open an application with no goal, no time limit, and no record of what you did, you'll cover the same ground repeatedly, miss large areas entirely, and have nothing useful to show at the end.

Session-Based Test Management (SBTM), developed by Jonathan and James Bach, solves this without turning exploration into scripted testing.

The core unit is a test session: a block of uninterrupted testing, usually 60 to 90 minutes, with three components.

The charter defines the mission. It's not a test case; it's a focused statement of what to investigate. "Explore the invoice export feature with attention to edge cases in multi-currency orders." That's a charter. It tells you where to start and what to care about, but it doesn't dictate your path. You can and should follow interesting threads as they appear.

The time box keeps sessions honest. Ninety minutes of focused exploration is usually more productive than four hours of drifting. The constraint forces prioritization. If you only have 90 minutes, you move toward the riskiest areas first.

The debrief is where the session's value gets captured. After the session, you spend 10-15 minutes summarizing what you covered, what you found, what you didn't get to, and what questions remain. This output is what makes exploratory testing visible to the rest of the team.

In practice, a charter for a payment flow might look like: "Explore checkout behavior when users switch payment methods mid-session, with focus on error handling and state management." You timebox it to 60 minutes, test with intent, then debrief. The resulting notes are your deliverable: not a pass/fail report, but a map of what you learned.

Keep a simple SBTM log in a shared doc or Notion table: charter, tester, date, duration, bugs found, areas not covered. Over time this builds a picture of your coverage that's more honest than any scripted test plan.
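If your team prefers a script to a shared doc, the same log fits in a few lines of code. This is a sketch under assumptions: `SessionRecord` and `uncovered_areas` are names made up here, and the sample rows are placeholder data, not real sessions. The fields mirror the columns listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class SessionRecord:
    """One row of the SBTM log."""
    charter: str
    tester: str
    session_date: date
    duration_minutes: int
    bugs_found: List[str] = field(default_factory=list)
    areas_not_covered: List[str] = field(default_factory=list)


def uncovered_areas(log: List[SessionRecord]) -> Dict[str, int]:
    """Count how many sessions flagged each area as not covered."""
    counts: Dict[str, int] = {}
    for record in log:
        for area in record.areas_not_covered:
            counts[area] = counts.get(area, 0) + 1
    return counts


# Placeholder entries for illustration only.
log = [
    SessionRecord(
        charter="Explore invoice export with attention to multi-currency edge cases",
        tester="tester-a",
        session_date=date(2024, 1, 15),
        duration_minutes=90,
        bugs_found=["BUG-1 (placeholder)"],
        areas_not_covered=["PDF export", "refunds"],
    ),
    SessionRecord(
        charter="Explore checkout when users switch payment methods mid-session",
        tester="tester-b",
        session_date=date(2024, 1, 22),
        duration_minutes=60,
        areas_not_covered=["refunds"],
    ),
]

gaps = uncovered_areas(log)
print(gaps)  # {'PDF export': 1, 'refunds': 2}
```

An area that keeps showing up in `areas_not_covered` is a reasonable candidate for the next charter.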

Heuristics that guide where to look

Experienced exploratory testers don't wander. They follow heuristics: mental shortcuts that point toward areas likely to contain problems.

The SFDPOT heuristic (developed by James Bach, sometimes read as "San Francisco Depot") gives you six lenses: Structure, Function, Data, Platform, Operations, and Time. Applied to a new feature, you're asking: does the structure of the data model create any vulnerabilities? What happens at the boundaries of accepted inputs? Does behavior change on different platforms or browsers? What happens when this runs under load, or when a time-dependent process is interrupted?

You don't apply all six to every session. You use them as prompts when you feel like you're running out of ideas.

User personas work alongside heuristics. A power user who has memorized keyboard shortcuts interacts with a product very differently than someone opening it for the first time. A user with poor network connectivity hits different edge cases than someone on fiber. Embodying a specific persona keeps your exploration coherent and helps you discover persona-specific failure modes.

Risk areas are the third guide. New code has more bugs than old code. Integrations between systems have more bugs than either system individually. Features with complex state (multi-step wizards, shopping carts, forms with conditional logic) have more bugs than simple CRUD screens. Point your exploration there first.

Tour techniques: three that pay off immediately

Cem Kaner introduced the idea of applying tourist metaphors to exploratory testing, later expanded by Michael Kelly. Three tours are worth knowing cold.

The feature tour is systematic coverage of a feature's capabilities, but approached with curiosity rather than a checklist. You're visiting every room in the building (every major function, every interaction point) not to confirm it works, but to understand it well enough to know where to probe deeper. On a new reporting module, this means generating every report type, applying every filter, exporting in every available format. Not to verify them, but to map the territory.

The complexity tour hunts for interactions between features. You're looking for places where two systems touch in ways that might not have been designed together. A payment form that also applies discount codes is more complex than either alone. An admin panel action that triggers a notification email is more complex than either alone. You test the seam. In a project management tool, this might mean: create a task, assign it to a user, then immediately deactivate that user. What happens to the task? Does the assigned user field handle a now-invalid reference gracefully?
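It helps to go into a seam like this with an idea of what "graceful" could look like. The sketch below is one possible answer, not a description of any real tool: `User`, `Task`, and `assignee_label` are hypothetical names, and the chosen labels are assumptions. During the session, you compare what the product actually does against an expectation like this one.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    name: str
    active: bool = True


@dataclass
class Task:
    title: str
    assignee_id: Optional[int] = None


def assignee_label(task: Task, users: Dict[int, User]) -> str:
    """Return a display label that stays truthful for stale references."""
    if task.assignee_id is None:
        return "Unassigned"
    user = users.get(task.assignee_id)
    if user is None:
        return "Unknown user"  # the reference points at nothing
    if not user.active:
        return f"{user.name} (deactivated)"
    return user.name


users = {1: User("Dana")}
task = Task("Review export", assignee_id=1)
users[1].active = False  # deactivate right after assigning
label = assignee_label(task, users)
print(label)  # Dana (deactivated)
```

If the real application shows a blank field, throws an error, or silently keeps notifying the deactivated user, that gap between expectation and behavior is the finding.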

The interruption tour tests resilience. Slow connections. Mid-process navigation. Browser back button during a form submission. Closing a tab during file upload. Session timeout mid-checkout. Modern applications handle the happy path well. The interruption tour looks for everything that happens when the user doesn't behave as the happy path assumes. A three-step onboarding wizard that works perfectly when completed normally may completely lose state if the user clicks the browser's back button between steps two and three.

Documenting findings without killing the flow

The worst thing you can do during an exploratory session is stop to write a formal bug report. You break your mental model, lose your train of thought, and spend the rest of the session in a different cognitive mode.

The solution is separation: capture during the session, formalize after.

During the session, you want notes that are fast and disposable. Sticky notes on a physical desk. A running scratch note in VS Code or Notepad. Brief annotations while screenshotting. The goal is enough information to reconstruct what you found, not a polished report.

Screenshots should be annotated immediately with a single callout arrow before moving on. The annotation tells you in two days why you took this screenshot. Without it, a screenshot of a broken dropdown filter is just a screenshot.

Loom (or any screen recorder) is worth using for anything involving a sequence of steps. Record the steps, narrate what you're doing and why it's unexpected, and save the link in your scratch notes. A 90-second Loom of a reproduction is worth more than a 400-word bug report, and it takes less time to create.

After the session, review your scratch notes and decide which findings warrant formal bug reports. Most exploratory sessions produce a mix of confirmed bugs, questions for the developer, behavioral observations, and things to follow up on. The formal report comes after the session, not during.

Don't let perfect documentation be the enemy of good exploration. If you're spending more than 30 seconds documenting something during the session, you're interrupting the flow. Capture a screenshot and a one-line note, then keep moving.

Why AI can't do this

The argument for AI replacing exploratory testing usually goes: "AI can generate test cases, learn from past bugs, and explore the application intelligently." Some of this is already happening. AI-powered tools crawl applications, generate test scripts, and flag visual regressions. This is useful.

What it isn't is exploratory testing.

Exploratory testing depends on domain intuition: knowledge about how users actually behave in the specific context of a specific product. A tester who has spent three months on an e-commerce platform knows that users with large wishlists behave differently at checkout. That the promotion code field attracts injection attempts. That the address validation breaks in specific ways for non-US addresses that US-centric developers consistently overlook. This isn't in any training dataset in a usable form. It's accumulated context.

More fundamentally, exploratory testing depends on curiosity driven by meaning. When you see a slow dropdown, you wonder what happens during that window of slowness because you understand what a user is trying to do and why the latency creates a risk. An AI sees a performance metric. You see a failure mode.

AI tools are excellent at executing known paths at scale, identifying regressions against a baseline, and surfacing anomalies in logs and metrics. The question "what should I be curious about here?" is not something a model trained on past test cases can answer well for a system it hasn't encountered in a context it hasn't seen.

The exploratory tester's job is precisely to find what nobody thought to specify. That requires asking questions nobody thought to ask, which requires understanding the domain, the users, and the product deeply enough to know which questions matter. That's not automation. That's expertise.

How to apply this Monday morning

You don't need a formal SBTM program to start. Here's a practical starting point that fits into a normal sprint.

Pick one feature that was recently changed or recently shipped. Write a charter in one sentence: "Explore [feature] focusing on [area of concern]." Spend 60 minutes on it. Take notes in a scratch document as you go. At the end, spend 10 minutes reviewing your notes and filing the bugs you found.

That's one exploratory session. Do it consistently (one or two per sprint) and you'll start catching things your automation suite never would.

When you write charters, rotate your focus. One session on data edge cases. The next on the interaction between this feature and an adjacent one. The next on what happens when the user behaves unexpectedly. The variety is the point.
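The rotation is simple enough to generate. A minimal sketch, assuming the one-sentence charter template from above and the three focus areas just listed; `FOCUS_ROTATION` and `charters` are illustrative names, and you would swap in focus areas that fit your product.

```python
from itertools import cycle

# The three focus areas named above; adjust to fit your product.
FOCUS_ROTATION = [
    "data edge cases",
    "interaction with an adjacent feature",
    "unexpected user behavior",
]


def charters(feature, sessions):
    """Build one charter per session, cycling through the focus areas."""
    focus = cycle(FOCUS_ROTATION)
    return [
        f"Explore {feature} focusing on {next(focus)}."
        for _ in range(sessions)
    ]


plan = charters("the bulk-edit flow", 4)
for line in plan:
    print(line)
```

The fourth session wraps back to data edge cases, which is fine: by then you know the feature better and will look at the same area differently.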

Finally: debrief by sharing your session notes with your team, even informally. A quick Slack message ("ran an exploratory session on the bulk-edit flow, found two bugs, noticed the error handling for partial failures is unclear") builds visibility for work that otherwise happens invisibly. It also makes exploratory testing a team practice rather than a solo habit.

Exploratory testing isn't a supplement to a "real" test strategy. It's the part of your strategy that catches what everything else misses. Teams that do it consistently tend to ship fewer regressions and catch more critical bugs before production. The skill is learnable, improvable, and entirely yours.
