Using ChatGPT for Test Case Generation: A QA Engineer's Practical Guide

"Write test cases for a login form" produces a generic list covering the username field, the password field, and the submit button. "Login form for a SaaS app with email+password auth, Google OAuth, and MFA where accounts lock after 5 failed attempts" produces something you can use. This article covers the prompt structure that makes AI output specific, five categories where it genuinely helps, the review checklist for what it consistently gets wrong, and a five-step workflow that uses AI as a first-pass generator with you as the quality filter.

What ChatGPT Is Good At for Test Cases

Before the prompts, be honest about what you're getting:

Where AI genuinely helps:

Generating a first-pass list of scenarios quickly — then you refine it
Finding equivalence partitions and boundaries you might have missed
Converting user stories to a structured test case table
Generating negative test cases (these are easy to forget)
Creating test data ideas (valid/invalid email formats, edge-case numbers)

Where AI falls short:

It doesn't know your actual application's behavior
It can't test — it only suggests what to test
It misses business-specific edge cases it has no context for
It sometimes generates assertions that look plausible but are wrong
Generic prompts produce generic output

The fix for the last point: give it context. Lots of it.

The Basic Prompt Structure

Generic prompt (bad):

"Write test cases for a login form."

You'll get generic checks on the username field, the password field, and the submit button, with nothing specific to your system.

Better structure:

Feature: [describe what the feature does and its purpose]
Rules: [list the actual business rules and constraints]
User types: [who uses this feature]
Tech context: [optional — what framework, what type of app]
Output format: [how you want the results]
Ask: Generate test cases covering happy path, negative cases, and edge cases.
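If you reuse this structure across features, it can help to keep the fields as data and assemble the prompt from them, so no request goes out without its rules. The sketch below is a minimal TypeScript version. `PromptSpec` and `buildTestCasePrompt` are illustrative names, not part of any tool; the rules echo the login form from the introduction, and the user types are placeholders.

```typescript
// Illustrative only: one field per line of the prompt template.
interface PromptSpec {
  feature: string;
  rules: string[];
  userTypes: string[];
  techContext?: string; // optional, as in the template
  outputFormat: string;
}

function buildTestCasePrompt(spec: PromptSpec): string {
  const lines: string[] = [
    `Feature: ${spec.feature}`,
    "Rules:",
    ...spec.rules.map((rule) => `- ${rule}`),
    `User types: ${spec.userTypes.join(", ")}`,
  ];
  // Skip the tech context line entirely when none is given.
  if (spec.techContext) {
    lines.push(`Tech context: ${spec.techContext}`);
  }
  lines.push(
    `Output format: ${spec.outputFormat}`,
    "Ask: Generate test cases covering happy path, negative cases, and edge cases.",
  );
  return lines.join("\n");
}

const loginPrompt = buildTestCasePrompt({
  feature: "Login form for a SaaS app",
  rules: [
    "Email+password auth and Google OAuth sign-in",
    "MFA",
    "Accounts lock after 5 failed attempts",
  ],
  userTypes: ["registered user", "admin"], // placeholder user types
  outputFormat: "table: Test ID | Description | Steps | Expected Result",
});
console.log(loginPrompt);
```

Keeping the rules as a list also makes it easy to check later that every rule has at least one test case.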

Prompt Examples You Can Use

1. User story to test cases

Convert this user story to test cases covering happy path, 
negative cases, and edge cases.

User story:
As a registered user, I want to reset my password via email,
so that I can regain access if I forget it.

Rules:
- Reset link is valid for 1 hour
- After using the link, it expires immediately
- After 3 consecutive failed login attempts, the account is locked
- Reset link goes to the email on file
- New password must meet: 8–64 chars, at least 1 number, at least 1 uppercase

Format: table with columns: Test ID | Description | Steps | Expected Result

Expect output along these lines:

Happy path: valid email → receives link → uses link → sets valid password ✓
Expired link: uses link after 1 hour → error ✓
Reused link: uses link twice → error on second use ✓
Invalid new password: too short, no number, no uppercase ✓
Email not in system: no email sent, no error revealing if email exists ✓

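Review the output: does it match your system's actual behavior, and does every rule you listed have at least one test? In the list above, for example, nothing exercises the 3-failed-attempts lockout rule. Adjust or add anything that's missing.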
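One detail worth pinning down before you accept "uses link after 1 hour → error" is the exact cutoff: is a link opened at 59:59 valid, and what about exactly 60:00? A tiny model of the rules forces that question into the open and gives you three boundary probes. This is a hypothetical TypeScript sketch; it assumes the link expires once a full hour has elapsed, which you should confirm against your system.

```typescript
// Hypothetical model of the reset-link rules: valid for 1 hour, single use.
// ASSUMPTION: the link is expired once elapsed time reaches 3600 seconds.
type LinkStatus = "valid" | "expired" | "used";

const LINK_TTL_SECONDS = 60 * 60;

function resetLinkStatus(issuedAtSec: number, nowSec: number, alreadyUsed: boolean): LinkStatus {
  if (alreadyUsed) return "used"; // single use: any second attempt fails
  return nowSec - issuedAtSec >= LINK_TTL_SECONDS ? "expired" : "valid";
}

// Boundary probes around the 1-hour limit: one second before, at, and after.
const issued = 0;
for (const offset of [LINK_TTL_SECONDS - 1, LINK_TTL_SECONDS, LINK_TTL_SECONDS + 1]) {
  console.log(`${offset}s after issue -> ${resetLinkStatus(issued, issued + offset, false)}`);
}
```

If your team says a link opened at exactly 60:00 still works, flip the comparison to `>` and the expected result for that probe changes with it.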

2. API endpoint test cases

I'm testing a REST API endpoint. Generate test cases for it.

Endpoint: POST /api/users
Purpose: Create a new user account
Rules:
- email: required, valid format, unique in system
- password: required, 8–64 chars, min 1 uppercase, min 1 number
- role: optional, values: 'admin' | 'member' | 'viewer', default 'member'
- name: optional, max 100 chars

Success: Returns 201 with user object (id, email, role, created_at). No password in response.
Validation error: Returns 400 with field-level error details.
Duplicate email: Returns 409.

Generate test cases as a table: Test | Method | Body | Expected Status | Expected Response

This gives you structured test cases that translate directly into Playwright API tests.
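Before writing the Playwright tests, it can help to encode the stated rules as an oracle that predicts the expected status for any request body, then drive it from the same rows as the table. The sketch below does that in plain TypeScript with no test framework. Two assumptions are labeled in the code: field validation (400) is checked before the duplicate-email check (409), and the email regex is a simplified stand-in for your API's real validation.

```typescript
// Sketch: POST /api/users rules as a status oracle.
// ASSUMPTION: 400 validation runs before the 409 uniqueness check.
interface CreateUserBody {
  email?: string;
  password?: string;
  role?: string;
  name?: string;
}

const ROLES = ["admin", "member", "viewer"];
// ASSUMPTION: simplified email format check, not the API's real rule.
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function fieldErrors(body: CreateUserBody): string[] {
  const errors: string[] = [];
  if (!body.email || !SIMPLE_EMAIL.test(body.email)) errors.push("email");
  const pw = body.password ?? "";
  // 8-64 chars, at least one uppercase letter, at least one digit.
  if (pw.length < 8 || pw.length > 64 || !/[A-Z]/.test(pw) || !/[0-9]/.test(pw)) {
    errors.push("password");
  }
  if (body.role !== undefined && !ROLES.includes(body.role)) errors.push("role");
  if (body.name !== undefined && body.name.length > 100) errors.push("name");
  return errors;
}

// Only the status is modeled; checking the response body (role defaults
// to 'member', no password field) is left to the real test.
function expectedStatus(body: CreateUserBody, existingEmails: Set<string>): number {
  if (fieldErrors(body).length > 0) return 400;
  if (existingEmails.has(body.email!)) return 409;
  return 201;
}

// Each row corresponds to one line of the table the prompt asks for.
const existing = new Set(["taken@example.com"]);
const cases: { test: string; body: CreateUserBody }[] = [
  { test: "valid, role omitted", body: { email: "new@example.com", password: "ValidPass1" } },
  { test: "password 7 chars", body: { email: "new@example.com", password: "Valid12" } },
  { test: "unknown role", body: { email: "new@example.com", password: "ValidPass1", role: "owner" } },
  { test: "name 101 chars", body: { email: "new@example.com", password: "ValidPass1", name: "a".repeat(101) } },
  { test: "duplicate email", body: { email: "taken@example.com", password: "ValidPass1" } },
];
for (const c of cases) {
  console.log(`${c.test}: ${expectedStatus(c.body, existing)}`);
}
```

When the oracle and the real API disagree, either the test data or your reading of the rules is wrong, and both are worth knowing before release.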

3. Finding edge cases for a specific field

I have a "quantity" field in an e-commerce cart:
- Must be a whole number
- Minimum: 1
- Maximum: 99
- The current quantity updates in real time

Using boundary value analysis and equivalence partitioning,
list the values I should test and the expected result for each.
Format as: Value | Partition | Expected behavior

ChatGPT is familiar with boundary value analysis and equivalence partitioning, so this prompt produces a structured, defensible list rather than a handful of arbitrary values.
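For comparison, this is roughly the list a good answer should contain, derived from the rules instead of typed by hand. It's a TypeScript sketch: `quantityRows` is a hypothetical helper, and "rejected" stands in for whatever your UI actually does with an invalid quantity (inline error, clamping, disabled button), which the prompt doesn't specify.

```typescript
// Boundary values and equivalence partitions for a whole-number field
// with an inclusive range. "rejected" is a placeholder for the real UI behavior.
type Expected = "accepted" | "rejected";

interface ValueRow {
  value: string; // raw input as typed, so non-numeric cases fit too
  partition: string;
  expected: Expected;
}

function quantityRows(min: number, max: number): ValueRow[] {
  const numeric = (n: number, partition: string): ValueRow => ({
    value: String(n),
    partition,
    expected: Number.isInteger(n) && n >= min && n <= max ? "accepted" : "rejected",
  });
  return [
    numeric(min - 1, "below minimum"),
    numeric(min, "minimum boundary"),
    numeric(min + 1, "just above minimum"),
    numeric(Math.floor((min + max) / 2), "valid middle"),
    numeric(max - 1, "just below maximum"),
    numeric(max, "maximum boundary"),
    numeric(max + 1, "above maximum"),
    numeric(-1, "negative"),
    numeric(1.5, "non-integer"),
    { value: "", partition: "empty", expected: "rejected" },
    { value: "abc", partition: "non-numeric", expected: "rejected" },
  ];
}

// Prints the same "Value | Partition | Expected behavior" shape the prompt asks for.
for (const row of quantityRows(1, 99)) {
  console.log(`${row.value || "(empty)"} | ${row.partition} | ${row.expected}`);
}
```

Any value in this list that's missing from the AI's answer is a gap to ask about.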

4. Expanding sparse acceptance criteria

Our acceptance criteria for this feature are vague. 
Help me identify missing scenarios.

Feature: Product search
Acceptance criteria:
- Users can search by product name
- Results display within 2 seconds
- Relevant results appear first

What scenarios are missing from these acceptance criteria?
What should the QA team add before testing begins?

This is a "what are we missing?" prompt. It's most useful in sprint planning, while the acceptance criteria can still change.

5. Generating test data

I need test data for testing an email field.
Generate 15 test values covering:
- Valid email formats (include international, subdomains, plus-addressing)
- Invalid formats (various ways emails fail)
- Edge cases (very long, empty, only spaces, SQL injection attempt, emoji)

Format as: Value | Valid? | Why

AI is very good at generating diverse test data sets, which saves significant time compared with writing every variation by hand.
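The "Valid?" column is exactly the kind of plausible-looking assertion to double-check, especially for international and emoji addresses, where the right answer depends on your system. One way is to keep the table as data and run it through your validator so disagreements stand out. In this sketch, `isValidEmail` is a hypothetical, deliberately simplified stand-in (no whitespace, exactly one `@`, a dot in the domain, at most 254 characters); replace it with your application's actual rule.

```typescript
// The AI's table as data: its claim about validity plus its reasoning.
interface EmailRow {
  value: string;
  valid: boolean; // the AI's claim
  why: string;
}

// HYPOTHETICAL simplified validator; swap in your app's real check.
function isValidEmail(value: string): boolean {
  return value.length <= 254 && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

const aiRows: EmailRow[] = [
  { value: "user@example.com", valid: true, why: "basic format" },
  { value: "user+tag@example.com", valid: true, why: "plus-addressing" },
  { value: "user@mail.example.co.uk", valid: true, why: "subdomains" },
  { value: "plainaddress", valid: false, why: "missing @" },
  { value: "user@", valid: false, why: "missing domain" },
  { value: "   ", valid: false, why: "only spaces" },
  { value: "😀@example.com", valid: false, why: "emoji in local part" },
];

// Rows where the AI's claim and the validator disagree need a human decision.
const disagreements = aiRows.filter((row) => isValidEmail(row.value) !== row.valid);
for (const row of disagreements) {
  console.log(`Check: "${row.value}" (AI says valid=${row.valid}: ${row.why})`);
}
```

With this stand-in, only the emoji row is flagged: the validator accepts it while the AI says it's invalid. Which side is right depends on whether your system accepts non-ASCII local parts.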

Improving Output Quality

Tell it what you already have

I already have happy path and basic negative cases.
Here are my current test cases: [paste them]

What am I missing? Focus on edge cases and security scenarios.

This prevents duplication and keeps the model focused on the gaps.

Give it the tech stack

This is a Next.js app using PostgreSQL. 
What database-level edge cases should I test for the 
user registration form that might not be obvious from the UI?

Stack context produces more specific output.

Ask it to prioritize

From this list of 30 test cases, which 8 should I prioritize
if I only have 2 hours to test before a release?
Consider: user impact, probability of bugs, feature risk.
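It's worth sanity-checking the model's picks against your own rough scoring. A common risk-based testing heuristic is risk = impact × likelihood, which loosely maps to the "user impact" and "probability of bugs" factors in the prompt. The sketch below applies it in TypeScript; the 1 to 3 scale, the `Candidate` shape, and the example test cases are assumptions for illustration.

```typescript
// ASSUMED scale: each factor rated 1-3 by the tester.
interface Candidate {
  id: string;
  impact: number;     // 1 = cosmetic, 3 = blocks users or loses data
  likelihood: number; // 1 = stable code, 3 = changed this release or has bug history
}

// Sort a copy by descending risk score and keep the top `budget` cases.
function prioritize(cases: Candidate[], budget: number): Candidate[] {
  return [...cases]
    .sort((a, b) => b.impact * b.likelihood - a.impact * a.likelihood)
    .slice(0, budget);
}

// Hypothetical backlog for illustration.
const backlog: Candidate[] = [
  { id: "TC-01 happy-path checkout", impact: 3, likelihood: 2 },
  { id: "TC-07 tooltip wording", impact: 1, likelihood: 1 },
  { id: "TC-12 payment retry after timeout", impact: 3, likelihood: 3 },
  { id: "TC-19 profile avatar upload", impact: 2, likelihood: 2 },
];

console.log(prioritize(backlog, 2).map((c) => c.id));
```

Here the two highest scores are TC-12 (9) and TC-01 (6). If the model's top picks differ sharply from yours, ask it to explain its reasoning before you commit to either list.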

Ask for Playwright code

Convert these test cases to Playwright TypeScript code.
The page URL is https://lab.becomeqa.com/login.
Selectors: data-testid="email-input", data-testid="password-input", 
data-testid="login-button", data-testid="error-message".

Test cases:
1. Valid login with admin@test.com / ValidPass1
2. Invalid password shows error message
3. Empty email shows required field error

The output will need review and adjustment, but it's a fast starting point.

What to Always Review in AI-Generated Test Cases

Check for generic filler. "Verify that the button is clickable" isn't a test case. Delete it.
Check that expected results are specific. "Error message displays" is weak. "Error message reads 'Email is required'" is testable.
Check for business logic. AI doesn't know your domain rules. If your app has special pricing for certain user types, it won't generate tests for that.
Check for integration scenarios. AI tends to generate isolated tests. "Does the reset email work if the user changes their email address during the reset flow?" AI probably missed this.
Verify boundary values. AI often gets boundaries slightly wrong. If your max is 64 characters, make sure the list tests 63, 64, and 65, not just 63 and 64.
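The boundary check above is easy to automate. This TypeScript sketch lists the three-point boundary values for a min/max rule (min - 1, min, min + 1, max - 1, max, max + 1) and reports which ones an AI-generated list left out. `missingBoundaries` is a hypothetical helper, and the example uses the 8 to 64 character password rule from the earlier prompts.

```typescript
// Three-point boundary values around an inclusive [min, max] range.
function boundaryValues(min: number, max: number): number[] {
  return [min - 1, min, min + 1, max - 1, max, max + 1];
}

// Boundary values that do not appear anywhere in the tested list.
function missingBoundaries(tested: number[], min: number, max: number): number[] {
  const covered = new Set(tested);
  return boundaryValues(min, max).filter((v) => !covered.has(v));
}

// Example: lengths from an AI list for the 8-64 character password rule.
const aiLengths = [7, 8, 9, 32, 63, 64];
console.log(missingBoundaries(aiLengths, 8, 64)); // reports 65 as missing
```

An empty result only means the boundary values are present; you still need to confirm each one's expected result.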

A Practical Workflow

1. Write your own scenarios first (5–10 minutes). What are the key user flows?

2. Give ChatGPT the feature + rules + your scenarios, ask: "What am I missing? Expand these into detailed test cases."

3. Review the output — cross out the generic ones, adjust the expected results to match your actual application.

4. Add domain-specific cases that AI wouldn't know about (your company's special business rules, known past bugs, integration scenarios).

5. Use the result as your actual test case list or convert it to Playwright test code.

This way, AI is your first-pass generator and you're the quality filter — which is exactly the right division of labor.
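Step 3 in the workflow above can get a first mechanical pass. The sketch below is a hypothetical TypeScript helper that flags rows matching common filler phrases, plus expected results that neither quote exact text nor mention a concrete number. The phrase list and the quote/number test are rough heuristics (assumptions), so a flag means "read this row", not "delete it".

```typescript
interface TestCase {
  id: string;
  description: string;
  expected: string;
}

// ASSUMED heuristic list of filler phrasing; extend it with your own.
const FILLER_PATTERNS = [/is clickable/i, /is displayed correctly/i, /works as expected/i];

function reviewFlags(tc: TestCase): string[] {
  const flags: string[] = [];
  if (FILLER_PATTERNS.some((p) => p.test(tc.description))) {
    flags.push("generic filler");
  }
  // A specific expected result usually quotes exact text or names a concrete value.
  const hasQuote = /["'].+["']/.test(tc.expected);
  const hasNumber = /\d/.test(tc.expected);
  if (!hasQuote && !hasNumber) {
    flags.push("vague expected result");
  }
  return flags;
}

const draft: TestCase[] = [
  { id: "TC-1", description: "Verify that the button is clickable", expected: "Button works" },
  { id: "TC-2", description: "Submit with empty email", expected: "Error message displays" },
  { id: "TC-3", description: "Submit with empty email", expected: "Error message reads 'Email is required'" },
];

for (const tc of draft) {
  console.log(tc.id, reviewFlags(tc));
}
```

TC-1 gets both flags, TC-2 is flagged as vague, and TC-3 passes. The helper only narrows down what to read; adjusting expected results to your application's real behavior is still your job.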

What Not to Do

Don't copy-paste AI test cases without review. They'll look complete but miss the specific things about your system.
Don't use it as a replacement for understanding the feature. If you don't understand what you're testing, AI can't save you.
Don't skip adding context. The more specific your prompt, the more specific the output. "Login form" produces garbage. "Login form for a SaaS app with email+password auth, Google OAuth, and MFA support where accounts lock after 5 failed attempts" produces something useful.

The engineers who get the most value from AI are the ones who treat it as a junior collaborator who needs clear direction and close review — not as an oracle that produces correct output automatically. With the right prompts and a critical eye on the output, ChatGPT can meaningfully accelerate your test planning without sacrificing coverage quality.