"Write test cases for a login form" produces a generic list covering username field, password field, and submit button. "Login form for a SaaS app with email+password auth, Google OAuth, and MFA where accounts lock after 5 failed attempts" produces something you can use. This article covers the prompt structure that makes AI output specific, five categories where it genuinely helps, the review checklist for what it consistently gets wrong, and a five-step workflow that uses AI as a first-pass generator with you as the quality filter.
What ChatGPT Is Good At for Test Cases
Before the prompts, be honest about what you're getting:
Where AI genuinely helps:
- Generating a first-pass list of scenarios quickly, which you then refine
- Finding equivalence partitions and boundaries you might have missed
- Converting user stories to a structured test case table
- Generating negative test cases (these are easy to forget)
- Creating test data ideas (valid/invalid email formats, edge-case numbers)
Where it falls short:
- It doesn't know your actual application's behavior
- It can't test — it only suggests what to test
- It misses business-specific edge cases it has no context for
- It generates plausible-looking but sometimes wrong assertions
- Generic prompts produce generic output
The fix for the last point: give it context. Lots of it.
The Basic Prompt Structure
Generic prompt (bad):
"Write test cases for a login form."
You'll get: Username field testing, password field testing, submit button... Nothing specific to your system.
Better structure:
Feature: [describe what the feature does and its purpose]
Rules: [list the actual business rules and constraints]
User types: [who uses this feature]
Tech context: [optional — what framework, what type of app]
Output format: [how you want the results]
Ask: Generate test cases covering happy path, negative cases, and edge cases.
Prompt Examples You Can Use
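The basic structure from the previous section can also be kept as a small helper, so that every prompt starts from the same fields. This is only a sketch: the `PromptContext` interface and `buildTestCasePrompt` function are hypothetical names invented for illustration, and the "registered user" user type is an assumed value.

```typescript
// Hypothetical helper: field names mirror the prompt template, nothing here is a real API.
interface PromptContext {
  feature: string;
  rules: string[];
  userTypes: string[];
  techContext?: string; // optional, as in the template
  outputFormat: string;
}

function buildTestCasePrompt(ctx: PromptContext): string {
  const lines: string[] = [
    `Feature: ${ctx.feature}`,
    "Rules:",
    ...ctx.rules.map((rule) => `- ${rule}`),
    `User types: ${ctx.userTypes.join(", ")}`,
  ];
  // Only emit the tech context line when one was provided.
  if (ctx.techContext) {
    lines.push(`Tech context: ${ctx.techContext}`);
  }
  lines.push(`Output format: ${ctx.outputFormat}`);
  lines.push("Generate test cases covering happy path, negative cases, and edge cases.");
  return lines.join("\n");
}

const loginPrompt = buildTestCasePrompt({
  feature: "Login form for a SaaS app with email+password auth, Google OAuth, and MFA",
  rules: ["Accounts lock after 5 failed attempts"],
  userTypes: ["registered user"], // assumed value for illustration
  outputFormat: "table with columns: Test ID | Description | Steps | Expected Result",
});

console.log(loginPrompt);
```

Keeping `rules` as a required array makes it harder to send a prompt with no business rules, which is the main cause of generic output.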
1. User story to test cases
Convert this user story to test cases covering happy path,
negative cases, and edge cases.
User story:
As a registered user, I want to reset my password via email,
so that I can regain access if I forget it.
Rules:
- Reset link is valid for 1 hour
- After using the link, it expires immediately
- After 3 consecutive failed login attempts, the account is locked
- Reset link goes to the email on file
- New password must meet: 8–64 chars, at least 1 number, at least 1 uppercase
Format: table with columns: Test ID | Description | Steps | Expected Result
The output should include cases such as:
- Happy path: valid email → receives link → uses link → sets valid password ✓
- Expired link: uses link after 1 hour → error ✓
- Reused link: uses link twice → error on second use ✓
- Invalid new password: too short, no number, no uppercase ✓
- Email not in system: no email sent, no error revealing if email exists ✓
Review the output: does it match your system's actual behavior? Adjust anything that doesn't.
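One way to make that review concrete is to write the stated rules down as a tiny executable model and compare the AI's expected results against it. The sketch below covers only the reset-link rules (valid for 1 hour, expires after use). The `ResetLink` shape and `linkState` function are hypothetical, and the treatment of a link used at exactly 60 minutes is an assumption, because the rules above do not specify it.

```typescript
// Hypothetical model of the reset-link rules from the prompt; not real application code.
interface ResetLink {
  issuedAt: number;      // epoch milliseconds
  usedAt: number | null; // null until the link has been used once
}

type LinkState = "usable" | "expired" | "already used";

const ONE_HOUR_MS = 60 * 60 * 1000;

// Assumption: a link opened at exactly 60 minutes is still valid. The rules do not say.
function linkState(link: ResetLink, now: number): LinkState {
  if (link.usedAt !== null) return "already used";
  if (now - link.issuedAt > ONE_HOUR_MS) return "expired";
  return "usable";
}

interface ResetCase {
  description: string;
  link: ResetLink;
  now: number;
  expected: LinkState;
}

const minute = 60 * 1000;
const resetCases: ResetCase[] = [
  { description: "fresh link", link: { issuedAt: 0, usedAt: null }, now: 5 * minute, expected: "usable" },
  { description: "link opened after 1 hour", link: { issuedAt: 0, usedAt: null }, now: ONE_HOUR_MS + 1, expected: "expired" },
  { description: "link used twice", link: { issuedAt: 0, usedAt: 10 * minute }, now: 11 * minute, expected: "already used" },
];

// Any case where the model disagrees with the expected result needs a closer look.
const resetMismatches = resetCases.filter((c) => linkState(c.link, c.now) !== c.expected);
console.log(`${resetMismatches.length} mismatches out of ${resetCases.length} cases`);
```

Writing the model tends to surface questions the rules leave open, such as the exact-boundary case here, and those are worth confirming with the team before testing.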
2. API endpoint test cases
I'm testing a REST API endpoint. Generate test cases for it.
Endpoint: POST /api/users
Purpose: Create a new user account
Rules:
- email: required, valid format, unique in system
- password: required, 8–64 chars, min 1 uppercase, min 1 number
- role: optional, values: 'admin' | 'member' | 'viewer', default 'member'
- name: optional, max 100 chars
Success: Returns 201 with user object (id, email, role, created_at). No password in response.
Validation error: Returns 400 with field-level error details.
Duplicate email: Returns 409.
Generate test cases as a table: Test | Method | Body | Expected Status | Expected Response
This gives you structured test cases that can directly translate to Playwright API tests.
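Before wiring such a table to real requests, it can help to hold it as typed data and sanity-check the expected statuses against the stated rules. The sketch below is one possible shape, not the endpoint's actual implementation: `ApiTestCase`, `modelStatus`, and the sample emails are hypothetical, the email pattern is deliberately simplistic, and it assumes validation (400) runs before the uniqueness check (409).

```typescript
// Hypothetical shapes for illustration; they are not taken from a real API client.
type Status = 201 | 400 | 409;

interface CreateUserBody {
  email?: string;
  password?: string;
  role?: string;
  name?: string;
}

interface ApiTestCase {
  test: string;
  body: CreateUserBody;
  expectedStatus: Status;
}

const ROLES = ["admin", "member", "viewer"];

// In-memory model of the rules in the prompt, used only to sanity-check the table.
// Assumptions: a simple email pattern, and validation runs before the duplicate check.
function modelStatus(body: CreateUserBody, existingEmails: string[]): Status {
  const email = body.email ?? "";
  const password = body.password ?? "";
  const emailOk = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
  const passwordOk =
    password.length >= 8 && password.length <= 64 &&
    /[A-Z]/.test(password) && /[0-9]/.test(password);
  const roleOk = body.role === undefined || ROLES.indexOf(body.role) !== -1;
  const nameOk = body.name === undefined || body.name.length <= 100;
  if (!emailOk || !passwordOk || !roleOk || !nameOk) return 400;
  if (existingEmails.indexOf(email) !== -1) return 409;
  return 201;
}

const longName = new Array(102).join("x"); // 101 characters, one over the limit
const existingEmails = ["taken@example.com"];
const validBody: CreateUserBody = { email: "new@example.com", password: "ValidPass1" };

const apiCases: ApiTestCase[] = [
  { test: "valid minimal body", body: validBody, expectedStatus: 201 },
  { test: "missing email", body: { password: "ValidPass1" }, expectedStatus: 400 },
  { test: "password too short", body: { ...validBody, password: "Short1" }, expectedStatus: 400 },
  { test: "password without uppercase", body: { ...validBody, password: "validpass1" }, expectedStatus: 400 },
  { test: "unknown role", body: { ...validBody, role: "owner" }, expectedStatus: 400 },
  { test: "name over 100 chars", body: { ...validBody, name: longName }, expectedStatus: 400 },
  { test: "duplicate email", body: { ...validBody, email: "taken@example.com" }, expectedStatus: 409 },
];

// Rows where the model and the table disagree are the ones to re-read against the rules.
const apiMismatches = apiCases.filter((c) => modelStatus(c.body, existingEmails) !== c.expectedStatus);
console.log(`${apiMismatches.length} mismatches out of ${apiCases.length} cases`);
```

The same `apiCases` array could later drive a loop of real requests. The model exists only to catch rows where the AI's expected status contradicts the rules you gave it.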
3. Finding edge cases for a specific field
I have a "quantity" field in an e-commerce cart:
- Must be a whole number
- Minimum: 1
- Maximum: 99
- The current quantity updates in real time
Using boundary value analysis and equivalence partitioning,
list the values I should test and the expected result for each.
Format as: Value | Partition | Expected behavior
ChatGPT knows boundary value analysis. This prompt gives you a structured, defensible list.
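For a numeric range like this one, the boundary values are mechanical enough to compute yourself and compare with what the AI returns. A minimal sketch, assuming an inclusive integer range and the common six-point scheme (min-1, min, min+1, max-1, max, max+1); the `boundaryCases` function and its types are hypothetical names.

```typescript
interface BoundaryCase {
  value: number;
  partition: "below minimum" | "valid" | "above maximum";
  expected: "rejected" | "accepted";
}

// Boundary value analysis for an inclusive integer range [min, max].
function boundaryCases(min: number, max: number): BoundaryCase[] {
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // Drop duplicates, which appear when the range is very narrow.
  const unique = candidates.filter((v, i) => candidates.indexOf(v) === i);
  return unique.map((value): BoundaryCase => {
    if (value < min) return { value, partition: "below minimum", expected: "rejected" };
    if (value > max) return { value, partition: "above maximum", expected: "rejected" };
    return { value, partition: "valid", expected: "accepted" };
  });
}

// Quantity field from the prompt: minimum 1, maximum 99.
const quantityCases = boundaryCases(1, 99);
for (const c of quantityCases) {
  console.log(`${c.value} | ${c.partition} | ${c.expected}`);
}
```

This covers only the numeric boundaries. The "must be a whole number" rule adds further partitions, such as decimals, empty input, and non-numeric text, which still need their own rows.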
4. Expanding sparse acceptance criteria
Our acceptance criteria for this feature are vague.
Help me identify missing scenarios.
Feature: Product search
Acceptance criteria:
- Users can search by product name
- Results display within 2 seconds
- Relevant results appear first
What scenarios are missing from these acceptance criteria?
What should the QA team add before testing begins?
This is a "what are we missing?" prompt. Useful in sprint planning.
5. Generating test data
I need test data for testing an email field.
Generate 15 test values covering:
- Valid email formats (include international, subdomains, plus-addressing)
- Invalid formats (various ways emails fail)
- Edge cases (very long, empty, only spaces, SQL injection attempt, emoji)
Format as: Value | Valid? | Why
AI is very good at generating diverse test data sets. This saves significant time.
Improving Output Quality
Tell it what you already have
I already have happy path and basic negative cases.
Here are my current test cases: [paste them]
What am I missing? Focus on edge cases and security scenarios.
This prevents duplication and makes it focus on the gaps.
Give it the tech stack
This is a Next.js app using PostgreSQL.
What database-level edge cases should I test for the
user registration form that might not be obvious from the UI?
Stack context produces more specific output.
Ask it to prioritize
From this list of 30 test cases, which 8 should I prioritize
if I only have 2 hours to test before a release?
Consider: user impact, probability of bugs, feature risk.
Ask for Playwright code
Convert these test cases to Playwright TypeScript code.
The page URL is https://lab.becomeqa.com/login.
Selectors: data-testid="email-input", data-testid="password-input",
data-testid="login-button", data-testid="error-message".
Test cases:
1. Valid login with admin@test.com / ValidPass1
2. Invalid password shows error message
3. Empty email shows required field error
The output will need review and adjustment, but it's a fast starting point.
What to Always Review in AI-Generated Test Cases
Check for generic filler. "Verify that the button is clickable" isn't a test case. Delete it.
Check expected results are specific. "Error message displays" is weak. "Error message reads 'Email is required'" is testable.
Check for business logic. AI doesn't know your domain rules. If your app has special pricing for certain user types, it won't generate tests for that.
Check for integration scenarios. AI generates isolated tests. A question like "Does the reset email work if the user changes their email address during the reset flow?" is one AI probably missed.
Verify boundary values. AI often gets boundaries slightly wrong. If your max is 64 characters, double-check that the output tests 63, 64, and 65 characters, not just 63 and 64.
A Practical Workflow
1. Write your own scenarios first (5–10 minutes). What are the key user flows?
2. Give ChatGPT the feature + rules + your scenarios, ask: "What am I missing? Expand these into detailed test cases."
3. Review the output — cross out the generic ones, adjust the expected results to match your actual application.
4. Add domain-specific cases that AI wouldn't know about (your company's special business rules, known past bugs, integration scenarios).
5. Use the result as your actual test case list or convert it to Playwright test code.
This way, AI is your first-pass generator and you're the quality filter — which is exactly the right division of labor.
What Not to Do
Don't copy-paste AI test cases without review. They'll look complete but miss the specific things about your system.
Don't use it as a replacement for understanding the feature. If you don't understand what you're testing, AI can't save you.
Don't skip adding context. The more specific your prompt, the more specific the output. "Login form" produces garbage. "Login form for a SaaS app with email+password auth, Google OAuth, and MFA support where accounts lock after 5 failed attempts" produces something useful.
The engineers who get the most value from AI treat it as a junior collaborator who needs clear direction and close review, not as an oracle that produces correct output automatically. With the right prompts and a critical eye on the output, ChatGPT can meaningfully accelerate your test planning without sacrificing coverage quality.