Unlike Playwright assertions, a failed k6 check does not stop the test: all virtual users keep running, and k6 counts pass/fail rates across thousands of requests to show the health signal in the summary. What stops the run is a failed threshold: if p95 response time exceeds the configured limit, k6 exits with code 99 and the CI step fails exactly like a failing test. This tutorial walks through installing k6, writing a first script with realistic sleep() think time, reading the output table, configuring ramp-up stages, and setting thresholds that make performance regressions visible in CI.
What k6 Is and Why QA Engineers Should Know It
k6 is an open source performance testing tool built by Grafana Labs. You write test scripts in JavaScript or TypeScript, run them from the CLI, and get detailed output about how your system behaved under load. The GitHub repository has over 24,000 stars and it's used across organizations of every size.
What makes k6 the right entry point for QA engineers specifically is the scripting model. If you already write Playwright or Jest tests, the syntax is familiar: you import modules, write functions, make HTTP requests, and add assertions. There's no heavy GUI, no proprietary recording format, no XML configuration file. Just a script.
k6 also integrates cleanly with GitHub Actions. You install the binary, run your script, and if the test fails (more on what "fails" means later), the CI step exits with a non-zero code. That's the same contract every other test tool uses.
The tool is not a replacement for Playwright. Playwright verifies behavior; k6 measures performance under load. They answer different questions and both belong in a mature QA strategy.
Installing k6 and Running Your First Test
k6 ships as a standalone binary. On macOS:
brew install k6
On Linux (Debian/Ubuntu):
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
On Windows, use the official installer from k6.io/docs/get-started/installation or install via Chocolatey:
choco install k6
Once installed, verify it works:
k6 version
Now create a file called load-test.js and add this minimal script:
import http from 'k6/http';
export default function () {
  http.get('https://test-api.k6.io/public/crocodiles/');
}
Run it:
k6 run load-test.js
k6 uses test-api.k6.io as a public sandbox. You can run this against it without setting anything up. You'll see output in your terminal showing request counts, response times, and a summary table. That's your first load test running.
The Basic Test Anatomy
A real k6 script has three parts: the options export that controls how the test runs, the default function that contains your test logic, and any setup your requests need. Here's the structure:
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
  vus: 10,         // virtual users running concurrently
  duration: '30s', // how long the test runs
};
export default function () {
  http.get('https://your-api.example.com/api/products');
  sleep(1); // think time between requests
}
vus stands for virtual users. Each VU runs the default function in a loop for the full duration, independently of the others. Setting vus: 10 and duration: '30s' means 10 simulated users hitting your endpoint continuously for 30 seconds.
Treat the sleep(1) call as required: it models the think time a real user spends between actions. Without it, each VU sends its next request as soon as the previous response arrives, which is unrealistic and produces misleadingly high throughput numbers. A one-second sleep is a reasonable default; adjust it to match your users' actual behavior.
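Real users do not all pause for exactly the same time, so one common refinement is to randomize the think time within a range. The helper below is a minimal sketch in plain JavaScript; the name thinkTime and the 1 to 3 second range are assumptions for illustration, not part of k6.

```javascript
// Hypothetical helper: returns a think time in seconds, uniformly
// distributed in the range [minSeconds, maxSeconds).
function thinkTime(minSeconds, maxSeconds) {
  if (minSeconds < 0 || maxSeconds < minSeconds) {
    throw new RangeError('expected 0 <= minSeconds <= maxSeconds');
  }
  return minSeconds + Math.random() * (maxSeconds - minSeconds);
}

// Inside a k6 default function you would call:
//   sleep(thinkTime(1, 3));
```

Because sleep() accepts fractional seconds, the returned value can be passed to it directly.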
For POST requests with a JSON body, which you'll need for most API testing:
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
  vus: 20,
  duration: '1m',
};
export default function () {
  const url = 'https://your-api.example.com/api/login';
  const payload = JSON.stringify({
    username: 'testuser@example.com',
    password: 'testpassword',
  });
  const params = {
    headers: { 'Content-Type': 'application/json' },
  };
  http.post(url, payload, params);
  sleep(1);
}
Reading k6 Output
After a test run, k6 prints a summary table to the terminal. Learning to read it quickly is the skill that turns raw numbers into actionable information.
A typical output looks like this:
✓ checks.........................: 100.00% ✓ 1800 ✗ 0
data_received..................: 4.3 MB 143 kB/s
data_sent......................: 523 kB 17 kB/s
http_req_blocked...............: avg=1.2ms min=1µs med=4µs max=312ms p(90)=7µs p(95)=11µs
http_req_connecting............: avg=743µs min=0s med=0s max=181ms p(90)=0s p(95)=0s
http_req_duration..............: avg=312ms min=98ms med=280ms max=2.1s p(90)=520ms p(95)=640ms
{ expected_response:true }...: avg=312ms min=98ms med=280ms max=2.1s p(90)=520ms p(95)=640ms
http_req_failed................: 0.00% ✓ 0 ✗ 1800
http_req_receiving.............: avg=2.1ms min=45µs med=1.1ms max=74ms p(90)=4.2ms p(95)=5.8ms
http_req_sending...............: avg=178µs min=27µs med=136µs max=3.2ms p(90)=312µs p(95)=389µs
http_req_tls_handshaking.......: avg=412µs min=0s med=0s max=168ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=310ms min=97ms med=278ms max=2.1s p(90)=518ms p(95)=637ms
http_reqs......................: 1800 60.0/s
iteration_duration.............: avg=1.31s min=1.1s med=1.28s max=3.1s p(90)=1.52s p(95)=1.64s
iterations.....................: 1800 60.0/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10
The three metrics you look at first are:
http_req_duration is the total round-trip time for each request. The p(95) column tells you the 95th percentile: 95% of your requests completed within that time. This is the number that matters for SLAs, not the average. Averages hide outliers; p95 doesn't.
http_reqs shows total requests made and the rate per second. This tells you how much load you actually generated.
http_req_failed shows the percentage of requests that received an error response (4xx or 5xx). Zero percent is expected; anything above it requires investigation.
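To see why the average hides outliers while p95 does not, here is a small sketch with made-up durations. It uses the nearest-rank percentile definition, which is a simplification; k6's own calculation may interpolate between samples and give slightly different values.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function average(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Made-up durations in ms: 18 fast requests and 2 very slow ones.
const durations = [...Array(18).fill(100), 2000, 2000];

console.log(average(durations));        // 290
console.log(percentile(durations, 95)); // 2000
```

An average of 290ms looks healthy, yet one request in ten took two seconds. The p95 value exposes that immediately.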
Adding Checks
Functional correctness under load matters as much as speed. k6 has a built-in assertion mechanism called checks that works similarly to expect statements in other test frameworks:
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  vus: 10,
  duration: '30s',
};
export default function () {
  const res = http.get('https://your-api.example.com/api/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains products': (r) => r.body.includes('"id"'),
  });
  sleep(1);
}
The important difference from traditional assertions: a failed check does not stop the test. All VUs keep running. k6 counts how many checks passed and failed across all iterations and shows the percentage in the summary. This is intentional. You want to know what percentage of requests under load produced a correct response, not just whether one request failed.
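Check predicates run against every response, including error pages and requests that came back with no body at all. A predicate that throws in that situation would likely surface as a script error rather than a counted failure (an assumption worth verifying against your k6 version). The wrapper below is a hedged sketch in plain JavaScript; the name safe is hypothetical and not part of k6.

```javascript
// Hypothetical wrapper: turns a predicate that might throw into one that
// returns false instead, so a malformed response counts as a failed check.
function safe(predicate) {
  return (r) => {
    try {
      return Boolean(predicate(r));
    } catch (err) {
      return false;
    }
  };
}

// Usage inside a k6 script (check comes from the 'k6' module):
//   check(res, {
//     'body contains products': safe((r) => r.body.includes('"id"')),
//   });
```

The wrapped predicate still returns true for a healthy response; it only changes what happens when the body is missing or not the shape you expected.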
For a login endpoint that returns a token, you'd write:
import http from 'k6/http';
import { check, sleep } from 'k6';
export default function () {
  const payload = JSON.stringify({
    username: 'user@example.com',
    password: 'password123',
  });
  const res = http.post('https://your-api.example.com/auth/login', payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, {
    'login succeeds': (r) => r.status === 200,
    'token present': (r) => JSON.parse(r.body).token !== undefined,
    'login fast enough': (r) => r.timings.duration < 1000,
  });
  sleep(1);
}
Stages: Ramp Up, Sustain, Ramp Down
A flat constant load is rarely what you want. Real traffic ramps up as users arrive, holds at peak, then drops off. k6 models this with stages, which let you define how VU count changes over time:
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  stages: [
    { duration: '2m', target: 50 },  // ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50 },  // hold at 50 VUs for 5 minutes
    { duration: '2m', target: 100 }, // ramp up to 100 VUs over 2 minutes
    { duration: '5m', target: 100 }, // hold at 100 VUs for 5 minutes
    { duration: '2m', target: 0 },   // ramp down to 0
  ],
};
export default function () {
  const res = http.get('https://your-api.example.com/api/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 1000,
  });
  sleep(1);
}
This pattern (ramp up, sustain, ramp up more, sustain, ramp down) is called a stress test profile. It reveals at which user count your system starts to degrade. You'll see the p95 response time in your results track steadily upward as VUs increase. The inflection point where it jumps sharply is your system's breaking point.
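To match a latency jump in the results to a user count, it helps to know how many VUs were planned at a given moment. The sketch below works that out for the stages shown above, assuming a linear ramp within each stage and a start at 0 VUs. Both are simplifications; k6's actual starting count and rounding may differ. The function names are hypothetical.

```javascript
// Converts a duration like '30s', '2m', or '1h' to seconds.
// Only single-unit strings are handled in this sketch.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const factor = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * factor;
}

// Returns the planned VU count at `elapsed` seconds, assuming a linear
// ramp within each stage and a starting count of `startVUs`.
function plannedVUs(stages, elapsed, startVUs = 0) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const length = toSeconds(stage.duration);
    if (elapsed < stageStart + length) {
      const progress = (elapsed - stageStart) / length;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    stageStart += length;
  }
  return from; // past the last stage
}

const stages = [
  { duration: '2m', target: 50 },
  { duration: '5m', target: 50 },
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 0 },
];

console.log(plannedVUs(stages, 60));  // 25: halfway through the first ramp
console.log(plannedVUs(stages, 480)); // 75: halfway from 50 to 100
```

If p95 starts climbing around the eight-minute mark, this tells you the system began to struggle at roughly 75 concurrent users.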
For most API testing you'll run a simpler profile: ramp up, hold, then ramp down.
export const options = {
  stages: [
    { duration: '1m', target: 50 },  // warm up
    { duration: '5m', target: 50 },  // sustained load
    { duration: '30s', target: 0 },  // ramp down
  ],
};
The ramp-down phase matters. Cutting load abruptly can mask connection pool issues; a gradual ramp-down gives you cleaner metrics at the tail of the test.
Thresholds: Making the Test Pass or Fail
Checks tell you what happened. Thresholds determine whether the test run succeeds or fails. A threshold is a pass/fail condition on any metric. If the condition is violated, k6 exits with a non-zero code. That's the hook that makes load tests useful in CI.
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  stages: [
    { duration: '1m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '30s', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'], // 95th percentile must be under 2 seconds
    http_req_failed: ['rate<0.01'],    // fewer than 1% of requests can fail
    checks: ['rate>0.99'],             // more than 99% of checks must pass
  },
};
export default function () {
  const res = http.get('https://your-api.example.com/api/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 2s': (r) => r.timings.duration < 2000,
  });
  sleep(1);
}
When k6 runs this script and the p95 response time exceeds 2000ms, the run exits with code 99. Your CI step fails. This is the correct behavior. A build that ships code making your API 40% slower should fail.
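By default a threshold is only judged for pass or fail when the run ends, so a badly regressed build still occupies the CI runner for the full duration. k6 also accepts a long form for thresholds that can stop the run early. The fragment below is a sketch; the one-minute delay is an assumed value chosen to cover the warm-up stage.

```javascript
// Thresholds in the long form: an object instead of a plain string.
// abortOnFail stops the run as soon as the threshold is crossed;
// delayAbortEval waits before the first evaluation so that a slow
// warm-up does not abort the test prematurely.
const options = {
  thresholds: {
    http_req_duration: [
      { threshold: 'p(95)<2000', abortOnFail: true, delayAbortEval: '1m' },
    ],
    http_req_failed: ['rate<0.01'],
  },
};

// In a k6 script this object must be exported:
//   export const options = { ... };
```

Short and long forms can be mixed in the same thresholds object, as shown with http_req_failed.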
You can set thresholds per endpoint using tags:
import http from 'k6/http';
export default function () {
  const loginRes = http.post(
    'https://your-api.example.com/auth/login',
    JSON.stringify({ username: 'user@example.com', password: 'pass' }),
    { headers: { 'Content-Type': 'application/json' }, tags: { name: 'login' } }
  );
  const productsRes = http.get(
    'https://your-api.example.com/api/products',
    { tags: { name: 'products' } }
  );
}
export const options = {
  thresholds: {
    'http_req_duration{name:login}': ['p(95)<1000'],
    'http_req_duration{name:products}': ['p(95)<500'],
  },
};
Login is expected to be slower than a cached product list, so you give each endpoint its own SLA.
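Once more than a handful of endpoints are tagged, writing each threshold key by hand gets error-prone. One option is to generate them from a small table. This is a sketch; slaMs and buildThresholds are hypothetical names, and the limits simply repeat the two values used above.

```javascript
// Hypothetical SLA table: tag name -> p95 limit in milliseconds.
const slaMs = { login: 1000, products: 500 };

// Builds a k6 thresholds object with one tagged entry per endpoint.
function buildThresholds(sla) {
  const thresholds = {};
  for (const [name, limitMs] of Object.entries(sla)) {
    thresholds[`http_req_duration{name:${name}}`] = [`p(95)<${limitMs}`];
  }
  return thresholds;
}

// Produces the same two keys as the hand-written thresholds above.
const thresholds = buildThresholds(slaMs);

// In a k6 script:
//   export const options = { thresholds: buildThresholds(slaMs) };
```

The tag name in the table must match the name passed in tags on the request, otherwise the threshold has no samples to evaluate.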
Running k6 in GitHub Actions CI
k6 runs cleanly in GitHub Actions. The Grafana team publishes an official action that installs the binary and handles version pinning:
# .github/workflows/load-test.yml
name: Load Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:
jobs:
  load-test:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Install k6
        uses: grafana/setup-k6-action@v1
        with:
          k6-version: '0.55.0'
      - name: Run load test
        run: k6 run --out json=k6-results.json load-test.js
        env:
          BASE_URL: ${{ vars.BASE_URL }}
          API_KEY: ${{ secrets.API_KEY }}
      - name: Upload k6 results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: k6-results
          path: k6-results.json
          retention-days: 14
To use environment variables inside your k6 script, access them through __ENV:
import http from 'k6/http';
import { check, sleep } from 'k6';
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export const options = {
  stages: [
    { duration: '1m', target: 20 },
    { duration: '3m', target: 20 },
    { duration: '30s', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.01'],
  },
};
export default function () {
  const res = http.get(`${BASE_URL}/api/products`);
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time acceptable': (r) => r.timings.duration < 2000,
  });
  sleep(1);
}
To generate a JSON results file for the artifact upload, pass the --out flag:
k6 run --out json=k6-results.json load-test.js
One practical recommendation: don't run load tests on every pull request against your production or shared staging environment. The load test job should run on merges to main, or be triggered via workflow_dispatch manually. Running heavy load tests on every PR branch causes interference between concurrent test runs. A smoke-level test (5 VUs, 30 seconds) on PRs is acceptable; a full load test is not.
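One way to keep the smoke test and the full load test in a single script is to pick the profile from an environment variable. The following is a sketch under assumptions: the variable name PROFILE, the profile names, and the function selectOptions are all hypothetical.

```javascript
// Hypothetical profiles: a light smoke test for pull requests and the
// full staged load test for merges to main.
const profiles = {
  smoke: { vus: 5, duration: '30s' },
  load: {
    stages: [
      { duration: '1m', target: 20 },
      { duration: '3m', target: 20 },
      { duration: '30s', target: 0 },
    ],
  },
};

// Shared thresholds, matching the script above.
const thresholds = {
  http_req_duration: ['p(95)<2000'],
  http_req_failed: ['rate<0.01'],
};

// Falls back to the smoke profile when the name is missing or unknown,
// so an unconfigured run never generates heavy load by accident.
function selectOptions(profileName) {
  const known = Object.prototype.hasOwnProperty.call(profiles, profileName);
  const profile = known ? profiles[profileName] : profiles.smoke;
  return { ...profile, thresholds };
}

// In a k6 script:
//   export const options = selectOptions(__ENV.PROFILE);
// and on the command line:
//   k6 run -e PROFILE=load load-test.js
```

The PR job can then run the script with no variable set, while the job for merges to main passes PROFILE=load.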
What Endpoints to Test
Choosing what to test with k6 is as important as the test itself. Not every endpoint needs a load test. Focus on the ones where performance matters.
The login endpoint is the first test to write. Every user session starts here. A slow login multiplied by 500 concurrent users collapses the experience before anyone reaches the actual features. Test the full login flow: POST credentials, receive a token, make one authenticated request.
import http from 'k6/http';
import { check, sleep } from 'k6';
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export const options = {
  stages: [
    { duration: '1m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '1m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<1500'],
    http_req_failed: ['rate<0.01'],
  },
};
export default function () {
  // Step 1: Login
  const loginRes = http.post(
    `${BASE_URL}/auth/login`,
    JSON.stringify({ username: 'loadtest@example.com', password: 'loadtestpass' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(loginRes, {
    'login status 200': (r) => r.status === 200,
    'token returned': (r) => JSON.parse(r.body).token !== undefined,
  });
  const token = JSON.parse(loginRes.body).token;
  // Step 2: Authenticated request
  const profileRes = http.get(`${BASE_URL}/api/me`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  check(profileRes, {
    'profile status 200': (r) => r.status === 200,
  });
  sleep(2);
}
A reasonable starting point for most applications is three load test scripts: one for authentication, one for your top three read endpoints in a single script, and one for your conversion-critical write operation. That's enough coverage to catch the performance regressions that matter without building a full performance test suite before you have a baseline.
FAQ
Do I need a separate test environment to run k6?
You need an environment that can absorb the load without affecting real users or corrupting production data. A dedicated load test environment is ideal. If you only have staging, run load tests off-hours and make sure your test data is isolated. Never point k6 at production with high VU counts.
How many VUs should I use?
Start with a number that represents realistic peak traffic, not your theoretical maximum. If your application has 50 concurrent users on a busy day, test at 50 to 100. Finding what happens at 1,000 VUs matters less than knowing your system handles normal peak load comfortably.
My k6 test passes but the application feels slow. Why?
k6 measures HTTP response time, which includes server processing but excludes client-side rendering. An API endpoint that responds in 200ms might still feel slow if the frontend makes 20 sequential calls to populate a page. Use k6 for API-level performance; use browser profiling tools for perceived page performance.
Can k6 test WebSocket or gRPC endpoints?
Yes. k6 has built-in support for WebSockets via the k6/ws module and gRPC via the k6/net/grpc module. The scripting model is the same; only the protocol-specific API differs.
The JSON output file from --out json can be imported into Grafana for visualization. If your team uses Grafana Cloud, k6 has native integration that streams results in real time and stores them for comparison across runs. For teams without Grafana, the terminal summary table copied into a PR comment or Slack message is enough to communicate pass/fail and key percentiles.
Load testing verifies your system meets performance requirements at expected traffic levels. Stress testing pushes past expected levels to find the breaking point. The stages configuration handles both: a load test holds at a target VU count, a stress test keeps ramping until checks fail.