Performance Testing with k6: The QA Engineer's First Load Test

Unlike Playwright assertions, a failed k6 check does not stop the test: all virtual users keep running, and k6 counts pass/fail rates across thousands of requests to show the health signal in the summary. What stops the run is a failed threshold: if p95 response time exceeds the configured limit, k6 exits with code 99 and the CI step fails exactly like a failing test. This tutorial walks through installing k6, writing a first script with realistic sleep() think time, reading the output table, configuring ramp-up stages, and setting thresholds that make performance regressions visible in CI.

What k6 Is and Why QA Engineers Should Know It

k6 is an open-source performance testing tool built by Grafana Labs. You write test scripts in JavaScript or TypeScript, run them from the CLI, and get detailed output about how your system behaved under load. The GitHub repository has over 24,000 stars, and the tool is used by organizations of every size.

What makes k6 the right entry point for QA engineers specifically is the scripting model. If you already write Playwright or Jest tests, the syntax is familiar: you import modules, write functions, make HTTP requests, and add assertions. There's no heavy GUI, no proprietary recording format, no XML configuration file. Just a script.

k6 also integrates cleanly with GitHub Actions. You install the binary, run your script, and if the test fails (more on what "fails" means later), the CI step exits with a non-zero code. That's the same contract every other test tool uses.

The tool is not a replacement for Playwright. Playwright verifies behavior; k6 measures performance under load. They answer different questions and both belong in a mature QA strategy.

Installing k6 and Running Your First Test

k6 ships as a standalone binary. On macOS:

brew install k6

On Linux (Debian/Ubuntu):

sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

On Windows, use the official installer from k6.io/docs/get-started/installation or install via Chocolatey:

choco install k6

Once installed, verify it works:

k6 version

Now create a file called load-test.js and add this minimal script:

import http from 'k6/http';

export default function () {
  http.get('https://test-api.k6.io/public/crocodiles/');
}

Run it:

k6 run load-test.js

k6 uses test-api.k6.io as a public sandbox. You can run this against it without setting anything up. You'll see output in your terminal showing request counts, response times, and a summary table. That's your first load test running.

The Basic Test Anatomy

A real k6 script has three parts: the options export that controls how the test runs, the default function that contains your test logic, and any setup your requests need. Here's the structure:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10,        // virtual users running concurrently
  duration: '30s', // how long the test runs
};

export default function () {
  http.get('https://your-api.example.com/api/products');
  sleep(1); // think time between requests
}

vus stands for virtual users. Each VU runs the default function in a loop for the full duration, independently of the others. Setting vus: 10 and duration: '30s' means 10 simulated users hitting your endpoint continuously for 30 seconds.

k6 will run without the sleep(1) call, but treat it as required: it models the think time a real user spends between actions. Without it, each VU fires requests as fast as the server can respond, which is unrealistic and produces misleadingly high throughput numbers. A one-second sleep is a reasonable default; adjust it to match your users' actual behavior.
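Real users rarely pause for exactly one second, so a fixed sleep can make every VU hit the server in lockstep. Here is a minimal sketch of randomized think time, assuming a uniform distribution is a good enough model. The thinkTime helper and its parameters are hypothetical names, not part of k6.

```javascript
// Hypothetical helper: pick a think time uniformly between min and max seconds.
// `rand` is injectable so the function can be tested deterministically.
function thinkTime(minSeconds, maxSeconds, rand = Math.random) {
  return minSeconds + rand() * (maxSeconds - minSeconds);
}

// Inside the k6 default function you would call: sleep(thinkTime(1, 3));
console.log(thinkTime(1, 3).toFixed(2)); // some value between 1.00 and 3.00
```

k6's sleep() accepts fractional seconds, so the returned value can be passed to it directly.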

k6 virtual users are not threads or processes; they're lightweight goroutines. A single k6 instance can realistically run thousands of VUs without consuming a proportional amount of memory or CPU. You don't need a cluster to run a useful load test.

For POST requests with a JSON body, which you'll need for most API testing:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '1m',
};

export default function () {
  const url = 'https://your-api.example.com/api/login';
  const payload = JSON.stringify({
    username: 'testuser@example.com',
    password: 'testpassword',
  });
  const params = {
    headers: { 'Content-Type': 'application/json' },
  };

  http.post(url, payload, params);
  sleep(1);
}

Reading k6 Output

After a test run, k6 prints a summary table to the terminal. Learning to read it quickly is the skill that turns raw numbers into actionable information.

A typical output looks like this:

✓ checks.........................: 100.00% ✓ 1800 ✗ 0
  data_received..................: 4.3 MB  143 kB/s
  data_sent......................: 523 kB  17 kB/s
  http_req_blocked...............: avg=1.2ms    min=1µs    med=4µs    max=312ms  p(90)=7µs    p(95)=11µs
  http_req_connecting............: avg=743µs    min=0s     med=0s     max=181ms  p(90)=0s     p(95)=0s
  http_req_duration..............: avg=312ms    min=98ms   med=280ms  max=2.1s   p(90)=520ms  p(95)=640ms
    { expected_response:true }...: avg=312ms    min=98ms   med=280ms  max=2.1s   p(90)=520ms  p(95)=640ms
  http_req_failed................: 0.00%   ✓ 0 ✗ 1800
  http_req_receiving.............: avg=2.1ms    min=45µs   med=1.1ms  max=74ms   p(90)=4.2ms  p(95)=5.8ms
  http_req_sending...............: avg=178µs    min=27µs   med=136µs  max=3.2ms  p(90)=312µs  p(95)=389µs
  http_req_tls_handshaking.......: avg=412µs    min=0s     med=0s     max=168ms  p(90)=0s     p(95)=0s
  http_req_waiting...............: avg=310ms    min=97ms   med=278ms  max=2.1s   p(90)=518ms  p(95)=637ms
  http_reqs......................: 1800    60.0/s
  iteration_duration.............: avg=1.31s    min=1.1s   med=1.28s  max=3.1s   p(90)=1.52s  p(95)=1.64s
  iterations.....................: 1800    60.0/s
  vus............................: 10      min=10     max=10
  vus_max........................: 10      min=10     max=10

The three metrics you look at first are:

http_req_duration is the total round-trip time for each request. The p(95) column tells you the 95th percentile: 95% of your requests completed within that time. This is the number that matters for SLAs, not the average. Averages hide outliers; p95 doesn't.

http_reqs shows total requests made and the rate per second. This tells you how much load you actually generated.

http_req_failed shows the percentage of requests that received an error response (4xx or 5xx). Zero percent is expected; anything above it requires investigation.
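To see why the average can hide a problem that p95 exposes, here is a small sketch on made-up numbers. It uses the nearest-rank percentile definition as a simplifying assumption; k6's own calculation may interpolate between samples, so exact values can differ slightly.

```javascript
// Nearest-rank percentile; k6's own calculation may interpolate differently.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Made-up sample: 90 fast requests (100 ms) and 10 slow ones (2000 ms).
const durations = [...Array(90).fill(100), ...Array(10).fill(2000)];

console.log(average(durations));        // 290
console.log(percentile(durations, 50)); // 100
console.log(percentile(durations, 95)); // 2000
```

An average of 290 ms looks healthy, yet one request in ten takes two seconds. Only the p95 column makes that visible.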

Adding Checks

Functional correctness under load matters as much as speed. k6 has a built-in assertion mechanism called checks that works similarly to expect statements in other test frameworks:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  const res = http.get('https://your-api.example.com/api/products');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains products': (r) => r.body.includes('"id"'),
  });

  sleep(1);
}

The important difference from traditional assertions: a failed check does not stop the test. All VUs keep running. k6 counts how many checks passed and failed across all iterations and shows the percentage in the summary. This is intentional. You want to know what percentage of requests under load produced a correct response, not just whether one request failed.

Use checks for everything you'd normally assert in a functional test: status code, response time, presence of expected fields in the response body. The check pass rate in the summary gives you a combined health signal across thousands of requests.

For a login endpoint that returns a token, you'd write:

import http from 'k6/http';
import { check, sleep } from 'k6';

export default function () {
  const payload = JSON.stringify({
    username: 'user@example.com',
    password: 'password123',
  });

  const res = http.post('https://your-api.example.com/auth/login', payload, {
    headers: { 'Content-Type': 'application/json' },
  });

  check(res, {
    'login succeeds': (r) => r.status === 200,
    'token present': (r) => JSON.parse(r.body).token !== undefined,
    'login fast enough': (r) => r.timings.duration < 1000,
  });

  sleep(1);
}
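One caveat with parsing the body inside a check: under load, a proxy or gateway may return an HTML error page, and JSON.parse throws on it. A hedged sketch of a safer predicate follows; hasToken is a hypothetical helper name, and the token field is assumed to match the login response above.

```javascript
// Hypothetical helper: true only when the body is valid JSON with a token field.
function hasToken(body) {
  try {
    const parsed = JSON.parse(body);
    return parsed !== null && typeof parsed === 'object' && parsed.token !== undefined;
  } catch (err) {
    return false; // non-JSON body, e.g. an HTML error page from a proxy
  }
}

console.log(hasToken('{"token":"abc123"}'));           // true
console.log(hasToken('<html>502 Bad Gateway</html>')); // false
```

In the check, the predicate would then read 'token present': (r) => hasToken(r.body), so a malformed response counts as a failed check instead of an exception.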

Stages: Ramp Up, Sustain, Ramp Down

A flat constant load is rarely what you want. Real traffic ramps up as users arrive, holds at peak, then drops off. k6 models this with stages, which let you define how VU count changes over time:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50 },   // hold at 50 VUs for 5 minutes
    { duration: '2m', target: 100 },  // ramp up to 100 VUs over 2 minutes
    { duration: '5m', target: 100 },  // hold at 100 VUs for 5 minutes
    { duration: '2m', target: 0 },    // ramp down to 0
  ],
};

export default function () {
  const res = http.get('https://your-api.example.com/api/products');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 1000,
  });

  sleep(1);
}

This pattern (ramp up, sustain, ramp up more, sustain, ramp down) is called a stress test profile. It reveals at which user count your system starts to degrade. You'll see the p95 response time in your results track steadily upward as VUs increase. The inflection point where it jumps sharply is your system's breaking point.
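When you look for the inflection point, it helps to know how many VUs were planned at a given moment of the run. The sketch below computes that from a stages array, assuming linear ramps between targets. The helper names and the startVUs parameter are hypothetical, and the result is fractional, whereas a real run uses whole VUs.

```javascript
// Convert a single-unit duration such as '30s', '2m' or '1h' to seconds.
// Simplification: combined forms like '1m30s' are not handled here.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const unit = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unit;
}

// Planned VU count at `elapsed` seconds, assuming linear ramps between stages.
// `startVUs` is a hypothetical parameter for the VU count at time zero.
function vusAt(stages, elapsed, startVUs = 0) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const length = toSeconds(stage.duration);
    if (elapsed <= stageStart + length) {
      const progress = length === 0 ? 1 : (elapsed - stageStart) / length;
      return from + (stage.target - from) * progress;
    }
    from = stage.target;
    stageStart += length;
  }
  return from; // past the last stage
}

const stages = [
  { duration: '2m', target: 50 },
  { duration: '5m', target: 50 },
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 0 },
];

console.log(vusAt(stages, 60));  // 25, halfway through the first ramp
console.log(vusAt(stages, 480)); // 75, halfway through the second ramp
console.log(vusAt(stages, 960)); // 0, end of the ramp-down
```

If response times jump around the eight-minute mark, for example, this tells you the system was under roughly 75 planned VUs at that point.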

For most API testing you'll run a simpler profile: ramp up, hold, then ramp down.

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // warm up
    { duration: '5m', target: 50 },  // sustained load
    { duration: '30s', target: 0 },  // ramp down
  ],
};

The ramp-down phase matters. Cutting load abruptly can mask connection pool issues, while a gradual ramp-down gives you cleaner metrics at the tail of the test.

Thresholds: Making the Test Pass or Fail

Checks tell you what happened. Thresholds determine whether the test run succeeds or fails. A threshold is a pass/fail condition on any metric. If the condition is violated, k6 exits with a non-zero code. That's the hook that makes load tests useful in CI.

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '30s', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],   // 95th percentile must be under 2 seconds
    http_req_failed: ['rate<0.01'],      // fewer than 1% of requests can fail
    checks: ['rate>0.99'],              // more than 99% of checks must pass
  },
};

export default function () {
  const res = http.get('https://your-api.example.com/api/products');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 2s': (r) => r.timings.duration < 2000,
  });

  sleep(1);
}

When k6 runs this script and the p95 response time exceeds 2000ms, the run exits with code 99. Your CI step fails. This is the correct behavior. A build that ships code making your API 40% slower should fail.

You can set thresholds per endpoint using tags:

import http from 'k6/http';

export default function () {
  const loginRes = http.post(
    'https://your-api.example.com/auth/login',
    JSON.stringify({ username: 'user@example.com', password: 'pass' }),
    { headers: { 'Content-Type': 'application/json' }, tags: { name: 'login' } }
  );

  const productsRes = http.get(
    'https://your-api.example.com/api/products',
    { tags: { name: 'products' } }
  );
}

export const options = {
  thresholds: {
    'http_req_duration{name:login}': ['p(95)<1000'],
    'http_req_duration{name:products}': ['p(95)<500'],
  },
};
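Once more than a handful of endpoints are tagged, writing each threshold key by hand gets error-prone. A minimal sketch of generating them from a table of p95 limits follows. buildThresholds is a hypothetical helper, and it assumes every request carries a name tag as in the example above.

```javascript
// Hypothetical helper: turn { endpointName: p95LimitMs } into k6 threshold entries.
function buildThresholds(p95LimitsMs) {
  const thresholds = {};
  for (const [name, limitMs] of Object.entries(p95LimitsMs)) {
    thresholds[`http_req_duration{name:${name}}`] = [`p(95)<${limitMs}`];
  }
  return thresholds;
}

const thresholds = buildThresholds({ login: 1000, products: 500 });
console.log(thresholds);
```

The returned object has the same shape as the hand-written thresholds block, so it can be assigned to the thresholds key of options.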

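Thresholds are evaluated periodically during the test, not just at the end, but the pass/fail result comes from the final value computed over the whole run. If your p95 exceeds the limit early in the run and recovers later, the threshold can still pass, unless you configure it with abortOnFail so that the run stops the moment the limit is crossed. A system that degrades and recovers may be hiding a capacity problem you need to investigate, so watch the metrics over time rather than relying on the end-of-run summary alone.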
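When a run should stop as soon as a limit is crossed rather than continue to the end, k6 accepts a threshold in object form with abortOnFail. A minimal sketch follows; the 30-second delayAbortEval value is an arbitrary choice for illustration.

```javascript
// In a k6 script this object would be exported: `export const options = { ... }`.
const options = {
  thresholds: {
    http_req_duration: [
      {
        threshold: 'p(95)<2000',
        abortOnFail: true,     // stop the run once this threshold fails
        delayAbortEval: '30s', // wait for enough samples before aborting
      },
    ],
    // Plain string form: fails the run but does not abort it early.
    http_req_failed: ['rate<0.01'],
  },
};

console.log(JSON.stringify(options, null, 2));
```

The delay matters because the first few requests of a run can be slow while connections warm up, and aborting on those would be a false alarm.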

Running k6 in GitHub Actions CI

k6 runs cleanly in GitHub Actions. The Grafana team publishes an official action that installs the binary and handles version pinning:

# .github/workflows/load-test.yml
name: Load Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  load-test:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install k6
        uses: grafana/setup-k6-action@v1
        with:
          k6-version: '0.55.0'

      - name: Run load test
        run: k6 run --out json=k6-results.json load-test.js
        env:
          BASE_URL: ${{ vars.BASE_URL }}
          API_KEY: ${{ secrets.API_KEY }}

      - name: Upload k6 results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: k6-results
          path: k6-results.json
          retention-days: 14

To use environment variables inside your k6 script, access them through __ENV:

import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export const options = {
  stages: [
    { duration: '1m', target: 20 },
    { duration: '3m', target: 20 },
    { duration: '30s', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get(`${BASE_URL}/api/products`);

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time acceptable': (r) => r.timings.duration < 2000,
  });

  sleep(1);
}
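A mistyped or empty BASE_URL usually surfaces as thousands of failed requests rather than a clear error. Below is a hedged sketch of validating it up front. resolveBaseUrl is a hypothetical helper that takes any env-like object, which in a k6 script would be __ENV.

```javascript
// Hypothetical helper: read BASE_URL from an env-like object, with a local default.
function resolveBaseUrl(env, fallback = 'http://localhost:3000') {
  const raw = (env.BASE_URL || fallback).trim();
  if (!/^https?:\/\//.test(raw)) {
    throw new Error(`BASE_URL must start with http:// or https://, got "${raw}"`);
  }
  return raw.replace(/\/+$/, ''); // drop trailing slashes so `${base}/api/...` has one slash
}

console.log(resolveBaseUrl({ BASE_URL: 'https://staging.example.com/' })); // https://staging.example.com
console.log(resolveBaseUrl({}));                                           // http://localhost:3000
```

In the script above, the constant would become const BASE_URL = resolveBaseUrl(__ENV). An empty string, which is what an unset variable typically becomes in CI, falls back to the default.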

To generate a JSON results file for the artifact upload, pass the --out flag:

k6 run --out json=k6-results.json load-test.js

One practical recommendation: don't run load tests on every pull request against your production or shared staging environment. The load test job should run on merges to main, or be triggered via workflow_dispatch manually. Running heavy load tests on every PR branch causes interference between concurrent test runs. A smoke-level test (5 VUs, 30 seconds) on PRs is acceptable; a full load test is not.
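One way to keep a single script for both cases is to pick the load profile from an environment variable. The sketch below assumes a hypothetical PROFILE variable and hypothetical profile names; the numbers mirror the ones used in this tutorial.

```javascript
// Hypothetical profiles; the numbers mirror the ones used in this tutorial.
const PROFILES = {
  smoke: { vus: 5, duration: '30s' },
  load: {
    stages: [
      { duration: '1m', target: 50 },
      { duration: '5m', target: 50 },
      { duration: '30s', target: 0 },
    ],
  },
};

// Fall back to the cheap smoke profile when the name is missing or unknown.
function selectProfile(name) {
  return Object.prototype.hasOwnProperty.call(PROFILES, name) ? PROFILES[name] : PROFILES.smoke;
}

console.log(selectProfile('load').stages.length); // 3
console.log(selectProfile(undefined).vus);        // 5
```

In a k6 script this would be wired up as export const options = selectProfile(__ENV.PROFILE), with the workflow setting PROFILE to load only on merges to main. Defaulting to smoke means a misconfigured job errs on the side of less load.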

What Endpoints to Test

Choosing what to test with k6 is as important as the test itself. Not every endpoint needs a load test. Focus on the ones where performance matters.

The login endpoint is the first test to write. Every user session starts here. A slow login multiplied by 500 concurrent users collapses the experience before anyone reaches the actual features. Test the full login flow: POST credentials, receive a token, make one authenticated request.

import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export const options = {
  stages: [
    { duration: '1m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '1m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<1500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  // Step 1: Login
  const loginRes = http.post(
    `${BASE_URL}/auth/login`,
    JSON.stringify({ username: 'loadtest@example.com', password: 'loadtestpass' }),
    { headers: { 'Content-Type': 'application/json' } }
  );

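  check(loginRes, {
    'login status 200': (r) => r.status === 200,
    // Guard on status first: parsing a non-JSON error body would throw
    'token returned': (r) => r.status === 200 && JSON.parse(r.body).token !== undefined,
  });

  // Skip the authenticated step when login failed, since there is no token to send
  if (loginRes.status !== 200) {
    sleep(2);
    return;
  }

  const token = JSON.parse(loginRes.body).token;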

  // Step 2: Authenticated request
  const profileRes = http.get(`${BASE_URL}/api/me`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  check(profileRes, {
    'profile status 200': (r) => r.status === 200,
  });

  sleep(2);
}

Critical read endpoints are the second priority. Product listing, search results, dashboard data: anything users access repeatedly and that hits your database. These are the endpoints most likely to show degradation at scale because they aggregate data across multiple rows or tables.

The checkout or order submission flow is the third. This is the endpoint where slow performance has direct revenue impact. Test it at a lower VU count than your read endpoints (real checkout concurrency is lower than browse concurrency) but hold it to a tighter response time threshold.

A reasonable starting point for most applications is three load test scripts: one for authentication, one for your top three read endpoints in a single script, and one for your conversion-critical write operation. That's enough coverage to catch the performance regressions that matter without building a full performance test suite before you have a baseline.

FAQ

Do I need a separate test environment to run k6?

You need an environment that can absorb the load without affecting real users or corrupting production data. A dedicated load test environment is ideal. If you only have staging, run load tests off-hours and make sure your test data is isolated. Never point k6 at production with high VU counts.

How many VUs should I use?

Start with a number that represents realistic peak traffic, not your theoretical maximum. If your application has 50 concurrent users on a busy day, test at 50 to 100. Finding what happens at 1,000 VUs matters less than knowing your system handles normal peak load comfortably.
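If you know your peak traffic as requests per second rather than concurrent users, Little's Law gives a rough conversion. The sketch below assumes each iteration makes exactly one request followed by one sleep. requiredVUs is a hypothetical helper and the inputs are made-up numbers.

```javascript
// Little's Law estimate: concurrent users = arrival rate x time spent per iteration.
function requiredVUs(targetRps, avgResponseMs, thinkTimeMs) {
  return Math.ceil((targetRps * (avgResponseMs + thinkTimeMs)) / 1000);
}

// Made-up inputs: 50 requests/s target, 300 ms average response, 1 s think time.
console.log(requiredVUs(50, 300, 1000)); // 65
```

Treat the result as a starting point: as response times grow under load, each VU completes fewer iterations, so the achieved request rate drops below the target.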

My k6 test passes but the application feels slow. Why?

k6 measures HTTP response time, which includes server processing but excludes client-side rendering. An API endpoint that responds in 200ms might still feel slow if the frontend makes 20 sequential calls to populate a page. Use k6 for API-level performance; use browser profiling tools for perceived page performance.

Can k6 test WebSocket or gRPC endpoints?

Yes. k6 has built-in support for WebSockets via the k6/ws module and gRPC via the k6/net/grpc module. The scripting model is the same; only the protocol-specific API differs.

How do I share k6 results with my team?

The JSON output file from --out json can be imported into Grafana for visualization. If your team uses Grafana Cloud, k6 has native integration that streams results in real time and stores them for comparison across runs. For teams without Grafana, the terminal summary table copied into a PR comment or Slack message is enough to communicate pass/fail and key percentiles.

What's the difference between a load test and a stress test?

Load testing verifies your system meets performance requirements at expected traffic levels. Stress testing pushes past expected levels to find the breaking point. The stages configuration handles both: a load test holds at a target VU count, a stress test keeps ramping until checks fail.
