API Design — REST, GraphQL, gRPC, Versioning, Pagination, Errors
Session 31 of the 48-session learning series.
Why this session matters
This is Session 31 of 48 in the System Design track. An API is the public face of your system. Bad APIs are renegotiated for years; good ones outlive the teams that built them. Knowing where REST shines, where GraphQL pays off, and where gRPC dominates is table stakes for any senior engineer.
Agenda
- REST done well — resources, HTTP verbs, status codes, HATEOAS (or not)
- GraphQL — the schema, query, mutation, subscription model; n+1 trap
- gRPC + Protobuf — when binary + streaming wins
- API versioning, deprecation, pagination, errors
- Idempotency, rate limiting, auth — production hygiene
Pre-read (skim before the session)
- Roy Fielding — Architectural Styles (PhD thesis, 2000) — Ch. 5
- JSON:API spec
- Google API Design Guide
- GraphQL Best Practices
Deep dive
1. The 3 dominant API styles in 2026
| Style | Format | Schema | Streaming | Best at |
|---|---|---|---|---|
| REST | JSON over HTTP | Optional (OpenAPI) | Limited (SSE) | Public, cacheable resource APIs |
| GraphQL | JSON over HTTP | Mandatory (SDL) | Subscriptions | Mobile/web with varying needs |
| gRPC | Protobuf over HTTP/2 | Mandatory (proto) | Native bidi | Internal microservices, low-latency |
There is no "best". Pick by use-case. Most companies have all three.
2. REST done well
GET /v1/users/42 → fetch one
GET /v1/users?role=admin → list, query params for filter
POST /v1/users → create
PUT /v1/users/42 → replace
PATCH /v1/users/42 → partial update
DELETE /v1/users/42 → delete
Status codes matter:
- 2xx success: 200, 201 Created (with Location: /users/43), 204 No Content.
- 3xx redirect: rare in APIs.
- 4xx client errors: 400 (bad input), 401 (unauthenticated), 403 (forbidden), 404 (not found), 409 (conflict), 422 (validation), 429 (rate limit).
- 5xx server errors: 500, 502, 503 (overload), 504 (timeout).
Don't return 200 OK with {"error": "..."}. That breaks every client retry policy on the planet.
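The verb and status-code rules above can be sketched as a tiny dispatcher. This is a minimal illustration, not a framework: the in-memory `USERS` store, the `handle` function, and the sample user are all hypothetical, query-string filtering is left out, and error bodies are omitted for brevity.

```python
from http import HTTPStatus

# Hypothetical in-memory store standing in for a real database.
USERS = {42: {"id": 42, "name": "Ada", "role": "admin"}}
NEXT_ID = 43


def handle(method, path, body=None):
    """Return (status, headers, body) for a tiny /v1/users API."""
    global NEXT_ID
    parts = path.strip("/").split("/")  # e.g. ["v1", "users", "42"]
    if parts[:2] != ["v1", "users"] or len(parts) > 3:
        return HTTPStatus.NOT_FOUND, {}, None

    if len(parts) == 2:  # collection: /v1/users
        if method == "GET":
            return HTTPStatus.OK, {}, list(USERS.values())
        if method == "POST":
            if not body or "name" not in body:
                return HTTPStatus.BAD_REQUEST, {}, None
            user = {**body, "id": NEXT_ID}
            USERS[NEXT_ID] = user
            NEXT_ID += 1
            # 201 Created points at the new resource via Location.
            return HTTPStatus.CREATED, {"Location": f"/v1/users/{user['id']}"}, user
        return HTTPStatus.METHOD_NOT_ALLOWED, {}, None

    # single resource: /v1/users/<id>
    if not parts[2].isdigit() or int(parts[2]) not in USERS:
        return HTTPStatus.NOT_FOUND, {}, None
    user_id = int(parts[2])
    if method == "GET":
        return HTTPStatus.OK, {}, USERS[user_id]
    if method == "PUT":  # replace the whole representation
        USERS[user_id] = {**(body or {}), "id": user_id}
        return HTTPStatus.OK, {}, USERS[user_id]
    if method == "PATCH":  # merge only the supplied fields
        USERS[user_id].update(body or {})
        return HTTPStatus.OK, {}, USERS[user_id]
    if method == "DELETE":
        del USERS[user_id]
        return HTTPStatus.NO_CONTENT, {}, None
    return HTTPStatus.METHOD_NOT_ALLOWED, {}, None
```

Note the PUT/PATCH difference: PUT drops any field the client did not send, PATCH keeps it.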
3. Resource shape
- Plural noun for collections: /users, not /user.
- Sub-resources for relationships: /users/42/orders.
- Don't put verbs in URLs (/getUser, /activateUser); use HTTP verbs and resource state.
- Reserved exception: actions that don't map to CRUD, e.g. POST /users/42:resetPassword (Google-style colon-action).
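One way a router might split a colon-action path into resource and action. A minimal sketch: the URL shape `/<collection>/<id>[:<action>]` and the `parse_resource_path` helper are assumptions for illustration, not part of any framework.

```python
import re

# Assumed URL shape: /<collection>/<id>[:<action>]
_PATH = re.compile(r"^/(?P<collection>[a-z]+)/(?P<id>\d+)(?::(?P<action>[A-Za-z]+))?$")


def parse_resource_path(path):
    """Split a path into (collection, id, action); action is None for plain CRUD."""
    match = _PATH.match(path)
    if match is None:
        raise ValueError(f"unrecognised resource path: {path!r}")
    return match.group("collection"), int(match.group("id")), match.group("action")
```

For example, `parse_resource_path("/users/42:resetPassword")` yields the `users` collection, id `42`, and the `resetPassword` action, while `/users/42` yields no action.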
4. GraphQL — the pitch
One endpoint (/graphql). Client describes exactly what it needs:
query {
  user(id: "42") {
    name
    orders(last: 5) {
      id
      total
      items { name price }
    }
  }
}
Pros:
- No over-fetching (mobile users save bandwidth).
- No under-fetching (fewer round-trips).
- Typed schema; clients can codegen.
- Single endpoint; easier ops.
Cons:
- N+1 query problem (must use DataLoader-style batching).
- Caching is harder (no URL → response mapping).
- Authorisation is per-field, not per-endpoint (more places to mess up).
- Costly arbitrary queries — depth/complexity limits required.
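The N+1 fix is easiest to see in code. Below is a minimal, synchronous sketch of DataLoader-style batching; real DataLoader implementations batch per event-loop tick and are asynchronous, so treat `BatchLoader`, `fetch_items`, and the sample data as hypothetical stand-ins.

```python
class BatchLoader:
    """Collects keys, then resolves them all with one batched fetch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self._pending = []
        self._cache = {}

    def load(self, key):
        """Queue a key and return a thunk that yields its value later."""
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)
        return lambda: self._resolve(key)

    def _resolve(self, key):
        if self._pending:  # first resolution flushes the whole queue at once
            keys, self._pending = self._pending, []
            self._cache.update(self._batch_fn(keys))
        return self._cache[key]


# Hypothetical data source; `calls` records how many fetches actually happen.
ITEMS_BY_ORDER = {1: ["pen"], 2: ["ink", "pad"], 3: []}
calls = []


def fetch_items(order_ids):
    calls.append(list(order_ids))
    return {oid: ITEMS_BY_ORDER[oid] for oid in order_ids}


loader = BatchLoader(fetch_items)
thunks = [loader.load(oid) for oid in (1, 2, 3)]  # resolvers for 3 orders
items = [thunk() for thunk in thunks]             # one fetch, not three
```

Without the loader, resolving `items` for each of the three orders would hit the data source three times; here `calls` ends up holding a single batch.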
5. gRPC — when binary wins
Protobuf-defined service:
service UserService {
  rpc GetUser(GetUserRequest) returns (User);            // unary
  rpc StreamEvents(EventFilter) returns (stream Event);  // server streaming
  rpc UploadFile(stream FileChunk) returns (UploadResult);  // client streaming
  rpc Chat(stream Message) returns (stream Message);     // bidirectional streaming
}
Pros:
- Strongly typed, generated stubs in 10+ languages.
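- Binary on the wire: payloads are often several times smaller than the equivalent JSON (the 3–10× figure depends heavily on the data).
- HTTP/2 multiplexing: many concurrent calls share one connection, which removes HTTP/1.1's request-level head-of-line blocking (TCP-level head-of-line blocking remains).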
- Native streaming (unary, server, client, bidi).
Cons:
- Not browser-native — needs grpc-web proxy.
- Binary = harder to debug with curl (use grpcurl).
- HTTP/2 issues with some old infrastructure.
A sensible default for internal microservices in a polyglot org.
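To see where the size saving comes from, here is a hand-rolled sketch of the Protobuf wire format for a two-field message, compared with compact JSON. The message layout (`uint64 id = 1; string name = 2;`) is an assumption for illustration since the notes do not define `User`; in practice you would use generated stubs, not hand encoding.

```python
import json


def encode_varint(n):
    """Protobuf base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_user(user_id, name):
    """Encode a hypothetical message: uint64 id = 1; string name = 2;"""
    raw = name.encode("utf-8")
    return (
        encode_varint((1 << 3) | 0)    # field 1, wire type 0 (varint)
        + encode_varint(user_id)
        + encode_varint((2 << 3) | 2)  # field 2, wire type 2 (length-delimited)
        + encode_varint(len(raw))
        + raw
    )


binary = encode_user(42, "Ada")
as_json = json.dumps({"id": 42, "name": "Ada"}, separators=(",", ":")).encode()
print(len(binary), len(as_json))
```

Field names never travel on the wire, only small numeric tags, which is most of the difference for short records. The ratio shrinks for payloads dominated by long strings.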
6. Versioning
Three schools:
- URL versioning: /v1/users, /v2/users. Most popular, easiest to reason about, ugliest.
- Header versioning: Accept: application/vnd.myapi.v2+json. Cleaner URL, harder to debug.
- No versioning, only deprecation: additive changes only; never break. Stripe is the usual reference point (it pairs additive evolution with dated versions pinned per account); high discipline required.
For a startup → URL versioning. Migrate to Stripe-style additive once you have public partners.
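A minimal sketch of how a server might resolve the requested version under the first two schemes. The `SUPPORTED` set, the `resolve_version` helper, and the rule that the URL wins over the header are assumptions; the media type is the `vnd.myapi` example from the list above.

```python
import re

SUPPORTED = {1, 2}  # hypothetical set of live versions
_URL = re.compile(r"^/v(\d+)/")
_ACCEPT = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_version(path, accept=None, default=1):
    """Pick the API version from the URL prefix, else the Accept header, else a default."""
    match = _URL.match(path)
    if match is None and accept:
        match = _ACCEPT.search(accept)
    version = int(match.group(1)) if match else default
    if version not in SUPPORTED:
        raise ValueError(f"unsupported API version: v{version}")
    return version
```

Rejecting unknown versions explicitly (rather than silently falling back) keeps clients from running against a contract they did not ask for.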
7. Pagination
| Style | How | Pros | Cons |
|---|---|---|---|
| Offset/limit | ?page=2&size=20 | Simple, jump to page | Slow on deep pages; broken on insert |
| Cursor | ?cursor=abc&size=20 | Stable across inserts; fast | No jump-to-page; opaque cursor |
| Keyset (seek) | ?after_id=12345&size=20 | Fastest; index-friendly | Sortable column required |
For infinite scroll or sync APIs → cursor. For human-facing tables → offset. Don't allow page=100000 — set a cap.
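Cursor and keyset pagination combine naturally: the server seeks on an indexed column and hands the client an opaque cursor that wraps the last seen key. A minimal sketch using an in-memory SQLite table; the `users` table, the `list_users` function, and the cursor encoding are all illustrative assumptions.

```python
import base64
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 8)],
)

MAX_SIZE = 100  # cap the page size regardless of what the client asks for


def encode_cursor(last_id):
    """Wrap the last seen id in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"after_id": last_id}).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_users(cursor=None, size=20):
    size = max(1, min(size, MAX_SIZE))
    after_id = decode_cursor(cursor) if cursor else 0  # assumes ids start at 1
    rows = db.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, size + 1),  # fetch one extra row to detect a next page
    ).fetchall()
    page, has_more = rows[:size], len(rows) > size
    return {
        "items": [{"id": r[0], "name": r[1]} for r in page],
        "next_cursor": encode_cursor(page[-1][0]) if has_more else None,
    }
```

Because the query seeks with `id > ?` instead of skipping with `OFFSET`, rows inserted behind the cursor cannot shift later pages, and the cost per page stays flat however deep the client goes.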
8. Error responses
Consistent shape:
{
  "error": {
    "code": "USER_NOT_FOUND",
    "message": "User 42 not found",
    "details": [
      {"field": "user_id", "issue": "no_such_record"}
    ],
    "trace_id": "abc-123"
  }
}
- code: machine-readable enum; never change.
- message: human-readable; safe to display.
- details: per-field issues for form validation.
- trace_id: for support tickets.
Document every code. Clients will switch on it.
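Keeping the envelope in one helper is a cheap way to keep it consistent. A minimal sketch: the `ErrorCode` enum, the `VALIDATION_FAILED` code, the status mapping, and `error_response` are hypothetical names, and only `USER_NOT_FOUND` comes from the example above.

```python
import uuid
from enum import Enum


class ErrorCode(str, Enum):
    """Machine-readable codes; once published, values never change."""
    USER_NOT_FOUND = "USER_NOT_FOUND"
    VALIDATION_FAILED = "VALIDATION_FAILED"  # hypothetical second code


# Hypothetical mapping from error code to HTTP status.
HTTP_STATUS = {
    ErrorCode.USER_NOT_FOUND: 404,
    ErrorCode.VALIDATION_FAILED: 422,
}


def error_response(code, message, details=None, trace_id=None):
    """Build (status, body) in the consistent error shape."""
    body = {
        "error": {
            "code": code.value,
            "message": message,
            "details": details or [],
            "trace_id": trace_id or str(uuid.uuid4()),
        }
    }
    return HTTP_STATUS[code], body
```

Tying each code to exactly one status in a single table also prevents the same error surfacing as 400 in one handler and 422 in another.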
9. Idempotency
A retried POST should not create two orders. Implement via:
- Client supplies an Idempotency-Key: <uuid> header.
- Server stores key → response for 24h.
- Same key + same body → return cached response.
- Same key + different body → 409 Conflict.
Stripe-style. Mandatory for any POST that has side-effects + money.
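The four rules above can be sketched with an in-memory store. This is an illustration under simplifying assumptions: `_store`, `idempotent_post`, and the `IDEMPOTENCY_KEY_REUSED` code are hypothetical, a production version would use a shared store such as a database or Redis, and two concurrent requests with the same key are not handled here.

```python
import hashlib
import json
import time

TTL_SECONDS = 24 * 60 * 60
_store = {}  # key -> (body_fingerprint, expires_at, response)


def _fingerprint(body):
    """Stable hash of the request body, independent of key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def idempotent_post(key, body, handler, now=None):
    """Run handler(body) at most once per key within the TTL."""
    now = time.time() if now is None else now
    fingerprint = _fingerprint(body)
    entry = _store.get(key)
    if entry and entry[1] > now:
        if entry[0] != fingerprint:
            # Same key, different body: refuse rather than guess.
            return 409, {"error": {"code": "IDEMPOTENCY_KEY_REUSED"}}
        return entry[2]  # same key, same body: replay the stored response
    response = handler(body)
    _store[key] = (fingerprint, now + TTL_SECONDS, response)
    return response
```

The `now` parameter exists only to make the expiry logic testable without sleeping.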
10. Rate limiting
Two strategies often combined:
- Per-API-key quota (X-RateLimit-Remaining header).
- Global per-resource burst (token bucket).
Headers (Retry-After is standard HTTP; the X-RateLimit-* names are a widely used convention, not a formal standard):
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1735689600
Retry-After: 30
Return 429 Too Many Requests + Retry-After. Clients can back off intelligently.
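A minimal token-bucket sketch that also produces the response headers. The `TokenBucket` class and its parameters are illustrative assumptions; a real deployment would keep one bucket per API key in shared storage and would usually live in the gateway.

```python
import math
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return (status, headers) for one incoming request."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated
        # Refill lazily, based on the time since the last request.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        headers = {"X-RateLimit-Limit": str(self.capacity)}
        if self.tokens >= 1:
            self.tokens -= 1
            headers["X-RateLimit-Remaining"] = str(int(self.tokens))
            return 200, headers
        # Seconds until one whole token is available again.
        wait = math.ceil((1 - self.tokens) / self.refill_per_second)
        headers["X-RateLimit-Remaining"] = "0"
        headers["Retry-After"] = str(wait)
        return 429, headers
```

Refilling lazily on each request avoids a background timer; the bucket only needs two numbers of state per key.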
11. Auth
Common patterns:
- API keys — simple, machine-to-machine. Rotate periodically.
- OAuth 2.0: delegated third-party access with user consent ("Login with Google" is OIDC layered on top of it).
- JWT bearer — short-lived; signed claims; stateless verification.
- mTLS — service-to-service; cert-based; zero password leakage.
- OIDC + JWT — modern combo for SaaS B2B.
Don't roll your own auth. Use Auth0/Clerk/Cognito/Keycloak.
12. Documentation
Non-negotiable artefacts:
- OpenAPI spec (REST) — generated from code; renders in Swagger UI / ReDoc.
- GraphQL introspection — schema browsable in GraphiQL.
- .proto files: gRPC; generate docs via protoc-gen-doc.
- Postman / Bruno collection: examples that actually run.
- Changelog — every breaking change, with a date.
API without docs = API that doesn't exist.
13. Reality check
A modern API stack:
- REST + OpenAPI for public.
- gRPC for internal high-traffic services.
- GraphQL for mobile/web aggregation layer (BFF pattern).
- A gateway in front (Kong, Envoy) for auth, rate limit, observability.
- Versioning at URL.
- Postman + Swagger for docs.
You don't need all three. Most teams should pick REST first and add the others when actual pain demands it.
Reading material
Books:
- RESTful Web APIs — Leonard Richardson, Mike Amundsen, Sam Ruby (the deep REST book; hypermedia, HTTP, status codes)
- Designing Web APIs — Brenda Jin, Saurabh Sahni, Amir Shevat (the Slack-team practitioner book; pragmatic)
- API Design Patterns — JJ Geewax (Google; the "Google API design guide" turned into a 600-page book)
- Building Microservices, 2nd ed. — Sam Newman (the service-boundary + contract chapters)
Papers:
- Roy Fielding — Architectural Styles and the Design of Network-Based Software Architectures (REST dissertation) — Fielding's 2000 PhD; where REST was actually defined.
- GraphQL: A Data Query Language — Facebook 2015 — the original engineering announcement.
Official docs:
- Google API Design Guide — the most opinionated public guide; the source for AIP style.
- Microsoft REST API Guidelines — the other major public style guide.
- Stripe API reference — the de-facto "what a great REST API looks like" reference.
- gRPC documentation — protocol + language guides + best practices.
- GraphQL official spec — the language definition.
Blog posts:
- Zalando RESTful API Guidelines — the most thorough public style guide; 200+ rules with examples.
- Phil Sturgeon — JSON API: explained — the modern critique of REST and where JSON:API fits.
- Lee Byron (GraphQL co-creator) — GraphQL at Facebook — the retrospective from the inside.
- Jeff Hodges, Notes on Distributed Systems for Young Bloods: the canonical "what to think about with internal APIs" essay.
- Martin Fowler — Microservices — the canonical primer; sets the context for API boundaries.
In-depth research material
- grpc-go — github.com/grpc/grpc-go — ~21k ★, the Go reference implementation; canonical reading for understanding gRPC internals.
- grpc — github.com/grpc/grpc — ~42k ★, the C++ core used by Java/Python/Ruby bindings.
- graphql-js — github.com/graphql/graphql-js — ~20k ★, the JS reference implementation.
- Apollo Server — github.com/apollographql/apollo-server — ~14k ★, the production-grade GraphQL server.
- OpenAPI Specification — github.com/OAI/OpenAPI-Specification — ~29k ★, the OpenAPI spec repo.
- Stripe blog — Designing robust and predictable APIs with idempotency — the canonical write-up on idempotency keys.
- Slack Engineering — How Slack built shared channels — multi-tenant API design at scale.
- Netflix Tech Blog — Migrating Netflix to GraphQL Safely — REST → GraphQL migration without downtime.
- Shopify Engineering — Building Shopify's API at scale — the multi-API-type strategy from a $1T platform.
- Cloudflare blog — Why we use REST instead of GraphQL — the contrarian view from people who do operate APIs at scale.
Videos
- REST vs GraphQL vs gRPC — Hussein Nasser — 39 min — the comparison video everyone watches first; honest tradeoffs.
- What is GraphQL? — Lee Byron (co-creator) — 33 min — the talk from a creator; the why before the how.
- gRPC Crash Course — Tech with Tim — 31 min — concept + Python implementation in one sitting.
- How Stripe Designs APIs — Brandur Leach (Stripe) — 35 min — the canonical "how a great API gets designed" talk.
- API Design Patterns — JJ Geewax (Google) — 51 min — the book in talk form, from a Google API tech lead.
LeetCode — Design Rate Limiter
- Link: https://leetcode.com/problems/design-rate-limiter/
- Difficulty: Medium
- Why this problem: Token bucket / sliding window — the canonical API hygiene primitive.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Pick REST, GraphQL, or gRPC for a given use-case with one-sentence justification.
- List the 4xx codes for: invalid input, missing or invalid credentials, insufficient permissions, conflict, rate-limit.
- Design an idempotent POST endpoint with Idempotency-Key.
- Explain cursor vs offset pagination and when each fails.
- Write an error response with code, message, details, trace_id.
- Solve design-rate-limiter: sliding-window or token-bucket implementation.
Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.