112,000 RPS single-thread
llhttp C parser, Cython hot paths, and an io_uring event loop. p50 1.68 ms, p99 4.35 ms, 25 MB RSS — only Go Fiber goes faster, and only barely.
See benchmarks →
The fastest Python web framework — built on llhttp, Cython, and io_uring. 6× faster than FastAPI on `/hello`, 10× faster on real PostgreSQL CRUD.
Single-worker hello-world on an Intel i7-14700, k6 200 VUs / 20 s, 0% errors.
| Framework | RPS | p50 (ms) | p99 (ms) | Peak RSS |
|---|---|---|---|---|
| Fiber (Go) | 140,993 | 1.21 | 3.96 | 9 MB |
| Ember (Python) | 112,177 | 1.68 | 4.35 | 25 MB |
| Express (Node) | 26,357 | 7.09 | 13.57 | 131 MB |
| NestJS (Node) | 23,528 | 8.08 | 13.75 | 158 MB |
| FastAPI (Python) | 17,517 | 9.45 | 30.86 | 49 MB |
Ember is 6.4× FastAPI, 4.3× Express, and 4.8× NestJS on identical hardware — at 25 MB RSS, half FastAPI's footprint and 5× lighter than Node. Fiber stays ahead on raw throughput, but the gap is now ~20% rather than 75%.
```python
from ember import Ember

app = Ember()

@app.get("/")
async def index():
    return {"hello": "world"}

app.run(host="0.0.0.0", port=8000)
```

```shell
pip install ember-api
python app.py
# 112k RPS at 25 MB RSS. No tuning required.
```

```python
from ember import (
    Ember, Request, SSEResponse,
    ConversationContext, sse_stream
)

app = Ember()

@app.ai_route("/v1/chat", methods=["POST"], streaming=True)
async def chat(
    request: Request,
    context: ConversationContext,
) -> SSEResponse:
    body = await request.json()
    context.add_message("user", body["message"])
    # token_stream: your model-backed token generator, defined elsewhere
    return sse_stream(token_stream(body["message"]))
```

From microservices to LLM gateways — Ember handles them all.
Native ai_route(), ConversationContext, ModelRouter with fallback strategies, SemanticCache for vector lookups, and token-bucket rate limiting that counts tokens — not requests.
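A rate limiter that counts tokens rather than requests is, at its core, a token bucket debited by prompt size. The sketch below is a generic illustration of that idea; the `TokenBucket` class and its methods are hypothetical and not Ember's actual rate-limiter API.

```python
import time

class TokenBucket:
    """Generic token bucket that debits LLM tokens, not request counts.
    Hypothetical illustration -- not Ember's actual rate-limiter API."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity          # max token balance the bucket holds
        self.tokens = float(capacity)     # current balance
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    def allow(self, prompt_tokens: int) -> bool:
        """Debit the estimated token cost of one request; False = rate-limited."""
        self._refill()
        if self.tokens >= prompt_tokens:
            self.tokens -= prompt_tokens
            return True
        return False

bucket = TokenBucket(capacity=10_000, refill_per_sec=500)
print(bucket.allow(4_000))   # True  -- 6,000 tokens left
print(bucket.allow(7_000))   # False -- over budget until the bucket refills
```

The practical consequence: one 8k-token completion request consumes as much quota as hundreds of tiny ones, which is what you want in front of an LLM backend.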
112k RPS single-thread means a 4-worker box reliably clears 400k+ RPS. Cython router with O(1) static dispatch, llhttp parser, and io_uring multishot recv on kernels that support it (io_uring itself requires Linux 5.1+).
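O(1) static dispatch boils down to a single hash lookup on `(method, path)` instead of a linear scan over route patterns. A minimal pure-Python sketch of the idea (Ember's real router is Cython-compiled, so this is illustrative only):

```python
class StaticRouter:
    """Dict-keyed dispatch: one hash lookup per request for static paths.
    Pure-Python sketch of the idea, not Ember's Cython implementation."""

    def __init__(self):
        self._static = {}   # (method, path) -> handler

    def add(self, method: str, path: str, handler):
        self._static[(method, path)] = handler

    def dispatch(self, method: str, path: str):
        # O(1) average case: no regex matching, no per-route iteration
        return self._static.get((method, path))

router = StaticRouter()
router.add("GET", "/", lambda: {"hello": "world"})
handler = router.dispatch("GET", "/")
print(handler())   # {'hello': 'world'}
```

Because lookup cost is flat, registering a thousand routes costs the same per request as registering one.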
25 MB RSS at idle means Ember fits in containers most Python frameworks can't even boot in. Ideal for sidecars, serverless functions, and Kubernetes microservices.
SSE-first design: SSEResponse, sse_stream(), TokenStreamResponse — zero-copy from generator to wire. Built for chat completions, log tails, live dashboards.
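On the wire, an SSE event is just an optional `event:` line, one `data:` line per payload line, and a blank-line terminator. The helper below sketches that framing per the SSE spec; it is illustrative, not the implementation behind `sse_stream()`.

```python
def format_sse(data: str, event: str = "") -> bytes:
    """Frame one server-sent event: optional `event:` field, one `data:`
    line per payload line, blank-line terminator (per the SSE spec).
    Illustrative only -- not Ember's sse_stream() internals."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Multi-line payloads become one data: line each
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return ("\n".join(lines) + "\n\n").encode()

print(format_sse("hello"))              # b'data: hello\n\n'
print(format_sse("hi", event="token"))  # b'event: token\ndata: hi\n\n'
```

Because each event is a self-delimiting byte chunk, a generator yielding tokens can be written straight to the socket as it produces them.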
Multi-process workers with SO_REUSEPORT, kernel load-balancing, dead-worker revival, graceful shutdown, keep-alive reaper. Production semantics out of the box.
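The SO_REUSEPORT pattern is standard: each worker process binds its own listening socket on the same port, and the kernel spreads incoming connections across them. A minimal Linux-only sketch (function name is hypothetical, not Ember's API):

```python
import socket

def bind_worker_socket(host: str = "0.0.0.0", port: int = 8000) -> socket.socket:
    """Each forked worker binds its own listener with SO_REUSEPORT set,
    so the kernel load-balances accepted connections across workers.
    Linux-only sketch; helper name is hypothetical, not Ember's API."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); every worker uses the same (host, port)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(1024)
    return sock
```

Each worker calling this with the same port gets its own accept queue, which avoids the thundering-herd problem of sharing one inherited socket.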
Decorator-based routing, type-hint dependency injection, async handlers, JSON responses with orjson — the patterns you already know, but 6× faster.
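Type-hint dependency injection generally works by inspecting a handler's annotations and supplying a registered object for each at call time. A simplified sketch of that mechanism (the `inject` decorator and `PROVIDERS` registry here are illustrative, not Ember's internals):

```python
import inspect

class Database:
    def query(self) -> str:
        return "rows"

# Registry mapping annotation types to provider callables (illustrative)
PROVIDERS = {Database: Database}

def inject(handler):
    """Read the handler's signature and pass in registered dependencies
    for any parameter whose type annotation has a provider."""
    sig = inspect.signature(handler)

    def wrapper(**given):
        for name, param in sig.parameters.items():
            if name not in given and param.annotation in PROVIDERS:
                given[name] = PROVIDERS[param.annotation]()
        return handler(**given)

    return wrapper

@inject
def list_rows(db: Database):
    return db.query()

print(list_rows())   # 'rows'
```

The handler never constructs its own dependencies, which is what makes swapping a real `Database` for a test double trivial.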
Pip-installable. Cython-compiled wheels for Linux x86_64/aarch64 and macOS arm64/x86_64.