
🔥 Ember — 112k RPS. 25 MB RAM. Pure Python.

The fastest Python web framework — built on llhttp, Cython, and io_uring. 6× faster than FastAPI on `/hello`, 10× faster on real PostgreSQL CRUD.


Benchmarks — same machine, identical workload

Single-worker hello-world on an Intel i7-14700; k6 with 200 VUs for 20 s, 0% errors.

| Framework        | RPS     | p50 (ms) | p99 (ms) | Peak RSS |
|------------------|---------|----------|----------|----------|
| Fiber (Go)       | 140,993 | 1.21     | 3.96     | 9 MB     |
| Ember (Python)   | 112,177 | 1.68     | 4.35     | 25 MB    |
| Express (Node)   | 26,357  | 7.09     | 13.57    | 131 MB   |
| NestJS (Node)    | 23,528  | 8.08     | 13.75    | 158 MB   |
| FastAPI (Python) | 17,517  | 9.45     | 30.86    | 49 MB    |

Ember delivers 6.4× the throughput of FastAPI, 4.3× Express, and 4.8× NestJS on identical hardware, at 25 MB RSS: half FastAPI's footprint and 5× lighter than either Node framework. Fiber stays ahead on raw throughput, but the gap is now ~20% rather than 75%.

Full methodology and reproducible scripts →

Hello, Ember

```python
from ember import Ember

app = Ember()

@app.get("/")
async def index():
    return {"hello": "world"}

app.run(host="0.0.0.0", port=8000)
```

```bash
pip install ember-api
python app.py
# 112k RPS at 25 MB RSS. No tuning required.
```

Built for AI workloads

```python
from ember import (
    Ember, Request, SSEResponse,
    ConversationContext, sse_stream
)

app = Ember()

@app.ai_route("/v1/chat",
              methods=["POST"], streaming=True)
async def chat(
    request: Request,
    context: ConversationContext,
) -> SSEResponse:
    body = await request.json()
    context.add_message("user", body["message"])
    return sse_stream(token_stream(body["message"]))
```
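The handler above passes its reply through `token_stream`, which the snippet leaves undefined — it's the application's job to produce an async iterator of tokens. A minimal stand-in (purely illustrative; a real gateway would call the upstream model here) might look like:

```python
import asyncio
from typing import AsyncIterator

async def token_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in: echo the prompt back word by word.
    # A real gateway would forward the prompt to an upstream LLM
    # and yield completion tokens as they arrive.
    for word in prompt.split():
        await asyncio.sleep(0)  # yield to the event loop between tokens
        yield word + " "
```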

Built for the workloads that matter

From microservices to LLM gateways — Ember handles them all.

LLM API gateways

Native ai_route(), ConversationContext, ModelRouter with fallback strategies, SemanticCache for vector lookups, and token-bucket rate limiting that counts tokens — not requests.
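Token-aware rate limiting differs from the usual per-request bucket only in what it charges. A rough sketch of the idea — class and method names here are hypothetical, not Ember's API:

```python
import time

class TokenBudget:
    """Hypothetical token bucket that charges per LLM token, not per request."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.available = float(capacity)
        self.last = time.monotonic()

    def try_spend(self, n_tokens: int) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.available = min(
            self.capacity,
            self.available + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.available >= n_tokens:
            self.available -= n_tokens
            return True
        return False
```

A 100-token prompt thus drains 100× more budget than a one-token health check, which tracks LLM cost far better than counting requests.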

High-throughput APIs

112k RPS single-thread means a 4-worker box reliably handles 400k+ RPS. Cython router with O(1) static dispatch, llhttp parser, multishot recv on Linux 5.1+.
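The O(1) claim for static routes comes down to exact-match paths living in a hash table instead of a trie walk. In plain Python (Ember's router is Cython; this only shows the shape of the idea):

```python
from typing import Callable, Optional

class StaticRouter:
    # Exact-path routes live in a dict keyed by (method, path),
    # so dispatch cost does not grow with the number of routes.
    def __init__(self) -> None:
        self._static: dict = {}

    def add(self, method: str, path: str, handler: Callable) -> None:
        self._static[(method, path)] = handler

    def dispatch(self, method: str, path: str) -> Optional[Callable]:
        return self._static.get((method, path))
```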

Edge & sidecar deployments

25 MB RSS at idle means Ember fits in containers most Python frameworks can't even boot in. Ideal for sidecars, serverless functions, and Kubernetes microservices.


Real-time streaming

SSE-first design: SSEResponse, sse_stream(), TokenStreamResponse — zero-copy from generator to wire. Built for chat completions, log tails, live dashboards.
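Under the hood, SSE is a plain-text wire format: an optional `event:` field, one or more `data:` lines, and a blank-line terminator. A hand-rolled encoder — illustrative only, since `SSEResponse` handles this for you — looks like:

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None) -> bytes:
    # One SSE event: optional "event:" field, one "data:" line per
    # line of payload, terminated by a blank line.
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return ("\n".join(lines) + "\n\n").encode()
```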

Background-job APIs

Multi-process workers with SO_REUSEPORT, kernel load-balancing, dead-worker revival, graceful shutdown, keep-alive reaper. Production semantics out of the box.
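SO_REUSEPORT is what lets every worker own its own listening socket on the same port, with the kernel distributing accepted connections among them. In stdlib terms — a sketch of the mechanism, not Ember's internals:

```python
import socket

def bind_reuseport(host: str, port: int) -> socket.socket:
    # Each worker process opens its own listener on the same (host, port);
    # the kernel load-balances incoming connections across them (Linux 3.9+).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(1024)
    return s
```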

Replace FastAPI without rewriting

Decorator-based routing, type-hint dependency injection, async handlers, JSON responses with orjson — the patterns you already know, but 6× faster.
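Type-hint dependency injection in this style generally works by inspecting a handler's signature and matching parameter annotations to registered providers. A toy illustration of the mechanism (not Ember's actual resolver):

```python
import inspect
from typing import Any, Callable, Dict

def resolve_dependencies(handler: Callable, providers: Dict[type, Callable]) -> Dict[str, Any]:
    # Look up each parameter's annotation in a provider registry
    # and call the provider to construct the argument.
    kwargs = {}
    for name, param in inspect.signature(handler).parameters.items():
        if param.annotation in providers:
            kwargs[name] = providers[param.annotation]()
    return kwargs
```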

Ship faster APIs.

Pip-installable. Cython-compiled wheels for Linux x86_64/aarch64 and macOS arm64/x86_64.

Released under the MIT License.