
🔥 Ember — 112k RPS. 25 MB RAM. Pure Python.

The fastest Python web framework — built on llhttp, Cython, and io_uring. 6× faster than FastAPI on `/hello`, 10× faster on real PostgreSQL CRUD.


Benchmarks — same machine, identical workload

Single-worker hello-world on an Intel i7-14700; k6 with 200 VUs for 20 s, 0% errors.

| Framework        | RPS     | p50 (ms) | p99 (ms) | Peak RSS |
|------------------|---------|----------|----------|----------|
| Fiber (Go)       | 140,993 | 1.21     | 3.96     | 9 MB     |
| Ember (Python)   | 112,177 | 1.68     | 4.35     | 25 MB    |
| Express (Node)   | 26,357  | 7.09     | 13.57    | 131 MB   |
| NestJS (Node)    | 23,528  | 8.08     | 13.75    | 158 MB   |
| FastAPI (Python) | 17,517  | 9.45     | 30.86    | 49 MB    |

Ember delivers 6.4× the throughput of FastAPI, 4.3× Express, and 4.8× NestJS on identical hardware, at 25 MB RSS: half FastAPI's footprint and 5× lighter than either Node framework. Fiber stays ahead on raw throughput, but the gap is now ~20% rather than 75%.

Full methodology and reproducible scripts →

Hello, Ember

```python
from ember import Ember

app = Ember()

@app.get("/")
async def index():
    return {"hello": "world"}

app.run(host="0.0.0.0", port=8000)
```

```bash
pip install ember-api
python app.py
# 112k RPS at 25 MB RSS. No tuning required.
```

Built for AI workloads

```python
from ember import (
    Ember, Request, SSEResponse,
    ConversationContext, sse_stream
)

app = Ember()

@app.ai_route("/v1/chat",
              methods=["POST"], streaming=True)
async def chat(
    request: Request,
    context: ConversationContext,
) -> SSEResponse:
    body = await request.json()
    context.add_message("user", body["message"])
    return sse_stream(token_stream(body["message"]))
```
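The handler above passes its reply through `token_stream`, which the snippet leaves undefined — it's the application's job to produce an async iterator of tokens. A minimal stand-in (purely illustrative; a real gateway would call the upstream model here) might look like:

```python
import asyncio
from typing import AsyncIterator

async def token_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in: echo the prompt back word by word.
    # A real gateway would forward the prompt to an upstream LLM
    # and yield completion tokens as they arrive.
    for word in prompt.split():
        await asyncio.sleep(0)  # yield to the event loop between tokens
        yield word + " "
```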

Built for the workloads that matter

From microservices to LLM gateways — Ember handles them all.

LLM API gateways

Native ai_route(), ConversationContext, ModelRouter with fallback strategies, SemanticCache for vector lookups, and token-bucket rate limiting that counts tokens — not requests.
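Token-aware rate limiting differs from the usual per-request bucket only in what it charges. A rough sketch of the idea — class and method names here are hypothetical, not Ember's API:

```python
import time

class TokenBudget:
    """Hypothetical token bucket that charges per LLM token, not per request."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.available = float(capacity)
        self.last = time.monotonic()

    def try_spend(self, n_tokens: int) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.available = min(
            self.capacity,
            self.available + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.available >= n_tokens:
            self.available -= n_tokens
            return True
        return False
```

A 100-token prompt thus drains 100× more budget than a one-token health check, which tracks LLM cost far better than counting requests.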

High-throughput APIs

112k RPS single-thread means a 4-worker box reliably handles 400k+ RPS. Cython router with O(1) static dispatch, llhttp parser, multishot recv on Linux 5.1+.
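The O(1) claim for static routes comes down to exact-match paths living in a hash table instead of a trie walk. In plain Python (Ember's router is Cython; this only shows the shape of the idea):

```python
from typing import Callable, Optional

class StaticRouter:
    # Exact-path routes live in a dict keyed by (method, path),
    # so dispatch cost does not grow with the number of routes.
    def __init__(self) -> None:
        self._static: dict = {}

    def add(self, method: str, path: str, handler: Callable) -> None:
        self._static[(method, path)] = handler

    def dispatch(self, method: str, path: str) -> Optional[Callable]:
        return self._static.get((method, path))
```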

Edge & sidecar deployments

25 MB RSS at idle means Ember fits in containers most Python frameworks can't even boot in. Ideal for sidecars, serverless functions, and Kubernetes microservices.


Real-time streaming

SSE-first design: SSEResponse, sse_stream(), TokenStreamResponse — zero-copy from generator to wire. Built for chat completions, log tails, live dashboards.
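Under the hood, SSE is a plain-text wire format: an optional `event:` field, one or more `data:` lines, and a blank-line terminator. A hand-rolled encoder — illustrative only, since `SSEResponse` handles this for you — looks like:

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None) -> bytes:
    # One SSE event: optional "event:" field, one "data:" line per
    # line of payload, terminated by a blank line.
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return ("\n".join(lines) + "\n\n").encode()
```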

Background-job APIs

Multi-process workers with SO_REUSEPORT, kernel load-balancing, dead-worker revival, graceful shutdown, keep-alive reaper. Production semantics out of the box.
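SO_REUSEPORT is what lets every worker own its own listening socket on the same port, with the kernel distributing accepted connections among them. In stdlib terms — a sketch of the mechanism, not Ember's internals:

```python
import socket

def bind_reuseport(host: str, port: int) -> socket.socket:
    # Each worker process opens its own listener on the same (host, port);
    # the kernel load-balances incoming connections across them (Linux 3.9+).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(1024)
    return s
```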

Replace FastAPI without rewriting

Decorator-based routing, type-hint dependency injection, async handlers, JSON responses with orjson — the patterns you already know, but 6× faster.
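Type-hint dependency injection in this style generally works by inspecting a handler's signature and matching parameter annotations to registered providers. A toy illustration of the mechanism (not Ember's actual resolver):

```python
import inspect
from typing import Any, Callable, Dict

def resolve_dependencies(handler: Callable, providers: Dict[type, Callable]) -> Dict[str, Any]:
    # Look up each parameter's annotation in a provider registry
    # and call the provider to construct the argument.
    kwargs = {}
    for name, param in inspect.signature(handler).parameters.items():
        if param.annotation in providers:
            kwargs[name] = providers[param.annotation]()
    return kwargs
```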

Ship faster APIs.

Pip-installable. Cython-compiled wheels for Linux x86_64/aarch64 and macOS arm64/x86_64.

Released under the MIT License.