
Performance Comparison: ePHPm vs Everything Else

This document breaks down where time is spent serving a PHP request across every major stack, showing exactly where ePHPm eliminates overhead.


The Stacks Compared

| Stack | Architecture |
| --- | --- |
| nginx + php-fpm | Separate web server → FastCGI protocol → separate PHP process pool |
| Apache + mod_php | Web server with PHP embedded as a module (same process) |
| Apache + php-fpm | Separate web server → FastCGI protocol → separate PHP process pool |
| RoadRunner | Go HTTP server → Goridge IPC (pipes) → separate PHP process pool |
| FrankenPHP | Go/Caddy HTTP server + PHP embedded via CGO (same process) |
| Swoole | PHP C extension — PHP IS the server (same process, coroutines) |
| ePHPm | Rust HTTP server + PHP embedded via zero-cost FFI (same process) |

Request Lifecycle: Where Time Goes

A single HTTP request to a PHP application involves these stages. Each stack handles them differently.

Stage 1: Connection Handling + TLS

The client connects, TLS handshake completes, HTTP request is parsed.

| Stack | How | Overhead |
| --- | --- | --- |
| nginx + php-fpm | nginx handles (C, event-driven) | ~0.1-0.5ms (excellent) |
| Apache + mod_php | Apache handles (C, process/thread-per-conn) | ~0.2-0.8ms |
| Apache + php-fpm | Apache handles | ~0.2-0.8ms |
| RoadRunner | Go net/http | ~0.1-0.5ms |
| FrankenPHP | Caddy (Go net/http + middleware chain) | ~0.2-0.8ms (Caddy middleware adds latency) |
| Swoole | PHP C extension (swoole_http_server) | ~0.1-0.5ms |
| ePHPm | Rust hyper + tokio + rustls | ~0.05-0.3ms (fastest — no GC, no middleware chain, zero-copy TLS) |

Stage 2: Request Dispatch to PHP

How does the HTTP request reach the PHP interpreter?

| Stack | Mechanism | Overhead |
| --- | --- | --- |
| nginx + php-fpm | FastCGI protocol over Unix socket/TCP. Serialize full request (headers, body, env vars) into FastCGI records. PHP-FPM worker reads, deserializes. | ~50-200μs (socket write + read + FastCGI encode/decode) |
| Apache + mod_php | In-process function call. Apache populates PHP’s request struct directly. | ~1-5μs (near zero — same process) |
| Apache + php-fpm | Same as nginx + php-fpm (FastCGI over socket). | ~50-200μs |
| RoadRunner | Goridge binary protocol over stdin/stdout pipes. Serialize request into Goridge frames (12-byte header + payload). PHP worker deserializes. OS context switch between Go and PHP processes. | ~20-80μs (pipe I/O + serialize + context switch) |
| FrankenPHP | CGO call from Go → C. 11+ boundary crossings per request for thread dispatch, SAPI callbacks (headers, body, cookies, superglobals, output). | ~2.2μs+ (200ns × 11+ crossings) |
| Swoole | In-process. PHP event loop receives directly. No dispatch needed. | ~0.5-2μs (event loop wakeup) |
| ePHPm | Rust FFI call to libphp. Same 11+ SAPI callbacks as FrankenPHP but each is a zero-cost C function call. Tokio channel send to wake worker thread. | ~0.5-2μs (channel send + zero-cost FFI calls) |
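The dispatch gap is easiest to see as arithmetic. A quick sketch using the illustrative figures from the table above (the 200 ns crossing cost and the 11-crossing count are the table's assumptions, not measurements):

```python
# Dispatch-cost arithmetic from the table's illustrative figures.
CROSSING_NS = 200                 # assumed cost of one CGO boundary crossing
CROSSINGS = 11                    # SAPI callbacks per request (headers, body, superglobals, ...)
cgo_us = CROSSING_NS * CROSSINGS / 1000
fastcgi_us = 100                  # midpoint of the ~50-200us FastCGI round trip

print(f"CGO dispatch: ~{cgo_us:.1f} us/request")
print(f"FastCGI dispatch: ~{fastcgi_us / cgo_us:.0f}x the CGO cost")
```

Even the "expensive" CGO path is roughly 45x cheaper than a FastCGI round trip; the in-process stacks shave off the remaining microseconds.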

Stage 3: PHP Bootstrap (Cold Start)

Does the PHP application reload from scratch on every request?

| Stack | Bootstrap Model | Overhead |
| --- | --- | --- |
| nginx + php-fpm | Cold start every request. Each request: autoloader runs, framework boots, service container builds, routes compile, config loads. OPcache helps (~50% reduction) but the autoloader and framework init still run. | ~10-30ms (Laravel), ~5-15ms (Symfony), ~2-5ms (WordPress) |
| Apache + mod_php | Same as php-fpm — cold start every request. | ~10-30ms (Laravel) |
| Apache + php-fpm | Same as php-fpm. | ~10-30ms (Laravel) |
| RoadRunner | Worker mode — boot once. App boots once, stays in memory. Subsequent requests skip bootstrap entirely. | ~0ms (amortized to zero after first request) |
| FrankenPHP (classic) | Cold start every request (like php-fpm). | ~10-30ms (Laravel) |
| FrankenPHP (worker) | Worker mode — boot once. | ~0ms (amortized) |
| Swoole | Worker mode — boot once. | ~0ms (amortized) |
| ePHPm (worker) | Worker mode — boot once. | ~0ms (amortized) |

This is the single biggest performance factor. Worker mode eliminates 10-30ms of overhead per request for framework-heavy apps. The difference between FPM and worker mode dwarfs every other optimization.
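The amortization argument can be made concrete. A sketch using the table's illustrative ~15 ms Laravel bootstrap figure:

```python
# Amortizing the one-time boot: worker mode pays bootstrap once,
# FPM pays it on every request. 15 ms is the table's illustrative figure.
BOOT_MS = 15.0
requests = 100_000

fpm_total_s = BOOT_MS * requests / 1000     # cold start on every request
worker_total_s = BOOT_MS / 1000             # boot once, amortized to ~0 thereafter

print(f"php-fpm:     {fpm_total_s:,.0f} s of bootstrap over {requests:,} requests")
print(f"worker mode: {worker_total_s:.3f} s (a single boot)")
```

Over 100,000 requests that is 25 minutes of pure bootstrap versus 15 milliseconds, which is why no micro-optimization elsewhere can compete.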

Stage 4: PHP Execution

The actual application logic — database queries, business logic, template rendering. This is identical across all stacks because they all run the same PHP interpreter.

| Stack | Execution Model | Overhead vs Native PHP |
| --- | --- | --- |
| All stacks | Same PHP engine (Zend VM) | 0ms — the PHP code runs at the same speed everywhere |

The PHP execution time is the constant. Everything else in this document is overhead on top of it.

Stage 5: Database Access

How does PHP talk to the database?

| Stack | DB Connection Model | Overhead Per Query |
| --- | --- | --- |
| nginx + php-fpm | New TCP connection per request (or persistent per-worker). No pooling. N workers = N connections. | ~1-3ms connection setup (first query), ~0.5-2ms per query (network RTT) |
| Apache + mod_php | Same — per-process connections. | ~1-3ms / ~0.5-2ms |
| Apache + php-fpm | Same. | ~1-3ms / ~0.5-2ms |
| RoadRunner | Same as php-fpm — PHP opens its own connections. No server-level pooling. | ~1-3ms / ~0.5-2ms |
| FrankenPHP | Same — no DB proxy. | ~1-3ms / ~0.5-2ms |
| Swoole | Connection pool (PDOPool). Persistent connections reused across coroutines. But still TCP to the actual database. | ~0ms connection setup, ~0.5-2ms per query |
| ePHPm | In-process DB proxy with connection pooling. PHP connects to localhost:3306 (ePHPm’s proxy). Pool maintains persistent connections to real DB. For even tighter integration, SAPI functions bypass TCP entirely. | ~0ms connection setup, ~0.5-2ms per query (network to real DB), ~0μs proxy overhead (in-process) |

Without pooling, hitting max_connections is easy: 200 PHP workers × 1 connection each = 200 DB connections. With ePHPm’s proxy: 200 workers → 20 pooled backend connections (10:1 multiplexing).
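The multiplexing claim is simple arithmetic:

```python
# Connection-count arithmetic behind the pooling claim above.
workers = 200
no_pool = workers * 1            # each worker holds its own DB connection
pool_ratio = 10                  # ePHPm's assumed 10:1 multiplexing
with_pool = workers // pool_ratio

print(f"without pooling:  {no_pool} DB connections")
print(f"with ePHPm proxy: {with_pool} pooled backend connections")
```

With a default MySQL `max_connections` of 151, the unpooled fleet is already over the limit before traffic spikes; the pooled fleet sits at 20.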

Stage 6: Session / Cache Access

How does PHP read/write sessions and cache data?

| Stack | Session/Cache Model | Overhead Per Operation |
| --- | --- | --- |
| nginx + php-fpm | External Redis/Memcached over TCP. Every session_start() and cache read = network round trip. | ~0.5-2ms per operation (TCP to Redis on localhost: ~200μs network + ~100μs Redis processing + connection overhead) |
| Apache + mod_php | Same — external Redis/Memcached. | ~0.5-2ms |
| RoadRunner | KV plugin (in-memory, single-node). Access via Goridge IPC (pipe round trip + serialization). | ~20-80μs per operation (IPC overhead) |
| FrankenPHP | No KV store. External Redis required. | ~0.5-2ms |
| Swoole | Swoole\Table (shared memory, same process). Fast but single-node only. | ~0.1-1μs per operation |
| ePHPm | In-process KV store (DashMap). PHP accesses via SAPI function call — zero-cost FFI, no TCP, no serialization. Local keys: direct memory access. Remote keys (clustered): internal network hop. | ~100-200ns local, ~0.5-2ms remote |

For a typical Laravel request that does session_start() + 2-3 cache reads:

  • php-fpm + Redis: 4 × ~1ms = ~4ms of session/cache overhead
  • RoadRunner: 4 × ~50μs = ~200μs
  • ePHPm (local): 4 × ~150ns = ~600ns (6,600x faster than Redis)
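The bullet math above can be reproduced directly (per-op costs are the illustrative midpoints quoted earlier):

```python
# Session/cache cost for session_start() plus 2-3 cache reads (~4 ops total).
OPS = 4
redis_ms = OPS * 1.0      # ~1 ms per TCP round trip to Redis
rr_us    = OPS * 50       # ~50 us per Goridge IPC round trip
local_ns = OPS * 150      # ~150 ns per in-process KV access

print(f"Redis: {redis_ms:.0f} ms  RoadRunner KV: {rr_us} us  ePHPm local: {local_ns} ns")
print(f"per-op speedup vs Redis: ~{1_000_000 / 150:,.0f}x")
```

The exact multiplier depends on which end of each range you pick; the point is that it is three to four orders of magnitude, not a percentage.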

Stage 7: Response Delivery

How does the PHP response get back to the client?

| Stack | Mechanism | Overhead |
| --- | --- | --- |
| nginx + php-fpm | PHP-FPM serializes response into FastCGI records → Unix socket → nginx deserializes → sends to client. | ~50-200μs (FastCGI encode/decode + socket) |
| Apache + mod_php | In-process. PHP writes directly to Apache’s output buffer. | ~1-5μs |
| RoadRunner | PHP serializes response into Goridge frames → pipe → Go deserializes → sends to client. | ~20-80μs |
| FrankenPHP | PHP’s echo/header() → SAPI callbacks → CGO crossing to Go → Go writes to client. | ~2μs+ (CGO crossings for headers + each output write) |
| Swoole | In-process. $response->end() writes directly. | ~0.5-2μs |
| ePHPm | PHP’s echo/header() → SAPI callbacks → zero-cost FFI → Rust writes to client via hyper. | ~0.5-2μs |

Total Overhead Per Request (Excluding PHP Execution)

Everything except the actual PHP application code running. This is pure infrastructure tax.

Scenario: Laravel API request (worker mode where available)

Assumptions: JSON API endpoint, worker mode (where supported), 3 DB queries, 2 cache reads, 1 session read, response under 10KB.

| Stage | nginx + php-fpm | RoadRunner | FrankenPHP (worker) | Swoole | ePHPm |
| --- | --- | --- | --- | --- | --- |
| Connection/TLS | 0.3ms | 0.3ms | 0.5ms | 0.3ms | 0.15ms |
| Request dispatch | 0.1ms | 0.05ms | 0.002ms | 0.001ms | 0.001ms |
| PHP bootstrap | 15ms | 0ms | 0ms | 0ms | 0ms |
| DB connections (3 queries) | 1ms setup + 3ms queries | 1ms + 3ms | 1ms + 3ms | 0ms + 3ms | 0ms + 3ms |
| Session + cache (3 ops) | 3ms (Redis) | 0.15ms (IPC) | 3ms (Redis) | 0.003ms | 0.0005ms |
| Response delivery | 0.1ms | 0.05ms | 0.002ms | 0.001ms | 0.001ms |
| Total overhead | ~22.5ms | ~4.55ms (~7.55ms if sessions use external Redis) | ~7.5ms | ~3.3ms | ~3.15ms |
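The per-stage rows can be summed to check the totals. Note that the RoadRunner column comes to ~4.55 ms when sessions use its KV plugin; the ~7.5 ms figure applies when sessions sit in external Redis instead (as in the flow diagram later in this document):

```python
# Reproducing each stack's total by summing the per-stage rows (ms).
# Values are the table's illustrative figures; DB row = setup + 3 queries.
stages = {
    "nginx + php-fpm":     [0.3,  0.1,   15, 1 + 3, 3,      0.1],
    "RoadRunner":          [0.3,  0.05,   0, 1 + 3, 0.15,   0.05],
    "FrankenPHP (worker)": [0.5,  0.002,  0, 1 + 3, 3,      0.002],
    "Swoole":              [0.3,  0.001,  0, 0 + 3, 0.003,  0.001],
    "ePHPm":               [0.15, 0.001,  0, 0 + 3, 0.0005, 0.001],
}
for stack, ms in stages.items():
    print(f"{stack:20} ~{sum(ms):.2f} ms")
```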

Scenario: Same request, php-fpm stacks (no worker mode)

| Stage | nginx + php-fpm | Apache + mod_php | Apache + php-fpm |
| --- | --- | --- | --- |
| Connection/TLS | 0.3ms | 0.5ms | 0.5ms |
| Request dispatch | 0.1ms | 0.003ms | 0.1ms |
| PHP bootstrap | 15ms | 15ms | 15ms |
| DB connections | 4ms | 4ms | 4ms |
| Session + cache (Redis) | 3ms | 3ms | 3ms |
| Response delivery | 0.1ms | 0.003ms | 0.1ms |
| Total overhead | ~22.5ms | ~22.5ms | ~22.7ms |

The Multiplier Effect

These overheads compound. A typical Laravel page makes 5-15 DB queries and 3-8 cache reads.

Heavy page: 10 DB queries, 6 cache/session ops, 50ms PHP execution

| Stack | Infra overhead | PHP execution | Total | Overhead % |
| --- | --- | --- | --- | --- |
| nginx + php-fpm | ~28ms | 50ms | 78ms | 36% overhead |
| RoadRunner | ~10ms | 50ms | 60ms | 17% overhead |
| FrankenPHP (worker) | ~13ms | 50ms | 63ms | 21% overhead (Redis for cache) |
| Swoole | ~5ms | 50ms | 55ms | 9% overhead |
| ePHPm | ~3.2ms | 50ms | 53.2ms | 6% overhead |
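The overhead percentages follow directly from the rows above:

```python
# Overhead share for the heavy-page scenario (50 ms of PHP execution).
PHP_MS = 50.0
infra_ms = {"nginx + php-fpm": 28, "RoadRunner": 10,
            "FrankenPHP (worker)": 13, "Swoole": 5, "ePHPm": 3.2}
for stack, ms in infra_ms.items():
    total = ms + PHP_MS
    print(f"{stack:20} {total:.1f} ms total, {ms/total:.0%} overhead")
```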

At 10,000 requests/second

| Stack | Overhead CPU burned/sec | Wasted per day |
| --- | --- | --- |
| nginx + php-fpm | 280 seconds (bootstrap dominates) | N/A — can’t hit 10k req/s without massive worker pool |
| RoadRunner | 100 seconds | ~2,400 core-hours |
| FrankenPHP (worker) | 130 seconds | ~3,120 core-hours |
| Swoole | 50 seconds | ~1,200 core-hours |
| ePHPm | 32 seconds | ~768 core-hours |
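The per-day column is just unit conversion: burning X CPU-seconds every wall-clock second means X cores are continuously busy, i.e. 24·X core-hours per day:

```python
# Converting "overhead CPU burned per second" into per-day waste.
SECONDS_PER_DAY = 86_400
burned = {"RoadRunner": 100, "FrankenPHP (worker)": 130, "Swoole": 50, "ePHPm": 32}
for stack, cpu_s in burned.items():
    core_hours = cpu_s * SECONDS_PER_DAY / 3600   # equivalently cpu_s * 24
    print(f"{stack:20} ~{core_hours:,.0f} core-hours/day")
```

(Part of this "burned" time is actually I/O wait on the database rather than CPU cycles, so treat these as upper bounds on compute waste.)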

Where Each Stack Loses Time

nginx + php-fpm — Death by a Thousand Cuts

Client ──► nginx ──FastCGI──► php-fpm worker
                   ~100μs        │
                              Bootstrap Laravel: ~15ms
                              DB connect: ~1ms
                              3 × DB query: ~3ms (TCP to MySQL)
                              3 × Redis: ~3ms (TCP to Redis)
                              FastCGI response: ~100μs
                                 │
Client ◄── nginx ◄──FastCGI──◄──┘

Total overhead: ~22ms+
Biggest cost: PHP bootstrap (15ms) — re-runs EVERY request

RoadRunner — IPC Tax

Client ──► Go HTTP server
              │
              Goridge pipe write: ~20μs (serialize request)
              OS context switch: ~10μs
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries over TCP: ~4ms
              KV: 3 ops over Goridge IPC: ~150μs
              │
              Goridge pipe write: ~20μs (serialize response)
              OS context switch: ~10μs
              │
Client ◄── Go HTTP server

Total overhead: ~4.3ms (without Redis), ~7.5ms (with Redis for sessions)
Biggest cost: DB connections (no pooling) + Goridge IPC serialization

FrankenPHP (Worker Mode) — CGO Tax + No Infrastructure

Client ──► Caddy/Go HTTP server
              │
              Caddy middleware chain: ~200-500μs
              CGO dispatch to PHP thread: ~200ns
              CGO: populate superglobals: ~800ns (4 callbacks)
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries over TCP: ~4ms (no pooling)
              Cache: 3 ops to external Redis: ~3ms
              │
              CGO: write headers: ~200ns
              CGO: write body (echo): ~200ns × N chunks
              │
Client ◄── Caddy/Go HTTP server

Total overhead: ~7.5ms
Biggest cost: External Redis (no built-in KV) + DB connections (no pooling)

ePHPm — Minimal Overhead

Client ──► Rust hyper (direct, no middleware chain)
              │
              Tokio channel send to PHP worker: ~100ns
              FFI: populate superglobals: ~0ns (zero-cost C calls)
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries via in-process proxy: ~3ms (pooled, no connect overhead)
              Cache: 3 ops via in-process KV: ~450ns (direct memory access)
              │
              FFI: write headers: ~0ns
              FFI: write body: ~0ns
              │
Client ◄── Rust hyper

Total overhead: ~3.15ms
Biggest cost: Network RTT to actual database (unavoidable)

p99 Latency: The GC Factor

Average latency tells one story. p99 (worst 1% of requests) tells another.

Go’s garbage collector introduces periodic pauses. These are short (~0.5-2ms with modern Go) but unpredictable. Under load, GC pauses hit the tail latency:

| Stack | p99 Factor | Why |
| --- | --- | --- |
| nginx + php-fpm | None (C + separate PHP processes) | nginx has no GC. PHP processes are independent — one GC doesn’t affect others. |
| Apache + mod_php | None (C) | No GC in Apache or PHP request lifecycle. |
| RoadRunner | Go GC pauses ~0.5-2ms | Go HTTP server GC affects all in-flight requests. PHP workers are separate processes (no GC). |
| FrankenPHP | Go GC pauses ~1-5ms (worse) | PHP runs IN the Go process. PHP’s memory allocations are visible to Go’s GC. In Symfony benchmarks, FrankenPHP showed 45ms std dev on CPU-bound tasks vs RoadRunner’s 8ms. |
| Swoole | None (C extension) | No GC in the server layer. PHP’s GC is per-request. |
| ePHPm | None | Rust has no GC. PHP’s per-request GC is isolated per worker thread. Predictable p99. |

FrankenPHP’s GC problem is uniquely bad because PHP memory is allocated inside the Go process. Go’s GC must scan/track these allocations, increasing both GC frequency and pause duration under heavy PHP load.
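Why do rare pauses dominate p99 while barely moving the average? A toy model makes it visible (the numbers here are arbitrary illustrations, not benchmarks of any stack):

```python
# Toy model: rare GC-style pauses barely move the mean but dominate p99.
import random

random.seed(42)
N = 10_000
base = [1.0] * N                                   # 1 ms baseline latency
# Assume 2% of requests land during a 5 ms stop-the-world pause:
paused = [1.0 + (5.0 if random.random() < 0.02 else 0.0) for _ in range(N)]

def p99(samples):
    """Latency of the request at the 99th percentile."""
    return sorted(samples)[int(len(samples) * 0.99)]

mean = sum(paused) / N
print(f"mean: {mean:.2f} ms   p99: {p99(paused):.1f} ms   baseline p99: {p99(base):.1f} ms")
```

A ~10% bump in the mean becomes a 6x blowup at p99, because the worst 1% of requests are exactly the ones that caught a pause.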


Memory Efficiency

| Stack | Memory Per Worker | Memory for 1000 Workers | Notes |
| --- | --- | --- | --- |
| nginx + php-fpm | ~30-50MB per FPM worker | ~30-50GB | Each FPM worker is a full process with its own memory space |
| Apache + mod_php | ~30-50MB per Apache process | ~30-50GB | Same — process-per-worker |
| RoadRunner | ~30-50MB per PHP process + Go overhead | ~30-50GB + ~200MB Go | PHP processes are separate |
| FrankenPHP | ~20-40MB per worker (shared process) | ~20-40GB + Go runtime | Shared address space saves some overhead |
| Swoole | ~10-30MB per worker | ~10-30GB | Efficient — shared memory, coroutines |
| ePHPm | ~10-30MB per worker | ~10-30GB + ~50MB Rust | Rust runtime is tiny. No Go runtime overhead. KV store replaces external Redis (saves ~100MB+). |

ePHPm’s real memory win: it replaces external Redis (typically 100MB-1GB in production) with the in-process KV store. One fewer process, one fewer memory footprint.
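Fleet-level memory is just multiplication; a sketch using midpoint per-worker figures from the table above:

```python
# Fleet memory arithmetic (illustrative midpoints from the table above).
WORKERS = 1000
per_worker_mb = {"php-fpm": 40, "FrankenPHP": 30, "Swoole/ePHPm": 20}
for stack, mb in per_worker_mb.items():
    print(f"{stack:12} ~{WORKERS * mb / 1024:.0f} GB for {WORKERS} workers")
# ePHPm additionally drops the external Redis process (~100 MB-1 GB in production).
```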


Feature-Adjusted Comparison

Raw speed means nothing without features. Here’s what each stack actually provides:

| Feature | nginx+fpm | Apache+mod_php | RoadRunner | FrankenPHP | Swoole | ePHPm |
| --- | --- | --- | --- | --- | --- | --- |
| Worker mode (no cold start) | No | No | Yes | Yes | Yes | Yes |
| Auto TLS (Let’s Encrypt) | Via certbot (external) | Via certbot | Yes | Yes (Caddy) | No | Yes |
| DB connection pooling | No | No | No | No | Yes | Yes |
| Built-in KV/cache | No | No | Yes (single-node) | No | Yes (single-node) | Yes (clustered) |
| Multi-node clustering | No | No | No | No | No | Yes |
| Query digest/analysis | No | No | No | No | No | Yes |
| Auto-instrumented traces | No | No | No | No | No | Yes |
| Superglobals work | Yes | Yes | No | Yes | No | Yes |
| Zero code changes | Yes | Yes | No (PSR-7) | Yes | No (Swoole API) | Yes |
| GC-free server layer | Yes (C) | Yes (C) | No (Go) | No (Go) | Yes (C) | Yes (Rust) |
| Memory-safe server | No (C) | No (C) | Yes (Go) | Yes (Go) | No (C) | Yes (Rust) |
| Single binary deploy | No (nginx+php) | No (apache+php) | Yes | Yes | No (PECL ext) | Yes |

The Pitch (by audience)

For developers on nginx + php-fpm:

“ePHPm eliminates 15ms of bootstrap overhead per request, replaces your nginx + php-fpm + Redis stack with a single binary, and adds connection pooling, query analysis, and a built-in observability dashboard. Your existing Laravel/Symfony/WordPress app works with zero code changes.”

For developers on RoadRunner:

“ePHPm gives you the same worker model without the PSR-7 migration tax — superglobals just work. Plus you get DB connection pooling, clustered caching (no external Redis), and zero IPC serialization overhead.”

For developers on FrankenPHP:

“ePHPm eliminates 2.2μs of CGO overhead per request, removes Go’s GC jitter from your p99 latency, and adds DB connection pooling, clustered KV, query analysis, and a full observability dashboard — features FrankenPHP doesn’t have and can’t easily add.”

For developers on Swoole:

“ePHPm gives you connection pooling and in-process caching like Swoole, but with superglobal compatibility, cross-platform support (not Linux-only), multi-node clustering, and no PECL extension to install. Your existing code works unchanged.”

