
Performance Comparison: ePHPm vs Everything Else

This document breaks down where time is spent serving a PHP request across every major stack, showing exactly where ePHPm eliminates overhead.


The Stacks Compared

| Stack | Architecture |
| --- | --- |
| nginx + php-fpm | Separate web server → FastCGI protocol → separate PHP process pool |
| Apache + mod_php | Web server with PHP embedded as a module (same process) |
| Apache + php-fpm | Separate web server → FastCGI protocol → separate PHP process pool |
| RoadRunner | Go HTTP server → Goridge IPC (pipes) → separate PHP process pool |
| FrankenPHP | Go/Caddy HTTP server + PHP embedded via CGO (same process) |
| Swoole | PHP C extension — PHP IS the server (same process, coroutines) |
| ePHPm | Rust HTTP server + PHP embedded via zero-cost FFI (same process) |

Request Lifecycle: Where Time Goes

A single HTTP request to a PHP application involves these stages. Each stack handles them differently.

Stage 1: Connection Handling + TLS

The client connects, TLS handshake completes, HTTP request is parsed.

| Stack | How | Overhead |
| --- | --- | --- |
| nginx + php-fpm | nginx handles (C, event-driven) | ~0.1-0.5ms (excellent) |
| Apache + mod_php | Apache handles (C, process/thread-per-conn) | ~0.2-0.8ms |
| Apache + php-fpm | Apache handles | ~0.2-0.8ms |
| RoadRunner | Go net/http | ~0.1-0.5ms |
| FrankenPHP | Caddy (Go net/http + middleware chain) | ~0.2-0.8ms (Caddy middleware adds latency) |
| Swoole | PHP C extension (swoole_http_server) | ~0.1-0.5ms |
| ePHPm | Rust hyper + tokio + rustls | ~0.05-0.3ms (fastest — no GC, no middleware chain, zero-copy TLS) |

Stage 2: Request Dispatch to PHP

How does the HTTP request reach the PHP interpreter?

| Stack | Mechanism | Overhead |
| --- | --- | --- |
| nginx + php-fpm | FastCGI protocol over Unix socket/TCP. Serialize full request (headers, body, env vars) into FastCGI records. PHP-FPM worker reads, deserializes. | ~50-200μs (socket write + read + FastCGI encode/decode) |
| Apache + mod_php | In-process function call. Apache populates PHP’s request struct directly. | ~1-5μs (near zero — same process) |
| Apache + php-fpm | Same as nginx + php-fpm (FastCGI over socket). | ~50-200μs |
| RoadRunner | Goridge binary protocol over stdin/stdout pipes. Serialize request into Goridge frames (12-byte header + payload). PHP worker deserializes. OS context switch between Go and PHP processes. | ~20-80μs (pipe I/O + serialize + context switch) |
| FrankenPHP | CGO call from Go → C. 11+ boundary crossings per request for thread dispatch, SAPI callbacks (headers, body, cookies, superglobals, output). | ~2.2μs+ (200ns × 11+ crossings) |
| Swoole | In-process. PHP event loop receives directly. No dispatch needed. | ~0.5-2μs (event loop wakeup) |
| ePHPm | Rust FFI call to libphp. Same 11+ SAPI callbacks as FrankenPHP but each is a zero-cost C function call. Tokio channel send to wake worker thread. | ~0.5-2μs (channel send + zero-cost FFI calls) |
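The dispatch gap is easiest to see as arithmetic. A quick sketch using the illustrative figures from the table above (the 200 ns crossing cost and the 11-crossing count are the table's assumptions, not measurements):

```python
# Dispatch-cost arithmetic from the table's illustrative figures.
CROSSING_NS = 200                 # assumed cost of one CGO boundary crossing
CROSSINGS = 11                    # SAPI callbacks per request (headers, body, superglobals, ...)
cgo_us = CROSSING_NS * CROSSINGS / 1000
fastcgi_us = 100                  # midpoint of the ~50-200us FastCGI round trip

print(f"CGO dispatch: ~{cgo_us:.1f} us/request")
print(f"FastCGI dispatch: ~{fastcgi_us / cgo_us:.0f}x the CGO cost")
```

Even the "expensive" CGO path is roughly 45x cheaper than a FastCGI round trip; the in-process stacks shave off the remaining microseconds.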

Stage 3: PHP Bootstrap (Cold Start)

Does the PHP application reload from scratch on every request?

| Stack | Bootstrap Model | Overhead |
| --- | --- | --- |
| nginx + php-fpm | Cold start every request. Each request: autoloader runs, framework boots, service container builds, routes compile, config loads. OPcache helps (~50% reduction) but the autoloader and framework init still run. | ~10-30ms (Laravel), ~5-15ms (Symfony), ~2-5ms (WordPress) |
| Apache + mod_php | Same as php-fpm — cold start every request. | ~10-30ms (Laravel) |
| Apache + php-fpm | Same as php-fpm. | ~10-30ms (Laravel) |
| RoadRunner | Worker mode — boot once. App boots once, stays in memory. Subsequent requests skip bootstrap entirely. | ~0ms (amortized to zero after first request) |
| FrankenPHP (classic) | Cold start every request (like php-fpm). | ~10-30ms (Laravel) |
| FrankenPHP (worker) | Worker mode — boot once. | ~0ms (amortized) |
| Swoole | Worker mode — boot once. | ~0ms (amortized) |
| ePHPm (worker) | Worker mode — boot once. | ~0ms (amortized) |

This is the single biggest performance factor. Worker mode eliminates 10-30ms of overhead per request for framework-heavy apps. The difference between FPM and worker mode dwarfs every other optimization.
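The amortization argument can be made concrete. A sketch using the table's illustrative ~15 ms Laravel bootstrap figure:

```python
# Amortizing the one-time boot: worker mode pays bootstrap once,
# FPM pays it on every request. 15 ms is the table's illustrative figure.
BOOT_MS = 15.0
requests = 100_000

fpm_total_s = BOOT_MS * requests / 1000     # cold start on every request
worker_total_s = BOOT_MS / 1000             # boot once, amortized to ~0 thereafter

print(f"php-fpm:     {fpm_total_s:,.0f} s of bootstrap over {requests:,} requests")
print(f"worker mode: {worker_total_s:.3f} s (a single boot)")
```

Over 100,000 requests that is 25 minutes of pure bootstrap versus 15 milliseconds, which is why no micro-optimization elsewhere can compete.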

Stage 4: PHP Execution

The actual application logic — database queries, business logic, template rendering. This is identical across all stacks because they all run the same PHP interpreter.

| Stack | Execution Model | Overhead vs Native PHP |
| --- | --- | --- |
| All stacks | Same PHP engine (Zend VM) | 0ms — the PHP code runs at the same speed everywhere |

The PHP execution time is the constant. Everything else in this document is overhead on top of it.

Stage 5: Database Access

How does PHP talk to the database?

| Stack | DB Connection Model | Overhead Per Query |
| --- | --- | --- |
| nginx + php-fpm | New TCP connection per request (or persistent per-worker). No pooling. N workers = N connections. | ~1-3ms connection setup (first query), ~0.5-2ms per query (network RTT) |
| Apache + mod_php | Same — per-process connections. | ~1-3ms / ~0.5-2ms |
| Apache + php-fpm | Same. | ~1-3ms / ~0.5-2ms |
| RoadRunner | Same as php-fpm — PHP opens its own connections. No server-level pooling. | ~1-3ms / ~0.5-2ms |
| FrankenPHP | Same — no DB proxy. | ~1-3ms / ~0.5-2ms |
| Swoole | Connection pool (PDOPool). Persistent connections reused across coroutines. But still TCP to the actual database. | ~0ms connection setup, ~0.5-2ms per query |
| ePHPm | In-process DB proxy with connection pooling. PHP connects to localhost:3306 (ePHPm’s proxy). Pool maintains persistent connections to real DB. For even tighter integration, SAPI functions bypass TCP entirely. | ~0ms connection setup, ~0.5-2ms per query (network to real DB), ~0μs proxy overhead (in-process) |

Without pooling, hitting max_connections is easy: 200 PHP workers × 1 connection each = 200 DB connections. With ePHPm’s proxy: 200 workers → 20 pooled backend connections (10:1 multiplexing).
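The multiplexing claim is simple arithmetic:

```python
# Connection-count arithmetic behind the pooling claim above.
workers = 200
no_pool = workers * 1            # each worker holds its own DB connection
pool_ratio = 10                  # ePHPm's assumed 10:1 multiplexing
with_pool = workers // pool_ratio

print(f"without pooling:  {no_pool} DB connections")
print(f"with ePHPm proxy: {with_pool} pooled backend connections")
```

With a default MySQL `max_connections` of 151, the unpooled fleet is already over the limit before traffic spikes; the pooled fleet sits at 20.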

Stage 6: Session / Cache Access

How does PHP read/write sessions and cache data?

| Stack | Session/Cache Model | Overhead Per Operation |
| --- | --- | --- |
| nginx + php-fpm | External Redis/Memcached over TCP. Every session_start() and cache read = network round trip. | ~0.5-2ms per operation (TCP to Redis on localhost: ~200μs network + ~100μs Redis processing + connection overhead) |
| Apache + mod_php | Same — external Redis/Memcached. | ~0.5-2ms |
| RoadRunner | KV plugin (in-memory, single-node). Access via Goridge IPC (pipe round trip + serialization). | ~20-80μs per operation (IPC overhead) |
| FrankenPHP | No KV store. External Redis required. | ~0.5-2ms |
| Swoole | Swoole\Table (shared memory, same process). Fast but single-node only. | ~0.1-1μs per operation |
| ePHPm | In-process KV store (DashMap). PHP accesses via SAPI function call — zero-cost FFI, no TCP, no serialization. Local keys: direct memory access. Remote keys (clustered): internal network hop. | ~100-200ns local, ~0.5-2ms remote |

For a typical Laravel request that does session_start() + 2-3 cache reads:

  • php-fpm + Redis: 4 × ~1ms = ~4ms of session/cache overhead
  • RoadRunner: 4 × ~50μs = ~200μs
  • ePHPm (local): 4 × ~150ns = ~600ns (6,600x faster than Redis)
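The bullet math above can be reproduced directly (per-op costs are the illustrative midpoints quoted earlier):

```python
# Session/cache cost for session_start() plus 2-3 cache reads (~4 ops total).
OPS = 4
redis_ms = OPS * 1.0      # ~1 ms per TCP round trip to Redis
rr_us    = OPS * 50       # ~50 us per Goridge IPC round trip
local_ns = OPS * 150      # ~150 ns per in-process KV access

print(f"Redis: {redis_ms:.0f} ms  RoadRunner KV: {rr_us} us  ePHPm local: {local_ns} ns")
print(f"per-op speedup vs Redis: ~{1_000_000 / 150:,.0f}x")
```

The exact multiplier depends on which end of each range you pick; the point is that it is three to four orders of magnitude, not a percentage.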

Stage 7: Response Delivery

How does the PHP response get back to the client?

| Stack | Mechanism | Overhead |
| --- | --- | --- |
| nginx + php-fpm | PHP-FPM serializes response into FastCGI records → Unix socket → nginx deserializes → sends to client. | ~50-200μs (FastCGI encode/decode + socket) |
| Apache + mod_php | In-process. PHP writes directly to Apache’s output buffer. | ~1-5μs |
| RoadRunner | PHP serializes response into Goridge frames → pipe → Go deserializes → sends to client. | ~20-80μs |
| FrankenPHP | PHP’s echo/header() → SAPI callbacks → CGO crossing to Go → Go writes to client. | ~2μs+ (CGO crossings for headers + each output write) |
| Swoole | In-process. $response->end() writes directly. | ~0.5-2μs |
| ePHPm | PHP’s echo/header() → SAPI callbacks → zero-cost FFI → Rust writes to client via hyper. | ~0.5-2μs |

Total Overhead Per Request (Excluding PHP Execution)

Everything except the actual PHP application code running. This is pure infrastructure tax.

Scenario: Laravel API request (worker mode where available)

Assumptions: JSON API endpoint, worker mode (where supported), 3 DB queries, 2 cache reads, 1 session read, response under 10KB.

| Stage | nginx + php-fpm | RoadRunner | FrankenPHP (worker) | Swoole | ePHPm |
| --- | --- | --- | --- | --- | --- |
| Connection/TLS | 0.3ms | 0.3ms | 0.5ms | 0.3ms | 0.15ms |
| Request dispatch | 0.1ms | 0.05ms | 0.002ms | 0.001ms | 0.001ms |
| PHP bootstrap | 15ms | 0ms | 0ms | 0ms | 0ms |
| DB connections (3 queries) | 1ms setup + 3ms queries | 1ms + 3ms | 1ms + 3ms | 0ms + 3ms | 0ms + 3ms |
| Session + cache (3 ops) | 3ms (Redis) | 0.15ms (IPC) | 3ms (Redis) | 0.003ms | 0.0005ms |
| Response delivery | 0.1ms | 0.05ms | 0.002ms | 0.001ms | 0.001ms |
| Total overhead | ~22.5ms | ~4.55ms (~7.55ms if sessions use external Redis) | ~7.5ms | ~3.3ms | ~3.15ms |
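The per-stage rows can be summed to check the totals. Note that the RoadRunner column comes to ~4.55 ms when sessions use its KV plugin; the ~7.5 ms figure applies when sessions sit in external Redis instead (as in the flow diagram later in this document):

```python
# Reproducing each stack's total by summing the per-stage rows (ms).
# Values are the table's illustrative figures; DB row = setup + 3 queries.
stages = {
    "nginx + php-fpm":     [0.3,  0.1,   15, 1 + 3, 3,      0.1],
    "RoadRunner":          [0.3,  0.05,   0, 1 + 3, 0.15,   0.05],
    "FrankenPHP (worker)": [0.5,  0.002,  0, 1 + 3, 3,      0.002],
    "Swoole":              [0.3,  0.001,  0, 0 + 3, 0.003,  0.001],
    "ePHPm":               [0.15, 0.001,  0, 0 + 3, 0.0005, 0.001],
}
for stack, ms in stages.items():
    print(f"{stack:20} ~{sum(ms):.2f} ms")
```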

Scenario: Same request, php-fpm stacks (no worker mode)

| Stage | nginx + php-fpm | Apache + mod_php | Apache + php-fpm |
| --- | --- | --- | --- |
| Connection/TLS | 0.3ms | 0.5ms | 0.5ms |
| Request dispatch | 0.1ms | 0.003ms | 0.1ms |
| PHP bootstrap | 15ms | 15ms | 15ms |
| DB connections | 4ms | 4ms | 4ms |
| Session + cache (Redis) | 3ms | 3ms | 3ms |
| Response delivery | 0.1ms | 0.003ms | 0.1ms |
| Total overhead | ~22.5ms | ~22.5ms | ~22.7ms |

The Multiplier Effect

These overheads compound. A typical Laravel page makes 5-15 DB queries and 3-8 cache reads.

Heavy page: 10 DB queries, 6 cache/session ops, 50ms PHP execution

| Stack | Infra overhead | PHP execution | Total | Overhead % |
| --- | --- | --- | --- | --- |
| nginx + php-fpm | ~28ms | 50ms | 78ms | 36% overhead |
| RoadRunner | ~10ms | 50ms | 60ms | 17% overhead |
| FrankenPHP (worker) | ~13ms | 50ms | 63ms | 21% overhead (Redis for cache) |
| Swoole | ~5ms | 50ms | 55ms | 9% overhead |
| ePHPm | ~3.2ms | 50ms | 53.2ms | 6% overhead |
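The overhead percentages follow directly from the rows above:

```python
# Overhead share for the heavy-page scenario (50 ms of PHP execution).
PHP_MS = 50.0
infra_ms = {"nginx + php-fpm": 28, "RoadRunner": 10,
            "FrankenPHP (worker)": 13, "Swoole": 5, "ePHPm": 3.2}
for stack, ms in infra_ms.items():
    total = ms + PHP_MS
    print(f"{stack:20} {total:.1f} ms total, {ms/total:.0%} overhead")
```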

At 10,000 requests/second

| Stack | Overhead CPU burned/sec | Wasted per day |
| --- | --- | --- |
| nginx + php-fpm | 280 seconds (bootstrap dominates) | N/A — can’t hit 10k req/s without massive worker pool |
| RoadRunner | 100 seconds | ~2,400 core-hours |
| FrankenPHP (worker) | 130 seconds | ~3,120 core-hours |
| Swoole | 50 seconds | ~1,200 core-hours |
| ePHPm | 32 seconds | ~768 core-hours |
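The per-day column is just unit conversion: burning X CPU-seconds every wall-clock second means X cores are continuously busy, i.e. 24·X core-hours per day:

```python
# Converting "overhead CPU burned per second" into per-day waste.
SECONDS_PER_DAY = 86_400
burned = {"RoadRunner": 100, "FrankenPHP (worker)": 130, "Swoole": 50, "ePHPm": 32}
for stack, cpu_s in burned.items():
    core_hours = cpu_s * SECONDS_PER_DAY / 3600   # equivalently cpu_s * 24
    print(f"{stack:20} ~{core_hours:,.0f} core-hours/day")
```

(Part of this "burned" time is actually I/O wait on the database rather than CPU cycles, so treat these as upper bounds on compute waste.)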

Where Each Stack Loses Time

nginx + php-fpm — Death by a Thousand Cuts

Client ──► nginx ──FastCGI──► php-fpm worker
                   ~100μs        │
                              Bootstrap Laravel: ~15ms
                              DB connect: ~1ms
                              3 × DB query: ~3ms (TCP to MySQL)
                              3 × Redis: ~3ms (TCP to Redis)
                              FastCGI response: ~100μs
                                 │
Client ◄── nginx ◄──FastCGI──◄──┘

Total overhead: ~22ms+
Biggest cost: PHP bootstrap (15ms) — re-runs EVERY request

RoadRunner — IPC Tax

Client ──► Go HTTP server
              │
              Goridge pipe write: ~20μs (serialize request)
              OS context switch: ~10μs
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries over TCP: ~4ms
              KV: 3 ops over Goridge IPC: ~150μs
              │
              Goridge pipe write: ~20μs (serialize response)
              OS context switch: ~10μs
              │
Client ◄── Go HTTP server

Total overhead: ~4.3ms (without Redis), ~7.5ms (with Redis for sessions)
Biggest cost: DB connections (no pooling) + Goridge IPC serialization

FrankenPHP (Worker Mode) — CGO Tax + No Infrastructure

Client ──► Caddy/Go HTTP server
              │
              Caddy middleware chain: ~200-500μs
              CGO dispatch to PHP thread: ~200ns
              CGO: populate superglobals: ~800ns (4 callbacks)
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries over TCP: ~4ms (no pooling)
              Cache: 3 ops to external Redis: ~3ms
              │
              CGO: write headers: ~200ns
              CGO: write body (echo): ~200ns × N chunks
              │
Client ◄── Caddy/Go HTTP server

Total overhead: ~7.5ms
Biggest cost: External Redis (no built-in KV) + DB connections (no pooling)

ePHPm — Minimal Overhead

Client ──► Rust hyper (direct, no middleware chain)
              │
              Tokio channel send to PHP worker: ~100ns
              FFI: populate superglobals: ~0ns (zero-cost C calls)
              │
              ▼
           PHP worker (persistent — no bootstrap)
              DB: 3 queries via in-process proxy: ~3ms (pooled, no connect overhead)
              Cache: 3 ops via in-process KV: ~450ns (direct memory access)
              │
              FFI: write headers: ~0ns
              FFI: write body: ~0ns
              │
Client ◄── Rust hyper

Total overhead: ~3.15ms
Biggest cost: Network RTT to actual database (unavoidable)

p99 Latency: The GC Factor

Average latency tells one story. p99 (worst 1% of requests) tells another.

Go’s garbage collector introduces periodic pauses. These are short (~0.5-2ms with modern Go) but unpredictable. Under load, GC pauses hit the tail latency:

| Stack | p99 Factor | Why |
| --- | --- | --- |
| nginx + php-fpm | None (C + separate PHP processes) | nginx has no GC. PHP processes are independent — one GC doesn’t affect others. |
| Apache + mod_php | None (C) | No GC in Apache or PHP request lifecycle. |
| RoadRunner | Go GC pauses ~0.5-2ms | Go HTTP server GC affects all in-flight requests. PHP workers are separate processes (no GC). |
| FrankenPHP | Go GC pauses ~1-5ms (worse) | PHP runs IN the Go process. PHP’s memory allocations are visible to Go’s GC. In Symfony benchmarks, FrankenPHP showed 45ms std dev on CPU-bound tasks vs RoadRunner’s 8ms. |
| Swoole | None (C extension) | No GC in the server layer. PHP’s GC is per-request. |
| ePHPm | None | Rust has no GC. PHP’s per-request GC is isolated per worker thread. Predictable p99. |

FrankenPHP’s GC problem is uniquely bad because PHP memory is allocated inside the Go process. Go’s GC must scan/track these allocations, increasing both GC frequency and pause duration under heavy PHP load.
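Why do rare pauses dominate p99 while barely moving the average? A toy model makes it visible (the numbers here are arbitrary illustrations, not benchmarks of any stack):

```python
# Toy model: rare GC-style pauses barely move the mean but dominate p99.
import random

random.seed(42)
N = 10_000
base = [1.0] * N                                   # 1 ms baseline latency
# Assume 2% of requests land during a 5 ms stop-the-world pause:
paused = [1.0 + (5.0 if random.random() < 0.02 else 0.0) for _ in range(N)]

def p99(samples):
    """Latency of the request at the 99th percentile."""
    return sorted(samples)[int(len(samples) * 0.99)]

mean = sum(paused) / N
print(f"mean: {mean:.2f} ms   p99: {p99(paused):.1f} ms   baseline p99: {p99(base):.1f} ms")
```

A ~10% bump in the mean becomes a 6x blowup at p99, because the worst 1% of requests are exactly the ones that caught a pause.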


Memory Efficiency

| Stack | Memory Per Worker | Memory for 1000 Workers | Notes |
| --- | --- | --- | --- |
| nginx + php-fpm | ~30-50MB per FPM worker | ~30-50GB | Each FPM worker is a full process with its own memory space |
| Apache + mod_php | ~30-50MB per Apache process | ~30-50GB | Same — process-per-worker |
| RoadRunner | ~30-50MB per PHP process + Go overhead | ~30-50GB + ~200MB Go | PHP processes are separate |
| FrankenPHP | ~20-40MB per worker (shared process) | ~20-40GB + Go runtime | Shared address space saves some overhead |
| Swoole | ~10-30MB per worker | ~10-30GB | Efficient — shared memory, coroutines |
| ePHPm | ~10-30MB per worker | ~10-30GB + ~50MB Rust | Rust runtime is tiny. No Go runtime overhead. KV store replaces external Redis (saves ~100MB+). |

ePHPm’s real memory win: it replaces external Redis (typically 100MB-1GB in production) with the in-process KV store. One fewer process, one fewer memory footprint.
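Fleet-level memory is just multiplication; a sketch using midpoint per-worker figures from the table above:

```python
# Fleet memory arithmetic (illustrative midpoints from the table above).
WORKERS = 1000
per_worker_mb = {"php-fpm": 40, "FrankenPHP": 30, "Swoole/ePHPm": 20}
for stack, mb in per_worker_mb.items():
    print(f"{stack:12} ~{WORKERS * mb / 1024:.0f} GB for {WORKERS} workers")
# ePHPm additionally drops the external Redis process (~100 MB-1 GB in production).
```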


Feature-Adjusted Comparison

Raw speed means nothing without features. Here’s what each stack actually provides:

| Feature | nginx+fpm | Apache+mod_php | RoadRunner | FrankenPHP | Swoole | ePHPm |
| --- | --- | --- | --- | --- | --- | --- |
| Worker mode (no cold start) | No | No | Yes | Yes | Yes | Yes |
| Auto TLS (Let’s Encrypt) | Via certbot (external) | Via certbot | Yes | Yes (Caddy) | No | Yes |
| DB connection pooling | No | No | No | No | Yes | Yes |
| Built-in KV/cache | No | No | Yes (single-node) | No | Yes (single-node) | Yes (clustered) |
| Multi-node clustering | No | No | No | No | No | Yes |
| Query digest/analysis | No | No | No | No | No | Yes |
| Auto-instrumented traces | No | No | No | No | No | Yes |
| Superglobals work | Yes | Yes | No | Yes | No | Yes |
| Zero code changes | Yes | Yes | No (PSR-7) | Yes | No (Swoole API) | Yes |
| GC-free server layer | Yes (C) | Yes (C) | No (Go) | No (Go) | Yes (C) | Yes (Rust) |
| Memory-safe server | No (C) | No (C) | Yes (Go) | Yes (Go) | No (C) | Yes (Rust) |
| Single binary deploy | No (nginx+php) | No (apache+php) | Yes | Yes | No (PECL ext) | Yes |

The Pitch (by audience)

For developers on nginx + php-fpm:

“ePHPm eliminates 15ms of bootstrap overhead per request, replaces your nginx + php-fpm + Redis stack with a single binary, and adds connection pooling, query analysis, and a built-in observability dashboard. Your existing Laravel/Symfony/WordPress app works with zero code changes.”

For developers on RoadRunner:

“ePHPm gives you the same worker model without the PSR-7 migration tax — superglobals just work. Plus you get DB connection pooling, clustered caching (no external Redis), and zero IPC serialization overhead.”

For developers on FrankenPHP:

“ePHPm eliminates 2.2μs of CGO overhead per request, removes Go’s GC jitter from your p99 latency, and adds DB connection pooling, clustered KV, query analysis, and a full observability dashboard — features FrankenPHP doesn’t have and can’t easily add.”

For developers on Swoole:

“ePHPm gives you connection pooling and in-process caching like Swoole, but with superglobal compatibility, cross-platform support (not Linux-only), multi-node clustering, and no PECL extension to install. Your existing code works unchanged.”

