Nightly Test Suite
Comprehensive tests that run on a daily schedule (cron: "0 4 * * *"). These are too slow, resource-heavy, or infrastructure-dependent for PR-level CI but critical for ongoing confidence in the project.
Estimated wall-clock: ~10 minutes (all jobs run in parallel; bottleneck is app smoke tests).
Architecture
nightly.yml (schedule: cron "0 4 * * *", workflow_dispatch for manual runs)
|
+-- 1. release-build (PHP 8.4 + 8.5 × Linux + macOS) ~4m
+-- 2. fuzz (4 targets in parallel, 5 min each) ~5m
+-- 3. kv-stress (concurrent writers, TTL, compression) ~5m
+-- 4. mysql-proxy-integration (real MySQL container) ~3m
+-- 5. sqlite-cluster-e2e (3-node docker-compose) ~5m
+-- 6. gossip-stress (10-node convergence + churn) ~2m
+-- 7. query-stats-load (100 threads, regression guard) ~1m
+-- 8. app-smoke-tests (WordPress + Laravel) ~10m
+-- 9. windows-cross-compile (PHP 8.4 + 8.5) ~4m
+-- 10. dependency-audit (cargo audit + cargo outdated) ~1m

All stress/integration tests use `#[ignore]` so they never run during `cargo test --workspace` in PR CI. The nightly workflow runs them with `cargo nextest run --run-ignored ignored-only`.
1. Full Release Build Matrix
What: Run `cargo xtask release` for PHP 8.4 and 8.5 on Linux and macOS using pre-built `libphp.a` artifacts, then run all `#[ignore]` integration tests against the built binary.
Why nightly: With pre-built libphp.a from the external PHP build project, this drops from ~20 min (compiling static-php-cli) to ~4 min (just Rust compilation + linking). Still too heavy for every PR push, and the stub-mode tests in PR CI already catch Rust compilation errors.
What it catches:
- FFI linking failures against the real `libphp.a`
- PHP SAPI integration regressions (the `#[ignore]` tests in `kv_sapi_integration.rs`)
- Conditional compilation bugs: code gated behind `#[cfg(php_linked)]` that never runs in stub mode (see the sketch below)
- Platform-specific issues (Linux musl vs macOS)
Implementation:
- Matrix: `{php: [8.4, 8.5]} × {os: [ubuntu-latest, macos-latest]}`
- Download pre-built `libphp.a` from the PHP build project's artifacts
- Set `PHP_SDK_PATH` and run `cargo build --release`
- Run `cargo nextest run --run-ignored ignored-only` against the built binary
- Upload the binary as a workflow artifact for downstream smoke tests
Test files: crates/ephpm-php/tests/kv_sapi_integration.rs, any future #[ignore] tests
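The double-gating pattern this job exercises looks roughly like the sketch below. `ephpm_php::eval` is an assumed helper standing in for the real SAPI entry points, not the actual API:

```rust
// Sketch of the gating pattern: compiled only when the real libphp.a is
// linked, executed only in nightly runs. `ephpm_php::eval` is assumed.
#[cfg(php_linked)]
#[test]
#[ignore] // PR CI skips this; nightly runs it via --run-ignored ignored-only
fn sapi_boots_against_real_libphp() {
    let out = ephpm_php::eval("echo PHP_VERSION;").expect("SAPI failed to boot");
    assert!(out.starts_with("8."), "unexpected PHP version: {out}");
}
```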
2. Fuzz Testing
What: Run cargo fuzz targets for 5 minutes each, all in parallel. Four targets covering the main parser surfaces:
Target: RESP Protocol Parser
- Crate: `ephpm-kv`
- Entry point: `src/resp/parse.rs`
- Input: Arbitrary bytes fed as a RESP2 stream
- Invariant: Must never panic. Malformed input returns parse errors gracefully.
- Why it matters: The KV store accepts TCP connections from PHP and potentially from RESP-compatible clients. A panic in the parser crashes the server.
Target: SQL Normalizer
- Crate: `ephpm-query-stats`
- Entry point: `src/digest.rs` → `normalize()`
- Input: Arbitrary UTF-8 strings treated as SQL
- Invariant: Must never panic. Must always return a valid `String`. Output length must be ≤ input length + constant overhead (no unbounded growth).
- Why it matters: Every query flowing through the DB proxy hits this normalizer. A panic or OOM here is a production crash.
Target: HTTP Path Traversal
- Crate: `ephpm-server`
- Entry point: Router path resolution logic
- Input: Arbitrary URL paths (percent-encoded, `../`, null bytes, Unicode)
- Invariant: The resolved path must never escape the document root. Must never serve files outside `sites_dir`.
- Why it matters: Security-critical. Path traversal = arbitrary file read.
Target: MySQL Wire Protocol Parser
- Crate: `ephpm-db`
- Entry point: `read_packet()`, `classify_mysql_query()`, `parse_stmt_id()`
- Input: Arbitrary bytes as a MySQL packet stream
- Invariant: Must never panic. Malformed packets return errors or disconnect gracefully.
- Why it matters: The DB proxy sits between PHP and MySQL. A parser panic drops all database connections.
Implementation:
- Add a `fuzz/` directory with `cargo-fuzz` targets
- Each target runs as a separate parallel job (5 min × 4 targets = 5 min wall-clock)
- Corpus stored as CI artifacts, seeded from existing test inputs
- On crash: upload reproducer as artifact, fail the workflow, open a GitHub issue
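A minimal `cargo-fuzz` target for the RESP parser could look like the following. The `parse_frame` entry point is an assumption standing in for whatever `src/resp/parse.rs` actually exports:

```rust
// fuzz/fuzz_targets/resp_parse.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // The invariant under test is "never panic": both Ok and Err results
    // are fine. A panic aborts the run and libFuzzer writes a reproducer
    // input, which the workflow uploads as an artifact.
    let _ = ephpm_kv::resp::parse_frame(data); // assumed entry point
});
```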
3. KV Store Stress Tests
What: Hammer the KV store with concurrent load to verify DashMap correctness, TTL accuracy, and compression integrity under pressure.
Crate: ephpm-kv/tests/stress.rs
Sub-tests:
Concurrent Writer Storm
- 100 concurrent TCP connections (using the `redis` crate, same as `resp_compat.rs`)
- Each connection: 10,000 SET/GET operations with unique keys
- Verify: all 1,000,000 keys are readable after completion, no data loss, no cross-key corruption
- Tests DashMap’s lock-free concurrent write correctness
TTL Accuracy Under Load
- Set 1,000 keys with TTLs between 100ms and 500ms (randomized)
- Continuously poll until all keys have expired
- Verify: every key expires within ±50ms of its TTL (accounting for the 100ms reaper interval)
- Verify: no key survives past TTL + 200ms
- Tests the expiry reaper task under memory pressure
Compression Round-Trip at Scale
- Set 10,000 keys with compressible values (repeated patterns, JSON payloads)
- Use each compression algorithm: gzip, zstd, brotli
- GET all keys back and verify byte-for-byte equality
- Measure compression ratio (informational, not a pass/fail criterion)
- Tests that compression/decompression is deterministic under concurrent access
Multi-Tenant Isolation Under Load
- 10 tenants, each authenticated via RESP `AUTH` with their site-specific password
- Each tenant: 1,000 concurrent SET operations with keys named `tenant-{id}:key-{n}`
- After all writes: each tenant reads all their keys and attempts to read other tenants' keys
- Verify: zero cross-tenant reads succeed, all own-tenant reads return correct values
- Tests the `MultiTenantStore` + HMAC-based auth under race conditions
Memory Pressure
- Configure `max_memory` to a low value (e.g., 10 MB)
- Fill the store until writes start failing
- Verify: errors are clean (not panics), existing data is still readable
- Tests graceful degradation, not just happy-path behavior
Implementation: Uses the existing TestServer harness from resp_compat.rs. All tests marked #[ignore].
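A sketch of the writer-storm shape, assuming the `TestServer` harness exposes `start()` and `addr()` (names assumed; adjust to the real `resp_compat.rs` API):

```rust
// crates/ephpm-kv/tests/stress.rs (sketch). TestServer::start()/addr()
// are assumed from the resp_compat.rs harness.
use std::thread;

#[test]
#[ignore] // nightly-only: cargo nextest run --run-ignored ignored-only
fn concurrent_writer_storm() {
    let server = TestServer::start(); // assumed harness
    let addr = server.addr();

    // 100 writers × 10,000 keys each = 1,000,000 unique keys.
    let handles: Vec<_> = (0..100)
        .map(|writer| {
            let addr = addr.clone();
            thread::spawn(move || {
                let client = redis::Client::open(format!("redis://{addr}")).unwrap();
                let mut con = client.get_connection().unwrap();
                for n in 0..10_000u32 {
                    let _: () = redis::cmd("SET")
                        .arg(format!("writer-{writer}:key-{n}"))
                        .arg(n)
                        .query(&mut con)
                        .unwrap();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // Read-back pass: every key must be present with its original value.
    let client = redis::Client::open(format!("redis://{addr}")).unwrap();
    let mut con = client.get_connection().unwrap();
    for writer in 0..100 {
        for n in 0..10_000u32 {
            let v: u32 = redis::cmd("GET")
                .arg(format!("writer-{writer}:key-{n}"))
                .query(&mut con)
                .unwrap();
            assert_eq!(v, n, "data loss or cross-key corruption");
        }
    }
}
```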
4. MySQL Proxy Integration Tests
What: Boot a real MySQL instance and run the full ephpm DB proxy against it, testing connection pooling, R/W splitting, prepared statements, and session isolation.
Crate: ephpm-db/tests/proxy_integration.rs
Sub-tests:
Basic Query Correctness
- Connect through the proxy and execute: `CREATE TABLE`, `INSERT`, `SELECT`, `UPDATE`, `DELETE`
- Verify all results match direct MySQL execution
- Tests that the proxy faithfully forwards the MySQL wire protocol
R/W Split Verification
- Configure proxy with 1 primary + 1 replica (two MySQL containers, or simulated via separate databases)
- Execute `SELECT` queries → verify they hit the replica (check `@@hostname` or connection ID)
- Execute `INSERT` → verify it hits the primary
- Execute `SELECT` immediately after `INSERT` → verify sticky routing to the primary (within `sticky_duration`)
- Wait for the sticky window to expire → verify reads return to the replica
Prepared Statement Routing
- `PREPARE` a `SELECT` → verify it is compiled on the replica
- `EXECUTE` the prepared statement → verify it executes on the same replica (not the primary, not a different replica)
- `PREPARE` an `INSERT` → verify it is compiled on the primary
- `EXECUTE` the insert → verify it executes on the primary
- `CLOSE` both statements → verify cleanup
Connection Pooling
- Open 50 concurrent client connections through the proxy
- Each runs a simple query
- Verify the proxy multiplexes to ≤ `max_connections` backend connections (check pool metrics)
- Verify `COM_RESET_CONNECTION` is sent between clients reusing the same backend
Session State Isolation
- Client A: `SET @myvar = 42`, then disconnect
- Client B (gets the recycled backend connection): `SELECT @myvar` → must return `NULL`
- Tests that `COM_RESET_CONNECTION` properly clears session state
Transaction Integrity
- `BEGIN` → `INSERT` → `SELECT` (within the transaction, must see the insert) → `ROLLBACK`
- `SELECT` after rollback → must not see the insert
- All queries within a transaction must route to the same backend (the primary)
Implementation: Use testcontainers-rs to spin up MySQL 8.0. Proxy started in-process. Tests use mysql_async crate for client connections. All tests #[ignore].
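The session-isolation check reduces to a small `mysql_async` test. A sketch, assuming the proxy is already running and reachable via a hypothetical `PROXY_URL` env var (the real harness boots MySQL via testcontainers and the proxy in-process):

```rust
// crates/ephpm-db/tests/proxy_integration.rs (sketch).
use mysql_async::prelude::*;

#[tokio::test]
#[ignore] // nightly-only
async fn session_state_is_reset_between_clients() -> Result<(), mysql_async::Error> {
    // PROXY_URL is hypothetical; it points at the running ephpm proxy.
    let url = std::env::var("PROXY_URL")
        .unwrap_or_else(|_| "mysql://root@127.0.0.1:13306/mysql".into());

    // Client A sets a session variable, then disconnects, returning its
    // backend connection to the proxy's pool.
    let pool_a = mysql_async::Pool::new(url.as_str());
    let mut conn_a = pool_a.get_conn().await?;
    conn_a.query_drop("SET @myvar = 42").await?;
    drop(conn_a);
    pool_a.disconnect().await?;

    // Client B should receive the recycled backend connection with its
    // session state cleared by COM_RESET_CONNECTION, so @myvar is NULL.
    // Outer Option = "a row came back"; inner Option = SQL NULL-ability.
    let pool_b = mysql_async::Pool::new(url.as_str());
    let mut conn_b = pool_b.get_conn().await?;
    let v: Option<Option<i64>> = conn_b.query_first("SELECT @myvar").await?;
    assert_eq!(v, Some(None), "session state leaked across clients");
    pool_b.disconnect().await?;
    Ok(())
}
```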
5. SQLite Clustering E2E
What: Test the full clustering lifecycle: primary election, write replication, and failover recovery with 3 ephpm nodes running sqld sidecars.
Infrastructure: docker-compose with 3 ephpm containers on a shared Docker network.
Sub-tests:
Primary Election
- Start 3 nodes simultaneously with `replication.role = "auto"` and `cluster.enabled = true`
- Wait for gossip convergence (≤15s)
- Verify exactly one node claims `kv:sqlite:primary` in the gossip KV tier
- Verify the primary is the node with the lowest ordinal (consistent with the `sqlite_election.rs` algorithm)
Write Replication
- Write 100 rows to the primary’s litewire MySQL endpoint
- Wait for replication lag (poll replicas every 500ms, timeout 30s)
- Read all 100 rows from each replica
- Verify data integrity: all rows present with correct values
Failover
- Kill the primary container (`docker stop`)
- Wait for gossip failure detection (heartbeat TTL = 10s, so ≤15s)
- Verify a new primary is elected (gossip KV updated)
- Verify the new primary’s sqld sidecar restarted in primary mode
- Write new rows to the new primary → verify they replicate to the remaining replica
Split-Brain Prevention
- Partition the network: isolate one node from the other two (via Docker network disconnect)
- Verify the isolated node does NOT become primary (it can’t reach quorum)
- Verify the two connected nodes maintain a single primary
- Reconnect the network → verify the cluster reconverges to a single primary
Role Change sqld Restart
- Verify that when a node transitions from replica → primary, its sqld process is SIGTERMed and restarted with `--primary` args
- Check logs for the expected lifecycle: `"stopping sqld"` → `"starting sqld as primary"`
- Verify the new sqld instance passes health checks
Implementation: docker-compose file with 3 ephpm services, shared network, volume mounts for config. Test runner is a 4th container or host-side script using curl/mysql CLI. Requires a release build with sqld embedded.
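A host-side sketch of the failover probe. The container name, ports, and a `/cluster/status` endpoint returning the node's role are all hypothetical; the real harness may probe the gossip KV or logs instead:

```rust
// Failover step as a host-side check. Everything topology-specific here
// (container name, node addresses, /cluster/status) is an assumption.
use std::process::Command;
use std::time::{Duration, Instant};

async fn count_primaries(hosts: &[&str]) -> usize {
    let mut primaries = 0;
    for host in hosts {
        // Hypothetical status endpoint returning e.g. {"role":"primary"}.
        if let Ok(resp) = reqwest::get(format!("http://{host}/cluster/status")).await {
            if let Ok(body) = resp.text().await {
                if body.contains("\"role\":\"primary\"") {
                    primaries += 1;
                }
            }
        }
    }
    primaries
}

#[tokio::test]
#[ignore] // nightly-only; requires the 3-node docker-compose stack
async fn failover_elects_exactly_one_new_primary() {
    // Stop the current primary container.
    let status = Command::new("docker")
        .args(["stop", "ephpm-node-1"])
        .status()
        .expect("docker CLI not available");
    assert!(status.success());

    // Heartbeat TTL is 10s, so detection plus re-election should land
    // within ~15s; poll every 500ms until the deadline.
    let survivors = ["127.0.0.1:8082", "127.0.0.1:8083"];
    let deadline = Instant::now() + Duration::from_secs(15);
    while count_primaries(&survivors).await != 1 {
        assert!(Instant::now() < deadline, "no single primary within 15s");
        tokio::time::sleep(Duration::from_millis(500)).await;
    }
}
```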
6. Gossip Protocol Stress
What: Test chitchat gossip convergence, failure detection, and KV replication under scale and churn.
Crate: ephpm-cluster/tests/stress.rs
Sub-tests:
10-Node Convergence
- Start 10 `ClusterHandle` instances on different ports
- Each node has 1 seed peer (daisy-chained: node N seeds on node N−1)
- Verify all 10 nodes discover all other nodes within 15s
- Verify `live_nodes()` returns 10 on every node
Node Failure Detection
- Start 5 nodes, wait for convergence
- Kill node 3 (drop the `ClusterHandle`)
- Verify the remaining 4 nodes remove node 3 from `live_nodes()` within 30s (chitchat failure detection)
- Verify gossip KV entries from node 3 are no longer refreshed (TTL expires)
KV Replication Under Churn
- Start 5 nodes
- Node 1 sets 100 KV entries with 60s TTL
- While replication is ongoing: add node 6, kill node 3, add node 7
- After churn settles (30s): verify all surviving nodes have all 100 KV entries
- Tests that membership changes don’t corrupt the KV replication protocol
Large KV Tier
- 5 nodes, each setting 2,000 unique KV entries (10,000 total)
- Wait for full replication (poll until all nodes have 10,000 entries, timeout 60s)
- Verify no entry corruption (value matches expected for each key)
- Tests gossip bandwidth and digest efficiency at scale
Implementation: All in-process using ClusterHandle::start_gossip() on localhost ports. No Docker needed. Tests marked #[ignore].
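In-process, the 10-node convergence check could take the shape below; the exact `ClusterHandle::start_gossip(port, seeds)` signature is an assumption based on the names used above:

```rust
// crates/ephpm-cluster/tests/stress.rs (sketch; constructor and
// live_nodes() signatures are assumed).
use std::time::{Duration, Instant};

#[tokio::test]
#[ignore] // nightly-only
async fn ten_node_convergence() {
    // Daisy-chain seeds: node N lists only node N-1 as its seed peer.
    let mut nodes = Vec::new();
    for n in 0..10u16 {
        let port = 41_000 + n;
        let seeds = if n == 0 {
            vec![]
        } else {
            vec![format!("127.0.0.1:{}", port - 1)]
        };
        nodes.push(ClusterHandle::start_gossip(port, seeds).await.unwrap());
    }

    // Poll until every node's membership view contains all 10 nodes.
    let deadline = Instant::now() + Duration::from_secs(15);
    while !nodes.iter().all(|n| n.live_nodes().len() == 10) {
        assert!(Instant::now() < deadline, "gossip did not converge in 15s");
        tokio::time::sleep(Duration::from_millis(250)).await;
    }
}
```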
7. Query Stats Under Load
What: Verify QueryStats DashMap correctness and measure normalization throughput under concurrent access.
Crate: ephpm-query-stats/tests/stress.rs
Sub-tests:
Concurrent Recording Accuracy
- 100 threads, each recording 1,000 queries from a pool of 50 distinct SQL patterns
- After all threads complete: verify total execution count across all digests = 100,000
- Verify each digest's `count` matches the expected frequency
- Tests DashMap's atomic update correctness under high contention
Normalization Throughput Regression Guard
- Single-threaded: normalize 100,000 realistic SQL queries (mix of SELECT, INSERT, UPDATE with varying literal counts)
- Measure wall-clock time
- Assert throughput > 100,000 queries/second (baseline: current performance is ~500k/sec)
- Fail if throughput drops below threshold → catches accidental O(n²) regressions in the state machine
Prometheus Metric Consistency
- Record 10,000 queries across 100 distinct digests
- Fetch Prometheus metrics output
- Verify the `ephpm_query_active_digests` gauge matches `entries.len()`
- Verify the `ephpm_query_total` counter matches the sum of all digest counts
- Tests that metrics and internal state don't drift under concurrent updates
Max Digest Cap
- Configure `max_digests = 100`
- Record 200 distinct query patterns
- Verify `entries.len() <= 100` at all times
- Verify no panic or corruption when the cap is hit
- Tests the eviction/rejection behavior at the configured limit
Implementation: Direct Rust tests against QueryStats::new(). No I/O needed. Tests marked #[ignore].
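A sketch of the concurrent-recording check; `record()` and the `entries()`/`count` accessors are assumed names standing in for the real QueryStats API:

```rust
// crates/ephpm-query-stats/tests/stress.rs (sketch; method names assumed).
use std::sync::Arc;
use std::thread;

#[test]
#[ignore] // nightly-only
fn concurrent_recording_accuracy() {
    let stats = Arc::new(QueryStats::new());
    // 50 distinct patterns; round-robin gives each a known frequency.
    let patterns: Vec<String> = (0..50)
        .map(|i| format!("SELECT * FROM t{i} WHERE id = 1"))
        .collect();

    let handles: Vec<_> = (0..100)
        .map(|_| {
            let stats = Arc::clone(&stats);
            let patterns = patterns.clone();
            thread::spawn(move || {
                for n in 0..1_000usize {
                    // Each pattern is hit 20 times per thread, 2,000 in total.
                    stats.record(&patterns[n % patterns.len()]);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // Global invariant: counts across all digests sum to 100 × 1,000.
    let total: u64 = stats.entries().iter().map(|e| e.count).sum();
    assert_eq!(total, 100_000);
}
```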
8. Application Smoke Tests
What: Full application lifecycle tests: install a real PHP application, run it against ephpm with litewire SQLite, and verify it renders correctly.
WordPress
- Setup: Download WordPress, run `wp-cli core install` with SQLite via litewire
- Tests:
  - Front page renders with `<!DOCTYPE html>` and the expected `<title>`
  - Admin login page loads (`/wp-login.php`)
  - Admin dashboard accessible after login (cookie auth)
  - Create a post via wp-cli → verify it appears on the front page
  - Pretty permalinks work (`/sample-post/` resolves to the correct post)
- Why it matters: WordPress is the primary target application. If WP works, most PHP apps work.
Laravel
- Setup: Fresh `laravel new` project, `artisan migrate` against litewire SQLite
- Tests:
  - Welcome page renders with 200 and the expected content
  - `artisan route:list` works (CLI mode verification)
  - API route returns JSON with the correct content-type
  - Database migration creates the expected tables (verify via SQL query)
- Why it matters: Laravel is the second major PHP framework. Tests the full stack: routing, ORM (Eloquent), migrations, artisan CLI.
Implementation: Docker images with pre-installed applications (cached in container registry to avoid download time). ephpm binary mounted into the container. Tests run via curl + response body assertions. Alternatively, extend the existing Kind/Tilt e2e infrastructure with app-specific manifests.
Estimated time: ~10 min (WordPress install + test: ~6 min, Laravel: ~4 min). This is the nightly bottleneck.
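The HTTP assertions reduce to simple body checks. A minimal sketch, assuming a hypothetical `WP_URL` env var set by the workflow:

```rust
// Front-page smoke assertion. WP_URL is a hypothetical env var pointing
// at wherever ephpm is serving the WordPress site.
#[tokio::test]
#[ignore] // nightly-only
async fn wordpress_front_page_renders() {
    let url = std::env::var("WP_URL").unwrap_or_else(|_| "http://127.0.0.1:8080".into());
    let body = reqwest::get(url).await.unwrap().text().await.unwrap();
    assert!(body.contains("<!DOCTYPE html>"), "front page did not render HTML");
    assert!(body.contains("<title>"), "missing <title> element");
}
```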
9. Windows Cross-Compilation
What: Verify that `cargo xtask release --target windows --no-sqld` produces a valid Windows executable for PHP 8.4 and 8.5.
Why nightly: Requires cargo-xwin + MSVC cross-toolchain. Slow to set up, and Windows-specific breakage is rare. The PR CI’s stub-mode compile already catches most Rust issues.
Tests:
- Build completes without errors
- Output file exists at `target/x86_64-pc-windows-msvc/release/ephpm.exe`
- The `file` command confirms it's a PE32+ executable
- Binary size is within the expected range (sanity check: not too small, not unexpectedly large)
What it catches:
- Windows-specific `#[cfg(target_os = "windows")]` compilation errors
- Linker issues with the Windows PHP SDK
- Missing sqld guard (must bail gracefully, not fail to compile)
Implementation: Single job, cargo install cargo-xwin (cached), matrix over PHP versions. Upload .exe as artifact.
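Where the runner lacks a `file` binary, the same sanity check can be done directly against the PE headers; a self-contained sketch:

```rust
// Minimal PE32+ sanity probe: checks the DOS magic, PE signature,
// machine type, and optional-header magic of the built ephpm.exe.
use std::fs;

fn main() {
    let path = "target/x86_64-pc-windows-msvc/release/ephpm.exe";
    let bytes = fs::read(path).expect("build output missing");
    assert_eq!(&bytes[..2], b"MZ", "not a DOS/PE executable");
    // e_lfanew at offset 0x3C points at the PE signature.
    let pe = u32::from_le_bytes(bytes[0x3C..0x40].try_into().unwrap()) as usize;
    assert_eq!(&bytes[pe..pe + 4], b"PE\0\0", "missing PE signature");
    // COFF machine field: 0x8664 = x86-64.
    let machine = u16::from_le_bytes(bytes[pe + 4..pe + 6].try_into().unwrap());
    assert_eq!(machine, 0x8664, "unexpected machine type");
    // Optional-header magic: 0x020B marks PE32+ (64-bit).
    let magic = u16::from_le_bytes(bytes[pe + 24..pe + 26].try_into().unwrap());
    assert_eq!(magic, 0x020B, "not a PE32+ executable");
    println!("{path}: PE32+ x86-64 OK ({} bytes)", bytes.len());
}
```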
10. Dependency Audit
What: Check for known security vulnerabilities, license violations, and outdated dependencies.
cargo deny (already in PR CI, extended here)
- `cargo deny check advisories`: RUSTSEC advisory database
- `cargo deny check licenses`: license compatibility
- `cargo deny check bans`: banned crate detection
- `cargo deny check sources`: verify all crates come from crates.io
cargo audit (nightly-only addition)
- `cargo audit`: cross-reference `Cargo.lock` against the RustSec advisory DB
- Hits the network (advisory DB fetch), which can be flaky; not suitable for PR CI
cargo outdated (informational)
- `cargo outdated --root-deps-only`: report outdated direct dependencies
- Does not fail the workflow; output is informational
- Posted as a workflow summary for visibility
Implementation: Single job, sequential commands. cargo deny is the authoritative check (fails on issues); cargo audit is a secondary signal; cargo outdated is FYI.
PR CI Changes
With the nightly suite covering heavy testing, consider simplifying PR CI:
| Current PR CI | Proposed PR CI |
|---|---|
| fmt + clippy + test + cargo-deny + e2e (Kind/Tilt) | fmt + clippy + test + cargo-deny |
The E2E tests (e2e.yml) would move to nightly-only + workflow_dispatch for on-demand runs. This cuts PR CI from ~8 min to ~3 min while keeping the heavier coverage on a daily cadence.
Failure Handling
- Fuzz crash: Upload the reproducer artifact, fail the workflow, auto-open a GitHub issue with the label `fuzz-crash`
- Stress test failure: Retry once (timing-sensitive tests may flake on shared CI runners). Fail on the second attempt.
- Release build failure: No retry — this indicates a real linking or compilation problem.
- App smoke test failure: Upload full ephpm logs + HTTP response bodies as artifacts for debugging.
- Dependency audit: Advisory failures block; outdated reports are informational only.
All failures post to a GitHub Actions summary with direct links to logs and artifacts.