Query Stats
A shared query observability layer for ePHPm. Records timing, throughput, and error rates for every SQL query regardless of which runtime path handles it (DB Proxy or LiteWire).
Problem
ePHPm has two SQL strategies (see sql.md):
- DB Proxy (
ephpm-db) – forwards MySQL wire traffic to a real MySQL server - LiteWire (
litewire) – translates MySQL wire traffic to SQLite
Both see every query PHP executes. Neither currently records query-level metrics. Without this, operators have no visibility into slow queries, hot tables, error patterns, or query mix. They have to bolt on external tools (slow query log on MySQL, APM agents in PHP) that don’t exist in the LiteWire/SQLite path at all.
Solution
A single ephpm-query-stats crate that both runtimes call into. One set of metrics, one slow query threshold, one Prometheus dashboard – whether the backend is SQLite or MySQL.
┌───────────────────────────────── ePHPm ─────────────────────────────────┐
│ │
│ LiteWire Path DB Proxy Path │
│ ────────────── ────────────── │
│ PHP (pdo_mysql) PHP (pdo_mysql) │
│ │ │ │
│ ▼ ▼ │
│ LiteWire MySQL Frontend MySQL Proxy ──┐ │
│ │ │ │ │
│ ▼ ▼ │ │
│ TrackedBackend ──┐ Real MySQL │ │
│ │ │ │ │
│ ▼ │ record record │ │
│ rusqlite │ │ │
│ │ │ │
│ └───────► QueryStats ◄──────────────┘ │
│ (ephpm-query-stats) │
│ │ │
│ ▼ │
│ Prometheus /metrics │
│ │
└─────────────────────────────────────────────────────────────────────────┘Architecture
Crate: ephpm-query-stats
New workspace crate. No dependency on litewire, ephpm-db, or any runtime-specific code. Depends only on dashmap, parking_lot, tracing, metrics.
crates/ephpm-query-stats/
src/
lib.rs # QueryStats, DigestEntry, public API
digest.rs # SQL normalization and hashing
prometheus.rs # Metrics registration and recordingCore API
use std::sync::Arc;
use std::time::Duration;
/// Shared query stats collector.
///
/// Thread-safe, cheaply cloneable. Create one per ePHPm process and pass
/// it to both the DB Proxy and LiteWire integration.
#[derive(Clone)]
pub struct QueryStats {
entries: Arc<DashMap<u64, DigestEntry>>,
config: StatsConfig,
}
pub struct StatsConfig {
/// Queries slower than this are logged at WARN level.
/// Default: 1 second.
pub slow_query_threshold: Duration,
/// Maximum number of distinct query digests to track.
/// Prevents unbounded memory growth from unique queries.
/// Default: 100,000.
pub max_digests: usize,
/// Whether to record per-digest Prometheus histograms.
/// Default: true.
pub prometheus_enabled: bool,
}
impl QueryStats {
/// Record a completed query.
///
/// Called by both the DB Proxy (after forwarding) and the TrackedBackend
/// (wrapping LiteWire's rusqlite/hrana-client backend).
pub fn record(&self, sql: &str, duration: Duration, success: bool);
/// Get the digest entry for a normalized query, if tracked.
pub fn get(&self, digest_id: u64) -> Option<DigestEntry>;
/// Snapshot of all tracked digests, sorted by total time descending.
pub fn top_queries(&self, limit: usize) -> Vec<DigestEntry>;
/// Reset all counters. Useful for periodic reporting windows.
pub fn reset(&self);
}Query Digest
The digest system normalizes SQL queries so that queries differing only in literal values map to the same digest. This is how MySQL’s performance_schema.events_statements_summary_by_digest works.
/// Normalize a SQL query for digest computation.
///
/// Replaces literal values with `?` placeholders so that
/// `SELECT * FROM users WHERE id = 42` and
/// `SELECT * FROM users WHERE id = 99` produce the same digest.
pub fn normalize(sql: &str) -> String;
/// Compute a 64-bit hash of the normalized SQL.
pub fn digest_id(normalized: &str) -> u64;Normalization Rules
| Input | Normalized |
|---|---|
SELECT * FROM users WHERE id = 42 | SELECT * FROM users WHERE id = ? |
INSERT INTO t VALUES (1, 'hello', 3.14) | INSERT INTO t VALUES (?, ?, ?) |
WHERE name = 'Alice' AND age > 30 | WHERE name = ? AND age > ? |
WHERE id IN (1, 2, 3, 4, 5) | WHERE id IN (?, ...) |
SELECT 1 | SELECT ? |
| Comments stripped | /* ... */ and -- ... removed |
| Whitespace collapsed | Multiple spaces/newlines become single space |
The normalizer operates on raw SQL strings (not parsed ASTs) for performance. It uses a simple state machine that walks the SQL character by character, detecting quoted strings, numbers, and comments. This is the same approach MySQL and ProxySQL use – full parsing would be correct but expensive for a hot path.
Normalization Implementation
/// State machine for SQL normalization.
enum State {
Normal,
SingleQuotedString,
DoubleQuotedString,
BacktickIdentifier,
LineComment,
BlockComment,
Number,
}Walk the input character by character:
- Quoted strings (
'hello',"world") – replace entire string with? - Numbers (
42,3.14,-1,0xFF) – replace with? - IN lists (
IN (?, ?, ?, ?)) – collapse toIN (?, ...) - Comments – strip entirely
- Whitespace – collapse to single space
- Everything else – pass through (keywords, identifiers, operators)
Edge cases:
- Escaped quotes inside strings:
'it''s'is one string, not two - Negative numbers:
-42– the-is a unary operator,42is the number - Hex literals:
0xDEAD– replace with? - Backtick identifiers:
`table_name`– pass through (not a value) NULLkeyword – do NOT normalize (it’s semantically meaningful, not a literal value)
Digest Entry
/// Accumulated statistics for a single query digest.
#[derive(Clone, Debug)]
pub struct DigestEntry {
/// The 64-bit digest hash.
pub digest_id: u64,
/// The normalized SQL string (with `?` placeholders).
pub digest_text: String,
/// An example of the original (non-normalized) SQL.
/// Updated periodically, not on every call (to avoid allocation pressure).
pub example_sql: String,
/// Total number of executions.
pub count: u64,
/// Number of failed executions (backend returned an error).
pub error_count: u64,
/// Total wall-clock time spent executing this query.
pub total_time: Duration,
/// Minimum execution time observed.
pub min_time: Duration,
/// Maximum execution time observed.
pub max_time: Duration,
/// Sum of rows returned (for queries) or affected (for mutations).
pub total_rows: u64,
/// Timestamp of the first execution.
pub first_seen: Instant,
/// Timestamp of the most recent execution.
pub last_seen: Instant,
}Internal Storage
/// DashMap keyed by digest_id (u64 hash).
///
/// Each shard holds a DigestEntry that is updated atomically.
/// DashMap provides concurrent read/write access without a global lock.
entries: Arc<DashMap<u64, DigestEntry>>Why DashMap and not a Mutex<HashMap>:
- The DB Proxy and LiteWire run on separate tokio tasks. Both call
record()concurrently. DashMapshards the map internally, so concurrent writes to different digests don’t contend.- ePHPm already depends on
dashmap(used byephpm-kv).
Memory bound: max_digests (default 100,000) caps the number of entries. When full, new digests are silently dropped (existing entries continue to update). This prevents OOM from applications that generate unique SQL per request (e.g., ORMs with inline literals).
Slow Query Logging
When record() observes duration > slow_query_threshold:
WARN query.slow: sql="SELECT * FROM users WHERE email = ?" duration=2.3s rows=1 digest=0xABCD1234The log includes the normalized SQL (no PII from literal values) and the duration. The digest field lets operators correlate with Prometheus metrics.
The threshold is configurable via TOML:
[db]
slow_query_threshold = "1s"Prometheus Metrics
Registered once at startup via the metrics crate (same pattern as existing ePHPm metrics):
| Metric | Type | Labels | Description |
|---|---|---|---|
ephpm_query_duration_seconds | Histogram | digest, kind | Per-digest execution time distribution |
ephpm_query_total | Counter | digest, kind, status | Total query count (success/error) |
ephpm_query_rows_total | Counter | digest, kind | Total rows returned/affected |
ephpm_query_slow_total | Counter | Queries exceeding the slow threshold | |
ephpm_query_active_digests | Gauge | Number of distinct digests being tracked |
Labels:
digest– first 16 chars of the normalized SQL (truncated for cardinality control)kind–queryormutation(determined by first keyword: SELECT vs INSERT/UPDATE/DELETE)status–okorerror
Cardinality control: The digest label uses a truncated prefix, not the full SQL or hash. This caps Prometheus label cardinality at max_digests. Applications generating millions of unique queries won’t explode the metrics store.
Alternative: use the digest hash as label instead of truncated SQL. More compact but harder to read in dashboards. Make this configurable.
LiteWire Integration: TrackedBackend
A decorator that wraps any litewire_backend::Backend implementation, recording stats before delegating.
use litewire_backend::{Backend, BackendError, ExecuteResult, ResultSet, Value};
/// A backend wrapper that records query stats.
pub struct TrackedBackend<B> {
inner: B,
stats: QueryStats,
}
impl<B> TrackedBackend<B> {
pub fn new(inner: B, stats: QueryStats) -> Self {
Self { inner, stats }
}
}
#[async_trait::async_trait]
impl<B: Backend> Backend for TrackedBackend<B> {
async fn query(&self, sql: &str, params: &[Value]) -> Result<ResultSet, BackendError> {
let start = std::time::Instant::now();
let result = self.inner.query(sql, params).await;
let duration = start.elapsed();
let rows = result.as_ref().map_or(0, |rs| rs.rows.len() as u64);
self.stats.record_query(sql, duration, result.is_ok(), rows);
result
}
async fn execute(&self, sql: &str, params: &[Value]) -> Result<ExecuteResult, BackendError> {
let start = std::time::Instant::now();
let result = self.inner.execute(sql, params).await;
let duration = start.elapsed();
let rows = result.as_ref().map_or(0, |r| r.affected_rows);
self.stats.record_mutation(sql, duration, result.is_ok(), rows);
result
}
}This lives in ephpm-server (not in ephpm-query-stats) because it depends on both litewire_backend::Backend and QueryStats. The stats crate stays runtime-agnostic.
Wiring in ephpm-server
// In start_litewire():
let stats = query_stats.clone(); // shared with DB proxy
let db = Rusqlite::open(&config.path)?;
let tracked = TrackedBackend::new(db, stats);
LiteWire::new(tracked).mysql("...").serve().awaitDB Proxy Integration
The DB Proxy calls stats.record() directly from its query routing code. No wrapper needed – the proxy already intercepts every query for R/W classification.
// In ephpm-db mysql proxy, after forwarding:
let start = Instant::now();
let result = forward_to_backend(sql, &mut conn).await;
let duration = start.elapsed();
stats.record(sql, duration, result.is_ok(), rows_affected);The QueryStats instance is passed into the proxy at construction time (from start_db_proxies).
Configuration
[db]
# Slow query threshold (applies to both DB Proxy and LiteWire).
# Default: "1s"
slow_query_threshold = "1s"
[db.analysis]
# Maximum distinct query digests to track.
# Default: 100000
digest_max_entries = 100000
# Include query stats in Prometheus /metrics endpoint.
# Default: true
query_metrics = trueThe [db.analysis] section already exists in DbConfig (currently used for planned auto-explain features). Query stats configuration fits naturally here.
Lifecycle
ephpm starts
|
v
QueryStats::new(config) -- creates empty DashMap
|
+---> passed to start_db_proxies() ---> DB Proxy calls stats.record()
|
+---> passed to start_litewire() ---> TrackedBackend calls stats.record()
|
v
Prometheus scrapes /metrics -- reads from QueryStats
|
v
ephpm shuts down -- QueryStats dropped, all stats lost (ephemeral by design)Stats are ephemeral – they exist only in memory for the lifetime of the process. This is intentional:
- No disk I/O on the query hot path
- No persistence complexity
- Prometheus handles long-term storage and alerting
- Process restart gives a clean slate (same as MySQL’s
performance_schemaafter restart)
Top Queries Endpoint
Optional: expose a /__/queries admin endpoint (alongside the existing /__/metrics) that returns the top N queries by total time. Useful for quick debugging without Prometheus.
GET /__/queries?limit=20&sort=total_time
[
{
"digest": "SELECT * FROM users WHERE id = ?",
"count": 45821,
"error_count": 3,
"total_time_ms": 12340,
"avg_time_ms": 0.27,
"max_time_ms": 45.2,
"rows_total": 45818,
"first_seen": "2024-01-15T10:30:00Z",
"last_seen": "2024-01-15T14:22:11Z"
},
...
]Sort options: total_time (default), count, avg_time, max_time, error_count.
Testing Strategy
Unit tests (
ephpm-query-stats):- Normalization: literals, strings, numbers, IN lists, comments, whitespace
- Digest hashing: same SQL produces same hash, different SQL produces different hash
- Stats recording: counts, timing aggregation, max/min tracking
- Memory bound: max_digests enforcement
- Slow query detection: threshold comparison
Integration tests (
ephpm-server):TrackedBackendwraps rusqlite, queries are recorded- Stats appear in Prometheus metrics output
- Slow query threshold triggers warn log
E2E tests (
ephpm-e2e):- PHP executes queries through LiteWire
/metricsendpoint containsephpm_query_duration_seconds- Repeated queries aggregate under the same digest
Dependencies
| Crate | Purpose | Already in workspace? |
|---|---|---|
dashmap | Concurrent digest map | Yes (ephpm-kv) |
parking_lot | Atomic entry updates | Yes |
tracing | Slow query logging | Yes |
metrics | Prometheus integration | Yes |
No new external dependencies required.
Implementation Phases
| Phase | Scope | Milestone |
|---|---|---|
| 1 | ephpm-query-stats crate: normalizer, digest, QueryStats struct, record() | Stats recording works |
| 2 | Slow query logging (tracing warn) | Operators get alerts |
| 3 | TrackedBackend in ephpm-server, LiteWire integration | LiteWire queries tracked |
| 4 | DB Proxy integration (pass QueryStats to proxy) | Both paths tracked |
| 5 | Prometheus metrics registration | Dashboards work |
| 6 | /__/queries admin endpoint | Quick debugging |
| 7 | Config integration ([db.analysis] fields) | User-configurable |
Phase 1-3 are the MVP. Phase 4 requires touching ephpm-db internals. Phase 5-7 are polish.
Prior Art
| Project | Approach |
|---|---|
MySQL performance_schema | In-process digest tracking, same normalization approach |
| ProxySQL | Proxy-level query digest with stats table, regex-based normalization |
| PgBouncer | No query-level stats (connection-level only) |
| pgDog | Query digest with Prometheus export |
The normalizer design follows MySQL’s performance_schema approach (character-level state machine) rather than ProxySQL’s regex approach (fragile with complex SQL).