Skip to content
Query Stats

Query Stats

A shared query observability layer for ePHPm. Records timing, throughput, and error rates for every SQL query regardless of which runtime path handles it (DB Proxy or LiteWire).

Problem

ePHPm has two SQL strategies (see sql.md):

  • DB Proxy (ephpm-db) – forwards MySQL wire traffic to a real MySQL server
  • LiteWire (litewire) – translates MySQL wire traffic to SQLite

Both see every query PHP executes. Neither currently records query-level metrics. Without this, operators have no visibility into slow queries, hot tables, error patterns, or query mix. They have to bolt on external tools (slow query log on MySQL, APM agents in PHP) that don’t exist in the LiteWire/SQLite path at all.

Solution

A single ephpm-query-stats crate that both runtimes call into. One set of metrics, one slow query threshold, one Prometheus dashboard – whether the backend is SQLite or MySQL.

   ┌───────────────────────────────── ePHPm ─────────────────────────────────┐
   │                                                                         │
   │     LiteWire Path                          DB Proxy Path                │
   │     ──────────────                         ──────────────                │
   │     PHP (pdo_mysql)                        PHP (pdo_mysql)              │
   │           │                                       │                     │
   │           ▼                                       ▼                     │
   │     LiteWire MySQL Frontend                MySQL Proxy ──┐              │
   │           │                                       │      │              │
   │           ▼                                       ▼      │              │
   │     TrackedBackend ──┐                       Real MySQL  │              │
   │           │          │                                   │              │
   │           ▼          │ record                     record │              │
   │      rusqlite        │                                   │              │
   │                      │                                   │              │
   │                      └───────► QueryStats ◄──────────────┘              │
   │                              (ephpm-query-stats)                        │
   │                                       │                                 │
   │                                       ▼                                 │
   │                              Prometheus /metrics                        │
   │                                                                         │
   └─────────────────────────────────────────────────────────────────────────┘

Architecture

Crate: ephpm-query-stats

New workspace crate. No dependency on litewire, ephpm-db, or any runtime-specific code. Depends only on dashmap, parking_lot, tracing, metrics.

crates/ephpm-query-stats/
    src/
        lib.rs          # QueryStats, DigestEntry, public API
        digest.rs       # SQL normalization and hashing
        prometheus.rs   # Metrics registration and recording

Core API

use std::sync::Arc;
use std::time::Duration;

/// Shared query stats collector.
///
/// Thread-safe, cheaply cloneable. Create one per ePHPm process and pass
/// it to both the DB Proxy and LiteWire integration.
#[derive(Clone)]
pub struct QueryStats {
    entries: Arc<DashMap<u64, DigestEntry>>,
    config: StatsConfig,
}

pub struct StatsConfig {
    /// Queries slower than this are logged at WARN level.
    /// Default: 1 second.
    pub slow_query_threshold: Duration,

    /// Maximum number of distinct query digests to track.
    /// Prevents unbounded memory growth from unique queries.
    /// Default: 100,000.
    pub max_digests: usize,

    /// Whether to record per-digest Prometheus histograms.
    /// Default: true.
    pub prometheus_enabled: bool,
}

impl QueryStats {
    /// Record a completed query.
    ///
    /// Called by both the DB Proxy (after forwarding) and the TrackedBackend
    /// (wrapping LiteWire's rusqlite/hrana-client backend).
    pub fn record(&self, sql: &str, duration: Duration, success: bool);

    /// Get the digest entry for a normalized query, if tracked.
    pub fn get(&self, digest_id: u64) -> Option<DigestEntry>;

    /// Snapshot of all tracked digests, sorted by total time descending.
    pub fn top_queries(&self, limit: usize) -> Vec<DigestEntry>;

    /// Reset all counters. Useful for periodic reporting windows.
    pub fn reset(&self);
}

Query Digest

The digest system normalizes SQL queries so that queries differing only in literal values map to the same digest. This is how MySQL’s performance_schema.events_statements_summary_by_digest works.

/// Normalize a SQL query for digest computation.
///
/// Replaces literal values with `?` placeholders so that
/// `SELECT * FROM users WHERE id = 42` and
/// `SELECT * FROM users WHERE id = 99` produce the same digest.
pub fn normalize(sql: &str) -> String;

/// Compute a 64-bit hash of the normalized SQL.
pub fn digest_id(normalized: &str) -> u64;

Normalization Rules

InputNormalized
SELECT * FROM users WHERE id = 42SELECT * FROM users WHERE id = ?
INSERT INTO t VALUES (1, 'hello', 3.14)INSERT INTO t VALUES (?, ?, ?)
WHERE name = 'Alice' AND age > 30WHERE name = ? AND age > ?
WHERE id IN (1, 2, 3, 4, 5)WHERE id IN (?, ...)
SELECT 1SELECT ?
Comments stripped/* ... */ and -- ... removed
Whitespace collapsedMultiple spaces/newlines become single space

The normalizer operates on raw SQL strings (not parsed ASTs) for performance. It uses a simple state machine that walks the SQL character by character, detecting quoted strings, numbers, and comments. This is the same approach MySQL and ProxySQL use – full parsing would be correct but expensive for a hot path.

Normalization Implementation

/// State machine for SQL normalization.
enum State {
    Normal,
    SingleQuotedString,
    DoubleQuotedString,
    BacktickIdentifier,
    LineComment,
    BlockComment,
    Number,
}

Walk the input character by character:

  1. Quoted strings ('hello', "world") – replace entire string with ?
  2. Numbers (42, 3.14, -1, 0xFF) – replace with ?
  3. IN lists (IN (?, ?, ?, ?)) – collapse to IN (?, ...)
  4. Comments – strip entirely
  5. Whitespace – collapse to single space
  6. Everything else – pass through (keywords, identifiers, operators)

Edge cases:

  • Escaped quotes inside strings: 'it''s' is one string, not two
  • Negative numbers: -42 – the - is a unary operator, 42 is the number
  • Hex literals: 0xDEAD – replace with ?
  • Backtick identifiers: `table_name` – pass through (not a value)
  • NULL keyword – do NOT normalize (it’s semantically meaningful, not a literal value)

Digest Entry

/// Accumulated statistics for a single query digest.
#[derive(Clone, Debug)]
pub struct DigestEntry {
    /// The 64-bit digest hash.
    pub digest_id: u64,

    /// The normalized SQL string (with `?` placeholders).
    pub digest_text: String,

    /// An example of the original (non-normalized) SQL.
    /// Updated periodically, not on every call (to avoid allocation pressure).
    pub example_sql: String,

    /// Total number of executions.
    pub count: u64,

    /// Number of failed executions (backend returned an error).
    pub error_count: u64,

    /// Total wall-clock time spent executing this query.
    pub total_time: Duration,

    /// Minimum execution time observed.
    pub min_time: Duration,

    /// Maximum execution time observed.
    pub max_time: Duration,

    /// Sum of rows returned (for queries) or affected (for mutations).
    pub total_rows: u64,

    /// Timestamp of the first execution.
    pub first_seen: Instant,

    /// Timestamp of the most recent execution.
    pub last_seen: Instant,
}

Internal Storage

/// DashMap keyed by digest_id (u64 hash).
///
/// Each shard holds a DigestEntry that is updated atomically.
/// DashMap provides concurrent read/write access without a global lock.
entries: Arc<DashMap<u64, DigestEntry>>

Why DashMap and not a Mutex<HashMap>:

  • The DB Proxy and LiteWire run on separate tokio tasks. Both call record() concurrently.
  • DashMap shards the map internally, so concurrent writes to different digests don’t contend.
  • ePHPm already depends on dashmap (used by ephpm-kv).

Memory bound: max_digests (default 100,000) caps the number of entries. When full, new digests are silently dropped (existing entries continue to update). This prevents OOM from applications that generate unique SQL per request (e.g., ORMs with inline literals).

Slow Query Logging

When record() observes duration > slow_query_threshold:

WARN query.slow: sql="SELECT * FROM users WHERE email = ?" duration=2.3s rows=1 digest=0xABCD1234

The log includes the normalized SQL (no PII from literal values) and the duration. The digest field lets operators correlate with Prometheus metrics.

The threshold is configurable via TOML:

[db]
slow_query_threshold = "1s"

Prometheus Metrics

Registered once at startup via the metrics crate (same pattern as existing ePHPm metrics):

MetricTypeLabelsDescription
ephpm_query_duration_secondsHistogramdigest, kindPer-digest execution time distribution
ephpm_query_totalCounterdigest, kind, statusTotal query count (success/error)
ephpm_query_rows_totalCounterdigest, kindTotal rows returned/affected
ephpm_query_slow_totalCounterQueries exceeding the slow threshold
ephpm_query_active_digestsGaugeNumber of distinct digests being tracked

Labels:

  • digest – first 16 chars of the normalized SQL (truncated for cardinality control)
  • kindquery or mutation (determined by first keyword: SELECT vs INSERT/UPDATE/DELETE)
  • statusok or error

Cardinality control: The digest label uses a truncated prefix, not the full SQL or hash. This caps Prometheus label cardinality at max_digests. Applications generating millions of unique queries won’t explode the metrics store.

Alternative: use the digest hash as label instead of truncated SQL. More compact but harder to read in dashboards. Make this configurable.

LiteWire Integration: TrackedBackend

A decorator that wraps any litewire_backend::Backend implementation, recording stats before delegating.

use litewire_backend::{Backend, BackendError, ExecuteResult, ResultSet, Value};

/// A backend wrapper that records query stats.
pub struct TrackedBackend<B> {
    inner: B,
    stats: QueryStats,
}

impl<B> TrackedBackend<B> {
    pub fn new(inner: B, stats: QueryStats) -> Self {
        Self { inner, stats }
    }
}

#[async_trait::async_trait]
impl<B: Backend> Backend for TrackedBackend<B> {
    async fn query(&self, sql: &str, params: &[Value]) -> Result<ResultSet, BackendError> {
        let start = std::time::Instant::now();
        let result = self.inner.query(sql, params).await;
        let duration = start.elapsed();
        let rows = result.as_ref().map_or(0, |rs| rs.rows.len() as u64);
        self.stats.record_query(sql, duration, result.is_ok(), rows);
        result
    }

    async fn execute(&self, sql: &str, params: &[Value]) -> Result<ExecuteResult, BackendError> {
        let start = std::time::Instant::now();
        let result = self.inner.execute(sql, params).await;
        let duration = start.elapsed();
        let rows = result.as_ref().map_or(0, |r| r.affected_rows);
        self.stats.record_mutation(sql, duration, result.is_ok(), rows);
        result
    }
}

This lives in ephpm-server (not in ephpm-query-stats) because it depends on both litewire_backend::Backend and QueryStats. The stats crate stays runtime-agnostic.

Wiring in ephpm-server

// In start_litewire():
let stats = query_stats.clone(); // shared with DB proxy
let db = Rusqlite::open(&config.path)?;
let tracked = TrackedBackend::new(db, stats);
LiteWire::new(tracked).mysql("...").serve().await

DB Proxy Integration

The DB Proxy calls stats.record() directly from its query routing code. No wrapper needed – the proxy already intercepts every query for R/W classification.

// In ephpm-db mysql proxy, after forwarding:
let start = Instant::now();
let result = forward_to_backend(sql, &mut conn).await;
let duration = start.elapsed();
stats.record(sql, duration, result.is_ok(), rows_affected);

The QueryStats instance is passed into the proxy at construction time (from start_db_proxies).

Configuration

[db]
# Slow query threshold (applies to both DB Proxy and LiteWire).
# Default: "1s"
slow_query_threshold = "1s"

[db.analysis]
# Maximum distinct query digests to track.
# Default: 100000
digest_max_entries = 100000

# Include query stats in Prometheus /metrics endpoint.
# Default: true
query_metrics = true

The [db.analysis] section already exists in DbConfig (currently used for planned auto-explain features). Query stats configuration fits naturally here.

Lifecycle

ephpm starts
    |
    v
QueryStats::new(config) -- creates empty DashMap
    |
    +---> passed to start_db_proxies() ---> DB Proxy calls stats.record()
    |
    +---> passed to start_litewire()   ---> TrackedBackend calls stats.record()
    |
    v
Prometheus scrapes /metrics -- reads from QueryStats
    |
    v
ephpm shuts down -- QueryStats dropped, all stats lost (ephemeral by design)

Stats are ephemeral – they exist only in memory for the lifetime of the process. This is intentional:

  • No disk I/O on the query hot path
  • No persistence complexity
  • Prometheus handles long-term storage and alerting
  • Process restart gives a clean slate (same as MySQL’s performance_schema after restart)

Top Queries Endpoint

Optional: expose a /__/queries admin endpoint (alongside the existing /__/metrics) that returns the top N queries by total time. Useful for quick debugging without Prometheus.

GET /__/queries?limit=20&sort=total_time

[
  {
    "digest": "SELECT * FROM users WHERE id = ?",
    "count": 45821,
    "error_count": 3,
    "total_time_ms": 12340,
    "avg_time_ms": 0.27,
    "max_time_ms": 45.2,
    "rows_total": 45818,
    "first_seen": "2024-01-15T10:30:00Z",
    "last_seen": "2024-01-15T14:22:11Z"
  },
  ...
]

Sort options: total_time (default), count, avg_time, max_time, error_count.

Testing Strategy

  1. Unit tests (ephpm-query-stats):

    • Normalization: literals, strings, numbers, IN lists, comments, whitespace
    • Digest hashing: same SQL produces same hash, different SQL produces different hash
    • Stats recording: counts, timing aggregation, max/min tracking
    • Memory bound: max_digests enforcement
    • Slow query detection: threshold comparison
  2. Integration tests (ephpm-server):

    • TrackedBackend wraps rusqlite, queries are recorded
    • Stats appear in Prometheus metrics output
    • Slow query threshold triggers warn log
  3. E2E tests (ephpm-e2e):

    • PHP executes queries through LiteWire
    • /metrics endpoint contains ephpm_query_duration_seconds
    • Repeated queries aggregate under the same digest

Dependencies

CratePurposeAlready in workspace?
dashmapConcurrent digest mapYes (ephpm-kv)
parking_lotAtomic entry updatesYes
tracingSlow query loggingYes
metricsPrometheus integrationYes

No new external dependencies required.

Implementation Phases

PhaseScopeMilestone
1ephpm-query-stats crate: normalizer, digest, QueryStats struct, record()Stats recording works
2Slow query logging (tracing warn)Operators get alerts
3TrackedBackend in ephpm-server, LiteWire integrationLiteWire queries tracked
4DB Proxy integration (pass QueryStats to proxy)Both paths tracked
5Prometheus metrics registrationDashboards work
6/__/queries admin endpointQuick debugging
7Config integration ([db.analysis] fields)User-configurable

Phase 1-3 are the MVP. Phase 4 requires touching ephpm-db internals. Phase 5-7 are polish.

Prior Art

ProjectApproach
MySQL performance_schemaIn-process digest tracking, same normalization approach
ProxySQLProxy-level query digest with stats table, regex-based normalization
PgBouncerNo query-level stats (connection-level only)
pgDogQuery digest with Prometheus export

The normalizer design follows MySQL’s performance_schema approach (character-level state machine) rather than ProxySQL’s regex approach (fragile with complex SQL).