feat(profile): user-profile feature registry + builder (phase A)

Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not yet consume them: extending the bandit feature vector changes D and resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption with state migration, admin per-user profile page, staleness alerts.

Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:22:22 +00:00
parent 430804e9a5
commit 7d4c29e137
13 changed files with 636 additions and 2 deletions
--- a/docs/adr/0011-user-profile-features.md
+++ b/docs/adr/0011-user-profile-features.md
@@ -0,0 +1,82 @@
+# ADR-0011 — User-profile feature registry
+
+**Status:** Accepted (phase A)
+**Date:** 2026-04-25
+**Issue:** #81
+
+## Context
+
+The bandit and LLM tip generator only saw per-candidate features (`is_overdue`,
+`task_age_days`, `priority`) plus contextual time signals. There was no notion
+of a *user-level* profile — completion rate, dismiss rate, preferred hour, tip
+volume — even though all the raw data already lives in `tip_views`,
+`tip_feedback`, and `tip_scores`.
+
+#81 originally proposed putting the feature registry in `ml/features/` (Python).
+We chose TS instead for data-locality reasons: the aggregations are SQL queries
+against tables owned by `services/api`. Computing them in Python would add a
+network round-trip per recommendation for queries that are sub-ms in TS.
+
+## Decision
+
+Two-sided design with one source of truth:
+
+- **`services/api/src/profile/registry.ts`** — *source of truth*. Each
+  `FeatureDefinition` declares `{ name, dtype, ttlSec, description, compute }`.
+  `compute(userId, sqlite)` runs the aggregation SQL directly via the raw
+  better-sqlite3 client.
+- **`services/api/src/profile/builder.ts`** — `getProfile(userId)` returns the
+  full feature dict, lazily recomputing any entry whose stored row is past its
+  `ttlSec`. `rebuildProfile(userId)` force-refreshes everything.
+- **`user_profile_features` table** — KV per `(user_id, name)` with `value`
+  (REAL) for numeric and `value_text` (TEXT) for categorical. Phase A
+  ships only numeric features.
+- **`ml/features/profile_schema.py`** — *contract mirror*. Names, dtypes, and
+  descriptions only — no compute. A test reads the TS file and asserts the
+  name sets match, catching drift.
+- **`POST /score` and `POST /generate`** in `ml/serving` accept an optional
+  `profile_features: dict | None`. Stored on the request object but **not
+  consumed by the bandit yet** — extending the feature vector changes `D` and
+  resets every user's learned state. That's a deliberate phase-B decision.
+
+Initial features: `completion_rate_30d`, `dismiss_rate_30d`,
+`mean_dwell_ms_30d`, `preferred_hour`, `tip_volume_30d`.
+
+## Consequences
+
+**Good:**
+- Adding a feature = one entry in `registry.ts` + one mirror line in
+  `profile_schema.py`. No DB migration required (KV table).
+- TTL keeps recommendation latency bounded: every recommend call refreshes at
+  most 5 features, each a single indexed query against an already-warm DB.
+- Profile data is now visible to ml/serving via the request payload — eval
+  harnesses and the LLM tip generator can use it without a DB round-trip.
+
+**Trade-offs:**
+- TS owns compute → ml-side changes that need new features still require a
+  TS PR. Acceptable while the modular monolith holds; if `ml/serving`
+  becomes the system of record for any feature, it should own its own table.
+- TTL-based refresh means a change in user behavior can take up to `ttlSec`
+  to appear in the profile. Phase B replaces this with event-driven
+  incremental updates subscribing to `signals.tip.feedback`.
+
+## Phase B (deferred)
+
+- Subscribe to `signals.tip.feedback` for incremental updates instead of TTL.
+- Extend the bandit feature vector to include profile features (deliberate
+  `D` change with state-migration plan).
+- Admin page: per-user profile view + manual rebuild button.
+- Staleness/data-quality alerts in `/admin/data-quality`.
+
+## Alternatives considered
+
+**Registry in Python (per the original issue text)** — rejected: the
+aggregations live in TS-owned tables; round-tripping per recommend adds
+latency for no architectural gain.
+
+**Compute in the recommender route inline** — rejected: features would be
+recomputed on every recommendation with no cache or staleness semantics.
+
+**Use `tip_scores.featuresJson` as the profile store** — rejected: that
+column is per-tip explainability, not per-user state. Mixing them complicates
+both reads.
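
The lazy TTL refresh that the ADR attributes to `getProfile(userId)` can be illustrated with a minimal sketch. This is not the code in `builder.ts`: the `StoredRow` type, the in-memory `store` map (standing in for the `user_profile_features` table), the `dtype` literals, and the extra `registry` and `nowMs` parameters are all assumptions made so the example runs on its own, and `compute` omits the `sqlite` argument for the same reason.

```typescript
// Hypothetical shapes: the ADR names these fields, but the exact types are assumed.
type FeatureDefinition = {
  name: string;
  dtype: "float" | "int"; // assumed literals; the ADR does not list dtype values
  ttlSec: number;
  description: string;
  // The ADR's compute also receives the sqlite client; omitted in this sketch.
  compute: (userId: string) => number;
};

// Assumed row shape: a value plus the time it was computed.
type StoredRow = { value: number; computedAtMs: number };

// In-memory stand-in for the user_profile_features KV table, keyed by "userId:name".
const store = new Map<string, StoredRow>();

function getProfile(
  userId: string,
  registry: FeatureDefinition[],
  nowMs: number = Date.now(),
): Record<string, number> {
  const profile: Record<string, number> = {};
  for (const def of registry) {
    const key = `${userId}:${def.name}`;
    const row = store.get(key);

    // Serve the stored value while it is within its TTL.
    if (row !== undefined && nowMs - row.computedAtMs <= def.ttlSec * 1000) {
      profile[def.name] = row.value;
      continue;
    }

    // Missing or stale: recompute this one feature and persist the fresh value.
    const value = def.compute(userId);
    store.set(key, { value, computedAtMs: nowMs });
    profile[def.name] = value;
  }
  return profile;
}
```

Under these assumptions, a call recomputes only the entries that are missing or past `ttlSec`, so the per-call work is bounded by the registry size, which matches the "at most 5 features" bound stated under Consequences. A force refresh in the spirit of `rebuildProfile(userId)` would skip the freshness check and recompute every entry.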