ADR-0011 — User-profile feature registry
Status: Accepted (phase A)
Date: 2026-04-25
Issue: #81
Context
The bandit and LLM tip generator only saw per-candidate features (is_overdue,
task_age_days, priority) plus contextual time signals. There was no notion
of a user-level profile — completion rate, dismiss rate, preferred hour, tip
volume — even though all the raw data already lives in tip_views,
tip_feedback, and tip_scores.
#81 originally proposed putting the feature registry in ml/features/ (Python).
We chose TypeScript instead, for data locality: the aggregations are SQL
queries against tables owned by services/api. Computing them in Python would
add a network round-trip per recommendation for queries that are sub-ms in TS.
Decision
Two-sided design with one source of truth:
- services/api/src/profile/registry.ts: source of truth. Each FeatureDefinition declares { name, dtype, ttlSec, description, compute }. compute(userId, sqlite) runs the aggregation SQL directly via the raw better-sqlite3 client.
- services/api/src/profile/builder.ts: getProfile(userId) returns the full feature dict, lazily recomputing any entry whose stored row is past its ttlSec. rebuildProfile(userId) force-refreshes everything.
- user_profile_features table: KV per (user_id, name) with value (REAL) for numeric and value_text (TEXT) for categorical. Phase A ships only numeric features.
- ml/features/profile_schema.py: contract mirror. Names, dtypes, and descriptions only; no compute. A test reads the TS file and asserts the name sets match, catching drift.
- POST /score and POST /generate in ml/serving accept an optional profile_features: dict | None. Stored on the request object but not consumed by the bandit yet: extending the feature vector changes D and resets every user's learned state. That's a deliberate phase-B decision.
Initial features: completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d.
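The lazy TTL refresh described for getProfile can be sketched roughly as follows. This is a minimal illustration, not the code in builder.ts: the Dtype values, the StoredRow shape, the in-memory ProfileStore (standing in for the user_profile_features table), and the extra registry, store, db, and nowMs parameters are all assumptions made so the example runs on its own.

```typescript
// Hypothetical dtype labels; the real registry may use different ones.
type Dtype = "float" | "int" | "category";

// Mirrors the fields the ADR lists for a FeatureDefinition.
// Db is generic so the sketch does not depend on better-sqlite3.
interface FeatureDefinition<Db> {
  name: string;
  dtype: Dtype;
  ttlSec: number;
  description: string;
  compute: (userId: string, db: Db) => number;
}

// Assumed row shape: a numeric value plus the time it was computed.
interface StoredRow {
  value: number;
  computedAtMs: number;
}

// In-memory stand-in for the KV table, keyed by (userId, name).
class ProfileStore {
  private rows = new Map<string, StoredRow>();

  get(userId: string, name: string): StoredRow | undefined {
    return this.rows.get(`${userId}:${name}`);
  }

  put(userId: string, name: string, row: StoredRow): void {
    this.rows.set(`${userId}:${name}`, row);
  }
}

// Returns the full feature dict, recomputing only entries that are
// missing or older than their ttlSec.
function getProfile<Db>(
  registry: FeatureDefinition<Db>[],
  store: ProfileStore,
  db: Db,
  userId: string,
  nowMs: number,
): Record<string, number> {
  const profile: Record<string, number> = {};
  for (const def of registry) {
    const row = store.get(userId, def.name);
    if (row !== undefined && nowMs - row.computedAtMs <= def.ttlSec * 1000) {
      profile[def.name] = row.value; // still fresh: serve the stored value
      continue;
    }
    const value = def.compute(userId, db); // missing or stale: recompute
    store.put(userId, def.name, { value, computedAtMs: nowMs });
    profile[def.name] = value;
  }
  return profile;
}
```

Passing nowMs explicitly keeps the staleness check deterministic in tests; a production builder would more likely read the clock itself.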
Consequences
Good:
- Adding a feature = one entry in registry.ts + one mirror line in profile_schema.py. No DB migration required (KV table).
- TTL keeps recommendation latency bounded: every recommend call refreshes at most 5 features, each a single indexed query against an already-warm DB.
- Profile data is now visible to ml/serving via the request payload — eval harnesses and the LLM tip generator can use it without a DB round-trip.
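The drift check that guards the two-file contract reduces to a set comparison of feature names. The ADR's actual test reads the TS file and compares it with the Python schema; the sketch below only illustrates the comparison step, in TypeScript, and the diffFeatureNames helper and its return shape are hypothetical.

```typescript
// Names that appear on only one side of the registry/schema contract.
interface NameDrift {
  missingInPython: string[];
  missingInTs: string[];
}

// Compares the feature names declared in the TS registry with those in
// the Python mirror. Both result lists are empty when the sets match.
function diffFeatureNames(
  tsNames: Iterable<string>,
  pyNames: Iterable<string>,
): NameDrift {
  const ts = new Set(tsNames);
  const py = new Set(pyNames);
  return {
    missingInPython: Array.from(ts).filter((n) => !py.has(n)).sort(),
    missingInTs: Array.from(py).filter((n) => !ts.has(n)).sort(),
  };
}
```

A test built on this would fail whenever either list is non-empty, which catches a feature added to one file but not the other.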
Trade-offs:
- TS owns compute → ml-side changes that need new features still require a TS PR. Acceptable while the modular monolith holds; if ml/serving becomes the system of record for any feature, it should own its own table.
- TTL-based refresh has up-to-ttlSec lag on user-visible behavior change. Phase B replaces this with event-driven incremental updates subscribing to signals.tip.feedback.
Phase B
- ✅ B.1: Per-user profile view + rebuild action in /admin/users/:id.
- ✅ B.2: Event-driven invalidation. Features declare invalidatedBy subjects in the registry; profile/subscriber.ts deletes the affected stored rows on publish so the next getProfile call recomputes immediately rather than waiting up to ttlSec. TTL stays as a safety net for clock drift / dropped events.
- ✅ B.4: Staleness panel in /admin/data-quality (counts missing + stale per feature across eligible users).
- ⏳ B.3: Extend the bandit feature vector to include profile features (deliberate D change with state-migration plan + shadow rollout per ADR-0002). Tracked separately as #99 since it's a multi-step initiative, not an incremental phase.
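The B.2 invalidation flow can be pictured as a subject-to-features index plus a delete on publish. The sketch below is an assumption-laden illustration rather than the contents of profile/subscriber.ts: the InvalidatingFeature type, both function names, and the Map standing in for stored rows are hypothetical, and which features a given subject invalidates is not stated in this ADR.

```typescript
// Only the fields needed for invalidation; a real registry entry has more.
interface InvalidatingFeature {
  name: string;
  invalidatedBy: string[]; // event subjects, e.g. "signals.tip.feedback"
}

// Builds a subject -> feature names index once, at startup.
function buildInvalidationIndex(
  features: InvalidatingFeature[],
): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const feature of features) {
    for (const subject of feature.invalidatedBy) {
      const names = index.get(subject) ?? [];
      names.push(feature.name);
      index.set(subject, names);
    }
  }
  return index;
}

// On publish, deletes the stored rows for the affected user so the next
// profile read recomputes them. Returns the number of rows deleted.
// `stored` is an in-memory stand-in keyed by "userId:name".
function invalidateOnEvent(
  index: Map<string, string[]>,
  stored: Map<string, number>,
  subject: string,
  userId: string,
): number {
  let deleted = 0;
  for (const name of index.get(subject) ?? []) {
    if (stored.delete(`${userId}:${name}`)) {
      deleted += 1;
    }
  }
  return deleted;
}
```

Deleting rows instead of recomputing inside the subscriber keeps the event handler cheap; the cost of the aggregation is paid only if the profile is read again.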
Alternatives considered
Registry in Python (per the original issue text) — rejected: the aggregations live in TS-owned tables; round-tripping per recommend adds latency for no architectural gain.
Compute in the recommender route inline — rejected: features would be recomputed on every recommendation with no cache or staleness semantics.
Use tip_scores.featuresJson as the profile store — rejected: that
column is per-tip explainability, not per-user state. Mixing them complicates
both reads.