Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not yet consume them: extending the bandit feature vector changes D and resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption with state migration, admin per-user profile page, staleness alerts.

Refs #81.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ADR-0011 — User-profile feature registry
Status: Accepted (phase A) Date: 2026-04-25 Issue: #81
Context
The bandit and LLM tip generator only saw per-candidate features (is_overdue,
task_age_days, priority) plus contextual time signals. There was no notion
of a user-level profile — completion rate, dismiss rate, preferred hour, tip
volume — even though all the raw data already lives in tip_views,
tip_feedback, and tip_scores.
#81 originally proposed putting the feature registry in ml/features/ (Python).
We chose TS instead for data locality: the aggregations are SQL queries
against tables owned by services/api. Computing them in Python would add a
network round-trip per recommendation for queries that are sub-ms in TS.
Decision
Two-sided design with one source of truth:
- services/api/src/profile/registry.ts: source of truth. Each FeatureDefinition declares { name, dtype, ttlSec, description, compute }. compute(userId, sqlite) runs the aggregation SQL directly via the raw better-sqlite3 client.
- services/api/src/profile/builder.ts: getProfile(userId) returns the full feature dict, lazily recomputing any entry whose stored row is past its ttlSec. rebuildProfile(userId) force-refreshes everything.
- user_profile_features table: KV per (user_id, name) with value (REAL) for numeric and value_text (TEXT) for categorical. Phase A ships only numeric features.
- ml/features/profile_schema.py: contract mirror. Names, dtypes, and descriptions only, no compute. A test reads the TS file and asserts the name sets match, catching drift.
- POST /score and POST /generate in ml/serving accept an optional profile_features: dict | None. Stored on the request object but not consumed by the bandit yet: extending the feature vector changes D and resets every user's learned state. That's a deliberate phase-B decision.
Initial features: completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d.
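The lazy TTL refresh described for builder.ts can be sketched roughly as follows. This is a Python rendering for illustration only, since the real registry and builder are TypeScript on better-sqlite3. The computed_at column, the tip_feedback columns (user_id, action, created_at), the 'completed' action value, the one-hour TTL, and the snake_case function names are all assumptions; the ADR does not spell them out.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    dtype: str            # phase A ships numeric features only
    ttl_sec: int
    description: str
    compute: Callable[[str, sqlite3.Connection], float]


def completion_rate_30d(user_id: str, db: sqlite3.Connection) -> float:
    # Assumed tip_feedback schema: (user_id, action, created_at as epoch seconds).
    row = db.execute(
        "SELECT AVG(action = 'completed') FROM tip_feedback "
        "WHERE user_id = ? AND created_at >= ?",
        (user_id, time.time() - 30 * 86400),
    ).fetchone()
    return float(row[0]) if row[0] is not None else 0.0


# One illustrative entry; the TTL value here is an assumption.
REGISTRY = [
    FeatureDefinition(
        "completion_rate_30d", "float", 3600,
        "Share of feedback rows marked completed in the last 30 days",
        completion_rate_30d,
    ),
]


def init_store(db: sqlite3.Connection) -> None:
    # KV per (user_id, name); computed_at is an assumed staleness column.
    db.execute(
        "CREATE TABLE IF NOT EXISTS user_profile_features ("
        " user_id TEXT NOT NULL, name TEXT NOT NULL,"
        " value REAL, value_text TEXT, computed_at REAL NOT NULL,"
        " PRIMARY KEY (user_id, name))"
    )


def get_profile(user_id: str, db: sqlite3.Connection,
                now: Optional[float] = None, force: bool = False) -> dict:
    """Return all features, recomputing only rows that are missing or past TTL."""
    now = time.time() if now is None else now
    profile = {}
    for feat in REGISTRY:
        row = db.execute(
            "SELECT value, computed_at FROM user_profile_features "
            "WHERE user_id = ? AND name = ?",
            (user_id, feat.name),
        ).fetchone()
        if force or row is None or now - row[1] > feat.ttl_sec:
            value = feat.compute(user_id, db)
            db.execute(
                "INSERT OR REPLACE INTO user_profile_features "
                "(user_id, name, value, computed_at) VALUES (?, ?, ?, ?)",
                (user_id, feat.name, value, now),
            )
        else:
            value = row[0]
        profile[feat.name] = value
    return profile


def rebuild_profile(user_id: str, db: sqlite3.Connection) -> dict:
    # Force-refresh every feature regardless of TTL.
    return get_profile(user_id, db, force=True)
```

Within the TTL window a call costs one primary-key lookup per feature; the aggregation query only runs when a row is missing or stale, which is what keeps the per-recommendation cost bounded.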
Consequences
Good:
- Adding a feature = one entry in registry.ts + one mirror line in profile_schema.py. No DB migration required (KV table).
- TTL keeps recommendation latency bounded: every recommend call refreshes at most 5 features, each a single indexed query against an already-warm DB.
- Profile data is now visible to ml/serving via the request payload — eval harnesses and the LLM tip generator can use it without a DB round-trip.
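The "one entry plus one mirror line" rule relies on the sync test catching drift. A minimal sketch of such a check is below, assuming registry entries are object literals with a quoted name: field. The TS excerpt, the regex, the PROFILE_SCHEMA shape, and the dtype and ttlSec values are illustrative assumptions, not the actual contents of registry.ts or profile_schema.py.

```python
import re

# Hypothetical excerpt of registry.ts; a real test would read the file from disk.
REGISTRY_TS = """
export const REGISTRY: FeatureDefinition[] = [
  { name: "completion_rate_30d", dtype: "float", ttlSec: 3600 },
  { name: "dismiss_rate_30d", dtype: "float", ttlSec: 3600 },
  { name: "mean_dwell_ms_30d", dtype: "float", ttlSec: 3600 },
  { name: "preferred_hour", dtype: "int", ttlSec: 86400 },
  { name: "tip_volume_30d", dtype: "int", ttlSec: 3600 },
];
"""

# Python-side mirror: names and dtypes only, no compute.
PROFILE_SCHEMA = {
    "completion_rate_30d": "float",
    "dismiss_rate_30d": "float",
    "mean_dwell_ms_30d": "float",
    "preferred_hour": "int",
    "tip_volume_30d": "int",
}

# Matches `name: "x"` or `name: 'x'`; the word boundary skips keys like `filename:`.
NAME_RE = re.compile(r"""\bname:\s*["']([A-Za-z0-9_]+)["']""")


def ts_feature_names(ts_source: str) -> set:
    return set(NAME_RE.findall(ts_source))


def schema_drift(ts_source: str, schema: dict) -> tuple:
    """Return (names only in TS, names only in Python); both empty means in sync."""
    ts_names = ts_feature_names(ts_source)
    py_names = set(schema)
    return ts_names - py_names, py_names - ts_names


def test_schema_matches_registry():
    missing_in_py, missing_in_ts = schema_drift(REGISTRY_TS, PROFILE_SCHEMA)
    assert not missing_in_py, f"add to profile_schema.py: {sorted(missing_in_py)}"
    assert not missing_in_ts, f"stale in profile_schema.py: {sorted(missing_in_ts)}"
```

A regex diff of name sets is deliberately shallow: it catches a forgotten mirror line but would not notice a dtype mismatch, so a stricter check would need to parse more of each entry.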
Trade-offs:
- TS owns compute, so ml-side changes that need new features still require a TS PR. Acceptable while the modular monolith holds; if ml/serving becomes the system of record for any feature, it should own its own table.
- TTL-based refresh means a change in user behavior can take up to ttlSec to show up in the profile. Phase B replaces this with event-driven incremental updates subscribing to signals.tip.feedback.
Phase B (deferred)
- Subscribe to signals.tip.feedback for incremental updates instead of TTL.
- Extend the bandit feature vector to include profile features (deliberate D change with state-migration plan).
- Admin page: per-user profile view + manual rebuild button.
- Staleness/data-quality alerts in /admin/data-quality.
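To illustrate why consuming profile features is a deliberate D change, here is a sketch that assumes a linear contextual bandit whose per-user state is a D×D matrix plus a D-vector. The ADR does not name the bandit algorithm, so BanditState, update, and feature_vector are hypothetical, and the context vector is simplified to the three per-candidate features (the real one also carries time signals).

```python
from dataclasses import dataclass, field
from typing import Optional

CANDIDATE_FEATURES = ("is_overdue", "task_age_days", "priority")
PROFILE_FEATURES = (
    "completion_rate_30d", "dismiss_rate_30d", "mean_dwell_ms_30d",
    "preferred_hour", "tip_volume_30d",
)


@dataclass
class ScoreRequest:
    candidate: dict
    # Accepted on the request in phase A, but not consumed by the bandit.
    profile_features: Optional[dict] = None


@dataclass
class BanditState:
    d: int
    A: list = field(init=False)   # d x d matrix, starts as identity
    b: list = field(init=False)   # d-vector of reward-weighted contexts

    def __post_init__(self):
        self.A = [[1.0 if i == j else 0.0 for j in range(self.d)]
                  for i in range(self.d)]
        self.b = [0.0] * self.d


def feature_vector(req: ScoreRequest, use_profile: bool = False) -> list:
    x = [float(req.candidate[k]) for k in CANDIDATE_FEATURES]
    if use_profile:
        profile = req.profile_features or {}
        x += [float(profile.get(k, 0.0)) for k in PROFILE_FEATURES]
    return x


def update(state: BanditState, x: list, reward: float) -> None:
    # Learned state is shaped by D, so a longer vector cannot be applied to it.
    if len(x) != state.d:
        raise ValueError(f"context has D={len(x)}, state was learned with D={state.d}")
    for i in range(state.d):
        state.b[i] += reward * x[i]
        for j in range(state.d):
            state.A[i][j] += x[i] * x[j]
```

Under these assumptions, state learned with D=3 has no defined meaning for an 8-dimensional context, so turning on use_profile without a migration would amount to starting every user from scratch.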
Alternatives considered
Registry in Python (per the original issue text) — rejected: the aggregations live in TS-owned tables; round-tripping per recommend adds latency for no architectural gain.
Compute in the recommender route inline — rejected: features would be recomputed on every recommendation with no cache or staleness semantics.
Use tip_scores.featuresJson as the profile store — rejected: that
column is per-tip explainability, not per-user state. Mixing them complicates
both reads.