feat(profile): user-profile feature registry + builder (phase A)

Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns
both definition and SQL aggregation, since the data lives in the TS-owned
SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend
latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score and /generate but does not yet
consume them: extending the bandit feature vector changes D and resets every
user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror, with a sync test
that diffs name sets against registry.ts. ADR-0011 records the data-locality
reasoning (registry in TS, not Python as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption with
state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
docs/adr/0011-user-profile-features.md
# ADR-0011 — User-profile feature registry

**Status:** Accepted (phase A)

**Date:** 2026-04-25

**Issue:** #81

## Context

The bandit and the LLM tip generator only saw per-candidate features (`is_overdue`,
`task_age_days`, `priority`) plus contextual time signals. There was no notion
of a *user-level* profile (completion rate, dismiss rate, preferred hour, tip
volume), even though all the raw data already lives in `tip_views`,
`tip_feedback`, and `tip_scores`.

#81 originally proposed putting the feature registry in `ml/features/` (Python).
We chose otherwise for data-locality reasons: the aggregations are SQL queries
against tables owned by `services/api`. Computing them in Python would add a
network round-trip to every recommendation, for queries that take under a
millisecond when run in-process in TS.

## Decision

Two-sided design with one source of truth:

- **`services/api/src/profile/registry.ts`**: the *source of truth*. Each
  `FeatureDefinition` declares `{ name, dtype, ttlSec, description, compute }`.
  `compute(userId, sqlite)` runs the aggregation SQL directly via the raw
  better-sqlite3 client.
- **`services/api/src/profile/builder.ts`**: `getProfile(userId)` returns the
  full feature dict, lazily recomputing any entry whose stored row is older
  than its `ttlSec`. `rebuildProfile(userId)` force-refreshes every feature.
- **`user_profile_features` table**: a KV store keyed by `(user_id, name)`, with
  `value` (REAL) for numeric features and `value_text` (TEXT) for categorical
  ones. Phase A ships only numeric features.
- **`ml/features/profile_schema.py`**: the *contract mirror*. Names, dtypes, and
  descriptions only, no compute. A test reads the TS file and asserts that the
  name sets match, catching drift.
- **`POST /score` and `POST /generate`** in `ml/serving` accept an optional
  `profile_features: dict | None`. The value is stored on the request object
  but is **not consumed by the bandit yet**: extending the feature vector
  changes `D` and resets every user's learned state. That is a deliberate
  phase-B decision.
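
A minimal sketch of how the registry entries and the lazy-TTL builder could fit together. It assumes a `Map` standing in for the `user_profile_features` table, an injectable clock, and a structural `SqlClient` type that mirrors better-sqlite3's `db.prepare(sql).get(...params)`. `ProfileStore`, `ProfileBuilder`, `StoredRow`, and `SqlClient` are hypothetical names; only `FeatureDefinition`, its fields, `getProfile`, and `rebuildProfile` come from this ADR.

```typescript
// Hypothetical sketch of registry.ts + builder.ts. A Map stands in for the
// user_profile_features table; only numeric features are modeled (as in phase A).

type Dtype = "numeric" | "categorical";

// Structural subset of the better-sqlite3 client (assumption): db.prepare(sql).get(...).
interface SqlClient {
  prepare(sql: string): { get(...params: unknown[]): unknown };
}

interface FeatureDefinition {
  name: string;
  dtype: Dtype;
  ttlSec: number;
  description: string;
  compute: (userId: string, sqlite: SqlClient) => number;
}

interface StoredRow {
  value: number;
  computedAtMs: number;
}

// In-memory stand-in for the KV table keyed by (user_id, name).
class ProfileStore {
  private rows = new Map<string, StoredRow>();

  get(userId: string, name: string): StoredRow | undefined {
    return this.rows.get(`${userId}\u0000${name}`);
  }

  put(userId: string, name: string, row: StoredRow): void {
    this.rows.set(`${userId}\u0000${name}`, row);
  }
}

class ProfileBuilder {
  private registry: readonly FeatureDefinition[];
  private sqlite: SqlClient;
  private store: ProfileStore;
  private now: () => number;

  constructor(
    registry: readonly FeatureDefinition[],
    sqlite: SqlClient,
    store: ProfileStore,
    now: () => number = () => Date.now(),
  ) {
    this.registry = registry;
    this.sqlite = sqlite;
    this.store = store;
    this.now = now;
  }

  // Returns every feature, recomputing only rows that are missing or past ttlSec.
  getProfile(userId: string): Record<string, number> {
    return this.build(userId, false);
  }

  // Force-refreshes every feature regardless of age.
  rebuildProfile(userId: string): Record<string, number> {
    return this.build(userId, true);
  }

  private build(userId: string, force: boolean): Record<string, number> {
    const nowMs = this.now();
    const profile: Record<string, number> = {};
    for (const def of this.registry) {
      const row = this.store.get(userId, def.name);
      if (!force && row !== undefined && nowMs - row.computedAtMs < def.ttlSec * 1000) {
        profile[def.name] = row.value; // still fresh: no query issued
        continue;
      }
      const value = def.compute(userId, this.sqlite);
      this.store.put(userId, def.name, { value, computedAtMs: nowMs });
      profile[def.name] = value;
    }
    return profile;
  }
}
```

Because a fresh row short-circuits `compute`, a steady-state `getProfile` call issues no queries. The worst case is one query per registered feature.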

Initial features: `completion_rate_30d`, `dismiss_rate_30d`,
`mean_dwell_ms_30d`, `preferred_hour`, `tip_volume_30d`.
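
The next example shows how one `compute` could look for `completion_rate_30d`, as a single statement run through the in-process client. The table and column names (`tip_feedback.action = 'complete'`, `tip_feedback.created_at`, `tip_views.viewed_at`, epoch-millisecond timestamps) are assumptions, not the real schema. Returning 0 for a user with no views is also an assumed convention. The optional `nowMs` parameter is a hypothetical addition for testability; the call still matches the `compute(userId, sqlite)` shape.

```typescript
// Hypothetical compute for completion_rate_30d. Schema details are assumed.

// Structural subset of the better-sqlite3 API: db.prepare(sql).get(...params).
interface SqlClient {
  prepare(sql: string): { get(...params: unknown[]): unknown };
}

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // trailing 30-day window

const COMPLETION_RATE_SQL = `
  SELECT
    (SELECT COUNT(*) FROM tip_feedback
       WHERE user_id = ? AND action = 'complete' AND created_at >= ?) AS completions,
    (SELECT COUNT(*) FROM tip_views
       WHERE user_id = ? AND viewed_at >= ?) AS views
`;

// Completions divided by views over the window; 0 when the user saw no tips (assumed).
function computeCompletionRate30d(
  userId: string,
  sqlite: SqlClient,
  nowMs: number = Date.now(),
): number {
  const since = nowMs - WINDOW_MS;
  const row = sqlite.prepare(COMPLETION_RATE_SQL).get(userId, since, userId, since) as
    | { completions: number; views: number }
    | undefined;
  if (!row || row.views === 0) return 0;
  return row.completions / row.views;
}
```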

## Consequences

**Good:**

- Adding a feature means one entry in `registry.ts` plus one mirror line in
  `profile_schema.py`. No DB migration is required, because the table is KV.
- TTL keeps recommendation latency bounded: each recommend call refreshes at
  most 5 features (only those whose rows are past their TTL), each with a
  single indexed query against an already-warm DB.
- Profile data is now visible to `ml/serving` via the request payload, so eval
  harnesses and the LLM tip generator can use it without a DB round-trip.
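
Because adding a feature touches two files, the mirror test is what keeps them aligned. Its core reduces to two steps: extract the feature names from `registry.ts`, then take a two-way set difference against the Python-side names. The real test is Python; this TypeScript sketch only illustrates the logic. `extractRegistryNames`, `diffNameSets`, and `DriftReport` are hypothetical names, and the regex assumes registry entries are written as `name: "..."`.

```typescript
// Hypothetical sketch of the registry/mirror drift check.

interface DriftReport {
  missingInMirror: string[]; // declared in registry.ts, absent from profile_schema.py
  missingInRegistry: string[]; // declared in profile_schema.py, absent from registry.ts
}

// Heuristic: pull snake_case names out of `name: "..."` / `name: '...'` entries.
function extractRegistryNames(tsSource: string): Set<string> {
  const names = new Set<string>();
  for (const m of tsSource.matchAll(/\bname:\s*["']([a-z0-9_]+)["']/g)) {
    const name = m[1];
    if (name) names.add(name);
  }
  return names;
}

// Two-way set difference; both lists empty means the contract is in sync.
function diffNameSets(registry: Iterable<string>, mirror: Iterable<string>): DriftReport {
  const r = new Set(registry);
  const m = new Set(mirror);
  return {
    missingInMirror: Array.from(r).filter((n) => !m.has(n)).sort(),
    missingInRegistry: Array.from(m).filter((n) => !r.has(n)).sort(),
  };
}
```

Reporting both directions separately, rather than a single boolean, tells the author which file is missing the entry.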

**Trade-offs:**

- TS owns compute, so ml-side changes that need new features still require a
  TS PR. This is acceptable while the modular monolith holds; if `ml/serving`
  becomes the system of record for any feature, it should own its own table.
- TTL-based refresh lags user-visible behavior changes by up to `ttlSec`.
  Phase B replaces this with event-driven incremental updates subscribing to
  `signals.tip.feedback`.

## Phase B (deferred)

- Subscribe to `signals.tip.feedback` for incremental updates instead of TTL.
- Extend the bandit feature vector to include profile features (deliberate
  `D` change with state-migration plan).
- Admin page: per-user profile view + manual rebuild button.
- Staleness/data-quality alerts in `/admin/data-quality`.

## Alternatives considered

**Registry in Python (per the original issue text)**: rejected. The
aggregations run against TS-owned tables, so round-tripping on every recommend
call adds latency for no architectural gain.

**Compute inline in the recommender route**: rejected. Features would be
recomputed on every recommendation, with no caching or staleness semantics.

**Use `tip_scores.featuresJson` as the profile store**: rejected. That column
holds per-tip explainability data, not per-user state; mixing the two
complicates reads of both.