Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not yet consume them: extending the bandit feature vector changes D and resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption with state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

83 lines
3.8 KiB
Markdown
# ADR-0011: User-profile feature registry

**Status:** Accepted (phase A)
**Date:** 2026-04-25
**Issue:** #81

## Context

Until now, the bandit and the LLM tip generator saw only per-candidate features
(`is_overdue`, `task_age_days`, `priority`) plus contextual time signals. There
was no notion of a *user-level* profile (completion rate, dismiss rate,
preferred hour, tip volume), even though all the raw data already lives in
`tip_views`, `tip_feedback`, and `tip_scores`.

#81 originally proposed putting the feature registry in `ml/features/` (Python).
We chose otherwise for data-locality reasons: the aggregations are
SQL queries against tables owned by `services/api`. Computing them in Python
would mean a network round-trip per recommendation for queries that are sub-ms in TS.

## Decision

Two-sided design with one source of truth:

- **`services/api/src/profile/registry.ts`**: the *source of truth*. Each
  `FeatureDefinition` declares `{ name, dtype, ttlSec, description, compute }`.
  `compute(userId, sqlite)` runs the aggregation SQL directly via the raw
  better-sqlite3 client.
- **`services/api/src/profile/builder.ts`**: `getProfile(userId)` returns the
  full feature dict, lazily recomputing any entry whose stored row is older
  than its `ttlSec`. `rebuildProfile(userId)` force-refreshes everything.
- **`user_profile_features` table**: one key-value row per `(user_id, name)`,
  with `value` (REAL) for numeric features and `value_text` (TEXT) for
  categorical ones. Phase A ships only numeric features.
- **`ml/features/profile_schema.py`**: the *contract mirror*. Names, dtypes, and
  descriptions only, no compute. A test reads the TS file and asserts that the
  name sets match, catching drift.
- **`POST /score` and `POST /generate`** in `ml/serving` accept an optional
  `profile_features: dict | None`. It is stored on the request object but **not
  consumed by the bandit yet**: extending the feature vector changes `D` and
  resets every user's learned state. That's a deliberate phase-B decision.

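The lazy-refresh behaviour described for `getProfile` can be modelled in a few lines. The sketch below is written in Python with the standard-library `sqlite3` module only so that it runs in isolation; the real logic lives in TypeScript in `builder.ts`. The `computed_at` column, the `tip_feedback` columns (`action`, `created_at`), the aggregation query, and the one-hour TTL are all assumptions made for illustration, not details taken from the repository.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    dtype: str        # phase A: numeric only
    ttl_sec: int
    description: str
    compute: Callable[[str, sqlite3.Connection], float]


def completion_rate_30d(user_id: str, db: sqlite3.Connection) -> float:
    # Hypothetical aggregation: share of feedback rows marked 'completed'
    # in the last 30 days. The comparison yields 0/1, so AVG is the rate.
    cutoff = time.time() - 30 * 86400
    row = db.execute(
        "SELECT AVG(action = 'completed') FROM tip_feedback "
        "WHERE user_id = ? AND created_at >= ?",
        (user_id, cutoff),
    ).fetchone()
    return row[0] if row[0] is not None else 0.0


REGISTRY = [
    FeatureDefinition("completion_rate_30d", "numeric", 3600,
                      "Completed / all feedback, last 30 days",
                      completion_rate_30d),
]


def init_schema(db: sqlite3.Connection) -> None:
    # Assumed table shapes; only user_profile_features' (user_id, name)
    # key and value/value_text columns come from the ADR.
    db.executescript("""
        CREATE TABLE IF NOT EXISTS tip_feedback (
            user_id TEXT, action TEXT, created_at REAL);
        CREATE TABLE IF NOT EXISTS user_profile_features (
            user_id TEXT NOT NULL, name TEXT NOT NULL,
            value REAL, value_text TEXT, computed_at REAL NOT NULL,
            PRIMARY KEY (user_id, name));
    """)


def get_profile(user_id: str, db: sqlite3.Connection,
                now: Optional[float] = None) -> dict:
    """Return all features, recomputing only missing or expired entries."""
    now = time.time() if now is None else now
    profile = {}
    for feat in REGISTRY:
        row = db.execute(
            "SELECT value, computed_at FROM user_profile_features "
            "WHERE user_id = ? AND name = ?",
            (user_id, feat.name),
        ).fetchone()
        if row is None or now - row[1] > feat.ttl_sec:
            value = feat.compute(user_id, db)
            db.execute(
                "INSERT OR REPLACE INTO user_profile_features "
                "(user_id, name, value, computed_at) VALUES (?, ?, ?, ?)",
                (user_id, feat.name, value, now),
            )
        else:
            value = row[0]   # still fresh: serve the stored value
        profile[feat.name] = value
    return profile
```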
Initial features: `completion_rate_30d`, `dismiss_rate_30d`,
`mean_dwell_ms_30d`, `preferred_hour`, `tip_volume_30d`.

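The drift check between `registry.ts` and `profile_schema.py` boils down to a set comparison over feature names. A minimal sketch, assuming registry entries spell their names as `name: '...'` string literals; the regex, the `PROFILE_FEATURES` mapping, the `"numeric"` dtype label, and the helper names are hypothetical, and the real test may parse the TS file differently.

```python
import re

# Hypothetical contract mirror: feature name -> dtype (descriptions omitted).
PROFILE_FEATURES = {
    "completion_rate_30d": "numeric",
    "dismiss_rate_30d": "numeric",
    "mean_dwell_ms_30d": "numeric",
    "preferred_hour": "numeric",
    "tip_volume_30d": "numeric",
}

# Matches `name: 'foo'` or `name: "foo"` inside the TS source text.
NAME_RE = re.compile(r"""\bname:\s*['"]([A-Za-z0-9_]+)['"]""")


def registry_names(ts_source: str) -> set[str]:
    """Extract every feature name declared in the TS registry source."""
    return set(NAME_RE.findall(ts_source))


def diff_names(ts_source: str) -> tuple[set[str], set[str]]:
    """Return (names only in TS, names only in the Python mirror)."""
    ts_names = registry_names(ts_source)
    py_names = set(PROFILE_FEATURES)
    return ts_names - py_names, py_names - ts_names
```

A sync test would read the contents of `registry.ts`, call `diff_names`, and assert that both returned sets are empty. Reporting the two differences separately makes it obvious which side needs the extra line.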
## Consequences

**Good:**
- Adding a feature = one entry in `registry.ts` + one mirror line in
  `profile_schema.py`. No DB migration required (KV table).
- TTL keeps recommendation latency bounded: every recommend call refreshes at
  most 5 features, each a single indexed query against an already-warm DB.
- Profile data is now visible to ml/serving via the request payload, so eval
  harnesses and the LLM tip generator can use it without a DB round-trip.

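On the `ml/serving` side, "accepted but not consumed" can be as simple as an optional field that the bandit's featurizer never reads. The sketch below uses a plain dataclass; `ScoreRequest`, `candidate_features`, `CANDIDATE_KEYS`, and `bandit_vector` are hypothetical names, the contextual time signals are left out for brevity, and the actual service may use a different request model.

```python
from dataclasses import dataclass
from typing import Optional

# Per-candidate features named in the Context section; here they fix D = 3.
CANDIDATE_KEYS = ("is_overdue", "task_age_days", "priority")


@dataclass
class ScoreRequest:
    candidate_features: dict
    # Accepted and stored for eval harnesses and the tip generator,
    # but deliberately ignored by the bandit in phase A.
    profile_features: Optional[dict] = None


def bandit_vector(req: ScoreRequest) -> list[float]:
    # Only per-candidate features feed the bandit, so D stays the same
    # whether or not profile_features is present on the request.
    return [float(req.candidate_features[k]) for k in CANDIDATE_KEYS]
```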
**Trade-offs:**
- TS owns compute → ml-side changes that need new features still require a
  TS PR. Acceptable while the modular monolith holds; if `ml/serving`
  becomes the system of record for any feature, it should own its own table.
- TTL-based refresh means profile values can lag a change in user behavior by
  up to `ttlSec`. Phase B replaces this with event-driven incremental updates
  subscribing to `signals.tip.feedback`.

## Phase B (deferred)

- Subscribe to `signals.tip.feedback` for incremental updates instead of TTL.
- Extend the bandit feature vector to include profile features (deliberate
  `D` change with state-migration plan).
- Admin page: per-user profile view + manual rebuild button.
- Staleness/data-quality alerts in `/admin/data-quality`.

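To see why the `D` change needs a migration plan, consider a LinUCB-style linear bandit. This is an assumption: the ADR does not name the algorithm. There, each user's learned state is a `D x D` matrix `A` and a length-`D` vector `b`, so state learned at one `D` cannot score a longer feature vector. One possible migration, shown purely as an illustration and not as the planned approach, pads the old state with an identity block so existing weights survive and the new dimensions start from the prior (ridge parameter 1 assumed).

```python
def check_dim(A, b, x):
    """Raise if the stored state and the feature vector disagree on D."""
    if len(x) != len(b) or any(len(row) != len(x) for row in A):
        raise ValueError(
            f"state has D={len(b)}, feature vector has D={len(x)}")


def pad_state(A, b, new_d):
    """Embed a D-dimensional state into new_d dimensions (new_d >= D)."""
    d = len(b)
    if new_d < d:
        raise ValueError("cannot shrink state")
    A_new = [[0.0] * new_d for _ in range(new_d)]
    for i in range(new_d):
        for j in range(new_d):
            if i < d and j < d:
                A_new[i][j] = float(A[i][j])   # keep what was learned
            elif i == j:
                A_new[i][j] = 1.0              # prior for each new dimension
    b_new = [float(v) for v in b] + [0.0] * (new_d - d)
    return A_new, b_new
```

Because the padded matrix is block-diagonal, the implied weights for the original dimensions are unchanged and the new dimensions start at zero weight. Without a step like this, the only safe option is to discard the old state, which is the reset the Decision section warns about.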
## Alternatives considered

**Registry in Python (per the original issue text)**: rejected. The
aggregations run against TS-owned tables; round-tripping per recommend adds
latency for no architectural gain.

**Compute in the recommender route inline**: rejected. Features would be
recomputed on every recommendation with no cache or staleness semantics.

**Use `tip_scores.featuresJson` as the profile store**: rejected. That
column is per-tip explainability, not per-user state. Mixing them complicates
both reads.