# ADR-0011 — User-profile feature registry

**Status:** Accepted (phase A)
**Date:** 2026-04-25
**Issue:** #81

## Context

The bandit and LLM tip generator only saw per-candidate features (`is_overdue`,
`task_age_days`, `priority`) plus contextual time signals. There was no notion
of a *user-level* profile — completion rate, dismiss rate, preferred hour, tip
volume — even though all the raw data already lives in `tip_views`,
`tip_feedback`, and `tip_scores`.

Issue #81 originally proposed putting the feature registry in `ml/features/`
(Python). We chose TypeScript instead for data locality: the aggregations are
SQL queries against tables owned by `services/api`. Computing them in Python
would add a network round-trip per recommendation for queries that take under a
millisecond in TS.

## Decision

Two-sided design with one source of truth:

- **`services/api/src/profile/registry.ts`** — *source of truth*. Each
  `FeatureDefinition` declares `{ name, dtype, ttlSec, description, compute }`.
  `compute(userId, sqlite)` runs the aggregation SQL directly via the raw
  better-sqlite3 client.
- **`services/api/src/profile/builder.ts`** — `getProfile(userId)` returns the
  full feature dict, lazily recomputing any entry whose stored row is past its
  `ttlSec`. `rebuildProfile(userId)` force-refreshes everything.
- **`user_profile_features` table** — KV per `(user_id, name)` with `value`
  (REAL) for numeric and `value_text` (TEXT) for categorical. Phase A
  ships only numeric features.
- **`ml/features/profile_schema.py`** — *contract mirror*. Names, dtypes, and
  descriptions only — no compute. A test reads the TS file and asserts the
  name sets match, catching drift.
- **`POST /score` and `POST /generate`** in `ml/serving` accept an optional
  `profile_features: dict | None`. It is stored on the request object but **not
  consumed by the bandit yet**: extending the feature vector changes its
  dimension `D`, which resets every user's learned state. That change is
  deliberately deferred to phase B.
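
The lazy TTL refresh described for `getProfile` can be sketched roughly as follows. This is a minimal illustration, not the code in `builder.ts`: the `Db` type parameter stands in for the better-sqlite3 handle, and `StoredRow`, `computedAtMs`, and the in-memory `Map` store are assumptions made so the example runs without a database. Only the five `FeatureDefinition` field names are taken from this ADR.

```typescript
// Sketch only. Field names on FeatureDefinition come from the ADR; everything
// else (Db type parameter, StoredRow, computedAtMs, the Map store) is assumed.
type Dtype = "numeric" | "categorical";

interface FeatureDefinition<Db> {
  name: string;
  dtype: Dtype;
  ttlSec: number;
  description: string;
  // Phase A ships numeric features only, so compute returns a number here.
  compute: (userId: string, sqlite: Db) => number;
}

interface StoredRow {
  value: number;
  computedAtMs: number; // assumed timestamp used for the TTL check
}

function getProfile<Db>(
  userId: string,
  registry: readonly FeatureDefinition<Db>[],
  store: Map<string, StoredRow>,
  sqlite: Db,
  nowMs: number = Date.now(),
): Record<string, number> {
  const profile: Record<string, number> = {};
  for (const def of registry) {
    const key = `${userId}:${def.name}`;
    const row = store.get(key);
    if (row !== undefined && nowMs - row.computedAtMs <= def.ttlSec * 1000) {
      profile[def.name] = row.value; // still fresh: reuse the stored value
      continue;
    }
    const value = def.compute(userId, sqlite); // missing or past ttlSec
    store.set(key, { value, computedAtMs: nowMs });
    profile[def.name] = value;
  }
  return profile;
}

// Usage with a fake compute that counts how often it runs.
let computeCalls = 0;
const demoRegistry: FeatureDefinition<null>[] = [
  {
    name: "tip_volume_30d",
    dtype: "numeric",
    ttlSec: 60,
    description: "Tips shown in the last 30 days (illustrative)",
    compute: () => {
      computeCalls += 1;
      return 12;
    },
  },
];
const demoStore = new Map<string, StoredRow>();
getProfile("u1", demoRegistry, demoStore, null, 0); // no row yet: computes
getProfile("u1", demoRegistry, demoStore, null, 30_000); // 30 s old: reused
getProfile("u1", demoRegistry, demoStore, null, 61_000); // 61 s old: recomputes
console.log(computeCalls); // 2
```

Passing `nowMs` explicitly keeps the staleness rule testable without mocking the clock. A `rebuildProfile` equivalent would skip the freshness check and recompute every entry.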

Initial features: `completion_rate_30d`, `dismiss_rate_30d`,
`mean_dwell_ms_30d`, `preferred_hour`, `tip_volume_30d`.

## Consequences

**Good:**
- Adding a feature = one entry in `registry.ts` + one mirror line in
  `profile_schema.py`. No DB migration required (KV table).
- TTL keeps recommendation latency bounded: every recommend call refreshes at
  most 5 features, each a single indexed query against an already-warm DB.
- Profile data is now visible to ml/serving via the request payload — eval
  harnesses and the LLM tip generator can use it without a DB round-trip.
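
Because adding a feature touches two files, the drift test is what keeps them aligned. The comparison it performs amounts to a two-way set difference, sketched below in TypeScript for illustration only. The test described in this ADR lives on the Python side and reads the TS file; `diffFeatureNames` and `NameDrift` are hypothetical names, and the parsing step is omitted.

```typescript
// Sketch only: diffFeatureNames and NameDrift are hypothetical names.
interface NameDrift {
  missingInMirror: string[]; // declared in registry.ts, absent from the mirror
  missingInRegistry: string[]; // declared in the mirror, absent from registry.ts
}

function diffFeatureNames(
  registryNames: Iterable<string>,
  mirrorNames: Iterable<string>,
): NameDrift {
  const registry = new Set(registryNames);
  const mirror = new Set(mirrorNames);
  return {
    missingInMirror: Array.from(registry).filter((n) => !mirror.has(n)).sort(),
    missingInRegistry: Array.from(mirror).filter((n) => !registry.has(n)).sort(),
  };
}

// The five initial feature names, with one deliberately dropped from the mirror.
const registrySide = [
  "completion_rate_30d",
  "dismiss_rate_30d",
  "mean_dwell_ms_30d",
  "preferred_hour",
  "tip_volume_30d",
];
const mirrorSide = registrySide.filter((n) => n !== "preferred_hour");
const drift = diffFeatureNames(registrySide, mirrorSide);
console.log(drift.missingInMirror); // [ 'preferred_hour' ]
console.log(drift.missingInRegistry); // []
```

Reporting both directions separately makes a failure message say which file needs the extra line.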

**Trade-offs:**
- TS owns compute → ml-side changes that need new features still require a
  TS PR. Acceptable while the modular monolith holds; if `ml/serving`
  becomes the system of record for any feature, it should own its own table.
- TTL-based refresh lets the profile lag a change in user behavior by up to
  `ttlSec`. Phase B replaces this with event-driven incremental updates that
  subscribe to `signals.tip.feedback`.

## Phase B (deferred)

- Subscribe to `signals.tip.feedback` for incremental updates instead of TTL.
- Extend the bandit feature vector to include profile features (deliberate
  `D` change with state-migration plan).
- Admin page: per-user profile view + manual rebuild button.
- Staleness/data-quality alerts in `/admin/data-quality`.

## Alternatives considered

**Registry in Python (per the original issue text)** — rejected: the
aggregations read TS-owned tables, so a round-trip per recommendation adds
latency for no architectural gain.

**Compute in the recommender route inline** — rejected: features would be
recomputed on every recommendation with no cache or staleness semantics.

**Use `tip_scores.featuresJson` as the profile store** — rejected: that
column is per-tip explainability, not per-user state. Mixing them complicates
both reads.