alvis 7d4c29e137 feat(profile): user-profile feature registry + builder (phase A)

Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.

ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-25 00:22:22 +00:00

ADR-0011 — User-profile feature registry

Status: Accepted (phase A)
Date: 2026-04-25
Issue: #81

Context

Until now, the bandit and the LLM tip generator saw only per-candidate features (is_overdue, task_age_days, priority) plus contextual time signals. There was no user-level profile (completion rate, dismiss rate, preferred hour, tip volume), even though all the raw data already lives in tip_views, tip_feedback, and tip_scores.

#81 originally proposed putting the feature registry in ml/features/ (Python). We chose otherwise for data locality: the aggregations are SQL queries against tables owned by services/api. Computing them in Python would add a network round-trip per recommendation for queries that are sub-ms in TS.

Decision

Two-sided design with one source of truth:

services/api/src/profile/registry.ts — source of truth. Each FeatureDefinition declares { name, dtype, ttlSec, description, compute }. compute(userId, sqlite) runs the aggregation SQL directly via the raw better-sqlite3 client.
services/api/src/profile/builder.ts — getProfile(userId) returns the full feature dict, lazily recomputing any entry whose stored row is past its ttlSec. rebuildProfile(userId) force-refreshes everything.
user_profile_features table — KV per (user_id, name) with value (REAL) for numeric and value_text (TEXT) for categorical. Phase A ships only numeric features.
ml/features/profile_schema.py — contract mirror. Names, dtypes, and descriptions only — no compute. A test reads the TS file and asserts the name sets match, catching drift.
POST /score and POST /generate in ml/serving accept an optional profile_features: dict | None. The value is stored on the request object but not yet consumed by the bandit: extending the feature vector changes its dimension D and resets every user's learned state, so that is a deliberate phase-B decision.
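The registry-plus-builder split above can be sketched roughly as follows. This is a minimal illustration, not the contents of registry.ts or builder.ts. `SqliteLike` is a hypothetical interface covering only the `prepare(sql).get(...params)` shape of better-sqlite3, the in-memory `Map` stands in for the user_profile_features table, and the column names in the SQL (`user_id`, `viewed_at`), the `computedAtMs` field, the dtype labels, and the one-hour TTL are all assumptions.

```typescript
// Hypothetical slice of the better-sqlite3 surface used here:
// db.prepare(sql).get(...params) returns one row or undefined.
interface SqliteLike {
  prepare(sql: string): { get(...params: unknown[]): unknown };
}

interface FeatureDefinition {
  name: string;
  dtype: "float" | "int"; // assumed dtype labels
  ttlSec: number;
  description: string;
  compute(userId: string, sqlite: SqliteLike): number;
}

// Stand-in for one numeric row of user_profile_features.
interface StoredFeature {
  value: number;
  computedAtMs: number; // hypothetical column
}

// Example entry; the SQL column names and the TTL are guesses.
const tipVolume30d: FeatureDefinition = {
  name: "tip_volume_30d",
  dtype: "int",
  ttlSec: 3600,
  description: "Tips shown to the user in the last 30 days",
  compute(userId, sqlite) {
    const row = sqlite
      .prepare(
        "SELECT COUNT(*) AS n FROM tip_views " +
          "WHERE user_id = ? AND viewed_at >= datetime('now', '-30 days')",
      )
      .get(userId) as { n: number } | undefined;
    return row?.n ?? 0;
  },
};

function createProfileBuilder(
  registry: FeatureDefinition[],
  sqlite: SqliteLike,
  now: () => number = Date.now,
) {
  // Keyed by `${userId}:${name}`, mirroring the (user_id, name) KV key.
  const store = new Map<string, StoredFeature>();

  function build(userId: string, force: boolean): Record<string, number> {
    const profile: Record<string, number> = {};
    const nowMs = now();
    for (const def of registry) {
      const key = `${userId}:${def.name}`;
      const row = store.get(key);
      // Serve the stored value while it is within its TTL, unless forced.
      if (!force && row !== undefined && nowMs - row.computedAtMs <= def.ttlSec * 1000) {
        profile[def.name] = row.value;
        continue;
      }
      const value = def.compute(userId, sqlite);
      store.set(key, { value, computedAtMs: nowMs });
      profile[def.name] = value;
    }
    return profile;
  }

  return {
    getProfile: (userId: string) => build(userId, false),
    rebuildProfile: (userId: string) => build(userId, true),
  };
}
```

Injecting the clock as `now` is a choice made for this sketch so that TTL expiry can be tested without waiting; the real builder may read time differently.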

Initial features: completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d.
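The name-set check between registry.ts and profile_schema.py amounts to a set difference in both directions. The sketch below shows that logic only. It assumes each registry entry writes its name as a string literal on a `name:` key, and the helper names `extractRegistryNames` and `diffNameSets` are hypothetical. The actual sync test may parse the TS file differently.

```typescript
// Pull `name: "..."` string literals out of registry source text.
// A type annotation such as `name: string` has no quotes and is skipped.
function extractRegistryNames(source: string): Set<string> {
  const names = new Set<string>();
  const pattern = /\bname:\s*["']([a-z0-9_]+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return names;
}

// Report drift in both directions so the failure message says which side to fix.
function diffNameSets(registry: Set<string>, mirror: Set<string>) {
  return {
    missingInMirror: Array.from(registry).filter((n) => !mirror.has(n)).sort(),
    missingInRegistry: Array.from(mirror).filter((n) => !registry.has(n)).sort(),
  };
}
```

A test would pass when both arrays are empty. Comparing only names keeps the check simple, but it does not catch dtype or description drift.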

Consequences

Good:

Adding a feature = one entry in registry.ts + one mirror line in profile_schema.py. No DB migration required (KV table).
TTL keeps recommendation latency bounded: every recommend call refreshes at most 5 features, each a single indexed query against an already-warm DB.
Profile data is now visible to ml/serving via the request payload — eval harnesses and the LLM tip generator can use it without a DB round-trip.
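On the TS side, attaching the profile to a /score or /generate body might look like the sketch below. Only the `profile_features` key comes from this ADR. The `buildServingBody` helper and the contents of `base` are hypothetical. Dropping non-finite values is also an assumption, added because `JSON.stringify` serializes `NaN` and `Infinity` as `null`, which a numeric-only schema on the Python side could reject.

```typescript
// Build the JSON body for ml/serving, attaching profile_features only when present.
function buildServingBody(
  base: Record<string, unknown>,
  profile?: Record<string, number> | null,
): string {
  if (!profile) {
    // The field is optional, so omit it entirely rather than sending null.
    return JSON.stringify(base);
  }
  const finite: Record<string, number> = {};
  for (const name of Object.keys(profile)) {
    const value = profile[name];
    // Skip NaN/Infinity, e.g. a rate whose denominator was zero.
    if (Number.isFinite(value)) {
      finite[name] = value;
    }
  }
  return JSON.stringify({ ...base, profile_features: finite });
}
```

Because ml/serving does not consume the field in phase A, omitting it and sending it should score identically for now.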

Trade-offs:

TS owns compute → ml-side changes that need new features still require a TS PR. Acceptable while the modular monolith holds; if ml/serving becomes the system of record for any feature, it should own its own table.
With TTL-based refresh, a change in user behavior can take up to ttlSec to show up in the profile. Phase B replaces this with event-driven incremental updates subscribing to signals.tip.feedback.

Phase B (deferred)

Subscribe to signals.tip.feedback for incremental updates instead of TTL.
Extend the bandit feature vector to include profile features (deliberate D change with state-migration plan).
Admin page: per-user profile view + manual rebuild button.
Staleness/data-quality alerts in /admin/data-quality.

Alternatives considered

Registry in Python (per the original issue text) — rejected: the aggregations live in TS-owned tables; round-tripping per recommend adds latency for no architectural gain.

Compute in the recommender route inline — rejected: features would be recomputed on every recommendation with no cache or staleness semantics.

Use tip_scores.featuresJson as the profile store — rejected: that column is per-tip explainability, not per-user state. Mixing them complicates both reads.
