feat(profile): user-profile feature registry + builder (phase A)

Centralizes user-level features (completion_rate_30d, dismiss_rate_30d,
mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that
owns both definition and SQL aggregation, since the data lives in the
TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps
recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not
yet consume them — extending the bandit feature vector changes D and
resets every user's learned state, so that's a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror with a sync
test that diffs name sets against registry.ts.

ADR-0011 records the data-locality reasoning (registry in TS, not Python
as the issue originally suggested).

Phase B (deferred): event-driven incremental updates, bandit consumption
with state migration, admin per-user profile page, staleness alerts.

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-04-25 00:22:22 +00:00
parent 430804e9a5
commit 7d4c29e137
13 changed files with 636 additions and 2 deletions

View File

@@ -87,6 +87,20 @@ export const tipScores = sqliteTable('tip_scores', {
tipKind: text('tip_kind'), // 'task' | 'advice' | 'insight' | 'reminder'
});
// ── User profile features (#81 phase A) ────────────────────────────────────
// One row per (userId, name). KV store for aggregated user-level features
// computed from tip_views/tip_feedback/tip_scores. Numeric values land in
// `value`; categorical/string values use `value_text` (never both). Entries
// are recomputed lazily by the profile builder when older than `ttl_sec`.
export const userProfileFeatures = sqliteTable('user_profile_features', {
userId: text('user_id').notNull().references(() => users.id),
name: text('name').notNull(), // e.g. 'completion_rate_30d'
value: integer('value', { mode: 'number' }), // numeric (REAL stored as number); null if categorical
valueText: text('value_text'), // categorical/string; null if numeric
updatedAt: text('updated_at').notNull(),
ttlSec: integer('ttl_sec').notNull(), // staleness threshold; 0 = never auto-refresh
});
// ── Simulation runs ──────────────────────────────────────────────────────────
// One row per offline simulation run (two-policy comparison).
export const simRuns = sqliteTable('sim_runs', {