feat: feature registry + user profile builder #81

Closed
opened 2026-04-16 15:26:11 +00:00 by alvis · 2 comments
Owner

Motivation

Features are currently computed inline in the recommender per Todoist task. As signal sources multiply, we need:

  1. A feature registry — centralized definitions decoupled from sources
  2. A profile builder — aggregates signals over time into persistent user state

Design

Feature Registry

# ml/features/registry.py
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FeatureDefinition:
    name: str           # e.g. "task_age_days", "calendar_busyness", "mood_score"
    source: str         # which signal source produces it
    dtype: str          # "float", "bool", or "categorical"
    transform: Callable[[Any], Any]  # raw signal → feature value
    staleness_ttl: int  # seconds before the feature is considered stale
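
A minimal sketch of how such definitions might be registered and looked up. `FeatureRegistry`, its methods, and the assumption that the raw `task_age_days` signal arrives as an age in seconds are all hypothetical, chosen only to illustrate the schema above:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    source: str
    dtype: str
    transform: Callable[[Any], Any]
    staleness_ttl: int  # seconds


class FeatureRegistry:
    """Hypothetical in-memory registry keyed by feature name."""

    def __init__(self) -> None:
        self._defs: Dict[str, FeatureDefinition] = {}

    def register(self, definition: FeatureDefinition) -> None:
        # Reject duplicates so two sources cannot silently redefine a feature.
        if definition.name in self._defs:
            raise ValueError(f"duplicate feature: {definition.name}")
        self._defs[definition.name] = definition

    def compute(self, name: str, raw_signal: Any) -> Any:
        # Apply the registered transform to a raw signal value.
        return self._defs[name].transform(raw_signal)

    def is_stale(self, name: str, age_seconds: float) -> bool:
        # A value is stale once it is older than the feature's TTL.
        return age_seconds > self._defs[name].staleness_ttl


registry = FeatureRegistry()
registry.register(
    FeatureDefinition(
        name="task_age_days",
        source="todoist",
        dtype="float",
        transform=lambda seconds: seconds / 86400.0,  # assumed raw unit: seconds
        staleness_ttl=3600,
    )
)
```

Keeping the transform on the definition means the recommender only needs a feature name and a raw signal, which is the decoupling from sources that the motivation asks for.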

Profile Builder

Aggregates signals into a per-user profile stored in the DB:

  • Activity patterns: when they use the app, typical session length
  • Task behavior: completion rate, snooze rate, preferred categories
  • Temporal preferences: most productive hours, day-of-week patterns
  • Signal freshness: when each source was last synced

Profile features become inputs to the ranking policy alongside per-candidate features.
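
As a rough illustration of the aggregation and of how profile features could sit next to per-candidate features, here is a sketch. The event shape `(kind, hour_of_day)`, the event kinds, and the function names are assumptions made for the example, not part of the design:

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical event shape: (kind, hour_of_day),
# with kind in {"shown", "completed", "snoozed"}.
Event = Tuple[str, int]


def build_profile(events: Iterable[Event]) -> Dict[str, float]:
    """Aggregate raw events into a few profile features."""
    events = list(events)
    shown = sum(1 for kind, _ in events if kind == "shown")
    snoozed = sum(1 for kind, _ in events if kind == "snoozed")
    completed_hours = [hour for kind, hour in events if kind == "completed"]
    profile = {
        "completion_rate": len(completed_hours) / shown if shown else 0.0,
        "snooze_rate": snoozed / shown if shown else 0.0,
    }
    if completed_hours:
        # Hour of day with the most completions.
        top_hour = Counter(completed_hours).most_common(1)[0][0]
        profile["most_productive_hour"] = float(top_hour)
    return profile


def to_vector(
    candidate: Dict[str, float],
    profile: Dict[str, float],
    candidate_keys: Sequence[str],
    profile_keys: Sequence[str],
) -> List[float]:
    """Concatenate candidate features and profile features in a fixed order."""
    vec = [float(candidate.get(k, 0.0)) for k in candidate_keys]
    vec += [float(profile.get(k, 0.0)) for k in profile_keys]
    return vec
```

Fixing the key order explicitly matters because a ranking policy that learns per-dimension weights depends on each position always carrying the same feature.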

Tasks

  • Feature definition schema in ml/features/
  • Profile table in SQLite (user_id, feature_name, value, updated_at)
  • Profile builder service that subscribes to signal events
  • Expose profile features to ml/serving alongside candidate features
  • Admin page: per-user profile view
  • Staleness detection + data quality alerts
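
A sketch of the profile table and a simple staleness check, using the column names from the task list. The column types, the composite primary key, and the helper names are assumptions; `value` is stored as text here only because features may be float, bool, or categorical:

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE profile (
        user_id      TEXT NOT NULL,
        feature_name TEXT NOT NULL,
        value        TEXT NOT NULL,
        updated_at   INTEGER NOT NULL,  -- unix seconds
        PRIMARY KEY (user_id, feature_name)
    )
    """
)


def upsert_feature(
    user_id: str, feature_name: str, value: object, now: Optional[int] = None
) -> None:
    """Insert or overwrite one (user, feature) row."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO profile (user_id, feature_name, value, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, feature_name, str(value), now),
    )


def stale_features(ttls: Dict[str, int], now: int) -> List[Tuple[str, str]]:
    """Return (user_id, feature_name) pairs older than their TTL in seconds."""
    rows = conn.execute(
        "SELECT user_id, feature_name, updated_at FROM profile"
    ).fetchall()
    # Features without a known TTL are never reported as stale.
    return [(u, f) for u, f, ts in rows if f in ttls and now - ts > ttls[f]]
```

The TTL map would presumably come from `staleness_ttl` on each feature definition, so the alerting path and the registry share one source of truth.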
alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-16 15:26:11 +00:00
Author
Owner

Phase A shipped in 7d4c29e: registry + lazy-TTL builder in TS, KV table, contract mirror in Python, ADR-0011. ml/serving accepts profile_features, but the bandit does not consume them yet, since changing its feature vector would reset every user's learned state.

Phase B (still open under this issue):

  • Subscribe to signals.tip.feedback for incremental updates instead of TTL refresh
  • Extend the bandit feature vector (a deliberate change to the dimension D, with a state-migration plan)
  • Admin per-user profile view + manual rebuild button
  • Staleness/data-quality alerts in /admin/data-quality
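
For readers unfamiliar with the lazy-TTL approach from Phase A, the idea can be sketched as below. The shipped builder is in TS; this Python version, including the class name, the injectable clock, and the `compute` callback, is an illustrative assumption rather than a port of that code:

```python
import time
from typing import Any, Callable, Dict, Tuple


class LazyProfileCache:
    """Recompute a feature only when it is read and its stored value has expired."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        # (user_id, feature_name) -> (value, stored_at)
        self._rows: Dict[Tuple[str, str], Tuple[Any, float]] = {}
        self._clock = clock

    def get(
        self,
        user_id: str,
        feature_name: str,
        ttl_seconds: float,
        compute: Callable[[], Any],
    ) -> Any:
        key = (user_id, feature_name)
        now = self._clock()
        row = self._rows.get(key)
        if row is not None and now - row[1] <= ttl_seconds:
            return row[0]  # still fresh: serve the stored value
        value = compute()  # missing or expired: rebuild on demand
        self._rows[key] = (value, now)
        return value
```

Nothing is refreshed in the background in this scheme, so a feature that nobody reads costs nothing, at the price of the first read after expiry paying for the rebuild.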
Author
Owner

Closing — phase A and phase B (the parts that fit this issue) shipped:

  • Phase A (7d4c29e) — registry + lazy-TTL builder; 5 features; KV table; ADR-0011
  • B.1 (9e96540) — per-user profile view + rebuild action in /admin/users/:id
  • B.4 (4a42a6a) — staleness panel in /admin/data-quality
  • B.2 (ee4eb15) — event-driven invalidation; features declare invalidatedBy subjects, subscriber drops affected rows on publish; TTL stays as safety net

B.3 (bandit consumes profile features, D=12) split out as #99 — needs shadow rollout per ADR-0002, which makes it a multi-step initiative rather than an incremental phase.
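
The event-driven invalidation from B.2 can be pictured with the following sketch. Only `invalidatedBy` and the `signals.tip.feedback` subject come from this thread; the class, its method names, and the in-memory row store are hypothetical stand-ins for the real subscriber and KV table:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Tuple


class InvalidationIndex:
    """Maps event subjects to the features they invalidate."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, Set[str]] = defaultdict(set)

    def declare(self, feature_name: str, invalidated_by: Iterable[str]) -> None:
        # Each feature lists the subjects that make its stored value obsolete.
        for subject in invalidated_by:
            self._by_subject[subject].add(feature_name)

    def on_publish(
        self, subject: str, user_id: str, rows: Dict[Tuple[str, str], Any]
    ) -> List[str]:
        """Drop this user's rows for every feature tied to the subject."""
        dropped = []
        for name in sorted(self._by_subject.get(subject, set())):
            if (user_id, name) in rows:
                del rows[(user_id, name)]
                dropped.append(name)
        return dropped
```

Dropped rows are simply rebuilt on the next read, so a missed event only delays the refresh until the TTL expires, which is why the TTL can stay as a safety net.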
alvis closed this issue 2026-04-25 00:41:09 +00:00

Reference: alvis/oO#81