feat: feature-to-prompt pipeline — batch context materialization via Airflow #94

Closed
opened 2026-04-17 08:13:20 +00:00 by alvis · 0 comments
Owner

Goal

Shift context assembly from on-demand (inline with the recommendation request) to pre-materialized (computed nightly by Airflow). This reduces p99 recommendation latency and cuts Todoist API calls.

Architecture

Airflow DAG (nightly)
  → fetch_signals: pull Todoist tasks + recent tip reactions for all active users
  → compute_context: run context assembler, write to context_store table
  → embed_tasks: compute nomic-embed-text embeddings for all tasks
  → cluster_tasks: update per-user task clusters

Recommendation request (inline)
  → read context_store for user (stale-ok: use last materialized if < 6h old)
  → call LLM with pre-assembled context
  → bandit scores candidates
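
A minimal sketch of the request-side read with the 6h freshness check, in case it helps review. It assumes `materialized_at` is stored as an ISO-8601 UTC string and uses SQLite only to stay self-contained. The names `load_fresh_context`, `get_context`, and `assemble_inline` are hypothetical, not taken from the codebase.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=6)  # freshness window stated in this issue


def load_fresh_context(conn, user_id, now=None):
    """Return the materialized context dict, or None if missing or stale."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT materialized_at, context_json FROM context_store WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None  # never materialized for this user
    # Assumption: timestamps are written as ISO-8601 strings with a UTC offset.
    materialized_at = datetime.fromisoformat(row[0])
    if now - materialized_at >= MAX_AGE:
        return None  # stale: caller falls back to inline assembly
    return json.loads(row[1])


def get_context(conn, user_id, assemble_inline, now=None):
    """Prefer the materialized row; fall back to inline assembly otherwise."""
    context = load_fresh_context(conn, user_id, now)
    return context if context is not None else assemble_inline(user_id)
```

Passing `now` explicitly keeps the staleness rule easy to unit-test without patching the clock.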

DB changes

CREATE TABLE context_store (
  user_id TEXT,
  materialized_at TIMESTAMP,
  context_json TEXT,     -- serialized UserContext
  context_hash TEXT,     -- for change detection
  PRIMARY KEY (user_id)
);
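
One way the `compute_context` step could write to this table, sketched under assumptions: the hash is SHA-256 over canonical JSON (sorted keys), and the write is a SQLite upsert keyed on `user_id`. `context_hash()` and `upsert_context()` are hypothetical helper names. The real `UserContext` serialization may differ.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def context_hash(context):
    """Stable SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert_context(conn, user_id, context, now=None):
    """Write one row per user; return True if the stored hash changed."""
    now = now or datetime.now(timezone.utc)
    new_hash = context_hash(context)
    row = conn.execute(
        "SELECT context_hash FROM context_store WHERE user_id = ?", (user_id,)
    ).fetchone()
    changed = row is None or row[0] != new_hash
    # Upsert relies on the PRIMARY KEY (user_id) constraint (SQLite 3.24+).
    conn.execute(
        """
        INSERT INTO context_store (user_id, materialized_at, context_json, context_hash)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(user_id) DO UPDATE SET
            materialized_at = excluded.materialized_at,
            context_json = excluded.context_json,
            context_hash = excluded.context_hash
        """,
        (user_id, now.isoformat(), json.dumps(context), new_hash),
    )
    conn.commit()
    return changed
```

Sorting keys before hashing means two contexts with the same content but different key order produce the same hash, so change detection does not fire on serialization noise.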

Notes

  • Unblocked by #88 (context assembler) and #37 (Airflow DAG setup)
  • Freshness SLA: 6h; fall back to inline assembly if stale
  • Context hash enables skipping the LLM call if the context is unchanged since the last tip
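
The hash-based skip in the last note could look roughly like this. It is a sketch under assumptions: the last tip per user is kept alongside the hash of the context it was generated from, and an unchanged hash means that tip is reused. The issue only says the LLM call is skipped, so the reuse behavior, the `Tip` type, the in-memory `last_tips` dict, and the `call_llm` callable are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tip:
    text: str
    context_hash: str  # hash of the context this tip was generated from


def recommend(user_id, context_json, current_hash, last_tips, call_llm):
    """Call the LLM only when the context hash differs from the last tip's."""
    last = last_tips.get(user_id)
    if last is not None and last.context_hash == current_hash:
        return last  # context unchanged since last tip: skip the LLM call
    tip = Tip(text=call_llm(context_json), context_hash=current_hash)
    last_tips[user_id] = tip  # remember which context produced this tip
    return tip
```

In practice the last-tip hash would need to live somewhere durable (for example next to the tip record) rather than in a process-local dict.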
alvis added this to the M4 — MLOps at scale milestone 2026-04-17 08:13:20 +00:00
alvis closed this issue 2026-05-14 10:44:29 +00:00
Reference: alvis/oO#94