alvis 1d9a395591 feat(agents): quiet window + peak hours + tz prefs for time-of-day agent (#112)
Adds four InferredParams (all TTL=24h, min_history=50 except preferred_hour=10):
- quiet_start / quiet_end: longest contiguous below-baseline hour run (HH:MM)
- peak_hours: top-quartile done-event hours, sorted ascending
- tz: cold-start only ("UTC"); populated from auth provider, no inference function

compute() updated:
- in_quiet check (quiet window) takes precedence over peak hours
- in_peak emits "peak productivity hour" language when current hour is in peak_hours
- approaching peak (within 2h) surfaces for orchestrator timing
- tz surfaced in snippet header when not UTC
- snapshot adds peak_hours, in_quiet, in_peak, tz

- Agent bumped to v1.2.0
- 21 new tests: night-owl, early-bird, shift-worker, quiet/peak snippet rendering
- Fixed test_snapshot_keys in test_agents.py to include new snapshot fields

Closes #112

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 06:05:51 +00:00

ml/

Python. Owns models, features, training, and online scoring.

| Dir | Role | Phase |
|---|---|---|
| serving/ | FastAPI online scorer (/score, /generate) + LiteLLM gateway + prompt registry (prompts.py) + JetStream consumers for signals.> / feedback.>; called by the recommender | 12 |
| features/ | Context assembler (context.py): signals → PromptContext; profile-feature schema mirror (profile_schema.py); Feast adapter later | 2 |
| pipelines/ | Batch feature + training scripts | 4 |
| registry/ | MLflow-backed model registry integration | 4 |
| experiments/ | A/B assignment + multi-armed bandit policies | 4 |
| notebooks/ | Research; never imported by production code | |
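The serving/ row lists JetStream consumers bound to signals.> and feedback.>. The NATS server does the actual subject matching. As a quick reference for what those filters accept, here is a pure-Python sketch of the wildcard rules: `*` matches exactly one token, and `>` matches one or more trailing tokens and must come last. The CONSUMERS mapping, the consumer names, and the example subjects are hypothetical and are not taken from serving/.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subject matches a subscription pattern.

    '*' matches exactly one token; '>' matches one or more trailing tokens
    and is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs >= 1 remaining subject token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens before the pattern did
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # No '>' seen: token counts must line up exactly.
    return len(p_tokens) == len(s_tokens)


# Hypothetical routing table: consumer filter subject -> consumer name.
CONSUMERS = {
    "signals.>": "signals_consumer",
    "feedback.>": "feedback_consumer",
}


def route(subject: str) -> list[str]:
    """Names of all consumers whose filter matches the subject."""
    return [name for pattern, name in CONSUMERS.items() if subject_matches(pattern, subject)]
```

For example, a subject such as signals.task.done reaches the signals consumer, but a bare signals does not, because `>` needs at least one token after the prefix.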

Principles

  • Every model has a model card in registry/ describing inputs, offline metrics, fairness checks, and rollout history.
  • Online inference must be stateless and < 50ms p99.
  • Training reads from the offline feature store; serving reads from the online feature store; definitions are shared (no train/serve skew).
  • Shadow-deploy any policy change that affects real users before rolling it out.
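One way to hold serving to the latency principle is to compare recorded request latencies (for example, from a load test against /score) with the budget. The following is a minimal standard-library sketch. The helper names and the choice of the inclusive quantile method are assumptions, not existing code in ml/.

```python
import statistics

P99_BUDGET_MS = 50.0  # "< 50ms p99" from the principles above


def p99_ms(latencies_ms: list[float]) -> float:
    """99th-percentile latency of the observed samples (needs at least 2)."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    # "inclusive" treats the samples as the full population of observed requests.
    return statistics.quantiles(latencies_ms, n=100, method="inclusive")[98]


def within_budget(latencies_ms: list[float], budget_ms: float = P99_BUDGET_MS) -> bool:
    """True if the observed p99 is strictly under the budget."""
    return p99_ms(latencies_ms) < budget_ms
```

Take 1,000 samples where the slowest 100 take 200 ms and the rest take 10 ms. The p99 is 200 ms, so the check fails even though the median is well under budget. This is why the principle is stated as a tail percentile rather than an average.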

Feature contract

Profile features (batched)

User-level features (completion rate, preferred hour, tip volume…) are computed by the TypeScript recommender and shipped to ml/serving on every /score and /generate call as profile_features: dict | None. The Python mirror in features/profile_schema.py documents each feature's name, dtype, TTL, source, and null fallback — keep it in sync with services/api/src/profile/registry.ts (a CI-style test asserts names and ttlSec values match). See ADR-0011.
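Below is a minimal sketch of what one schema entry and the null-fallback step could look like. It also shows the kind of name/TTL comparison the sync test performs. Everything here is hypothetical: the ProfileFeature shape, the two example entries, their placeholder TTLs and fallbacks, and the helper names. The authoritative definitions are in features/profile_schema.py and services/api/src/profile/registry.ts, and this README does not describe how the test reads the TypeScript side.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProfileFeature:
    """One profile-feature schema entry (hypothetical shape)."""

    name: str
    dtype: type
    ttl_sec: int
    source: str
    fallback: Any  # used when the feature is missing or null


# Placeholder entries; real names, TTLs and fallbacks live in profile_schema.py.
PROFILE_SCHEMA: list[ProfileFeature] = [
    ProfileFeature("completion_rate", float, 86_400, "recommender", 0.0),
    ProfileFeature("preferred_hour", int, 86_400, "recommender", None),
]


def resolve_profile_features(profile_features: dict | None) -> dict[str, Any]:
    """Return every declared feature, applying fallbacks for missing/null values."""
    incoming = profile_features or {}
    resolved: dict[str, Any] = {}
    for feat in PROFILE_SCHEMA:
        value = incoming.get(feat.name)
        # Coerce present values to the declared dtype; otherwise use the fallback.
        resolved[feat.name] = feat.fallback if value is None else feat.dtype(value)
    return resolved


def schema_mismatches(ts_ttls: dict[str, int]) -> list[str]:
    """Names missing on one side, or whose TTL differs, vs. a TS-side name->ttlSec map."""
    py_ttls = {f.name: f.ttl_sec for f in PROFILE_SCHEMA}
    names = sorted(py_ttls.keys() | ts_ttls.keys())
    return [n for n in names if py_ttls.get(n) != ts_ttls.get(n)]
```

Passing `profile_features=None` yields every declared feature at its fallback value, so scorers never need to special-case a missing payload.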

Context features (JIT)

Request-time signals (hour_of_day, day_of_week, task list) are assembled by features/context.py. They are never cached: they are derived from the system clock and the live Todoist feed at the moment of the score call. CONTEXT_FEATURES in context.py declares freshness, source, and fallback for each field (issue #61).
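Here is a minimal sketch of just-in-time assembly. Every value is computed from an injected (or current) clock and the task list passed in on the call, so nothing carries over between requests. Several details are assumptions for illustration: the PromptContext fields, the shape of this CONTEXT_FEATURES dict, the UTC default, and the empty-list fallback. The README does not say which time zone context.py uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical declaration in the spirit of CONTEXT_FEATURES: source, freshness,
# and fallback per field. Clock-derived fields never need a fallback (None).
CONTEXT_FEATURES = {
    "hour_of_day": {"source": "system_clock", "freshness": "request", "fallback": None},
    "day_of_week": {"source": "system_clock", "freshness": "request", "fallback": None},
    "tasks": {"source": "todoist_feed", "freshness": "request", "fallback": []},
}


@dataclass
class PromptContext:
    hour_of_day: int
    day_of_week: int  # Monday == 0, as returned by datetime.weekday()
    tasks: list[str] = field(default_factory=list)


def assemble_context(tasks: list[str] | None, now: datetime | None = None) -> PromptContext:
    """Derive context at call time; nothing here is cached."""
    now = now or datetime.now(timezone.utc)  # injectable clock for tests
    return PromptContext(
        hour_of_day=now.hour,
        day_of_week=now.weekday(),
        # Copy so callers can't mutate the context (or the declared fallback) later.
        tasks=list(tasks) if tasks is not None else list(CONTEXT_FEATURES["tasks"]["fallback"]),
    )
```

Injecting `now` keeps the function deterministic under test while production calls read the live clock.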

Prompt registry

serving/prompts.py keys tip-generation prompts by a stable version string. Adding a new variant means adding an entry; callers need no changes. Selection precedence: the prompt_version field in the POST /generate body → env DEFAULT_PROMPT_VERSION → "v1". The TypeScript recommender drives selection via TIP_PROMPT_VERSION (a single value or a comma-separated rotation). The version actually used flows back in the response and is persisted to tip_scores.prompt_version, so the admin reward-analytics dashboard can bucket reactions per variant.
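Here is a minimal sketch of the selection precedence on the Python side: request body first, then DEFAULT_PROMPT_VERSION, then "v1". Several parts are assumptions: the PROMPTS contents, the helper names, and the decision to reject unknown versions with ValueError. serving/prompts.py may handle unknown versions differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical registry contents; the real templates live in serving/prompts.py.
PROMPTS: dict[str, str] = {
    "v1": "Suggest one short, actionable tip for these tasks:\n{tasks}",
    "v2": "You are a concise productivity coach. Tasks:\n{tasks}\nOne tip:",
}

FALLBACK_VERSION = "v1"


def resolve_prompt_version(body_version: str | None, env: Mapping[str, str] = os.environ) -> str:
    """Request body wins, then DEFAULT_PROMPT_VERSION, then "v1"."""
    return body_version or env.get("DEFAULT_PROMPT_VERSION") or FALLBACK_VERSION


def select_prompt(body_version: str | None, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (version, template); the version is what gets echoed back and persisted."""
    version = resolve_prompt_version(body_version, env)
    if version not in PROMPTS:
        # Assumption: unknown versions are rejected rather than silently remapped.
        raise ValueError(f"unknown prompt_version {version!r}")
    return version, PROMPTS[version]
```

Taking `env` as a parameter keeps the resolution pure and easy to test; in the service it defaults to `os.environ`.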