feat: prompt versioning — track prompt_version + model in tip_scores #91

Closed
opened 2026-04-17 08:11:55 +00:00 by alvis · 0 comments

Goal

Every served tip records which prompt template and model produced it. This enables per-version A/B analysis of tip outcomes without running a formal experiment.

DB changes

```sql
ALTER TABLE tip_scores ADD COLUMN model TEXT;           -- e.g. "qwen2.5:7b"
ALTER TABLE tip_scores ADD COLUMN prompt_version TEXT;  -- e.g. "v1", "v2-cot"
ALTER TABLE tip_scores ADD COLUMN source TEXT;          -- "llm" | "task_direct" | "fallback"
```
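
A minimal sketch of how the write path might tag each row with these columns. It assumes SQLite purely for illustration (the issue does not name the database engine), and `add_provenance_columns`, `record_tip`, and the `tip_id` column are hypothetical names, not part of the existing codebase.

```python
import sqlite3

# Allowed values for the `source` column, as listed in the DB changes above.
ALLOWED_SOURCES = {"llm", "task_direct", "fallback"}


def add_provenance_columns(conn):
    """Add the three columns if they are missing, so the migration can be re-run."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tip_scores)")}
    for column in ("model", "prompt_version", "source"):
        if column not in existing:
            conn.execute(f"ALTER TABLE tip_scores ADD COLUMN {column} TEXT")


def record_tip(conn, tip_id, model, prompt_version, source):
    """Insert one served tip together with its provenance (hypothetical helper)."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    conn.execute(
        "INSERT INTO tip_scores (tip_id, model, prompt_version, source) "
        "VALUES (?, ?, ?, ?)",
        (tip_id, model, prompt_version, source),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    # Stand-in for the real tip_scores table; the actual schema is not shown in the issue.
    conn.execute("CREATE TABLE tip_scores (tip_id TEXT PRIMARY KEY)")
    add_provenance_columns(conn)
    add_provenance_columns(conn)  # second call is a no-op
    record_tip(conn, "tip-1", "qwen2.5:7b", "v1", "llm")
    return conn.execute(
        "SELECT model, prompt_version, source FROM tip_scores"
    ).fetchall()
```

Validating `source` in application code is one option; a `CHECK` constraint on the column would be another, depending on the engine in use.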

Prompt template store

  • Store prompts as files: ml/prompts/tip_generator_v1.txt, tip_generator_v2.txt
  • Each prompt file has a header comment with version, author, date, changelog
  • Active version set via TIP_GENERATOR_PROMPT_VERSION env var (default v1)
  • Compute content_hash of the template at startup and log it; this catches accidental version drift (template text edited without a version bump)
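
The loading steps above could be sketched roughly as follows. The file naming scheme `tip_generator_<version>.txt` is inferred from the two example paths, SHA-256 is an assumed choice of hash, and the function names and logger name are hypothetical.

```python
import hashlib
import logging
import os
from pathlib import Path

logger = logging.getLogger("tip_generator")  # hypothetical logger name

PROMPT_DIR = Path("ml/prompts")


def active_prompt_version(env=None):
    """Read the active version from TIP_GENERATOR_PROMPT_VERSION, defaulting to v1."""
    env = os.environ if env is None else env
    return env.get("TIP_GENERATOR_PROMPT_VERSION", "v1")


def content_hash(template_text):
    """Hash the full template text (header comment included) with SHA-256."""
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()


def load_active_prompt(prompt_dir=PROMPT_DIR, env=None):
    """Load the active template and log its version and hash once at startup."""
    version = active_prompt_version(env)
    # Assumed naming scheme: tip_generator_v1.txt, tip_generator_v2.txt, ...
    path = Path(prompt_dir) / f"tip_generator_{version}.txt"
    text = path.read_text(encoding="utf-8")
    digest = content_hash(text)
    logger.info("prompt version=%s content_hash=%s", version, digest)
    return version, text, digest
```

Because the hash covers the whole file, editing only the header changelog also changes it; hashing the body alone would be a reasonable alternative if that proves noisy.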

Invalidation pattern (from taskpile)

  • If the prompt template changes, bump the version string
  • The tip_scores query filters by prompt_version to compare like-for-like
  • Sim framework accepts --prompt-version flag to compare two versions offline before deploying
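
One way to enforce the "bump the version string" rule is to compare the template's hash against a pinned value per version. This is a sketch under assumptions: the `pinned_hashes` mapping (for example, checked into the repo) and the names `check_version_bump` and `PromptDriftError` are hypothetical and do not appear in the issue or in taskpile.

```python
import hashlib


class PromptDriftError(RuntimeError):
    """Raised when a template's content changed but its version string did not."""


def content_hash(template_text):
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()


def check_version_bump(version, template_text, pinned_hashes):
    """Compare the template against the hash pinned for its version.

    Returns "new" for a version that has no pinned hash yet, "unchanged" when
    the content matches, and raises PromptDriftError when the content differs
    under the same version string.
    """
    digest = content_hash(template_text)
    pinned = pinned_hashes.get(version)
    if pinned is None:
        return "new"
    if pinned != digest:
        raise PromptDriftError(
            f"template for {version!r} changed without a version bump"
        )
    return "unchanged"
```

A check like this could run at startup or in CI, so that rows in `tip_scores` sharing a `prompt_version` were always produced by the same template text.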

Admin dashboard

  • Add prompt_version as a breakdown dimension in /admin/reward-analytics
  • Show done-rate / snooze-rate / dismiss-rate per version side-by-side
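
The per-version breakdown might be computed with a single grouped query, sketched here against SQLite. The `outcome` column and its values (`done`, `snooze`, `dismiss`) are assumptions made for illustration; the issue does not say how outcomes are stored in `tip_scores`.

```python
import sqlite3

# Assumes a hypothetical `outcome` column holding 'done', 'snooze', or 'dismiss'.
RATE_QUERY = """
SELECT prompt_version,
       COUNT(*) AS served,
       SUM(CASE WHEN outcome = 'done' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS done_rate,
       SUM(CASE WHEN outcome = 'snooze' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS snooze_rate,
       SUM(CASE WHEN outcome = 'dismiss' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS dismiss_rate
FROM tip_scores
WHERE prompt_version IS NOT NULL
GROUP BY prompt_version
ORDER BY prompt_version
"""


def rates_by_version(conn):
    """Return {prompt_version: {served, done_rate, snooze_rate, dismiss_rate}}."""
    result = {}
    for version, served, done, snooze, dismiss in conn.execute(RATE_QUERY):
        result[version] = {
            "served": served,
            "done_rate": done,
            "snooze_rate": snooze,
            "dismiss_rate": dismiss,
        }
    return result


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tip_scores (prompt_version TEXT, outcome TEXT)")
    rows = [("v1", o) for o in ("done", "done", "snooze", "dismiss")]
    rows += [("v2", o) for o in ("done", "done", "done", "snooze")]
    conn.executemany("INSERT INTO tip_scores VALUES (?, ?)", rows)
    return rates_by_version(conn)
```

Including the `served` count alongside each rate lets the dashboard show how much data backs each version, which matters when a new version has only a few rows.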

Notes

  • Depends on #89 (TipCandidate schema adds these fields)
  • Enables the prompt optimization loop planned for M4 (#95)
alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:11:55 +00:00
alvis closed this issue 2026-04-17 14:22:49 +00:00

Reference: alvis/oO#91