feat: prompt versioning — track prompt_version + model in tip_scores #91


alvis · 2026-04-17T08:11:55Z


Goal

Every served tip should record which prompt template and model produced it. Outcomes can then be compared per version (an informal A/B analysis) without running a formal experiment.

DB changes

ALTER TABLE tip_scores ADD COLUMN model TEXT;           -- e.g. "qwen2.5:7b"
ALTER TABLE tip_scores ADD COLUMN prompt_version TEXT;  -- e.g. "v1", "v2-cot"
ALTER TABLE tip_scores ADD COLUMN source TEXT;          -- "llm" | "task_direct" | "fallback"
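
A minimal sketch of applying these three columns idempotently, using Python's built-in sqlite3 against an in-memory database. The base tip_scores schema (id, tip_id, score), the migrate helper, and the sample row are assumptions for illustration only; the real table and database engine are not shown in this issue.

```python
import sqlite3

# Hypothetical starting schema; the real tip_scores table is not shown in this issue.
BASE_SCHEMA = "CREATE TABLE tip_scores (id INTEGER PRIMARY KEY, tip_id TEXT, score REAL)"

# The three columns proposed in this issue, all TEXT.
NEW_COLUMNS = ["model", "prompt_version", "source"]


def migrate(conn):
    """Add each proposed column unless it already exists, so reruns are safe."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tip_scores)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(f"ALTER TABLE tip_scores ADD COLUMN {column} TEXT")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(BASE_SCHEMA)
migrate(conn)
migrate(conn)  # second call finds the columns and does nothing

conn.execute(
    "INSERT INTO tip_scores (tip_id, score, model, prompt_version, source) "
    "VALUES (?, ?, ?, ?, ?)",
    ("tip-1", 0.8, "qwen2.5:7b", "v1", "llm"),
)
row = conn.execute("SELECT model, prompt_version, source FROM tip_scores").fetchone()
```

Rows written before the migration would have NULL in all three columns, so per-version queries may need to treat NULL as an "unversioned" bucket.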

Prompt template store

Store prompts as files: ml/prompts/tip_generator_v1.txt, tip_generator_v2.txt
Each prompt file has a header comment with version, author, date, and changelog
Set the active version via the TIP_GENERATOR_PROMPT_VERSION env var (default v1)
Compute a content_hash of the template at startup and log it; this catches accidental version drift (a template edited without a version bump)
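
The loading steps above could be sketched as follows. The function name load_prompt, the choice of SHA-256, and hashing the whole file (header comment included) are assumptions; the issue only specifies the file naming, the env var, and that a content hash is logged at startup.

```python
import hashlib
import logging
import os
from pathlib import Path

logger = logging.getLogger("tip_generator")

PROMPT_DIR = Path("ml/prompts")  # directory named in the issue


def load_prompt(prompt_dir=PROMPT_DIR, env=None):
    """Return (version, template_text, content_hash) for the active prompt.

    Hypothetical helper: reads TIP_GENERATOR_PROMPT_VERSION (default "v1"),
    loads tip_generator_<version>.txt, and logs a SHA-256 of the file bytes.
    """
    env = os.environ if env is None else env
    version = env.get("TIP_GENERATOR_PROMPT_VERSION", "v1")
    path = Path(prompt_dir) / f"tip_generator_{version}.txt"
    raw = path.read_bytes()
    # Assumption: hash the full file, so header edits also change the hash.
    content_hash = hashlib.sha256(raw).hexdigest()
    logger.info("prompt loaded: version=%s content_hash=%s", version, content_hash[:12])
    return version, raw.decode("utf-8"), content_hash
```

If the logged hash changes between deploys while the version string stays the same, the template was edited without a version bump.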

Invalidation pattern (from taskpile)

Whenever the prompt template changes, bump the version string
The tip_scores query filters by prompt_version so comparisons are like-for-like
The sim framework accepts a --prompt-version flag so two versions can be compared offline before deploying
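
One way the --prompt-version flag could be parsed is sketched below with argparse. How the sim framework actually takes two versions is not specified in this issue; passing the flag twice, the parse_args name, and the prompt_versions attribute are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical sim CLI that compares exactly two prompt versions."""
    parser = argparse.ArgumentParser(
        description="Offline prompt comparison (hypothetical sim entry point)"
    )
    # Assumption: the flag is repeated once per version to compare.
    parser.add_argument(
        "--prompt-version",
        dest="prompt_versions",
        action="append",
        required=True,
        help="prompt version to include, e.g. v1 or v2-cot; pass twice",
    )
    args = parser.parse_args(argv)
    versions = args.prompt_versions
    if len(versions) != 2 or versions[0] == versions[1]:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error("pass --prompt-version exactly twice with two distinct versions")
    return args


args = parse_args(["--prompt-version", "v1", "--prompt-version", "v2-cot"])
print(args.prompt_versions)  # prints ['v1', 'v2-cot']
```

Rejecting identical versions up front avoids a run that silently compares a prompt against itself.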

Admin dashboard

Add prompt_version as a breakdown dimension in /admin/reward-analytics
Show done-rate / snooze-rate / dismiss-rate per version side-by-side
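
The per-version breakdown could be computed roughly as below. This is a sketch under assumptions: each served tip is represented as a (prompt_version, outcome) pair, outcome is one of "done", "snooze", or "dismiss", and each rate is that outcome's share of those three outcomes. The issue does not define how outcomes are stored or what the denominator is.

```python
from collections import Counter, defaultdict

# Assumed outcome labels; the issue names the rates but not the stored values.
OUTCOMES = ("done", "snooze", "dismiss")


def rates_by_version(rows):
    """Map prompt_version -> {done_rate, snooze_rate, dismiss_rate}."""
    counts = defaultdict(Counter)
    for version, outcome in rows:
        counts[version][outcome] += 1

    result = {}
    for version, c in counts.items():
        # Denominator: tips that ended in one of the three known outcomes.
        total = sum(c[o] for o in OUTCOMES)
        result[version] = {
            f"{o}_rate": (c[o] / total if total else 0.0) for o in OUTCOMES
        }
    return result


# Hypothetical sample data, not real results.
sample = [
    ("v1", "done"), ("v1", "snooze"), ("v1", "done"), ("v1", "dismiss"),
    ("v2-cot", "done"), ("v2-cot", "dismiss"),
]
rates = rates_by_version(sample)
# rates["v1"] == {"done_rate": 0.5, "snooze_rate": 0.25, "dismiss_rate": 0.25}
```

Showing the per-version sample size next to each rate would help readers avoid over-reading differences based on a handful of tips.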

Notes

Depends on #89 (TipCandidate schema adds these fields)
Enables the prompt optimization loop planned for M4 (#95)


alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:11:55 +00:00

alvis referenced this issue

2026-04-17 08:12:37 +00:00

feat: LLM tip quality monitoring dashboard in admin #92

~~alvis referenced this issue 2026-04-17 08:13:44 +00:00~~

feat: automated prompt optimization loop — sim A/B → promote winner #95

alvis closed this issue

2026-04-17 14:22:49 +00:00

alvis referenced this issue from a commit

2026-04-24 15:10:18 +00:00

feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline
