feat: LLM tip quality monitoring dashboard in admin #92

Closed
opened 2026-04-17 08:12:37 +00:00 by alvis · 0 comments
Owner

Goal

Make prompt iteration data-driven. The admin can see which model + prompt version produces the best user reactions without running a formal A/B test.

Location

/admin/reward-analytics — extend with a new "LLM quality" section

Metrics to show

  • Done rate / snooze rate / dismiss rate broken down by:
    • source (llm vs task_direct vs fallback)
    • model (qwen2.5:7b, llama3.2:3b, ...)
    • prompt_version (v1, v2, ...)
  • Mean dwell time per model/version
  • LLM parse failure rate (from source=llm_failed rows)
  • Tip kind distribution (task / advice / insight / reminder) over time
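A minimal sketch of the per-group reaction rates, mean dwell time and parse failure rate listed above. It uses a hypothetical TipScoreRow shape; the real tip_scores columns from #89/#91 may be named differently. It also makes three assumptions. Rates are computed over all shown tips in a group. A null reaction means the tip was shown but not acted on. llm_failed rows record failed generations rather than shown tips, so they only feed the failure rate.

```typescript
// Hypothetical row shape: real tip_scores column names come from #89/#91 and may differ.
type TipSource = "llm" | "task_direct" | "fallback" | "llm_failed";
type Reaction = "done" | "snooze" | "dismiss";

interface TipScoreRow {
  source: TipSource;
  model: string | null;
  promptVersion: string | null;
  reaction: Reaction | null; // null: tip shown, no reaction recorded (assumed)
  dwellMs: number | null;
}

interface GroupStats {
  key: string;
  shown: number;
  doneRate: number;
  snoozeRate: number;
  dismissRate: number;
  meanDwellMs: number | null;
}

interface Acc {
  shown: number;
  done: number;
  snooze: number;
  dismiss: number;
  dwellSum: number;
  dwellCount: number;
}

type GroupBy = "source" | "model" | "promptVersion" | "modelAndVersion";

// Builds the group label; modelAndVersion combines both dimensions into one key.
function groupKey(row: TipScoreRow, by: GroupBy): string {
  if (by === "modelAndVersion") {
    return `${row.model ?? "unknown"} / ${row.promptVersion ?? "unknown"}`;
  }
  return row[by] ?? "unknown";
}

// Reaction rates per group. The denominator is every shown tip in the group
// (assumption); llm_failed rows are skipped because they are not shown tips.
function reactionRates(rows: TipScoreRow[], by: GroupBy): GroupStats[] {
  const groups: Record<string, Acc> = {};
  for (const row of rows) {
    if (row.source === "llm_failed") continue;
    const key = groupKey(row, by);
    const acc =
      groups[key] ||
      (groups[key] = { shown: 0, done: 0, snooze: 0, dismiss: 0, dwellSum: 0, dwellCount: 0 });
    acc.shown += 1;
    if (row.reaction !== null) acc[row.reaction] += 1;
    if (row.dwellMs !== null) {
      acc.dwellSum += row.dwellMs;
      acc.dwellCount += 1;
    }
  }
  return Object.keys(groups).map((key) => {
    const acc = groups[key];
    return {
      key,
      shown: acc.shown,
      doneRate: acc.done / acc.shown,
      snoozeRate: acc.snooze / acc.shown,
      dismissRate: acc.dismiss / acc.shown,
      // Mean over rows that actually recorded a dwell time.
      meanDwellMs: acc.dwellCount > 0 ? acc.dwellSum / acc.dwellCount : null,
    };
  });
}

// Share of LLM generation attempts (successful + failed) that failed to parse.
function parseFailureRate(rows: TipScoreRow[]): number {
  const failed = rows.filter((r) => r.source === "llm_failed").length;
  const attempts = failed + rows.filter((r) => r.source === "llm").length;
  return attempts === 0 ? 0 : failed / attempts;
}
```

Grouping by modelAndVersion gives the "mean dwell time per model/version" view directly. In practice the same aggregation could just as well be a SQL GROUP BY over tip_scores; the TypeScript version only makes the intended definitions explicit.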

Implementation notes

  • All data comes from the tip_scores table; requires the schema columns from #89 and the versioning from #91
  • Use Tremor BarList or GroupedBar for the multi-dimension breakdown
  • Reuse the date range filter from the existing reward analytics page
  • Link to it from the /admin/experiments MLOps hub page
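For the tip kind distribution over time and the bar lists, here is a sketch of the data shaping. It assumes a hypothetical createdAt timestamp column and a half-open [from, to) date range, and it buckets by UTC day. The issue does not say how the existing reward analytics page represents its date range, so DateRange is an assumption. toBarList produces the { name, value } items that Tremor's BarList takes as its data prop.

```typescript
// Hypothetical shapes: the kind values come from the issue; createdAt and
// the DateRange type are assumptions, not the existing page's API.
type TipKind = "task" | "advice" | "insight" | "reminder";

interface TipKindRow {
  kind: TipKind;
  createdAt: Date;
}

// Half-open range [from, to).
interface DateRange {
  from: Date;
  to: Date;
}

// One row per day, one numeric column per kind (suits a stacked bar or area chart).
interface KindDayRow {
  date: string; // YYYY-MM-DD in UTC
  task: number;
  advice: number;
  insight: number;
  reminder: number;
}

// Item shape accepted by Tremor's BarList data prop.
interface BarListItem {
  name: string;
  value: number;
}

function inRange(d: Date, range: DateRange): boolean {
  const t = d.getTime();
  return t >= range.from.getTime() && t < range.to.getTime();
}

// Count tips per UTC day and kind inside the selected range, oldest day first.
function kindDistributionByDay(rows: TipKindRow[], range: DateRange): KindDayRow[] {
  const byDay: Record<string, KindDayRow> = {};
  for (const row of rows) {
    if (!inRange(row.createdAt, range)) continue;
    const date = row.createdAt.toISOString().slice(0, 10);
    const bucket =
      byDay[date] || (byDay[date] = { date, task: 0, advice: 0, insight: 0, reminder: 0 });
    bucket[row.kind] += 1;
  }
  // ISO dates sort correctly as strings.
  return Object.keys(byDay)
    .sort()
    .map((date) => byDay[date]);
}

// Convert rates in [0, 1] (e.g. done rate per prompt version) into BarList
// items as percentages with one decimal, highest first.
function toBarList(rates: Record<string, number>): BarListItem[] {
  return Object.keys(rates)
    .map((name) => ({ name, value: Math.round(rates[name] * 1000) / 10 }))
    .sort((a, b) => b.value - a.value);
}
```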

Why this matters

Without this dashboard, prompt changes ship blind. With it, you can ship a new prompt version to 10% of tips, watch this chart for 48h, and decide in the admin panel whether to keep it.
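The 10% rollout could be as simple as a deterministic bucket per tip. This sketch assumes a hypothetical pickPromptVersion helper keyed on a tip id. It uses a basic polynomial string hash reduced to a bucket in 0..99, so the same tip always gets the same version. Hashing a user id instead would keep each user on one version for the whole 48h window.

```typescript
// Hypothetical rollout config; the version names and the 10% split mirror the issue.
interface RolloutConfig {
  stable: string; // e.g. "v1"
  candidate: string; // e.g. "v2"
  candidatePercent: number; // 0..100, e.g. 10
}

// Simple polynomial string hash, reduced modulo a prime at each step so
// intermediate values stay far inside Number's safe-integer range.
function stableBucket(id: string): number {
  let h = 0;
  for (let i = 0; i < id.length; i++) {
    h = (h * 31 + id.charCodeAt(i)) % 1000003;
  }
  return h % 100; // bucket in [0, 99]
}

// Same id always maps to the same version, so retries and regenerations stay consistent.
function pickPromptVersion(tipId: string, cfg: RolloutConfig): string {
  return stableBucket(tipId) < cfg.candidatePercent ? cfg.candidate : cfg.stable;
}
```

The chosen version would then be stored in the prompt_version column, so the dashboard can compare the two groups.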

alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:12:37 +00:00
alvis closed this issue 2026-04-24 15:24:58 +00:00

Reference: alvis/oO#92