feat: LLM output validation + structured JSON retry for tip generation #90

alvis · 2026-04-17T08:11:30Z

Goal

LLMs sometimes return malformed JSON, tips that are too short, or invalid kind values. Add a validation + retry layer so the generator stays robust in production.

Behaviour

1. Call the LLM with the structured output schema in the system prompt.
2. Parse the response as a JSON array of TipCandidate.
3. If parsing fails or the schema is invalid: retry once with a clarification prompt ("Your last response was not valid JSON. Return only a JSON array with this schema: ...").
4. If the response is still invalid after the retry: fall back to source=fallback, using task_direct candidates from the task list.
5. Log every validation failure to tip_scores with source=llm_failed, the model, and the error message.
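
The flow above could be sketched roughly as follows. This is a minimal illustration, not the implementation from the linked commits: `call_llm`, `fallback`, and `log_failure` are hypothetical callables standing in for the LLM gateway, the task_direct candidate source, and the tip_scores writer. The sketch only checks that the response is a JSON array of objects; the field-level rules are listed under Validation rules.

```python
import json
from typing import Callable, List, Tuple

# Taken verbatim from the issue; the schema part is elided there and stays elided here.
CLARIFICATION_PROMPT = (
    "Your last response was not valid JSON. "
    "Return only a JSON array with this schema: ..."
)


def parse_candidates(raw: str) -> List[dict]:
    """Parse raw LLM output as a JSON array of objects, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("top-level value is not a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
    return data


def generate_tips(
    call_llm: Callable[[str], str],   # hypothetical: prompt -> raw response text
    prompt: str,
    fallback: Callable[[], List[dict]],  # hypothetical: returns task_direct candidates
    log_failure: Callable[..., None],    # hypothetical: writes a row to tip_scores
    model: str,
) -> Tuple[str, List[dict]]:
    """Try the LLM once, retry once with the clarification prompt, then fall back."""
    for attempt_prompt in (prompt, CLARIFICATION_PROMPT):
        raw = call_llm(attempt_prompt)
        try:
            return "llm", parse_candidates(raw)
        except ValueError as exc:
            # Every failed attempt is logged, including the failed retry.
            log_failure(source="llm_failed", model=model, error=str(exc))
    return "fallback", fallback()
```

Returning the source label alongside the candidates lets the caller record whether a tip came from the model or from the fallback path.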

Validation rules

Must be JSON array
Each item: content is a non-empty string (≥20 chars); kind is one of the valid enum values
No PII: strip/reject candidates containing email addresses or phone numbers
Dedup: drop candidates with cosine similarity > 0.9 to recently shown tips
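
A minimal sketch of the per-item rules, under stated assumptions: the `VALID_KINDS` values are placeholders (the real enum is defined by #89), the PII patterns are deliberately coarse heuristics rather than a complete detector (the phone pattern will also match some long digit runs such as dates), and embeddings for the similarity check are assumed to be supplied by the caller as plain lists of floats.

```python
import math
import re
from typing import List, Optional, Sequence

VALID_KINDS = frozenset({"task_direct", "insight"})  # placeholder values; see #89
MIN_CONTENT_CHARS = 20
SIMILARITY_THRESHOLD = 0.9

# Coarse heuristics, assumed for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def validation_error(item: dict) -> Optional[str]:
    """Return a short reason if the candidate breaks a rule, else None."""
    content = item.get("content")
    if not isinstance(content, str) or len(content.strip()) < MIN_CONTENT_CHARS:
        return "content too short"
    kind = item.get("kind")
    if not isinstance(kind, str) or kind not in VALID_KINDS:
        return "invalid kind"
    if EMAIL_RE.search(content) or PHONE_RE.search(content):
        return "contains PII"
    return None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(embedding: Sequence[float],
                 recent: List[Sequence[float]]) -> bool:
    """True if the candidate is too similar to any recently shown tip."""
    return any(cosine(embedding, r) > SIMILARITY_THRESHOLD for r in recent)
```

This version rejects a candidate that contains PII rather than stripping the match, which is the simpler of the two options the rule allows.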

Metrics

Track llm_parse_failure_rate per model in /admin/health
Alert if failure rate > 10% over 1-hour window
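
One way the per-model failure rate and the alert condition might be computed is a sliding window over recent parse attempts. The sketch below is an assumption about the approach, not the code behind /admin/health: `ParseFailureTracker` and its methods are hypothetical names, state is kept in memory only, and attempts are assumed to be recorded in chronological order.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

WINDOW_SECONDS = 3600    # 1-hour window from the issue
ALERT_THRESHOLD = 0.10   # alert when the rate exceeds 10%


class ParseFailureTracker:
    """Tracks parse attempts per model and reports the windowed failure rate."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        # model name -> (timestamp, failed) pairs, oldest first
        self._events: Dict[str, Deque[Tuple[float, bool]]] = defaultdict(deque)

    def record(self, model: str, failed: bool, now: Optional[float] = None) -> None:
        """Record one parse attempt for the given model."""
        now = time.time() if now is None else now
        self._events[model].append((now, failed))

    def failure_rate(self, model: str, now: Optional[float] = None) -> float:
        """Fraction of attempts in the window that failed; 0.0 with no data."""
        now = time.time() if now is None else now
        events = self._events[model]
        cutoff = now - self.window_seconds
        # Drop attempts that have aged out of the window.
        while events and events[0][0] < cutoff:
            events.popleft()
        if not events:
            return 0.0
        return sum(1 for _, failed in events if failed) / len(events)

    def should_alert(self, model: str, now: Optional[float] = None) -> bool:
        """True when the windowed failure rate is strictly above the threshold."""
        return self.failure_rate(model, now) > ALERT_THRESHOLD
```

Passing `now` explicitly keeps the tracker easy to test; a production version would likely also want a minimum sample count before alerting, so that a single failure on a quiet model does not trip the alert.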

Notes

This is the reliability layer for #79 (tip generator)
Depends on #89 (TipCandidate schema) for the validation target

alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:11:30 +00:00

alvis closed this issue

2026-04-17 14:22:49 +00:00

alvis referenced this issue from a commit

2026-04-24 15:10:18 +00:00

feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline

alvis referenced this issue from a commit

2026-05-12 15:27:21 +00:00

feat(recommender): LLM schema validation + hardcoded fallback tips on AI failure (#90)

alvis referenced this issue from a commit

2026-05-12 15:37:06 +00:00

chore(m2): close out remaining loose ends (#80, #86, #90)