feat: LLM output validation + structured JSON retry for tip generation #90

Closed
opened 2026-04-17 08:11:30 +00:00 by alvis · 0 comments

Goal

LLMs sometimes return malformed JSON, tips that are too short, or tips with an invalid kind. Add a validation and retry layer so the tip generator stays robust in production.

Behaviour

  1. Call LLM with structured output schema in system prompt
  2. Parse response as JSON array of TipCandidate
  3. If parse fails or schema invalid: retry once with a clarification prompt ("Your last response was not valid JSON. Return only a JSON array with this schema: ...")
  4. If still invalid after retry: fall back to source=fallback — use task_direct candidates from the task list
  5. Log every validation failure to tip_scores with source=llm_failed, model, error message
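
The five steps above could be wired together roughly as follows. This is a minimal sketch, not the implementation from this issue: `call_llm`, `fallback` and `log_failure` are hypothetical callables standing in for the real LLM client, the `task_direct` candidate source and the `tip_scores` writer, and the schema text in the retry prompt is left elided as in the issue. Only the top-level shape is checked here; per-item validation would plug into the same `ValueError` path.

```python
import json
from typing import Callable

# Retry wording taken from step 3; the schema itself is elided in the issue.
RETRY_PROMPT = (
    "Your last response was not valid JSON. "
    "Return only a JSON array with this schema: ..."
)


def parse_candidates(raw: str) -> list:
    """Parse raw LLM text into a list of candidate dicts, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("top-level value is not a JSON array")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("array item is not a JSON object")
    return data


def generate_tips(
    call_llm: Callable[[str], str],   # hypothetical: prompt -> raw response text
    fallback: Callable[[], list],     # hypothetical: returns task_direct candidates
    log_failure: Callable[..., None], # hypothetical: writes a row to tip_scores
    model: str,
    prompt: str,
):
    """Return (source, candidates): one normal call, one retry, then fallback."""
    for attempt_prompt in (prompt, RETRY_PROMPT):
        raw = call_llm(attempt_prompt)
        try:
            return "llm", parse_candidates(raw)
        except ValueError as exc:
            # Step 5: every validation failure is logged, including the retry.
            log_failure(source="llm_failed", model=model, error=str(exc))
    # Step 4: both attempts were invalid.
    return "fallback", fallback()
```

A real retry would probably also resend the original conversation so the model can see what it got wrong; the sketch sends only the clarification prompt to keep the control flow visible.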

Validation rules

  • Must be JSON array
  • Each item: content non-empty string (≥20 chars), kind one of the valid enum values
  • No PII: strip or reject candidates containing email addresses or phone numbers
  • Dedup: drop candidates with cosine similarity > 0.9 to recently shown tips
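
One way the per-item rules above might look in code is sketched below. Several things are assumptions: the set of valid kinds is passed in because the enum is defined in #89, the email and phone patterns are deliberately crude illustrations (the phone pattern will also match things like ISO dates), and the dedup check assumes embeddings for the candidate and the recently shown tips are already available as plain lists of floats.

```python
import math
import re
from typing import Optional

# Crude illustrative patterns; a production PII filter would need more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

MIN_CONTENT_CHARS = 20
DEDUP_THRESHOLD = 0.9


def check_candidate(item: dict, valid_kinds: set) -> Optional[str]:
    """Return an error message for an invalid item, or None if it passes."""
    content = item.get("content")
    if not isinstance(content, str) or len(content.strip()) < MIN_CONTENT_CHARS:
        return "content must be a string of at least 20 characters"
    kind = item.get("kind")
    if not isinstance(kind, str) or kind not in valid_kinds:
        return "kind is not a valid enum value"
    if EMAIL_RE.search(content) or PHONE_RE.search(content):
        return "content contains an email address or phone number"
    return None


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(embedding: list, recent: list, threshold: float = DEDUP_THRESHOLD) -> bool:
    """True if the candidate is too similar to any recently shown tip."""
    return any(cosine(embedding, r) > threshold for r in recent)
```

This version rejects a candidate that contains PII rather than stripping the match; the issue allows either.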

Metrics

  • Track llm_parse_failure_rate per model in /admin/health
  • Alert if failure rate > 10% over 1-hour window
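
The per-model failure rate and the 10% alert could be computed with a simple in-memory sliding window, sketched below. The class name, its methods and the injectable `clock` are all hypothetical; the issue does not say how `/admin/health` stores or serves the metric, and a real deployment might derive it from the `tip_scores` log instead.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # 1-hour window from the issue
ALERT_THRESHOLD = 0.10  # alert when the rate exceeds 10%


class FailureRateTracker:
    """Tracks parse outcomes per model over a sliding time window."""

    def __init__(self, window: float = WINDOW_SECONDS, clock=time.time):
        self._window = window
        self._clock = clock
        self._events = defaultdict(deque)  # model -> deque of (timestamp, failed)

    def record(self, model: str, failed: bool) -> None:
        self._events[model].append((self._clock(), failed))

    def failure_rate(self, model: str) -> float:
        events = self._events[model]
        cutoff = self._clock() - self._window
        # Drop events that have aged out of the window.
        while events and events[0][0] < cutoff:
            events.popleft()
        if not events:
            return 0.0
        return sum(1 for _, failed in events if failed) / len(events)

    def should_alert(self, model: str) -> bool:
        return self.failure_rate(model) > ALERT_THRESHOLD
```

Passing a fake `clock` makes the window behaviour easy to unit test without sleeping.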

Notes

  • This is the reliability layer for #79 (tip generator)
  • Depends on #89 (TipCandidate schema) for the validation target
alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:11:30 +00:00
alvis closed this issue 2026-04-17 14:22:49 +00:00

Reference: alvis/oO#90