feat: AI gateway — wire ml/serving to LiteLLM with model aliases #87

Closed
opened 2026-04-17 08:10:30 +00:00 by alvis · 0 comments
Owner

Goal

Abstract all LLM calls in ml/serving behind a single LITELLM_URL env var. Model choice is config, not code.

Changes

  • Add LITELLM_URL env var (default http://localhost:4000, or Agap http://llm.alogins.net in prod)
  • Create ml/serving/llm_client.py — thin async wrapper around the LiteLLM OpenAI-compatible API
  • Model aliases used by oO code:
    • tip-generator — used by tip generation endpoint
    • embedder — used by task clustering / dedup
    • judge — used by offline sim (llm_judge.py); already calls Anthropic directly, migrate to gateway
  • Update ml/serving/requirements.txt: add openai>=1.0 (LiteLLM speaks OpenAI API)

Why

Swapping qwen2.5 → llama3.2 for tip generation = change one line in infra/litellm/config.yaml, no ml/serving redeploy. A/B testing models in sim = add two model entries, run sim twice with different LITELLM_MODEL env.

Acceptance criteria

  • GET /health in ml/serving checks LiteLLM reachability
  • Existing bandit scoring endpoints unchanged
  • Sim framework llm_judge.py routes through llm_client.py
## Goal Abstract all LLM calls in `ml/serving` behind a single `LITELLM_URL` env var. Model choice is config, not code. ## Changes - Add `LITELLM_URL` env var (default `http://localhost:4000`, or Agap `http://llm.alogins.net` in prod) - Create `ml/serving/llm_client.py` — thin async wrapper around the LiteLLM OpenAI-compatible API - Model aliases used by oO code: - `tip-generator` — used by tip generation endpoint - `embedder` — used by task clustering / dedup - `judge` — used by offline sim (`llm_judge.py`); already calls Anthropic directly, migrate to gateway - Update `ml/serving/requirements.txt`: add `openai>=1.0` (LiteLLM speaks OpenAI API) ## Why Swapping qwen2.5 → llama3.2 for tip generation = change one line in `infra/litellm/config.yaml`, no `ml/serving` redeploy. A/B testing models in sim = add two model entries, run sim twice with different `LITELLM_MODEL` env. ## Acceptance criteria - `GET /health` in `ml/serving` checks LiteLLM reachability - Existing bandit scoring endpoints unchanged - Sim framework `llm_judge.py` routes through `llm_client.py`
alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:10:30 +00:00
alvis closed this issue 2026-04-17 14:22:49 +00:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: alvis/oO#87