feat: AI gateway — wire ml/serving to LiteLLM with model aliases #87

New Issue

alvis · 2026-04-17T08:10:30Z

alvis commented

2026-04-17 08:10:30 +00:00

Goal

Abstract all LLM calls in ml/serving behind a single LITELLM_URL env var. Model choice is config, not code.

Changes

Add LITELLM_URL env var (default http://localhost:4000, or Agap http://llm.alogins.net in prod)
Create ml/serving/llm_client.py — thin async wrapper around the LiteLLM OpenAI-compatible API
Model aliases used by oO code:
- tip-generator — used by tip generation endpoint
- embedder — used by task clustering / dedup
- judge — used by offline sim (llm_judge.py); already calls Anthropic directly, migrate to gateway
Update ml/serving/requirements.txt: add openai>=1.0 (LiteLLM speaks OpenAI API)

Why

Swapping qwen2.5 → llama3.2 for tip generation = change one line in infra/litellm/config.yaml, no ml/serving redeploy. A/B testing models in sim = add two model entries, run sim twice with different LITELLM_MODEL env.

Acceptance criteria

GET /health in ml/serving checks LiteLLM reachability
Existing bandit scoring endpoints unchanged
Sim framework llm_judge.py routes through llm_client.py

## Goal Abstract all LLM calls in `ml/serving` behind a single `LITELLM_URL` env var. Model choice is config, not code. ## Changes - Add `LITELLM_URL` env var (default `http://localhost:4000`, or Agap `http://llm.alogins.net` in prod) - Create `ml/serving/llm_client.py` — thin async wrapper around the LiteLLM OpenAI-compatible API - Model aliases used by oO code: - `tip-generator` — used by tip generation endpoint - `embedder` — used by task clustering / dedup - `judge` — used by offline sim (`llm_judge.py`); already calls Anthropic directly, migrate to gateway - Update `ml/serving/requirements.txt`: add `openai>=1.0` (LiteLLM speaks OpenAI API) ## Why Swapping qwen2.5 → llama3.2 for tip generation = change one line in `infra/litellm/config.yaml`, no `ml/serving` redeploy. A/B testing models in sim = add two model entries, run sim twice with different `LITELLM_MODEL` env. ## Acceptance criteria - `GET /health` in `ml/serving` checks LiteLLM reachability - Existing bandit scoring endpoints unchanged - Sim framework `llm_judge.py` routes through `llm_client.py`

alvis added this to the M2 — AI tips + multi-source signals milestone 2026-04-17 08:10:30 +00:00

alvis closed this issue

2026-04-17 14:22:49 +00:00

alvis referenced this issue from a commit

2026-04-24 15:10:18 +00:00

feat: M2 AI tips — LiteLLM gateway, context assembler, end-to-end generation pipeline

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: alvis/oO#87