feat: AI gateway — wire ml/serving to LiteLLM with model aliases #87
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Goal
Abstract all LLM calls in
ml/servingbehind a singleLITELLM_URLenv var. Model choice is config, not code.Changes
LITELLM_URLenv var (defaulthttp://localhost:4000, or Agaphttp://llm.alogins.netin prod)ml/serving/llm_client.py— thin async wrapper around the LiteLLM OpenAI-compatible APItip-generator— used by tip generation endpointembedder— used by task clustering / dedupjudge— used by offline sim (llm_judge.py); already calls Anthropic directly, migrate to gatewayml/serving/requirements.txt: addopenai>=1.0(LiteLLM speaks OpenAI API)Why
Swapping qwen2.5 → llama3.2 for tip generation = change one line in
infra/litellm/config.yaml, noml/servingredeploy. A/B testing models in sim = add two model entries, run sim twice with differentLITELLM_MODELenv.Acceptance criteria
GET /healthinml/servingchecks LiteLLM reachabilityllm_judge.pyroutes throughllm_client.py