refactor(infra): drop ai profile; ollama + litellm move to Agap
Ollama and LiteLLM are shared Agap services (agap_git/openai/docker-compose.yml); oO never starts them. Removes the ai profile, the litellm config, and the --profile ai runbook; points ml-serving at https://llm.alogins.net by default and adds host.docker.internal host-gateway so the container can hit Agap ollama on the host. Also updates the tip-generator model alias to qwen2.5:1.5b to match the model actually pulled on Agap ollama (7b is ~4.7 GB and would blow VRAM budget). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -82,13 +82,13 @@ oO generates tips with an LLM and ranks them with a bandit. All LLM calls route
|
||||
|
||||
| Alias | Model | Used by |
|
||||
|-------|-------|---------|
|
||||
| `tip-generator` | qwen2.5:7b (default) | `ml/serving` tip generation |
|
||||
| `tip-generator` | qwen2.5:1.5b (default) | `ml/serving` tip generation |
|
||||
| `embedder` | nomic-embed-text | task clustering, dedup |
|
||||
| `judge` | claude-haiku-4-5 (cloud, eval only) | offline sim |
|
||||
|
||||
Env vars: `LITELLM_URL` (default `http://localhost:4000`), `OLLAMA_URL` (default `http://localhost:11434`).
|
||||
Env vars: `LITELLM_URL` (prod `https://llm.alogins.net`), `OLLAMA_URL` (Agap host, `http://host.docker.internal:11434` from containers).
|
||||
|
||||
Start with: `docker compose --profile ai up` (adds Ollama + LiteLLM locally). In prod both are shared Agap services.
|
||||
Ollama and LiteLLM are **shared Agap services**, not oO services — they live in `agap_git/openai/docker-compose.yml` along with langfuse (observability). oO never starts them; ml-serving just calls the alias.
|
||||
|
||||
**LLM tip generation pipeline:**
|
||||
1. `ml/features/context.py` assembles user signals → structured prompt context
|
||||
|
||||
Reference in New Issue
Block a user