From b5549700321f3a03b464d25e82095dbd390323fa Mon Sep 17 00:00:00 2001
From: alvis
Date: Sun, 26 Apr 2026 03:41:39 +0000
Subject: [PATCH] docs(observability): add services/api README; update ml/serving + recommender docs (#18)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- services/api/README.md: new — contract, middleware stack, background tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction criteria
- ml/serving/README.md: add Observability section (structlog JSON, traceparent → trace_id binding), add SENTRY_DSN + ENV to config table
- services/recommender/README.md: fix policy table — egreedy-v2 is active (#99), egreedy-v1 is shadow

Co-Authored-By: Claude Sonnet 4.6
---
 ml/serving/README.md           |  8 +++
 services/api/README.md         | 89 ++++++++++++++++++++++++++++++++++
 services/recommender/README.md |  6 +--
 3 files changed, 100 insertions(+), 3 deletions(-)
 create mode 100644 services/api/README.md

diff --git a/ml/serving/README.md b/ml/serving/README.md
index 81dc01b..e12f54c 100644
--- a/ml/serving/README.md
+++ b/ml/serving/README.md
@@ -47,6 +47,12 @@ On startup, `nats_consumer.py` registers two durable push consumers against NATS
 
 **Disabled** when `NATS_URL` is unset (default in local dev without NATS). No import of `nats-py` occurs in that case.
 
+## Observability
+
+Logs are structured JSON via **structlog**. Every line includes `level`, `logger`, `timestamp`, and — when a W3C `traceparent` header is present on the incoming request — `trace_id` bound via Python `contextvars`, so all log lines within a request carry the same trace ID as the upstream API call.
+
+Sentry error capture is active when `SENTRY_DSN` is set.
+
 ## Config
 
 | Env var | Default | Description |
@@ -58,6 +64,8 @@ On startup, `nats_consumer.py` registers two durable push consumers against NATS
 | `NATS_DURABLE_PREFIX` | `feature-pipeline` | Prefix for durable consumer names |
 | `NATS_MAX_DELIVER` | `5` | Max redelivery attempts before dropping |
 | `DEFAULT_PROMPT_VERSION` | `v1` | Fallback prompt version for `/generate` |
+| `ENV` | `development` | Environment label (passed to Sentry) |
+| `SENTRY_DSN` | `` | Sentry DSN; empty = Sentry disabled |
 
 ## Health story
 
diff --git a/services/api/README.md b/services/api/README.md
new file mode 100644
index 0000000..d20ff9f
--- /dev/null
+++ b/services/api/README.md
@@ -0,0 +1,89 @@
+# services/api
+
+Express BFF that serves all client-facing routes, manages sessions, runs background signal sync, and proxies admin calls to `ml/serving`.
+
+## Contract
+
+```
+GET    /health                              { ok: true }
+
+POST   /api/auth/login                      → redirect to Google OAuth
+GET    /api/auth/callback                   OAuth return URL
+POST   /api/auth/logout
+GET    /api/auth/session                    → { user? }
+
+GET    /api/integrations                    list connected integrations
+POST   /api/integrations/todoist/connect    start Todoist OAuth
+GET    /api/integrations/todoist/callback
+DELETE /api/integrations/:provider          disconnect
+
+POST   /api/recommend                       → { tip }
+POST   /api/tip/:id/feedback                { action } → { ok }
+
+GET    /api/user/profile
+DELETE /api/user                            account deletion
+
+POST   /api/push/subscribe
+DELETE /api/push/subscribe
+
+GET    /api/admin/stats                     DAU/WAU, feedback breakdown
+GET    /api/admin/users
+GET    /api/admin/events                    recent event stream (ring buffer)
+GET    /api/admin/sim/runs                  offline sim run list
+POST   /api/admin/sim/run                   launch offline sim
+GET    /api/admin/sim/runs/:id/output       tail sim stdout
+...
+
+GET    /api/ml/*                            admin-only proxy to ml/serving
+```
+
+## Middleware stack (request order)
+
+1. `cors` — origin limited to `WEB_BASE_URL`
+2. `tracingMiddleware` — reads or generates W3C `traceparent`; sets `req.traceId` + `req.traceparent`
+3. `pinoHttp` — structured JSON request/response logs with `traceId` field; `/health` suppressed
+4. `express.json()` / `cookieParser`
+5. `sessionMiddleware` — validates `sid` cookie, attaches `req.userId`
+
+## Observability
+
+Logs are structured JSON via **pino**. Every line includes `traceId` (extracted from the incoming W3C `traceparent` header, or generated fresh). The same `traceparent` is forwarded on all outbound HTTP calls to `ml/serving` so traces correlate end-to-end.
+
+Sentry error capture is active when `SENTRY_DSN` is set.
+
+## Background tasks
+
+- **Todoist sync scheduler** — runs every `TODOIST_SYNC_INTERVAL_MS` (default 15 min); starts 10 s after boot to avoid startup surge.
+- **Retention purge** — deletes `tipScores` and `tipFeedback` rows older than 30 days; runs on boot and daily.
+- **Profile TTL invalidation** — listens to `signals.task.synced` and `signals.tip.feedback` on the in-process Bus; invalidates cached user-level profile features so the next `/recommend` gets fresh values.
+
+## Config
+
+| Env var | Default | Description |
+|---------|---------|-------------|
+| `PORT` | `3001` | Listen port |
+| `NODE_ENV` | `development` | Environment label |
+| `DATABASE_PATH` | `./data/oo.db` | SQLite file |
+| `SESSION_SECRET` | required | Cookie signing secret |
+| `GOOGLE_CLIENT_ID/SECRET` | required | OAuth |
+| `TODOIST_CLIENT_ID/SECRET` | required | OAuth |
+| `API_BASE_URL` | `http://localhost:3001` | Self-referential redirect URI |
+| `WEB_BASE_URL` | `http://localhost:3000` | CORS + post-login redirect |
+| `ML_SERVING_URL` | `http://localhost:8000` | ml/serving base URL |
+| `NATS_URL` | `` | NATS broker; empty = in-process bus only |
+| `TODOIST_SYNC_INTERVAL_MS` | `900000` | Background sync cadence |
+| `TIP_PROMPT_VERSION` | `` | Prompt variant(s) for `/generate` |
+| `LOG_LEVEL` | `info` | pino log level |
+| `SENTRY_DSN` | `` | Sentry DSN; empty = Sentry disabled |
+| `VAPID_*` | | Web push keys |
+
+## Health story
+
+`GET /health` returns `{ ok: true }`. No dependency checks — upstream deps (`ml/serving`, NATS) have their own health endpoints checked separately.
+
+## Extraction criteria
+
+Extract to its own host when:
+- Auth session management needs a dedicated Redis/PG session store, **or**
+- Background sync load (Todoist, future connectors) displaces API serving on the shared host, **or**
+- Team boundary emerges between auth/BFF and recommender orchestration.
diff --git a/services/recommender/README.md b/services/recommender/README.md
index e515160..8162296 100644
--- a/services/recommender/README.md
+++ b/services/recommender/README.md
@@ -31,9 +31,9 @@ Signals carry `features: Record` (bandit-ready) and `m
 
 | Policy | Status | Notes |
 |--------|--------|-------|
-| `random` | Shadow | Fallback when ml/serving unreachable |
-| `egreedy-v1` | **Active** | d=7, ADR-0007 |
-| `egreedy-v2` | Shadow | d=12 + profile features, ADR-0012 |
+| `random` | Fallback | Used when ml/serving is unreachable |
+| `egreedy-v1` | Shadow | d=7, ADR-0007 |
+| `egreedy-v2` | **Active** | d=12 + profile features, ADR-0012 |
 
 Shadow → active promotion requires offline sim + online agreement (ADR-0002).
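
The Observability section this patch adds to `ml/serving/README.md` describes binding `trace_id` from a W3C `traceparent` header through Python `contextvars`. The sketch below illustrates that idea with the standard library only. It is not the code in `ml/serving`, which the README says logs through structlog, and every name in it (`parse_trace_id`, `trace_id_var`, `render_log`, `handle_request`) is hypothetical. It assumes version `00` headers only and a plain dict keyed by lowercase header names.

```python
import contextvars
import json
import re
from datetime import datetime, timezone
from typing import Dict, List, Optional

# W3C Trace Context, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# Hypothetical context variable holding the trace ID of the current request.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


def parse_trace_id(traceparent: Optional[str]) -> Optional[str]:
    """Return the trace-id field of a version-00 traceparent, or None if invalid."""
    if not traceparent:
        return None
    match = _TRACEPARENT_RE.match(traceparent.strip())
    if match is None:
        return None
    trace_id, parent_id = match.group(1), match.group(2)
    # The spec treats all-zero trace-id and parent-id values as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def render_log(level: str, logger: str, event: str) -> str:
    """Render one JSON log line, adding trace_id only when one is bound."""
    record = {
        "level": level,
        "logger": logger,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
    }
    trace_id = trace_id_var.get()
    if trace_id is not None:
        record["trace_id"] = trace_id
    return json.dumps(record)


def handle_request(headers: Dict[str, str]) -> List[str]:
    """Bind the trace ID for the duration of one request, then restore the old value."""
    token = trace_id_var.set(parse_trace_id(headers.get("traceparent")))
    try:
        return [
            render_log("info", "demo", "request.start"),
            render_log("info", "demo", "request.end"),
        ]
    finally:
        trace_id_var.reset(token)
```

Calling `handle_request({"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"})` yields two log lines that carry the same `trace_id`, while a request without the header yields lines with no `trace_id` key. Resetting the token in `finally` keeps one request's ID from leaking into the next when a worker is reused.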