docs(observability): add services/api README; update ml/serving + recommender docs (#18)

- services/api/README.md: new — contract, middleware stack, background tasks, config table (LOG_LEVEL, SENTRY_DSN), health story, extraction criteria
- ml/serving/README.md: add Observability section (structlog JSON, traceparent → trace_id binding), add SENTRY_DSN + ENV to config table
- services/recommender/README.md: fix policy table — egreedy-v2 is active (#99), egreedy-v1 is shadow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ml/serving/README.md

@@ -47,6 +47,12 @@ On startup, `nats_consumer.py` registers two durable push consumers against NATS

**Disabled** when `NATS_URL` is unset (the default in local dev without NATS). `nats-py` is not imported in that case.

## Observability

Logs are structured JSON via **structlog**. Every line includes `level`, `logger`, `timestamp`, and — when a W3C `traceparent` header is present on the incoming request — `trace_id` bound via Python `contextvars`, so all log lines within a request carry the same trace ID as the upstream API call.

Sentry error capture is active when `SENTRY_DSN` is set to a non-empty value.

## Config

| Env var | Default | Description |

@@ -58,6 +64,8 @@ On startup, `nats_consumer.py` registers two durable push consumers against NATS

| `NATS_DURABLE_PREFIX` | `feature-pipeline` | Prefix for durable consumer names |
| `NATS_MAX_DELIVER` | `5` | Max redelivery attempts before dropping |
| `DEFAULT_PROMPT_VERSION` | `v1` | Fallback prompt version for `/generate` |
| `ENV` | `development` | Environment label (passed to Sentry) |
| `SENTRY_DSN` | `""` | Sentry DSN; empty = Sentry disabled |

## Health story

services/api/README.md

@@ -0,0 +1,89 @@

# services/api

Express backend-for-frontend (BFF) that serves all client-facing routes, manages sessions, runs background signal sync, and proxies admin calls to `ml/serving`.

## Contract

```
GET    /health                              { ok: true }

POST   /api/auth/login                      → redirect to Google OAuth
GET    /api/auth/callback                   OAuth return URL
POST   /api/auth/logout
GET    /api/auth/session                    → { user? }

GET    /api/integrations                    list connected integrations
POST   /api/integrations/todoist/connect    start Todoist OAuth
GET    /api/integrations/todoist/callback
DELETE /api/integrations/:provider          disconnect

POST   /api/recommend                       → { tip }
POST   /api/tip/:id/feedback                { action } → { ok }

GET    /api/user/profile
DELETE /api/user                            account deletion

POST   /api/push/subscribe
DELETE /api/push/subscribe

GET    /api/admin/stats                     DAU/WAU, feedback breakdown
GET    /api/admin/users
GET    /api/admin/events                    recent event stream (ring buffer)
GET    /api/admin/sim/runs                  offline sim run list
POST   /api/admin/sim/run                   launch offline sim
GET    /api/admin/sim/runs/:id/output       tail sim stdout
...

GET    /api/ml/*                            admin-only proxy to ml/serving
```
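
`GET /api/admin/events` is described as serving a ring buffer. The following minimal sketch shows that data structure, assuming a fixed capacity and a simple event shape. `RingBuffer`, `AdminEvent`, and the capacity of 500 are hypothetical and are not taken from the service's code.

```typescript
// Hypothetical fixed-capacity ring buffer for the admin event stream.
// Once full, each push overwrites the oldest entry, so memory stays bounded.
class RingBuffer<T> {
  private readonly capacity: number;
  private readonly slots: (T | undefined)[];
  private next = 0; // slot the next push writes to
  private count = 0; // occupied slots, never more than capacity

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  /** Snapshot of the buffered items, oldest first. */
  toArray(): T[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as T);
    }
    return out;
  }
}

// Illustrative event shape and capacity (both assumptions):
interface AdminEvent {
  type: string;
  at: number; // epoch ms
}
const adminEvents = new RingBuffer<AdminEvent>(500);
// On each bus event:       adminEvents.push({ type, at: Date.now() });
// In the route handler:    res.json(adminEvents.toArray());
```

Reads return a copy, so a handler can serialize the snapshot without racing later pushes.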

## Middleware stack (request order)

1. `cors` — origin limited to `WEB_BASE_URL`
2. `tracingMiddleware` — reads or generates W3C `traceparent`; sets `req.traceId` + `req.traceparent`
3. `pinoHttp` — structured JSON request/response logs with `traceId` field; `/health` suppressed
4. `express.json()` / `cookieParser`
5. `sessionMiddleware` — validates `sid` cookie, attaches `req.userId`
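
Step 2 could be implemented along the lines of the sketch below. It assumes that only W3C Trace Context version `00` headers are accepted and that anything else starts a new trace. `resolveTraceContext`, `TraceContext`, and the use of the sampled flag (`01`) for fresh traces are assumptions. Only the header layout comes from the W3C spec.

```typescript
// Hypothetical sketch of tracingMiddleware's header handling.
// Only the W3C Trace Context header layout is standard; names are illustrative.
interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  traceparent: string; // full header value to log and forward
}

// Version 00 layout: "00-<trace-id: 32 hex>-<parent-id: 16 hex>-<flags: 2 hex>"
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

// Random lowercase hex that is never all zeros (the spec treats all-zero IDs as invalid).
// Math.random keeps the sketch dependency-free; prefer node:crypto in production.
function nonZeroHex(bytes: number): string {
  let out = "";
  do {
    out = "";
    for (let i = 0; i < bytes; i++) {
      out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
    }
  } while (ALL_ZEROS.test(out));
  return out;
}

function resolveTraceContext(header: string | undefined): TraceContext {
  const value = header?.trim() ?? "";
  const m = TRACEPARENT_RE.exec(value);
  if (m && !ALL_ZEROS.test(m[1]) && !ALL_ZEROS.test(m[2])) {
    // Valid incoming header: keep the upstream trace.
    return { traceId: m[1], traceparent: value };
  }
  // Missing or malformed: start a new trace (flags "01" = sampled, an assumption).
  const traceId = nonZeroHex(16);
  return { traceId, traceparent: `00-${traceId}-${nonZeroHex(8)}-01` };
}

// Possible Express wiring (assumes Express's req.header):
// app.use((req, _res, next) => {
//   Object.assign(req, resolveTraceContext(req.header("traceparent")));
//   next();
// });
```

Rejecting malformed headers instead of passing them through keeps the `traceId` log field well-formed for every request.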

## Observability

Logs are structured JSON via **pino**. Every line includes `traceId` (extracted from the incoming W3C `traceparent` header, or generated fresh). The same `traceparent` is forwarded on all outbound HTTP calls to `ml/serving` so traces correlate end-to-end.

Sentry error capture is active when `SENTRY_DSN` is set to a non-empty value.
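
Forwarding the request's `traceparent` to `ml/serving` can be as small as a header helper. This is a hypothetical sketch: `outboundHeaders` is not a name from the codebase. The commented call site assumes Node 18+ global `fetch` and the `/generate` endpoint listed in the ml/serving config table.

```typescript
// Hypothetical helper for outbound calls from services/api to ml/serving.
// A caller-supplied traceparent (in any letter case) is dropped so the
// request's own value is always the one forwarded.
function outboundHeaders(
  traceparent: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const name of Object.keys(extra)) {
    if (name.toLowerCase() !== "traceparent") headers[name] = extra[name];
  }
  headers["traceparent"] = traceparent;
  return headers;
}

// Example call site (assumes tracingMiddleware has set req.traceparent):
// const res = await fetch(`${process.env.ML_SERVING_URL}/generate`, {
//   method: "POST",
//   headers: outboundHeaders(req.traceparent, { "content-type": "application/json" }),
//   body: JSON.stringify(payload),
// });
```

Under strict W3C propagation, the caller would normally replace the parent-id with its own span ID before forwarding. Forwarding the value unchanged, as this README describes, still keeps the trace-id identical on both sides, which is what log correlation needs.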

## Background tasks

- **Todoist sync scheduler** — runs every `TODOIST_SYNC_INTERVAL_MS` (default 15 min); starts 10 s after boot to avoid a startup surge.
- **Retention purge** — deletes `tipScores` and `tipFeedback` rows older than 30 days; runs on boot and then daily.
- **Profile TTL invalidation** — listens to `signals.task.synced` and `signals.tip.feedback` on the in-process Bus; invalidates cached user-level profile features so the next `POST /api/recommend` gets fresh values.
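
The timing rules for the first two tasks can be sketched with plain timers. `schedule`, `retentionCutoff`, and the task functions named in the wiring comment are hypothetical. The sketch also assumes that the first Todoist sync fires at the 10 s mark and that "daily" means a fixed 24 h interval from boot, not a wall-clock time.

```typescript
// Hypothetical sketch of the background-task timing described above.
const SECOND_MS = 1_000;
const DAY_MS = 24 * 60 * 60 * SECOND_MS;

type Task = () => Promise<void>;

/**
 * Runs `task` once after `initialDelayMs`, then every `intervalMs`.
 * Returns a function that cancels both timers (e.g. on graceful shutdown).
 */
function schedule(task: Task, intervalMs: number, initialDelayMs: number): () => void {
  let interval: ReturnType<typeof setInterval> | undefined;
  const run = (): void => {
    // Report and swallow failures so one bad run does not stop the schedule.
    task().catch((err: unknown) => console.error("background task failed", err));
  };
  const timeout = setTimeout(() => {
    run();
    interval = setInterval(run, intervalMs);
  }, initialDelayMs);
  return () => {
    clearTimeout(timeout);
    if (interval !== undefined) clearInterval(interval);
  };
}

/** Rows created before the returned instant are eligible for the retention purge. */
function retentionCutoff(nowMs: number, retentionDays = 30): Date {
  return new Date(nowMs - retentionDays * DAY_MS);
}

// Example wiring; syncAllTodoistUsers and purgeTipRowsBefore are hypothetical:
// const stopSync = schedule(syncAllTodoistUsers,
//   Number(process.env.TODOIST_SYNC_INTERVAL_MS || 900_000), 10 * SECOND_MS);
// const stopPurge = schedule(
//   () => purgeTipRowsBefore(retentionCutoff(Date.now())), DAY_MS, 0); // boot, then daily
```

This sketch does not prevent overlapping runs. If a sync can outlast `TODOIST_SYNC_INTERVAL_MS`, a guard flag or a self-rescheduling `setTimeout` would be the safer shape.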

## Config

| Env var | Default | Description |
|---------|---------|-------------|
| `PORT` | `3001` | Listen port |
| `NODE_ENV` | `development` | Environment label |
| `DATABASE_PATH` | `./data/oo.db` | SQLite file |
| `SESSION_SECRET` | required | Cookie signing secret |
| `GOOGLE_CLIENT_ID/SECRET` | required | Google OAuth client ID and secret |
| `TODOIST_CLIENT_ID/SECRET` | required | Todoist OAuth client ID and secret |
| `API_BASE_URL` | `http://localhost:3001` | This service's own base URL, used to build OAuth redirect URIs |
| `WEB_BASE_URL` | `http://localhost:3000` | CORS + post-login redirect |
| `ML_SERVING_URL` | `http://localhost:8000` | ml/serving base URL |
| `NATS_URL` | `""` | NATS broker; empty = in-process bus only |
| `TODOIST_SYNC_INTERVAL_MS` | `900000` | Todoist background sync interval in ms (15 min) |
| `TIP_PROMPT_VERSION` | `""` | Prompt variant(s) for `/generate` |
| `LOG_LEVEL` | `info` | pino log level |
| `SENTRY_DSN` | `""` | Sentry DSN; empty = Sentry disabled |
| `VAPID_*` | | Web push keys |
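
Here is a sketch of typed config loading for this table. It assumes that `GOOGLE_CLIENT_ID/SECRET` and `TODOIST_CLIENT_ID/SECRET` each stand for two variables (`*_CLIENT_ID` and `*_CLIENT_SECRET`), and it treats an empty string the same as unset. `loadConfig` and `ApiConfig` are hypothetical names, and the `VAPID_*` keys are left out.

```typescript
// Hypothetical typed loader for the services/api config table.
type Env = Record<string, string | undefined>;

interface ApiConfig {
  port: number;
  nodeEnv: string;
  databasePath: string;
  sessionSecret: string;
  google: { clientId: string; clientSecret: string };
  todoist: { clientId: string; clientSecret: string };
  apiBaseUrl: string;
  webBaseUrl: string;
  mlServingUrl: string;
  natsUrl: string | null; // null: in-process bus only
  todoistSyncIntervalMs: number;
  tipPromptVersion: string | null;
  logLevel: string;
  sentryDsn: string | null; // null: Sentry disabled
}

// Unset and empty string are treated identically throughout.
function str(env: Env, key: string, fallback: string): string {
  return env[key] || fallback;
}

function optional(env: Env, key: string): string | null {
  return env[key] || null;
}

function required(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required env var: ${key}`);
  return value;
}

function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (!raw) return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${key} must be a positive integer, got "${raw}"`);
  return n;
}

function loadConfig(env: Env): ApiConfig {
  return {
    port: positiveInt(env, "PORT", 3001),
    nodeEnv: str(env, "NODE_ENV", "development"),
    databasePath: str(env, "DATABASE_PATH", "./data/oo.db"),
    sessionSecret: required(env, "SESSION_SECRET"),
    google: {
      clientId: required(env, "GOOGLE_CLIENT_ID"),
      clientSecret: required(env, "GOOGLE_CLIENT_SECRET"),
    },
    todoist: {
      clientId: required(env, "TODOIST_CLIENT_ID"),
      clientSecret: required(env, "TODOIST_CLIENT_SECRET"),
    },
    apiBaseUrl: str(env, "API_BASE_URL", "http://localhost:3001"),
    webBaseUrl: str(env, "WEB_BASE_URL", "http://localhost:3000"),
    mlServingUrl: str(env, "ML_SERVING_URL", "http://localhost:8000"),
    natsUrl: optional(env, "NATS_URL"),
    todoistSyncIntervalMs: positiveInt(env, "TODOIST_SYNC_INTERVAL_MS", 900_000),
    tipPromptVersion: optional(env, "TIP_PROMPT_VERSION"),
    logLevel: str(env, "LOG_LEVEL", "info"),
    sentryDsn: optional(env, "SENTRY_DSN"),
  };
}

// At startup, fail fast on missing secrets:
// const config = loadConfig(process.env);
```

Loading everything once at boot surfaces a missing secret as an immediate crash rather than a failure on the first OAuth request.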

## Health story

`GET /health` returns `{ ok: true }` and performs no dependency checks; upstream dependencies (`ml/serving`, NATS) expose their own health endpoints, which are checked separately.

## Extraction criteria

Extract to its own host when:

- Auth session management needs a dedicated Redis/PG session store, **or**
- Background sync load (Todoist, future connectors) displaces API serving on the shared host, **or**
- A team boundary emerges between auth/BFF and recommender orchestration.

services/recommender/README.md

@@ -31,9 +31,9 @@ Signals carry `features: Record<string, number | boolean>` (bandit-ready) and `m

| Policy | Status | Notes |
|--------|--------|-------|
| `random` | Fallback | Used when ml/serving is unreachable |
| `egreedy-v1` | Shadow | d=7, ADR-0007 |
| `egreedy-v2` | **Active** | d=12 + profile features, ADR-0012 |

Shadow → active promotion requires offline sim + online agreement (ADR-0002).
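
The sketch below shows ε-greedy selection over a linear reward model. It assumes that `d` in the table is the length of each candidate's feature vector (7 for `egreedy-v1`, 12 for `egreedy-v2`). ε, the weight vector `theta`, and every name here are hypothetical. This is a textbook ε-greedy sketch, not the ADR-0007/ADR-0012 implementation.

```typescript
// Hypothetical ε-greedy selection over a linear reward model.
type Vec = readonly number[];

function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

/**
 * Returns the index of the chosen candidate.
 * `rng` must return values in [0, 1); it is injectable for deterministic tests.
 */
function egreedySelect(
  candidates: Vec[],
  theta: Vec,
  epsilon: number,
  rng: () => number = Math.random,
): number {
  if (candidates.length === 0) throw new Error("no candidates");
  if (rng() < epsilon) {
    // Explore: pick a candidate uniformly at random.
    return Math.floor(rng() * candidates.length);
  }
  // Exploit: highest predicted reward theta·x (first index wins ties).
  let best = 0;
  let bestScore = dot(theta, candidates[0]);
  for (let i = 1; i < candidates.length; i++) {
    const score = dot(theta, candidates[i]);
    if (score > bestScore) {
      best = i;
      bestScore = score;
    }
  }
  return best;
}
```

With `epsilon = 1` every call explores, so the same function degenerates to uniform random selection, which is the behavior of the `random` fallback row. A shadow policy in this style would score the same candidates and log its pick without serving it.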