# ADR-0010: NATS bridge over the in-process bus, and Todoist background sync

## Status

Accepted, 2026-04-18

## Context

ADR-0005 set protobuf + JetStream as the long-term event substrate. M1 shipped an in-process `EventEmitter`-based bus with the right subjects (`signals.*`, `feedback.*`) so the swap would be mechanical.

Two pressures pulled that work forward:

1. **ml/serving** and future feature pipelines need to consume signals across process boundaries, which the in-process emitter cannot do.
2. **Todoist** signals were only fetched on the recommend path. Cold-cache hits added latency, and a single 401/429 stalled the request that triggered it.

## Decision

### 1. Bridge, do not replace

The `Bus` stays the producer. A new `Bus.onPublish(hook)` hook fires on every `publish`. When `NATS_URL` is set, `connectNats()` registers a hook that JSON-encodes the payload and publishes it to JetStream with `js.publish(subject, data)`.

- Stream creation runs on startup and is idempotent: `signals` (`signals.>`, 7-day file storage, 500k msgs) and `feedback` (`feedback.>`, 30-day, 200k).
- JetStream publish errors are caught inside the hook so an unhealthy broker cannot crash the in-process publisher or its subscribers.
- When `NATS_URL` is unset, `connectNats` is a no-op, so local dev keeps working.

This preserves the existing `bus.subscribe()` contract for in-process consumers (reward inference, ring-buffer tail for the admin event viewer) while making events durably consumable across processes.

### 2. Schedule Todoist, keep on-demand as the SLA fallback

A 15-minute background scheduler (`TODOIST_SYNC_INTERVAL_MS`) walks every user with `tokenStatus = 'active'` and calls `todoistSource.fetchSignals(uid)`, which in turn emits `signals.task.synced`. The per-request fetch in `recommender` stays: when the cache is older than 30 s it still goes to Todoist inline, so freshness on the user's first hit of the day is unchanged.

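
A minimal sketch of one scheduler tick may make the failure isolation in this section concrete. It is not the shipped implementation: `SyncDeps`, `listActiveUserIds`, `log`, `runSyncTick`, and `startTodoistScheduler` are hypothetical names, and `fetchSignals` here only stands in for `todoistSource.fetchSignals`. The sketch assumes each user's fetch runs under `Promise.allSettled` and that the whole tick is wrapped so a failure while loading users logs and skips.

```typescript
// Hypothetical dependency shape; the ADR names only `tokenStatus` and
// `todoistSource.fetchSignals(uid)`. Everything else is an assumption.
interface SyncDeps {
  listActiveUserIds(): Promise<string[]>; // assumed: users with tokenStatus = 'active'
  fetchSignals(uid: string): Promise<unknown>; // stand-in for todoistSource.fetchSignals
  log(msg: string, err?: unknown): void;
}

interface TickResult {
  synced: number;
  failed: number;
}

// One tick: fetch for every active user, isolating per-user failures.
async function runSyncTick(deps: SyncDeps): Promise<TickResult> {
  try {
    const uids = await deps.listActiveUserIds();
    // The async wrapper turns synchronous throws into rejections as well,
    // so allSettled sees every failure as a settled result.
    const results = await Promise.allSettled(
      uids.map(async (uid) => deps.fetchSignals(uid)),
    );
    let synced = 0;
    let failed = 0;
    results.forEach((r, i) => {
      if (r.status === "fulfilled") {
        synced++;
      } else {
        failed++;
        deps.log(`todoist sync failed for ${uids[i]}`, r.reason);
      }
    });
    return { synced, failed };
  } catch (err) {
    // e.g. a transient storage error while listing users: log and skip this tick.
    deps.log("todoist sync tick skipped", err);
    return { synced: 0, failed: 0 };
  }
}

// Starts the periodic tick and returns a function that stops the timer.
// The default mirrors the 15-minute interval; reading it from
// TODOIST_SYNC_INTERVAL_MS is left to the caller.
function startTodoistScheduler(
  deps: SyncDeps,
  intervalMs: number = 15 * 60 * 1000,
): () => void {
  const timer = setInterval(() => {
    void runSyncTick(deps); // runSyncTick never rejects, so this cannot leak a rejection
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Because `runSyncTick` resolves in every case, the timer callback has no unhandled-rejection path, which is the property that keeps a bad tick from taking the API process down.
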
Per-user failures are isolated with `Promise.allSettled`; one expired token cannot stop the rest of the cohort. The whole tick is wrapped so that a transient SQLite error is logged and the tick is skipped; it never crashes the API.

## Consequences

- ml/serving (and any future Python consumer) can durably tail `signals.task.synced`, `signals.tip.served`, and `signals.tip.feedback` from JetStream without coupling to the API process.
- Local dev still runs without NATS; the bridge is opt-in via the `NATS_URL` env var.
- Wire format is JSON today (the ADR-0005 envelope is not enforced yet); see Open follow-ups.

## Open follow-ups

- An ml/serving JetStream consumer for the feature pipeline (today nothing reads from JetStream; the API only writes).
- Move the wire payload to the protobuf envelope from ADR-0005 once the schema-registry CI gate (#54) lands.
- Graceful shutdown of the scheduler timer on `SIGTERM`.
- Per-publish failure metrics exported to the admin health view.
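
For reference, the bridge from Decision 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the `Bus` class below is a simplified exact-match stand-in for the real `EventEmitter`-based bus, `JetStreamLike` is a hypothetical interface that mirrors only the `js.publish(subject, data)` call named above, and `bridgeToJetStream` and `onError` are invented names. It shows the two properties the decision relies on: in-process subscribers still receive every event, and a failing broker surfaces only through the hook's error handler.

```typescript
type PublishHook = (subject: string, payload: unknown) => void;
type Subscriber = (payload: unknown) => void;

// Simplified stand-in for the in-process bus (exact subject match only).
class Bus {
  private hooks: PublishHook[] = [];
  private subs = new Map<string, Subscriber[]>();

  subscribe(subject: string, fn: Subscriber): void {
    const list = this.subs.get(subject) ?? [];
    list.push(fn);
    this.subs.set(subject, list);
  }

  onPublish(hook: PublishHook): void {
    this.hooks.push(hook);
  }

  publish(subject: string, payload: unknown): void {
    // In-process subscribers are served first, then the publish hooks fire.
    for (const fn of this.subs.get(subject) ?? []) fn(payload);
    for (const hook of this.hooks) hook(subject, payload);
  }
}

// Hypothetical minimal view of a JetStream client: only the publish call.
interface JetStreamLike {
  publish(subject: string, data: Uint8Array): Promise<unknown>;
}

// Registers a hook that JSON-encodes each payload and forwards it.
// Both synchronous failures (e.g. a payload JSON cannot serialize) and
// asynchronous publish rejections are routed to onError inside the hook.
function bridgeToJetStream(
  bus: Bus,
  js: JetStreamLike,
  onError: (err: unknown) => void,
): void {
  const encoder = new TextEncoder();
  bus.onPublish((subject, payload) => {
    try {
      const data = encoder.encode(JSON.stringify(payload));
      js.publish(subject, data).catch(onError);
    } catch (err) {
      onError(err);
    }
  });
}
```

The hook is fire-and-forget by design in this sketch: `publish` on the bus does not wait for the broker acknowledgement, so broker latency never lands on the in-process path.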