test: cover NATS bridge + Todoist scheduler; ADR-0010

- bus.test.ts: 4 cases for the new onPublish hook contract
- nats.test.ts: stream creation idempotency + JSON publish bridge
- scheduler.test.ts: startup delay, fan-out, per-user failure isolation
- ADR-0010 documents the bridge-don't-replace decision and the
  Todoist scheduler isolation, plus open follow-ups (#98 ml/serving
  consumer, #54 protobuf migration, graceful shutdown, metrics)
- README/overview/services README reflect the bridged event substrate
- CLAUDE.md gains a "don't nats.publish() directly" rule
- .env.example documents NATS_URL + TODOIST_SYNC_INTERVAL_MS

Verified in deployment 2026-04-18: api -> nats bridge connects on
boot, signals + feedback streams created, scheduler tick logs
"todoist sync: 1 ok, 0 failed (1 users)" within 10s. Closes #21, #22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 07:55:25 +00:00
parent 2a7380933c
commit 5b52c6bf40
9 changed files with 414 additions and 6 deletions

@@ -0,0 +1,59 @@
# ADR-0010: NATS bridge over the in-process bus, and Todoist background sync
## Status
Accepted — 2026-04-18
## Context
ADR-0005 set protobuf + JetStream as the long-term event substrate. M1 shipped
an in-process `EventEmitter`-based bus with the right subjects (`signals.*`,
`feedback.*`) so the swap would be mechanical.
Two pressures pulled that swap forward:
1. **ml/serving** and future feature pipelines need to consume signals across
process boundaries — the in-proc emitter cannot do that.
2. **Todoist** signals were only fetched on the recommend path. Cold-cache hits
added latency and a single 401/429 stalled the request that triggered it.
## Decision
### 1. Bridge, do not replace
The `Bus` stays the producer. A new `Bus.onPublish(hook)` hook fires on every
`publish`. When `NATS_URL` is set, `connectNats()` registers a hook that
JSON-encodes the payload and `js.publish(subject, data)`s it to JetStream.
- Stream creation runs on startup and is idempotent: `signals` (`signals.>`,
7-day file storage, 500k msgs) and `feedback` (`feedback.>`, 30-day, 200k).
- JetStream publish errors are caught inside the hook so an unhealthy broker
cannot crash the in-process publisher or its subscribers.
- When `NATS_URL` is unset, `connectNats` is a no-op — local dev keeps working.
This preserves the existing `bus.subscribe()` contract for in-process consumers
(reward inference, ring-buffer tail for the admin event viewer) while making
events durably consumable across processes.
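The bridging rule above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the `Bus` here uses plain arrays rather than an `EventEmitter`, `DurablePublisher` is a hypothetical stand-in for the JetStream client, and `bridge` and `onError` are names invented for the example.

```typescript
type Handler = (subject: string, payload: unknown) => void;

// Simplified in-process bus: subscribers first, then publish hooks.
class Bus {
  private readonly subscribers: Handler[] = [];
  private readonly hooks: Handler[] = [];

  subscribe(handler: Handler): void {
    this.subscribers.push(handler);
  }

  onPublish(hook: Handler): void {
    this.hooks.push(hook);
  }

  publish(subject: string, payload: unknown): void {
    for (const handler of this.subscribers) handler(subject, payload);
    for (const hook of this.hooks) hook(subject, payload);
  }
}

// Hypothetical stand-in for a JetStream client. A real client would
// most likely take bytes rather than a string.
interface DurablePublisher {
  publish(subject: string, data: string): Promise<unknown>;
}

// Registers a hook that JSON-encodes each payload and forwards it.
// Encoding failures and broker failures both go to onError, so neither
// can propagate into Bus.publish or its subscribers.
function bridge(
  bus: Bus,
  js: DurablePublisher,
  onError: (err: unknown) => void,
): void {
  bus.onPublish((subject, payload) => {
    try {
      const data = JSON.stringify(payload);
      js.publish(subject, data).catch(onError);
    } catch (err) {
      onError(err);
    }
  });
}
```

In this sketch, skipping the `bridge` call entirely plays the role of the unset `NATS_URL` case: the bus keeps delivering to in-process subscribers and nothing is forwarded.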
### 2. Schedule Todoist, keep on-demand as the SLA fallback
A 15-minute background scheduler (`TODOIST_SYNC_INTERVAL_MS`) walks every
user with `tokenStatus = 'active'` and calls `todoistSource.fetchSignals(uid)`,
which in turn emits `signals.task.synced`. The per-request fetch in
`recommender` stays — when the cache is colder than 30 s it still goes to
Todoist inline, so freshness on the user's first hit of the day is unchanged.
Per-user failures are isolated with `Promise.allSettled`; one expired token
cannot stop the rest of the cohort. The whole tick is also wrapped in an error
handler, so a transient SQLite error is logged and that tick is skipped rather
than crashing the API.
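The two layers of isolation described here might look like the sketch below. It assumes hypothetical names (`syncAll`, `runTick`, `listActiveUsers`, `log`) and a made-up "skipped" log line; only the success log format is taken from the deployment note in the commit message.

```typescript
type FetchSignals = (uid: string) => Promise<unknown>;

interface TickResult {
  ok: number;
  failed: number;
}

// Per-user isolation: every fetch settles on its own, so one rejected
// call (for example an expired token) does not abort the others.
async function syncAll(
  uids: string[],
  fetchSignals: FetchSignals,
): Promise<TickResult> {
  const results = await Promise.allSettled(
    // The async wrapper also turns a synchronous throw into a rejection.
    uids.map(async (uid) => fetchSignals(uid)),
  );
  const failed = results.filter((r) => r.status === "rejected").length;
  return { ok: results.length - failed, failed };
}

// Tick-level isolation: a failure while listing users (for example a
// transient database error) is logged and the tick ends without throwing.
async function runTick(
  listActiveUsers: () => Promise<string[]>,
  fetchSignals: FetchSignals,
  log: (line: string) => void,
): Promise<void> {
  try {
    const uids = await listActiveUsers();
    const { ok, failed } = await syncAll(uids, fetchSignals);
    log(`todoist sync: ${ok} ok, ${failed} failed (${uids.length} users)`);
  } catch (err) {
    // Hypothetical message; the real log line is not shown in the source.
    log(`todoist sync skipped: ${String(err)}`);
  }
}
```

A caller would presumably invoke `runTick` from a timer driven by `TODOIST_SYNC_INTERVAL_MS`; because `runTick` never rejects, the timer callback needs no extra error handling.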
## Consequences
- ml/serving (and any future Python consumer) can durably tail
`signals.task.synced`, `signals.tip.served`, `signals.tip.feedback` from
JetStream without coupling to the API process.
- Local dev still runs without NATS; the bridge is opt-in via env.
- Wire format is JSON today (envelope per ADR-0005 not enforced yet) — see
Open follow-ups.
## Open follow-ups
- An ml/serving JetStream consumer for the feature pipeline (today nothing
reads from JetStream — the API only writes).
- Move the wire payload to the protobuf envelope from ADR-0005 once the
schema-registry CI gate (#54) lands.
- Graceful shutdown of the scheduler timer on `SIGTERM`.
- Per-publish failure metrics exported to the admin health view.

@@ -15,7 +15,7 @@
| `auth` | TS | OAuth (Google; Apple in M1), sessions, JWT | identities, sessions | Node monolith |
| `profile` | TS | user profile, preferences, consents | profiles | Node monolith |
| `integrations` | TS | third-party connectors, token vault, signal fetch | credentials, cursors | Node monolith |
-| `events` | TS | event-bus abstraction + durable log (M1) | signal store | Node monolith (in-proc emitter) |
+| `events` | TS | event-bus abstraction + durable log | signal store | Node monolith (in-proc emitter, bridges to NATS JetStream when `NATS_URL` set) |
| `recommender` | TS | orchestration: candidates → policy → tip; feedback sink | tip history | Node monolith |
| `notifier` | TS | push/email delivery, quiet hours, dedupe | delivery log | Node monolith (web push in M1) |
| `ml/serving` | Python | online scoring for policies/models | — (stateless) | **separate process** |