chore(scheduler): skip agents whose data sources aren't granted #128

Closed
opened 2026-05-11 11:36:15 +00:00 by alvis · 0 comments

Problem

`services/api/src/signals/agent-scheduler.ts:70-79` iterates every (active-user × agent) pair every 15 minutes and calls `computeAndStore` unconditionally. There is no consent check, even though the recommender's eligibility filter (`services/api/src/profile/eligibility.ts`) discards at request time every snippet from an agent whose data sources the user hasn't granted.

For user `ODGp4Gkr7JWemMsqcMLMn` (MLflow trace `tr-591449ea8a72af8e81b6a585234a86ab`) this shows up as 5 fresh rows in `agent_outputs` that no `/recommend` call will ever forward. Multiply that by every user who connected nothing and the result is continuous wasted DB writes, wasted `ml/serving /agents/{id}/infer` calls, and noisy logs.
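A back-of-envelope sketch of the waste, assuming each ineligible (user × agent) pair costs one `/agents/{id}/infer` call and one `agent_outputs` write per cycle. The 15-minute interval comes from the scheduler; treating the 5 rows above as one cycle's output is an assumption, and the function name is hypothetical.

```typescript
// Hypothetical estimate: one infer call + one agent_outputs write
// per ineligible (user × agent) pair per scheduler cycle.
const CYCLE_MINUTES = 15;
const CYCLES_PER_DAY = (24 * 60) / CYCLE_MINUTES; // 96 cycles per day

function wastedWritesPerDay(usersWithNoGrants: number, ineligibleAgentsPerUser: number): number {
  return usersWithNoGrants * ineligibleAgentsPerUser * CYCLES_PER_DAY;
}

// One user with 5 ineligible agents (assuming the 5 rows seen in the
// trace are a single cycle's output):
console.log(wastedWritesPerDay(1, 5)); // 480
```

The same figure applies to inference calls, and it scales linearly with the number of users who connected nothing.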

Fix

In `runCycle` (`services/api/src/signals/agent-scheduler.ts:70`), look up the eligibility set once per user and skip agents not in it:

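```ts
import { getEligibleAgentIds } from '../profile/eligibility.js';

// Inside runCycle: userIds, agentIds, ok, failed, logger and
// computeAndStore are the existing bindings in agent-scheduler.ts.
for (const userId of userIds) {
  // One eligibility lookup per user, reused for every agent.
  const eligible = await getEligibleAgentIds(userId);
  for (const agentId of agentIds) {
    // Skip ineligible agents before any inference call or DB write.
    if (!eligible.has(agentId)) continue;
    try {
      await computeAndStore(userId, agentId);
      ok++;
    } catch (err: unknown) {
      failed++;
      logger.error({ err, userId, agentId }, 'agent-scheduler: compute error');
    }
  }
}
```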

`getEligibleAgentIds` already considers active contexts and the per-agent `enabled` preference, so a user in a silenced context (e.g. `vacation`) also stops generating snippets. That is a welcome side effect.
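To make the combined rule concrete, here is a minimal sketch of the three conditions an agent must pass: data-source consent, the per-agent `enabled` preference, and not being silenced by an active context. Every type, field, and the context-to-agent silencing map are hypothetical stand-ins, not the real shapes in `eligibility.ts`; the agent ID `core-agent` is invented for illustration.

```typescript
// Hypothetical shapes; the real eligibility.ts types may differ.
interface AgentManifest {
  id: string;
  requiredSources: string[]; // e.g. ['data:todoist']
}

interface UserState {
  grantedSources: Set<string>; // e.g. {'data:core'}
  disabledAgents: Set<string>; // agents with `enabled: false`
  activeContexts: string[]; // e.g. ['vacation']
}

// Hypothetical: which agent IDs each context silences.
type ContextSilences = Record<string, Set<string>>;

function eligibleAgentIds(
  manifests: AgentManifest[],
  user: UserState,
  silences: ContextSilences,
): Set<string> {
  const eligible = new Set<string>();
  for (const m of manifests) {
    // Every required data source must be granted.
    const consented = m.requiredSources.every((s) => user.grantedSources.has(s));
    // The user must not have switched this agent off.
    const enabled = !user.disabledAgents.has(m.id);
    // No active context may silence this agent.
    const silenced = user.activeContexts.some((c) => silences[c]?.has(m.id) ?? false);
    if (consented && enabled && !silenced) eligible.add(m.id);
  }
  return eligible;
}
```

With this shape, a user who granted only `data:core` gets `core-agent` and nothing else; activating `vacation` (if it silences `core-agent`) or disabling the agent empties the set, which is exactly the case where the scheduler should now write nothing.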

Dependency

Blocked by #127: until the data-source consent refactor lands, this filter would drop everything for almost every user, since no one has the per-agent consents granted today. Land #127 first; then this becomes safe.

Optional cleanup

After both land, prune `agent_outputs` rows that no longer pass eligibility for their owner. This is optional: the rows expire on their own, so skip it unless table growth is a concern.
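If the cleanup is ever needed, a minimal sketch of the selection step could look like this: it caches one eligibility lookup per owner and returns the IDs of rows to delete. The row shape is a hypothetical subset of `agent_outputs`, and the eligibility function is injected so the sketch stays self-contained; in the codebase it would presumably be `getEligibleAgentIds`. The actual `DELETE` is left out.

```typescript
// Hypothetical subset of an agent_outputs row.
interface AgentOutputRow {
  id: number;
  userId: string;
  agentId: string;
}

// Returns the IDs of rows whose agent is no longer eligible for the row's owner.
async function findPrunableRows(
  rows: AgentOutputRow[],
  getEligibleAgentIds: (userId: string) => Promise<Set<string>>,
): Promise<number[]> {
  const cache = new Map<string, Set<string>>(); // one lookup per owner
  const prunable: number[] = [];
  for (const row of rows) {
    let eligible = cache.get(row.userId);
    if (!eligible) {
      eligible = await getEligibleAgentIds(row.userId);
      cache.set(row.userId, eligible);
    }
    if (!eligible.has(row.agentId)) prunable.push(row.id);
  }
  return prunable;
}
```

Caching per owner mirrors the scheduler fix: eligibility is computed once per user, not once per row.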

Tests

Add a unit test in `services/api/src/signals/__tests__/agent-scheduler.test.ts` (or extend an existing one): seed a user with `data:core` only and an agent manifest that requires `data:todoist`; run one cycle; assert that no `agent_outputs` row is written for the Todoist-requiring agent and that one is written for a `data:core`-only agent.
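The test's shape can be sketched without a database by injecting fakes into a copy of the proposed loop. This is a hypothetical harness: `Deps`, this `runCycle` signature, the `requires` manifest field and the agent IDs `todoist-overdue` and `core-only` are stand-ins, not the real module's API. The real test would seed the database and call the actual `runCycle` through whichever test runner the repo uses.

```typescript
// Hypothetical stand-ins for the scheduler's dependencies.
type AgentManifest = { id: string; requires: string[] };

interface Deps {
  getEligibleAgentIds(userId: string): Promise<Set<string>>;
  computeAndStore(userId: string, agentId: string): Promise<void>;
}

// Copy of the proposed loop with dependencies injected.
async function runCycle(
  userIds: string[],
  agentIds: string[],
  deps: Deps,
): Promise<{ ok: number; failed: number }> {
  let ok = 0;
  let failed = 0;
  for (const userId of userIds) {
    const eligible = await deps.getEligibleAgentIds(userId);
    for (const agentId of agentIds) {
      if (!eligible.has(agentId)) continue;
      try {
        await deps.computeAndStore(userId, agentId);
        ok++;
      } catch {
        failed++;
      }
    }
  }
  return { ok, failed };
}

// Fixture: user u1 granted data:core only; one agent needs data:todoist.
const manifests: AgentManifest[] = [
  { id: 'todoist-overdue', requires: ['data:todoist'] },
  { id: 'core-only', requires: ['data:core'] },
];
const grants = new Map<string, Set<string>>([['u1', new Set<string>(['data:core'])]]);

// Fake agent_outputs table: records every (userId, agentId) write.
const agentOutputs: Array<{ userId: string; agentId: string }> = [];

const deps: Deps = {
  // Eligible when every required source is granted.
  getEligibleAgentIds: async (userId) => {
    const granted = grants.get(userId) ?? new Set<string>();
    return new Set(manifests.filter((m) => m.requires.every((s) => granted.has(s))).map((m) => m.id));
  },
  computeAndStore: async (userId, agentId) => {
    agentOutputs.push({ userId, agentId });
  },
};
```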

Verification

After shipping, run one scheduler cycle against a fresh user with no integrations and expect zero `agent_outputs` rows: only `data:core` is granted, and after #127 no agent requires only `data:core` (health-vitals requires `data:google-health`, focus/overdue require `data:todoist`, etc.). Confirm that the cycle's log line reports an `ok` count covering only agents whose data sources are consented.

alvis added the backend label 2026-05-11 11:36:15 +00:00
alvis closed this issue 2026-05-12 15:52:30 +00:00

Reference: alvis/oO#128