Centralizes user-level features (completion_rate_30d, dismiss_rate_30d, mean_dwell_ms_30d, preferred_hour, tip_volume_30d) in a TS registry that owns both definition and SQL aggregation, since the data lives in the TS-owned SQLite tables (tip_views/tip_feedback). Lazy TTL refresh keeps recommend latency bounded; values persist in user_profile_features (KV).

ml/serving accepts profile_features on /score + /generate but does not yet consume them. Extending the bandit feature vector changes its dimension D and resets every user's learned state, so consumption is a deliberate phase-B step.

Includes ml/features/profile_schema.py as a contract mirror, with a sync test that diffs name sets against registry.ts. ADR-0011 records the data-locality reasoning (registry in TS, not Python as the issue originally suggested).

Phase B (deferred):
- event-driven incremental updates
- bandit consumption with state migration
- admin per-user profile page
- staleness alerts

Refs #81.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
54 lines
1.6 KiB
Python
"""Profile-feature schema mirror (#81 phase A).
|
|
|
|
The TypeScript registry in ``services/api/src/profile/registry.ts`` is the
|
|
*source of truth* — features are computed there because the data lives in the
|
|
TS-owned SQLite DB. This module is a documentation/typing mirror so Python
|
|
code (ml/serving, eval harnesses, notebooks) knows what fields to expect on
|
|
``profile_features`` payloads without round-tripping the API.
|
|
|
|
Update this file whenever you add or rename a feature in the TS registry.
|
|
The accompanying test asserts the two stay in sync at the name level.
|
|
"""
|
|
from __future__ import annotations
|
|
|
|
from dataclasses import dataclass
|
|
from typing import Literal
|
|
|
|
|
|
Dtype = Literal["numeric", "categorical"]
|
|
|
|
|
|
@dataclass(frozen=True)
|
|
class ProfileFeature:
|
|
name: str
|
|
dtype: Dtype
|
|
description: str
|
|
|
|
|
|
PROFILE_FEATURES: tuple[ProfileFeature, ...] = (
|
|
ProfileFeature(
|
|
"completion_rate_30d", "numeric",
|
|
'Fraction of tips served in the last 30 days that received a "done" reaction.',
|
|
),
|
|
ProfileFeature(
|
|
"dismiss_rate_30d", "numeric",
|
|
'Fraction of tips served in the last 30 days that received a "dismiss" reaction.',
|
|
),
|
|
ProfileFeature(
|
|
"mean_dwell_ms_30d", "numeric",
|
|
"Average dwell time (ms between served and reacted) over the last 30 days.",
|
|
),
|
|
ProfileFeature(
|
|
"preferred_hour", "numeric",
|
|
'Hour-of-day with the most "done" reactions in the last 30 days (0-23).',
|
|
),
|
|
ProfileFeature(
|
|
"tip_volume_30d", "numeric",
|
|
"Number of tips served to the user in the last 30 days.",
|
|
),
|
|
)
|
|
|
|
|
|
def feature_names() -> set[str]:
|
|
return {f.name for f in PROFILE_FEATURES}
|
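The name-level sync check mentioned in the module docstring could look roughly like the sketch below. It is not the actual test from this commit. The layout of `registry.ts` is not shown here, so `SAMPLE_REGISTRY_TS` is a hypothetical excerpt that assumes each feature is declared with a `name: "..."` string literal. `PY_NAMES`, `ts_feature_names`, and `diff_names` are illustrative names, with `PY_NAMES` standing in for `feature_names()` so the block runs on its own.

```python
from __future__ import annotations

import re

# Hypothetical excerpt of registry.ts. The real file's structure is an
# assumption; a real test would read the file from disk instead.
SAMPLE_REGISTRY_TS = """
export const PROFILE_FEATURES = [
  { name: "completion_rate_30d", dtype: "numeric" },
  { name: "dismiss_rate_30d", dtype: "numeric" },
  { name: "mean_dwell_ms_30d", dtype: "numeric" },
  { name: "preferred_hour", dtype: "numeric" },
  { name: "tip_volume_30d", dtype: "numeric" },
];
"""

# Stand-in for feature_names() from this module, inlined so the sketch is
# self-contained.
PY_NAMES = {
    "completion_rate_30d",
    "dismiss_rate_30d",
    "mean_dwell_ms_30d",
    "preferred_hour",
    "tip_volume_30d",
}

# Matches `name: "..."` or `name: '...'` and captures the identifier.
_NAME_RE = re.compile(r"""\bname\s*:\s*["']([A-Za-z0-9_]+)["']""")


def ts_feature_names(source: str) -> set[str]:
    """Collect feature names declared in a TypeScript source string."""
    return set(_NAME_RE.findall(source))


def diff_names(py_names: set[str], ts_names: set[str]) -> tuple[set[str], set[str]]:
    """Return (names missing in Python, names missing in TS)."""
    return ts_names - py_names, py_names - ts_names


def test_registry_in_sync() -> None:
    missing_in_python, missing_in_ts = diff_names(
        PY_NAMES, ts_feature_names(SAMPLE_REGISTRY_TS)
    )
    assert not missing_in_python, f"add to profile_schema.py: {missing_in_python}"
    assert not missing_in_ts, f"not in registry.ts: {missing_in_ts}"
```

A regex scan keeps the check free of a TypeScript toolchain, at the cost of only catching name drift. Differences in dtype or description would pass unnoticed, which matches the "name level" scope stated in the docstring.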