feat: relay daemon skeleton — queue, dispatch, conversation, ntfy (#1)

First-PR scope from #1. Single-process Python daemon that relays between Claude Code instances and chat-Claude (Anthropic API). Components: * relay.config — .env + config.yaml loader. Auto-generates ntfy topic on first run and persists it back to .env. * relay.state — atomic file I/O via tempfile + rename, advisory flock at state/.lock to enforce single-instance. * relay.conversation — append-only history with summarization. Triggers a summarize call when total chars exceed HISTORY_CHAR_CAP (default 400k); replaces history with the summary plus the most recent 10 turns. * relay.anthropic_client — SDK wrapper. Marks the system prompt cacheable (5-min ephemeral cache); concatenates text blocks; estimates per-call cost from the Anthropic price table with cache-write/read accounted for. * relay.queue — JSON envelope intake; oldest-by-mtime; malformed envelopes moved to queue/.rejected/. * relay.dispatch — one-input-at-a-time per session (dispatch/<session_id>/input.txt). Won't overwrite a pending dispatch; queues internally and waits for CC to delete. * relay.ntfy — best-effort POST to https://ntfy.sh/<topic>; failures logged but never block the main loop. * relay.daemon — main loop. Polls jc_input.txt (priority) then queue/. Detects [NEEDS-JC] in the first 200 chars of any response and pauses dispatch until JC writes jc_input.txt. JC override supports @session-N: prefix for direct dispatch without an API call. * relay.__main__ — CLI: relay run / relay status / relay topic. Tests: 57 unit tests pass (config, state, conversation, queue, dispatch, anthropic_client, ntfy, full daemon loop with a fake client). One real-API smoke test marked real_api, opt-in via pytest -m real_api; skips cleanly on credit-balance errors. Out of scope for this PR (deferred to follow-ups): Flask status endpoint, multi-session config in production, exponential backoff, systemd unit, cost-tracking aggregation. Closes #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 15:24:47 +00:00
parent 50d957e14e
commit 540b4f5b01
24 changed files with 2205 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ Relay daemon between Claude Code instances and a Claude.ai chat-equivalent sessi
 When CC produces output that would normally be pasted to a Claude.ai chat for review or a decision, the daemon does the relay automatically:
 1. CC drops a JSON envelope into `queue/`.
-2. Daemon picks oldest-first, appends to a running conversation history, calls the Anthropic API.
+2. Daemon picks oldest-first, appends to a running conversation history, calls the Anthropic API with prompt caching on the system prompt.
 3. If the response contains `[NEEDS-JC]` in its first 200 characters, the daemon pauses and notifies via [ntfy.sh](https://ntfy.sh).
 4. Otherwise, the response is written to `dispatch/<session_id>/input.txt` for the originating CC session to consume.
@@ -18,7 +18,7 @@ JC can override at any time by writing to `state/jc_input.txt`.
 ```sh
 git clone git@localhost:AC/risv3-relay.git
 cd risv3-relay
-python3 -m venv .venv
+python3.14 -m venv .venv
 .venv/bin/pip install -e '.[dev]'
 cp .env.example .env
 # edit .env — add ANTHROPIC_API_KEY
@@ -30,16 +30,107 @@ cp .env.example .env
 .venv/bin/python -m relay run
 ```
-The daemon prints the ntfy subscription URL on startup. Subscribe to it from your phone/laptop to receive `needs_jc` and error notifications.
+On first boot the daemon generates a random ntfy topic, persists it back to `.env`, and prints the subscription URL:
 ```
 ntfy topic:   https://ntfy.sh/<random-16-chars>
 Subscribe on phone/laptop to receive needs_jc + error alerts.
 ```
 Subscribe at that URL on phone/laptop to get pings on `needs_jc` and errors. The topic is functionally a password — anyone subscribed receives the messages, so don't share it.
 ## CC-side protocol
 A CC session integrates with the daemon by **dropping queue envelopes** and **polling its dispatch input**.
 ### Sending CC output to chat-Claude
 Drop a JSON file into `queue/`:
 ```json
 {
  "session_id": "session-1",
  "timestamp":  "2026-05-02T15:30:00Z",
  "content":    "Sub-PR A is open at #438. Tests pass. Awaiting review."
 }
 ```
 File name doesn't matter (use `<session_id>-<unix_ts>-<rand>.json` for sortability). The daemon picks oldest-first by file mtime and processes one entry per loop tick.
 ### Receiving chat-Claude responses
 Poll your session's dispatch directory:
 ```sh
 while true; do
  if [ -s dispatch/session-1/input.txt ]; then
    cat dispatch/session-1/input.txt
    rm dispatch/session-1/input.txt
  fi
  sleep 1
 done
 ```
 Deletion is the acknowledgement. The daemon will not write a new `input.txt` until the previous one is consumed (deleted).
 ## JC operations
 Status:
 ```sh
 .venv/bin/python -m relay status
 ```
 ntfy URL:
 ```sh
 .venv/bin/python -m relay topic
 ```
 Override at any time — the daemon picks up `state/jc_input.txt` on the next tick (≤ 1 second):
 | Format | Effect |
 |---|---|
 | `@session-1: do X` | Direct dispatch to `session-1`. No API call. Body after the prefix becomes the input. Clears `needs_jc` if set. |
 | `(any text without @prefix)` | Treated as the next chat-side turn. The daemon sends it through the API, dispatches the response to the originating session (or `sessions[0]` if not in queue context). Clears `needs_jc` if set. |
 ## Project layout
- `relay/` — Python package
+- `relay/` — Python package (config, state, conversation, anthropic_client, queue, dispatch, ntfy, daemon, __main__)
- `tests/` — pytest tests
+- `tests/` — pytest tests; `pytest -m real_api` opts into live-API smoke
 - `queue/`, `dispatch/`, `state/`, `logs/` — runtime directories created on first run; gitignored
- `config.yaml` — registered CC sessions and per-session settings
+- `config.yaml` — registered CC sessions, system prompt, summarization prompt; auto-seeded on first run
 - `.env` — secrets and per-host overrides; gitignored
-## Status
+## Trust model
-First-PR scope: daemon skeleton, queue + dispatch loop, single-CC-session integration, basic logging, ntfy notifications, conversation history with summarization. See `docs/` and PR descriptions for follow-up scope (status web UI, multi-session, error recovery, cost tracking, systemd unit).
+- The daemon is **pure transport** between CC and chat-Claude. It does not make merge decisions, override CC's existing rules, or run arbitrary commands.
 - The Anthropic API call is the only outbound integration besides ntfy. The system prompt and summarization prompt live in `config.yaml`; edit them to shape chat-Claude's behavior.
 - JC's `jc_input.txt` is authoritative — anything written there is treated as the next chat-side turn (or a direct dispatch with the `@session-N:` prefix).
 ## Status of the project
 First-PR scope (this repo's `main` after merge): daemon skeleton, queue + dispatch loop, single-CC-session integration, basic logging, ntfy notifications, conversation history with summarization, prompt caching on the system prompt, per-call cost estimation logged.
 Follow-up PRs will add: status web UI (Flask endpoint), multi-session config and per-session prompts, exponential backoff on transient API errors, systemd unit, cost-tracking dashboard.
 ## Development
 Run the unit suite:
 ```sh
 .venv/bin/python -m pytest
 ```
 Run the live-API smoke (cost ~$0.0001 against Haiku 4.5; needs a billed `ANTHROPIC_API_KEY`):
 ```sh
 .venv/bin/python -m pytest -m real_api
 ```
 Lint / format:
 ```sh
 .venv/bin/ruff check relay/ tests/
 .venv/bin/ruff format relay/ tests/
 ```
--- a/config.yaml
+++ b/config.yaml
@@ -0,0 +1,15 @@
 system_prompt: 'You are the chat-side counterpart to Claude Code instances working
  on the risv3 project. CC sends you its progress; you respond as a project lead would:
  check decisions, ask clarifying questions, approve or correct. When a CC turn raises
  a question only JC (the human owner) can answer, begin your response with the literal
  token [NEEDS-JC] so the relay daemon pauses and notifies JC. Otherwise reply normally
  and the relay forwards your reply to the originating CC session.'
 summarization_prompt: Summarize the conversation so far. Preserve project context,
  decisions made, open work items, and any outstanding [NEEDS-JC] questions. The summary
  will replace earlier turns in the conversation history; the most recent turns will
  be retained verbatim. Be specific where specificity matters (file paths, issue numbers,
  decisions); be brief on routine back-and-forth.
 sessions:
 - session_id: session-1
  working_dir: null
  description: Default CC session
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,45 @@
 [build-system]
 requires = ["setuptools>=68"]
 build-backend = "setuptools.build_meta"
 [project]
 name = "risv3-relay"
 version = "0.1.0"
 description = "Relay daemon between Claude Code instances and a Claude.ai chat-equivalent session via the Anthropic API."
 requires-python = ">=3.10"
 authors = [{name = "AC"}]
 dependencies = [
    "anthropic>=0.42,<1",
    "flask>=3.0,<4",
    "requests>=2.32,<3",
    "pyyaml>=6.0,<7",
    "python-dotenv>=1.0,<2",
 ]
 [project.optional-dependencies]
 dev = [
    "pytest>=8.3",
    "ruff>=0.6",
 ]
 [project.scripts]
 relay = "relay.__main__:main"
 [tool.setuptools.packages.find]
 include = ["relay*"]
 [tool.ruff]
 line-length = 100
 target-version = "py310"
 [tool.ruff.lint]
 select = ["E", "F", "W", "I", "B", "UP", "RUF"]
 ignore = ["E501"]  # line-length already set
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 addopts = "-m 'not real_api'"
 filterwarnings = ["ignore::DeprecationWarning"]
 markers = [
    "real_api: hits the live Anthropic API; gated on ANTHROPIC_API_KEY env var. Run with `pytest -m real_api` to opt in",
 ]
--- a/relay/init.py
+++ b/relay/init.py
--- a/relay/main.py
+++ b/relay/main.py
@@ -0,0 +1,75 @@
 """CLI entry point: ``relay <command>``."""
 from __future__ import annotations
 import argparse
 import json
 import logging
 import signal
 import sys
 from relay.config import load_settings
 from relay.daemon import Daemon
 from relay.logs import configure as configure_logs
 from relay.ntfy import topic_url
 from relay.state import read_json
 def _cmd_run(args: argparse.Namespace) -> int:
    settings = load_settings()
    configure_logs(settings.logs_dir, level=logging.DEBUG if args.verbose else logging.INFO)
    daemon = Daemon(settings)
    def _stop(signum: int, _frame: object) -> None:
        logging.getLogger(__name__).info("received signal %s; shutting down", signum)
        daemon.stop()
    signal.signal(signal.SIGINT, _stop)
    signal.signal(signal.SIGTERM, _stop)
    try:
        daemon.run()
    except RuntimeError as exc:
        # Most commonly the lock file: another daemon is running.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
 def _cmd_status(_args: argparse.Namespace) -> int:
    settings = load_settings()
    status_path = settings.state_dir / "status.json"
    payload = read_json(status_path, default=None)
    if payload is None:
        print("no status file yet — has the daemon run?", file=sys.stderr)
        return 1
    print(json.dumps(payload, indent=2, sort_keys=False))
    return 0
 def _cmd_topic(_args: argparse.Namespace) -> int:
    settings = load_settings()
    print(topic_url(settings.ntfy_topic))
    return 0
 def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="relay", description="risv3-relay daemon")
    sub = parser.add_subparsers(dest="cmd", required=True)
    run = sub.add_parser("run", help="Run the daemon in the foreground")
    run.add_argument("-v", "--verbose", action="store_true")
    run.set_defaults(func=_cmd_run)
    status = sub.add_parser("status", help="Print the current daemon status")
    status.set_defaults(func=_cmd_status)
    topic = sub.add_parser("topic", help="Print the ntfy subscription URL")
    topic.set_defaults(func=_cmd_topic)
    args = parser.parse_args(argv)
    return args.func(args)
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/relay/anthropic_client.py
+++ b/relay/anthropic_client.py
@@ -0,0 +1,119 @@
 """Anthropic API wrapper.
 Wraps the SDK call so the daemon can send a turn and get back the
 assistant text + token usage + estimated cost. Implements prompt
 caching on the system prompt: subsequent calls within the 5-minute TTL
 get a cache hit on the (large, repeated) system prompt and pay the
 cheaper cache-hit rate. Per-turn user/assistant content is never marked
 cacheable because it changes every call.
 Cost estimation uses a model→price table; the table is the source of
 truth and is easy to update when pricing changes.
 """
 from __future__ import annotations
 from dataclasses import dataclass
 from anthropic import Anthropic
 from anthropic.types import Message
 # Pricing per 1M tokens (USD), pulled from Anthropic's published schedule.
 # Cache-hit input is billed at the cache-read rate (~10% of standard input).
 # Cache-write is ~25% more than standard input. These numbers are
 # approximate and used only for log-line cost estimation; the source of
 # truth for billing is Anthropic's invoice.
 _PRICES_PER_MILLION: dict[str, dict[str, float]] = {
    "claude-opus-4-7": {"input": 15.0, "output": 75.0, "cache_write": 18.75, "cache_read": 1.50},
    "claude-opus-4-7-1m": {"input": 15.0, "output": 75.0, "cache_write": 18.75, "cache_read": 1.50},
    "claude-sonnet-4-6": {"input": 3.0, "output": 15.0, "cache_write": 3.75, "cache_read": 0.30},
    "claude-haiku-4-5-20251001": {
        "input": 0.80,
        "output": 4.0,
        "cache_write": 1.00,
        "cache_read": 0.08,
    },
 }
 def _price_for(model: str) -> dict[str, float]:
    if model in _PRICES_PER_MILLION:
        return _PRICES_PER_MILLION[model]
    # Fallback: charge as Opus 4.7 (worst-case) so estimates don't
    # under-report for unknown models. Logged once at startup.
    return _PRICES_PER_MILLION["claude-opus-4-7"]
@dataclass
 class TurnResult:
    text: str
    input_tokens: int
    output_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int
    estimated_cost_usd: float
    raw: Message
@dataclass
 class AnthropicClient:
    api_key: str
    model: str
    max_output_tokens: int = 4096
    def __post_init__(self) -> None:
        self._sdk = Anthropic(api_key=self.api_key)
    def send(
        self,
        *,
        system_prompt: str,
        messages: list[dict[str, str]],
    ) -> TurnResult:
        """Send a single API turn. The system prompt is marked cacheable.
        ``messages`` is the full conversation history shaped for the
        Messages API. The most recent message is the user turn we're
        responding to. The daemon is responsible for appending the
        assistant text it gets back into history before the next call.
        """
        response = self._sdk.messages.create(
            model=self.model,
            max_tokens=self.max_output_tokens,
            system=[
                {
                    "type": "text",
                    "text": system_prompt,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=messages,
        )
        text = "".join(
            block.text for block in response.content if getattr(block, "type", None) == "text"
        )
        usage = response.usage
        in_tokens = int(getattr(usage, "input_tokens", 0) or 0)
        out_tokens = int(getattr(usage, "output_tokens", 0) or 0)
        cache_create = int(getattr(usage, "cache_creation_input_tokens", 0) or 0)
        cache_read = int(getattr(usage, "cache_read_input_tokens", 0) or 0)
        prices = _price_for(self.model)
        cost = (
            in_tokens * prices["input"]
            + out_tokens * prices["output"]
            + cache_create * prices["cache_write"]
            + cache_read * prices["cache_read"]
        ) / 1_000_000.0
        return TurnResult(
            text=text,
            input_tokens=in_tokens,
            output_tokens=out_tokens,
            cache_creation_input_tokens=cache_create,
            cache_read_input_tokens=cache_read,
            estimated_cost_usd=cost,
            raw=response,
        )
--- a/relay/config.py
+++ b/relay/config.py
@@ -0,0 +1,163 @@
 """Settings + config.yaml loader.
 The relay reads two config sources:
 1. ``.env`` for secrets (API key) and per-host overrides (status port,
   history cap). The ntfy topic is auto-generated on first run and
   written back to ``.env`` so it persists across restarts.
 2. ``config.yaml`` for project-level settings: the registered CC
   sessions, the system prompt, the summarization prompt. First-PR
   default config is generated if the file doesn't exist, with one
   ``session-1`` registered.
 Settings are read once at startup; the daemon does not hot-reload them.
 """
 from __future__ import annotations
 import os
 import secrets
 from dataclasses import dataclass, field
 from pathlib import Path
 import yaml
 from dotenv import dotenv_values, set_key
 REPO_ROOT = Path(__file__).resolve().parent.parent
 ENV_FILE = REPO_ROOT / ".env"
 CONFIG_FILE = REPO_ROOT / "config.yaml"
 DEFAULT_SYSTEM_PROMPT = (
    "You are the chat-side counterpart to Claude Code instances working on the "
    "risv3 project. CC sends you its progress; you respond as a project lead "
    "would: check decisions, ask clarifying questions, approve or correct. "
    "When a CC turn raises a question only JC (the human owner) can answer, "
    "begin your response with the literal token [NEEDS-JC] so the relay daemon "
    "pauses and notifies JC. Otherwise reply normally and the relay forwards "
    "your reply to the originating CC session."
 )
 DEFAULT_SUMMARIZATION_PROMPT = (
    "Summarize the conversation so far. Preserve project context, decisions "
    "made, open work items, and any outstanding [NEEDS-JC] questions. The "
    "summary will replace earlier turns in the conversation history; the most "
    "recent turns will be retained verbatim. Be specific where specificity "
    "matters (file paths, issue numbers, decisions); be brief on routine "
    "back-and-forth."
 )
@dataclass(frozen=True)
 class SessionConfig:
    """One registered CC session."""
    session_id: str
    working_dir: str | None = None
    description: str | None = None
@dataclass(frozen=True)
 class Settings:
    api_key: str
    model: str
    ntfy_topic: str
    status_port: int
    history_char_cap: int
    repo_root: Path
    queue_dir: Path
    dispatch_dir: Path
    state_dir: Path
    logs_dir: Path
    system_prompt: str
    summarization_prompt: str
    sessions: tuple[SessionConfig, ...] = field(default_factory=tuple)
 def _ensure_runtime_dirs(repo_root: Path) -> None:
    for sub in ("queue", "dispatch", "state", "logs"):
        (repo_root / sub).mkdir(exist_ok=True)
 def _generate_ntfy_topic() -> str:
    """Cryptographically random 16-character topic. Functionally a password."""
    return secrets.token_urlsafe(12)
 def _load_or_init_ntfy_topic(env_values: dict[str, str | None]) -> str:
    raw = (env_values.get("NTFY_TOPIC") or "").strip()
    # Defensive: dotenv treats everything after = as the value, so an
    # inline `#` comment ends up as the topic. Treat any value that
    # starts with `#` (or contains whitespace, which a real topic never
    # does) as empty.
    if raw and not raw.startswith("#") and " " not in raw:
        return raw
    topic = _generate_ntfy_topic()
    set_key(str(ENV_FILE), "NTFY_TOPIC", topic, quote_mode="never")
    return topic
 def _load_yaml_config() -> dict:
    if not CONFIG_FILE.exists():
        default = {
            "system_prompt": DEFAULT_SYSTEM_PROMPT,
            "summarization_prompt": DEFAULT_SUMMARIZATION_PROMPT,
            "sessions": [
                {
                    "session_id": "session-1",
                    "working_dir": None,
                    "description": "Default CC session",
                }
            ],
        }
        CONFIG_FILE.write_text(yaml.safe_dump(default, sort_keys=False))
        return default
    with CONFIG_FILE.open() as f:
        return yaml.safe_load(f) or {}
 def load_settings() -> Settings:
    """Read .env + config.yaml. Mutates .env on first run to record the ntfy topic."""
    env_values = dotenv_values(ENV_FILE)
    api_key = (
        env_values.get("ANTHROPIC_API_KEY") or os.environ.get("ANTHROPIC_API_KEY") or ""
    ).strip()
    if not api_key:
        raise RuntimeError(f"ANTHROPIC_API_KEY missing from {ENV_FILE}")
    model = (env_values.get("ANTHROPIC_MODEL") or "claude-opus-4-7").strip()
    status_port = int((env_values.get("STATUS_PORT") or "8765").strip())
    history_char_cap = int((env_values.get("HISTORY_CHAR_CAP") or "400000").strip())
    ntfy_topic = _load_or_init_ntfy_topic(env_values)
    yaml_cfg = _load_yaml_config()
    system_prompt = str(yaml_cfg.get("system_prompt") or DEFAULT_SYSTEM_PROMPT)
    summarization_prompt = str(yaml_cfg.get("summarization_prompt") or DEFAULT_SUMMARIZATION_PROMPT)
    sessions_cfg = yaml_cfg.get("sessions") or []
    sessions = tuple(
        SessionConfig(
            session_id=str(s["session_id"]),
            working_dir=s.get("working_dir"),
            description=s.get("description"),
        )
        for s in sessions_cfg
    )
    _ensure_runtime_dirs(REPO_ROOT)
    return Settings(
        api_key=api_key,
        model=model,
        ntfy_topic=ntfy_topic,
        status_port=status_port,
        history_char_cap=history_char_cap,
        repo_root=REPO_ROOT,
        queue_dir=REPO_ROOT / "queue",
        dispatch_dir=REPO_ROOT / "dispatch",
        state_dir=REPO_ROOT / "state",
        logs_dir=REPO_ROOT / "logs",
        system_prompt=system_prompt,
        summarization_prompt=summarization_prompt,
        sessions=sessions,
    )
--- a/relay/conversation.py
+++ b/relay/conversation.py
@@ -0,0 +1,99 @@
 """Conversation history with summarization.
 History shape: list of turns ``{role, content, ts, session_id?}`` where
 ``role`` is ``"user"`` or ``"assistant"``. The user role represents
 either CC output or JC override; the assistant role represents the
 chat-Claude response.
 Summarization fires when the total content character count exceeds
 ``HISTORY_CHAR_CAP``. The summarization prompt is sent as a normal user
 turn, the API's response replaces all earlier turns, and the most
 recent ``RECENT_TURNS_KEPT`` turns are appended verbatim. The summary
 turn is marked with ``meta="summary"`` so the daemon can recognize it
 when paginating.
 """
 from __future__ import annotations
 from dataclasses import asdict, dataclass, field
 from datetime import datetime
 from pathlib import Path
 from typing import Any
 from relay.state import read_json, write_json_atomic
 RECENT_TURNS_KEPT = 10
@dataclass
 class Turn:
    role: str  # 'user' | 'assistant'
    content: str
    ts: str = field(default_factory=lambda: datetime.utcnow().isoformat(timespec="seconds") + "Z")
    session_id: str | None = None
    meta: str | None = None  # 'summary' for replaced summary turns
    def to_api_message(self) -> dict[str, str]:
        """Anthropic Messages API shape — only role + content needed."""
        return {"role": self.role, "content": self.content}
 class Conversation:
    """In-memory + on-disk conversation history."""
    def __init__(self, history_path: Path):
        self._path = history_path
        raw = read_json(self._path, default=[])
        if not isinstance(raw, list):
            raise ValueError(f"Expected list at {self._path}, got {type(raw).__name__}")
        self._turns: list[Turn] = [Turn(**dict(t)) for t in raw]
    @property
    def turns(self) -> list[Turn]:
        return list(self._turns)
    def append(
        self, role: str, content: str, *, session_id: str | None = None, meta: str | None = None
    ) -> Turn:
        turn = Turn(role=role, content=content, session_id=session_id, meta=meta)
        self._turns.append(turn)
        self._persist()
        return turn
    def replace_with_summary(self, summary_text: str) -> None:
        """Replace all but the last RECENT_TURNS_KEPT turns with one summary turn."""
        recent = self._turns[-RECENT_TURNS_KEPT:] if len(self._turns) > RECENT_TURNS_KEPT else []
        summary_turn = Turn(role="assistant", content=summary_text, meta="summary")
        self._turns = [summary_turn, *recent]
        self._persist()
    def total_chars(self) -> int:
        return sum(len(t.content) for t in self._turns)
    def needs_summarization(self, cap: int) -> bool:
        return self.total_chars() > cap
    def to_api_messages(self) -> list[dict[str, str]]:
        return [t.to_api_message() for t in self._turns]
    def _persist(self) -> None:
        write_json_atomic(self._path, [asdict(t) for t in self._turns])
    # Test/debug helper
    def reset(self) -> None:
        self._turns = []
        self._persist()
 def render_for_log(turn: Turn, max_chars: int = 200) -> dict[str, Any]:
    """Compact representation for log lines — full content elided."""
    content = turn.content
    if len(content) > max_chars:
        content = content[:max_chars] + f"... <{len(turn.content) - max_chars} more chars>"
    return {
        "role": turn.role,
        "ts": turn.ts,
        "session_id": turn.session_id,
        "meta": turn.meta,
        "content": content,
    }
--- a/relay/daemon.py
+++ b/relay/daemon.py
@@ -0,0 +1,330 @@
 """Main relay loop.
 One process, one thread, polling-based. The loop:
 1. Drain ``state/jc_input.txt`` if present (highest priority).
 2. Drain the ``queue/`` directory oldest-first.
 3. Heartbeat: check for stuck-queue alerts.
 4. Sleep briefly, repeat.
 Each turn (queue entry or jc_input) goes through ``handle_turn`` which:
 1. Appends the user-side content to history.
 2. Summarizes if history exceeds the cap.
 3. Sends to the Anthropic API.
 4. Appends the assistant response to history.
 5. Routes the response: if it begins (within the first 200 chars) with
   ``[NEEDS-JC]``, set status to ``needs_jc`` and ntfy JC; otherwise
   dispatch to the originating session.
 The status flag is in-memory only (single process); it controls whether
 new queue entries are processed while the daemon is paused waiting for
 JC input. ``state/status.json`` mirrors it on disk for the future
 status endpoint.
 """
 from __future__ import annotations
 import logging
 import re
 import time
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from relay.anthropic_client import AnthropicClient, TurnResult
 from relay.config import Settings
 from relay.conversation import Conversation
 from relay.dispatch import DispatchManager
 from relay.ntfy import notify, topic_url
 from relay.queue import QueueEntry, ack, stuck_age_seconds, take_oldest
 from relay.state import InstanceLock, write_json_atomic
 logger = logging.getLogger(__name__)
 NEEDS_JC_TOKEN = "[NEEDS-JC]"
 NEEDS_JC_SCAN_CHARS = 200
 JC_INPUT_FILE = "jc_input.txt"
 STATUS_FILE = "status.json"
 STUCK_QUEUE_THRESHOLD_SEC = 600  # 10 min per spec
 STUCK_QUEUE_REPEAT_SEC = 600  # don't re-notify more often than this
 LOOP_SLEEP_SEC = 1.0
 DISPATCH_PREFIX = re.compile(r"^@(?P<session>[A-Za-z0-9_-]+):\s*", re.MULTILINE)
@dataclass
 class DaemonStatus:
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds")
    )
    state: str = "running"  # running | needs_jc | error
    last_needs_jc_at: str | None = None
    last_needs_jc_text: str | None = None
    last_dispatch_at: str | None = None
    last_dispatch_session: str | None = None
    queue_depth: int = 0
    history_chars: int = 0
    history_turns: int = 0
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_cost_usd: float = 0.0
    last_stuck_alert_ts: float = 0.0
    def as_dict(self) -> dict:
        d = self.__dict__.copy()
        d.pop("last_stuck_alert_ts", None)
        return d
 class Daemon:
    def __init__(self, settings: Settings):
        self.settings = settings
        self.lock = InstanceLock(settings.state_dir / ".lock")
        self.conversation = Conversation(settings.state_dir / "conversation.json")
        self.dispatch = DispatchManager(settings.dispatch_dir)
        self.client = AnthropicClient(api_key=settings.api_key, model=settings.model)
        self.status = DaemonStatus()
        self._stop = False
    # ---- public API used by __main__ ----
    def run(self) -> None:
        self.lock.acquire()
        try:
            self._announce_startup()
            while not self._stop:
                try:
                    self._tick()
                except Exception:
                    logger.exception("uncaught error in daemon loop; continuing")
                    self._notify_error(
                        "Daemon loop error",
                        "An uncaught exception was logged. Check logs/relay.log.",
                    )
                self._persist_status()
                time.sleep(LOOP_SLEEP_SEC)
        finally:
            self.lock.release()
    def stop(self) -> None:
        self._stop = True
    # ---- internals ----
    def _announce_startup(self) -> None:
        url = topic_url(self.settings.ntfy_topic)
        logger.info("=" * 72)
        logger.info("relay daemon starting")
        logger.info("model:        %s", self.settings.model)
        logger.info("status state: %s", self.settings.state_dir)
        logger.info("ntfy topic:   %s", url)
        logger.info("Subscribe on phone/laptop to receive needs_jc + error alerts.")
        logger.info("registered sessions: %s", [s.session_id for s in self.settings.sessions])
        logger.info("history cap:  %d chars", self.settings.history_char_cap)
        logger.info("=" * 72)
        notify(
            self.settings.ntfy_topic,
            title="relay daemon online",
            message=f"model={self.settings.model}, sessions={len(self.settings.sessions)}",
            tags=["robot"],
        )
    def _tick(self) -> None:
        # 1) Try to flush any queued dispatches that were waiting on CC consumption.
        self.dispatch.flush_all()
        # 2) JC override always takes priority.
        if self._handle_jc_input():
            return
        # 3) If paused for needs_jc, do nothing further on the queue.
        if self.status.state == "needs_jc":
            return
        # 4) Drain queue (one entry per tick — keeps logs and dispatch ordering predictable).
        entry = take_oldest(self.settings.queue_dir)
        if entry is not None:
            self._handle_queue_entry(entry)
        # 5) Heartbeat: stuck-queue check.
        self._check_stuck_queue()
    def _handle_jc_input(self) -> bool:
        path = self.settings.state_dir / JC_INPUT_FILE
        if not path.exists():
            return False
        try:
            content = path.read_text(encoding="utf-8")
        except OSError as exc:
            logger.error("could not read %s: %s", path, exc)
            return False
        path.unlink()  # consume immediately so a slow API call doesn't double-process
        if not content.strip():
            logger.info("jc_input.txt was empty; ignoring")
            return False
        # Prefix routing: "@session-id: ..." dispatches directly without an API call.
        match = DISPATCH_PREFIX.match(content)
        if match:
            session_id = match.group("session")
            payload = content[match.end() :]
            logger.info("JC override: direct dispatch to %s (%d chars)", session_id, len(payload))
            self.dispatch.queue_or_write(session_id, payload)
            self.status.last_dispatch_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
            self.status.last_dispatch_session = session_id
            # JC override clears any needs_jc pause.
            self._clear_needs_jc()
            return True
        # No prefix → treat as next chat-side turn (JC speaking from chat).
        logger.info("JC override: chat-side turn (%d chars)", len(content))
        self._clear_needs_jc()
        self._send_chat_turn(user_role_content=content, originating_session=None, source="jc")
        return True
    def _handle_queue_entry(self, entry: QueueEntry) -> None:
        logger.info("queue entry from %s, %d chars", entry.session_id, len(entry.content))
        try:
            self._send_chat_turn(
                user_role_content=entry.content,
                originating_session=entry.session_id,
                source="queue",
            )
        except Exception:
            logger.exception(
                "error processing queue entry %s; leaving in queue for retry", entry.path
            )
            return
        ack(entry)
    def _send_chat_turn(
        self, *, user_role_content: str, originating_session: str | None, source: str
    ) -> None:
        # Append the user-side turn to history before the API call so a crash
        # mid-call doesn't lose the prompt.
        self.conversation.append("user", user_role_content, session_id=originating_session)
        # Summarize if we've outgrown the cap.
        if self.conversation.needs_summarization(self.settings.history_char_cap):
            self._summarize()
        # API call.
        result = self.client.send(
            system_prompt=self.settings.system_prompt,
            messages=self.conversation.to_api_messages(),
        )
        self._record_usage(result)
        logger.info(
            "[%s] api turn ok: in=%d out=%d cache_w=%d cache_r=%d cost=$%.4f",
            source,
            result.input_tokens,
            result.output_tokens,
            result.cache_creation_input_tokens,
            result.cache_read_input_tokens,
            result.estimated_cost_usd,
        )
        # Append assistant response.
        self.conversation.append("assistant", result.text)
        # Route: NEEDS-JC pause vs dispatch.
        if self._contains_needs_jc(result.text):
            self._enter_needs_jc(result.text)
            return
        target_session = originating_session or self._fallback_session_id()
        if not target_session:
            logger.warning(
                "no originating session and no fallback session in config; chat reply dropped"
            )
            return
        self.dispatch.queue_or_write(target_session, result.text)
        self.status.last_dispatch_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.status.last_dispatch_session = target_session
    def _summarize(self) -> None:
        before = self.conversation.total_chars()
        # Send the summarization as a fresh user turn appended to current history.
        # The API responds with the summary; we then collapse history into
        # [summary, last 10 turns].
        self.conversation.append(
            "user", self.settings.summarization_prompt, meta="summarize_request"
        )
        result = self.client.send(
            system_prompt=self.settings.system_prompt,
            messages=self.conversation.to_api_messages(),
        )
        self._record_usage(result)
        # Replace history with summary + most-recent. This drops the
        # summarize_request turn we just appended (it's only there to
        # produce the summary; not useful in the rolling history).
        self.conversation.replace_with_summary(result.text)
        after = self.conversation.total_chars()
        logger.info(
            "summarization: %d chars -> %d chars (cost $%.4f)",
            before,
            after,
            result.estimated_cost_usd,
        )
    def _contains_needs_jc(self, text: str) -> bool:
        return NEEDS_JC_TOKEN in text[:NEEDS_JC_SCAN_CHARS]
    def _enter_needs_jc(self, response_text: str) -> None:
        self.status.state = "needs_jc"
        self.status.last_needs_jc_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.status.last_needs_jc_text = response_text[:1000]
        logger.warning("[NEEDS-JC] flagged; daemon paused awaiting state/jc_input.txt")
        notify(
            self.settings.ntfy_topic,
            title="[NEEDS-JC] relay paused",
            message=response_text[:400],
            priority="high",
            tags=["warning"],
        )
    def _clear_needs_jc(self) -> None:
        if self.status.state != "running":
            logger.info("clearing needs_jc state (was %s)", self.status.state)
        self.status.state = "running"
    def _fallback_session_id(self) -> str | None:
        if self.settings.sessions:
            return self.settings.sessions[0].session_id
        return None
    def _record_usage(self, result: TurnResult) -> None:
        self.status.total_input_tokens += result.input_tokens
        self.status.total_output_tokens += result.output_tokens
        self.status.total_cost_usd += result.estimated_cost_usd
    def _persist_status(self) -> None:
        self.status.queue_depth = (
            len(list((self.settings.queue_dir).iterdir()))
            if self.settings.queue_dir.exists()
            else 0
        )
        self.status.history_chars = self.conversation.total_chars()
        self.status.history_turns = len(self.conversation.turns)
        write_json_atomic(self.settings.state_dir / STATUS_FILE, self.status.as_dict())
    def _check_stuck_queue(self) -> None:
        age = stuck_age_seconds(self.settings.queue_dir)
        if age <= STUCK_QUEUE_THRESHOLD_SEC:
            return
        now = time.time()
        if now - self.status.last_stuck_alert_ts < STUCK_QUEUE_REPEAT_SEC:
            return
        self.status.last_stuck_alert_ts = now
        logger.warning("queue stuck: oldest entry is %.0fs old", age)
        notify(
            self.settings.ntfy_topic,
            title="relay queue stuck",
            message=f"oldest entry is {int(age)}s old; daemon may be paused or the API failing.",
            priority="high",
            tags=["warning"],
        )
    def _notify_error(self, title: str, message: str) -> None:
        notify(self.settings.ntfy_topic, title=title, message=message, priority="high", tags=["x"])
--- a/relay/dispatch.py
+++ b/relay/dispatch.py
@@ -0,0 +1,71 @@
 """Dispatch writer.
 Each registered CC session has a directory ``dispatch/<session_id>/``.
 The daemon delivers a chat-side response by writing it to
 ``dispatch/<session_id>/input.txt``. CC's polling loop reads, acts on
 the content, and **deletes** the file as the acknowledgement.
 The "only write when prior is consumed" rule means the daemon must not
 overwrite a pending dispatch — otherwise CC could miss a turn. If the
 daemon has new content for a session that hasn't yet consumed the
 previous one, it queues internally and waits.
 """
 from __future__ import annotations
 import logging
 from collections import defaultdict, deque
 from dataclasses import dataclass, field
 from pathlib import Path
 from relay.state import write_atomic
 logger = logging.getLogger(__name__)
@dataclass
 class DispatchManager:
    dispatch_dir: Path
    _pending: dict[str, deque[str]] = field(default_factory=lambda: defaultdict(deque))
    def session_dir(self, session_id: str) -> Path:
        return self.dispatch_dir / session_id
    def input_path(self, session_id: str) -> Path:
        return self.session_dir(session_id) / "input.txt"
    def has_pending(self, session_id: str) -> bool:
        return bool(self._pending[session_id])
    def session_input_present(self, session_id: str) -> bool:
        """True iff the session's input.txt exists (CC hasn't consumed yet)."""
        return self.input_path(session_id).exists()
    def queue_or_write(self, session_id: str, content: str) -> bool:
        """Try to deliver content. If session's input.txt is still present,
        queue internally and return False; otherwise write and return True.
        """
        self._pending[session_id].append(content)
        return self._flush_session(session_id)
    def flush_all(self) -> int:
        """Try to flush queued dispatches for every session. Returns count delivered."""
        delivered = 0
        for session_id in list(self._pending.keys()):
            while self._pending[session_id]:
                if not self._flush_session(session_id):
                    break
                delivered += 1
        return delivered
    def _flush_session(self, session_id: str) -> bool:
        if not self._pending[session_id]:
            return False
        if self.session_input_present(session_id):
            return False
        next_content = self._pending[session_id].popleft()
        path = self.input_path(session_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        write_atomic(path, next_content)
        logger.info("dispatched %d chars to %s -> %s", len(next_content), session_id, path)
        return True
--- a/relay/logs.py
+++ b/relay/logs.py
@@ -0,0 +1,46 @@
 """Logging setup.
 Root logger writes to stdout (so tmux/systemd captures it) and to
 ``logs/relay-YYYY-MM-DD.log`` with daily rotation. Format includes
 timestamp, level, logger name, and message.
 """
 from __future__ import annotations
 import logging
 import sys
 from logging.handlers import TimedRotatingFileHandler
 from pathlib import Path
 def configure(logs_dir: Path, level: int = logging.INFO) -> None:
    logs_dir.mkdir(parents=True, exist_ok=True)
    fmt = logging.Formatter(
        fmt="%(asctime)s %(levelname)-7s %(name)-22s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    root = logging.getLogger()
    root.setLevel(level)
    # Remove pre-existing handlers (idempotent across reload during tests)
    for handler in list(root.handlers):
        root.removeHandler(handler)
    stdout = logging.StreamHandler(stream=sys.stdout)
    stdout.setFormatter(fmt)
    root.addHandler(stdout)
    file_handler = TimedRotatingFileHandler(
        filename=str(logs_dir / "relay.log"),
        when="midnight",
        backupCount=14,
        encoding="utf-8",
        utc=True,
    )
    file_handler.setFormatter(fmt)
    root.addHandler(file_handler)
    # Quiet libraries that are too chatty at INFO
    for noisy in ("urllib3", "httpx", "httpcore", "anthropic"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
--- a/relay/ntfy.py
+++ b/relay/ntfy.py
@@ -0,0 +1,59 @@
 """ntfy.sh notifications.
 Topic is loaded from settings (auto-generated on first run). The topic
 is functionally a password — anyone subscribed to the topic URL
 receives notifications. We use cryptographically-random topics
 (secrets.token_urlsafe(12)) to make brute-force discovery impractical.
 Notifications are best-effort: a failure to deliver (network down,
 ntfy.sh outage) is logged but does NOT block the daemon's main loop.
 """
 from __future__ import annotations
 import logging
 import requests
 logger = logging.getLogger(__name__)
 NTFY_BASE = "https://ntfy.sh"
 def topic_url(topic: str) -> str:
    return f"{NTFY_BASE}/{topic}"
 def notify(
    topic: str,
    title: str,
    message: str,
    *,
    priority: str = "default",
    tags: list[str] | None = None,
 ) -> bool:
    """Post one notification. Returns True on HTTP 200, False otherwise.
    Never raises on transport errors — this path runs from the daemon's
    main loop and a failed notification should not stop work.
    """
    if not topic:
        logger.warning("ntfy topic is empty; skipping notification: %s", title)
        return False
    headers = {
        "Title": title,
        "Priority": priority,
    }
    if tags:
        headers["Tags"] = ",".join(tags)
    try:
        resp = requests.post(
            topic_url(topic), data=message.encode("utf-8"), headers=headers, timeout=10
        )
    except requests.RequestException as exc:
        logger.warning("ntfy delivery failed for topic <redacted>: %s", exc)
        return False
    if resp.status_code != 200:
        logger.warning("ntfy non-200 (%s): %s", resp.status_code, resp.text[:200])
        return False
    return True
--- a/relay/queue.py
+++ b/relay/queue.py
@@ -0,0 +1,129 @@
 """Queue intake.
 CC sessions drop JSON envelopes into ``queue/``. The envelope shape:
    {
      "session_id": "session-1",
      "timestamp": "2026-05-02T12:34:56Z",
      "content":   "..."
    }
 The daemon picks the oldest entry by mtime, validates the envelope,
 returns it, and deletes the file once processing succeeds. A
 malformed envelope is moved to ``queue/.rejected/`` so the daemon
 doesn't retry it forever.
 """
 from __future__ import annotations
 import json
 import logging
 import shutil
 import time
 from dataclasses import dataclass
 from pathlib import Path
 logger = logging.getLogger(__name__)
 class QueueError(RuntimeError):
    pass
@dataclass(frozen=True)
 class QueueEntry:
    path: Path
    session_id: str
    timestamp: str
    content: str
    mtime: float
 def _validate_envelope(payload: object, *, path: Path) -> tuple[str, str, str]:
    if not isinstance(payload, dict):
        raise QueueError(f"{path}: envelope must be a JSON object")
    for required in ("session_id", "timestamp", "content"):
        if required not in payload:
            raise QueueError(f"{path}: missing required key '{required}'")
    session_id = str(payload["session_id"]).strip()
    timestamp = str(payload["timestamp"]).strip()
    content = str(payload["content"])
    if not session_id:
        raise QueueError(f"{path}: session_id is empty")
    if not timestamp:
        raise QueueError(f"{path}: timestamp is empty")
    if not content.strip():
        raise QueueError(f"{path}: content is empty")
    return session_id, timestamp, content
 def _reject(path: Path, reason: str) -> None:
    rejected_dir = path.parent / ".rejected"
    rejected_dir.mkdir(exist_ok=True)
    target = rejected_dir / path.name
    shutil.move(str(path), str(target))
    logger.warning("rejected queue entry %s -> %s: %s", path, target, reason)
 def list_pending(queue_dir: Path) -> list[Path]:
    """Return queue entries oldest-first, excluding the .rejected dir."""
    if not queue_dir.exists():
        return []
    entries = [
        p
        for p in queue_dir.iterdir()
        if p.is_file() and p.suffix == ".json" and not p.name.startswith(".")
    ]
    entries.sort(key=lambda p: p.stat().st_mtime)
    return entries
 def take_oldest(queue_dir: Path) -> QueueEntry | None:
    """Read and validate the oldest queue entry. Returns None if queue empty.
    Rejects malformed envelopes by moving them to .rejected/. The caller
    is responsible for calling ``ack(entry)`` once processing succeeds
    to delete the file; until then it remains in the queue.
    """
    pending = list_pending(queue_dir)
    if not pending:
        return None
    path = pending[0]
    try:
        raw = path.read_text(encoding="utf-8")
        payload = json.loads(raw)
        session_id, timestamp, content = _validate_envelope(payload, path=path)
    except (OSError, json.JSONDecodeError) as exc:
        _reject(path, f"unreadable / not JSON: {exc}")
        return None
    except QueueError as exc:
        _reject(path, str(exc))
        return None
    return QueueEntry(
        path=path,
        session_id=session_id,
        timestamp=timestamp,
        content=content,
        mtime=path.stat().st_mtime,
    )
 def ack(entry: QueueEntry) -> None:
    """Delete the queue file. Called by the daemon after the turn is dispatched."""
    try:
        entry.path.unlink()
    except FileNotFoundError:
        pass
 def stuck_age_seconds(queue_dir: Path) -> float:
    """How long the oldest pending entry has been waiting. 0 if queue empty.
    Used by the daemon's heartbeat to fire a ntfy alert when the queue
    is stuck (e.g., persistent API failure).
    """
    pending = list_pending(queue_dir)
    if not pending:
        return 0.0
    return time.time() - pending[0].stat().st_mtime
--- a/relay/state.py
+++ b/relay/state.py
@@ -0,0 +1,100 @@
 """Atomic state-file I/O and instance lock.
 The conversation history lives at ``state/conversation.json``. Mutated
 in-memory by the daemon and written via temp+rename to avoid partial
 writes if the process is killed mid-write. A ``state/.lock`` advisory
 file stops two daemons from running against the same directory.
 """
 from __future__ import annotations
 import errno
 import fcntl
 import json
 import os
 from dataclasses import dataclass
 from pathlib import Path
 from tempfile import NamedTemporaryFile
 class StateError(RuntimeError):
    pass
@dataclass
 class InstanceLock:
    """Holds a flock on state/.lock for the daemon's lifetime.
    Released automatically on process exit (kernel closes the fd) or by
    calling ``release()``. ``acquire()`` raises StateError if another
    daemon already holds the lock.
    """
    lock_path: Path
    _fd: int | None = None
    def acquire(self) -> None:
        self._fd = os.open(self.lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            os.close(self._fd)
            self._fd = None
            if exc.errno in {errno.EAGAIN, errno.EACCES}:
                raise StateError(
                    f"Another daemon is holding {self.lock_path}; refusing to start"
                ) from exc
            raise
        os.write(self._fd, str(os.getpid()).encode())
    def release(self) -> None:
        if self._fd is None:
            return
        try:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
        finally:
            os.close(self._fd)
            self._fd = None
 def write_atomic(path: Path, data: str) -> None:
    """Atomically write text to ``path`` via temp file + rename.
    Crash-safe: a partial write leaves the temp file but does not
    overwrite the target. fsync the data file (not the directory) so the
    rename atomicity gives us durability up to the OS-level rename
    barrier.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with NamedTemporaryFile(
        mode="w",
        encoding="utf-8",
        dir=str(path.parent),
        prefix=f".{path.name}.",
        suffix=".tmp",
        delete=False,
    ) as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())
        tmp_path = Path(tmp.name)
    os.replace(tmp_path, path)
 def write_json_atomic(path: Path, value: object) -> None:
    write_atomic(path, json.dumps(value, indent=2, ensure_ascii=False, sort_keys=False))
 def read_json(path: Path, default: object) -> object:
    """Read JSON or return default when the file is missing or empty."""
    if not path.exists():
        return default
    raw = path.read_text(encoding="utf-8").strip()
    if not raw:
        return default
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StateError(f"Corrupt JSON at {path}: {exc}") from exc
--- a/tests/init.py
+++ b/tests/init.py
--- a/tests/test_anthropic_client.py
+++ b/tests/test_anthropic_client.py
@@ -0,0 +1,122 @@
 from __future__ import annotations
 from types import SimpleNamespace
 from unittest.mock import MagicMock
 from relay.anthropic_client import AnthropicClient
 def _fake_message(
    text: str, *, in_tokens: int = 100, out_tokens: int = 50, cache_w: int = 0, cache_r: int = 0
 ):
    """Build a stand-in for anthropic.types.Message with .content + .usage."""
    return SimpleNamespace(
        content=[SimpleNamespace(type="text", text=text)],
        usage=SimpleNamespace(
            input_tokens=in_tokens,
            output_tokens=out_tokens,
            cache_creation_input_tokens=cache_w,
            cache_read_input_tokens=cache_r,
        ),
    )
 def _client_with_mock(model: str = "claude-opus-4-7") -> tuple[AnthropicClient, MagicMock]:
    client = AnthropicClient(api_key="sk-fake", model=model)
    mock_create = MagicMock(return_value=_fake_message("response text"))
    client._sdk = SimpleNamespace(messages=SimpleNamespace(create=mock_create))
    return client, mock_create
 def test_send_returns_assistant_text_and_usage() -> None:
    client, _ = _client_with_mock()
    result = client.send(system_prompt="sys", messages=[{"role": "user", "content": "u1"}])
    assert result.text == "response text"
    assert result.input_tokens == 100
    assert result.output_tokens == 50
 def test_send_passes_system_prompt_with_cache_control() -> None:
    client, mock_create = _client_with_mock()
    client.send(system_prompt="SYSTEM PROMPT", messages=[{"role": "user", "content": "u"}])
    call = mock_create.call_args
    system_arg = call.kwargs["system"]
    assert system_arg[0]["text"] == "SYSTEM PROMPT"
    assert system_arg[0]["cache_control"] == {"type": "ephemeral"}
 def test_cost_estimation_opus() -> None:
    client, _ = _client_with_mock()
    # Override response with specific token counts
    client._sdk.messages.create = MagicMock(
        return_value=_fake_message("x", in_tokens=1_000_000, out_tokens=0)
    )
    result = client.send(system_prompt="s", messages=[{"role": "user", "content": "u"}])
    # Opus: $15/M input → $15
    assert abs(result.estimated_cost_usd - 15.0) < 0.01
 def test_cost_estimation_includes_cache_savings() -> None:
    client, _ = _client_with_mock()
    client._sdk.messages.create = MagicMock(
        return_value=_fake_message(
            "x",
            in_tokens=0,
            out_tokens=0,
            cache_w=1_000_000,  # 1M cache write at $18.75/M
            cache_r=1_000_000,  # 1M cache read at $1.50/M
        )
    )
    result = client.send(system_prompt="s", messages=[{"role": "user", "content": "u"}])
    # cache_w 1M @ $18.75 + cache_r 1M @ $1.50 = $20.25
    assert abs(result.estimated_cost_usd - 20.25) < 0.01
 def test_cost_estimation_unknown_model_falls_back_to_opus() -> None:
    client = AnthropicClient(api_key="sk-fake", model="claude-future-9000")
    client._sdk = SimpleNamespace(
        messages=SimpleNamespace(
            create=MagicMock(return_value=_fake_message("x", in_tokens=1_000_000, out_tokens=0))
        )
    )
    result = client.send(system_prompt="s", messages=[{"role": "user", "content": "u"}])
    # Falls back to Opus pricing
    assert abs(result.estimated_cost_usd - 15.0) < 0.01
 def test_send_concatenates_multiple_text_blocks() -> None:
    client, _ = _client_with_mock()
    multi = SimpleNamespace(
        content=[
            SimpleNamespace(type="text", text="hello "),
            SimpleNamespace(type="text", text="world"),
        ],
        usage=SimpleNamespace(
            input_tokens=0,
            output_tokens=0,
            cache_creation_input_tokens=0,
            cache_read_input_tokens=0,
        ),
    )
    client._sdk.messages.create = MagicMock(return_value=multi)
    result = client.send(system_prompt="s", messages=[{"role": "user", "content": "u"}])
    assert result.text == "hello world"
 def test_send_ignores_non_text_blocks() -> None:
    client, _ = _client_with_mock()
    mixed = SimpleNamespace(
        content=[
            SimpleNamespace(type="text", text="text part"),
            SimpleNamespace(type="tool_use"),  # no .text — would crash if not filtered
        ],
        usage=SimpleNamespace(
            input_tokens=0,
            output_tokens=0,
            cache_creation_input_tokens=0,
            cache_read_input_tokens=0,
        ),
    )
    client._sdk.messages.create = MagicMock(return_value=mixed)
    result = client.send(system_prompt="s", messages=[{"role": "user", "content": "u"}])
    assert result.text == "text part"
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -0,0 +1,101 @@
 from __future__ import annotations
 from pathlib import Path
 from unittest.mock import patch
 import pytest
 def _write_env(tmp_path: Path, body: str) -> Path:
    env = tmp_path / ".env"
    env.write_text(body)
    return env
 def test_load_settings_reads_env_and_creates_runtime_dirs(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    _write_env(
        tmp_path,
        "ANTHROPIC_API_KEY=sk-ant-test\nANTHROPIC_MODEL=claude-haiku-4-5-20251001\nNTFY_TOPIC=topictopic12\n",
    )
    monkeypatch.setattr("relay.config.REPO_ROOT", tmp_path)
    monkeypatch.setattr("relay.config.ENV_FILE", tmp_path / ".env")
    monkeypatch.setattr("relay.config.CONFIG_FILE", tmp_path / "config.yaml")
    from relay.config import load_settings  # import after monkeypatch
    s = load_settings()
    assert s.api_key == "sk-ant-test"
    assert s.model == "claude-haiku-4-5-20251001"
    assert s.ntfy_topic == "topictopic12"
    for d in (s.queue_dir, s.dispatch_dir, s.state_dir, s.logs_dir):
        assert d.exists()
 def test_load_settings_generates_ntfy_topic_when_blank(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    _write_env(tmp_path, "ANTHROPIC_API_KEY=sk-ant-test\nNTFY_TOPIC=\n")
    monkeypatch.setattr("relay.config.REPO_ROOT", tmp_path)
    monkeypatch.setattr("relay.config.ENV_FILE", tmp_path / ".env")
    monkeypatch.setattr("relay.config.CONFIG_FILE", tmp_path / "config.yaml")
    from relay.config import load_settings
    with patch("relay.config._generate_ntfy_topic", return_value="generatedtopic1"):
        s = load_settings()
    assert s.ntfy_topic == "generatedtopic1"
    # Persisted back to .env
    env_after = (tmp_path / ".env").read_text()
    assert "NTFY_TOPIC=generatedtopic1" in env_after
 def test_load_settings_treats_inline_hash_comment_as_blank(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    """Defensive: dotenv parses 'KEY=  # comment' as 'KEY' = '  # comment'.
    The settings loader must recognise that as effectively blank so a
    fresh topic is generated.
    """
    _write_env(tmp_path, "ANTHROPIC_API_KEY=sk-ant-test\nNTFY_TOPIC=  # generated on first run\n")
    monkeypatch.setattr("relay.config.REPO_ROOT", tmp_path)
    monkeypatch.setattr("relay.config.ENV_FILE", tmp_path / ".env")
    monkeypatch.setattr("relay.config.CONFIG_FILE", tmp_path / "config.yaml")
    from relay.config import load_settings
    with patch("relay.config._generate_ntfy_topic", return_value="newtopic"):
        s = load_settings()
    assert s.ntfy_topic == "newtopic"
 def test_missing_api_key_raises(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
    _write_env(tmp_path, "ANTHROPIC_MODEL=claude-opus-4-7\n")
    monkeypatch.setattr("relay.config.REPO_ROOT", tmp_path)
    monkeypatch.setattr("relay.config.ENV_FILE", tmp_path / ".env")
    monkeypatch.setattr("relay.config.CONFIG_FILE", tmp_path / "config.yaml")
    # Also ensure env var doesn't leak in
    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
    from relay.config import load_settings
    with pytest.raises(RuntimeError, match="ANTHROPIC_API_KEY"):
        load_settings()
 def test_default_config_yaml_is_seeded_when_missing(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
 ) -> None:
    _write_env(tmp_path, "ANTHROPIC_API_KEY=sk-ant-test\nNTFY_TOPIC=t1234\n")
    monkeypatch.setattr("relay.config.REPO_ROOT", tmp_path)
    monkeypatch.setattr("relay.config.ENV_FILE", tmp_path / ".env")
    monkeypatch.setattr("relay.config.CONFIG_FILE", tmp_path / "config.yaml")
    from relay.config import load_settings
    s = load_settings()
    cfg_path = tmp_path / "config.yaml"
    assert cfg_path.exists()
    assert any(sess.session_id == "session-1" for sess in s.sessions)
--- a/tests/test_conversation.py
+++ b/tests/test_conversation.py
@@ -0,0 +1,67 @@
 from __future__ import annotations
 import json
 from pathlib import Path
 from relay.conversation import RECENT_TURNS_KEPT, Conversation, render_for_log
 def test_append_and_read_back(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    convo.append("user", "hello", session_id="session-1")
    convo.append("assistant", "hi back")
    assert [t.role for t in convo.turns] == ["user", "assistant"]
    persisted = json.loads((tmp_path / "c.json").read_text())
    assert persisted[0]["session_id"] == "session-1"
 def test_total_chars_sums_content(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    convo.append("user", "abc")
    convo.append("assistant", "defg")
    assert convo.total_chars() == 7
 def test_needs_summarization_threshold(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    convo.append("user", "x" * 100)
    assert not convo.needs_summarization(200)
    assert convo.needs_summarization(50)
 def test_replace_with_summary_keeps_recent_turns(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    for i in range(RECENT_TURNS_KEPT + 5):
        convo.append("user", f"u{i}")
        convo.append("assistant", f"a{i}")
    convo.replace_with_summary("SUMMARY")
    assert convo.turns[0].role == "assistant"
    assert convo.turns[0].content == "SUMMARY"
    assert convo.turns[0].meta == "summary"
    # 1 summary + RECENT_TURNS_KEPT verbatim
    assert len(convo.turns) == 1 + RECENT_TURNS_KEPT
 def test_replace_with_summary_persists(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    for i in range(20):
        convo.append("user", f"u{i}")
    convo.replace_with_summary("S")
    reloaded = Conversation(tmp_path / "c.json")
    assert reloaded.turns[0].content == "S"
 def test_to_api_messages_strips_metadata(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    convo.append("user", "x", session_id="s1")
    convo.append("assistant", "y")
    msgs = convo.to_api_messages()
    assert msgs == [{"role": "user", "content": "x"}, {"role": "assistant", "content": "y"}]
 def test_render_for_log_truncates_long_content(tmp_path: Path) -> None:
    convo = Conversation(tmp_path / "c.json")
    convo.append("user", "a" * 1000)
    rendered = render_for_log(convo.turns[0], max_chars=50)
    assert len(rendered["content"]) < 100
    assert "more chars" in rendered["content"]
--- a/tests/test_daemon_loop.py
+++ b/tests/test_daemon_loop.py
@@ -0,0 +1,229 @@
 """End-to-end loop test with a fake AnthropicClient.
 Drives one daemon `_tick` at a time so we can inspect state after each
 event. Tests:
 - queue entry → API call → dispatch
 - response containing [NEEDS-JC] → state=needs_jc, no dispatch
 - jc_input.txt with @session-N: prefix → direct dispatch, no API call
 - jc_input.txt without prefix → next chat-side turn (API call), clears needs_jc
 - summarization triggers when history exceeds the cap
 - malformed queue file is rejected, doesn't break the loop
 """
 from __future__ import annotations
 import json
 from dataclasses import replace
 from pathlib import Path
 from types import SimpleNamespace
 from unittest.mock import patch
 import pytest
 from relay.anthropic_client import TurnResult
 from relay.config import SessionConfig, Settings
 from relay.daemon import Daemon
@pytest.fixture
 def settings(tmp_path: Path) -> Settings:
    s = Settings(
        api_key="sk-fake",
        model="claude-opus-4-7",
        ntfy_topic="test-topic",
        status_port=0,
        history_char_cap=400_000,
        repo_root=tmp_path,
        queue_dir=tmp_path / "queue",
        dispatch_dir=tmp_path / "dispatch",
        state_dir=tmp_path / "state",
        logs_dir=tmp_path / "logs",
        system_prompt="SYS",
        summarization_prompt="please summarize",
        sessions=(SessionConfig(session_id="session-1"),),
    )
    for d in (s.queue_dir, s.dispatch_dir, s.state_dir, s.logs_dir):
        d.mkdir(parents=True, exist_ok=True)
    return s
 def _fake_turn_result(text: str, *, cost: float = 0.001) -> TurnResult:
    return TurnResult(
        text=text,
        input_tokens=100,
        output_tokens=50,
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0,
        estimated_cost_usd=cost,
        raw=SimpleNamespace(),
    )
 def _drop_queue(settings: Settings, name: str, content: str, session_id: str = "session-1") -> Path:
    path = settings.queue_dir / name
    path.write_text(
        json.dumps(
            {"session_id": session_id, "timestamp": "2026-05-02T00:00:00Z", "content": content}
        )
    )
    return path
 def _build_daemon(settings: Settings, *, response_text: str) -> Daemon:
    daemon = Daemon(settings)
    daemon.client.send = lambda **kwargs: _fake_turn_result(response_text)  # type: ignore[assignment]
    return daemon
 def test_queue_entry_flows_through_to_dispatch(settings: Settings) -> None:
    _drop_queue(settings, "a.json", "first message")
    daemon = _build_daemon(settings, response_text="ok cool reply")
    with patch("relay.daemon.notify"):
        daemon._tick()
    dispatched = settings.dispatch_dir / "session-1" / "input.txt"
    assert dispatched.read_text() == "ok cool reply"
    # Queue entry consumed
    assert list(settings.queue_dir.glob("*.json")) == []
    # History recorded
    assert daemon.conversation.total_chars() > 0
 def test_needs_jc_token_pauses_dispatch(settings: Settings) -> None:
    _drop_queue(settings, "a.json", "i need a decision")
    daemon = _build_daemon(settings, response_text="[NEEDS-JC] should I pick option A or B?")
    with patch("relay.daemon.notify") as ntfy:
        daemon._tick()
    assert daemon.status.state == "needs_jc"
    assert daemon.status.last_needs_jc_text is not None
    # No dispatch happened
    dispatched = settings.dispatch_dir / "session-1" / "input.txt"
    assert not dispatched.exists()
    # ntfy was called with the high priority alert
    assert any("paused" in c.kwargs.get("title", "").lower() for c in ntfy.call_args_list)
 def test_needs_jc_pause_blocks_subsequent_queue_entries(settings: Settings) -> None:
    _drop_queue(settings, "a.json", "first")
    daemon = _build_daemon(settings, response_text="[NEEDS-JC] need decision")
    with patch("relay.daemon.notify"):
        daemon._tick()
    # New queue entry while paused
    _drop_queue(settings, "b.json", "second")
    with patch("relay.daemon.notify"):
        daemon._tick()
    # Second queue entry NOT consumed
    pending = list(settings.queue_dir.glob("*.json"))
    assert len(pending) == 1
 def test_jc_input_prefix_dispatches_directly_without_api_call(settings: Settings) -> None:
    daemon = _build_daemon(settings, response_text="should not be called")
    api_calls: list[object] = []
    daemon.client.send = lambda **kwargs: (api_calls.append(kwargs), _fake_turn_result("x"))[1]  # type: ignore[assignment]
    (settings.state_dir / "jc_input.txt").write_text("@session-1: do this thing")
    with patch("relay.daemon.notify"):
        daemon._tick()
    dispatched = settings.dispatch_dir / "session-1" / "input.txt"
    assert dispatched.read_text() == "do this thing"
    assert api_calls == []  # no API call for direct dispatch
    assert not (settings.state_dir / "jc_input.txt").exists()
 def test_jc_input_clears_needs_jc(settings: Settings) -> None:
    daemon = _build_daemon(settings, response_text="[NEEDS-JC] question")
    _drop_queue(settings, "a.json", "trigger needs_jc")
    with patch("relay.daemon.notify"):
        daemon._tick()
    assert daemon.status.state == "needs_jc"
    # JC writes a non-prefix response — treated as next chat turn.
    daemon.client.send = lambda **kwargs: _fake_turn_result("post-jc reply")  # type: ignore[assignment]
    (settings.state_dir / "jc_input.txt").write_text("here's the answer: do A")
    with patch("relay.daemon.notify"):
        daemon._tick()
    assert daemon.status.state == "running"
    dispatched = settings.dispatch_dir / "session-1" / "input.txt"
    assert dispatched.read_text() == "post-jc reply"
 def test_summarization_triggers_at_cap(settings: Settings) -> None:
    tiny = replace(settings, history_char_cap=200)
    daemon = Daemon(tiny)
    # Pre-load history to exceed the cap
    daemon.conversation.append("user", "u" * 100)
    daemon.conversation.append("assistant", "a" * 200)
    assert daemon.conversation.needs_summarization(200)
    daemon.client.send = lambda **kwargs: _fake_turn_result("CONDENSED SUMMARY")  # type: ignore[assignment]
    _drop_queue(tiny, "a.json", "next user turn")
    with patch("relay.daemon.notify"):
        daemon._tick()
    # First turn after cap was hit invokes summarization, then sends the user
    # turn through the same call. The summary turn is now the head.
    assert daemon.conversation.turns[0].meta == "summary"
    assert daemon.conversation.turns[0].content == "CONDENSED SUMMARY"
 def test_malformed_queue_file_does_not_crash_loop(settings: Settings) -> None:
    bad = settings.queue_dir / "broken.json"
    bad.write_text("{not json")
    daemon = _build_daemon(settings, response_text="x")
    with patch("relay.daemon.notify"):
        daemon._tick()
    # Bad file moved to .rejected, no crash
    assert not bad.exists()
    assert (settings.queue_dir / ".rejected" / "broken.json").exists()
 def test_dispatch_queues_when_session_input_pending(settings: Settings) -> None:
    """Two queue entries for the same session and the first dispatch hasn't been consumed yet."""
    daemon = _build_daemon(settings, response_text="reply 1")
    _drop_queue(settings, "a.json", "msg 1")
    with patch("relay.daemon.notify"):
        daemon._tick()
    assert (settings.dispatch_dir / "session-1" / "input.txt").read_text() == "reply 1"
    daemon.client.send = lambda **kwargs: _fake_turn_result("reply 2")  # type: ignore[assignment]
    _drop_queue(settings, "b.json", "msg 2")
    with patch("relay.daemon.notify"):
        daemon._tick()
    # First dispatch still present (CC hasn't consumed); second is queued.
    assert (settings.dispatch_dir / "session-1" / "input.txt").read_text() == "reply 1"
    assert daemon.dispatch.has_pending("session-1")
    # CC consumes
    (settings.dispatch_dir / "session-1" / "input.txt").unlink()
    with patch("relay.daemon.notify"):
        daemon._tick()
    assert (settings.dispatch_dir / "session-1" / "input.txt").read_text() == "reply 2"
 def test_status_persists_to_disk(settings: Settings) -> None:
    daemon = _build_daemon(settings, response_text="ok")
    _drop_queue(settings, "a.json", "hi")
    with patch("relay.daemon.notify"):
        daemon._tick()
    daemon._persist_status()
    status_path = settings.state_dir / "status.json"
    payload = json.loads(status_path.read_text())
    assert payload["state"] == "running"
    assert payload["last_dispatch_session"] == "session-1"
    assert payload["history_turns"] >= 2  # user + assistant
--- a/tests/test_dispatch.py
+++ b/tests/test_dispatch.py
@@ -0,0 +1,48 @@
 from __future__ import annotations
 from pathlib import Path
 from relay.dispatch import DispatchManager
 def test_writes_input_when_session_dir_empty(tmp_path: Path) -> None:
    mgr = DispatchManager(tmp_path / "dispatch")
    delivered = mgr.queue_or_write("session-1", "hello")
    assert delivered is True
    assert (tmp_path / "dispatch" / "session-1" / "input.txt").read_text() == "hello"
 def test_does_not_overwrite_pending_input(tmp_path: Path) -> None:
    mgr = DispatchManager(tmp_path / "dispatch")
    mgr.queue_or_write("session-1", "first")
    delivered = mgr.queue_or_write("session-1", "second")
    assert delivered is False
    # First file unchanged
    assert (tmp_path / "dispatch" / "session-1" / "input.txt").read_text() == "first"
    assert mgr.has_pending("session-1")
 def test_flush_all_delivers_after_consumer_deletes(tmp_path: Path) -> None:
    mgr = DispatchManager(tmp_path / "dispatch")
    mgr.queue_or_write("session-1", "first")
    mgr.queue_or_write("session-1", "second")
    # Consumer "consumes" first
    (tmp_path / "dispatch" / "session-1" / "input.txt").unlink()
    delivered = mgr.flush_all()
    assert delivered == 1
    assert (tmp_path / "dispatch" / "session-1" / "input.txt").read_text() == "second"
    assert not mgr.has_pending("session-1")
 def test_independent_sessions_do_not_block_each_other(tmp_path: Path) -> None:
    mgr = DispatchManager(tmp_path / "dispatch")
    mgr.queue_or_write("session-1", "a")  # blocks on session-1
    delivered = mgr.queue_or_write("session-2", "b")  # session-2 unaffected
    assert delivered is True
    assert (tmp_path / "dispatch" / "session-2" / "input.txt").read_text() == "b"
 def test_input_path_resolution(tmp_path: Path) -> None:
    mgr = DispatchManager(tmp_path / "dispatch")
    p = mgr.input_path("session-1")
    assert p == tmp_path / "dispatch" / "session-1" / "input.txt"
--- a/tests/test_ntfy.py
+++ b/tests/test_ntfy.py
@@ -0,0 +1,39 @@
 from __future__ import annotations
 from unittest.mock import MagicMock, patch
 from relay.ntfy import notify, topic_url
 def test_topic_url_builds_correctly() -> None:
    assert topic_url("abc123") == "https://ntfy.sh/abc123"
 def test_notify_returns_false_on_empty_topic() -> None:
    assert notify("", "title", "msg") is False
 def test_notify_posts_with_headers_and_body() -> None:
    with patch("relay.ntfy.requests.post") as post:
        post.return_value = MagicMock(status_code=200)
        ok = notify("topic", "the title", "body text", priority="high", tags=["warning"])
    assert ok is True
    args, kwargs = post.call_args
    assert args[0] == "https://ntfy.sh/topic"
    assert kwargs["data"] == b"body text"
    assert kwargs["headers"]["Title"] == "the title"
    assert kwargs["headers"]["Priority"] == "high"
    assert kwargs["headers"]["Tags"] == "warning"
 def test_notify_returns_false_on_non_200() -> None:
    with patch("relay.ntfy.requests.post") as post:
        post.return_value = MagicMock(status_code=500, text="oops")
        assert notify("topic", "t", "m") is False
 def test_notify_returns_false_on_network_error() -> None:
    import requests
    with patch("relay.ntfy.requests.post", side_effect=requests.ConnectionError("down")):
        assert notify("topic", "t", "m") is False
--- a/tests/test_queue.py
+++ b/tests/test_queue.py
@@ -0,0 +1,90 @@
 from __future__ import annotations
 import json
 import time
 from pathlib import Path
 from relay.queue import ack, list_pending, stuck_age_seconds, take_oldest
 def _drop(queue_dir: Path, name: str, payload: dict) -> Path:
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / name
    path.write_text(json.dumps(payload))
    return path
 def test_take_oldest_returns_none_on_empty_queue(tmp_path: Path) -> None:
    assert take_oldest(tmp_path / "queue") is None
 def test_take_oldest_returns_oldest_by_mtime(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    older = _drop(queue_dir, "a.json", {"session_id": "s1", "timestamp": "t1", "content": "first"})
    time.sleep(0.01)
    newer = _drop(queue_dir, "b.json", {"session_id": "s2", "timestamp": "t2", "content": "second"})
    entry = take_oldest(queue_dir)
    assert entry is not None
    assert entry.path == older
    assert entry.session_id == "s1"
    assert entry.content == "first"
    assert newer.exists()
 def test_invalid_json_is_rejected(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    queue_dir.mkdir()
    bad = queue_dir / "bad.json"
    bad.write_text("not json")
    assert take_oldest(queue_dir) is None
    assert not bad.exists()
    assert (queue_dir / ".rejected" / "bad.json").exists()
 def test_envelope_missing_keys_is_rejected(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    _drop(queue_dir, "x.json", {"session_id": "s1"})  # no timestamp / content
    assert take_oldest(queue_dir) is None
    assert (queue_dir / ".rejected" / "x.json").exists()
 def test_envelope_blank_content_is_rejected(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    _drop(queue_dir, "x.json", {"session_id": "s1", "timestamp": "t", "content": "   "})
    assert take_oldest(queue_dir) is None
 def test_ack_deletes_file(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    _drop(queue_dir, "a.json", {"session_id": "s1", "timestamp": "t", "content": "x"})
    entry = take_oldest(queue_dir)
    assert entry is not None
    ack(entry)
    assert not entry.path.exists()
 def test_list_pending_skips_rejected_dir(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    queue_dir.mkdir()
    rejected = queue_dir / ".rejected"
    rejected.mkdir()
    (rejected / "old.json").write_text("{}")
    a = _drop(queue_dir, "a.json", {"session_id": "s", "timestamp": "t", "content": "x"})
    pending = list_pending(queue_dir)
    assert pending == [a]
 def test_stuck_age_returns_zero_on_empty_queue(tmp_path: Path) -> None:
    assert stuck_age_seconds(tmp_path / "queue") == 0
 def test_stuck_age_reports_oldest_age(tmp_path: Path) -> None:
    queue_dir = tmp_path / "queue"
    path = _drop(queue_dir, "a.json", {"session_id": "s", "timestamp": "t", "content": "x"})
    # Backdate the file
    old_mtime = time.time() - 60
    import os
    os.utime(path, (old_mtime, old_mtime))
    age = stuck_age_seconds(queue_dir)
    assert age >= 59
--- a/tests/test_real_api.py
+++ b/tests/test_real_api.py
@@ -0,0 +1,57 @@
 """Live-API smoke test.
 Runs only when ANTHROPIC_API_KEY is set in the env (or .env). Sends one
 real turn, verifies we get back a non-empty assistant text and usage
 counts. Marked ``real_api`` so it is excluded from the default suite —
 opt in with ``pytest -m real_api``.
 """
 from __future__ import annotations
 import os
 import pytest
 from dotenv import dotenv_values
 from relay.anthropic_client import AnthropicClient
 from relay.config import REPO_ROOT
 def _api_key_available() -> str | None:
    env_values = dotenv_values(REPO_ROOT / ".env")
    return (
        env_values.get("ANTHROPIC_API_KEY") or os.environ.get("ANTHROPIC_API_KEY") or ""
    ).strip() or None
@pytest.mark.real_api
@pytest.mark.skipif(_api_key_available() is None, reason="ANTHROPIC_API_KEY not configured")
 def test_real_round_trip_against_haiku() -> None:
    """One round-trip against the Haiku model — cheapest available, ~$0.0001 per call.
    Skips (rather than fails) on credit-balance errors so the test is
    honest about why it didn't run. Real authentication errors (401)
    still fail loudly.
    """
    import anthropic
    api_key = _api_key_available()
    assert api_key is not None
    client = AnthropicClient(
        api_key=api_key, model="claude-haiku-4-5-20251001", max_output_tokens=64
    )
    try:
        result = client.send(
            system_prompt="You are a terse echo. Reply with exactly 'pong' and nothing else.",
            messages=[{"role": "user", "content": "ping"}],
        )
    except anthropic.BadRequestError as exc:
        msg = str(exc).lower()
        if "credit balance" in msg or "billing" in msg:
            pytest.skip(f"Anthropic API key has no credit balance: {exc}")
        raise
    assert result.text.strip().lower() == "pong"
    assert result.input_tokens > 0
    assert result.output_tokens > 0
    assert result.estimated_cost_usd > 0
--- a/tests/test_state.py
+++ b/tests/test_state.py
@@ -0,0 +1,102 @@
 from __future__ import annotations
 import json
 import os
 import threading
 from pathlib import Path
 import pytest
 from relay.state import InstanceLock, StateError, read_json, write_atomic, write_json_atomic
 def test_write_atomic_round_trips(tmp_path: Path) -> None:
    target = tmp_path / "f.json"
    write_atomic(target, "hello")
    assert target.read_text() == "hello"
 def test_write_atomic_does_not_leave_temp_files(tmp_path: Path) -> None:
    target = tmp_path / "f.json"
    write_atomic(target, "hello")
    siblings = [p.name for p in tmp_path.iterdir() if p != target]
    assert siblings == []
 def test_write_json_atomic_round_trips(tmp_path: Path) -> None:
    target = tmp_path / "value.json"
    payload = [{"k": "v"}, {"k2": "v2"}]
    write_json_atomic(target, payload)
    assert json.loads(target.read_text()) == payload
 def test_read_json_returns_default_when_missing(tmp_path: Path) -> None:
    assert read_json(tmp_path / "nope.json", default=[]) == []
    assert read_json(tmp_path / "nope.json", default={"x": 1}) == {"x": 1}
 def test_read_json_returns_default_when_empty(tmp_path: Path) -> None:
    target = tmp_path / "empty.json"
    target.write_text("")
    assert read_json(target, default=[]) == []
 def test_read_json_raises_on_corruption(tmp_path: Path) -> None:
    target = tmp_path / "bad.json"
    target.write_text("{not json")
    with pytest.raises(StateError):
        read_json(target, default=[])
 def test_instance_lock_acquires_and_releases(tmp_path: Path) -> None:
    lock = InstanceLock(tmp_path / ".lock")
    lock.acquire()
    try:
        # PID written to file
        assert (tmp_path / ".lock").read_text().strip() == str(os.getpid())
    finally:
        lock.release()
 def test_instance_lock_rejects_second_acquirer(tmp_path: Path) -> None:
    lock1 = InstanceLock(tmp_path / ".lock")
    lock1.acquire()
    try:
        lock2 = InstanceLock(tmp_path / ".lock")
        with pytest.raises(StateError):
            lock2.acquire()
    finally:
        lock1.release()
 def test_instance_lock_release_is_idempotent(tmp_path: Path) -> None:
    lock = InstanceLock(tmp_path / ".lock")
    lock.acquire()
    lock.release()
    lock.release()  # second release should not raise
 def test_concurrent_acquire_serializes(tmp_path: Path) -> None:
    """Confirm flock semantics: two threads acquiring the same lock can't overlap."""
    lock_path = tmp_path / ".lock"
    held: list[bool] = []
    error_count = [0]
    def worker():
        lock = InstanceLock(lock_path)
        try:
            lock.acquire()
            held.append(True)
            lock.release()
        except StateError:
            error_count[0] += 1
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # At least one succeeded; the other either succeeded after the first
    # released, or failed to acquire (depending on scheduling).
    assert sum(held) + error_count[0] == 2