Spec: Agentic Source Orchestrator v2 (Hardening)

Agentic Source Orchestrator v2 defines the hardened implementation layer for the vault's ingestion pipeline. It focuses on lifecycle control, human-in-the-loop (HITL) enforcement, and provenance guarantees.

1. Source Intake Lifecycle

The orchestrator manages sources through a strict 8-stage state machine. No stage may be skipped.

proposed -> mapped -> approved -> crawled -> indexed -> verified -> synthesized -> promoted
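A minimal Python sketch of the no-skip rule follows. It assumes each source carries a single current stage. The names Stage, advance, and IllegalTransition are hypothetical and not part of this spec. A transition is accepted only when the target is exactly one stage after the current one, so skips, repeats, and reversals are all rejected.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight intake stages, in the only order the orchestrator allows."""
    PROPOSED = 1
    MAPPED = 2
    APPROVED = 3
    CRAWLED = 4
    INDEXED = 5
    VERIFIED = 6
    SYNTHESIZED = 7
    PROMOTED = 8


class IllegalTransition(Exception):
    """Raised when a transition would skip, repeat, or reverse a stage."""


def advance(current: Stage, target: Stage) -> Stage:
    """Return target only if it is exactly one stage after current."""
    # PROMOTED + 1 matches no Stage, so a promoted source cannot advance further.
    if target != current + 1:
        raise IllegalTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

For example, advance(Stage.PROPOSED, Stage.MAPPED) succeeds, while advance(Stage.MAPPED, Stage.CRAWLED) raises because it would skip approval.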

2. Multi-Agent Responsibilities

The system enforces the Two-Role Invariant: at least two distinct agent roles must contribute to any artifact before it is promoted, which reduces hallucination risk.

Gemini (Librarian): Detects gaps, proposes sources, and prepares intake.
Codex (Engineer): Executes tools, validates schemas, and enforces policy.
Claude (Chronicler): Distills content, builds drafts, and applies YANP formatting.
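The invariant can be checked mechanically if each artifact records which roles contributed to it. The sketch below assumes such a record exists; Artifact, Role, and satisfies_two_role_invariant are hypothetical names. Because contributions are stored in a set, repeated work by one role still counts only once.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """The three agent roles named in this section."""
    LIBRARIAN = "gemini"
    ENGINEER = "codex"
    CHRONICLER = "claude"


@dataclass
class Artifact:
    """Hypothetical record of a candidate note and the roles that touched it."""
    artifact_id: str
    touched_by: set[Role] = field(default_factory=set)

    def record(self, role: Role) -> None:
        # Adding to a set ignores duplicate contributions from the same role.
        self.touched_by.add(role)


def satisfies_two_role_invariant(artifact: Artifact) -> bool:
    """True when at least two distinct roles contributed to the artifact."""
    return len(artifact.touched_by) >= 2
```

A promotion gate would call satisfies_two_role_invariant and refuse artifacts that only one role has handled.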

3. MCP Tool Surface

The orchestration layer is exposed via 8 core tools:

propose_source_intake: Register a source URL and the rationale for ingesting it.
orchestrate_ingestion: Map the site and estimate costs.
approve_intake_plan: Record human approval for execution.
execute_source_crawl: Perform a bounded fetch of raw documents.
index_crawled_source: Chunk crawled documents and embed them into the sidecar.
verify_source_index: Check the integrity of indexed content.
semantic_search_sources: Retrieve attributed evidence.
promote_synthesis_candidate: Apply the final gate before a permanent note is created.
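One way to tie the tool surface to the lifecycle is a table of preconditions, sketched below. The pairing of tools to stages is an assumption inferred from the tool descriptions, not something the spec states. In particular, no listed tool produces the synthesized stage, so the sketch assumes that stage is set by a drafting step outside these eight tools, and it treats semantic_search_sources as read-only once a source is verified. TOOL_TRANSITIONS and run_tool are hypothetical names.

```python
from typing import Optional

STAGES = ["proposed", "mapped", "approved", "crawled",
          "indexed", "verified", "synthesized", "promoted"]

# Hypothetical: (stage required before the call, stage after the call).
# An "after" value of None marks a read-only tool that never changes stage.
TOOL_TRANSITIONS: dict[str, tuple[Optional[str], Optional[str]]] = {
    "propose_source_intake": (None, "proposed"),
    "orchestrate_ingestion": ("proposed", "mapped"),
    "approve_intake_plan": ("mapped", "approved"),
    "execute_source_crawl": ("approved", "crawled"),
    "index_crawled_source": ("crawled", "indexed"),
    "verify_source_index": ("indexed", "verified"),
    "semantic_search_sources": ("verified", None),
    "promote_synthesis_candidate": ("synthesized", "promoted"),
}


def run_tool(tool: str, current: Optional[str]) -> Optional[str]:
    """Return the source's stage after `tool` runs, or raise if not allowed."""
    required, produced = TOOL_TRANSITIONS[tool]
    if produced is None:
        # Read-only tool: permitted at or after the required stage.
        if current is None or STAGES.index(current) < STAGES.index(required):
            raise PermissionError(f"{tool} needs stage >= {required}, got {current}")
        return current
    # State-changing tool: the source must be in exactly the required stage.
    if current != required:
        raise PermissionError(f"{tool} needs stage {required}, got {current}")
    return produced
```

Keeping the table separate from the tool bodies means the orchestrator can reject an out-of-order call before any network or indexing work starts.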

4. Policy Enforcement

Policy authority is centralized in 02_System/pipeline-policy.yaml.

Fail-Closed: If the policy cannot be loaded, all ingestion tools must stop.
Thresholds: Human approval is mandatory for any new domain or any cost above 20 credits.
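A sketch of fail-closed loading and the approval thresholds follows, assuming the policy parses to a mapping. To stay dependency-free it uses json.loads as a stand-in parser; a YAML parser such as PyYAML's yaml.safe_load could be passed instead for the real .yaml file. PolicyUnavailable, load_policy, and requires_human_approval are hypothetical names, and the 20-credit default mirrors the threshold above.

```python
import json
from pathlib import Path


class PolicyUnavailable(RuntimeError):
    """Raised when the policy cannot be read or parsed; ingestion must stop."""


def load_policy(path: Path, parse=json.loads) -> dict:
    """Load the policy, failing closed on any read or parse error."""
    try:
        policy = parse(path.read_text(encoding="utf-8"))
    except Exception as exc:  # any failure means no policy, so no ingestion
        raise PolicyUnavailable(f"cannot load {path}: {exc}") from exc
    if not isinstance(policy, dict):
        raise PolicyUnavailable(f"{path} did not parse to a mapping")
    return policy


def requires_human_approval(domain: str, known_domains: set[str],
                            estimated_cost: float, cost_limit: float = 20) -> bool:
    """Approval is mandatory for a new domain or a cost strictly above the limit."""
    return domain not in known_domains or estimated_cost > cost_limit
```

Every ingestion tool would call load_policy first and let PolicyUnavailable propagate, so a missing or corrupt file halts the pipeline instead of silently running without rules. Note that a cost of exactly 20 credits does not trigger approval under the "> 20" rule.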

4.1 Transport Security & Authorization

Tools exposed via HTTP transport MUST be protected per lit-mcp-authorization. The propose_source_intake and promote_synthesis_candidate gates are the highest-privilege tools and SHOULD require scope validation before execution.
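A minimal sketch of scope validation for the two highest-privilege tools, assuming the access token has already been validated upstream and its claims decoded. The scope names intake:propose and synthesis:promote, and the helpers scopes_from_claims and authorize, are hypothetical; the sketch relies only on the common OAuth convention of a space-delimited scope claim. Tools without an entry require no extra scopes.

```python
# Hypothetical scope names; real values would come from the deployment's
# authorization server configuration, not from this spec.
REQUIRED_SCOPES: dict[str, set[str]] = {
    "propose_source_intake": {"intake:propose"},
    "promote_synthesis_candidate": {"synthesis:promote"},
}


class InsufficientScope(PermissionError):
    """Raised when a caller's token lacks a scope the tool requires."""


def scopes_from_claims(claims: dict) -> set[str]:
    """Read granted scopes from a decoded token's space-delimited 'scope' claim."""
    return set(claims.get("scope", "").split())


def authorize(tool: str, granted: set[str]) -> None:
    """Raise before execution if any scope required by `tool` is missing."""
    missing = REQUIRED_SCOPES.get(tool, set()) - granted
    if missing:
        raise InsufficientScope(f"{tool} requires scopes: {sorted(missing)}")
```

A handler would call authorize(tool, scopes_from_claims(claims)) as its first step, so an under-scoped token is rejected before any side effect occurs.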

See lit-mcp-security-best-practices for the full threat model applicable to HTTP-exposed MCP tool surfaces.

5. Handoff & Seams

Every orchestration step must produce a Seam Artifact containing the current state, open risks, and next recommended action. This ensures continuity across agent sessions.
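A Seam Artifact could be a small serializable record, as in the sketch below. The three content fields follow the text above; source_id, the exact field names, and the JSON encoding are assumptions. A lossless round trip through to_json and from_json lets the next agent session reconstruct the handoff exactly.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SeamArtifact:
    """Hypothetical handoff record emitted after every orchestration step."""
    source_id: str
    current_state: str            # one of the eight lifecycle stages
    next_recommended_action: str  # e.g. the next tool to call
    open_risks: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output stable, which makes seams easy to diff.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SeamArtifact":
        return cls(**json.loads(text))
```

For example, a seam written after indexing might record current_state "indexed" and next_recommended_action "verify_source_index".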

See protocol-source-ingestion-runbook for the step-by-step operational protocol.