Spec: Agentic Source Orchestrator v2 (Hardening)
Agentic Source Orchestrator v2 defines the hardened implementation layer for the vault's ingestion pipeline. It focuses on lifecycle control, human-in-the-loop (HITL) enforcement, and provenance guarantees.
1. Source Intake Lifecycle
The orchestrator manages sources through a strict 8-stage state machine. No stage may be skipped:
proposed -> mapped -> approved -> crawled -> indexed -> verified -> synthesized -> promoted
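The no-skip rule can be enforced by letting a source move only to the immediate successor of its current stage. The sketch below models this as an ordered enum. `Stage`, `advance`, and `InvalidTransitionError` are hypothetical names, not part of the spec. The spec also does not say whether stages may be rolled back, so this sketch assumes they cannot and rejects backward moves.

```python
from enum import Enum


class Stage(Enum):
    """The eight intake stages, numbered in the order the spec requires."""
    PROPOSED = 1
    MAPPED = 2
    APPROVED = 3
    CRAWLED = 4
    INDEXED = 5
    VERIFIED = 6
    SYNTHESIZED = 7
    PROMOTED = 8


class InvalidTransitionError(ValueError):
    """Raised when a transition would skip, repeat, or reverse a stage."""


def advance(current: Stage, target: Stage) -> Stage:
    """Return `target` only if it is the immediate successor of `current`.

    Assumption: backward moves (rollbacks) are rejected like skips.
    """
    if target.value != current.value + 1:
        raise InvalidTransitionError(
            f"cannot move from {current.name.lower()} to {target.name.lower()}"
        )
    return target
```

For example, `advance(Stage.PROPOSED, Stage.MAPPED)` succeeds, while `advance(Stage.MAPPED, Stage.CRAWLED)` raises because it would skip `approved`.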
2. Multi-Agent Responsibilities
The system enforces the Two-Role Invariant: at least two distinct agent roles must touch any promoted artifact to reduce hallucination risk.
- Gemini (Librarian): Detects gaps, proposes sources, prepares intake.
- Codex (Engineer): Executes tools, validates schemas, enforces policy.
- Claude (Chronicler): Distills content, builds drafts, applies YANP formatting.
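The Two-Role Invariant reduces to a set-cardinality check over the roles recorded against an artifact. The following is a minimal sketch, assuming each agent action on an artifact is logged by role. `Role`, `Artifact`, `record_touch`, and `satisfies_two_role_invariant` are illustrative names, not defined by the spec.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Role(Enum):
    """The three agent roles named in the spec, keyed by agent."""
    LIBRARIAN = "gemini"
    ENGINEER = "codex"
    CHRONICLER = "claude"


@dataclass
class Artifact:
    """Hypothetical artifact record that remembers which roles touched it."""
    artifact_id: str
    touched_by: Set[Role] = field(default_factory=set)

    def record_touch(self, role: Role) -> None:
        # A set ignores repeat touches by the same role.
        self.touched_by.add(role)


def satisfies_two_role_invariant(artifact: Artifact) -> bool:
    """True when at least two distinct roles have touched the artifact."""
    return len(artifact.touched_by) >= 2
```

Because `touched_by` is a set, repeated work by a single role (for example, several Chronicler passes) never satisfies the invariant on its own.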
3. MCP Tool Surface
The orchestration layer is exposed via 8 core tools:
- propose_source_intake: Register a URL and rationale.
- orchestrate_ingestion: Map the site and estimate costs.
- approve_intake_plan: Record human approval for execution.
- execute_source_crawl: Bounded fetch of raw documents.
- index_crawled_source: Chunking and embedding into sidecar.
- verify_source_index: Integrity check of indexed content.
- semantic_search_sources: Retrieve attributed evidence.
- promote_synthesis_candidate: Final gate for permanent note creation.
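One way to connect the tool surface to the lifecycle is to give each tool a required starting stage and a resulting stage. The mapping below is inferred from the tool descriptions and is an assumption, not something the spec states. In particular, it treats `semantic_search_sources` as read-only. No listed tool produces `synthesized`, so this sketch assumes that transition happens outside these eight tools. `TOOL_TRANSITIONS` and `check_tool_precondition` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping inferred from the tool descriptions:
# tool name -> (required current stage, resulting stage).
# A required stage of None means the tool creates a new source record;
# a resulting stage of None means the tool is read-only.
TOOL_TRANSITIONS: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "propose_source_intake": (None, "proposed"),
    "orchestrate_ingestion": ("proposed", "mapped"),
    "approve_intake_plan": ("mapped", "approved"),
    "execute_source_crawl": ("approved", "crawled"),
    "index_crawled_source": ("crawled", "indexed"),
    "verify_source_index": ("indexed", "verified"),
    "semantic_search_sources": (None, None),
    "promote_synthesis_candidate": ("synthesized", "promoted"),
}


def check_tool_precondition(tool: str, current_stage: Optional[str]) -> Optional[str]:
    """Return the stage a source should be in after `tool` runs, or raise."""
    if tool not in TOOL_TRANSITIONS:
        raise KeyError(f"unknown tool: {tool}")
    required, resulting = TOOL_TRANSITIONS[tool]
    if resulting is None:
        return current_stage  # read-only tool: stage is unchanged
    if required != current_stage:
        raise ValueError(f"{tool} requires stage {required!r}, got {current_stage!r}")
    return resulting
```

Checking the precondition before dispatch means a tool call made out of order, such as crawling a source that was only mapped, fails before any side effect occurs.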
4. Policy Enforcement
Authority is centralized in 02_System/pipeline-policy.yaml.
- Fail-Closed: If policy cannot be loaded, all ingestion tools must stop.
- Thresholds: Mandatory human approval for new domains or costs > 20 credits.
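Both rules can be expressed as a loader that converts every failure into a single halting error, plus a predicate for the approval thresholds. This is a minimal sketch, assuming the parsed policy is a mapping with hypothetical `known_domains` and `cost_threshold` keys; the spec does not define the schema of `pipeline-policy.yaml`. YAML parsing is delegated to a caller-supplied `loader` so the sketch needs only the standard library.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


class PolicyUnavailableError(RuntimeError):
    """Raised when pipeline policy cannot be loaded; all ingestion tools must stop."""


@dataclass(frozen=True)
class Policy:
    known_domains: FrozenSet[str]
    cost_threshold: int = 20  # spec: human approval required above 20 credits


def load_policy_fail_closed(loader: Callable[[], object]) -> Policy:
    """Build a Policy from `loader`'s output, turning any failure into PolicyUnavailableError."""
    try:
        raw = loader()
        if not isinstance(raw, dict):
            raise TypeError("policy document must be a mapping")
        return Policy(
            known_domains=frozenset(raw.get("known_domains", [])),  # hypothetical key
            cost_threshold=int(raw.get("cost_threshold", 20)),  # hypothetical key
        )
    except Exception as exc:
        # Fail closed: callers see one error type and must halt ingestion.
        raise PolicyUnavailableError("pipeline policy unavailable; halting ingestion") from exc


def requires_human_approval(policy: Policy, domain: str, estimated_cost: float) -> bool:
    """True for a domain not yet in policy, or a cost strictly above the threshold."""
    return domain not in policy.known_domains or estimated_cost > policy.cost_threshold
```

With PyYAML installed, a loader could be `lambda: yaml.safe_load(Path("02_System/pipeline-policy.yaml").read_text())`. A missing file, a parse error, and a non-mapping document all surface as `PolicyUnavailableError`.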
4.1 Transport Security & Authorization
Tools exposed via HTTP transport MUST be protected per lit-mcp-authorization. The propose_source_intake and promote_synthesis_candidate tools are the highest-privilege gates and SHOULD require scope validation before execution.
See lit-mcp-security-best-practices for the full threat model applicable to HTTP-exposed MCP tool surfaces.
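The per-tool scope check can sit in front of dispatch. This sketch assumes the HTTP transport has already validated the caller's access token and extracted its granted scopes. It shows only the per-tool check, not token validation. The scope strings and the names `REQUIRED_SCOPES`, `InsufficientScopeError`, and `authorize_tool_call` are hypothetical; the spec does not define a scope vocabulary.

```python
from typing import AbstractSet, Dict

# Hypothetical scope names for the two highest-privilege gates.
REQUIRED_SCOPES: Dict[str, str] = {
    "propose_source_intake": "sources:propose",
    "promote_synthesis_candidate": "notes:promote",
}


class InsufficientScopeError(PermissionError):
    """Raised when a caller's token lacks the scope a high-privilege tool needs."""


def authorize_tool_call(tool: str, granted_scopes: AbstractSet[str]) -> None:
    """Reject a high-privilege tool call unless its required scope was granted.

    Tools without an entry in REQUIRED_SCOPES pass this check; in this sketch
    they rely on transport-level authorization alone.
    """
    required = REQUIRED_SCOPES.get(tool)
    if required is not None and required not in granted_scopes:
        raise InsufficientScopeError(f"{tool} requires scope {required!r}")
```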
5. Handoff & Seams
Every orchestration step must produce a Seam Artifact containing the current state, open risks, and next recommended action. This ensures continuity across agent sessions.
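A Seam Artifact can be a small serializable record carrying the three required fields. The sketch below assumes JSON as the handoff format and adds a hypothetical `source_id` field so the record identifies which source it describes. The class and method names are illustrative, not from the spec.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SeamArtifact:
    """Handoff record emitted after each orchestration step."""
    source_id: str  # hypothetical: identifies the source being ingested
    current_state: str  # one of the eight lifecycle stages
    open_risks: List[str] = field(default_factory=list)
    next_recommended_action: str = ""

    def to_json(self) -> str:
        """Serialize to JSON with sorted keys for stable diffs between sessions."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SeamArtifact":
        """Rebuild a SeamArtifact in the next agent session."""
        return cls(**json.loads(text))
```

For example, a seam written after a crawl might record `current_state="crawled"` with `next_recommended_action="index_crawled_source"`. The next session can resume from that record without re-deriving state.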
See protocol-source-ingestion-runbook for the step-by-step operational protocol.