ADR-005: Single global ToolRegistry without RBAC scoping (DEPRECATED) | Agentic Data Pipeline

Context

Original v0 design (ADR-005, originally accepted) had one global ToolRegistry instance shared across all agents. Workers would registry.get("query_database") and execute it. This was the simplest possible design and shipped on day 2 of M02 build-out.

It worked fine until M03 wired in a Quality agent and a Transform agent. Then we hit two production-shaped problems:

Quality agent gained access to Transform tools. The Quality agent's job is to validate, not to mutate. With a global registry, nothing prevented quality_agent.registry.call("transform_apply_schema_change") — the registry didn't know which agent was asking. We caught this in code review of an unrelated PR; the agent had been "helpfully" applying schema fixes during a routine validation run for ~2 weeks.
No audit trail of which agent invoked which tool. When a destructive write happened, the audit log said "Tool transform_drop_table executed at 14:22:11". The next question was always "which agent decided to do that?" and the answer was a 20-minute git-blame + LangSmith trace excavation. Useless during incidents.

What was originally decided

# DEPRECATED — see "What we got wrong" below
class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, tool: Tool) -> None:
        self._tools[name] = tool

    def get(self, name: str) -> Tool:
        return self._tools[name]

# At startup
registry = ToolRegistry()
registry.register("query_database", DatabaseTool())
registry.register("transform_apply_schema_change", SchemaTool())
# ... 20 more tools

# In every worker
class IngestionWorker:
    def __init__(self, registry: ToolRegistry):
        self.registry = registry  # has access to ALL tools

What we reversed to

RBAC-aware registry with per-agent scope:

# src/tools/registry.py — v2, current
class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, Tool] = {}
        self._scopes: dict[str, set[AgentRole]] = {}    # tool -> allowed agents

    def register(self, name: str, tool: Tool, allowed_roles: set[AgentRole]) -> None:
        self._tools[name] = tool
        self._scopes[name] = allowed_roles

    def for_agent(self, role: AgentRole) -> "ScopedToolView":
        """Return a view that only sees tools allowed for this agent role."""
        return ScopedToolView(
            tools={n: t for n, t in self._tools.items() if role in self._scopes[n]},
            audit_logger=self._audit,
            agent_role=role,
        )


class ScopedToolView:
    async def call(self, name: str, *args, **kwargs):
        if name not in self._tools:
            raise ToolNotPermitted(f"Agent {self._agent_role} cannot use {name}")
        self._audit_logger.log_tool_call(self._agent_role, name, args, kwargs)
        return await self._tools[name](*args, **kwargs)

Workers receive a scoped view, not the global registry:

# src/agents/workers.py
class QualityWorker:
    def __init__(self, scoped_tools: ScopedToolView):
        self.tools = scoped_tools     # only validation tools

# Wiring
quality_view = registry.for_agent(AgentRole.QUALITY)   # only validation tools
transform_view = registry.for_agent(AgentRole.TRANSFORM)  # only transform tools

Why reversed

Two concrete incidents drove the reversal:

2026-04-12: Quality agent applied a schema migration during a validation run because the supervisor's routing decision had wedged on a high-confidence validate-then-fix pattern. The schema change broke the M04 dashboard for 23 minutes. Logged in runbooks/incident-2026-04-12-quality-agent-overreach.md.
2026-04-13: Code review caught the Quality agent's tools.call("transform_apply_schema_change") line in a routine PR. The reviewer flagged it; investigation revealed it had been silently working in the regression suite for ~2 weeks, masked by idempotent reapplication.

The first incident was the gating one. The second was the credibility one — global access invites accidents we can't see in CR.

What we got wrong (and what we'd do again)

Got wrong: treated the tool registry as a directory (one canonical lookup table) when it's actually a capability boundary. Capabilities should be granted to roles, not held in a global namespace anyone can read from.

Would do again: the Tool Protocol shape (async def __call__(self, ...) -> ToolResult) and the registration-by-name pattern. Both still hold. Only the access model changed.

Reversal cost

Schema rewrite (src/tools/registry.py): 1 day
Worker wiring updates (src/agents/workers.py × 4): 1 day
Test rewrite + per-agent test fixtures: 1 day
Audit-log integration: 0.5 day
Per-tool allowed_roles enumeration (audit each of 20+ tools): 1 day
Total: ~4.5 engineer-days

Lessons

"Default-deny" is a feature, not a polish item. v0 was default-allow with a global namespace. v1 is default-deny via the scoped view. The default flips the cognitive load: you have to explicitly grant access in register(...), not explicitly forbid it later.
Capability boundaries belong at the registry, not the tool. We considered putting role checks inside each tool's __call__. That's per-tool — easy to forget on the next tool added. Registry-level scoping enforces it once for all tools.
Audit log naming has to identify the agent. The v0 audit log said "Tool X executed". v1 says "Agent Q (role: QUALITY) called Tool X". One word saves 20 minutes of incident response.
Document the deprecation. The next engineer who proposes "let's just use one global registry" needs to find this ADR and the incident runbook before they re-introduce the bug.

References

src/tools/registry.py — current v2 RBAC-aware implementation
src/security/rbac.py — AgentRole enum + role assignments
runbooks/incident-2026-04-12-quality-agent-overreach.md — the incident that drove the reversal
ADR-001 (LangGraph orchestrator)
ADR-003 (supervisor-worker topology — workers receive scoped views, not the registry)
ADR-004 (HITL approval is the second line of defense; RBAC is the first)