Sentinel - AI Agent Supervisor
Team consisting of Joe Hewett (CTO, Entropy Labs; BSc Warwick) and David Mlčoch (Co‑founder/CTO, Trident AI; MSc Edinburgh), building LLM-powered fraud detection and agent supervision.
Project Description
AI agents are powerful but dangerous because we can’t reliably control their behavior. Keeping a human in the loop is necessary but not sufficient on its own: it is too expensive for a human to review every agent action.
We need a way to automatically detect which action requests should be escalated to a human and which can be automatically approved. For the AI agent revolution to happen, we need flexible and adaptable AI agent supervisors.
We’ve developed the industry’s first dynamic supervision framework that mimics an expert human supervisor’s judgment: it continuously monitors the agent, prevents potentially harmful actions before execution, and learns from previous interactions. Our system lets users define supervisors and attach them to agent tools, so that every action is supervised according to the supervisors’ policies and the agent’s execution context.
Integration is straightforward: developers add our supervise() decorator and specify supervisor configurations for any LLM tool in their codebase. Our framework offers three types of supervisors:
- Human Supervisor: Routes actions to our intuitive UI for manual review and approval
- AI Supervisor: Uses specialized LLMs, custom instructions, policies, and contextual data to automatically approve, reject, or escalate actions
- Deterministic Supervisor: Applies predefined rules for consistent decision-making (e.g., automatically approving emails to verified addresses while escalating others)
These supervisors can be combined to create sophisticated escalation workflows. As it observes more human interactions, our system will continuously improve its oversight capabilities and adapt to new scenarios in real time.
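The escalation idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual API: the signature of `supervise()` shown here, the `Decision` enum, `ActionRejected`, the allowlist, and the stubbed human supervisor are all assumptions made for the example. A deterministic supervisor approves emails to verified addresses and escalates everything else to the next supervisor in the chain.

```python
from enum import Enum
from functools import wraps
from typing import Callable, Sequence


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


# Hypothetical interface: a supervisor sees the tool name and its keyword
# arguments and returns a Decision.
Supervisor = Callable[[str, dict], Decision]

VERIFIED_ADDRESSES = {"ops@example.com"}  # hypothetical allowlist


class ActionRejected(Exception):
    """Raised when a supervised tool call is not allowed to run."""


def verified_email_supervisor(tool_name: str, kwargs: dict) -> Decision:
    """Deterministic rule: approve verified recipients, escalate the rest."""
    if kwargs.get("to") in VERIFIED_ADDRESSES:
        return Decision.APPROVE
    return Decision.ESCALATE


def human_supervisor_stub(tool_name: str, kwargs: dict) -> Decision:
    """Stand-in for human review; a real one would wait for a UI response."""
    return Decision.REJECT


def supervise(supervisors: Sequence[Supervisor]):
    """Run supervisors in order; ESCALATE hands the call to the next one."""
    def decorator(tool):
        @wraps(tool)
        def wrapper(**kwargs):  # the sketch supports keyword arguments only
            for supervisor in supervisors:
                decision = supervisor(tool.__name__, kwargs)
                if decision is Decision.APPROVE:
                    return tool(**kwargs)
                if decision is Decision.REJECT:
                    raise ActionRejected(
                        f"{tool.__name__} rejected by {supervisor.__name__}"
                    )
            # Every supervisor escalated and nobody approved: fail closed.
            raise ActionRejected(f"{tool.__name__} was never approved")
        return wrapper
    return decorator


@supervise([verified_email_supervisor, human_supervisor_stub])
def send_email(to: str, body: str) -> str:
    return f"sent to {to}"
```

In this sketch the tool body only runs after an explicit approval, so an unreviewed action fails closed rather than executing by default.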
This has a large real-world impact for companies building autonomous AI: with our system they can deploy more reliable agents faster and more safely. Today, companies try to solve this problem with model guardrails and offline evaluations, but these are not enough for agents, which need online supervision at scale. We move from offline agent evaluations to runtime supervisors. Companies don’t need to change their agent design; they only add our supervision layer. We can integrate with any agent architecture and learn the right behavior automatically.
Prior Work
What we’ve built before the hackathon:
We started with a Postgres database and an OpenAPI API hosted on a simple Go webserver. We used an existing basic React frontend using pre-built shadcn components to build most of our UI, and had a websocket set up between the frontend and backend to stream human reviews between the client and the server.
What we’ve built at the hackathon:
During the hackathon we implemented supervisor escalation mechanisms and AI supervisors that oversee the running of other agents. We made it possible to configure chains of supervisors that work in parallel to monitor agents at runtime. We built a supervise() Python decorator that wraps Python agent tool calls and applied it to TAU-bench, the first benchmark measuring both tool use and user interaction in the customer support agent setting.
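One way to picture chains of supervisors running in parallel is sketched below. The aggregation rule (any rejection wins, then any unresolved escalation, otherwise approve), the `Decision` enum, and the example supervisors are assumptions made for illustration; the source does not specify how the hackathon code combines chain results.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Sequence


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


# Hypothetical interface: a supervisor maps a pending action to a Decision.
Supervisor = Callable[[dict], Decision]
Chain = Sequence[Supervisor]


def run_chain(chain: Chain, action: dict) -> Decision:
    """Walk one chain in order; ESCALATE passes the action to the next link."""
    for supervisor in chain:
        decision = supervisor(action)
        if decision is not Decision.ESCALATE:
            return decision
    return Decision.ESCALATE  # nobody in the chain could decide


def run_chains_in_parallel(chains: Sequence[Chain], action: dict) -> Decision:
    """Evaluate all chains concurrently and combine their verdicts."""
    with ThreadPoolExecutor(max_workers=max(1, len(chains))) as pool:
        results = list(pool.map(lambda chain: run_chain(chain, action), chains))
    if Decision.REJECT in results:
        return Decision.REJECT
    if Decision.ESCALATE in results:
        return Decision.ESCALATE
    return Decision.APPROVE


# Hypothetical example supervisors.
def refund_limit_supervisor(action: dict) -> Decision:
    return Decision.APPROVE if action["amount"] <= 100 else Decision.ESCALATE


def strict_reviewer(action: dict) -> Decision:
    return Decision.REJECT


def goal_supervisor(action: dict) -> Decision:
    return Decision.APPROVE


policy_chain = [refund_limit_supervisor, strict_reviewer]
goal_chain = [goal_supervisor]
small = run_chains_in_parallel([policy_chain, goal_chain], {"amount": 50})
large = run_chains_in_parallel([policy_chain, goal_chain], {"amount": 500})
```

Threads suit this sketch because LLM-backed or human-backed supervisors would spend most of their time waiting on I/O.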
We demonstrated that combining agents with our supervisors significantly increases agent performance with no changes to the underlying agent. We ran our supervisors on 20 example TAU-bench tasks that were previously failing, and 16 of them (80%) then finished successfully. We expect to reach state of the art on TAU-bench by the end of today, once we have run our system on the full benchmark.
We integrated 9 sophisticated supervisors on top of the default agent from the TAU-bench paper. Here are example supervisors:
- Policy Supervisor - Checks if the agent is following the right company customer support communication policy.
- Goal Supervisor - Checks that the agent is making progress toward the goal and is not stuck; helps to restart it if needed.
- Correct Tool Arguments Supervisor - Checks that the arguments filled into the tool call are correct given the context window (e.g. the tracking ID presented to the user).
- All Customer Requests Satisfied Supervisor - Keeps track of all user requests and makes sure the agent has satisfied all of them before finishing the conversation.
- Human Supervisor - When automated supervisors are not sure, they escalate to a human supervisor.
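As a rough illustration of the Correct Tool Arguments Supervisor, the sketch below uses a simple deterministic check: an identifier passed to a tool must appear verbatim in the conversation so far, otherwise the call is escalated. This is a simplification chosen for the example. The function name, the `checked_keys` parameter, and the string-matching rule are hypothetical, and the supervisor described above may rely on an LLM rather than exact matching.

```python
from enum import Enum
from typing import Mapping, Sequence


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


def tool_arguments_supervisor(
    tool_args: Mapping[str, str],
    conversation: Sequence[str],
    checked_keys: Sequence[str] = ("tracking_id",),
) -> Decision:
    """Escalate when a checked argument never appeared in the conversation."""
    transcript = "\n".join(conversation)
    for key in checked_keys:
        value = tool_args.get(key)
        if value is None:
            continue  # this tool call does not use the argument
        if str(value) not in transcript:
            # Possibly a hallucinated identifier: let a human or AI decide.
            return Decision.ESCALATE
    return Decision.APPROVE


conversation = [
    "User: Where is my parcel? The tracking ID is TRK-123.",
    "Agent: Let me look that up for you.",
]
grounded = tool_arguments_supervisor({"tracking_id": "TRK-123"}, conversation)
ungrounded = tool_arguments_supervisor({"tracking_id": "TRK-999"}, conversation)
```

Escalating instead of rejecting keeps the check conservative: an unmatched identifier is handed to the next supervisor rather than blocked outright.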
What we want to build next:
In the future we’re going to make the AI Supervisors continuously learn at runtime based on the human supervisor actions and previous executions. We’re enabling any agent to benefit from human supervision, and for the human supervisors to train automated supervisors passively whilst they go about their work.
Team
Technologies
Products & Tools
Additional Links
Main GitHub repo - Hackathon code is in the joe/register-tool-supervisors branch