CallFlow - Humans-in-the-Loop Agents Hackathon
AI Tinkerers - San Francisco
Hackathon Showcase 1st Place Winner

CallFlow

CallFlow: AI-powered confidence-based routing with real-time verification and monitoring, auto-resolving routine inquiries and escalating complex cases for human approval.

3 members

Overview:
CallFlow is a conversational AI platform that blends AI automation with human oversight to improve call center operations. Our solution employs a confidence-based routing system that:

  • Automatically handles routine inquiries for immediate resolution
  • Escalates complex cases to human agents, using HumanLayer to wait for their approval
  • Uses a verification classifier layer to show humans which cases most need their attention
  • Logs analytics and learns from interactions to continuously reduce human involvement and improve scalability over time

What’s our problem?
The US spends more than $130 billion a year on call centers, and Americans spend over 900 million hours a year on hold.
Customers often face long, frustrating wait times when trying to reach the IRS or their banks, which can lead to increased costs and dissatisfaction. While conversational AI has made its way into call centers to help, one major hurdle remains: these systems often lack the flexibility and contextual awareness to adapt their responses based on unique customer needs. Moving too quickly from limited to full automation can worsen the experience.

Our solution:

  • Our conversational AI agent is built to improve customer interactions by taking the right action based on the situation. When the agent is confident, it can handle simpler tasks autonomously, speeding up resolutions. For more complex questions, the agent will ask for human approval before proceeding, creating a balanced approach that blends efficiency with necessary oversight. This way, customers get a faster, more attentive service experience that builds trust while leveraging AI to its fullest potential.
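
The routing decision described above can be sketched roughly as follows. This is a minimal illustration rather than our production code: the threshold value, the `ProposedAction` type, and the `request_approval` callback (a stand-in for the blocking human approval step) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold; the actual value used in CallFlow is not stated here.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class ProposedAction:
    name: str          # e.g. "freeze_card"
    confidence: float  # the agent's confidence in [0, 1]


def route(
    action: ProposedAction,
    request_approval: Callable[[ProposedAction], bool],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Return "auto_resolved", "approved", or "rejected" for one action."""
    if action.confidence >= threshold:
        # Confident enough: the agent acts without waiting for a human.
        return "auto_resolved"
    # Below the threshold: wait for a human decision before proceeding.
    return "approved" if request_approval(action) else "rejected"


def always_approve(action: ProposedAction) -> bool:
    """Stand-in approver for the example; a real one would ask a person."""
    return True


if __name__ == "__main__":
    print(route(ProposedAction("freeze_card", 0.95), always_approve))
    print(route(ProposedAction("replace_card", 0.40), always_approve))
```

Passing the approver in as a callback keeps the routing logic testable without a live human or a network call.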

See a simplified diagram of our technical flow here: https://www.tldraw.com/s/v2c-2n4u4WqJXGDlAs6i_g4C?d=v-2139.-1299.3914.1794.page

Human in the loop:
Our project embodies the theme by:

  • Using a conversational AI system that can escalate to a human for approval when needed

Our project transcends the theme by:

  • Having an additional “AI evaluator in the loop” that categorizes various cases, allowing the human to more selectively apply oversight
  • Having the infrastructure ready to collect usage data, including human approvals and evaluator outputs, so that the number of employees does not have to scale linearly with the number of inbound calls. We aim for customers to see the number of humans they need plummet as they scale.
    See a brief version of our anticipated data flywheel here: https://www.tldraw.com/s/v2_c_r-7hLi9e5M_mjGKHPMk_F?d=v-24.-449.3365.1542.Ietwc14DVAZYRiji92_0e
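
One way the collected approval data could feed the flywheel is sketched below. This is an assumption about how the logs might be used, not a description of what we built: the log schema (`category`, `approved`) and the cut-off values are hypothetical. The idea is to find case categories that humans almost always approve, since those are candidates for handling without escalation.

```python
from collections import defaultdict
from typing import Dict, List


def approval_rates(log: List[dict]) -> Dict[str, float]:
    """Fraction of escalated cases that humans approved, per category."""
    totals: Dict[str, int] = defaultdict(int)
    approved: Dict[str, int] = defaultdict(int)
    for entry in log:
        totals[entry["category"]] += 1
        if entry["approved"]:
            approved[entry["category"]] += 1
    return {category: approved[category] / totals[category] for category in totals}


def automation_candidates(
    log: List[dict], min_rate: float = 0.95, min_samples: int = 3
) -> List[str]:
    """Categories with enough samples and a high enough approval rate."""
    counts: Dict[str, int] = defaultdict(int)
    for entry in log:
        counts[entry["category"]] += 1
    rates = approval_rates(log)
    return sorted(
        category
        for category, rate in rates.items()
        if rate >= min_rate and counts[category] >= min_samples
    )


# Hypothetical log entries, for illustration only.
SAMPLE_LOG = [
    {"category": "card_freeze", "approved": True},
    {"category": "card_freeze", "approved": True},
    {"category": "card_freeze", "approved": True},
    {"category": "wire_transfer", "approved": True},
    {"category": "wire_transfer", "approved": False},
]

if __name__ == "__main__":
    print(approval_rates(SAMPLE_LOG))
    print(automation_candidates(SAMPLE_LOG))
```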

Impact:
The market is large (more than $130 billion in the US and nearly $1 trillion globally), and calls are pricey: around $5 per call. We want to improve both the standard of customer support and its price.
This system could be highly impactful in sectors like banking, government services, healthcare, and other industries where customer interactions can be complex and sensitive. By integrating intelligent AI-driven responses with human oversight, our solution addresses the growing need for faster, more responsive service in high-stakes environments, such as financial or legal inquiries.

Detailing the Working Code:
CallFlow integrates Vapi’s voice-to-voice technology with a custom Gemini 1.5 Pro model. The custom LLM handles essential banking operations, such as card freezes and replacements, through direct tool integration. A secondary LLM sits on top, validating the outputs of the first model and acting as an early warning of when human intervention may be needed. Weights & Biases Weave is used to analyze the LLM calls. Each conversation is securely captured and stored in Firebase, preserving both customer interactions and AI responses for analysis and compliance. Through a real-time dashboard built with Next.js and Tailwind, call center agents can monitor conversations, review AI decisions, and intervene when necessary, ensuring quality service while maintaining human oversight via HumanLayer. This hybrid approach combines the efficiency of AI automation with the assurance of human expertise, enabling faster response times while maintaining high standards of banking security and customer service.
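
The two-model arrangement above can be illustrated with a small sketch. Everything here is a simplified assumption: `Proposal`, `Verdict`, the `min_score` cut-off, and `toy_score` (which stands in for the secondary LLM) are hypothetical names, and the record layout is only one plausible shape for what might be stored per conversation.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class Proposal:
    transcript: str            # what the customer said
    tool: str                  # tool the primary model wants to call
    arguments: Dict[str, str]  # arguments for that tool


@dataclass
class Verdict:
    score: float       # verifier's score in [0, 1]
    needs_human: bool  # early-warning flag for the dashboard


def verify(
    proposal: Proposal,
    score_fn: Callable[[Proposal], float],
    min_score: float = 0.7,  # hypothetical cut-off
) -> Verdict:
    """Score the primary model's proposal and flag low-scoring ones."""
    score = score_fn(proposal)
    return Verdict(score=score, needs_human=score < min_score)


def build_record(proposal: Proposal, verdict: Verdict) -> dict:
    """Plain-dict record suitable for storing alongside the conversation."""
    return {"proposal": asdict(proposal), "verdict": asdict(verdict)}


def toy_score(proposal: Proposal) -> float:
    """Toy stand-in for the verifier model: trust only known card tools."""
    return 0.9 if proposal.tool in {"freeze_card", "replace_card"} else 0.3


if __name__ == "__main__":
    p = Proposal("I lost my card", "freeze_card", {"card_id": "demo"})
    print(build_record(p, verify(p, toy_score)))
```

Keeping the verifier behind a plain function signature means the scoring model can be swapped without touching the storage or dashboard code.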

Innovation and Creativity:

Our approach stands out by enabling the AI to assess its confidence before acting, fostering trust and reducing errors. Rather than rigid automation, our design incorporates a dynamic hand-off system that uses a human-in-the-loop model for critical approval situations. A color-coded feedback loop, from green to red, visually signals confidence levels, allowing for smooth transitions and efficient, quality-driven customer interactions. The data flywheel for long-term improvement is also central to the design. This integration of human oversight ensures that human agents are involved when necessary, maximizing efficiency while maintaining transparency and accountability. Overall, our solution represents a novel advancement in customer service automation.
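
A green-to-red signal like the one described above could be derived from a confidence score along these lines. The band boundaries and the intermediate "yellow" band are assumptions for the sketch; only the green and red endpoints come from the description.

```python
def confidence_color(
    confidence: float,
    red_below: float = 0.5,   # hypothetical boundary
    green_from: float = 0.8,  # hypothetical boundary
) -> str:
    """Map a confidence score in [0, 1] to a dashboard color."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= green_from:
        return "green"   # agent can proceed on its own
    if confidence >= red_below:
        return "yellow"  # worth a glance from a human
    return "red"         # needs human attention


if __name__ == "__main__":
    for value in (0.95, 0.65, 0.20):
        print(value, confidence_color(value))
```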

None: we started this project from scratch.

AI Tinkerers Google HumanLayer Weights & Biases

A diagram of the data flywheels we hope to develop, to ensure robustness and sublinear scaling of call center employees with call volume.


An architecture diagram of some of our core novel components


Verifier model rubric
