Hackathon Showcase
Cactus Confidence Router
One-man one-agent. Let's go.
Project Description
An intelligent edge-cloud hybrid routing system for on-device function calling,
using FunctionGemma-270M (via Cactus Compute) as the primary inference engine
and Gemini 2.0 Flash as a confidence-gated fallback.
Architecture:
- Tool Relevance Ranking — Before calling FunctionGemma, tools are scored by
keyword overlap with the user query. For single-intent requests, only the
top-5 most relevant tools are passed to the model. This reduces context
noise without hiding information, improving accuracy for the compact 270M
parameter model.
- Intent-Scaled Token Budget — The system estimates the number of distinct
function calls requested (via action verb + conjunction counting). Token
budget scales with intent count (128 × n_intents), balancing speed for
simple requests and capacity for complex multi-tool queries.
- FunctionGemma-First Inference — FunctionGemma runs on every request via
Cactus Compute. The model is cached across calls (loaded once per process),
eliminating repeated model-load overhead. All routing decisions start
on-device; cloud is never the default path.
- Argument Verification Pipeline — After inference, argument values are
cross-checked against direct text extraction for deterministically parseable
fields (timer durations, clock times, reminder titles). The model decides
WHICH tool to call; this step verifies that argument VALUES match what the
user stated — catching hallucinated-value failures common in small models.
- Confidence-Gated Cloud Routing — Cactus’s built-in confidence score drives
the escalation decision. When confidence ≥ 0.7 or the expected number of
tool calls was found, the on-device result is returned immediately. Only
genuinely uncertain cases (low confidence + incomplete output) escalate to
Gemini 2.0 Flash.
Technologies: Cactus Compute SDK, FunctionGemma-270M-IT, Google Gemini 2.0
Flash API, Python standard library (re, json).
Local benchmark: 88.6% score, 100% on-device ratio, 289ms average latency.
Team
Products & Tools
AI Tinkerers
Cactus Compute
Google DeepMind