Hackathon Showcase

Cactus Confidence Router

One-man one-agent. Let's go.

1 member

Project Description

An intelligent edge-cloud hybrid routing system for on-device function calling,
using FunctionGemma-270M (via Cactus Compute) as the primary inference engine
and Gemini 2.0 Flash as a confidence-gated fallback.

Architecture:

Tool Relevance Ranking: Before calling FunctionGemma, tools are scored by
keyword overlap with the user query. For single-intent requests, only the
top-5 most relevant tools are passed to the model. This reduces context
noise without hiding information, improving accuracy for the compact
270M-parameter model.
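
The ranking step can be sketched as below. This is a minimal illustration, not the project's actual code: the tool dictionary layout (`name`, `description`), the small stopword list, and the function names are all assumptions made for the example.

```python
import re

# Assumed stopword list; the project's real filtering (if any) is not documented.
STOPWORDS = {"a", "an", "the", "for", "to", "in", "of", "and", "me", "my"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def rank_tools(query, tools, top_k=5):
    """Return up to top_k tools ordered by keyword overlap with the query."""
    query_words = tokenize(query)

    def score(tool):
        # Underscores are not matched by the tokenizer, so "set_timer"
        # contributes the words "set" and "timer".
        tool_words = tokenize(tool["name"] + " " + tool.get("description", ""))
        return len(query_words & tool_words)

    # sorted() is stable, so tools with equal scores keep their original order.
    return sorted(tools, key=score, reverse=True)[:top_k]

tools = [
    {"name": "get_weather", "description": "Get the weather for a city"},
    {"name": "set_timer", "description": "Start a countdown timer"},
    {"name": "send_message", "description": "Send a text message to a contact"},
]
best = rank_tools("Set a timer for 5 minutes", tools, top_k=1)
```

Plain set overlap is the simplest possible scorer; a weighted variant (for example, counting name matches more than description matches) would slot into `score` without changing the rest.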

Intent-Scaled Token Budget: The system estimates the number of distinct
function calls requested (via action verb + conjunction counting). Token
budget scales with intent count (128 × n_intents), balancing speed for
simple requests and capacity for complex multi-tool queries.
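
One way to realize this heuristic is shown below. Only the 128 × n_intents rule comes from the description; the verb list, the conjunction pattern, and the clause-counting approach are assumptions chosen to make the sketch concrete.

```python
import re

TOKENS_PER_INTENT = 128  # from the description: budget = 128 x n_intents

# Assumed vocabulary; the project's real verb and conjunction lists are not given.
ACTION_VERBS = {"set", "send", "get", "play", "remind", "check",
                "create", "call", "search", "start", "text", "find"}
CLAUSE_SPLIT = re.compile(r"\b(?:and then|and|then|also)\b|[,;]")

def estimate_intents(query):
    """Count conjunction-separated clauses that contain an action verb."""
    clauses = CLAUSE_SPLIT.split(query.lower())
    count = sum(
        1 for clause in clauses
        if ACTION_VERBS & set(re.findall(r"[a-z]+", clause))
    )
    return max(1, count)  # every request gets at least one intent

def token_budget(query):
    """Scale the generation budget linearly with the estimated intent count."""
    return TOKENS_PER_INTENT * estimate_intents(query)
```

Under these assumptions, "Set a timer for 10 minutes and text Alice that I'm late" splits into two verb-bearing clauses and receives 256 tokens, while "Play jazz and blues" stays at 128 because the second clause has no action verb.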

FunctionGemma-First Inference: FunctionGemma runs on every request via
Cactus Compute. The model is cached across calls (loaded once per process),
eliminating repeated model-load overhead. All routing decisions start
on-device; cloud is never the default path.
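
The load-once-per-process pattern can be illustrated with the standard library alone. The loader below is a hypothetical stand-in, not the Cactus Compute API; it only counts how often it runs so the caching effect is visible.

```python
from functools import lru_cache

LOAD_COUNT = 0  # demo counter, not part of any real SDK

def _load_model_from_disk(path):
    """Hypothetical stand-in for the expensive model initialization."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"path": path}  # placeholder for a real model handle

@lru_cache(maxsize=1)
def get_model(path):
    """Return the cached model handle, loading it on first use only."""
    return _load_model_from_disk(path)

first = get_model("weights/functiongemma-270m-it")
second = get_model("weights/functiongemma-270m-it")
```

Both calls return the same object and the loader runs once. A module-level global guarded by an `if model is None` check would work equally well; `lru_cache` just keeps the bookkeeping out of the calling code.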

Argument Verification Pipeline: After inference, argument values are
cross-checked against direct text extraction for deterministically parseable
fields (timer durations, clock times, reminder titles). The model decides
WHICH tool to call; this step verifies that argument VALUES match what the
user stated, catching hallucinated-value failures common in small models.
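
For timer durations, the cross-check might look like the sketch below. The tool name `set_timer`, the `duration_seconds` field, and the policy of overwriting a mismatched value with the extracted one are assumptions; the description does not say whether mismatches are corrected or only flagged.

```python
import re

DURATION = re.compile(r"(\d+)\s*(second|sec|minute|min|hour|hr)s?\b", re.IGNORECASE)
SECONDS_PER_UNIT = {"second": 1, "sec": 1, "minute": 60,
                    "min": 60, "hour": 3600, "hr": 3600}

def extract_duration_seconds(text):
    """Parse the first duration such as '10 minutes' directly from the text."""
    match = DURATION.search(text)
    if match is None:
        return None
    return int(match.group(1)) * SECONDS_PER_UNIT[match.group(2).lower()]

def verify_timer_call(query, call):
    """Keep the model's tool choice, but prefer the duration stated in the query."""
    stated = extract_duration_seconds(query)
    # Hypothetical tool name; other tools pass through unchanged.
    if call.get("name") != "set_timer" or stated is None:
        return call
    arguments = dict(call.get("arguments", {}))
    if arguments.get("duration_seconds") != stated:
        arguments["duration_seconds"] = stated  # assumed policy: overwrite
    return {**call, "arguments": arguments}

hallucinated = {"name": "set_timer", "arguments": {"duration_seconds": 300}}
fixed = verify_timer_call("Set a timer for 10 minutes", hallucinated)
```

Here the model picked the right tool but produced 300 seconds; the regex reads 600 seconds from the query and the argument is replaced. Clock times and reminder titles would each need their own extractor in the same style.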

Confidence-Gated Cloud Routing: Cactus's built-in confidence score drives
the escalation decision. When confidence ≥ 0.7 or the expected number of
tool calls was found, the on-device result is returned immediately. Only
genuinely uncertain cases (low confidence and incomplete output) escalate to
Gemini 2.0 Flash.
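
The gate reduces to a small boolean rule, sketched below. The 0.7 threshold and the or/and structure come from the description; the result dictionary layout, the `cloud_fallback` callable, and treating "at least as many calls as expected" as complete are assumptions.

```python
CONFIDENCE_THRESHOLD = 0.7  # from the description

def should_escalate(confidence, calls_found, calls_expected):
    """Escalate only when confidence is low AND the output looks incomplete."""
    confident = confidence >= CONFIDENCE_THRESHOLD
    complete = calls_found >= calls_expected  # assumed reading of "expected number found"
    return not (confident or complete)

def route(local_result, calls_expected, cloud_fallback):
    """Return the on-device result unless the gate says to call the cloud."""
    if should_escalate(local_result["confidence"],
                       len(local_result["calls"]),
                       calls_expected):
        return cloud_fallback()  # hypothetical callable wrapping the cloud model
    return local_result

local = {"confidence": 0.4, "calls": [{"name": "set_timer"}]}
result = route(local, calls_expected=2,
               cloud_fallback=lambda: {"source": "cloud", "calls": []})
```

In this example the local run is both below the threshold and one call short, so the fallback is invoked. With confidence 0.4 but both expected calls present, the local result would be returned as is.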

Technologies: Cactus Compute SDK, FunctionGemma-270M-IT, Google Gemini 2.0
Flash API, Python standard library (re, json).

Local benchmark: 88.6% score, 100% on-device ratio, 289 ms average latency.

Team

Quilee Simeon

Products & Tools

AI Tinkerers, Cactus Compute, Google DeepMind

Additional Links

https://www.linkedin.com/in/quilee-simeon-7843a3178/
