Cactus Confidence Router - Google DeepMind x Cactus Compute Global Hackathon
AI Tinkerers - San Francisco
Hackathon Showcase

Cactus Confidence Router

One-man one-agent. Let's go.

1 member

An intelligent edge-cloud hybrid routing system for on-device function calling,
using FunctionGemma-270M (via Cactus Compute) as the primary inference engine
and Gemini 2.0 Flash as a confidence-gated fallback.

Architecture:

  1. Tool Relevance Ranking: Before calling FunctionGemma, tools are scored by
     keyword overlap with the user query. For single-intent requests, only the
     five highest-scoring tools are passed to the model. This cuts context
     noise while keeping the tools the query most plausibly needs, which
     improves accuracy for the compact 270M-parameter model.
  2. Intent-Scaled Token Budget: The system estimates the number of distinct
     function calls requested by counting action verbs and conjunctions. The
     token budget scales with the intent count (128 × n_intents), keeping
     simple requests fast while leaving capacity for complex multi-tool
     queries.
  3. FunctionGemma-First Inference: FunctionGemma runs on every request via
     Cactus Compute. The model is loaded once per process and cached across
     calls, eliminating repeated model-load overhead. All routing decisions
     start on-device; cloud is never the default path.
  4. Argument Verification Pipeline: After inference, argument values are
     cross-checked against direct text extraction for deterministically
     parseable fields (timer durations, clock times, reminder titles). The
     model decides WHICH tool to call; this step verifies that argument VALUES
     match what the user stated, catching the hallucinated-value failures
     common in small models.
  5. Confidence-Gated Cloud Routing: Cactus's built-in confidence score drives
     the escalation decision. When confidence is ≥ 0.7 or the expected number
     of tool calls is found, the on-device result is returned immediately.
     Only genuinely uncertain cases (low confidence and incomplete output)
     escalate to Gemini 2.0 Flash.
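
The routing logic around the model call can be sketched in plain Python using
only `re`. This is a minimal illustration under stated assumptions, not the
project's actual code: every function name, the verb and conjunction lists, the
tool dictionary shape (`name`, `description`), the `set_timer` / `minutes`
names, and the way verb and conjunction counts are combined are hypothetical.
The Cactus SDK and Gemini calls are deliberately left out; the confidence score
is passed in as a plain number. The writeup says arguments are cross-checked;
overwriting a mismatched value with the extracted one is an assumption here.

```python
import re

# Hypothetical word lists; the writeup does not give the real ones.
ACTION_VERBS = {"set", "send", "play", "remind", "call", "text", "create", "check", "find", "get"}
CONJUNCTIONS = {"and", "then", "also"}


def tokenize(text):
    """Lowercase alphanumeric tokens; underscores split tool names into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def estimate_intents(query):
    """Step 2: estimate distinct calls from action verbs and conjunctions."""
    words = tokenize(query)
    verbs = sum(1 for w in words if w in ACTION_VERBS)
    conjunctions = sum(1 for w in words if w in CONJUNCTIONS)
    # Assumed combination: each conjunction may add one intent, capped by verbs.
    return max(1, min(verbs, conjunctions + 1))


def token_budget(n_intents):
    """Step 2: 128 tokens per estimated intent, as stated in the writeup."""
    return 128 * n_intents


def rank_tools(query, tools, top_k=5):
    """Step 1: order tools by keyword overlap with the query, keep top_k."""
    query_words = set(tokenize(query))

    def overlap(tool):
        tool_words = set(tokenize(tool["name"] + " " + tool["description"]))
        return len(query_words & tool_words)

    # sorted() is stable, so tied tools keep their original order.
    return sorted(tools, key=overlap, reverse=True)[:top_k]


def select_tools(query, tools):
    """Step 1: only single-intent requests are narrowed to the top five."""
    if estimate_intents(query) == 1:
        return rank_tools(query, tools, top_k=5)
    return tools


def verify_timer_minutes(query, call):
    """Step 4: make a timer duration agree with the number stated in the query."""
    match = re.search(r"(\d+)\s*(?:minutes?|mins?)\b", query.lower())
    if match and call.get("name") == "set_timer":
        stated = int(match.group(1))
        if call["arguments"].get("minutes") != stated:
            # Return a corrected copy rather than mutating the model output.
            call = {**call, "arguments": {**call["arguments"], "minutes": stated}}
    return call


def should_escalate(confidence, calls_found, expected_calls, threshold=0.7):
    """Step 5: escalate only when confidence is low AND output is incomplete."""
    return confidence < threshold and calls_found < expected_calls


query = "Set a timer for 5 minutes and play some jazz"
n = estimate_intents(query)
print(n, token_budget(n))  # 2 256
```

Keeping the gate as a single boolean makes the "cloud is never the default"
property easy to audit: a request reaches the fallback only when both
conditions fail, so either a confident answer or a complete set of calls is
enough to stay on-device.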

Technologies: Cactus Compute SDK, FunctionGemma-270M-IT, Google Gemini 2.0
Flash API, Python standard library (re, json).

Local benchmark: 88.6% score, 100% on-device ratio, 289ms average latency.
