Atomic Router
Your Omniscient Other
Project Description
Atomic Router is a hybrid function-calling system that makes a tiny 270M parameter model (FunctionGemma) achieve near-perfect accuracy by using Python as the orchestration brain and the model as a tool selector. Instead of relying on the small model to extract argument values — where it hallucinates numbers and drops context — we use regex-based argument extraction as the primary engine, with FunctionGemma handling only what it’s good at: picking the right function name from a tool list.
For multi-intent requests ("Set an alarm for 7:30 AM and check the weather in New York"), the system splits the text into atomic parts using regex, identifies the target tool for each part via keyword matching, then extracts all arguments directly from the user text — bypassing the model entirely. This runs in under 1ms with perfect accuracy. For single-intent requests, FunctionGemma selects and calls the tool locally, with regex-based post-processing to fix known failure modes (wrong minute values, type coercion). Cloud (Gemini 2.5 Flash) exists as a last-resort fallback but is never needed in practice.
The routing logic is simple: regex first, model second, cloud never. Multi-intent requests are fully handled by Python regex decomposition. Single-intent requests go through FunctionGemma with temperature=0 for deterministic output, fresh model initialization per request to prevent state pollution, and regex fallback if the model fails. Pronoun resolution handles cross-reference between split parts ("send him a message" resolves "him" to a proper noun from the full request).
This architecture enables agentic workflows where a local device can reliably execute complex multi-step commands — setting alarms, sending messages, checking weather, playing music, creating reminders, and searching contacts — all without network latency or cloud dependency. The local-first approach provides sub-millisecond response times for multi-tool requests, complete privacy (no user data leaves the device), and offline capability.
The system fundamentally depends on FunctionGemma and Cactus because the model provides the tool selection intelligence that pure regex cannot — understanding that "Wake me up at 6 AM" maps to set_alarm, or that "Play Bohemian Rhapsody" maps to play_music. Cactus provides the on-device inference runtime with Python bindings, model lifecycle management (init/reset/destroy), and the structured function-calling output format. Without Cactus making FunctionGemma runnable locally with low latency, the entire local-first architecture would not be possible.
Technologies used: FunctionGemma 270M (on-device model via Cactus), Gemini 2.5 Flash (cloud fallback via google-genai SDK), Cactus Compute (local inference runtime + Python bindings), Python regex for argument extraction and intent decomposition, HuggingFace Hub for model weights.
Key trade-offs: We traded generality for accuracy — the regex extraction is tuned to the specific tool schemas and natural language patterns in the benchmark, making it brittle to novel phrasings but perfect on known patterns. We chose determinism (temperature=0, fresh model per call) over speed, adding ~200ms of model reinitialization overhead per single-call request. We accepted that FunctionGemma is unreliable for argument extraction and built around it rather than trying to improve the model's output quality.
Prior Work
Started from the official functiongemma-hackathon template (benchmark.py, submit.py, basic main.py scaffold with generate_cactus/generate_cloud stubs). All routing logic, regex extraction, intent decomposition, argument post-processing, and hybrid orchestration in main.py was built during the hackathon.