FAT BOSS (REALLY FAT) - Google DeepMind x Cactus Compute Global Hackathon
AI Tinkerers - San Francisco
Hackathon Showcase Finalist

Team consisting of Stanford, UCSC, and SUNY CS alumni, including founding engineers and a security CEO, skilled in PyTorch, Python, Go, and cloud-native microservices.

4 members

The FatBoss architecture is built as a Hybrid Inference Engine that redefines the edge-cloud frontier by prioritizing local-first reasoning while using the cloud as a high-capacity fallback.

  1. On-Device Execution & Intelligent Routing

The core intelligence (found in main.py) uses a Difficulty-Aware Routing strategy. It doesn’t blindly send queries to the cloud; instead, it performs local “pre-reasoning” to determine the most efficient path:

Signal Detection: The engine scans for “multi-intent signals” (conjunctions like “and”, “then”, “also”). If a query asks for multiple tools at once, it proactively routes to the cloud to ensure complex logic is handled correctly.
Confidence Thresholds: Every local inference via the Cactus SDK returns a confidence score. If the on-device model is unsure (default < 0.6), the system triggers Dynamic Escalation to Gemini Flash.
Validation Guardrails: Before a local result is accepted, it must pass a “Parameter Integrity Check.” If the local model hallucinates a tool name or omits a required field, the system falls back to the cloud in real time.
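
The three checks above can be sketched as a single routing function. This is a minimal illustration, not the code in main.py: the names `LocalResult`, `has_multi_intent`, `passes_integrity_check`, and `route` are hypothetical, the tool registry is assumed to map each tool name to its required fields, and `run_local` is a stand-in for whatever call wraps the Cactus SDK. Only the conjunction list and the 0.6 default come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Conjunctions named above as multi-intent signals.
MULTI_INTENT_WORDS = ("and", "then", "also")
# Default confidence threshold quoted above.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class LocalResult:
    """Hypothetical shape of one on-device inference result."""
    tool_name: str
    arguments: Dict[str, Any]
    confidence: float


def has_multi_intent(query: str) -> bool:
    """True if the query contains a signal conjunction as a whole word."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    return any(word in words for word in MULTI_INTENT_WORDS)


def passes_integrity_check(result: LocalResult,
                           tools: Dict[str, List[str]]) -> bool:
    """Reject unknown tool names and calls that miss a required field."""
    required = tools.get(result.tool_name)
    if required is None:  # tool name not in the registry
        return False
    return all(field in result.arguments for field in required)


def route(query: str,
          tools: Dict[str, List[str]],
          run_local: Callable[[str], LocalResult],
          ) -> Tuple[str, Optional[LocalResult]]:
    """Return ("local", result) or ("cloud", None)."""
    # 1. Signal detection: multi-intent queries skip the local model.
    if has_multi_intent(query):
        return "cloud", None
    result = run_local(query)
    # 2. Confidence threshold: escalate when the local model is unsure.
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "cloud", None
    # 3. Validation guardrail: escalate on a malformed tool call.
    if not passes_integrity_check(result, tools):
        return "cloud", None
    return "local", result
```

Running the cheap string check before the local model means a multi-intent query never pays for an on-device inference that would be discarded anyway.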

  2. Local-First Reasoning: Speed & Privacy

By utilizing FunctionGemma-270m-it, FatBoss achieves “Agentic Utility” through a local-first approach:

Low Latency: Simple tasks (setting timers, checking status, updating OKRs) are processed locally, aiming for a 500 ms baseline that is significantly faster than a cloud round trip.
Privacy by Default: Sensitive user intents are processed and reasoned about on-device. Only anonymized or high-complexity data fragments are escalated to the cloud, maintaining a strong privacy perimeter.
Singleton Cache Management: To ensure speed, the system keeps the model warm in memory and uses cactus_reset() between calls. This avoids the overhead of reloading weights for every “thought” the agent has.
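
The warm-model pattern described above can be sketched as a small holder that loads once and resets on every later call. This is an illustration under assumptions: `WarmModel` is a hypothetical name, and the `load` and `reset` callables are placeholders for the real Cactus SDK calls (the reset would correspond to cactus_reset(), whose exact signature is not shown here).

```python
import threading
from typing import Any, Callable, Optional


class WarmModel:
    """Keeps one model handle in memory and resets it between calls."""

    def __init__(self,
                 load: Callable[[], Any],
                 reset: Callable[[Any], None]) -> None:
        self._load = load      # expensive: reads weights into memory
        self._reset = reset    # cheap: clears per-call state only
        self._handle: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        """Return the shared handle, loading it on first use only."""
        with self._lock:
            if self._handle is None:
                self._handle = self._load()
            else:
                # Reuse the loaded weights; only clear leftover state.
                self._reset(self._handle)
            return self._handle
```

The lock is there so that two threads asking for the model at the same time cannot both trigger a load.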

  3. Redefining the Frontier: Gemma + Cactus

The architecture fundamentally relies on the synergy between FunctionGemma and Cactus Compute:

Cactus Compute SDK: Serves as the high-speed execution layer for on-device inference, providing the critical “Cloud Handoff” markers that enable the hybrid flow.
Edge-Native Agentic Utility: The system uses Tool RAG (Top-K) at the edge. When the agent is presented with many tools, it dynamically filters the list so that the 270m model can focus its reasoning window on the most relevant capabilities.
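
A Top-K tool filter of the kind described above might look like the sketch below. The scoring method is an assumption: this example ranks tools by plain keyword overlap with the query, which is a stand-in and may differ from whatever retrieval FatBoss actually uses. The function names and the `name`/`description` tool shape are also hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_tools(query: str,
                tools: List[Dict[str, str]],
                k: int = 3) -> List[Dict[str, str]]:
    """Return the k tools whose name and description best match the query.

    Ties keep the original registry order.
    """
    query_tokens = tokenize(query)
    scored = []
    for index, tool in enumerate(tools):
        tool_tokens = tokenize(tool["name"] + " " + tool["description"])
        overlap = len(query_tokens & tool_tokens)
        # Negative overlap so an ascending sort puts best matches first.
        scored.append((-overlap, index, tool))
    scored.sort(key=lambda item: (item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]
```

Only the shortlisted tool definitions would then be placed in the prompt, which keeps the context small enough for a 270m-parameter model.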

  4. Technical Stack

Intelligence Layer:
Local: Google FunctionGemma-270m-it via Cactus SDK.
Cloud: Gemini 2.0 Flash via google-genai.
Routing: Custom Python-based Hybrid Engine with intent detection.
Frontend Ecosystem:
Web Portal: React 19, Vite 7, TailwindCSS 4, Framer Motion (for the Series D modern aesthetic).
Manager Widget: Tauri 2 (Rust + React desktop app) providing a native floating overlay.
Mobile App: Expo 54 & React Native 0.81 for cross-platform agentic interaction.
Sync & Infrastructure:
Flask & Flask-CORS: Lightweight service discovery and data persistence layer (local_sync_server).
Tauri/reqwest: Low-level networking for the desktop agent.

This stack enables FatBoss to act as an autonomous agent that lives on your device, knows its own limits, and only reaches for the cloud when the complexity of the “mission” requires it.
