FAT BOSS (REALLY FAT)
Team consisting of Stanford, UCSC, and SUNY CS alumni, including founding engineers and a security CEO, skilled in PyTorch, Python, Go, and cloud-native microservices.
Project Description
The FatBoss architecture is built as a Hybrid Inference Engine that redefines the edge-cloud frontier by prioritizing local-first reasoning while using the cloud as a high-capacity fallback.
- On-Device Execution & Intelligent Routing
The core intelligence (found in main.py) uses a Difficulty-Aware Routing strategy. It doesn’t blindly send queries to the cloud; instead, it performs local “pre-reasoning” to determine the most efficient path:
Signal Detection: The engine scans for “multi-intent signals” (conjunctions like “and”, “then”, “also”). If a query asks for multiple tools at once, it proactively routes to the cloud to ensure complex logic is handled correctly.
Confidence Thresholds: Every local inference via the Cactus SDK returns a confidence score. If the on-device model is unsure (a score below 0.6 by default), the system triggers Dynamic Escalation to Gemini Flash.
Validation Guardrails: Before a local result is accepted, it must pass a “Parameter Integrity Check.” If the local model hallucinates a tool name or omits a required field, the system falls back to the cloud in real time.
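The three routing checks (signal detection, confidence threshold, integrity check) can be sketched as one function. This is a minimal illustration under assumptions, not the code in main.py: `LocalResult`, `route`, `run_local`, and `run_cloud` are hypothetical names, the local and cloud callers are injected as plain functions instead of real Cactus SDK or google-genai calls, and the tool schema is assumed to be a mapping from tool name to its required parameter names.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set, Tuple

# Default escalation threshold described above (confidence < 0.6 goes to the cloud).
CONFIDENCE_THRESHOLD = 0.6

# Conjunctions treated as multi-intent signals; \b keeps "and" from matching inside "Android".
MULTI_INTENT_PATTERN = re.compile(r"\b(and|then|also)\b", re.IGNORECASE)


@dataclass
class LocalResult:
    """Hypothetical shape of an on-device function call plus its confidence score."""
    tool_name: str
    arguments: Dict[str, Any]
    confidence: float


def has_multi_intent(query: str) -> bool:
    """Signal Detection: True if the query contains a multi-intent conjunction."""
    return MULTI_INTENT_PATTERN.search(query) is not None


def passes_integrity_check(result: LocalResult, tools: Dict[str, Set[str]]) -> bool:
    """Validation Guardrails: the tool must exist and every required field must be present."""
    required = tools.get(result.tool_name)
    if required is None:  # the model named a tool that does not exist
        return False
    return required.issubset(result.arguments.keys())


def route(
    query: str,
    tools: Dict[str, Set[str]],
    run_local: Callable[[str], LocalResult],  # hypothetical stand-in for the Cactus call
    run_cloud: Callable[[str], Any],          # hypothetical stand-in for the Gemini call
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, Any]:
    """Return ("local" or "cloud", result) following the difficulty-aware routing rules."""
    if has_multi_intent(query):
        # Multi-tool requests skip local inference entirely.
        return "cloud", run_cloud(query)
    result = run_local(query)
    if result.confidence < threshold:
        # Confidence Thresholds: escalate when the local model is unsure.
        return "cloud", run_cloud(query)
    if not passes_integrity_check(result, tools):
        # Guardrail failure: hallucinated tool name or missing required field.
        return "cloud", run_cloud(query)
    return "local", result
```

Matching bare conjunctions is cheap but will over-escalate on phrases like “salt and pepper”; in this sketch that trade-off accepts some extra cloud calls in exchange for not mishandling genuinely multi-tool requests locally.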
- Local-First Reasoning: Speed & Privacy
By utilizing FunctionGemma-270m-it, FatBoss achieves “Agentic Utility” through a local-first approach:
Low Latency: Simple tasks (setting timers, checking status, updating OKRs) are processed locally, targeting a 500 ms baseline, significantly faster than a cloud round-trip.
Privacy by Default: Sensitive user intents are processed and reasoned about on-device. Only anonymized or high-complexity data fragments are escalated to the cloud, maintaining a strong privacy perimeter.
Singleton Cache Management: To ensure speed, the system keeps the model warm in memory and uses cactus_reset() between calls. This avoids the overhead of reloading weights for every “thought” the agent has.
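The warm-model pattern behind Singleton Cache Management can be sketched as follows. It is an illustration under assumptions: `WarmModelCache`, `load_fn`, `reset_fn`, and `infer_fn` are hypothetical names, and in the real system `load_fn` and `reset_fn` would wrap the Cactus SDK's model-loading call and `cactus_reset()`, whose exact signatures are not shown here.

```python
import threading
from typing import Any, Callable


class WarmModelCache:
    """Keeps one model instance loaded and resets its state after each call.

    Hypothetical helper: load_fn/reset_fn stand in for the Cactus SDK's load call
    and cactus_reset(); they are injected so the pattern is not tied to one API.
    """

    def __init__(self, load_fn: Callable[[], Any], reset_fn: Callable[[Any], None]):
        self._load_fn = load_fn
        self._reset_fn = reset_fn
        self._model = None
        # One lock serializes loading and inference, since a single on-device
        # model instance is shared by every caller.
        self._lock = threading.Lock()

    def run(self, infer_fn: Callable[[Any, str], Any], prompt: str) -> Any:
        with self._lock:
            if self._model is None:
                # Only the first call pays the cost of loading weights.
                self._model = self._load_fn()
            try:
                return infer_fn(self._model, prompt)
            finally:
                # Clear per-call context so the next prompt starts fresh,
                # even if inference raised an exception.
                self._reset_fn(self._model)
```

Creating one module-level instance (for example `MODEL_CACHE = WarmModelCache(load_fn, reset_fn)`) gives the singleton behavior: every request reuses the same loaded weights. Serializing calls with a lock is a conservative choice that assumes the underlying model handle is not safe to use from several threads at once.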
- Redefining the Frontier: Gemma + Cactus
The architecture fundamentally relies on the synergy between FunctionGemma and Cactus Compute:
Cactus Compute SDK: Serves as the high-speed execution layer for on-device inference, providing the critical “Cloud Handoff” markers that enable the hybrid flow.
Edge-Native Agentic Utility: The system uses Tool RAG (Top-K) at the edge. When the agent is presented with many tools, it dynamically filters the list down to the most relevant ones, so the 270M-parameter model can spend its limited context window on the capabilities that matter for the query.
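A minimal Top-K tool filter might look like the sketch below. The description does not say how relevance is scored, so this version substitutes a simple bag-of-words cosine similarity between the query and each tool's name and description; an embedding model could be swapped in for the scoring function. `select_top_k_tools` and the tool dictionary shape (`name`, `description`) are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokens(text: str) -> Counter:
    # Lowercase word counts; underscores split tokens, so "set_timer" -> "set", "timer".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_top_k_tools(query: str, tools: List[Dict[str, str]], k: int = 3) -> List[Dict[str, str]]:
    """Hypothetical Tool RAG step: keep only the k tools most similar to the query."""
    q = _tokens(query)
    scored = [
        (_cosine(q, _tokens(tool["name"] + " " + tool["description"])), index, tool)
        for index, tool in enumerate(tools)
    ]
    # Highest score first; ties keep the original tool order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]
```

Only the selected tools would then be placed in the local model's prompt, which keeps the tool list short regardless of how many capabilities the agent has registered.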
- Technical Stack
Intelligence Layer:
Local: Google FunctionGemma-270m-it via Cactus SDK.
Cloud: Gemini 2.0 Flash via google-genai.
Routing: Custom Python-based Hybrid Engine with intent detection.
Frontend Ecosystem:
Web Portal: React 19, Vite 7, TailwindCSS 4, Framer Motion (for the Series D modern aesthetic).
Manager Widget: Tauri 2 (Rust + React desktop app) providing a native floating overlay.
Mobile App: Expo 54 & React Native 0.81 for cross-platform agentic interaction.
Sync & Infrastructure:
Flask & Flask-CORS: Lightweight service discovery and data persistence layer (local_sync_server).
Tauri/reqwest: Low-level networking for the desktop agent.
This stack enables FatBoss to act as a truly autonomous agent that lives on your device, knows its own limits, and only reaches for the cloud when the complexity of the “mission” requires it.