Redact
Team led by a physics PhD candidate and AI engineer with experience at Shippo and Xealth, specializing in multi-agent RAG, Python, and custom developer tooling.
Project Description
Cactus Privacy-Aware Voice-to-Action
Overview
I built a privacy-aware agentic system that converts natural language commands into executable function calls through a multi-layer pipeline: PII detection, intelligent hybrid routing, and a graduated escalation engine that decides, for each query, whether to resolve it locally on FunctionGemma via Cactus Compute or escalate to Gemini in the cloud. The architecture also supports on-device Whisper transcription via cactus_transcribe for voice input. The system achieves an F1 score of 1.00 while keeping 67% of queries entirely on-device, without exposing sensitive user data to the cloud unnecessarily.
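To make the PII layer of the pipeline concrete, here is a minimal sketch of a redact-then-restore round trip. It is an illustration under stated assumptions, not the project's actual redaction engine: the `PII_PATTERNS` table, the placeholder format, and the `redact` and `restore` helpers are all hypothetical names, and the two regexes cover only simple email and phone formats.

```python
import re

# Hypothetical patterns for illustration; a real detector would cover more PII types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace each PII match with a placeholder.

    Returns the scrubbed text plus a placeholder -> original mapping
    that stays on the device.
    """
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        def _substitute(match, label=label):
            placeholder = f"<{label}_{len(mapping)}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(_substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into text that came back from the cloud."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Only the scrubbed string would be sent to the cloud model in this sketch; the mapping never leaves the device, so any placeholder echoed back in the returned function-call arguments can be swapped for the original value locally.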
How These Enable Dynamic Escalation
Cactus Engine returns a confidence score (1.0 − normalized entropy) with every FunctionGemma inference call. Our routing engine combines this signal with three additional checks — JSON parse success, argument validation against query hints, and required-parameter completeness — to make a per-query escalation decision:
- High confidence + valid output → Accept locally. Zero cloud cost, zero network latency, zero privacy exposure.
- Low confidence or invalid output → Attempt deterministic local synthesis from regex-extracted query hints. Still entirely on-device.
- All local paths exhausted → Escalate to Gemini 2.5 Flash via google-genai. When PII is present, the custom redaction engine scrubs sensitive data before it reaches the cloud, and restores original values locally after.
- private:mode active → Escalation disabled entirely. Local success or graceful failure, no exceptions.
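The escalation rules above can be sketched as a small decision function. This is a hedged illustration rather than the project's routing engine: the `Route` enum, `confidence_from_probs`, `output_is_valid`, `decide`, the 0.7 threshold, and the shape of the function-call JSON (`name` plus `arguments`) are all assumptions. In particular, the sketch computes 1.0 minus normalized entropy over a single probability distribution; how Cactus Engine aggregates entropy across tokens is not stated in the source.

```python
import json
import math
from enum import Enum


class Route(Enum):
    ACCEPT_LOCAL = "accept_local"
    LOCAL_SYNTHESIS = "local_synthesis"
    ESCALATE_CLOUD = "escalate_cloud"
    FAIL_LOCAL = "fail_local"


def confidence_from_probs(probs):
    """Return 1.0 - normalized Shannon entropy of one probability distribution."""
    if len(probs) < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def output_is_valid(raw_output, required_params, hints):
    """Apply the three structural checks: JSON parse, required params, hint agreement."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False
    if any(param not in args for param in required_params):
        return False
    # Assumption: hints are regex-extracted values keyed by parameter name,
    # and an argument that contradicts its hint invalidates the call.
    return all(args[key] == value for key, value in hints.items() if key in args)


def decide(confidence, raw_output, required_params, hints,
           synthesis_ok, private_mode, threshold=0.7):
    """Pick a route for one query; threshold is an illustrative value."""
    if confidence >= threshold and output_is_valid(raw_output, required_params, hints):
        return Route.ACCEPT_LOCAL
    if synthesis_ok:
        # Deterministic local synthesis from query hints succeeded.
        return Route.LOCAL_SYNTHESIS
    if private_mode:
        # Escalation is disabled, so fail gracefully on-device.
        return Route.FAIL_LOCAL
    return Route.ESCALATE_CLOUD
```

Checking the private-mode flag last keeps both local paths available in private mode while guaranteeing that the cloud branch is unreachable when the flag is set.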