Energy optimized cactus router
Team consisting of an Apple Sr. SDET and an HPE Sr. Cloud Manager skilled in Python, MCP, agentic frameworks, and CI/CD pipelines.
Project Description
Cactus Hackathon Submission
Team: QA Agent
Location: SF
Summary
This submission implements an energy-aware hybrid agent in main.py and a voice-to-action product demo in voice_demo.py.
The design goal is to maximize practical utility under real constraints:
- Prefer low-energy on-device execution when intent is clear.
- Fall back to cloud only when the estimated device-side energy of a cloud call is lower than that of local execution, or when local confidence is insufficient.
- Preserve function-call correctness and user experience for real-world assistant tasks.
Rubric 1: Hybrid Routing Algorithm (Depth + Cleverness)
What was implemented
- Deterministic intent router (on-device first)
- Added a rule-based parser for high-frequency assistant intents: set_alarm, set_timer, create_reminder, send_message, search_contacts, get_weather, play_music.
- Handles multi-intent commands by clause splitting and preserves action order.
- Handles pronoun carry-over (find Tom ... send him a message ...) via contact memory.
- Energy-aware policy layer
- Added local vs cloud energy estimators and an explicit decision policy:
- estimate local device energy from text complexity + tool-call count
- estimate cloud device-side energy from network/interaction overhead
- New metadata emitted with routing decisions: estimated_local_energy_mj, estimated_cloud_energy_mj, decision.
- Robust cloud fallback
- Keeps cloud model fallback chain with default gemini-2.5-flash-lite.
- Added input hardening for cloud calls so empty/None user content does not crash.
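The deterministic intent router described above can be illustrated with a minimal sketch. This is not the code in main.py: the regex patterns, the argument names (query, recipient, message, minutes), and the function names are assumptions made for illustration, and only three of the seven intents are covered.

```python
import re

# Hypothetical pronoun set and patterns; the real parser may use different rules.
PRONOUNS = {"him", "her", "them"}


def split_clauses(text):
    """Split a command into ordered clauses on commas, semicolons, 'and', and 'then'."""
    parts = re.split(r"\s*(?:,|;|\band then\b|\bthen\b|\band\b)\s*", text.strip())
    return [p for p in parts if p]


def parse_clause(clause, memory):
    """Map one clause to a tool-call dict, or return None if no rule matches."""
    c = clause.strip().rstrip(".!?")

    m = re.match(r"(?:find|look up|search for)\s+(\w+)", c, re.I)
    if m:
        memory["last_contact"] = m.group(1)  # remember for later pronouns
        return {"name": "search_contacts", "arguments": {"query": m.group(1)}}

    m = re.match(r"(?:send|text)\s+(\w+)\s+a message saying\s+(.+)", c, re.I)
    if m:
        recipient = m.group(1)
        if recipient.lower() in PRONOUNS:
            recipient = memory.get("last_contact")  # pronoun carry-over
            if recipient is None:
                return None  # nothing to resolve the pronoun against
        return {
            "name": "send_message",
            "arguments": {"recipient": recipient, "message": m.group(2)},
        }

    m = re.match(r"set (?:a )?timer for (\d+) minutes?", c, re.I)
    if m:
        return {"name": "set_timer", "arguments": {"minutes": int(m.group(1))}}

    return None


def rule_based_calls_sketch(text):
    """Parse every clause in order, sharing contact memory across clauses."""
    memory = {}
    calls = []
    for clause in split_clauses(text):
        call = parse_clause(clause, memory)
        if call is not None:
            calls.append(call)
    return calls
```

Splitting on "and" is deliberately naive here: a message body that itself contains "and" would be cut into two clauses, so a production parser would need a more careful boundary rule.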
Why this is strong for Rubric 1
- Uses structured decision layers (deterministic parser + policy + fallback), not only confidence-threshold switching.
- Explicitly optimizes a real systems criterion (energy), not just static local/cloud ratio.
- Maintains compatibility with benchmark interface while improving practical routing behavior.
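To make the energy criterion concrete, here is a hedged sketch of how local and cloud estimators might feed a decision policy. Every coefficient below is an illustrative placeholder, not a measured value, and the linear form of the estimators and the decision labels "local" and "cloud" are assumptions. Only the three metadata field names come from this submission.

```python
# Placeholder coefficients in millijoules; NOT measured on any device.
LOCAL_BASE_MJ = 5.0       # fixed cost of running the on-device path
LOCAL_PER_CHAR_MJ = 0.05  # proxy for text complexity
LOCAL_PER_CALL_MJ = 2.0   # cost per emitted tool call
CLOUD_RADIO_MJ = 40.0     # fixed network/interaction overhead on the device
CLOUD_PER_CHAR_MJ = 0.02  # payload-dependent transmit cost


def estimate_local_energy_mj(text, num_calls):
    """Estimate on-device energy from text length and tool-call count."""
    return LOCAL_BASE_MJ + LOCAL_PER_CHAR_MJ * len(text) + LOCAL_PER_CALL_MJ * num_calls


def estimate_cloud_energy_mj(text):
    """Estimate the device-side energy of a cloud round trip."""
    return CLOUD_RADIO_MJ + CLOUD_PER_CHAR_MJ * len(text)


def decide(text, num_calls, energy_optimize=True):
    """Pick cloud only when it is strictly cheaper; ties stay on-device."""
    local = estimate_local_energy_mj(text, num_calls)
    cloud = estimate_cloud_energy_mj(text)
    use_cloud = energy_optimize and cloud < local
    return {
        "estimated_local_energy_mj": round(local, 3),
        "estimated_cloud_energy_mj": round(cloud, 3),
        "decision": "cloud" if use_cloud else "local",
    }
```

With these placeholder numbers a short command stays local because the fixed radio overhead dominates, while a very long input can tip the policy toward cloud.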
Rubric 2: End-to-End Product That Executes Function Calls
Product implemented
A complete voice/text assistant workflow:
- Input (scenario, text, or audio)
- Optional ASR transcription via cactus_transcribe
- Hybrid routing via generate_hybrid
- Tool-call execution simulation
- Structured result + latency metrics logging
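The tool-call execution simulation step in the workflow above could look roughly like the following. This is a sketch under assumptions: the call shape ({"name", "arguments"}), the handler names, and the confirmation strings are hypothetical and may differ from voice_demo.py.

```python
def _confirm_timer(args):
    return f"Timer set for {args.get('minutes')} minutes"


def _confirm_message(args):
    return f"Message to {args.get('recipient')}: {args.get('message')}"


# Hypothetical registry mapping tool names to mock handlers.
MOCK_HANDLERS = {
    "set_timer": _confirm_timer,
    "send_message": _confirm_message,
}


def execute_calls(calls):
    """Simulate execution: one confirmation record per call, in the original order."""
    results = []
    for call in calls:
        name = call.get("name")
        handler = MOCK_HANDLERS.get(name)
        if handler is None:
            # Unknown tools are reported rather than raising, so a demo run continues.
            results.append({"name": name, "ok": False, "detail": "no handler registered"})
            continue
        detail = handler(call.get("arguments") or {})
        results.append({"name": name, "ok": True, "detail": detail})
    return results
```

Returning a failure record instead of raising keeps a multi-intent chain running when one tool is unsupported.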
Files
- voice_demo.py: full demo pipeline
- demo_runbook.md: timed live demo script + contingency plan
- runs/voice_demo_metrics.jsonl: persistent execution logs
Demo scenarios included
- morning_planner
- handsfree_commute
- quick_focus
These scenarios demonstrate single-intent and multi-intent action chains with practical assistant tasks.
Why this is strong for Rubric 2
- Not just tool prediction: executes a full product loop from input to action outputs.
- Includes operational logging and reproducible commands for judging.
- Includes fallback and error-handling paths for demo resilience.
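The fallback and error-handling paths can be sketched as input hardening followed by an ordered walk over model candidates. To avoid guessing at any SDK surface, the actual model call is passed in as a callable. The names harden_messages, call_with_fallback, and call_model, and the shape of the result dict, are hypothetical.

```python
def harden_messages(messages):
    """Drop messages whose content is None or blank; coerce the rest to stripped strings."""
    cleaned = []
    for m in messages or []:
        content = m.get("content")
        if content is None:
            continue
        text = str(content).strip()
        if text:
            cleaned.append({"role": m.get("role", "user"), "content": text})
    return cleaned


def call_with_fallback(messages, candidates, call_model):
    """Try each candidate model in order and return the first successful result.

    call_model(model_name, messages) is an injected callable (an assumption of
    this sketch) that returns a list of function calls or raises on failure.
    """
    cleaned = harden_messages(messages)
    if not cleaned:
        # Nothing usable to send: report it instead of crashing downstream.
        return {"function_calls": [], "model": None, "error": "empty input"}

    last_error = None
    for model in candidates:
        try:
            calls = call_model(model, cleaned)
            return {"function_calls": calls, "model": model, "error": None}
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = f"{model}: {exc}"
    return {"function_calls": [], "model": None, "error": last_error}
```

Injecting call_model also makes the chain testable offline with a stub that fails for selected model names.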
Rubric 3: Low-Latency Voice-to-Action
Voice-to-action pipeline
- Audio input (.wav) -> cactus_transcribe
- Transcript -> hybrid router
- Tool execution + immediate confirmation
- Per-stage latency capture (stt, routing, execution, total)
Latency instrumentation
voice_demo.py logs stage-wise latency for every run into JSONL, enabling objective evaluation.
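A minimal sketch of stage-wise latency capture with JSONL logging follows. The stage names (stt, routing, execution, total) are taken from this document. The StageTimer class, the record fields, and the run_once helper are assumptions for illustration and may not match the actual log schema.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path


class StageTimer:
    """Collects wall-clock latency per named stage, in milliseconds."""

    def __init__(self):
        self.latency_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failed runs still log timing.
            self.latency_ms[name] = round((time.perf_counter() - start) * 1000, 3)


def append_metrics(path, record):
    """Append one JSON object per line so the log can be tailed or streamed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def run_once(log_path, transcript):
    """Time placeholder stages and log one record (field names are assumptions)."""
    timer = StageTimer()
    with timer.stage("total"):
        with timer.stage("stt"):
            pass  # transcription would run here for audio input
        with timer.stage("routing"):
            pass  # the hybrid router would run here
        with timer.stage("execution"):
            pass  # tool calls would execute here
    record = {"transcript": transcript, "latency_ms": timer.latency_ms}
    append_metrics(log_path, record)
    return record
```

Appending one JSON object per line keeps each run independent, so a command such as tail on the log file shows the most recent runs without parsing the whole file.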
Real recording support
- Added helper script for one-command mic capture + run: scripts/record_and_transcribe_demo.sh
- Added conversion + test workflow for desktop recordings.
Why this is strong for Rubric 3
- Demonstrates low overhead routing path after transcription.
- Measures and reports latency as first-class metric.
- Supports real recording demos, not only synthetic prompts.
Reproducibility
Benchmark
/Users/prateekkapoor/Documents/cactus/.venv/bin/python benchmark.py
Voice demo (scenario)
/Users/prateekkapoor/Documents/cactus/.venv/bin/python voice_demo.py --scenario morning_planner
Voice demo (audio)
/Users/prateekkapoor/Documents/cactus/.venv/bin/python voice_demo.py --audio runs/desktop_wav/canonical/morning_planner.wav
Metrics log tail
tail -n 5 runs/voice_demo_metrics.jsonl
Design Notes
- This submission intentionally balances correctness, latency, and energy rather than optimizing only one metric.
- Routing remains transparent and debuggable, with explicit policy outputs for each decision.
- The architecture is modular enough to replace heuristic components with learned predictors later without changing external interfaces.
Solution Diagrams
This document provides two views of the implemented solution:
- Architecture diagram (component-level design)
- Sequence diagram (runtime control flow)
1) Architecture Diagram
flowchart TB
U[User / Voice Input] --> VD[voice_demo.py]
VD -->|audio .wav| STT[cactus_transcribe\nWhisper-small]
VD -->|scenario/text| HYB[generate_hybrid\nmain.py]
STT --> HYB
subgraph Router[Hybrid Router Layer - main.py]
HYB --> TXT[_extract_user_text]
TXT --> RULES[_rule_based_calls\nIntent parser]
RULES --> ENERGY[Energy Policy\n_estimate_local_energy_mj\n_estimate_cloud_energy_mj]
ENERGY -->|local <= cloud| DETRET[Deterministic on-device return]
ENERGY -->|cloud < local| CLOUD[generate_cloud]
RULES -->|no calls| LOCAL[generate_cactus]
LOCAL --> CONF{confidence >= threshold?}
CONF -->|yes| LOCRET[On-device model return]
CONF -->|no| CLOUD
end
CLOUD --> GM[Gemini API\nmodel candidate chain]
GM --> CLOUDRET[Cloud return]
DETRET --> EXEC["Tool Execution Layer\n(mock actions)"]
LOCRET --> EXEC
CLOUDRET --> EXEC
EXEC --> MET[Metrics Logger\nruns/voice_demo_metrics.jsonl]
EXEC --> OUT[Structured Result JSON\ntranscript, calls, latency, source]
2) Sequence Diagram
sequenceDiagram
autonumber
participant User
participant Demo as voice_demo.py
participant STT as cactus_transcribe
participant Router as generate_hybrid
participant Rule as _rule_based_calls
participant Energy as Energy Estimator
participant Local as generate_cactus
participant Cloud as generate_cloud
participant Gemini as Gemini API
participant Tools as Tool Executor
participant Metrics as JSONL Logger
User->>Demo: Provide input (audio OR text/scenario)
alt Audio input
Demo->>STT: Transcribe WAV
STT-->>Demo: Transcript + STT timing
else Text/scenario input
Demo-->>Demo: Use direct text transcript
end
Demo->>Router: generate_hybrid(messages, tools)
Router->>Rule: Parse deterministic intents
alt Deterministic calls found
Router->>Energy: Estimate local vs cloud energy
alt Cloud energy lower and ENERGY_OPTIMIZE on
Router->>Cloud: generate_cloud(messages, tools)
Cloud->>Gemini: Try model candidates in order
Gemini-->>Cloud: Function calls
Cloud-->>Router: Cloud result
else Local energy preferred
Router-->>Demo: On-device deterministic result
end
else No deterministic calls
Router->>Local: generate_cactus(messages, tools)
Local-->>Router: function_calls, confidence, latency
alt Confidence >= threshold
Router-->>Demo: On-device local model result
else Confidence < threshold
Router->>Cloud: generate_cloud(messages, tools)
Cloud->>Gemini: Try model candidates in order
Gemini-->>Cloud: Function calls
Cloud-->>Router: Cloud fallback result
end
end
Demo->>Tools: Execute returned tool calls
Tools-->>Demo: Execution confirmations
Demo->>Metrics: Append timing/source/success record
Demo-->>User: Final structured output
Notes
- The deterministic parser is the first, fast path for common intents.
- Energy policy is applied only when deterministic calls are available.
- Local model confidence gate is used when deterministic parsing does not produce calls.
- Cloud path uses model fallback order from GEMINI_MODEL_CANDIDATES.
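The control flow in the diagrams and notes above can be condensed into a short sketch. It mirrors the three branches (deterministic rules with energy policy, local model with a confidence gate, cloud fallback) but is not the code in main.py. The parser, models, and policy are injected as callables, and the threshold value 0.7, the source labels, and the result shapes are assumptions.

```python
def extract_user_text(messages):
    """Return the most recent non-empty user message, or an empty string."""
    for m in reversed(messages or []):
        if m.get("role") == "user" and m.get("content"):
            return str(m["content"])
    return ""


def generate_hybrid_sketch(messages, tools, *, rule_parser, energy_policy,
                           local_model, cloud_model,
                           confidence_threshold=0.7, energy_optimize=True):
    """Route a request: rules first, then local model, then cloud.

    rule_parser(text) -> list of calls
    energy_policy(text, num_calls) -> dict containing a "decision" key
    local_model / cloud_model (messages, tools) -> dict with "function_calls"
    """
    text = extract_user_text(messages)
    calls = rule_parser(text)

    if calls:
        # Energy policy applies only when deterministic calls are available.
        policy = energy_policy(text, len(calls))
        if energy_optimize and policy.get("decision") == "cloud":
            return {**cloud_model(messages, tools), **policy, "source": "cloud"}
        return {"function_calls": calls, **policy, "source": "on-device-rules"}

    # No deterministic calls: ask the local model and gate on its confidence.
    local = local_model(messages, tools)
    if local.get("confidence", 0.0) >= confidence_threshold:
        return {**local, "source": "on-device-model"}
    return {**cloud_model(messages, tools), "source": "cloud-fallback"}
```

Passing the components in as callables matches the design note about swapping heuristic parts for learned predictors later without changing the external interface.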