Hackathon Showcase

Energy-optimized Cactus router

Team consisting of an Apple Sr. SDET and an HPE Sr. Cloud Manager skilled in Python, MCP, agentic frameworks, and CI/CD pipelines.

2 members

Project Description

Cactus Hackathon Submission

Team: QA Agent
Location: SF

Summary

This submission implements an energy-aware hybrid agent in main.py and a voice-to-action product demo in voice_demo.py.

The design goal is to maximize practical utility under real constraints:

Prefer low-energy on-device execution when intent is clear.
Fall back to the cloud only when the estimated device-side energy of the cloud path is lower, or when local confidence is insufficient.
Preserve function-call correctness and user experience for real-world assistant tasks.

Rubric 1: Hybrid Routing Algorithm (Depth + Cleverness)

What was implemented

Deterministic intent router (on-device first)

Added a rule-based parser for high-frequency assistant intents:
- set_alarm
- set_timer
- create_reminder
- send_message
- search_contacts
- get_weather
- play_music
Handles multi-intent commands by clause splitting and preserves action order.
Handles pronoun carry-over (find Tom ... send him a message ...) via contact memory.
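A rule-based parser of this kind can be sketched as a clause splitter followed by per-clause regex rules, with a small memory dict that resolves pronouns to the last contact mentioned. This is a minimal sketch under stated assumptions: the regexes, argument names (`hour`, `recipient`, `minutes`, …) and clause separators are hypothetical, only four of the seven intents are shown, and it is not the submission's actual `_rule_based_calls`.

```python
import re
from typing import Optional

# Hypothetical sketch of a deterministic intent parser. Patterns, argument
# names and separators are illustrative assumptions, not the real main.py rules.
CLAUSE_SEP = re.compile(r"\s*(?:,\s*(?:and|then)\b|;|\band\b|\bthen\b)\s*", re.IGNORECASE)
PRONOUNS = {"him", "her", "them"}


def split_clauses(text: str) -> list[str]:
    """Split a multi-intent command into clauses, keeping their original order."""
    return [c.strip() for c in CLAUSE_SEP.split(text) if c.strip()]


def parse_clause(clause: str, memory: dict) -> Optional[dict]:
    """Map one clause to a tool call, or return None if no rule matches."""
    c = clause.rstrip(".!?")
    m = re.match(r"(?:find|look up|search for)\s+(\w+)", c, re.IGNORECASE)
    if m:
        memory["last_contact"] = m.group(1)  # remember the contact for later pronouns
        return {"name": "search_contacts", "arguments": {"query": m.group(1)}}
    m = re.match(r"send\s+(\w+)\s+a message\s+(?:saying|that)\s+(.+)", c, re.IGNORECASE)
    if m:
        recipient = m.group(1)
        if recipient.lower() in PRONOUNS:
            # Pronoun carry-over: resolve "him"/"her" to the last contact seen.
            recipient = memory.get("last_contact", recipient)
        else:
            memory["last_contact"] = recipient
        return {"name": "send_message",
                "arguments": {"recipient": recipient, "message": m.group(2)}}
    m = re.match(r"set (?:an )?alarm for (\d{1,2})(?::(\d{2}))?\s*(am|pm)?", c, re.IGNORECASE)
    if m:
        hour, minute = int(m.group(1)), int(m.group(2) or 0)
        suffix = (m.group(3) or "").lower()
        if suffix == "pm" and hour < 12:
            hour += 12
        elif suffix == "am" and hour == 12:
            hour = 0
        return {"name": "set_alarm", "arguments": {"hour": hour, "minute": minute}}
    m = re.match(r"set (?:a )?timer for (\d+)\s*min", c, re.IGNORECASE)
    if m:
        return {"name": "set_timer", "arguments": {"minutes": int(m.group(1))}}
    return None


def rule_based_calls(text: str) -> list[dict]:
    """Return tool calls in clause order, or [] so the caller can defer to a model."""
    memory: dict = {}
    calls = []
    for clause in split_clauses(text):
        call = parse_clause(clause, memory)
        if call is None:
            return []  # assumption: one unparsed clause sends the whole request onward
        calls.append(call)
    return calls
```

For example, `rule_based_calls("Find Tom and send him a message saying I'm late")` yields a `search_contacts` call followed by a `send_message` call addressed to Tom. Splitting on "and" is deliberately naive (it would also split "Tom and Jerry"), which is one reason a model fallback is still needed.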

Energy-aware policy layer

Added local vs cloud energy estimators and an explicit decision policy:
- estimate local device energy from text complexity + tool-call count
- estimate cloud device-side energy from network/interaction overhead
New metadata emitted with routing decisions:
- estimated_local_energy_mj
- estimated_cloud_energy_mj
- decision
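The two estimators and the decision field can be sketched as simple linear cost models. All coefficients below are made-up placeholders (the real constants in `_estimate_local_energy_mj` / `_estimate_cloud_energy_mj` are not given here), word count stands in for "text complexity", and `SCHEMA_BYTES` is a hypothetical per-tool payload size.

```python
# Hypothetical coefficients in millijoules; they only illustrate the shape of
# the model, not the submission's calibrated values.
LOCAL_BASE_MJ = 5.0        # fixed cost of running the on-device path
LOCAL_PER_TOKEN_MJ = 0.4   # grows with text complexity (word count as a proxy)
LOCAL_PER_CALL_MJ = 3.0    # grows with the number of tool calls produced
CLOUD_RADIO_MJ = 60.0      # radio wake-up + request/response round trip on the device
CLOUD_PER_KB_MJ = 2.0      # payload transfer cost per KiB
SCHEMA_BYTES = 300         # assumed serialized size of one tool schema


def estimate_local_energy_mj(text: str, num_calls: int) -> float:
    return LOCAL_BASE_MJ + LOCAL_PER_TOKEN_MJ * len(text.split()) + LOCAL_PER_CALL_MJ * num_calls


def estimate_cloud_energy_mj(text: str, num_tools: int) -> float:
    payload_kb = (len(text.encode("utf-8")) + SCHEMA_BYTES * num_tools) / 1024
    return CLOUD_RADIO_MJ + CLOUD_PER_KB_MJ * payload_kb


def energy_decision(text: str, num_calls: int, num_tools: int) -> dict:
    """Emit the routing metadata fields listed above."""
    local = estimate_local_energy_mj(text, num_calls)
    cloud = estimate_cloud_energy_mj(text, num_tools)
    return {
        "estimated_local_energy_mj": round(local, 2),
        "estimated_cloud_energy_mj": round(cloud, 2),
        "decision": "local" if local <= cloud else "cloud",  # ties stay on-device
    }
```

With these placeholder numbers a short command such as "set a timer for 10 minutes" stays local, while a 300-word request crosses over to the cloud; real coefficients would have to come from on-device measurements.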

Robust cloud fallback

Keeps the cloud model fallback chain, with gemini-2.5-flash-lite as the default.
Added input hardening for cloud calls so that empty or None user content does not cause a crash.
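A fallback chain with input hardening can be sketched as: coerce `None` content to strings, short-circuit when there is no user text at all, then try each candidate model in order. This sketch assumes `GEMINI_MODEL_CANDIDATES` is a plain list (only the documented default is filled in) and replaces the real Gemini client with a hypothetical injected `call_model(model, messages, tools)` callable; it reuses the name `generate_cloud` but is not its actual signature.

```python
from typing import Callable, Optional, Sequence

# Assumption: a module-level list in fallback order. Only the documented
# default is shown; any further entries are not given in the source.
GEMINI_MODEL_CANDIDATES = ["gemini-2.5-flash-lite"]

# Hypothetical stand-in for the real client call: returns a list of function
# calls, or raises when the model is unavailable.
CallModel = Callable[[str, list, list], list]


def sanitize_messages(messages: Optional[list]) -> list:
    """Input hardening: coerce None/missing content to strings instead of crashing."""
    clean = []
    for m in messages or []:
        content = m.get("content")
        clean.append({**m, "content": "" if content is None else str(content)})
    return clean


def generate_cloud(messages, tools, call_model: CallModel,
                   candidates: Sequence[str] = GEMINI_MODEL_CANDIDATES) -> dict:
    msgs = sanitize_messages(messages)
    if not any(m["content"].strip() for m in msgs if m.get("role") == "user"):
        return {"function_calls": [], "source": "cloud", "error": "empty user content"}
    errors = []
    for model in candidates:  # try each model in order until one succeeds
        try:
            return {"function_calls": call_model(model, msgs, tools or []),
                    "source": "cloud", "model": model}
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append(f"{model}: {exc}")
    return {"function_calls": [], "source": "cloud", "error": "; ".join(errors)}
```

Returning an empty result with an `error` field, rather than raising, keeps a live demo running even when every candidate fails.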

Why this is strong for Rubric 1

Uses structured decision layers (deterministic parser + policy + fallback) rather than confidence-threshold switching alone.
Explicitly optimizes a real systems criterion (energy), not just a static local/cloud ratio.
Maintains compatibility with the benchmark interface while improving practical routing behavior.

Rubric 2: End-to-End Product That Executes Function Calls

Product implemented

A complete voice/text assistant workflow:

Input (scenario, text, or audio)
Optional ASR transcription via cactus_transcribe
Hybrid routing via generate_hybrid
Tool-call execution simulation
Structured result + latency metrics logging
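The "tool-call execution simulation" step can be sketched as a dispatch table from tool name to a mock handler that returns a confirmation string. Handler names match the intents listed earlier, but the argument keys and confirmation texts are hypothetical.

```python
from typing import Callable

# Hypothetical mock handlers; argument keys are assumptions, and nothing here
# touches a real device API.
MOCK_HANDLERS: dict[str, Callable[[dict], str]] = {
    "set_alarm": lambda a: f"Alarm set for {a['hour']:02d}:{a.get('minute', 0):02d}",
    "set_timer": lambda a: f"Timer set for {a['minutes']} minutes",
    "send_message": lambda a: f"Message sent to {a['recipient']}: {a['message']}",
    "search_contacts": lambda a: f"Found contact matching '{a['query']}'",
    "get_weather": lambda a: f"Weather for {a['location']}: (mock) sunny",
}


def execute_calls(calls: list[dict]) -> list[dict]:
    """Run each returned tool call through its mock handler, in order."""
    results = []
    for call in calls:
        name = call.get("name")
        handler = MOCK_HANDLERS.get(name)
        if handler is None:
            results.append({"name": name, "ok": False, "output": "unknown tool"})
            continue
        try:
            results.append({"name": name, "ok": True,
                            "output": handler(call.get("arguments", {}))})
        except KeyError as exc:  # a required argument is missing
            results.append({"name": name, "ok": False, "output": f"missing argument {exc}"})
    return results
```

Per-call `ok` flags let the demo report partial success on a multi-intent command instead of failing the whole chain.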

Files

voice_demo.py — full demo pipeline
demo_runbook.md — timed live demo script + contingency plan
runs/voice_demo_metrics.jsonl — persistent execution logs

Demo scenarios included

morning_planner
handsfree_commute
quick_focus

These scenarios demonstrate single-intent and multi-intent action chains with practical assistant tasks.
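The scenario names plug into a command-line entry point like the `--scenario` / `--audio` invocations shown under Reproducibility. The sketch below uses `argparse` with a mutually exclusive input group; the scenario texts and the `--text` flag are hypothetical placeholders, since the actual prompts in voice_demo.py are not shown here.

```python
import argparse

# Hypothetical scenario prompts; only the scenario names come from the writeup.
SCENARIOS = {
    "morning_planner": "Set an alarm for 7am, then check the weather in San Francisco",
    "handsfree_commute": "Find Tom and send him a message saying I'm running late",
    "quick_focus": "Set a timer for 25 minutes",
}


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Voice-to-action demo (sketch)")
    src = p.add_mutually_exclusive_group(required=True)  # exactly one input source
    src.add_argument("--scenario", choices=sorted(SCENARIOS))
    src.add_argument("--audio", help="path to a .wav file to transcribe")
    src.add_argument("--text", help="raw command text (hypothetical flag)")
    return p


def resolve_input(args: argparse.Namespace) -> tuple[str, str]:
    """Return (input_kind, payload) for the rest of the pipeline."""
    if args.scenario:
        return "scenario", SCENARIOS[args.scenario]
    if args.text:
        return "text", args.text
    return "audio", args.audio
```

Making the group required and mutually exclusive means a mistyped invocation fails fast with a usage message, which matters when commands are run live in front of judges.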

Why this is strong for Rubric 2

Not just tool prediction: executes a full product loop from input to action outputs.
Includes operational logging and reproducible commands for judging.
Includes fallback and error-handling paths for demo resilience.

Rubric 3: Low-Latency Voice-to-Action

Voice-to-action pipeline

Audio input (.wav) -> cactus_transcribe
Transcript -> hybrid router
Tool execution + immediate confirmation
Per-stage latency capture (stt, routing, execution, total)
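Per-stage timing can be sketched with a small context manager around each stage, using `time.perf_counter()` for monotonic wall-clock measurements. The `transcribe`, `route` and `execute` callables are hypothetical stand-ins for `cactus_transcribe`, `generate_hybrid` and the tool executor, whose exact signatures are not given here.

```python
import time
from contextlib import contextmanager
from typing import Callable, Optional


@contextmanager
def stage(timings: dict, name: str):
    """Record the wall-clock duration of one pipeline stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[f"{name}_ms"] = (time.perf_counter() - start) * 1000.0


def run_pipeline(transcribe: Callable[[str], str], route: Callable[[str], dict],
                 execute: Callable[[list], list], audio_path: Optional[str] = None,
                 text: Optional[str] = None) -> dict:
    timings: dict = {}
    t0 = time.perf_counter()
    if audio_path is not None:
        with stage(timings, "stt"):
            text = transcribe(audio_path)   # stand-in for cactus_transcribe
    else:
        timings["stt_ms"] = 0.0             # text/scenario input skips ASR
    with stage(timings, "routing"):
        routed = route(text or "")          # stand-in for generate_hybrid
    with stage(timings, "execution"):
        results = execute(routed.get("function_calls", []))
    timings["total_ms"] = (time.perf_counter() - t0) * 1000.0
    return {"transcript": text, "source": routed.get("source"),
            "results": results, "latency": timings}
```

Because each stage gets its own key, a slow run can be attributed to ASR, routing, or execution without re-running it.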

Latency instrumentation

voice_demo.py logs stage-wise latency for every run into JSONL, enabling objective evaluation.
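Appending one JSON object per line is what makes the log easy to `tail` and parse incrementally. A minimal sketch follows; the record fields and the added `ts` timestamp are assumptions, not the submission's exact schema.

```python
import json
import time
from pathlib import Path


def append_metrics(record: dict, path: str = "runs/voice_demo_metrics.jsonl") -> None:
    """Append one run's metrics as a single JSON line (JSONL)."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)          # create runs/ on first use
    line = json.dumps({"ts": time.time(), **record}, ensure_ascii=False)
    with p.open("a", encoding="utf-8") as f:             # append mode keeps earlier runs
        f.write(line + "\n")
```

Usage: `append_metrics({"source": "on-device", "latency": {"total_ms": 42.0}})` after each run.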

Real recording support

Added a helper script for one-command microphone capture and run:
- scripts/record_and_transcribe_demo.sh
Added a conversion and test workflow for desktop recordings.
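A conversion workflow usually ends with a format check before audio reaches the transcriber. The sketch below uses Python's standard `wave` module and assumes 16 kHz, mono, 16-bit PCM as the target (Whisper models operate on 16 kHz audio; the exact format `cactus_transcribe` expects is an assumption here).

```python
import wave

# Assumed target format for the transcriber: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample


def check_wav(path: str) -> dict:
    """Report whether a WAV file matches the assumed target format."""
    problems = []
    with wave.open(path, "rb") as w:
        rate, channels, width = w.getframerate(), w.getnchannels(), w.getsampwidth()
        duration = w.getnframes() / rate
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz (expected {EXPECTED_RATE})")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels (expected {EXPECTED_CHANNELS})")
    if width != EXPECTED_SAMPWIDTH:
        problems.append(f"{8 * width}-bit samples (expected {8 * EXPECTED_SAMPWIDTH})")
    return {"ok": not problems, "duration_s": duration, "problems": problems}
```

One common way to produce such a file from a desktop recording is `ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav`; whether the team's workflow uses ffmpeg is not stated.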

Why this is strong for Rubric 3

Demonstrates a low-overhead routing path after transcription.
Measures and reports latency as a first-class metric.
Supports real recording demos, not only synthetic prompts.

Reproducibility

Benchmark

/Users/prateekkapoor/Documents/cactus/.venv/bin/python benchmark.py

Voice demo (scenario)

/Users/prateekkapoor/Documents/cactus/.venv/bin/python voice_demo.py --scenario morning_planner

Voice demo (audio)

/Users/prateekkapoor/Documents/cactus/.venv/bin/python voice_demo.py --audio runs/desktop_wav/canonical/morning_planner.wav

Metrics log tail

tail -n 5 runs/voice_demo_metrics.jsonl
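The same tail can be read and summarized from Python, for example to quote a median latency during judging. This sketch assumes each record stores stage timings under a `latency` object with a `total_ms` key; the real field names in the log may differ.

```python
import json
from collections import deque
from statistics import median
from typing import Iterable


def read_tail(path: str, n: int = 5) -> list[str]:
    """Python equivalent of `tail -n N`: keep only the last n lines in memory."""
    with open(path, encoding="utf-8") as f:
        return list(deque(f, maxlen=n))


def summarize_latency(lines: Iterable[str], key: str = "total_ms") -> dict:
    """Median and max of one latency field across JSONL records (assumed schema)."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        val = json.loads(line).get("latency", {}).get(key)
        if isinstance(val, (int, float)):
            values.append(float(val))
    if not values:
        return {"runs": 0}
    return {"runs": len(values), "median_ms": median(values), "max_ms": max(values)}
```

Usage: `summarize_latency(read_tail("runs/voice_demo_metrics.jsonl", 5))`.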

Design Notes

This submission intentionally balances correctness, latency, and energy rather than optimizing only one metric.
Routing remains transparent and debuggable, with explicit policy outputs for each decision.
The architecture is modular enough to replace heuristic components with learned predictors later without changing external interfaces.
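One way to keep external interfaces stable while swapping heuristics for learned predictors is to have the router depend only on a structural interface. The sketch below uses `typing.Protocol`; the `EnergyEstimator` interface, both implementations, and their coefficients are hypothetical illustrations rather than the submission's classes.

```python
from typing import Protocol, Sequence


class EnergyEstimator(Protocol):
    """Hypothetical interface the router depends on; any matching class can be swapped in."""
    def local_mj(self, text: str, num_calls: int) -> float: ...
    def cloud_mj(self, text: str, num_calls: int) -> float: ...


class HeuristicEstimator:
    """Hand-tuned rules with illustrative coefficients."""
    def local_mj(self, text: str, num_calls: int) -> float:
        return 5.0 + 0.4 * len(text.split()) + 3.0 * num_calls

    def cloud_mj(self, text: str, num_calls: int) -> float:
        return 60.0 + 0.002 * len(text.encode("utf-8"))


class LinearEstimator:
    """Stand-in for a learned predictor: weights would come from fitting measured energy."""
    def __init__(self, local_w: Sequence[float], cloud_w: Sequence[float]):
        self.local_w, self.cloud_w = tuple(local_w), tuple(cloud_w)

    @staticmethod
    def _features(text: str, num_calls: int) -> tuple:
        return (1.0, float(len(text.split())), float(num_calls))  # bias, words, calls

    def local_mj(self, text: str, num_calls: int) -> float:
        return sum(w * x for w, x in zip(self.local_w, self._features(text, num_calls)))

    def cloud_mj(self, text: str, num_calls: int) -> float:
        return sum(w * x for w, x in zip(self.cloud_w, self._features(text, num_calls)))


def choose_route(estimator: EnergyEstimator, text: str, num_calls: int) -> str:
    # The router sees only the interface, so replacing the estimator needs no caller changes.
    return "local" if estimator.local_mj(text, num_calls) <= estimator.cloud_mj(text, num_calls) else "cloud"
```

Structural typing means a future learned model only has to provide the two methods; it does not need to inherit from anything in main.py.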

Solution Diagrams
This document provides two views of the implemented solution:

Architecture diagram (component-level design)
Sequence diagram (runtime control flow)

1) Architecture Diagram

```mermaid
flowchart TB
U[User / Voice Input] --> VD[voice_demo.py]

VD -->|audio .wav| STT[cactus_transcribe\nWhisper-small]
VD -->|scenario/text| HYB[generate_hybrid\nmain.py]
STT --> HYB

subgraph Router[Hybrid Router Layer - main.py]
    HYB --> TXT[_extract_user_text]
    TXT --> RULES[_rule_based_calls\nIntent parser]
    RULES --> ENERGY[Energy Policy\n_estimate_local_energy_mj\n_estimate_cloud_energy_mj]

    ENERGY -->|local <= cloud| DETRET[Deterministic on-device return]
    ENERGY -->|cloud < local| CLOUD[generate_cloud]

    RULES -->|no calls| LOCAL[generate_cactus]
    LOCAL --> CONF{"confidence >= threshold?"}
    CONF -->|yes| LOCRET[On-device model return]
    CONF -->|no| CLOUD
end

CLOUD --> GM[Gemini API\nmodel candidate chain]
GM --> CLOUDRET[Cloud return]

DETRET --> EXEC["Tool Execution Layer\n(mock actions)"]
LOCRET --> EXEC
CLOUDRET --> EXEC

EXEC --> MET[Metrics Logger\nruns/voice_demo_metrics.jsonl]
EXEC --> OUT[Structured Result JSON\ntranscript, calls, latency, source]
```

2) Sequence Diagram

```mermaid
sequenceDiagram
autonumber
participant User
participant Demo as voice_demo.py
participant STT as cactus_transcribe
participant Router as generate_hybrid
participant Rule as _rule_based_calls
participant Energy as Energy Estimator
participant Local as generate_cactus
participant Cloud as generate_cloud
participant Gemini as Gemini API
participant Tools as Tool Executor
participant Metrics as JSONL Logger

User->>Demo: Provide input (audio OR text/scenario)

alt Audio input
    Demo->>STT: Transcribe WAV
    STT-->>Demo: Transcript + STT timing
else Text/scenario input
    Demo-->>Demo: Use direct text transcript
end

Demo->>Router: generate_hybrid(messages, tools)
Router->>Rule: Parse deterministic intents

alt Deterministic calls found
    Router->>Energy: Estimate local vs cloud energy
    alt Cloud energy lower and ENERGY_OPTIMIZE on
        Router->>Cloud: generate_cloud(messages, tools)
        Cloud->>Gemini: Try model candidates in order
        Gemini-->>Cloud: Function calls
        Cloud-->>Router: Cloud result
    else Local energy preferred
        Router-->>Demo: On-device deterministic result
    end
else No deterministic calls
    Router->>Local: generate_cactus(messages, tools)
    Local-->>Router: function_calls, confidence, latency
    alt Confidence >= threshold
        Router-->>Demo: On-device local model result
    else Confidence < threshold
        Router->>Cloud: generate_cloud(messages, tools)
        Cloud->>Gemini: Try model candidates in order
        Gemini-->>Cloud: Function calls
        Cloud-->>Router: Cloud fallback result
    end
end

Demo->>Tools: Execute returned tool calls
Tools-->>Demo: Execution confirmations
Demo->>Metrics: Append timing/source/success record
Demo-->>User: Final structured output
```

Notes

The deterministic parser is the first, fast path for common intents.
The energy policy is applied only when deterministic calls are available.
The local-model confidence gate is used only when deterministic parsing produces no calls.
The cloud path follows the model fallback order in GEMINI_MODEL_CANDIDATES.
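The control flow in the sequence diagram can be condensed into one routing function. In this sketch the parser, local model, cloud model and energy estimators are injected as hypothetical keyword parameters so the logic can be read (and tested) in isolation; the threshold value and the `source` labels are placeholders, and the real `generate_hybrid(messages, tools)` takes its collaborators from main.py instead.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value; the real threshold is not given
ENERGY_OPTIMIZE = True      # mirrors the ENERGY_OPTIMIZE switch in the diagram


def generate_hybrid(messages: list, tools: list, *,
                    rule_parser: Callable[[str], list],
                    local_model: Callable[[list, list], dict],
                    cloud_model: Callable[[list, list], dict],
                    estimate_local_mj: Callable[[str, int], float],
                    estimate_cloud_mj: Callable[[str, int], float],
                    threshold: float = CONFIDENCE_THRESHOLD,
                    energy_optimize: bool = ENERGY_OPTIMIZE) -> dict:
    # Stand-in for _extract_user_text: join user turns, tolerating None content.
    text = " ".join((m.get("content") or "") for m in messages
                    if m.get("role") == "user").strip()

    calls = rule_parser(text)
    if calls:  # 1) deterministic fast path, then the energy policy
        meta = {"estimated_local_energy_mj": estimate_local_mj(text, len(calls)),
                "estimated_cloud_energy_mj": estimate_cloud_mj(text, len(tools))}
        if energy_optimize and meta["estimated_cloud_energy_mj"] < meta["estimated_local_energy_mj"]:
            return {**cloud_model(messages, tools), **meta,
                    "decision": "cloud", "source": "cloud"}
        return {"function_calls": calls, **meta,
                "decision": "local", "source": "on-device (deterministic)"}

    local = local_model(messages, tools)  # 2) on-device model with a confidence gate
    if local.get("confidence", 0.0) >= threshold:
        return {**local, "decision": "local", "source": "on-device"}

    # 3) low confidence: cloud fallback
    return {**cloud_model(messages, tools), "decision": "cloud", "source": "cloud (fallback)"}
```

Note that the energy comparison only runs when the parser has produced calls, matching the Notes above; requests the parser cannot handle are judged purely on model confidence.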

Team

Nipun Gupta

Prateek Kapoor

Products & Tools

Cactus Compute