EdgeSense - Google DeepMind x Cactus Compute Global Hackathon
AI Tinkerers - San Francisco
Hackathon Showcase

EdgeSense

Our local-first assistant combines hybrid edge-cloud routing with agentic workflows to deliver private, real-time multimodal perception and guidance.

1 member

EdgeSense — Local-First Agentic Mobility Assistant
EdgeSense is a real-time mobility safety assistant designed for visually impaired users, built around Cactus Compute and FunctionGemma as its foundational edge inference layer. The system captures live camera frames from the user's device and runs every frame through Cactus on-device first. Frames resolved locally get sub-second hazard detection with no network dependency, no data leaving the device, and no cloud latency in safety-critical scenarios. Cloud escalation to Gemini 2.5 Flash is a selective fallback, not the default.
On-Device Execution via Cactus Compute:
EdgeSense runs two Cactus models simultaneously on Apple Silicon: LFM2-VL-450M (a Vision-Language Model) for real-time scene understanding from camera frames, and openai/whisper-small for on-device voice transcription. Together they form a local voice-to-action pipeline: users speak naturally and receive spoken guidance, and requests answered locally never leave the Mac. The Cactus Python SDK (cactus_init, cactus_complete, cactus_transcribe, cactus_destroy) powers all local inference through a Flask microservice that handles image analysis, structured JSON response parsing, and speech-to-text conversion entirely on-device.
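The structured JSON response parsing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes the VLM returns free text that may wrap a JSON object in stray tokens, the helper name parse_model_output is hypothetical, and so is the description field (confidence and cloud_handoff are the signals named in the routing section). The Cactus SDK calls are left out because their exact signatures are not given here.

```python
import json
from typing import Any, Optional


def parse_model_output(raw: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in raw model text, or None.

    Small on-device models often surround their JSON with extra tokens,
    so we scan for each '{' and try to decode an object starting there.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = raw.find("{", start + 1)
    return None


# Hypothetical model output with leading and trailing noise.
sample = 'Sure! {"description": "stairs ahead", "confidence": 0.82, "cloud_handoff": false} <eos>'
parsed = parse_model_output(sample)
```

A None return gives the caller a clear signal that the local output was unusable, which is one plausible input to the validity check in the routing engine.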
Intelligent Hybrid Routing Engine (6-Rule Multi-Signal Strategy):
The routing engine — adapted from our hackathon benchmark work (64.5% F1 on the FunctionGemma eval) — uses six prioritized signals to decide edge vs. cloud for every frame:
1. Hazard mode → always local: safety-critical detections cannot tolerate network latency
2. Offline mode → local only: full functionality without internet
3. Cloud disabled → local only: user privacy control
4. Local result validity check: inspects Cactus output for hallucinations, garbage tokens, low confidence, and the cloud_handoff flag; invalid results escalate to cloud
5. High confidence (≥0.75) → local: when Cactus is confident, trust it
6. Rate limiting: prevents cloud spam by enforcing a 5-second cooldown between Gemini calls
This is not a naive confidence threshold. It is a multi-signal validation system that checks output structure (non-empty, reasonable length, contains semantic content), confidence scores, and Cactus's own cloud_handoff signal, then combines these with session state and mode context to make a routing decision for each frame. The design is local-first: cloud is invoked only when local inference fails these quality checks.
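The six rules above could be expressed along these lines. This is a minimal sketch, not the project's routing code: the type and function names (LocalResult, SessionState, route, is_valid) and the 500-character length cap are hypothetical, and the validity check only approximates "semantic content" by requiring at least one letter. Two behaviours are assumptions the rules do not spell out: a valid result below 0.75 confidence is treated as a cloud candidate, and a request that hits the cooldown stays local.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class LocalResult:
    text: str
    confidence: float
    cloud_handoff: bool = False


@dataclass
class SessionState:
    hazard_mode: bool = False
    offline: bool = False
    cloud_enabled: bool = True
    last_cloud_call: Optional[float] = None  # monotonic seconds


CONFIDENCE_THRESHOLD = 0.75  # rule 5
CLOUD_COOLDOWN_S = 5.0       # rule 6
MAX_REASONABLE_CHARS = 500   # assumed cap for "reasonable length"


def is_valid(result: LocalResult) -> bool:
    """Rule 4 (simplified): structural checks plus the cloud_handoff flag."""
    text = result.text.strip()
    if not text or len(text) > MAX_REASONABLE_CHARS:
        return False
    if not any(ch.isalpha() for ch in text):  # crude garbage-token check
        return False
    return not result.cloud_handoff


def route(result: LocalResult, state: SessionState,
          now: Optional[float] = None) -> Route:
    """Apply the rules in priority order; records the time of a cloud call."""
    now = time.monotonic() if now is None else now
    # Rules 1-3: mode and user settings force the local path.
    if state.hazard_mode or state.offline or not state.cloud_enabled:
        return Route.LOCAL
    # Rules 4-5: a valid, confident local result is trusted.
    if is_valid(result) and result.confidence >= CONFIDENCE_THRESHOLD:
        return Route.LOCAL
    # Rule 6: respect the cooldown between cloud calls (assumed: stay local).
    if (state.last_cloud_call is not None
            and now - state.last_cloud_call < CLOUD_COOLDOWN_S):
        return Route.LOCAL
    state.last_cloud_call = now
    return Route.CLOUD
```

Passing now explicitly keeps the function deterministic in tests; in a live session it falls back to time.monotonic(), which is not affected by wall-clock adjustments.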
Voice-to-Action (Rubric 3):
Users tap a microphone button, speak naturally, and EdgeSense runs the pipeline local-first: audio capture → WebM/Opus to 16 kHz mono WAV conversion (via pydub + ffmpeg) → on-device Whisper transcription via cactus_transcribe → Q&A routing through the same hybrid engine → spoken response via Web Speech API TTS. The voice pipeline achieves sub-2-second end-to-end latency for local responses.
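The audio conversion step in this pipeline can be illustrated with a short sketch. EdgeSense uses pydub, which delegates decoding to ffmpeg; the version below calls ffmpeg directly instead, so it is a stand-in for the same conversion rather than the project's code. The function names and file paths are hypothetical, and ffmpeg is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def ffmpeg_args(src: Path, dst: Path, rate_hz: int = 16_000) -> list[str]:
    """Build an ffmpeg command that writes `src` as mono audio at `rate_hz`."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(src),       # input, e.g. WebM/Opus from the browser
        "-ac", "1",           # one audio channel (mono)
        "-ar", str(rate_hz),  # output sample rate in Hz
        str(dst),             # container is inferred from the .wav suffix
    ]


def convert_to_wav(src: Path, dst: Path) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_args(src, dst), check=True, capture_output=True)
    return dst


# Building the command is side-effect free and easy to inspect.
cmd = ffmpeg_args(Path("utterance.webm"), Path("utterance.wav"))
```

Separating command construction from execution keeps the conversion parameters testable without needing ffmpeg or an audio file at hand.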
Technologies, Frameworks, and Libraries:
Cactus Compute SDK (Python) — cactus_init, cactus_complete, cactus_transcribe, cactus_destroy for all on-device inference
LFM2-VL-450M — Vision-Language Model running on Apple Silicon NPU via Cactus for real-time scene analysis
openai/whisper-small — On-device speech transcription via Cactus for voice-to-action
Google Gemini 2.5 Flash — Cloud fallback for complex scene descriptions when local inference quality is insufficient (REST API, no SDK)
Node.js + Express + TypeScript — Backend server with WebSocket (real-time streaming) and HTTP fallback
React + Vite + TypeScript — Frontend with Canvas API for camera capture, Web Speech API for TTS
Flask (Python) — Microservice wrapping Cactus SDK, handling image preprocessing (Pillow 224×224 resize) and audio conversion
Tailwind CSS + shadcn/ui — Modern responsive UI optimized for accessibility
ngrok — Secure tunnel exposing the local Mac-based Cactus service to the cloud-deployed app, enabling the edge-cloud bridge where Cactus runs on Apple Silicon locally while the Node.js app is deployed on Replit
pydub + ffmpeg — Audio format conversion (WebM/Opus → 16kHz mono WAV) for Whisper compatibility
Pillow — Image preprocessing to ensure VLM-compatible input dimensions
Zod — Runtime type validation for all message schemas between frontend and backend
Architecture Summary:
Camera Frame → Canvas API (224×224) → WebSocket → Cactus VLM (on-device, ~500ms) → 6-rule routing engine → [if needed] Gemini 2.5 Flash (cloud, ~1s) → TTS response.
Every frame touches Cactus first. Cloud is earned, not assumed. EdgeSense aims to show that a local-first agentic mobility assistant can deliver real-time safety guidance with the intelligence of cloud models and the speed and privacy of on-device inference.


deployed app to try on your phone


github repo
