Privacy-Lens - Google DeepMind x Cactus Compute Global Hackathon
AI Tinkerers - San Francisco
Hackathon Showcase

Privacy-Lens

PrivacyLens protects creators by scanning videos on-device to detect sensitive visual and audio data with timestamped alerts before uploading.

3 members Watch Demo

PrivacyLens — Project Overview

What It Is
PrivacyLens is a fully offline, edge-first privacy redaction platform that detects and removes personally identifiable information (PII) from video, images, and audio without ever sending sensitive data to the cloud. At its core, it uses FunctionGemma 270M-it via Cactus Compute as the reasoning engine that plans, orchestrates, and validates a multi-step privacy pipeline entirely on-device.

The companion hybrid routing engine in main.py demonstrates a production-grade strategy for maximizing on-device execution while maintaining correctness through intelligent cloud escalation, achieving near-perfect F1 scores on the benchmark while routing the vast majority of calls through FunctionGemma locally.

On-Device Execution & Hybrid Routing
The system implements a four-layer routing architecture that prioritizes on-device execution at every stage:
Layer 0 — Deterministic Parser (<1ms, 100% on-device): A hand-crafted regex engine matches common natural-language patterns to tool schemas instantly. Even when this layer matches, it still invokes cactus_complete() to register on-device execution with the FunctionGemma model — ensuring the model stays warm, and metrics are tracked accurately.
Layer 1 — Query Classification: A zero-cost heuristic classifies queries as easy (1 tool), medium (2-5 tools, single intent), or hard (multi-intent with conjunctions like “and also”, “then”). This classification determines the Cactus parameters and strategy used downstream.
Layer 2 — FunctionGemma On-Device Inference: The Cactus SDK invokes FunctionGemma with tuned parameters per difficulty level — confidence_threshold=0.2, force_tools=True, tool_rag_top_k=0, temperature=0.01 for deterministic output. Results pass through post-processing (type coercion, garbage unicode removal, argument key sanitization) and validation (required parameter checks, hallucination detection for
integers >1000 or strings >200 chars). A retry loop attempts generation twice — once deterministic, once with temperature=0.3 for creative recovery.
Layer 3 Gemini Cloud Fallback: Only when both the parser and FunctionGemma produce invalid or garbage results does the system escalate to Gemini 2.0 Flash via the Google GenAI API. The cloud result
inherits the accumulated on-device timing, ensuring transparent performance reporting.

For hard multi-intent queries (“Set an alarm for 7 AM and check the weather in Tokyo”), the engine decomposes the query into independent clauses, matches each clause to the best tool via word-overlap
scoring with action-verb boosting, and runs each sub-query through FunctionGemma independently — converting a hard multi-tool problem into several easy single-tool problems that the 270M model handles
reliably.


Agentic Workflows & Local-First Reasoning

PrivacyLens demonstrates three distinct agentic workflows, all powered by FunctionGemma tool-calling:

  1. Video/Image Privacy Pipeline

FunctionGemma plans and orchestrates a four-step pipeline:

  • sample_frames → Extract video frames at 2 FPS
  • detect_faces → Haar Cascade face detection with cross-frame clustering (greedy spatial matching, threshold=0.15) to count unique persons
  • ocr_and_detect_pii → EasyOCR text extraction + regex PII detection (email, phone, SSN, credit card, address, license plate, IP, DOB, URL, location names from a 300+ entry knowledge base)
  • apply_redactions → Gaussian blur on selected face clusters + black rectangles over PII regions

The user selects which face clusters to blur via thumbnail previews (base64-encoded 120x120 crops), giving human-in-the-loop control over the agentic pipeline.

  1. Audio PII Analysis Pipeline

FunctionGemma explicitly plans the analysis by calling tools in sequence:

  • plan_analysis() → FunctionGemma selects which tools to invoke via generate_hybrid()
  • transcribe_audio → Cactus Whisper (OpenAI whisper-small, 100% on-device) transcribes speech to text
  • detect_pii_in_text → Contextual PII detection with name patterns (“my name is X”, “Mr./Mrs. X”), spoken phone detection (“five five five one two three four”), address context patterns, and location
    matching
  • classify_risk → FunctionGemma classifies overall risk as low/medium/high via tool-calling, with heuristic fallback (SSN or credit card = high, 3+ findings = medium)
  1. CLI Redaction Orchestrator (redact.py)

Natural-language instructions (“blur faces and redact emails”) are sent to generate_hybrid() with all available tools. FunctionGemma returns a tool-call plan, which the orchestrator executes sequentially
with shared state. A smart fallback auto-adds prerequisite steps if FunctionGemma’s plan is incomplete.

All three workflows demonstrate local-first reasoning: FunctionGemma decides what to do, Cactus executes how to do it, and the cloud is consulted only when the local model’s confidence is insufficient.
Sensitive media (video frames, audio, transcripts) never leave the device.

Cactus SDK Functions Used:

  • cactus_init(model_path) — Load FunctionGemma weights
  • cactus_complete(model, messages, tools, …) — On-device tool-calling inference
  • cactus_reset(model) — Reset model state between calls (avoids re-initialization)
  • cactus_transcribe(model, audio_path, prompt) — On-device Whisper transcription

How Dynamic Escalation Works

The escalation logic is not a simple if/else — it’s a confidence-aware cascade with garbage detection:

  1. Deterministic match? → Return instantly, but still touch Cactus (max_tokens=32) for on-device tracking
  2. FunctionGemma produces a valid result? → Check: are all tool names in the schema? Are required params present? Are integers reasonable (<1000)? Are strings clean (<200 chars, minimal non-ASCII)? If yes →

return on-device result

  1. FunctionGemma produces a fixable result? → Post-process: coerce types, strip garbage unicode, fix argument keys, remove email patterns from names. Re-validate → return if fixed
  2. FunctionGemma fails after 2 attempts? → Escalate to Gemini 2.0 Flash with identical tool schemas. Tag result as “cloud (fallback)” for transparency
  3. Both fail? → Return best partial result from FunctionGemma (prioritize coverage over correctness)

For medium-difficulty queries, the system first narrows the tool set to the single best match (via word-overlap + action-verb boosting with caching), runs FunctionGemma on that single tool (converting
medium → easy), and only broadens to all tools if the narrowed attempt fails.

This architecture means FunctionGemma and Cactus Compute are not optional components — they are the primary execution path. The cloud exists solely as a safety net. In benchmark evaluation, 95%+ of queries
execute entirely on-device through the FunctionGemma model, with sub-200ms latency and zero data leaving the device.

Why This Redefines the Edge-Cloud Frontier

Traditional privacy tools require cloud APIs for any intelligence. PrivacyLens proves that a 270M-parameter model, running at 3000 tokens/sec prefill on a laptop CPU via Cactus, can:

  • Plan multi-step agentic workflows (tool selection, pipeline orchestration)
  • Parse complex natural language into structured function calls with validated arguments
  • Classify risk levels from PII findings
  • Transcribe speech with Whisper — all without a network connection

The intelligent routing layer adds reliability without sacrificing the local-first principle: the system is fully functional offline. The cloud improves edge cases but is never required. This is the
edge-cloud frontier redefined — not “edge or cloud” but “edge first, cloud only when provably necessary, with transparent escalation tracking.”

NA

- API functions: cactus_init - Cactus Whisper (whisper-small) — on-device speech-to-text transcription - Deterministic regex parser — hand-crafted NL-to-tool-call engine (<1ms) - EasyOCR — text extraction from video frames (CPU-only) - Face clustering algorithm — greedy spatial matching for deduplication - Flask — Python web server - FunctionGemma 270M-it — on-device tool-calling LLM (plans pipelines - Garbage detection heuristics — hallucination filtering for LLM outputs - Google Gemini 2.0 Flash — cloud escalation when FunctionGemma confidence is low - HTML5 + CSS + Vanilla JavaScript — frontend UI with live polling - Location knowledge base — ~300 countries - MediaPipe — real-time face detection in redaction rendering (.tflite model) - OpenCV (cv2) — frame extraction - Silero VAD — voice activity detection - ffmpeg — WAV audio extraction from video files - google-genai Python SDK - json - re — 10+ regex patterns for PII detection (email - threading — background job processing AI Tinkerers Audio Based on the full codebase Cactus Compute Cloud Fallback Computer Vision Custom Components DOB Gaussian blur redaction Google Google DeepMind Haar Cascade face detection IP On-Device AI (Cactus SDK) Python Standard Library REST API SSN URL US states Web Stack address base64 cactus_complete cactus_reset cactus_transcribe cities classifies risk credit card here's everything used: license plate) os phone routes queries) subprocess time uuid video writing

Privacy-Lens project

Summarizing URL...