Omar AGI - Generative UI Global Hackathon: Agentic Interfaces
AI Tinkerers - San Francisco
Hackathon Showcase

Omar AGI

Team led by Omar AGI founder Ben Bae, a systems architect specializing in AI reliability, LLM orchestration, Lua-based agent memory, and human-facing interaction design.

Team size: 1 member
Demo: https://omar-benchmark-replay-production.up.railway.app/

Omar AGI is a reliability layer and live benchmark replay interface for LLM and agent systems.

The project turns model evaluation into an interactive generative UI: instead of presenting a static benchmark claim, the interface lets users trigger live checks, compare baseline performance against Omar/RCC performance, and see whether the reasoning layer improves results, stays neutral, or fails on each benchmark family.
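The improves / neutral / fails distinction can be made concrete with a small comparison per benchmark family. The sketch below is a hypothetical illustration, not the Omar/RCC implementation. It assumes uplift is the absolute difference between the routed score and the baseline score, and that any uplift within a tolerance band (here an assumed 0.01, one percentage point) counts as neutral. The source does not state how uplift or the verdict thresholds are defined.

```python
from dataclasses import dataclass

# Assumed tolerance: uplifts within +/- NEUTRAL_BAND are labeled "neutral".
# The source does not specify a threshold; 0.01 is a hypothetical choice.
NEUTRAL_BAND = 0.01


@dataclass
class FamilyResult:
    family: str            # benchmark family name (hypothetical labels below)
    baseline_score: float  # score of the plain model, 0.0-1.0
    routed_score: float    # score with the reasoning layer, 0.0-1.0

    @property
    def uplift(self) -> float:
        # Absolute uplift: routed score minus baseline score
        return self.routed_score - self.baseline_score

    @property
    def verdict(self) -> str:
        # Map the uplift onto the three outcomes the dashboard reports
        if self.uplift > NEUTRAL_BAND:
            return "improves"
        if self.uplift < -NEUTRAL_BAND:
            return "fails"
        return "neutral"


if __name__ == "__main__":
    # Illustrative numbers only; these are not Omar/RCC results.
    results = [
        FamilyResult("family_a", baseline_score=0.62, routed_score=0.71),
        FamilyResult("family_b", baseline_score=0.80, routed_score=0.805),
        FamilyResult("family_c", baseline_score=0.55, routed_score=0.49),
    ]
    for r in results:
        print(f"{r.family}: uplift={r.uplift:+.3f} -> {r.verdict}")
```

Keeping an explicit neutral band reflects the idea that a reasoning layer should not be credited for noise-level differences between runs.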

The core idea is routed validation. Omar does not assume one reasoning strategy works everywhere. It classifies tasks, applies a structured reasoning route when appropriate, and exposes the result through a dashboard-style UI with scores, uplift, route labels, and evidence boundaries.
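Routed validation, as described above, has three steps: classify the task, choose a route, and return the result together with its route label. The sketch below shows that control flow under loudly stated assumptions. The task types, the keyword classifier, the route names ("structured", "direct"), and the structured-route prompt are all hypothetical stand-ins, not the actual Omar/RCC routing rules, and the model call is a stub you would replace with a real client call.

```python
from typing import Callable

# Hypothetical route table: which task types receive the structured route.
# These names are illustrative assumptions, not Omar/RCC's real routes.
ROUTES = {
    "multi_step": "structured",
    "lookup": "direct",
}

# Hypothetical keywords used by the toy classifier below.
MULTI_STEP_KEYWORDS = ("prove", "plan", "step", "calculate", "why")


def classify(task: str) -> str:
    """Crude keyword classifier standing in for a real task classifier."""
    text = task.lower()
    if any(keyword in text for keyword in MULTI_STEP_KEYWORDS):
        return "multi_step"
    return "lookup"


def run_routed(task: str, model_call: Callable[[str], str]) -> dict:
    """Classify the task, pick a route, and return the answer with its route label."""
    task_type = classify(task)
    route = ROUTES[task_type]
    if route == "structured":
        # Hypothetical structured route: request explicit steps before the answer.
        prompt = f"Break the problem into numbered steps, then answer.\n\n{task}"
    else:
        # Direct route: pass the task through unchanged.
        prompt = task
    return {"task_type": task_type, "route": route, "answer": model_call(prompt)}


def stub_model(prompt: str) -> str:
    """Offline stand-in for a model call so the sketch runs without an API key."""
    return f"<answer to a {len(prompt)}-character prompt>"


if __name__ == "__main__":
    print(run_routed("Calculate 17 * 23 step by step", stub_model)["route"])
    print(run_routed("What is the capital of France?", stub_model)["route"])
```

Returning the route label alongside the answer is what lets a dashboard show, per task, which path was taken and attribute score changes to the route rather than to the model alone.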

The interface is designed for production AI teams who need to understand how models behave under repeated runs, incomplete information, and unstable agent outputs. It goes beyond a standard chat interface by turning reasoning reliability into a visible workflow: run, compare, inspect, and decide.

The current implementation includes a public benchmark replay path, a live-check-style result display, and structured evidence output for validating LLM reliability. The stack uses OpenAI models, a custom Omar/RCC routing layer, and a web interface for benchmark replay and result inspection.
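One way to picture "structured evidence output" is a record that aggregates repeated runs and states its own limits. The sketch below is a hypothetical schema: the field names, the use of the mean across runs, the min/max range as a stability signal, and the free-text "boundary" field (one reading of the "evidence boundaries" mentioned earlier) are all assumptions, since the source does not publish the actual output format.

```python
import json
import statistics
from dataclasses import asdict, dataclass


@dataclass
class EvidenceRecord:
    # All field names are hypothetical; the source does not publish its schema.
    benchmark_family: str
    route: str
    baseline_runs: list[float]  # scores from repeated baseline runs
    routed_runs: list[float]    # scores from repeated routed runs
    boundary: str               # plain-language limit on what the evidence supports

    def summary(self) -> dict:
        base = statistics.mean(self.baseline_runs)
        routed = statistics.mean(self.routed_runs)
        return {
            **asdict(self),
            "n_runs": len(self.routed_runs),
            "baseline_mean": round(base, 4),
            "routed_mean": round(routed, 4),
            "uplift": round(routed - base, 4),
            # Spread across repeated runs indicates how stable the routed result is
            "routed_range": [min(self.routed_runs), max(self.routed_runs)],
        }


if __name__ == "__main__":
    # Illustrative numbers only; these are not Omar/RCC results.
    rec = EvidenceRecord(
        benchmark_family="family_a",
        route="structured",
        baseline_runs=[0.60, 0.62, 0.64],
        routed_runs=[0.70, 0.71, 0.72],
        boundary="3 runs on one model; not evidence for other benchmark families",
    )
    print(json.dumps(rec.summary(), indent=2))
```

Serializing to plain JSON keeps the record easy to render in a web dashboard and easy to diff between replays.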

Omar/RCC and the benchmark replay backend existed before the hackathon as an ongoing reliability-layer project for LLM and agent systems.

For this hackathon submission, the work is framed and adapted as an interactive generative UI: a live reliability dashboard / benchmark replay interface where users can trigger checks, compare baseline vs routed reasoning performance, and inspect whether a reasoning layer improves, stays neutral, or fails across benchmark families.

The prior work includes the Omar/RCC routing concept, benchmark result structure, and deployed replay backend. The hackathon-specific contribution is the generative UI presentation, public submission packaging, and the interactive workflow around run → compare → inspect → decide.

Built with: OpenAI