Benchmarking AI agents against 800+ open source tests | San Francisco

October 25, 2023 · San Francisco

AI Agents: Testing and Benchmarking

Learn how to evaluate AI agents using open-source benchmarks such as WebArena, AGBenchmark, and BOLAA, and set up systematic testing and regression tracking.
