Ultra-Low Inference Latency LLaMA | San Francisco

July 06, 2023 · San Francisco

LLaMA: 10x Faster Inference on TPU

How to achieve 6‑10× faster inference for LLaMA‑65B using PyTorch/XLA on TPU, covering implementation details, performance results, and practical tips.
