Playground
Local LLM in your browser
Pick a model, hit load, and chat. Everything runs on your GPU via WebGPU. No API key. No server. Nothing leaves your tab.
Load Phi-3.5 Mini
Model weights (~2.2 GB) download once and are cached in your browser.
Requires a WebGPU-capable browser (Chrome or Edge on desktop).
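Here is a rough sketch of how a page might test for that requirement. It assumes the check asks the browser for a WebGPU adapter: `navigator.gpu` is absent when the browser does not expose WebGPU, and `requestAdapter()` resolves to `null` when no suitable GPU is available. `hasUsableWebGPU` is a hypothetical name. The parameter is typed structurally so the function compiles without WebGPU type definitions and can be exercised outside a browser.

```typescript
// Minimal structural type for the part of navigator.gpu used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Hypothetical helper: true only if WebGPU is exposed AND an adapter is granted.
async function hasUsableWebGPU(gpu: GpuLike | undefined): Promise<boolean> {
  // navigator.gpu is undefined in browsers without WebGPU support
  if (gpu === undefined) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists
    return (await gpu.requestAdapter()) !== null;
  } catch {
    // Treat any error as "no usable WebGPU"
    return false;
  }
}

// In a page (assumption): const ok = await hasUsableWebGPU((navigator as any).gpu);
```

Checking for an adapter instead of only testing whether `navigator.gpu` exists avoids offering the load button on machines where the API is present but no GPU can actually be used.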
How it works
Model weights are fetched from Hugging Face, stored in your browser's Cache API, and executed by WebLLM, a WebGPU-native inference engine that runs the same MLC-compiled models as the desktop mlc-llm runtime.
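WebLLM handles this caching internally, so the following is not its code. It is a minimal sketch of the cache-first pattern described above: serve a file from the cache when it is there, otherwise download it once and store a copy. `cacheFirst` is a hypothetical helper. The small structural types are chosen so that a real `Cache` from `caches.open(...)`, a real `Response`, and the real `fetch` all fit, and so it can be tested with fakes.

```typescript
// A real Response satisfies this (ok + clone()).
interface ResponseLike<R> {
  ok: boolean;
  clone(): R;
}

// A real Cache (from caches.open) satisfies this.
interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, res: R): Promise<void>;
}

// Cache-first fetch: return the cached copy if present; otherwise download
// once, store a copy, and return the fresh response.
async function cacheFirst<R extends ResponseLike<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) return cached;

  const res = await fetchFn(url);
  // Never cache a failed download
  if (!res.ok) throw new Error(`fetch failed for ${url}`);
  // A Response body can only be read once, so cache a clone and return the original
  await cache.put(url, res.clone());
  return res;
}
```

In a page this would be called along the lines of `cacheFirst(url, await caches.open("model-weights"), fetch)`. The cache name is made up. After the first load, every later call for the same URL is answered from disk, which is what makes offline use possible.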
Chrome Prompt API
If you're on Chrome 126+ with the Gemini Nano flag enabled, the Chrome Prompt API option appears automatically. It uses the model already on your device, so there is nothing to download and it starts almost instantly.
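One way the "appears automatically" behavior could work is sketched below. It assumes the Prompt API is exposed as a global `LanguageModel` object whose `availability()` resolves to one of `"unavailable"`, `"downloadable"`, `"downloading"`, or `"available"`, as in recent Chrome releases. Earlier flag-gated builds used a different entry point under `window.ai`, so the exact shape depends on the Chrome version you target. `promptApiReady` is a hypothetical name. It offers the option only when the model is already present, matching the "nothing to download" promise.

```typescript
// Availability states (assumed, per recent Chrome Prompt API docs)
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Assumed shape of the global LanguageModel object; typed structurally for testing.
interface LanguageModelLike {
  availability(): Promise<Availability>;
}

// Hypothetical helper: show the Prompt API option only if the on-device
// model is already downloaded, so selecting it never triggers a download.
async function promptApiReady(lm: LanguageModelLike | undefined): Promise<boolean> {
  // Global is missing on non-Chrome browsers or when the flag is off
  if (lm === undefined) return false;
  try {
    return (await lm.availability()) === "available";
  } catch {
    // Treat any error as "not available"
    return false;
  }
}

// In a page (assumption): await promptApiReady((globalThis as any).LanguageModel)
```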
Privacy
Your messages never leave the browser tab. Once the weights are cached (typically 1–4 GB, depending on the model), the playground works completely offline. No telemetry. No account required.