Bonsai

launch: March 31, 2026

powered by: Bonsai 8B, Bonsai 4B, Bonsai 1.7B

goblin vibe check:

runs on actual phones without melting them, which matters if you're targeting mobile or need something that works offline.

ultra-compressed 1-bit language model family built for capable local inference on memory-constrained phones and edge hardware.

key features

True 1-bit end-to-end weights

Bonsai 8B fits in roughly 1.15 GB RAM

8x faster and 4–5x lower energy use on edge hardware

Built for local agents and code completion without cloud dependence
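
A quick back-of-the-envelope check on the memory figure above. This is only a sketch: it assumes a nominal 8e9 parameters for Bonsai 8B and decimal gigabytes (1 GB = 1e9 bytes), neither of which the card states. The function names are made up for illustration, and the card does not say what accounts for the gap between a pure 1-bit footprint and the quoted 1.15 GB.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(n_params: float, footprint_gb: float) -> float:
    """Average bits per parameter implied by a given total footprint."""
    return footprint_gb * 1e9 * 8 / n_params


# Assumption: "8B" means a nominal 8e9 parameters (not stated on the card).
N_PARAMS_8B = 8e9

pure_1bit = weight_footprint_gb(N_PARAMS_8B, 1.0)    # weights alone at exactly 1 bit each
fp16 = weight_footprint_gb(N_PARAMS_8B, 16.0)        # same model at 16 bits per weight
implied = effective_bits_per_weight(N_PARAMS_8B, 1.15)  # what 1.15 GB works out to

print(f"pure 1-bit: {pure_1bit:.2f} GB, fp16: {fp16:.2f} GB")
print(f"1.15 GB implies about {implied:.2f} bits per parameter")
```

Under those assumptions, pure 1-bit weights come to about 1 GB versus about 16 GB at 16-bit, so the quoted 1.15 GB works out to a bit over one bit per parameter on average.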

spec & usage

Apache 2.0 family spanning 8B, 4B, and 1.7B variants for phones, wearables, and industrial sensors

Designed around custom 1-bit kernels for Apple Silicon and NVIDIA hardware

Strong fit for conversational agents, local copilots, and robotics control loops
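
For a rough feel of what 1-bit weight storage means, here is a toy sketch that packs the sign of each weight into a single bit and unpacks it again. It is not Bonsai's actual format or kernel code: the helper names, the bit order, and the "sign only" encoding are all assumptions for illustration, and real 1-bit schemes typically also carry scale factors that this leaves out.

```python
def pack_signs(weights):
    """Pack one sign bit per weight (1 = non-negative, 0 = negative).

    Hypothetical layout: bit i of the stream lives in byte i // 8,
    at position i % 8 (least significant bit first).
    """
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover n values of +1 / -1 from the packed bytes, ignoring padding bits."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


w = [0.3, -1.2, 0.0, -0.1, 2.0, -0.7, 0.5, 0.9, -0.4]
packed = pack_signs(w)

print(len(packed), "bytes for", len(w), "weights")
print(unpack_signs(packed, len(w)))
```

Nine weights fit in two bytes here. A custom kernel would operate on the packed bits directly rather than unpacking them first, which is roughly why stock runtimes need dedicated support to see the full speedup.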

limitations

Needs a custom llama.cpp fork or MLX path to hit full 1-bit speedups

scope:

code, language, agent, research, local, open-source, free, fast, lightweight
