Bonsai
launch: March 31, 2026
powered by: Bonsai 8B, Bonsai 4B, Bonsai 1.7B
goblin vibe check:
runs on actual phones without melting them, which matters if you're targeting mobile or need something that works offline
ultra-compressed 1-bit language model family built for capable local inference on memory-constrained phones and edge hardware.
speed
44 tok/s
key features
True 1-bit end-to-end weights
Bonsai 8B fits in roughly 1.15 GB of RAM
8x faster and 4–5x lower energy use on edge hardware
Built for local agents and code completion without cloud dependence
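As a rough sanity check on the memory figure, the sketch below computes raw weight storage from parameter count and bits per weight. It assumes the nominal parameter counts implied by the model names (8e9, 4e9, 1.7e9) and decimal gigabytes; the helper name `weight_footprint_gb` is made up for illustration. At exactly 1 bit per weight, 8 billion parameters come to about 1.0 GB, so the quoted 1.15 GB presumably includes some overhead beyond the raw weights (that part is a guess, not something the listing states).

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts read off the model names; exact counts are an assumption.
FAMILY = {"Bonsai 8B": 8e9, "Bonsai 4B": 4e9, "Bonsai 1.7B": 1.7e9}

for name, n_params in FAMILY.items():
    one_bit = weight_footprint_gb(n_params, 1)   # 1-bit weights
    fp16 = weight_footprint_gb(n_params, 16)     # 16-bit baseline for comparison
    print(f"{name}: ~{one_bit:.2f} GB at 1 bit vs ~{fp16:.1f} GB at 16 bits")
```

The 16-bit column is only there for scale; the listing does not say which baseline its speed and energy comparisons use.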
spec & usage
Apache 2.0 family spanning 8B, 4B, and 1.7B variants for phones, wearables, and industrial sensors
Designed around custom 1-bit kernels for Apple Silicon and NVIDIA hardware
Strong fit for conversational agents, local copilots, and robotics control loops
limitations
Needs a custom llama.cpp fork or MLX path to hit full 1-bit speedups
scope:
code, language, agent, research, local, open-source, free, fast, lightweight