What exactly is a 1-bit LLM and how does BitNet work?
Quick Answer
BitNet uses ternary weights {-1, 0, +1} instead of 16/32-bit floats, shrinking a 2B-parameter model to roughly 400MB and replacing most multiplications with additions for faster, more energy-efficient inference.
Detailed Answer
A so-called 1-bit LLM constrains each weight to one of three values, {-1, 0, +1}, instead of a traditional 16- or 32-bit floating-point number. Strictly speaking, BitNet b1.58 stores ~1.58 bits per parameter, since log₂(3) ≈ 1.58. This is where the memory figure comes from: 2 billion parameters × ~1.58 bits ≈ 3.2 gigabits, or roughly 400MB, the footprint of Microsoft's 2B-parameter BitNet b1.58 model. And because every weight is -1, 0, or +1, the expensive floating-point multiplications inside matrix products collapse into integer additions and subtractions, making inference much faster and more energy-efficient while the model remains competitive with full-precision models of similar size.
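To make the two ideas concrete, here is a minimal NumPy sketch of the absmean quantization scheme described in the BitNet b1.58 paper (scale by the mean absolute weight, round, clip to [-1, 1]) and of a matrix-vector product that needs no per-weight multiplications. The function names and toy dimensions are illustrative, not taken from the BitNet codebase.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray):
    """Quantize a float weight matrix to {-1, 0, +1} with a per-tensor scale.

    Follows the absmean scheme from BitNet b1.58: divide by the mean
    absolute value, round to the nearest integer, clip to [-1, 1].
    """
    gamma = np.abs(w).mean() + 1e-8                       # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1).astype(np.int8)
    return w_q, gamma

def ternary_matvec(w_q: np.ndarray, gamma: float, x: np.ndarray) -> np.ndarray:
    """Multiplication-free matrix-vector product with ternary weights.

    Each weight is -1, 0, or +1, so every output row reduces to adding
    the activations where w == +1 and subtracting those where w == -1.
    The only multiplication left is one rescale by gamma per output.
    """
    pos = (w_q == 1) @ x       # sum of activations paired with +1
    neg = (w_q == -1) @ x      # sum of activations paired with -1
    return gamma * (pos - neg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)
    x = rng.normal(size=8).astype(np.float32)

    w_q, gamma = absmean_ternary_quantize(w)
    print("ternary weights:\n", w_q)
    # The additions-only product approximates the full-precision one.
    print("ternary:", ternary_matvec(w_q, gamma, x))
    print("float  :", w @ x)
```

Production implementations such as Microsoft's bitnet.cpp go further, packing the ternary values into about 2 bits each and using lookup-table kernels, but the arithmetic reduction sketched above is the core reason inference is so cheap.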

