Feature Request: Support for Bonsai-8B 1-bit Quantized Model #474

@kamjin3086

Description

Hello,

I'd like to request support for Prism-ML's Bonsai-8B, a groundbreaking 1-bit quantized model available on Hugging Face:

Model URL: https://huggingface.co/prism-ml/Bonsai-8B-gguf

Why This Matters

Bonsai-8B represents a significant advancement in efficient LLM deployment:

  • Compact Size: An 8B parameter model compressed to just ~1.15 GB
  • Impressive Performance: Achieves an MMLU-R score of 65.7, competitive with larger models
  • Energy Efficient: Reported to be 5x more energy efficient than comparable 8B models
  • Dramatically Smaller: 14x smaller footprint than typical 8B models
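The size and compression figures above are easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes weights are stored at roughly 1 bit each, with a hypothetical ~0.15 GB of overhead (embeddings, quantization scales, metadata); the exact overhead for Bonsai-8B is an assumption, not a published number:

```python
# Back-of-envelope footprint check for an 8B-parameter 1-bit model.
# ASSUMPTION: ~1 bit per matmul weight plus ~0.15 GB of overhead
# (embeddings, per-block scales, metadata) — illustrative, not official.
PARAMS = 8e9                                  # 8B parameters
raw_bytes = PARAMS * 1 / 8                    # 1-bit weights -> ~1.0 GB
overhead_bytes = 0.15e9                       # hypothetical overhead
total_gb = (raw_bytes + overhead_bytes) / 1e9
print(f"~{total_gb:.2f} GB")                  # -> ~1.15 GB

# Compare against a plain fp16 checkpoint (2 bytes per weight).
fp16_gb = PARAMS * 16 / 8 / 1e9               # ~16 GB
print(f"~{fp16_gb / total_gb:.0f}x smaller")  # -> ~14x smaller
```

Under these assumptions the arithmetic lands on the ~1.15 GB footprint and ~14x reduction versus fp16 cited above.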

NPU Potential

The 1-bit quantization approach makes Bonsai-8B an ideal candidate for NPU acceleration. The dramatically reduced memory footprint and simplified computations could enable:

  • Smooth inference on edge devices with NPUs
  • Significantly reduced power consumption
  • Real-time AI capabilities without cloud dependency
  • Privacy-preserving local inference

Request

I'd love to see FastFlowLM add support for this model. Given its efficiency characteristics, it seems like a perfect fit for demonstrating NPU performance advantages with quantized models.

Thank you for considering this request!

Best regards
