---
license: apache-2.0
base_model: LiquidAI/LFM2.5-1.2B-Instruct
language:
- en
pipeline_tag: text-generation
tags:
- 4-bit
- 8-bit
- blackwell-optimized
- dgx-spark
- gguf
- liquid-ai
- quantized
- sm121
---

## πŸš€ v0.1.6: Real-time Metrics & Blackwell-Optimized Docker (Recommended)

This model is fully compatible with **[DGX-Spark-llama.cpp-Bench](https://github.com/sowilow/DGX-Spark-llama.cpp-Bench)**, an inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware. Earlier releases (v0.1.5, v0.1.4) remain available on GHCR under their own tags.

### 🌟 Key Features (v0.1.6)
- **Real-time Performance Metrics**: Visualizes `Input TPS` and `Output TPS` during streaming.
- **Intelligent Reasoning UI**: Automatically extracts, renders, and stabilizes the model's Chain-of-Thought (CoT).
- **Blackwell Optimization**: Native support for ARM64/SM121 and CUDA 13.0 FP4.
- **One-Click Deployment**: Standardized environment via the GHCR Docker image.

### 🐳 Quick Start

```bash
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6
```

For detailed configuration and usage, visit the [GitHub Repository](https://github.com/sowilow/DGX-Spark-llama.cpp-Bench).
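Once the image is pulled, the container still needs to be started and pointed at a model file. The repository README is authoritative here; the command below is only a minimal sketch that assumes the image ships llama.cpp's `llama-server` binary and that a GGUF file from this repo sits in a local `models/` directory. The container command, port, and offload values are assumptions, not documented flags of the bench image.

```bash
# Hypothetical serving sketch -- the container command and flags are
# assumptions; consult the DGX-Spark-llama.cpp-Bench README for the
# actual invocation.
docker run --rm --gpus all -p 8080:8080 \
  -v "$(pwd)/models:/models" \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6 \
  llama-server -m /models/lfm2.5-1.2b-instruct-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080 -ngl 99
```

`--gpus all` requires the NVIDIA Container Toolkit; `-ngl 99` offloads all layers to the GPU, which a 1.2B model fits comfortably.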
---

# LFM2.5-1.2B-Instruct-DGX-Spark-GGUF

This repository contains GGUF-quantized weights for **LFM2.5-1.2B-Instruct**, specifically optimized for **NVIDIA Blackwell (DGX Spark)** hardware.

## πŸš€ Key Features

- **Hardware Optimized**: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
- **Quantization**:
  - **Q4_K_M**: Balanced performance and accuracy.
  - **Q8_0**: High precision preservation.
- **Base Model Integration**: Linked directly to the original [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct).

## βš–οΈ License & Attribution

This model is a quantized version of the original [LiquidAI/LFM2.5-1.2B-Instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) and is subject to its original license (Apache-2.0).

## πŸ“‚ Files Included

- `lfm2.5-1.2b-instruct-q4_k_m.gguf`: 4-bit quantized model.
- `lfm2.5-1.2b-instruct-q8_0.gguf`: 8-bit quantized model.

Both files can also be served with a stock llama.cpp build; see the appendix at the end of this card.

---

*Created using [DGX-Spark-llama.cpp-Bench](https://github.com/sowilow/DGX-Spark-llama.cpp-Bench)*
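---

## πŸ’‘ Appendix: Running the GGUF Files Directly

For use without Docker, any recent llama.cpp build should load these files. The sketch below assumes the `llama-cli` and `llama-server` binaries are on `PATH`; `<hf-repo-id>` is a placeholder for this repository's Hugging Face id, and the prompt and token count are arbitrary.

```bash
# Fetch the 4-bit file from the Hub (replace <hf-repo-id> with this repo's id)
huggingface-cli download <hf-repo-id> \
  lfm2.5-1.2b-instruct-q4_k_m.gguf --local-dir models

# Quick one-shot generation to verify the file loads
llama-cli -m models/lfm2.5-1.2b-instruct-q4_k_m.gguf -p "Hello" -n 64

# Serve an OpenAI-compatible endpoint, then query it once the server is up
llama-server -m models/lfm2.5-1.2b-instruct-q4_k_m.gguf --port 8080 &
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}]}'
```

Swap in `lfm2.5-1.2b-instruct-q8_0.gguf` where precision matters more than memory footprint.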