132 lines
7.3 KiB
Markdown
132 lines
7.3 KiB
Markdown
---
|
|
license: apache-2.0
|
|
license_link: https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Qwen2.5-C-7B-I/blob/main/LICENSE
|
|
language:
|
|
- en
|
|
base_model:
|
|
- Qwen/Qwen2.5-Coder-7B-Instruct
|
|
pipeline_tag: text-generation
|
|
tags:
|
|
- verilog
|
|
- reasoning
|
|
- multi-agent
|
|
---
|
|
|
|
<p align="center">
|
|
<img alt="SiliconMind Logo" src="https://raw.githubusercontent.com/AS-SiliconMind/SiliconMind-V1/refs/heads/gh-pages/images/logo.webp"/>
|
|
</p>
|
|
|
|
# SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
|
|
|
|
## Model Overview
|
|
**SiliconMind-V1** is a family of open-source Large Language Models (LLMs) specialized for Verilog code generation, testing, and debugging. Unlike previous approaches that rely heavily on commercial models or external EDA tools, SiliconMind-V1 is locally fine-tuned to iteratively **generate**, **test**, and **debug** RTL designs through test-time scaling.
|
|
|
|
The **SiliconMind-V1** models are enabled by a unified multi-agent framework for reasoning-oriented training data generation with integrated testbench-driven verification to achieve state-of-the-art functional correctness on major benchmarks.
|
|
|
|
**Key Features:**
|
|
* **Reasoning-Oriented:** Trained to "think" before coding, producing reasoning traces that guide functional correctness.
|
|
* **Self-Testing & Debugging:** Capable of generating its own test report to fix bugs without tool-calling.
|
|
* **Tool-Free Verification:** Reduces reliance on expensive, proprietary EDA software during the generation loop.
|
|
* **Multi-Strategy Inference:** Supports Regular, Deep Thinking, and Agentic inference modes for scalable performance.
|
|
|
|
## Model Variants
|
|
We provide SiliconMind-V1 variants fine-tuned from the following base models:
|
|
|
|
| Model Name | Base Model | Size |
|
|
|:---|:---|:---|
|
|
| [**SiliconMind-V1-Qwen2.5-C-7B-I**](https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Qwen2.5-C-7B-I) | Qwen2.5-Coder-7B-Instruct | 7B |
|
|
| [**SiliconMind-V1-Qwen3-4B-T-2507**](https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Olmo-3-7B-Think) | Qwen3-4B-Thinking-2507 | 4B |
|
|
| [**SiliconMind-V1-Qwen3-8B**](https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Qwen3-8B) | Qwen3-8B | 8B |
|
|
| [**SiliconMind-V1-Olmo-3-7B-Think**](https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Qwen3-4B-T-2507) | Olmo-3-7B-Think | 7B |
|
|
|
|
### Model Sources
|
|
|
|
- **Project Page:** https://AS-SiliconMind.github.io/SiliconMind-V1
|
|
- **Repositories:**
|
|
- Inference Engine: https://github.com/AS-SiliconMind/SiliconMind-V1
|
|
- **Paper:** arxiv
|
|
|
|
## Usage & Inference Strategies
|
|
|
|
SiliconMind-V1 is designed to work with three distinct inference strategies, allowing users to trade off between latency/cost and accuracy. Please refer to our [inference engine](https://github.com/AS-SiliconMind/SiliconMind-V1) for more details on how to get started with **SiliconMind-V1**.
|
|
|
|
### 1. Regular Strategy
|
|
The model acts as a standard code generator but is prompted to produce a reasoning trace before the final code.
|
|
* **Best for:** Quick prototyping and simple modules.
|
|
|
|
### 2. Deep Thinking Strategy
|
|
Explicit instructions are given to the model to solve the problem by:
|
|
1. Drafting an initial solution.
|
|
2. Mentally "testing" it against scenarios.
|
|
3. Self-debugging within the reasoning trace.
|
|
* **Best for:** Complex logic where single-pass generation often fails.
|
|
|
|
### 3. Agentic Strategy (Recommended for SOTA Results)
|
|
A multi-turn workflow where the model plays different "Agent" roles sequentially:
|
|
1. **Solution Agent:** Generates initial code + reasoning.
|
|
2. **Test Agent:** Generates a test report for the code.
|
|
3. **Debug Agent:** Reviews the test report and fixes errors.
|
|
* **Performance:** Achieves the highest pass rates (Pass@1) by allowing iterative refinement (up to 3 interactions recommended).
|
|
|
|
## Training
|
|
The models were trained on a Multi-Faceted Dataset constructed via a custom two-phase pipeline:
|
|
|
|
* **Code Generation Phase:** A multi-agent system (Revision, Solution, Testbench, Verification Agents) synthesized 36k functionally verified (problem, reasoning, code, testbench) tuples from public sources.
|
|
|
|
* **Self-Correction Phase:** The model was stress-tested against these problems. Hard samples (where the model failed) were augmented with "Test" and "Debug" curriculum, teaching the model how to write test reports and fix its own errors.
|
|
|
|
## Evaluation: Pass@1 Performance (%) Across Major Verilog Benchmarks
|
|
|
|
| Model Name | Base Model | RTLLM-v2 | VerilogEval-v2 | VerilogEval-v2-NTU | CVDP-cid02&03 |
|
|
| :--- | :--- | :---: | :---: | :---: | :---: |
|
|
| *Foundation Models:* | | | | | |
|
|
| DeepSeek-R1-0528 | -- | 68.7 | 80.9 | 86.4 | 25.6 |
|
|
| gpt-oss-120b (high) | -- | 70.0 | 83.2 | 87.9 | 27.6 |
|
|
| Qwen3-32B | -- | 55.4 | 70.3 | 76.3 | 12.8 |
|
|
| Qwen3-14B | -- | 50.0 | 64.2 | 69.5 | 12.9 |
|
|
| | | | | | |
|
|
| Qwen2.5-C-7B-I | -- | 29.3 | 31.5 | 33.6 | 7.3 |
|
|
| Qwen3-4B-T-2507 | -- | 36.4 | 48.2 | 52.5 | 12.4 |
|
|
| Qwen3-8B | -- | 40.2 | 53.7 | 57.4 | 11.9 |
|
|
| Olmo-3-7B-Think | -- | 10.4 | 7.8 | 8.9 | 1.2 |
|
|
| *Fine-tuned Models:* | | | | | |
|
|
| CodeV-R1-7B-Distill | Qwen2.5-C-7B-I | 58.5 | 66.4 | 69.6 | 19.0 |
|
|
| CodeV-R1-7B | Qwen2.5-C-7B-I | 🥉 **66.1** | **69.7** | 73.2 | 21.3 |
|
|
| **SiliconMind-V1** | Qwen2.5-C-7B-I | 63.8 | **69.7** | **73.9** | 🥉 **22.3** |
|
|
| **SiliconMind-V1** | Qwen3-4B-T-2507 | 🥇 67.9 | 🥈 76.4 | 🥇 82.0 | 🥈 23.5 |
|
|
| **SiliconMind-V1** | Qwen3-8B | 🥈 66.6 | 🥇 76.5 | 🥈 81.0 | 🥇 24.0 |
|
|
| **SiliconMind-V1** | Olmo-3-7B-Think | 63.3 | 🥉 73.5 | 🥉 79.5 | 21.2 |
|
|
|
|
<br>
|
|
|
|
**Note:** - **Bold** values denote the better-performing model between CodeV-R1 and ours using the same base model.
|
|
- Rankings among specialized models: 🥇 First, 🥈 Second, 🥉 Third.
|
|
- For brevity, we refer to *Qwen2.5-Coder-7b-Instruct* as *Qwen2.5-C-7B-I* and *Qwen3-4B-Thinking-2507* as *Qwen3-4B-T-2507*.
|
|
- **SiliconMind-V1** models' results were obtained using the Agentic Strategy, and we allow up to 3 Test/Debug Agent interactions.
|
|
|
|
## License
|
|
**SiliconMind-V1** is licensed under [Apache 2.0](https://huggingface.co/AS-SiliconMind/SiliconMind-V1-Qwen2.5-C-7B-I/blob/main/LICENSE).
|
|
<br>
|
|
The base models' licenses:
|
|
[Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/LICENSE),
|
|
[Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507/blob/main/LICENSE),
|
|
[Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE),
|
|
[Olmo-3-7B-Think](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) ([Responsible Use Guidelines](https://allenai.org/responsible-use)).
|
|
|
|
## Acknowledgements
|
|
We acknowledge the financial support from Academia Sinica's SiliconMind Project (AS-IAIA-114-M11). We also thank the National Center for High-Performance Computing (NCHC) for providing computational and storage resources, and Taipei-1 for providing H100 computing resources. In addition, we acknowledge financial support from the National Science and Technology Council.
|
|
|
|
## Citation
|
|
|
|
**BibTeX:**
|
|
```
|
|
@misc{Chen2026SiliconMindV1,
|
|
title = {{SiliconMind-V1}: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation},
|
|
author = {Mu-Chi Chen and Yu-Hung Kao and Po-Hsuan Huang and Shao-Chun Ho
|
|
and Hsiang-Yu Tsou and I-Ting Wu and En-Ming Huang
|
|
and Yu-Kai Hung and Wei-Po Hsin and Cheng Liang
|
|
and Chia-Heng Tu and Shih-Hao Hung and H.T. Kung},
|
|
year = {2026},
|
|
url = {https://AS-SiliconMind.github.io/SiliconMind-V1}
|
|
}
|
|
``` |