Files
qwen3-1.8b-semantic-ids/README.md
ModelHub XC aa99d04387 初始化项目,由ModelHub XC社区提供模型
Model: kalistratov/qwen3-1.8b-semantic-ids
Source: Original Platform
2026-05-10 07:18:20 +08:00

69 lines
2.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
language: en
tags:
- semantic-ids
- recommendation
- generative-retrieval
- qwen3
- fine-tuned
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
datasets:
- amazon-pet-supplies
---
# Qwen3-1.8B Fine-tuned for Semantic ID Recommendation
## Overview
Qwen3-1.8B fine-tuned for generative product recommendation via hierarchical semantic identifiers. The model generates 4-level Semantic IDs (`<|sid_start|><|A#|><|B#|><|C#|><|D#|><|sid_end|>`) given product descriptions, purchase histories, or co-purchase contexts.
This is the smaller model in a controlled comparison experiment (1.8B vs 8B) conducted under identical training conditions.
## Training
### Stage 1: Vocabulary Expansion
- Added 1,027 special tokens (3 structural + 4×256 codebook tokens)
- Trained only embedding matrices (0.3% of parameters)
- 2,000 steps, LR 1×10⁻³, batch 64
### Stage 2: Full Fine-tuning
- **Dataset**: 4,719,994 instruction-formatted conversations (Amazon Pet Supplies)
- **Task types**: text→SID, sequential recommendation, co-purchase prediction
- **Optimizer**: AdamW 8-bit, LR 2×10⁻⁵, cosine with min LR (0.2×peak)
- **Warmup**: 3%, weight decay 0.01
- **Batch**: 64 × 2 = 128 effective, 3 epochs
- **Techniques**: Custom instruction masking, greedy sequence packing (~3× throughput)
- **Hardware**: NVIDIA H100 80GB (vast.ai)
## Results
Hierarchical SID prediction accuracy (A-level match, greedy decoding):
| Task | Accuracy |
|------|----------|
| Text → SID | 59.9% |
| Sequential recommendation | 7.0% |
| Co-purchase prediction | 5.5% |
Evaluation: 3,000 samples per task, 11 task types.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("kalistratov/qwen3-1.8b-semantic-ids")
tokenizer = AutoTokenizer.from_pretrained("kalistratov/qwen3-1.8b-semantic-ids")
```
## Citation
Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.
## References
1. Y. Sun et al. "OpenOneRec," arXiv:2502.18851, 2025.
2. J. Liu et al. "PLUM," arXiv:2406.12346, 2024.
3. E. Yan. "semantic-ids-llm," GitHub, 2024.