---
language: en
tags:
- semantic-ids
- recommendation
- generative-retrieval
- qwen3
- fine-tuned
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
datasets:
- amazon-pet-supplies
---
# Qwen3-1.7B Fine-tuned for Semantic ID Recommendation

## Overview

Qwen3-1.7B fine-tuned for generative product recommendation via hierarchical semantic identifiers. The model generates 4-level Semantic IDs (`<|sid_start|><|A#|><|B#|><|C#|><|D#|><|sid_end|>`) given product descriptions, purchase histories, or co-purchase contexts.
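
For concreteness, a generated SID can be parsed back into its four codebook indices. A minimal sketch, assuming each level token renders with a numeric index such as `<|A12|>` (the `#` in the format string above is a placeholder; `parse_sid` is illustrative and not part of the released code):

```python
import re

# Matches the 4-level SID layout described above; assumes numeric indices.
SID_PATTERN = re.compile(
    r"<\|sid_start\|><\|A(\d+)\|><\|B(\d+)\|><\|C(\d+)\|><\|D(\d+)\|><\|sid_end\|>"
)

def parse_sid(text: str):
    """Return the (A, B, C, D) codebook indices, or None if no SID is found."""
    match = SID_PATTERN.search(text)
    return tuple(int(g) for g in match.groups()) if match else None

print(parse_sid("<|sid_start|><|A12|><|B201|><|C7|><|D133|><|sid_end|>"))
# (12, 201, 7, 133)
```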

This is the smaller model in a controlled comparison experiment (1.7B vs 8B) conducted under identical training conditions.

## Training

### Stage 1: Vocabulary Expansion
- Added 1,027 special tokens (3 structural + 4×256 codebook tokens)
- Trained only the embedding matrices (0.3% of parameters); see the sketch after this list
- 2,000 steps, LR 1×10⁻³, batch size 64
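
A minimal sketch of this stage using the Transformers API. The `<|A0|>`-style codebook token strings, the third structural token name (`<|sid_sep|>`), and the gradient masking of pre-existing rows are assumptions: the card names only `<|sid_start|>` and `<|sid_end|>`, and the quoted 0.3% of parameters corresponds to updating just the 1,027 new embedding rows.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")

# 3 structural + 4 levels x 256 codebook tokens = 1,027 new tokens.
# "<|sid_sep|>" is a placeholder: the card names only the other two.
structural = ["<|sid_start|>", "<|sid_end|>", "<|sid_sep|>"]
codebook = [f"<|{level}{i}|>" for level in "ABCD" for i in range(256)]
tokenizer.add_special_tokens({"additional_special_tokens": structural + codebook})
model.resize_token_embeddings(len(tokenizer))

# Freeze everything, then let gradients flow only through the embeddings.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

# Restrict updates to the 1,027 new rows (~0.3% of parameters) by
# zeroing gradients on all pre-existing embedding rows.
new_ids = torch.tensor(tokenizer.convert_tokens_to_ids(structural + codebook))

def keep_new_rows_only(grad: torch.Tensor) -> torch.Tensor:
    mask = torch.zeros_like(grad)
    mask[new_ids] = 1.0
    return grad * mask

model.get_input_embeddings().weight.register_hook(keep_new_rows_only)
```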

### Stage 2: Full Fine-tuning
- **Dataset**: 4,719,994 instruction-formatted conversations (Amazon Pet Supplies)
- **Task types**: text→SID, sequential recommendation, co-purchase prediction
- **Optimizer**: AdamW 8-bit, LR 2×10⁻⁵, cosine schedule with min LR (0.2× peak)
- **Warmup**: 3%, weight decay 0.01
- **Batch**: 64 × 2 = 128 effective, 3 epochs
- **Techniques**: custom instruction masking, greedy sequence packing (~3× throughput); see the sketch after this list
- **Hardware**: NVIDIA H100 80GB (vast.ai)
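
A minimal sketch of both techniques under stated assumptions: the standard `-100` ignore-index convention for loss masking and a first-fit greedy packer. The actual implementation details (template boundaries, cross-example attention masking inside packed sequences) are not documented on this card.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def mask_instruction(input_ids: list[int], prompt_len: int) -> list[int]:
    """Instruction masking: loss is computed only on the response tokens
    (the SID), never on the instruction/prompt part."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

def greedy_pack(examples: list[list[int]], max_len: int) -> list[list[int]]:
    """First-fit greedy packing: concatenate short examples into max_len
    buffers to cut padding waste (the card reports ~3x throughput)."""
    bins: list[list[int]] = []
    for ex in sorted(examples, key=len, reverse=True):
        for b in bins:
            if len(b) + len(ex) <= max_len:
                b.extend(ex)
                break
        else:
            bins.append(list(ex))
    return bins

# Example: three toy sequences packed into buffers of length 8.
print(greedy_pack([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]], max_len=8))
# [[6, 7, 8, 9, 10, 1, 2, 3], [4, 5]]
```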

## Results

Hierarchical SID prediction accuracy (A-level match, greedy decoding; metric sketched below):

| Task | Accuracy |
|------|----------|
| Text → SID | 59.9% |
| Sequential recommendation | 7.0% |
| Co-purchase prediction | 5.5% |

Evaluation: 3,000 samples per task, 11 task types.
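
The A-level match metric can be made precise with a short helper. A minimal sketch, assuming `<|A12|>`-style level tokens; the evaluation harness itself is not published on this card:

```python
import re

A_TOKEN = re.compile(r"<\|A(\d+)\|>")

def a_level_match(prediction: str, reference: str) -> bool:
    """True when the coarsest (A-level) code of the greedy prediction
    agrees with the reference SID."""
    pred, ref = A_TOKEN.search(prediction), A_TOKEN.search(reference)
    return pred is not None and ref is not None and pred.group(1) == ref.group(1)

def a_level_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, reference) pairs with an A-level match."""
    return sum(a_level_match(p, r) for p, r in pairs) / len(pairs)
```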

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kalistratov/qwen3-1.8b-semantic-ids")
tokenizer = AutoTokenizer.from_pretrained("kalistratov/qwen3-1.8b-semantic-ids")
```
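
Generating a SID continues from the snippet above with the usual `generate` flow. A hedged sketch: the prompt below is invented for illustration (the actual instruction template used in training is not documented on this card), while greedy decoding matches the evaluation setup:

```python
# Hypothetical prompt; the real instruction template may differ.
prompt = "Product: orthopedic dog bed for large breeds. Semantic ID:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8, do_sample=False)  # greedy
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated))
# Expected form: <|sid_start|><|A#|><|B#|><|C#|><|D#|><|sid_end|>
```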

## Citation

Master's thesis, Moscow Institute of Physics and Technology (MIPT), 2026.

## References

1. Y. Sun et al., "OpenOneRec," arXiv:2502.18851, 2025.
2. J. Liu et al., "PLUM," arXiv:2406.12346, 2024.
3. E. Yan, "semantic-ids-llm," GitHub, 2024.