Initialize project; model provided by the ModelHub XC community
Model: oisee/qwen2.5-coder-abap Source: Original Platform
README.md (new file, 193 lines)

---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- abap
- sap
- code
- orpo
- fine-tuned
- qwen2
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Qwen-Coder-ABAP

Fine-tuned [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for **modern ABAP 7.4+ code generation**.

Trained with **ORPO (Odds Ratio Preference Optimization)** on 280 ABAP preference pairs, each pairing a modern solution with its legacy equivalent, to promote modern syntax and suppress legacy patterns.

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | Qwen2.5-Coder-7B-Instruct |
| Fine-tuning Method | ORPO |
| Training Examples | 280 preference pairs |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Training Epochs | 3 |
| Hardware | NVIDIA RTX 4060 Ti 16GB |

## Performance

Benchmarked on 12 ABAP coding tasks (modernization, basic coding, completion):

| Metric | Base Model | Fine-tuned | Improvement |
|--------|------------|------------|-------------|
| Modern ABAP patterns | 18 | 23 | +28% |
| Legacy patterns | 7 | 2 | -71% |
| Net score | +11 | +21 | +91% |
| Inference time | 74.7s | 23.5s | ~3.2x faster |
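
The derived column follows directly from the raw counts. The sketch below reproduces it, assuming the net score is the modern count minus the legacy count (this matches the table: 18 - 7 = 11 and 23 - 2 = 21). The helper name `pct_change` is illustrative and not part of any benchmark script.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Pattern counts copied from the benchmark table
base = {"modern": 18, "legacy": 7}
tuned = {"modern": 23, "legacy": 2}

# Assumption: net score = modern patterns - legacy patterns
base_net = base["modern"] - base["legacy"]
tuned_net = tuned["modern"] - tuned["legacy"]

modern_change = round(pct_change(base["modern"], tuned["modern"]))
legacy_change = round(pct_change(base["legacy"], tuned["legacy"]))
net_change = round(pct_change(base_net, tuned_net))

# Speedup is the ratio of the two inference times
speedup = round(74.7 / 23.5, 1)

print(modern_change, legacy_change, net_change, speedup)
```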

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer; torch_dtype="auto" keeps the checkpoint's dtype
model = AutoModelForCausalLM.from_pretrained("oisee/qwen-coder-abap", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("oisee/qwen-coder-abap")

messages = [
    {"role": "system", "content": "You are an ABAP programming assistant specialized in modern ABAP 7.4+ syntax."},
    {"role": "user", "content": "Convert this to modern ABAP: READ TABLE lt_data INTO ls_row WITH KEY id = 1."}
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

### Ollama

```bash
ollama run oisee/qwen-coder-abap "Convert READ TABLE to modern ABAP"
```

Also available as quantized GGUF: [ollama.com/oisee/qwen-coder-abap](https://ollama.com/oisee/qwen-coder-abap)
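
For scripting, the model can also be reached through Ollama's local HTTP API. The following is a minimal sketch using only the standard library. It assumes an Ollama server is running on its default port (11434) and that the model has already been pulled; `build_payload` and `modernize` are illustrative helper names, not part of this project.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumption)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "oisee/qwen-coder-abap") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def modernize(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `modernize("Convert READ TABLE to modern ABAP")` returns the model's answer as a string.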

## Modern ABAP Patterns (Promoted)

The model is trained to prefer these modern ABAP 7.4+ patterns:

```abap
" Inline declarations
DATA(lv_result) = calculate_total( ).
" FIELD-SYMBOL( ) is only valid in a declaration position such as ASSIGN or LOOP
ASSIGN lt_customers[ 1 ] TO FIELD-SYMBOL(<ls_first>).

" Table expressions (instead of READ TABLE)
DATA(ls_customer) = lt_customers[ id = '12345' ].

" NEW operator (instead of CREATE OBJECT)
DATA(lo_handler) = NEW zcl_handler( iv_config = 'DEFAULT' ).

" String templates (instead of CONCATENATE)
DATA(lv_msg) = |Customer { lv_id } has { lv_count } orders|.

" VALUE constructor (an inline declaration needs an explicit type; ty_t_data is a placeholder table type)
DATA(lt_data) = VALUE ty_t_data( ( id = 1 name = 'A' ) ( id = 2 name = 'B' ) ).

" REDUCE for aggregation (explicit result type, since # cannot be inferred here)
DATA(lv_sum) = REDUCE i( INIT s = 0 FOR row IN lt_data NEXT s = s + row-amount ).

" FILTER for table filtering
DATA(lt_active) = FILTER #( lt_data WHERE status = 'A' ).

" Modern LOOP with inline field-symbol
LOOP AT lt_data ASSIGNING FIELD-SYMBOL(<ls_row>).
  <ls_row>-processed = abap_true.
ENDLOOP.
```

## Legacy Patterns (Avoided)

The model learns to avoid these legacy patterns:

```abap
" Legacy - model avoids these
READ TABLE lt_data INTO ls_row WITH KEY id = 1.
CREATE OBJECT lo_handler.
CALL METHOD lo_handler->process.
CONCATENATE lv_a lv_b INTO lv_result.
MOVE lv_source TO lv_target.
MOVE-CORRESPONDING ls_source TO ls_target.
DATA: lv_var TYPE string.  " Colon syntax
```
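
Pattern counts like those in the Performance table could be approximated with simple regular expressions. The sketch below is not the scoring script used for the benchmark, which is not described here. The pattern lists and the `score_abap` helper are assumptions chosen to mirror the modern and legacy examples in this README.

```python
import re

# Assumed pattern lists, mirroring the examples in this README
MODERN_PATTERNS = [
    r"\bDATA\(",                  # inline declaration
    r"\bFIELD-SYMBOL\(",          # inline field-symbol
    r"\bNEW\s+[\w#]",             # NEW operator
    r"\bVALUE\s+[\w#]",           # VALUE constructor
    r"\|[^|]*\{[^}]*\}[^|]*\|",   # string template with an embedded expression
]
LEGACY_PATTERNS = [
    r"\bREAD\s+TABLE\b",
    r"\bCREATE\s+OBJECT\b",
    r"\bCALL\s+METHOD\b",
    r"\bCONCATENATE\b",
    r"\bMOVE\b",                  # also matches MOVE-CORRESPONDING
]


def count_matches(patterns: list, code: str) -> int:
    """Total number of matches of all patterns in the code."""
    return sum(len(re.findall(p, code, re.IGNORECASE)) for p in patterns)


def score_abap(code: str) -> dict:
    """Count modern and legacy patterns and report their difference as the net score."""
    modern = count_matches(MODERN_PATTERNS, code)
    legacy = count_matches(LEGACY_PATTERNS, code)
    return {"modern": modern, "legacy": legacy, "net": modern - legacy}


legacy_code = "READ TABLE lt_data INTO ls_row WITH KEY id = 1.\nCONCATENATE lv_a lv_b INTO lv_result."
modern_code = "DATA(ls_row) = lt_data[ id = 1 ].\nDATA(lv_msg) = |Customer { lv_id }|."

print(score_abap(legacy_code))
print(score_abap(modern_code))
```

Regex counting is crude: it ignores comments and string literals, so it is suitable for a rough comparison between models rather than as a syntax check.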

## Training Dataset

The ORPO training dataset contains **280 high-quality preference pairs** covering:

| Category | Examples | Patterns |
|----------|----------|----------|
| Constructor Expressions | 45 | VALUE #, NEW #, CORRESPONDING #, COND #, SWITCH #, REDUCE |
| Inline Declarations | 30 | DATA(), FIELD-SYMBOL(), @DATA for SELECT |
| String Templates | 25 | \|text { var }\| with formatting |
| Table Expressions | 35 | lt_table[ key = value ], OPTIONAL, DEFAULT |
| Modern SELECT | 25 | @DATA, INTO TABLE @, host variables |
| Exception Handling | 15 | TRY/CATCH with cx_root |
| AMDP/HANA | 12 | AMDP procedures, table functions |
| RAP/BDEF | 10 | Behavior definitions, draft handling |
| ALV/SALV | 15 | CL_SALV_TABLE patterns |
| Unit Testing | 18 | cl_abap_unit_assert patterns |
| Other | 50 | JSON, HTTP, File operations, BAL logging |

Each example contains:
- `prompt`: The coding task
- `chosen`: Modern ABAP solution (preferred)
- `rejected`: Legacy ABAP equivalent (discouraged)
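
A single record could look like the following. The dataset's storage format is not specified here, so this sketch assumes plain dictionaries. The sample task and the `validate_pair` helper are hypothetical; the ABAP snippets reuse statements shown earlier in this README.

```python
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

# Hypothetical record; not taken from the actual dataset
example_pair = {
    "prompt": "Read the row with id = 1 from lt_data.",
    "chosen": "DATA(ls_row) = lt_data[ id = 1 ].",
    "rejected": "READ TABLE lt_data INTO ls_row WITH KEY id = 1.",
}


def validate_pair(pair: dict) -> list:
    """Return a list of problems; an empty list means the pair is usable."""
    problems = [
        f"missing or empty field: {name}"
        for name in REQUIRED_FIELDS
        if not str(pair.get(name, "")).strip()
    ]
    # A preference pair carries no signal if both completions are the same
    if not problems and pair["chosen"].strip() == pair["rejected"].strip():
        problems.append("chosen and rejected are identical")
    return problems


print(validate_pair(example_pair))
```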

## Training Configuration

```python
from trl import ORPOConfig

# ORPO config
orpo_config = ORPOConfig(
    max_length=1536,
    beta=0.1,  # ORPO penalty strength
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    optim="adamw_8bit",
)

# LoRA config, collected as keyword arguments
lora_kwargs = dict(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```
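
These settings imply a small number of optimizer steps. The following back-of-the-envelope sketch assumes a single GPU (consistent with the hardware listed under Model Details), all 280 pairs used for training with no held-out split, and a final partial accumulation batch counted as one step. The scaling factor shown is the standard LoRA ratio `lora_alpha / r`.

```python
import math

# Values taken from the configuration above
num_pairs = 280
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_train_epochs = 3
lora_r, lora_alpha = 32, 64

num_gpus = 1  # assumption: single-GPU training

# Examples consumed per optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer steps per epoch and over the whole run
steps_per_epoch = math.ceil(num_pairs / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Standard LoRA scaling applied to the low-rank update
lora_scaling = lora_alpha / lora_r

print(effective_batch_size, steps_per_epoch, total_steps, lora_scaling)
```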

## Limitations

- Focused on ABAP 7.4+ syntax; may not cover all SAP-specific APIs
- Training data is synthetic; real-world edge cases may not be represented
- Best suited to code modernization and generation tasks
- 7B-parameter model; larger models may produce higher-quality output on complex tasks

## License

Apache 2.0 (inherited from [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct))

## Citation

```bibtex
@misc{qwen-coder-abap,
  author = {oisee},
  title = {Qwen-Coder-ABAP: Fine-tuned Qwen2.5-Coder for Modern ABAP},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/oisee/qwen-coder-abap}
}
```

## Acknowledgments

- [Qwen Team](https://github.com/QwenLM) for Qwen2.5-Coder
- [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning
- [TRL](https://github.com/huggingface/trl) for ORPO implementation