---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- abap
- sap
- code
- orpo
- fine-tuned
- qwen2
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Qwen-Coder-ABAP

Fine-tuned [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) for **modern ABAP 7.4+ code generation**. Trained using **ORPO (Odds Ratio Preference Optimization)** on a high-quality dataset of 280 ABAP preference pairs to promote modern syntax and eliminate legacy patterns.

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | Qwen2.5-Coder-7B-Instruct |
| Fine-tuning Method | ORPO |
| Training Examples | 280 preference pairs |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Training Epochs | 3 |
| Hardware | NVIDIA RTX 4060 Ti 16GB |

## Performance

Benchmarked on 12 ABAP coding tasks (modernization, basic coding, completion):

| Metric | Base Model | Fine-tuned | Improvement |
|--------|------------|------------|-------------|
| Modern ABAP patterns | 18 | 23 | +28% |
| Legacy patterns | 7 | 2 | -71% |
| Net score | +11 | +21 | +91% |
| Inference time | 74.7s | 23.5s | ~3x faster |

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("oisee/qwen-coder-abap")
tokenizer = AutoTokenizer.from_pretrained("oisee/qwen-coder-abap")

messages = [
    {"role": "system", "content": "You are an ABAP programming assistant specialized in modern ABAP 7.4+ syntax."},
    {"role": "user", "content": "Convert this to modern ABAP: READ TABLE lt_data INTO ls_row WITH KEY id = 1."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Ollama

```bash
ollama run oisee/qwen-coder-abap "Convert READ TABLE to modern ABAP"
```

Also available as a
quantized GGUF: [ollama.com/oisee/qwen-coder-abap](https://ollama.com/oisee/qwen-coder-abap)

## Modern ABAP Patterns (Promoted)

The model is trained to prefer these modern ABAP 7.4+ patterns:

```abap
" Inline declarations
DATA(lv_result) = calculate_total( ).
ASSIGN lt_data[ 1 ] TO FIELD-SYMBOL(<fs_row>).

" Table expressions (instead of READ TABLE)
DATA(ls_customer) = lt_customers[ id = '12345' ].

" NEW operator (instead of CREATE OBJECT)
DATA(lo_handler) = NEW zcl_handler( iv_config = 'DEFAULT' ).

" String templates (instead of CONCATENATE)
DATA(lv_msg) = |Customer { lv_id } has { lv_count } orders|.

" VALUE constructor
DATA(lt_data) = VALUE #( ( id = 1 name = 'A' ) ( id = 2 name = 'B' ) ).

" REDUCE for aggregation
DATA(lv_sum) = REDUCE #( INIT s = 0 FOR row IN lt_data NEXT s = s + row-amount ).

" FILTER for table filtering
DATA(lt_active) = FILTER #( lt_data WHERE status = 'A' ).

" Modern LOOP with inline field-symbol
LOOP AT lt_data ASSIGNING FIELD-SYMBOL(<fs>).
  <fs>-processed = abap_true.
ENDLOOP.
```

## Legacy Patterns (Avoided)

The model learns to avoid these legacy patterns:

```abap
" Legacy - model avoids these
READ TABLE lt_data INTO ls_row WITH KEY id = 1.
CREATE OBJECT lo_handler.
CALL METHOD lo_handler->process.
CONCATENATE lv_a lv_b INTO lv_result.
MOVE lv_source TO lv_target.
MOVE-CORRESPONDING ls_source TO ls_target.
DATA: lv_var TYPE string.
" Colon syntax ``` ## Training Dataset The ORPO training dataset contains **280 high-quality preference pairs** covering: | Category | Examples | Patterns | |----------|----------|----------| | Constructor Expressions | 45 | VALUE #, NEW #, CORRESPONDING #, COND #, SWITCH #, REDUCE | | Inline Declarations | 30 | DATA(), FIELD-SYMBOL(), @DATA for SELECT | | String Templates | 25 | \|text { var }\| with formatting | | Table Expressions | 35 | lt_table[ key = value ], OPTIONAL, DEFAULT | | Modern SELECT | 25 | @DATA, INTO TABLE @, host variables | | Exception Handling | 15 | TRY/CATCH with cx_root | | AMDP/HANA | 12 | AMDP procedures, table functions | | RAP/BDEF | 10 | Behavior definitions, draft handling | | ALV/SALV | 15 | CL_SALV_TABLE patterns | | Unit Testing | 18 | cl_abap_unit_assert patterns | | Other | 50 | JSON, HTTP, File operations, BAL logging | Each example contains: - `prompt`: The coding task - `chosen`: Modern ABAP solution (preferred) - `rejected`: Legacy ABAP equivalent (discouraged) ## Training Configuration ```python # ORPO Config ORPOConfig( max_length=1536, beta=0.1, # ORPO penalty strength learning_rate=8e-6, per_device_train_batch_size=1, gradient_accumulation_steps=8, num_train_epochs=3, optim="adamw_8bit", ) # LoRA Config r=32, lora_alpha=64, lora_dropout=0.05 target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] ``` ## Limitations - Focused on ABAP 7.4+ syntax; may not cover all SAP-specific APIs - Training data is synthetic; real-world edge cases may vary - Best for code modernization and generation tasks - 7B parameter model; larger models may produce higher quality for complex tasks ## License Apache 2.0 (inherited from [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)) ## Citation ```bibtex @misc{qwen-coder-abap, author = {oisee}, title = {Qwen-Coder-ABAP: Fine-tuned Qwen2.5-Coder for Modern ABAP}, year = {2024}, publisher = {Hugging Face}, url = 
{https://huggingface.co/oisee/qwen-coder-abap}
}
```

## Acknowledgments

- [Qwen Team](https://github.com/QwenLM) for Qwen2.5-Coder
- [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning
- [TRL](https://github.com/huggingface/trl) for the ORPO implementation
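## Appendix: Preference-Pair Schema Example

As a concrete illustration of the `prompt`/`chosen`/`rejected` schema described in the Training Dataset section, a preference pair can be represented and sanity-checked with a few lines of Python. This is a minimal sketch: the `validate_pair` helper and the legacy-marker list are illustrative assumptions, not part of the released training pipeline.

```python
# Statements the "chosen" answer should avoid, per the Legacy Patterns section.
# (Illustrative subset; the actual dataset curation may differ.)
LEGACY_MARKERS = ("READ TABLE", "CREATE OBJECT", "CALL METHOD", "CONCATENATE", "MOVE ")

def validate_pair(record: dict) -> bool:
    """Return True if an ORPO preference pair is well-formed:
    all three fields present, and the chosen answer free of legacy statements."""
    required = {"prompt", "chosen", "rejected"}
    if not required <= record.keys():
        return False
    # ABAP is case-insensitive, so compare in upper case.
    chosen = record["chosen"].upper()
    return not any(marker in chosen for marker in LEGACY_MARKERS)

example = {
    "prompt": "Read the row with id = 1 from lt_data.",
    "chosen": "DATA(ls_row) = lt_data[ id = 1 ].",
    "rejected": "READ TABLE lt_data INTO ls_row WITH KEY id = 1.",
}
print(validate_pair(example))  # True: modern chosen answer, legacy rejected answer
```

A record failing the check (e.g. a `chosen` answer containing `READ TABLE`, or a missing `rejected` field) would be flagged for review rather than fed to the trainer.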