Initialize project; model provided by the ModelHub XC community

Model: DDIDU/ETRI_CodeLLaMA_7B_CPP Source: Original Platform
2026-06-07 02:14:18 +08:00
commit 3e907963ce
23 changed files with 94034 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,97 @@
+---
+license: llama2
+model-index:
+- name: ETRI_CodeLLaMA_7B_CPP
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      type: HumanEval-X
+      name: humanevalsynthesize-cpp
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 34.3%
+      verified: false
+---
+
+
+## **ETRI_CodeLLaMA_7B_CPP**
+
+We used LoRA to continue pre-training Meta's CodeLLaMA-7B-hf model on high-quality C++ code tokens.
+
+We then fine-tuned the resulting model on CodeM's C++ instruction data.
+
+## Model Details
+
+This model was trained using LoRA and achieved a pass@1 of 34.3% on the HumanEval-X C++ task (humanevalsynthesize-cpp).
+
+ETRI_CodeLLaMA_7B_CPP is a C++ specialized model.
+
+## Dataset Details
+
+We continued pre-training CodeLLaMA-7B on 543 GB of C++ code collected online, then fine-tuned it on CodeM's C++ instruction data. Training used a single A100-80GB GPU.
+
+## Requirements
+
+```
+pip install torch transformers accelerate
+```
+
+## How to reproduce HumanEval-X results
+
+We use the bigcode-evaluation-harness repository to evaluate our trained model.
+
+First, clone the repository:
+
+```
+git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
+```
+
+Then, run main.py as follows.
+
+```
+accelerate launch bigcode-evaluation-harness/main.py \
+  --model DDIDU/ETRI_CodeLLaMA_7B_CPP \
+  --max_length_generation 512 \
+  --prompt continue \
+  --tasks humanevalsynthesize-cpp \
+  --temperature 0.2 \
+  --n_samples 100 \
+  --precision bf16 \
+  --do_sample True \
+  --batch_size 10 \
+  --allow_code_execution \
+  --save_generations
+```
+
+## Model use
+
+```
+from transformers import AutoTokenizer
+import transformers
+import torch
+
+model = "DDIDU/ETRI_CodeLLaMA_7B_CPP"
+
+tokenizer = AutoTokenizer.from_pretrained(model)
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+
+sequences = pipeline(
+    '#include <iostream>\n#include <vector>\n\nusing namespace std;\n\nvoid quickSort(int *data, int start, int end) {',
+    do_sample=True,
+    top_k=10,
+    temperature=0.1,
+    top_p=0.95,
+    num_return_sequences=1,
+    eos_token_id=tokenizer.eos_token_id,
+    max_length=200,
+)
+for seq in sequences:
+    print(f"Result: {seq['generated_text']}")
+```
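
Note on the reported metric: the evaluation command in the README draws 100 samples per problem (`--n_samples 100`) and reports pass@1. As a minimal sketch, assuming the standard unbiased pass@k estimator is used, the per-problem score is 1 - C(n-c, k) / C(n, k) for n samples of which c pass, averaged over problems. The function names and the example counts below are hypothetical and are not taken from the harness, whose own implementation may differ in detail.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average the per-problem estimate over a benchmark."""
    scores = [pass_at_k(n, c, k) for c in correct_counts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-problem pass counts out of 100 samples each.
    counts = [0, 50, 100]
    print(f"pass@1 = {mean_pass_at_k(counts, n=100, k=1):.3f}")
```

For k = 1 the estimate reduces to c / n for each problem, so the reported pass@1 is the mean fraction of passing samples across problems.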