Initialize project, model provided by the ModelHub XC community
Model: nuprl/MultiPL-T-StarCoderBase_1b Source: Original Platform
README.md (new file, 73 lines)
@@ -0,0 +1,73 @@
---
license: bigscience-openrail-m
library_name: transformers
tags:
- code
- gpt_bigcode
datasets:
- nuprl/MultiPL-T
metrics:
- code_eval
model-index:
- name: MultiPLCoder-1b-OCaml
  results:
  - task:
      type: text-generation
    dataset:
      name: MultiPL-HumanEval (Lua)
      type: nuprl/MultiPL-E
    metrics:
    - type: pass@1
      value: 0.173
      name: pass@1
      verified: true
    - type: pass@1
      value: 0.113
      name: pass@1
      verified: true
    - type: pass@1
      value: 0.097
      name: pass@1
      verified: true
---
# MultiPLCoder-1b

The 1-billion-parameter version of MultiPLCoder, a set of StarCoder-based models finetuned on the [MultiPL-T dataset](https://huggingface.co/datasets/nuprl/MultiPL-T).
These models are state-of-the-art on low-resource languages such as Lua, Racket, and OCaml.

## Language Revision Index

This index lists the best-performing model revision for each language.

| Language | Revision ID                                | Epoch |
| -------- | ------------------------------------------ | ----- |
| Lua      | `7e96d931547e342ad0661cdd91236fe4ccf52545` | 3     |
| Racket   | `2cdc541bee1db4da80c0b43384b0d6a0cacca5b2` | 5     |
| OCaml    | `e8a24f9e2149cbda8c3cca264a53c2b361b7a031` | 6     |
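
If you switch between languages often, it may help to mirror this table as a lookup in code. A minimal sketch; the `BEST_REVISIONS` name is illustrative, not part of this repository:

```py
# Hypothetical lookup mirroring the revision table above (name is illustrative).
BEST_REVISIONS = {
    "lua": "7e96d931547e342ad0661cdd91236fe4ccf52545",
    "racket": "2cdc541bee1db4da80c0b43384b0d6a0cacca5b2",
    "ocaml": "e8a24f9e2149cbda8c3cca264a53c2b361b7a031",
}
```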

## Usage

To use one of the models in this repository, first select the commit revision for that model from the table above.
For example, to use the Lua model:
```py
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nuprl/MultiPLCoder-1b")

# Pin the checkpoint to the Lua revision from the table above.
lua_revision = "7e96d931547e342ad0661cdd91236fe4ccf52545"
model = AutoModelForCausalLM.from_pretrained("nuprl/MultiPLCoder-1b", revision=lua_revision)
```
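
If a GPU is available, the same call can load the pinned revision in half precision. A minimal sketch, assuming PyTorch with a CUDA device and the `accelerate` package installed:

```py
import torch
from transformers import AutoModelForCausalLM

# Assumption: a CUDA GPU is present and `accelerate` is installed for device_map="auto".
model = AutoModelForCausalLM.from_pretrained(
    "nuprl/MultiPLCoder-1b",
    revision=lua_revision,
    torch_dtype=torch.float16,
    device_map="auto",
)
```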

Note that the model's default configuration does not enable caching, so you must pass `use_cache=True` when generating.
```py
# Encode a Lua comment as the prompt, then sample with the cache enabled.
toks = tokenizer.encode("-- Hello World", return_tensors="pt")
out = model.generate(toks, use_cache=True, do_sample=True, temperature=0.2, top_p=0.95, max_length=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Example output:

```
-- Hello World!
-- :param name: The name of the person to say hello to
-- :return: A greeting
local function say_hello(name)
    return "Hello ".. name
end
```
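
Because `do_sample=True` makes decoding stochastic, the completion above will vary between runs. A minimal sketch using `transformers.set_seed` if you need reproducible samples:

```py
from transformers import set_seed

# Fix the relevant RNGs so repeated runs sample the same completion.
set_seed(1234)
out = model.generate(toks, use_cache=True, do_sample=True, temperature=0.2, top_p=0.95, max_length=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```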