Initialize project; model provided by the ModelHub XC community

Model: jeongseokoh/LatentSC_llama3.1_8b_6SummaryTokens Source: Original Platform
2026-05-26 03:47:15 +08:00
commit 1d54dc258b
13 changed files with 2799 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,104 @@
+---
+base_model: meta-llama/Llama-3.1-8B-Instruct
+library_name: transformers
+pipeline_tag: text-generation
+language: en
+---
+
+# LatentSC Llama 3.1 8B with Summary Tokens
+
+This repository contains a Llama 3.1 8B Instruct backbone with LatentSC Summary-token embeddings attached. The base model weights are unchanged; only the Summary token embeddings are added so that LatentSC inference can use the trained Summary tokens.
+
+## Usage
+
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+repo = "jeongseokoh/LatentSC_llama3.1_8b_6SummaryTokens"
+
+tokenizer = AutoTokenizer.from_pretrained(repo)
+model = AutoModelForCausalLM.from_pretrained(
+    repo, torch_dtype=torch.bfloat16, device_map="auto"
+)
+
+# Summary tokens (default: 6)
+summary_tokens = [f"<|Summary{i}|>" for i in range(1, 7)]
+
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Solve: 17 * 23. Show the final answer only."},
+]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False)
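+prompt_with_summary = prompt + "".join(summary_tokens)
+# The chat template already emits the BOS token, so do not let the tokenizer add it again.
+inputs = tokenizer(prompt_with_summary, return_tensors="pt", add_special_tokens=False).to(model.device)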
+with torch.no_grad():
+    out = model.generate(
+        **inputs,
+        max_new_tokens=128,
+        do_sample=True,
+        temperature=0.9,
+        top_p=0.95,
+        num_return_sequences=10,
+        pad_token_id=tokenizer.eos_token_id,
+        return_dict_in_generate=True,
+        output_hidden_states=True,
+    )
+
+# Decode candidates (generated tokens only; the prompt prefix is sliced off)
+sequences = out.sequences[:, inputs["input_ids"].shape[1]:]
+answers = tokenizer.batch_decode(sequences, skip_special_tokens=True)
+
+# Embeddings: last-layer hidden state at the final generation step of each sequence.
+# out.hidden_states is a tuple over generation steps, each a tuple over layers.
+# Caveat: sequences that stopped early are padded, so their final step is a padding position.
+last_hs = out.hidden_states[-1][-1]  # (N, 1, hidden) for every step after the first
+embs = last_hs[:, -1, :]  # (N, D)
+
+# LSC selection (cosine similarity)
+embs = F.normalize(embs.float(), p=2, dim=1)
+sim = embs @ embs.T
+sim.fill_diagonal_(0.0)
+avg_sim = sim.mean(dim=1)
+best_idx = int(torch.argmax(avg_sim))
+best_answer = answers[best_idx]
+
+# Dynamic TopK LSC
+def lsc_topk(embs, answers, k):
+    embs = F.normalize(embs.float(), p=2, dim=1)
+    sim = embs @ embs.T
+    sim.fill_diagonal_(0.0)
+    avg_sim = sim.mean(dim=1)
+    topk_idx = torch.topk(avg_sim, k=k).indices
+    sub = embs[topk_idx]
+    sub_sim = sub @ sub.T
+    sub_sim.fill_diagonal_(0.0)
+    sub_avg = sub_sim.mean(dim=1)
+    best_local = int(torch.argmax(sub_avg))
+    return answers[int(topk_idx[best_local])], float(sub_avg.max())
+
+best = None
+best_score = -1e9
+for k in [3, 5, 7]:
+    cand, score = lsc_topk(embs, answers, k)
+    if score > best_score:
+        best_score = score
+        best = cand
+```
+
+### Stored LatentSC config fields
+
+The following config fields are saved (when present) to guide LatentSC inference:
+
+```text
+lsc_num_special_tokens
+lsc_special_token_prefix
+lsc_aggr
+lsc_remove_eos
+lsc_temp
+```
+
+For detailed training/inference scripts and full usage, see the GitHub repository:
+https://github.com/jeongseokO/LatentSC_official
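
The README lists the stored LatentSC config fields but does not show how to read them back. The sketch below is one possible way to do so, assuming the fields appear as plain attributes on the loaded config object (which is how `transformers` usually exposes extra keys from `config.json`). The helper name `read_lsc_fields` is hypothetical, and the demo values are invented for illustration; they are not taken from the repository.

```python
from types import SimpleNamespace

# Field names copied from the README's "Stored LatentSC config fields" list.
LSC_FIELDS = (
    "lsc_num_special_tokens",
    "lsc_special_token_prefix",
    "lsc_aggr",
    "lsc_remove_eos",
    "lsc_temp",
)


def read_lsc_fields(config):
    """Return the LatentSC fields that are present on `config` as a dict.

    Hypothetical helper: fields that were not saved are simply omitted,
    matching the README's "(when present)" wording.
    """
    return {name: getattr(config, name) for name in LSC_FIELDS if hasattr(config, name)}


# Stand-in for a loaded config object; the values here are made up.
demo_config = SimpleNamespace(lsc_num_special_tokens=6, lsc_temp=0.9)
lsc = read_lsc_fields(demo_config)
print(lsc)
```

With a real checkpoint, the same helper would presumably be called as `read_lsc_fields(model.config)` after loading the model as shown in the README's usage section.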