Initialize project; model provided by the ModelHub XC community

Model: RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf Source: Original Platform
2026-04-16 17:33:16 +08:00
commit de00af455c
21 changed files with 260 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,54 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q3_K.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q4_K.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q5_K.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
 astrollama-3-8b-base_aic.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,149 @@
 Quantization made by Richard Erkhov.
 [Github](https://github.com/RichardErkhov)
 [Discord](https://discord.gg/pvy7H8DZMG)
 [Request more models](https://github.com/RichardErkhov/quant_request)
 astrollama-3-8b-base_aic - GGUF
 - Model creator: https://huggingface.co/AstroMLab/
 - Original model: https://huggingface.co/AstroMLab/astrollama-3-8b-base_aic/
 | Name | Quant method | Size |
 | ---- | ---- | ---- |
 | [astrollama-3-8b-base_aic.Q2_K.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q2_K.gguf) | Q2_K | 2.96GB |
 | [astrollama-3-8b-base_aic.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
 | [astrollama-3-8b-base_aic.Q3_K.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q3_K.gguf) | Q3_K | 3.74GB |
 | [astrollama-3-8b-base_aic.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
 | [astrollama-3-8b-base_aic.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q3_K_L.gguf) | Q3_K_L | 2.44GB |
 | [astrollama-3-8b-base_aic.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.IQ4_XS.gguf) | IQ4_XS | 3.28GB |
 | [astrollama-3-8b-base_aic.Q4_0.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q4_0.gguf) | Q4_0 | 4.34GB |
 | [astrollama-3-8b-base_aic.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
 | [astrollama-3-8b-base_aic.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
 | [astrollama-3-8b-base_aic.Q4_K.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q4_K.gguf) | Q4_K | 4.58GB |
 | [astrollama-3-8b-base_aic.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
 | [astrollama-3-8b-base_aic.Q4_1.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q4_1.gguf) | Q4_1 | 4.78GB |
 | [astrollama-3-8b-base_aic.Q5_0.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q5_0.gguf) | Q5_0 | 5.21GB |
 | [astrollama-3-8b-base_aic.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q5_K_S.gguf) | Q5_K_S | 5.21GB |
 | [astrollama-3-8b-base_aic.Q5_K.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q5_K.gguf) | Q5_K | 5.34GB |
 | [astrollama-3-8b-base_aic.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
 | [astrollama-3-8b-base_aic.Q5_1.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q5_1.gguf) | Q5_1 | 5.65GB |
 | [astrollama-3-8b-base_aic.Q6_K.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q6_K.gguf) | Q6_K | 6.14GB |
 | [astrollama-3-8b-base_aic.Q8_0.gguf](https://huggingface.co/RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-gguf/blob/main/astrollama-3-8b-base_aic.Q8_0.gguf) | Q8_0 | 7.95GB |
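
The sizes in the table appear to be binary gigabytes (GiB, 1024³ bytes) rather than decimal GB; the Git LFS pointer sizes recorded in this commit are consistent with that reading. A minimal sketch of the conversion, using byte counts copied from three of those pointers (the helper name `to_gib` is illustrative only):

```python
# Byte sizes copied from the Git LFS pointer files in this commit.
LFS_SIZES = {
    "Q2_K": 3179131456,
    "Q4_K_M": 4920734272,
    "Q8_0": 8540770880,
}


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB), rounded to 2 decimals."""
    return round(num_bytes / 1024**3, 2)


for name, size in LFS_SIZES.items():
    print(f"{name}: {to_gib(size)} GiB")
```

For these three files the results match the table entries (2.96, 4.58, and 7.95), so the same conversion can be used to estimate how much disk space or memory a given quantization needs.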
 Original model description:
 ---
 license: mit
 language:
 - en
 pipeline_tag: text-generation
 tags:
 - llama-3
 - astronomy
 - astrophysics
 - arxiv
 inference: false
 base_model:
 - meta-llama/Llama-3-8b-hf
 ---
 # AstroLLaMA-3-8B-Base_AIC
 AstroLLaMA-3-8B is a specialized base language model for astronomy, developed by the AstroMLab team through continual pre-training of Meta's LLaMA-3-8B on astronomical literature. It is designed for next-token prediction and is not an instruct/chat model.
 ## Model Details
 - **Base Architecture**: LLaMA-3-8b
 - **Training Data**: Abstract, Introduction, and Conclusion (AIC) sections from arXiv's astro-ph category papers
 - **Data Processing**: Optical character recognition (OCR) on PDF files using the Nougat tool, followed by summarization using Qwen-2-8B and LLaMA-3.1-8B.
 - **Fine-tuning Method**: Continual Pre-Training (CPT) using the LMFlow framework
 - **Training Details**:
  - Learning rate: 2 × 10⁻⁵
  - Total batch size: 96
  - Maximum token length: 512
  - Warmup ratio: 0.03
  - No gradient accumulation
  - BF16 format
  - Cosine decay schedule for learning rate reduction
  - Training duration: 1 epoch
 - **Primary Use**: Next token prediction for astronomy-related text generation and analysis
 - **Reference**: Pan et al. 2024 [Link to be added]
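The training details above give a peak learning rate of 2 × 10⁻⁵, a warmup ratio of 0.03, and a cosine decay schedule. A rough sketch of such a schedule follows. The linear warmup shape, the decay all the way to zero, and the step count of 1000 are assumptions made for illustration; they are not confirmed by the model card or taken from the LMFlow configuration.

```python
import math

PEAK_LR = 2e-5       # from the model card
WARMUP_RATIO = 0.03  # from the model card


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a 0-based step: linear warmup, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, round(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches the peak rate.
        return PEAK_LR * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


total = 1000  # hypothetical number of optimizer steps
for s in (0, 29, 30, 515, 999):
    print(s, f"{lr_at(s, total):.2e}")
```

With a single epoch and no gradient accumulation, the real number of steps would be roughly the number of training sequences divided by the total batch size of 96.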
 ## Generating text from a prompt
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 # Load the model and tokenizer
 tokenizer = AutoTokenizer.from_pretrained("AstroMLab/astrollama-3-8b-base_aic")
 model = AutoModelForCausalLM.from_pretrained("AstroMLab/astrollama-3-8b-base_aic", device_map="auto")
 # Create the pipeline with explicit truncation.
 # The model was already placed by device_map="auto" above, so it is not passed again here.
 generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512
 )
 # Example prompt from an astronomy paper
 prompt = "In this letter, we report the discovery of the highest redshift, " \
    "heavily obscured, radio-loud QSO candidate selected using JWST NIRCam/MIRI, " \
    "mid-IR, sub-mm, and radio imaging in the COSMOS-Web field. "
 # Set seed for reproducibility
 torch.manual_seed(42)
 # Generate text
 generated_text = generator(prompt, do_sample=True)
 print(generated_text[0]['generated_text'])
 ```
 ## Model Limitations and Biases
 A key limitation identified during the development of this model is that training solely on astro-ph data may not be sufficient to significantly improve performance over the base model, especially for the already highly performant LLaMA-3 series. This suggests that to achieve substantial gains, future iterations may need to incorporate a broader range of high-quality astronomical data beyond arXiv, such as textbooks, Wikipedia, and curated summaries.
 The table below compares performance on the astronomical benchmarking Q&A described in [Ting et al. 2024](https://arxiv.org/abs/2407.11194) and Pan et al. 2024:
 | Model | Score (%) |
 |-------|-----------|
 | LLaMA-3.1-8B | 73.7 |
 | LLaMA-3-8B | 72.9 |
 | **<span style="color:green">AstroLLaMA-3-8B-Base_AIC (AstroMLab)</span>** | **<span style="color:green">71.9</span>** |
 | Gemma-2-9B | 71.5 |
 | Qwen-2.5-7B | 70.4 |
 | Yi-1.5-9B | 68.4 |
 | InternLM-2.5-7B | 64.5 |
 | Mistral-7B-v0.3 | 63.9 |
 | ChatGLM3-6B | 50.4 |
 | AstroLLaMA-2-7B-AIC | 44.3 |
 | AstroLLaMA-2-7B-Abstract | 43.5 |
 As shown, while AstroLLaMA-3-8B performs competitively among models in its class, it does not surpass the performance of the base LLaMA-3-8B model. This underscores the challenges in developing specialized models and the need for more diverse and comprehensive training data.
 This model is released primarily for reproducibility purposes, allowing researchers to track the development process and compare different iterations of AstroLLaMA models.
 For optimal performance and the most up-to-date capabilities in astronomy-related tasks, we recommend using AstroSage-8B, where these limitations have been addressed. The newer model incorporates expanded training data beyond astro-ph and features a greatly expanded fine-tuning process, resulting in significantly improved performance.
 ## Ethical Considerations
 While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications.
 ## Citation
 If you use this model in your research, please cite:
 ```
 [Citation for Pan et al. 2024 to be added]
 ```
--- a/astrollama-3-8b-base_aic.IQ4_NL.gguf
+++ b/astrollama-3-8b-base_aic.IQ4_NL.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:0bfb0ea87025e86d4fc2bb4fd261d426bb42f37765f7475bc2c8daee8ee8301b
 size 4707349056
--- a/astrollama-3-8b-base_aic.IQ4_XS.gguf
+++ b/astrollama-3-8b-base_aic.IQ4_XS.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4f55515bcd9bf338b47c0b308edc6fbd12e1f34dd0fd75d743a06d6e3009275d
 size 3516661760
--- a/astrollama-3-8b-base_aic.Q2_K.gguf
+++ b/astrollama-3-8b-base_aic.Q2_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:137bf1824913f4a9ecd458a34ed511fc2fee57d5c845f56ecc0c8347bf22c5de
 size 3179131456
--- a/astrollama-3-8b-base_aic.Q3_K.gguf
+++ b/astrollama-3-8b-base_aic.Q3_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:dba5f5804ca04b6fe2411ea4b5b4c512312accd5a8e4584bea1058102d7db66e
 size 4018917952
--- a/astrollama-3-8b-base_aic.Q3_K_L.gguf
+++ b/astrollama-3-8b-base_aic.Q3_K_L.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:34486e63f77e54bde1398b1b7f25f5d33bc0d907278234d993f1bf78e8245c77
 size 2618556416
--- a/astrollama-3-8b-base_aic.Q3_K_M.gguf
+++ b/astrollama-3-8b-base_aic.Q3_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:dba5f5804ca04b6fe2411ea4b5b4c512312accd5a8e4584bea1058102d7db66e
 size 4018917952
--- a/astrollama-3-8b-base_aic.Q3_K_S.gguf
+++ b/astrollama-3-8b-base_aic.Q3_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:170015236125b19caff4069271f79f432aacdda88a81c30538682a9e339ce4eb
 size 3664499264
--- a/astrollama-3-8b-base_aic.Q4_0.gguf
+++ b/astrollama-3-8b-base_aic.Q4_0.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:07ce961a973d5fa8ca374a7086357fe836bf7a05dcdab67e0163e0cd88ce0264
 size 4661211712
--- a/astrollama-3-8b-base_aic.Q4_1.gguf
+++ b/astrollama-3-8b-base_aic.Q4_1.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:ea0486f10ea285e67360eea16136e1c1b7d93c21be180b4e85164215faf60691
 size 5130252864
--- a/astrollama-3-8b-base_aic.Q4_K.gguf
+++ b/astrollama-3-8b-base_aic.Q4_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7b8dd11b31f12a7ccd20e0a95c3a344cc5fd2cc304576bc8bf6c14e98cfb03f8
 size 4920734272
--- a/astrollama-3-8b-base_aic.Q4_K_M.gguf
+++ b/astrollama-3-8b-base_aic.Q4_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7b8dd11b31f12a7ccd20e0a95c3a344cc5fd2cc304576bc8bf6c14e98cfb03f8
 size 4920734272
--- a/astrollama-3-8b-base_aic.Q4_K_S.gguf
+++ b/astrollama-3-8b-base_aic.Q4_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7a87db2239aa4b3e8b408e26f16c6e3b090e8c97054cf74a14f3f1414abf5241
 size 4692668992
--- a/astrollama-3-8b-base_aic.Q5_0.gguf
+++ b/astrollama-3-8b-base_aic.Q5_0.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:dcca1df52f4508147d0de59717e7202b2c7d8777cc645f9aa9ad5a8d7d6426de
 size 5599294016
--- a/astrollama-3-8b-base_aic.Q5_1.gguf
+++ b/astrollama-3-8b-base_aic.Q5_1.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:70a176b8610d827518f7d99fae0d7813ce6e9914cde54e11af5eeae11d87fbf9
 size 6068335168
--- a/astrollama-3-8b-base_aic.Q5_K.gguf
+++ b/astrollama-3-8b-base_aic.Q5_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:12a6724a6e4d246b3e18dcdbdf56d043d92df40def59270017a06a6e4f66d3db
 size 5732987456
--- a/astrollama-3-8b-base_aic.Q5_K_M.gguf
+++ b/astrollama-3-8b-base_aic.Q5_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:12a6724a6e4d246b3e18dcdbdf56d043d92df40def59270017a06a6e4f66d3db
 size 5732987456
--- a/astrollama-3-8b-base_aic.Q5_K_S.gguf
+++ b/astrollama-3-8b-base_aic.Q5_K_S.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8e0eb23968bcc2f345d21ff7ff921e437ed49f1e0248b53ab794370e241a737f
 size 5599294016
--- a/astrollama-3-8b-base_aic.Q6_K.gguf
+++ b/astrollama-3-8b-base_aic.Q6_K.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7ee76d4eb5c03163405ca4f9d40165c975c49b4950ae01527b0909937cd6e18b
 size 6596006464
--- a/astrollama-3-8b-base_aic.Q8_0.gguf
+++ b/astrollama-3-8b-base_aic.Q8_0.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:2aafd8fb4b129333cae274820842e1a571f50cdb8d68da25297bf69a3d06c571
 size 8540770880
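
Each `.gguf` entry in this commit is a Git LFS pointer rather than the model weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a sketch, a pointer can be parsed and checked against a downloaded file as shown below. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer text copied from the Q8_0 entry in this commit.
POINTER_Q8_0 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2aafd8fb4b129333cae274820842e1a571f50cdb8d68da25297bf69a3d06c571
size 8540770880
"""

pointer = parse_lfs_pointer(POINTER_Q8_0)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(Path("astrollama-3-8b-base_aic.Q8_0.gguf"), pointer)` on a local download would then confirm that the file is complete and unmodified. The file is read in chunks so that multi-gigabyte models do not need to fit in memory.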