初始化项目，由ModelHub XC社区提供模型

Model: mrdbourke/FoodExtract-gemma-3-270m-fine-tune-v2 Source: Original Platform
2026-05-31 22:00:21 +08:00
commit 4931da09a0
12 changed files with 51788 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,245 @@
 ---
 base_model: google/gemma-3-270m-it
 library_name: transformers
 model_name: checkpoint_models
 tags:
 - generated_from_trainer
 - sft
 - trl
 license: gemma
 ---
 # FoodExtract-v2
 This is a food and drink extraction language model built on [Gemma 3 270M](https://huggingface.co/google/gemma-3-270m-it).
 Given raw text, it's designed to:
 1. Classify the text into food or drink (e.g. "a photo of a dog" = not food or drink, "a photo of a pizza" = food or drink).
 2. Tag the text with one or more tags (see tags_dict below).
 3. Extract the edible food-related items as a list.
 4. Extract the edible drink-related items as a list.
 For example, the input text might be:
 ```
 British Breakfast with baked beans, fried eggs, black pudding, sausages, bacon, mushrooms, a cup of tea and toast and fried tomatoes
 ```
 And the model will generate:
 ```
 food_or_drink: 1
 tags: fi, di
 foods: British Breakfast, baked beans, fried eggs, black pudding, sausages, bacon, mushrooms, toast, fried tomatoes
 drinks: tea
 ```
 This model can be used for filtering a large image caption (e.g. [DataComp-1B](https://huggingface.co/datasets/UCSC-VLAA/Recap-DataComp-1B)) text dataset for food and drink related items.
 ## Dataset
 The model was trained on the [FoodExtract-135k](https://huggingface.co/datasets/mrdbourke/FoodExtract-135k) dataset.
 This dataset contains 135,000 samples of raw text and JSON output pairs of structured food extractions provided by `gpt-oss-120b`.
 For example, a raw image caption input might be:
 ```
 another optional quest takes place on windfall island during the night time play the song of passing a number of times and each time, glance towards the sky
 ```
 And the `gpt-oss-120b` generated output (JSON) would be:
 ```
 {'is_food_or_drink': 'false', 'tags': [], 'food_items': [], 'drink_items': []}
 ```
 This is condensed to:
 ```
 food_or_drink: 0\ntags: \nfoods: \ndrinks:
 ```
 ### Tags dictionary mapping
 These tags are designed for fast filtering.
 For example, the model can assign a certain tag based on what's in the raw text and then we can filter for "ingredient list" items.
 ```
 tags_dict = {'np': 'nutrition_panel',
 'il': 'ingredient list',
 'me': 'menu',
 're': 'recipe',
 'fi': 'food_items',
 'di': 'drink_items',
 'fa': 'food_advertistment',
 'fp': 'food_packaging'}
 ```
 ## Helper functions
 The model is trained to output a condensed version of the structured data.
 We do this so the model can generate less tokens (e.g. it doesn't have to generate JSON outputs).
 The following functions help to condense and uncondense raw text outputs/inputs into the desired structure.
 ```python
 def condense_output(original_output):
    '''Helper function to condense a given FoodExtract string.
    Example input: {'is_food_or_drink': True, 'tags': ['fi'], 'food_items': ['cape gooseberries', 'mulberry', 'chilli powder', 'flathead lobster', 'hoisin sauce', 'duck leg', 'chestnuts', 'raw quail', 'duck breast', 'rogan josh curry sauce', 'brown rice', 'dango'], 'drink_items': []}
    Example output: food_or_drink: 1\ntags: fi\nfoods: cape gooseberries, mulberry, chilli powder, flathead lobster, hoisin sauce, duck leg, chestnuts, raw quail, duck breast, rogan josh curry sauce, brown rice, dango\ndrinks:'''
    condensed_output_string_base = '''food_or_drink: <is_food_or_drink>
    tags: <output_tags>
    foods: <food_items>
    drinks: <drink_items>'''
    is_food_or_drink = str(1) if str(original_output["is_food_or_drink"]).lower() == "true" else str(0)
    tags = ", ".join(original_output["tags"]) if len(original_output["tags"]) > 0 else ""
    foods = ", ".join(original_output["food_items"]) if len(original_output["food_items"]) > 0 else ""
    drinks = ", ".join(original_output["drink_items"]) if len(original_output["drink_items"]) > 0 else ""
    condensed_output_string_formatted = condensed_output_string_base.replace("<is_food_or_drink>", is_food_or_drink).replace("<output_tags>", tags).replace("<food_items>", foods).replace("<drink_items>", drinks)
    return condensed_output_string_formatted.strip()
 def uncondense_output(condensed_output):
    '''Helper to go from condensed output to uncondensed output.
    Example input: food_or_drink: 1\ntags: fi\nfoods: cape gooseberries, mulberry, chilli powder, flathead lobster, hoisin sauce, duck leg, chestnuts, raw quail, duck breast, rogan josh curry sauce, brown rice, dango\ndrinks:
    Example output: {'is_food_or_drink': True, 'tags': ['fi'], 'food_items': ['cape gooseberries', 'mulberry', 'chilli powder', 'flathead lobster', 'hoisin sauce', 'duck leg', 'chestnuts', 'raw quail', 'duck breast', 'rogan josh curry sauce', 'brown rice', 'dango'], 'drink_items': []}
    '''
    condensed_list = condensed_output.split("\n")
    condensed_dict_base = {
        "is_food_or_drink": "",
        "tags": [],
        "food_items": [],
        "drink_items": []
    }
    # Set values to defaults
    food_or_drink_item = None
    tags_item = None
    foods_item = None
    drinks_item = None
    # Extract items from condensed_list
    for item in condensed_list:
        if "food_or_drink:" in item.strip():
            food_or_drink_item = item
        if "tags:" in item:
            tags_item = item
        if "foods:" in item:
            foods_item = item
        if "drinks:" in item:
            drinks_item = item
    if food_or_drink_item:
        is_food_or_drink_bool = True if food_or_drink_item.replace("food_or_drink: ", "").strip() == "1" else False
    else:
        is_food_or_drink_bool = None
    if tags_item:
        tags_list = [item.replace("tags: ", "").replace("tags:", "").strip() for item in tags_item.split(", ")]
        tags_list = [item for item in tags_list if item] # Filter for empty items
    else:
        tags_list = []
    if foods_item:
        foods_list = [item.replace("foods:", "").replace("foods: ", "").strip() for item in foods_item.split(", ")]
        foods_list = [item for item in foods_list if item] # Filter for empty items
    else:
        foods_list = []
    if drinks_item:
        drinks_list = [item.replace("drinks:", "").replace("drinks: ", "").strip() for item in drinks_item.split(", ")]
        drinks_list = [item for item in drinks_list if item] # Filter for empty items
    else:
        drinks_list = []
    condensed_dict_base["is_food_or_drink"] = is_food_or_drink_bool
    condensed_dict_base["tags"] = tags_list
    condensed_dict_base["food_items"] = foods_list
    condensed_dict_base["drink_items"] = drinks_list
    return condensed_dict_base
 ````
 ## Quick start
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 MODEL_PATH = "mrdbourke/FoodExtract-gemma-3-270m-fine-tune-v2"
 # Load the model into a pipeline
 loaded_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    dtype="auto",
    device_map="auto",
    attn_implementation="eager"
 )
 # Load the tokenizer
 tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
 )
 # Create model pipeline
 loaded_model_pipeline = pipeline("text-generation",
                                 model=loaded_model,
                                 tokenizer=tokenizer)
 # Create a sample to predict on
 input_text = "A plate with bacon, eggs and toast on it"
 input_text_user = [{'content': input_text, 'role': 'user'}]
 # Apply the chat template
 input_prompt = loaded_model_pipeline.tokenizer.apply_chat_template(conversation=input_text_user,
                                                                    tokenize=False,
                                                                    add_generation_prompt=True)
 # Let's run the default model on our input
 default_outputs = loaded_model_pipeline(text_inputs=input_prompt, 
                                        max_new_tokens=256)
 # View the outputs
 print(f"[INFO] Test sample input:\n{input_prompt}\n")
 print(f"[INFO] Fine-tuned model output:\n{default_outputs[0]['generated_text'][len(input_prompt):]}\n")
 ```
 You should see an output similar to:
 ```
 [INFO] Test sample input:
 <bos><start_of_turn>user
 A plate with bacon, eggs and toast on it<end_of_turn>
 <start_of_turn>model
 [INFO] Fine-tuned model output:
 food_or_drink: 1
 tags: fi
 foods: bacon, eggs, toast
 drinks:
 ```
 ## Training procedure
 This model was trained with SFT (Supervised Fine-Tuning) via Hugging Face's TRL library.
 See the full training walkthrough at: https://www.learnhuggingface.com/notebooks/hugging_face_llm_full_fine_tune_tutorial
 ## Citations
 * Reference for structured data extraction was taken from the paper (tk- paper link to the paper which used a small qwen model to extract structured data from webpages)
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
 {
  "<image_soft_token>": 262144
 }
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,47 @@
 {{ bos_token }}
 {%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '
 ' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
 ' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
 {%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
 {%- endif -%}
 {%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
 ' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
 ' }}
 {%- endfor -%}
 {%- if add_generation_prompt -%}
    {{'<start_of_turn>model
 '}}
 {%- endif -%}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,54 @@
 {
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 1,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 640,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 18,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 512,
  "transformers_version": "4.57.6",
  "use_bidirectional_attention": false,
  "use_cache": true,
  "vocab_size": 262144
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
 {
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.57.6"
 }
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c3228e8e3184ab1d338259947e94cd16b34d03f7c7bbb16a3a7ff674b5d2026d
 size 536223056
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,33 @@
 {
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
 size 33384568
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
 size 4689074
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:425a3fd403a0e8d6ff08415fe1a9ef490e9da0044ee3de86881d08ada8e2e7f6
 size 6225