Initialize project; model provided by the ModelHub XC community

Model: tifa-benchmark/llama2_tifa_question_generation Source: Original Platform
2026-05-07 02:00:50 +08:00
commit 111c48a5eb
10 changed files with 93982 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,131 @@
+---
+license: apache-2.0
+inference: true
+widget:
+ - text: "<s>[INST] <<SYS>>\nGiven an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"
+pipeline_tag: text-generation
+tags:
+- text-generation-inference
+- llama2
+- text-to-image
+datasets:
+- TIFA
+language:
+- en
+---
+Project page: <https://tifa-benchmark.github.io/>
+
+This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).
+
+We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image. 
+
+This fine-tuned LLaMA 2 model substitutes for the GPT-3 model used in the paper. It parses an arbitrary prompt into visual entities, attributes, relations, and so on, and generates question-answer tuples for each of them. See the examples below.
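
As a rough illustration of the metric described above, the sketch below scores a set of question-answer tuples as the fraction a VQA model answers correctly. The names `tifa_score` and `first_choice_vqa` are hypothetical and are not part of the TIFA codebase; a real `answer_fn` would also look at the generated image, which this stand-in ignores.

```python
from typing import Callable, Dict, List


def tifa_score(questions: List[Dict], answer_fn: Callable[[str, List[str]], str]) -> float:
    """Return the fraction of questions whose predicted answer matches the reference."""
    if not questions:
        return 0.0
    correct = 0
    for qa in questions:
        predicted = answer_fn(qa["question"], qa["choices"])
        if predicted.strip().lower() == qa["answer"].strip().lower():
            correct += 1
    return correct / len(questions)


# Hypothetical stand-in for a VQA model: it always picks the first choice.
# A real model would answer from the generated image instead.
def first_choice_vqa(question: str, choices: List[str]) -> str:
    return choices[0]


example_questions = [
    {"question": "is this a rabbit?", "choices": ["yes", "no"], "answer": "yes"},
    {"question": "what color is the rabbit?",
     "choices": ["blue", "red", "yellow", "green"], "answer": "blue"},
    {"question": "what color is the plane?",
     "choices": ["blue", "red", "yellow", "green"], "answer": "red"},
]

score = tifa_score(example_questions, first_choice_vqa)
print(f"score: {score:.3f}")
```

Here the stand-in answers two of the three questions correctly, so the score is 2/3.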
+
+
+# QuickStart
+
+All code comes from <https://github.com/Yushi-Hu/tifa>. Clone that repo to use this model together with the other modules (e.g. VQA) provided in TIFA.
+
+Follow the prompt format shown below; the model performs best with it.
+
+
+```python
+import torch
+import transformers
+
+# prepare the LLaMA 2 model
+model_name = "tifa-benchmark/llama2_tifa_question_generation"
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_name,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+
+
+# format the prompt following the LLaMA 2 chat style
+def create_qg_prompt(caption):
+    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
+    formatted_prompt = f"<s>[INST] <<SYS>>\n{INTRO_BLURB}\n<</SYS>>\n\n"
+    formatted_prompt += f"Description: {caption} [/INST] Entities:"
+    return formatted_prompt
+
+
+test_caption = "a blue rabbit and a red plane"
+
+# create prompt
+prompt = create_qg_prompt(test_caption)
+
+# text completion with beam search
+sequences = pipeline(
+    prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512
+)
+# drop the echoed prompt, then keep only the text before the first blank line
+output = sequences[0]['generated_text'][len(prompt):]
+output = output.split('\n\n')[0]
+
+# output
+print(output)
+
+#### Expected output ###
+#  rabbit, plane
+# Activites:
+# Colors: blue, red
+# Counting:
+# Other attributes:
+# About rabbit (animal):
+# Q: is this a rabbit?
+# Choices: yes, no
+# A: yes
+# About rabbit (animal):
+# Q: what animal is in the picture?
+# Choices: rabbit, dog, cat, fish
+# A: rabbit
+# About plane (object):
+# Q: is this a plane?
+# Choices: yes, no
+# A: yes
+# About plane (object):
+# Q: what type of vehicle is this?
+# Choices: plane, car, motorcycle, bus
+# A: plane
+# About blue (color):
+# Q: is the rabbit blue?
+# Choices: yes, no
+# A: yes
+# About blue (color):
+# Q: what color is the rabbit?
+# Choices: blue, red, yellow, green
+# A: blue
+# About red (color):
+# Q: is the plane red?
+# Choices: yes, no
+# A: yes
+# About red (color):
+# Q: what color is the plane?
+# Choices: red, blue, yellow, green
+# A: red
+```
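
The raw completion above follows a regular line layout (`About ... (...)`, `Q:`, `Choices:`, `A:`), so it can be turned into structured records with a small parser. The sketch below is a hypothetical helper, `parse_qg_output`, written only against the expected output shown above. It is not the parser shipped with tifascore and does not reproduce any filtering or type normalization that package may apply.

```python
import re
from typing import Dict, List

# Matches block headers such as "About rabbit (animal):"
ABOUT_RE = re.compile(r"^About (?P<element>.+) \((?P<type>[^)]+)\):$")


def parse_qg_output(output: str) -> List[Dict]:
    """Parse the model's raw text into a list of question records."""
    questions = []
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        match = ABOUT_RE.match(line)
        if match:
            # Start a new record for this element
            current = {"element": match.group("element"),
                       "element_type": match.group("type")}
        elif current is None:
            continue  # entity/attribute header lines before the first block
        elif line.startswith("Q:"):
            current["question"] = line[2:].strip()
        elif line.startswith("Choices:"):
            choices = line[len("Choices:"):].split(",")
            current["choices"] = [c.strip() for c in choices if c.strip()]
        elif line.startswith("A:"):
            current["answer"] = line[2:].strip()
            # Keep the record only if the block was complete
            if "question" in current and "choices" in current:
                questions.append(current)
            current = None
    return questions


sample_output = """ rabbit, plane
Activites:
Colors: blue, red
Counting:
Other attributes:
About rabbit (animal):
Q: is this a rabbit?
Choices: yes, no
A: yes
About blue (color):
Q: what color is the rabbit?
Choices: blue, red, yellow, green
A: blue"""

parsed = parse_qg_output(sample_output)
for qa in parsed:
    print(qa)
```

Incomplete blocks (for example, a completion cut off by `max_length` before the `A:` line) are silently dropped in this sketch, since the record is only appended when the answer line is seen.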
+
+# Use this LM with the tifascore package
+
+tifascore provides helper functions for running this model and parsing its output. First install tifascore by following <https://github.com/Yushi-Hu/tifa>, then use it as shown below.
+
+```python
+from tifascore import get_llama2_pipeline, get_llama2_question_and_answers
+
+pipeline = get_llama2_pipeline("tifa-benchmark/llama2_tifa_question_generation")
+
+print(get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane"))
+
+#### Expected output ###
+# [{'caption': 'a blue rabbit and a red plane', 'element': 'rabbit', 'question': 'what animal is in the picture?', 'choices': ['rabbit', 'dog', 'cat', 'fish'], 'answer': 'rabbit', 'element_type': 'animal/human'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'is this a plane?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'what type of vehicle is this?', 'choices': ['plane', 'car', 'motorcycle', 'bus'], 'answer': 'plane', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'is the rabbit blue?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'what color is the rabbit?', 'choices': ['blue', 'red', 'yellow', 'green'], 'answer': 'blue', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'is the plane red?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'what color is the plane?', 'choices': ['red', 'blue', 'yellow', 'green'], 'answer': 'red', 'element_type': 'color'}]
+```
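
Because each returned item is a plain dict, the list is easy to post-process. As a small sketch, the hypothetical helper `group_by_element_type` below groups questions by their `element_type` field; it assumes only the keys visible in the expected output above, and the sample list is abbreviated from it.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_element_type(question_answers: List[Dict]) -> Dict[str, List[str]]:
    """Map each element_type to the list of questions asked about it."""
    grouped = defaultdict(list)
    for qa in question_answers:
        grouped[qa["element_type"]].append(qa["question"])
    return dict(grouped)


# Abbreviated records in the same shape as the expected output above
sample = [
    {"element": "rabbit", "question": "what animal is in the picture?",
     "element_type": "animal/human"},
    {"element": "plane", "question": "is this a plane?", "element_type": "object"},
    {"element": "blue", "question": "is the rabbit blue?", "element_type": "color"},
    {"element": "red", "question": "is the plane red?", "element_type": "color"},
]

grouped = group_by_element_type(sample)
for element_type, questions in grouped.items():
    print(element_type, questions)
```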
+
+## BibTeX
+```
+@article{hu2023tifa,
+  title={Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering},
+  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
+  journal={arXiv preprint arXiv:2303.11897},
+  year={2023}
+}
+```