---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
tags:
- qwen3
- memory
- memory-extraction
- tool-calling
- reasoning
- agent
base_model:
- Qwen/Qwen3-4B
---

# MemReader-4B-thinking

## Introduction

MemReader-4B-thinking is a 4B language model for long-term agent memory management. Instead of treating memory writing as a one-step structured extraction task, it formulates memory construction as a reasoning-and-action process: the model first evaluates whether incoming information is valuable, complete, and unambiguous, and then selects one of four memory operations:

- `add_memory`: write useful and complete information into long-term memory
- `search_memory`: retrieve historical memory for disambiguation
- `buffer_memory`: temporarily hold incomplete but potentially valuable information
- `ignore_memory`: discard low-value or repetitive content

Built on top of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), MemReader-4B-thinking is further optimized for memory management with supervised fine-tuning and GRPO. It is designed for long-horizon dialogue systems, personalized assistants, and agent frameworks that require low-noise, updatable, and retrievable long-term memory.

## News

- MemReader-4B-thinking is released as an open model for active memory management.
- The model is designed for tool-calling workflows and memory-centric agent systems.
- It is part of the MemReader family introduced in the paper *MemReader: Active Memory Management for Long-Term Agent Memory*.

## Usage

- Model ID: `IAAR-Shanghai/MemReader-4B-thinking`
- Base model: `Qwen/Qwen3-4B`
- Primary use: long-term memory extraction and memory management for agents
- Inference modes: `transformers`, OpenAI-compatible serving, `vLLM`, and SGLang

## Citation

If you use MemReader in your research or product, please cite:

```bibtex
@misc{kang2025memreader,
  title={MemReader: Active Memory Management for Long-Term Agent Memory},
  author={Kang, Jingyi and Li, Chunyu and Chen, Ding and Tang, Bo and Xiong, Feiyu and Li, Zhiyu},
  year={2026},
  note={Manuscript in preparation}
}
```

## Highlights

- Active memory management instead of passive memory extraction
- Explicit reasoning with thinking traces and tool calls
- Strong performance on ambiguity resolution, knowledge update, and temporal reasoning
- Native fit for OpenAI-style tool-calling workflows
- Efficient local deployment with a 4B parameter footprint
- Designed for integration with memory-centric agent systems such as MemOS

## What Makes MemReader Different

Most memory pipelines directly convert the current dialogue into JSON memories. In realistic settings, that approach is often insufficient:

- low-value chatter can pollute memory
- pronouns and missing references may require historical lookup
- some information is useful but not yet complete
- newer facts may need to update or overwrite older memory

MemReader-4B-thinking reframes memory writing as active memory management. Under a ReAct-style workflow, the model reasons before acting, making memory construction closer to how practical agent systems maintain state over time.

## Benchmark Performance

MemReader was evaluated on LOCOMO, LongMemEval, and HaluMem. The 4B GRPO version showed especially strong gains on knowledge update, temporal reasoning, and end-to-end memory usability.

### LOCOMO

| Model | Single Hop | Multi Hop | Temporal | Open Domain | Overall | F1 | Avg. Token |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MemOS (4o-mini) | 84.06% | 73.16% | 75.90% | 57.29% | 78.70% | 51.90% | 1854 |
| MemReader-0.6B | 84.70% | 76.95% | 76.22% | 53.40% | 79.56% | 52.54% | 1976 |
| MemReader-4B-SFT | 81.88% | 76.12% | 71.02% | 62.15% | 77.33% | 47.77% | 784 |
| MemReader-4B-GRPO | **85.37%** | **81.44%** | 75.80% | **65.62%** | **81.42%** | 49.45% | 1950 |

### LongMemEval

| Model | Avg. Token | SS-User | SS-Asst | SS-Pref | Multi-Session | Knowledge Update | Temporal Reasoning | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MemOS | 1400 | 95.71% | 67.86% | **96.67%** | 70.67% | 74.26% | 77.44% | 77.80% |
| EverMemOS | 2800 | **97.14%** | **85.71%** | 93.33% | 73.68% | 89.74% | 77.44% | **83.00%** |
| MemReader-0.6B | 1166 | 95.71% | 75.00% | 90.00% | **75.18%** | 82.05% | 75.90% | 80.20% |
| MemReader-4B-SFT | 963 | 97.10% | 69.64% | 90.00% | 71.42% | 85.80% | 78.19% | 80.00% |
| MemReader-4B-GRPO | **922** | 94.29% | 73.21% | 90.00% | 73.68% | **91.03%** | **84.21%** | **83.00%** |

### HaluMem

The full HaluMem table in the paper is relatively long. Below we report a compact subset of the memory extraction and memory updating results.

| Model | Extraction Recall | Extraction Weighted Recall | Extraction F1 | Update Correctness | Update Hallucination | Update Omission |
| --- | --- | --- | --- | --- | --- | --- |
| MemOS | 74.07% | 84.81% | 79.70% | 62.11% | 0.42% | 37.48% |
| MemReader-0.6B | 88.40% | 91.38% | 93.76% | 82.69% | 0.77% | 16.51% |
| MemReader-4B-SFT | 93.56% | 95.49% | 96.61% | 90.78% | 0.26% | 8.74% |
| MemReader-4B-GRPO | **96.57%** | **97.19%** | **98.21%** | **94.55%** | 0.32% | **5.12%** |

These results show that stronger memory writing quality also translates into better memory updating behavior, especially on correctness and omission.

## Recommended Use Cases

- long-term conversational agents
- personalized assistants
- agent memory extraction pipelines
- memory update and conflict resolution workflows
- retrieval-augmented memory systems

## Intended Use

MemReader-4B-thinking is intended for research and production scenarios where an agent needs to convert conversational context into structured long-term memory. Typical use cases include memory extraction, ambiguity resolution with retrieval, memory update pipelines, and persistent assistant systems.

The model is especially suitable when the application requires explicit control over memory-writing behavior through tool calls such as `search_memory`, `add_memory`, `buffer_memory`, and `ignore_memory`.

## Model Specs

- Base model: `Qwen/Qwen3-4B`
- Parameters: 4B
- Tensor type: BF16
- Architecture: `Qwen3ForCausalLM`
- Context length: 40,960 tokens
- Primary capability: reasoning-driven memory extraction with tool calling

## Quickstart

### OpenAI-Compatible API Example

The following example calls the model through an OpenAI-compatible endpoint with required tool calling.

```python
import requests

url = "https://YOUR_ENDPOINT/v1/chat/completions"

payload = {
    "model": "IAAR-Shanghai/MemReader-4B-thinking",
    # When posting raw JSON to a vLLM-style OpenAI-compatible server,
    # chat_template_kwargs belongs at the top level of the request body;
    # the "extra_body" wrapper is an OpenAI Python SDK concept, not a JSON field.
    "chat_template_kwargs": {"enable_thinking": True},
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a memory extraction agent. Your job is to analyze "
                "conversations and decide what information is worth storing in "
                "long-term memory.\n\n"
                "Available actions (call exactly one per turn):\n"
                "- search_memory: Search existing memories for context\n"
                "- add_memory: Extract and store valuable facts, preferences, or events\n"
                "- buffer_memory: Accumulate this turn and wait for more context\n"
                "- ignore_memory: Nothing worth storing\n\n"
                "Guidelines:\n"
                "- Store specific, verifiable facts\n"
                "- Do not store generic greetings, chitchat, or vague statements\n"
                "- UserMemory: personal attributes or preferences about the user\n"
                "- LongTermMemory: facts, events, or shared knowledge from the conversation\n"
                "- If unsure whether information already exists, call search_memory first"
            ),
        },
        {
            "role": "user",
            "content": (
                "Please analyze the following conversation and decide what to store:\n\n"
                "[user]: How is that project at the company going lately? The one he said he wanted to rewrite with a new language.\n"
                "[assistant]: Do you mean the recommendation system refactoring project? Last time we mentioned that Michael planned to rewrite some core modules in Rust, and it was still in the evaluation stage.\n"
                "[user]: Yes, that one. He said he is going to produce a performance comparison report this week, benchmarking Python against Rust."
            ),
        },
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "search_memory",
                "description": "Search historical memories for context.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "add_memory",
                "description": "Extract and store memories.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "memory_list": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "key": {"type": "string"},
                                    "memory_type": {
                                        "type": "string",
                                        "enum": ["LongTermMemory", "UserMemory"],
                                    },
                                    "value": {"type": "string"},
                                    "tags": {
                                        "type": "array",
                                        "items": {"type": "string"},
                                    },
                                },
                                "required": ["key", "memory_type", "value", "tags"],
                            },
                        },
                        "summary": {"type": "string"},
                    },
                    "required": ["memory_list", "summary"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "buffer_memory",
                "description": "Buffer for later processing.",
                "parameters": {
                    "type": "object",
                    "properties": {"reason": {"type": "string"}},
                    "required": ["reason"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "ignore_memory",
                "description": "Ignore low-value content.",
                "parameters": {
                    "type": "object",
                    "properties": {"reason": {"type": "string"}},
                    "required": ["reason"],
                },
            },
        },
    ],
    "tool_choice": "required",
    "temperature": 0.2,
    "max_tokens": 1024,
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers, json=payload)
print(response.text)
```

### Hugging Face Transformers Usage

You can also load the model directly from Hugging Face and run memory extraction with tool calling.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "IAAR-Shanghai/MemReader-4B-thinking"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a memory extraction agent. Analyze conversations and decide "
            "what information should be stored in long-term memory."
        ),
    },
    {
        "role": "user",
        "content": (
            "Please analyze the following conversation and decide what to store:\n\n"
            "[user]: How is that project at the company going lately? The one he said he wanted to rewrite with a new language.\n"
            "[assistant]: Do you mean the recommendation system refactoring project? Last time we mentioned that Michael planned to rewrite some core modules in Rust, and it was still in the evaluation stage.\n"
            "[user]: Yes, that one. He said he is going to produce a performance comparison report this week, benchmarking Python against Rust."
        ),
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_memory",
            "description": "Search historical memories for context.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "add_memory",
            "description": "Extract and store memories.",
            "parameters": {
                "type": "object",
                "properties": {
                    "memory_list": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "key": {"type": "string"},
                                "memory_type": {
                                    "type": "string",
                                    "enum": ["LongTermMemory", "UserMemory"],
                                },
                                "value": {"type": "string"},
                                "tags": {
                                    "type": "array",
                                    "items": {"type": "string"},
                                },
                            },
                            "required": ["key", "memory_type", "value", "tags"],
                        },
                    },
                    "summary": {"type": "string"},
                },
                "required": ["memory_list", "summary"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "buffer_memory",
            "description": "Buffer for later processing.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ignore_memory",
            "description": "Ignore low-value content.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        },
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
# Strip the prompt tokens so only the newly generated text is decoded.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
output = tokenizer.decode(output_ids, skip_special_tokens=True)
print(output)
```

### vLLM Usage

Start an OpenAI-compatible vLLM server:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model IAAR-Shanghai/MemReader-4B-thinking \
    --served-model-name MemReader-4B-thinking \
    --port 8000 \
    --tensor-parallel-size 1 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

Then send a standard chat completion request to `http://localhost:8000/v1/chat/completions`.

### SGLang Usage

MemReader-4B-thinking can also be deployed with SGLang through its OpenAI-compatible serving interface. Please make sure tool calling and thinking mode are enabled in your serving configuration.
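
As a starting point, a launch command along the following lines should enable both; the `--tool-call-parser` and `--reasoning-parser` values are assumptions based on SGLang's Qwen3 support, so verify them against your SGLang version's documentation:

```bash
python -m sglang.launch_server \
    --model-path IAAR-Shanghai/MemReader-4B-thinking \
    --port 30000 \
    --tool-call-parser qwen25 \
    --reasoning-parser qwen3
```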

## Output Format

MemReader-4B-thinking is trained to produce thinking traces and tool calls. A typical response looks like this:

```xml
<think>
The conversation refers to an already known project and adds a new update:
Michael plans to produce a Python vs Rust benchmark report this week.
This is valuable project-state information and should be added to memory.
</think>

<tool_call>
{"name": "add_memory", "arguments": {"memory_list": [{"key": "Rust benchmark plan", "memory_type": "LongTermMemory", "value": "Michael said the recommendation system refactoring project is still in evaluation, and he plans to produce a Python-vs-Rust benchmark report this week for the core modules under consideration for Rust rewriting.", "tags": ["project", "Rust", "benchmark", "refactoring"]}], "summary": "Added one memory about the project update and the planned benchmark report."}}
</tool_call>
```

## Best Practices

- Use `search_memory` first when the conversation contains pronouns, ellipsis, or implicit historical references.
- Use `buffer_memory` only when the information is genuinely incomplete and cannot be resolved from history.
- Keep tool definitions stable between training and inference.
- For production pipelines, execute tool calls externally and feed tool responses back to the model when multi-step reasoning is needed (see the sketch after this list).
- If you want shorter outputs, reduce `max_tokens` and control whether thinking traces are exposed in your serving layer.
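
The feedback loop mentioned in the fourth bullet can be as small as the sketch below, which reuses the `client`, `messages`, and `tools` from the vLLM section and the `handle_tool_call` helper from the Introduction; the two-round wiring is an illustrative assumption, not a fixed MemReader interface:

```python
# Round 1: let the model pick an operation.
first = client.chat.completions.create(
    model="MemReader-4B-thinking",
    messages=messages,
    tools=tools,
    tool_choice="required",
)
call = first.choices[0].message.tool_calls[0]

# Execute the call externally, e.g. run the actual memory search.
result = handle_tool_call(call.function.name, call.function.arguments)

# Round 2: feed the tool result back so the model can finish, e.g. turn
# search_memory hits into a disambiguated add_memory call.
followup = client.chat.completions.create(
    model="MemReader-4B-thinking",
    messages=messages
    + [
        first.choices[0].message,  # assistant turn containing the tool call
        {"role": "tool", "tool_call_id": call.id, "content": result},
    ],
    tools=tools,
    tool_choice="required",
)
print(followup.choices[0].message.tool_calls[0].function.name)
```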

## Limitations

- The model is optimized for memory-management scenarios rather than general-purpose chatting.
- Quality depends on the external memory schema, retrieval quality, and tool-execution loop.
- For highly domain-specific memory schemas, additional instruction tuning may still be beneficial.
- As with other LLMs, outputs may still contain mistakes, omissions, or unsupported inferences and should be validated in safety-critical workflows.

## License Notice

This model is released under the Apache-2.0 license. As it is derived from [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), users should also review and comply with the upstream base model license, usage terms, and any applicable third-party requirements before deployment.

## Links

- GitHub: [MemTensor/MemOS](https://github.com/MemTensor/MemOS)
- API Documentation: [docs.openmem.net](https://docs.openmem.net/)
- Model: [IAAR-Shanghai/MemReader-4B-thinking](https://huggingface.co/IAAR-Shanghai/MemReader-4B-thinking)
- Base model: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)