---
license: apache-2.0
library_name: transformers
---

# Osmosis-Apply-1.7B

`Osmosis-Apply-1.7B` is a specialized language model, fine-tuned from `Qwen3-1.7B`, designed to perform code merges, similar to the apply feature of modern AI code editors. Given an original code snippet and an edit snippet, the model applies the edit to the original code and outputs the updated code.

Here's an example. Let's say we prompt an LLM to fill out the body of this binary search function.

```python
def binary_search(arr, x):
    left = 0
    right = len(arr)

    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]

assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```

With a custom prompt, the LLM produces an edit snippet that includes the binary search code and some surrounding context.

```python
// ... existing code ...
    left = 0
    right = len(arr)

    while(left < right):
        mid = left + (right - left) // 2

        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...
```

`Osmosis-Apply-1.7B` can apply this edit snippet to the original code, producing the updated, final code.

```python
def binary_search(arr, x):
    left = 0
    right = len(arr)

    while(left < right):
        mid = left + (right - left) // 2

        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]

assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```

## Benchmarks

We benchmarked our model against several large language models using 10,000 random samples from commitpackft. The rewards are calculated according to our reward function (see the Reward function section).

| Model | Average reward |
|-------|:-------------:|
| Osmosis-Apply-1.7B | 0.98046 |
| Claude 4 Sonnet | 0.93284 |
| OpenAI o3 | 0.86394 |
| Gemini-2.5-Flash | 0.77452 |

Table 1: Performance on 10k samples from commitpackft.

## Methodology

`Osmosis-Apply-1.7B` was trained on about 100k randomly sampled commits from the [commitpackft dataset](https://huggingface.co/datasets/bigcode/commitpackft), which is less than 15% of the entire dataset. A unified diff was generated between `old_contents` and `new_contents`, and the unified diff was parsed to create a natural language diff, similar to those output by LLMs.

```python
import difflib

unified_diff = difflib.unified_diff(old_code, new_code)
natural_language_diff = generate_from_unified_diff(unified_diff)
```

The original code and the edit were provided as input to the model along with a custom system prompt.

```xml


{ORIGINAL CODE}

{EDIT SNIPPET}
```

### Infrastructure

We used [verl](https://github.com/volcengine/verl) as the framework to train our model and [SGLang](https://github.com/sgl-project/sglang) as the rollout backend.

### Model system prompt

Below is the system prompt we trained our model with.

```python
SYSTEM_PROMPT = \
'''
You are a helpful assistant for a code editor that applies an edit to code to merge them together. That is, you will be given code wrapper in  tags and an edit wrapped in  tags, and you will apply the edit to the code.

For example:


CODE_SNIPPET



EDIT_SNIPPET


The code is any type of code and the edit is in the form of:

// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...

The merged code must be exact with no room for any errors. Make sure all whitespaces are preserved correctly. A small typo in code will cause it to fail to compile or error out, leading to poor user experience.

Output the code wrapped in  tags.
'''
```

### Edit format

The edit format is designed to be mostly natural language, with `// ... existing code ...` condensing original code that remains unchanged between edits. When prompting the LLM, it is important to also instruct it to provide some additional context (unchanged lines from the original code surrounding the edit), so that `Osmosis-Apply-1.7B` can locate where to insert the edit.

```
// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...
```

We find that the simple, sequential nature of this edit format makes it easier for smaller models to work with and for larger models to output, at the cost of some parsability and exactness.
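To make the format concrete, here is a minimal sketch of how an edit snippet in this shape could be derived from an old/new pair of files. It is not the pipeline used to build the training data: `make_edit_snippet` and its `context` parameter are hypothetical names, and the approach (grouping `difflib.SequenceMatcher` opcodes and collapsing everything outside each group into a marker line) is only one plausible way to do it.

```python
import difflib

MARKER = "// ... existing code ..."

def make_edit_snippet(old_code: str, new_code: str, context: int = 2) -> str:
    """Hypothetical helper: keep changed regions of new_code plus a few
    lines of context, and collapse everything else into MARKER lines."""
    old_lines = old_code.splitlines()
    new_lines = new_code.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    # Each group covers one changed region, padded with up to `context`
    # unchanged lines on either side.
    groups = list(matcher.get_grouped_opcodes(context))

    out = []
    j2 = 0
    for group in groups:
        _, i1, _, j1, _ = group[0]
        _, _, _, _, j2 = group[-1]
        # Emit a marker unless this group starts at the top of the file.
        if i1 > 0 or j1 > 0:
            out.append(MARKER)
        out.extend(new_lines[j1:j2])

    # Emit a trailing marker if unchanged code follows the last group.
    if groups and j2 < len(new_lines):
        out.append(MARKER)
    return "\n".join(out)

old = "a\nb\nc\nd\ne\nf\ng"
new = "a\nb\nc\nX\ne\nf\ng"
print(make_edit_snippet(old, new, context=1))
```

With `context=1`, the example prints a marker, the lines `c`, `X`, `e`, and a closing marker. A larger `context` gives the apply model more surrounding lines to locate the edit, at the cost of a longer snippet.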

### Reward function

We use a simple reward function that looks for exactness in the model outputs.

**TL;DR**: 

1. If the new code is exactly correct including whitespaces, then give a large reward (1.0).
2. If the new code is correct when excluding empty lines, then give a small reward (0.2).
3. Otherwise, give no reward (0.0).

Below is the entire reward function.

```python
import re

def extract_solution(solution_str):
    matches = list(re.finditer(r'(.*?)', solution_str, re.DOTALL))

    # If nonempty matches and exactly one  block exists
    if(matches and len(matches) == 1):
        return matches[0].group(1).strip()
    return None

def filter_empty_lines(lines):
    return list(filter(lambda line : line.strip() != "", lines))

def calc_score(answer, ground_truth):
    answer = answer.strip()
    ground_truth = ground_truth.strip()

    if(answer == ground_truth):
        return 1.0
    else:
        answer_lines = filter_empty_lines(answer.splitlines(True))
        ground_truth_lines = filter_empty_lines(ground_truth.splitlines(True))

        # Give small positive reward if lines are almost correct
        if(answer_lines == ground_truth_lines):
            return 0.2

        return 0


def compute_score(data_source, solution_str, ground_truth, extra_info=None, format_score=0.0, score=1.0):
    answer = extract_solution(solution_str=solution_str)
    if answer is None:
        return 0
    else:
        return calc_score(answer, ground_truth)
```
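As a quick illustration of the three reward tiers, the sketch below restates the scoring rule in condensed form and applies it to three hand-written answers. The sample strings are made up for this example; only the scoring logic mirrors the function above.

```python
def filter_empty_lines(lines):
    return [line for line in lines if line.strip() != ""]

def calc_score(answer, ground_truth):
    # Condensed restatement of the scoring rule above, for illustration only.
    answer, ground_truth = answer.strip(), ground_truth.strip()
    if answer == ground_truth:
        return 1.0
    answer_lines = filter_empty_lines(answer.splitlines(True))
    ground_truth_lines = filter_empty_lines(ground_truth.splitlines(True))
    if answer_lines == ground_truth_lines:
        return 0.2
    return 0.0

ground_truth = "def f():\n    return 1\n\nprint(f())"

# Identical apart from a trailing newline, which strip() removes.
exact = calc_score("def f():\n    return 1\n\nprint(f())\n", ground_truth)
# Same non-empty lines, but the blank line between them was dropped.
missing_blank = calc_score("def f():\n    return 1\nprint(f())", ground_truth)
# Indentation differs on one line, so the lines no longer match.
bad_indent = calc_score("def f():\n  return 1\n\nprint(f())", ground_truth)

print(exact, missing_blank, bad_indent)  # 1.0 0.2 0.0
```

Note that only leading and trailing whitespace of the whole answer is forgiven; any indentation difference inside the code drops the reward to zero.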

## Usage

### LLM prompt


Since edits should be generated in a specific format, we have provided an example prompt to give to a coding LLM. This prompt is by no means perfect and can be tweaked a bit to get better results.

````
You are an AI coding assistant that takes in original code and responds with an edit snippet to the users.

```

// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...

```

Your response must strictly follow this format.

Guidelines for creating the edit snippet:

1. Regardless of programming language, collapse unchanged lines of code with this exact literal (ignoring backticks): `// ... existing code ...`
2. Provide 2-3 lines of context above and below your changes in the edit to help indicate where it is in the file. If the change is at the start or end of the file, just provide what you can.
3. You do not need to begin or end with `// ... existing code ...` for edits that include the beginning or end of file.
4. Make sure whitespaces, indentation, and formatting matches the original code.
5. You may make as many edits as you would like, but condense edits so that there are not too many, similar to a unified diff.
6. Wrap your final output in  tags.

Here is an example.

Original code:

```
def binary_search(arr, x):
    left = 0
    right = len(arr)

    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]

assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
```

Generated edit:

```

// ... existing code ...
    left = 0
    right = len(arr)

    while(left < right):
        mid = left + (right - left) // 2

        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...

```
````
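Because the apply model relies on context lines to locate each edit, it can be useful to sanity-check a generated edit before sending it on. The sketch below is one possible check, not part of the released tooling: `split_edit` and `unanchored_chunks` are hypothetical helpers that flag any chunk of the edit with no line appearing verbatim in the original code.

```python
MARKER = "// ... existing code ..."

def split_edit(edit: str) -> list[list[str]]:
    """Split an edit snippet into chunks of lines separated by MARKER lines."""
    chunks, current = [], []
    for line in edit.splitlines():
        if line.strip() == MARKER:
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(line)
    if current:
        chunks.append(current)
    return chunks

def unanchored_chunks(original: str, edit: str) -> list[int]:
    """Return indices of chunks that share no non-empty line with the original."""
    original_lines = set(original.splitlines())
    missing = []
    for index, chunk in enumerate(split_edit(edit)):
        content = [line for line in chunk if line.strip()]
        if content and not any(line in original_lines for line in content):
            missing.append(index)
    return missing

original = "a\nb\nc"
good_edit = "// ... existing code ...\nb\nX\n// ... existing code ..."
bad_edit = "// ... existing code ...\nX\nY"

print(unanchored_chunks(original, good_edit))  # []
print(unanchored_chunks(original, bad_edit))   # [0]
```

A non-empty result suggests the coding LLM omitted context for that chunk, in which case re-prompting it is likely cheaper than letting the merge guess. An edit that rewrites a whole file legitimately shares no lines with the original, so treat this as a heuristic.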

### Serving

During development, we used SGLang to serve the model, though it should be straightforward to do something similar with Ollama.

Below is an example using SGLang.

`python3 -m sglang.launch_server --model-path osmosis-ai/Osmosis-Apply-1.7B --host 0.0.0.0 --api-key osmosis`

```python
from openai import OpenAI
import re

def create_query(old_code, edit):
    return f"\n{old_code}\n\n\n\n{edit}\n"

def extract_solution(solution_str):
    matches = list(re.finditer(r'(.*?)', solution_str, re.DOTALL))

    # If nonempty matches and exactly one  block exists
    if(matches and len(matches) == 1):
        return matches[0].group(1).strip()
    return None

SYSTEM_PROMPT = \
'''
You are a helpful assistant for a code editor that applies an edit to code to merge them together. That is, you will be given code wrapper in  tags and an edit wrapped in  tags, and you will apply the edit to the code.

For example:


CODE_SNIPPET



EDIT_SNIPPET


The code is any type of code and the edit is in the form of:

// ... existing code ...
FIRST_EDIT
// ... existing code ...
SECOND_EDIT
// ... existing code ...
THIRD_EDIT
// ... existing code ...

The merged code must be exact with no room for any errors. Make sure all whitespaces are preserved correctly. A small typo in code will cause it to fail to compile or error out, leading to poor user experience.

Output the code wrapped in  tags.
'''

api_key = "osmosis"
api_base_url = "http://0.0.0.0:30000/v1"
client = OpenAI(
    api_key=api_key,
    base_url=api_base_url,
)

def generate_completion(query: str, system_prompt: str) -> str:
    # The system prompt goes first, followed by the user query.
    messages = [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": query,
        },
    ]

    response = client.chat.completions.create(
        model="",
        messages=messages,
        temperature=0,
        max_tokens=3072,
    )

    completion = response.choices[0].message.content
    return completion

original_code = \
'''
def binary_search(arr, x):
    left = 0
    right = len(arr)

    # TODO: fill out the body of this
    return -1

arr = [1,2,3,4,5,6,7,8,9]

assert binary_search(arr, 0) == -1
assert binary_search(arr, 1) == 0
assert binary_search(arr, 2) == 1
assert binary_search(arr, 3) == 2
assert binary_search(arr, 8) == 7
assert binary_search(arr, 9) == 8
assert binary_search(arr, 10) == -1
'''

edit = \
'''
// ... existing code ...
    left = 0
    right = len(arr)

    while(left < right):
        mid = left + (right - left) // 2

        if(arr[mid] == x):
            return mid
        elif(arr[mid] < x):
            left = mid + 1
        else:
            right = mid
    return -1

arr = [1,2,3,4,5,6,7,8,9]
// ... existing code ...
'''

completion = generate_completion(create_query(original_code, edit), SYSTEM_PROMPT)
updated_code = extract_solution(completion)
print(updated_code)
```
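Since `extract_solution` returns `None` when the output cannot be parsed, callers may want to handle that case and inspect what the merge changed before writing it to disk. The following is a small sketch of that idea using the standard-library `difflib`; `review_merge` is a hypothetical helper, not something shipped with the model.

```python
import difflib

def review_merge(original_code, updated_code):
    """Return a unified diff of the merge, or None if no merged code was extracted."""
    if updated_code is None:
        # Extraction failed; the caller can keep the original code or retry.
        return None
    diff = difflib.unified_diff(
        original_code.splitlines(),
        updated_code.splitlines(),
        fromfile="original",
        tofile="updated",
        lineterm="",
    )
    return "\n".join(diff)

print(review_merge("a\nb\n", "a\nc\n"))
```

An empty string means the merged code is identical to the original, which can itself be a useful signal that the edit was not applied.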