Initialize project; model provided by the ModelHub XC community

Model: ai-guru/lakhclean_mmmtrack_4bars_d-2048
Source: Original Platform
ModelHub XC
2026-06-07 06:55:16 +08:00
commit 9dc06ed753
8 changed files with 491 additions and 0 deletions

28
.gitattributes vendored Normal file

@@ -0,0 +1,28 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

107
README.md Normal file

@@ -0,0 +1,107 @@
---
tags:
- gpt2
- text-generation
- music-modeling
- music-generation
widget:
- text: PIECE_START
- text: PIECE_START PIECE_START TRACK_START INST=34 DENSITY=8
- text: PIECE_START TRACK_START INST=1
---
# GPT-2 for Music
Language Models such as GPT-2 can be used for Music Generation. The idea is to represent pieces of music as texts, effectively reducing the task to Language Generation.
This model is a rather small instance of GPT-2 trained on the [Lakhclean dataset](https://colinraffel.com/projects/lmd/). The model generates 4 bars at a time at 16th-note resolution in 4/4 meter.
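To make the text representation concrete, here is a minimal sketch of how such a token string could be read back into note events. The token names (`BAR_START`, `NOTE_ON=`, `TIME_DELTA=`, and so on) are taken from the notebook in this repository; the helper `tokens_to_events` and the example string are illustrative assumptions, not part of the original code. Time is counted in 16th-note steps, so one 4/4 bar spans 16 steps.

```python
STEPS_PER_BAR = 16  # 4/4 meter at 16th-note resolution


def tokens_to_events(text):
    """Return (pitch, start_step, end_step) tuples for one token string.

    Hypothetical helper for illustration; it ignores instruments and density.
    """
    events = []
    open_notes = {}  # pitch -> step at which the note started
    bar_index = 0
    step = 0
    for token in text.split():
        if token == "TRACK_START":
            bar_index = 0
        elif token == "BAR_START":
            step = bar_index * STEPS_PER_BAR
            open_notes = {}
        elif token == "BAR_END":
            bar_index += 1
        elif token.startswith("TIME_DELTA="):
            step += int(float(token.split("=")[-1]))
        elif token.startswith("NOTE_ON="):
            open_notes[int(token.split("=")[-1])] = step
        elif token.startswith("NOTE_OFF="):
            pitch = int(token.split("=")[-1])
            if pitch in open_notes:
                events.append((pitch, open_notes.pop(pitch), step))
    return events


# Made-up two-bar example, not actual model output.
example = (
    "PIECE_START TRACK_START INST=34 DENSITY=8 "
    "BAR_START NOTE_ON=40 TIME_DELTA=4 NOTE_OFF=40 BAR_END "
    "BAR_START NOTE_ON=43 TIME_DELTA=2 NOTE_OFF=43 BAR_END "
    "TRACK_END"
)
events = tokens_to_events(example)
print(events)
```

Each tuple gives a MIDI pitch with its start and end in 16th-note steps; the second bar starts at step 16 because the bar index, not the accumulated deltas, sets the bar's start time.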
If you want to contribute, if you want to say hello, if you want to know more, find me here:
- https://www.linkedin.com/in/dr-tristan-behrens-734967a2/
- https://www.youtube.com/@drtristanbehrens
- https://twitter.com/DrTBehrens
- https://github.com/AI-Guru
- https://huggingface.co/TristanBehrens
- https://huggingface.co/ai-guru
Run the model on Google Colab: https://colab.research.google.com/drive/1Mz-KJ8vX4Wylr4mzvgP-MclDwQJ06KSq?usp=sharing
## License
You are free to use this model in any open-source context without charge. If you do so, please credit me.
However, if you wish to use the model for commercial purposes, please contact me to discuss licensing terms. Depending on the specific use case, there may be fees associated with commercial use. I am open to negotiating the terms of the license to meet your needs and ensure that the model is used appropriately. Please feel free to reach out to me at your earliest convenience to discuss further.
## Model description
The model is GPT-2 with 6 decoder layers and 8 attention heads per layer. The context length is 2048 tokens. The embedding dimension is 512.
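As a rough sanity check on "rather small", the parameter count can be estimated from these hyperparameters. This is a back-of-the-envelope sketch that assumes the standard GPT-2 layout (fused query/key/value projection, MLP width of four times the embedding size, output head tied to the token embeddings) and takes the vocabulary size of 422 from `config.json`.

```python
# Hyperparameters as stated above; vocab_size is taken from config.json.
n_layer, n_embd, n_positions, vocab_size = 6, 512, 2048, 422
n_inner = 4 * n_embd  # assumed GPT-2 default when n_inner is null


def linear(n_in, n_out):
    """Weights plus biases of one dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * n_embd  # scale and shift vectors
per_layer = (
    layer_norm                    # first layer norm
    + linear(n_embd, 3 * n_embd)  # fused query/key/value projection
    + linear(n_embd, n_embd)      # attention output projection
    + layer_norm                  # second layer norm
    + linear(n_embd, n_inner)     # MLP up-projection
    + linear(n_inner, n_embd)     # MLP down-projection
)
embeddings = vocab_size * n_embd + n_positions * n_embd
# Final layer norm is added once; the tied output head adds nothing.
total = embeddings + n_layer * per_layer + layer_norm

print(f"{total:,} parameters")
print(f"{total * 4 / 1e6:.1f} MB as float32")
```

Under these assumptions the model has about 20 million trainable parameters. The checkpoint file on disk can be larger than the float32 estimate, since a saved state may also contain non-trainable buffers that this count ignores.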
## Model family
This model is part of a huge group of Transformers I have trained. Most of them are not publicly available.
If you are interested in using and/or licensing one of the models, please get in touch.
### Lakhclean
These models were trained on roughly 15K MIDI files from the Lakhclean dataset, the same data as the model you are viewing now.
- lakhclean_mmmbar_4bars_d-2048: 4 bars resolution, bar inpainting, note density conditioning
- lakhclean_mmmbar_8bars_d-2048: 8 bars resolution, bar inpainting, note density conditioning
- lakhclean_mmmtrack_4bars_chords: 4 bars resolution, chord conditioning
- lakhclean_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning (this model)
- lakhclean_mmmtrack_4bars_simple-2048: 4 bars resolution
- lakhclean_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### Lakhfull
These models were trained on roughly 175K MIDI files from the Lakh dataset.
- lakhfull_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning (the big brother of this model)
- lakhfull_mmmtrack_4bars_simple-2048: 4 bars resolution
### Metal
These models were trained on roughly 7K MIDI files from my own collections. They support genre conditioning.
- metal_mmmbar_4bars_d-2048: 4 bars resolution, bar inpainting, note density conditioning
- metal_mmmbar_8bars_d-2048: 8 bars resolution, bar inpainting, note density conditioning
- metal_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- metal_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### MetaMIDI Dataset genres
These models were trained on genre-specific subsets of the MetaMIDI dataset.
- mmd-baroque_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- mmd-baroque_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-classical_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-noncontemporary_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-pop_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-renaissance_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### MetaMIDI Dataset full
These models were trained on roughly 400K MIDI files from the MetaMIDI dataset.
- mmd-full_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- mmd-full_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-full_mmmtrack_4bars_chords-d-2048: 4 bars resolution, note density conditioning, chord conditioning (most powerful model in the entire group)
## Intended uses & limitations
This model is just a proof of concept. It shows that HuggingFace can be used to compose music.
### How to use
There is a notebook in the repo that you can use to generate symbolic music and then render it.
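The widget prompts in this card's front matter hint at the conditioning format: a track opens with `TRACK_START`, optionally followed by `INST=<program>` and `DENSITY=<level>`. Below is a minimal sketch of a prompt builder under that assumption. `build_prompt` is a hypothetical helper, and the valid ranges for instrument and density are not documented here, so no range checking is attempted.

```python
def build_prompt(instrument=None, density=None, previous=""):
    """Assemble a conditioning prompt for the next track.

    `previous` is the text generated so far; an empty string starts a new piece.
    """
    tokens = previous.split() if previous else ["PIECE_START"]
    tokens.append("TRACK_START")
    if instrument is not None:
        tokens.append(f"INST={instrument}")
        # Density follows the instrument token in the widget example.
        if density is not None:
            tokens.append(f"DENSITY={density}")
    return " ".join(tokens)


print(build_prompt())
print(build_prompt(instrument=34, density=8))
```

The resulting string can be encoded with the tokenizer and passed to `model.generate`, as the notebook does with its `generated_sequence` variable.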
### Limitations and bias
Since this model was trained on a very small corpus of music, it overfits heavily.
### Acknowledgements
This model has been created with support from NVIDIA. I am very grateful for the GPU compute they provided!

34
config.json Normal file

@@ -0,0 +1,34 @@
{
"_name_or_path": "../transformermusic/bin/checkpoints/lakhclean_mmmtrack_4bars_d-2048/20220317-1538/checkpoint-120000",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 2048,
"n_embd": 512,
"n_head": 8,
"n_inner": null,
"n_layer": 6,
"n_positions": 2048,
"pad_token_id": 3,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "float32",
"transformers_version": "4.16.2",
"use_cache": true,
"vocab_size": 422
}


@@ -0,0 +1,316 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "DWLOSBkp0A2U"
},
"source": [
"# GPT-2 for music - By Dr. Tristan Behrens\n",
"\n",
"This notebook shows you how to generate music with GPT-2.\n",
"\n",
"--- \n",
"\n",
"## Find me online\n",
"\n",
"- https://www.linkedin.com/in/dr-tristan-behrens-734967a2/\n",
"- https://twitter.com/DrTBehrens\n",
"- https://github.com/AI-Guru\n",
"- https://huggingface.co/TristanBehrens\n",
"- https://huggingface.co/ai-guru\n",
"\n",
"\n",
"---\n",
"\n",
"## Install dependencies.\n",
"\n",
"The following cell sets up FluidSynth and pyfluidsynth on Google Colab."
]
},
{
"cell_type": "code",
"source": [
"if \"google.colab\" in str(get_ipython()):\n",
" print(\"Installing dependencies...\")\n",
" !apt-get update -qq && apt-get install -qq libfluidsynth2 build-essential libasound2-dev libjack-dev\n",
" !pip install -qU pyfluidsynth"
],
"metadata": {
"id": "k1a8sd2KZCz9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6J_AnhV8D5p6"
},
"outputs": [],
"source": [
"!pip install transformers\n",
"!pip install note_seq"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RzhHhFll0JVl"
},
"source": [
"## Load the tokenizer and the model from 🤗 Hub."
]
},
{
"cell_type": "code",
"source": [
"import os\n",
"os.environ[\"PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION\"] = \"python\""
],
"metadata": {
"id": "zGupj_vuZ9f2"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g3ih12FMD7bs"
},
"outputs": [],
"source": [
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"ai-guru/lakhclean_mmmtrack_4bars_d-2048\")\n",
"model = AutoModelForCausalLM.from_pretrained(\"ai-guru/lakhclean_mmmtrack_4bars_d-2048\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YfHXFugA0WdI"
},
"source": [
"## Convert the generated tokens to music that you can listen to.\n",
"\n",
"This uses note_seq, a MIDI-like representation of music from Google Magenta. It can also load and save MIDI files. Check the note_seq repo if you want to learn more.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "L3QMj8NyEBqs"
},
"outputs": [],
"source": [
"import note_seq\n",
"\n",
"# Durations in seconds at 120 BPM.\n",
"NOTE_LENGTH_16TH_120BPM = 0.25 * 60 / 120\n",
"BAR_LENGTH_120BPM = 4.0 * 60 / 120\n",
"\n",
"def token_sequence_to_note_sequence(token_sequence, use_program=True, use_drums=True, instrument_mapper=None, only_piano=False):\n",
"    \"\"\"Convert a token string (or list of tokens) into a note_seq NoteSequence.\"\"\"\n",
"\n",
"    if isinstance(token_sequence, str):\n",
"        token_sequence = token_sequence.split()\n",
"\n",
"    note_sequence = empty_note_sequence()\n",
"\n",
"    # Render all notes.\n",
"    current_program = 1\n",
"    current_is_drum = False\n",
"    current_instrument = 0\n",
"    track_count = 0\n",
"    # Defaults so that a sequence lacking TRACK_START or BAR_START does not raise a NameError.\n",
"    current_bar_index = 0\n",
"    current_time = 0.0\n",
"    current_notes = {}\n",
"    for token in token_sequence:\n",
"\n",
"        if token == \"PIECE_START\":\n",
"            pass\n",
"        elif token == \"PIECE_END\":\n",
"            print(\"The end.\")\n",
"            break\n",
"        elif token == \"TRACK_START\":\n",
"            # Every track starts again at the first bar.\n",
"            current_bar_index = 0\n",
"            track_count += 1\n",
"        elif token == \"TRACK_END\":\n",
"            pass\n",
"        elif token == \"KEYS_START\":\n",
"            pass\n",
"        elif token == \"KEYS_END\":\n",
"            pass\n",
"        elif token.startswith(\"KEY=\"):\n",
"            pass\n",
"        elif token.startswith(\"INST\"):\n",
"            instrument = token.split(\"=\")[-1]\n",
"            if instrument != \"DRUMS\" and use_program:\n",
"                if instrument_mapper is not None:\n",
"                    if instrument in instrument_mapper:\n",
"                        instrument = instrument_mapper[instrument]\n",
"                current_program = int(instrument)\n",
"                current_instrument = track_count\n",
"                current_is_drum = False\n",
"            if instrument == \"DRUMS\" and use_drums:\n",
"                current_instrument = 0\n",
"                current_program = 0\n",
"                current_is_drum = True\n",
"        elif token == \"BAR_START\":\n",
"            # Bars are placed on a fixed grid; open notes do not carry over.\n",
"            current_time = current_bar_index * BAR_LENGTH_120BPM\n",
"            current_notes = {}\n",
"        elif token == \"BAR_END\":\n",
"            current_bar_index += 1\n",
"        elif token.startswith(\"NOTE_ON\"):\n",
"            pitch = int(token.split(\"=\")[-1])\n",
"            note = note_sequence.notes.add()\n",
"            note.start_time = current_time\n",
"            # Default length of a quarter note until a matching NOTE_OFF arrives.\n",
"            note.end_time = current_time + 4 * NOTE_LENGTH_16TH_120BPM\n",
"            note.pitch = pitch\n",
"            note.instrument = current_instrument\n",
"            note.program = current_program\n",
"            note.velocity = 80\n",
"            note.is_drum = current_is_drum\n",
"            current_notes[pitch] = note\n",
"        elif token.startswith(\"NOTE_OFF\"):\n",
"            pitch = int(token.split(\"=\")[-1])\n",
"            if pitch in current_notes:\n",
"                note = current_notes[pitch]\n",
"                note.end_time = current_time\n",
"        elif token.startswith(\"TIME_DELTA\"):\n",
"            delta = float(token.split(\"=\")[-1]) * NOTE_LENGTH_16TH_120BPM\n",
"            current_time += delta\n",
"        elif token.startswith(\"DENSITY=\"):\n",
"            pass\n",
"        elif token == \"[PAD]\":\n",
"            pass\n",
"        else:\n",
"            #print(f\"Ignored token {token}.\")\n",
"            pass\n",
"\n",
"    # Make the instruments right: one instrument index per (program, is_drum) pair.\n",
"    instruments_drums = []\n",
"    for note in note_sequence.notes:\n",
"        pair = [note.program, note.is_drum]\n",
"        if pair not in instruments_drums:\n",
"            instruments_drums += [pair]\n",
"        note.instrument = instruments_drums.index(pair)\n",
"\n",
"    if only_piano:\n",
"        for note in note_sequence.notes:\n",
"            if not note.is_drum:\n",
"                note.instrument = 0\n",
"                note.program = 0\n",
"\n",
"    return note_sequence\n",
"\n",
"def empty_note_sequence(qpm=120.0, total_time=0.0):\n",
"    note_sequence = note_seq.protobuf.music_pb2.NoteSequence()\n",
"    note_sequence.tempos.add().qpm = qpm\n",
"    note_sequence.ticks_per_quarter = note_seq.constants.STANDARD_PPQ\n",
"    note_sequence.total_time = total_time\n",
"    return note_sequence"
]
},
{
"cell_type": "markdown",
"source": [
"## Generate music\n",
"\n",
"This will generate one track of music and render it. "
],
"metadata": {
"id": "4kr2dECziaFA"
}
},
{
"cell_type": "code",
"source": [
"generated_sequence = \"PIECE_START\""
],
"metadata": {
"id": "cUg1DrlygzgT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Note: Run the following cell multiple times to generate more tracks."
],
"metadata": {
"id": "SinUPIHyimr5"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZYpukydNESDF"
},
"outputs": [],
"source": [
"# Encode the conditioning tokens.\n",
"input_ids = tokenizer.encode(generated_sequence, return_tensors=\"pt\")\n",
"#print(input_ids)\n",
"\n",
"# Generate more tokens.\n",
"eos_token_id = tokenizer.encode(\"TRACK_END\")[0]\n",
"temperature = 1.0\n",
"generated_ids = model.generate(\n",
" input_ids, \n",
" max_length=2048,\n",
" do_sample=True,\n",
" temperature=temperature,\n",
" eos_token_id=eos_token_id,\n",
")\n",
"generated_sequence = tokenizer.decode(generated_ids[0])\n",
"print(generated_sequence)\n",
"\n",
"note_sequence = token_sequence_to_note_sequence(generated_sequence)\n",
"\n",
"synth = note_seq.fluidsynth\n",
"note_seq.plot_sequence(note_sequence)\n",
"note_seq.play_sequence(note_sequence, synth)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d1x6HeF90kkO"
},
"source": [
"# Thank you!"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"nbformat": 4,
"nbformat_minor": 0
}

3
pytorch_model.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42bda0df7ff96166407c5c6602126716e103eba0a18614c247f6e2fc1d0e08b2
size 105915613

1
special_tokens_map.json Normal file

@@ -0,0 +1 @@
{"pad_token": "[PAD]"}

1
tokenizer.json Normal file

File diff suppressed because one or more lines are too long

1
tokenizer_config.json Normal file

@@ -0,0 +1 @@
{"tokenizer_class": "PreTrainedTokenizerFast"}