This is the official repository for HumTrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond. To use the whole dataset, please refer to this link.
We present baseline results of four SOTA vocal melody transcription models, VOCANO, Sheet Sage, MIR-ST500, and JDC-STP, on both the validation and test sets of our HumTrans dataset, shown in the table below. For all experiments, we directly used the code provided by the authors to generate the predicted transcriptions (`midis/{VOCANO.zip,SheetSage.zip,MIR-ST500.zip,JDC-STP.zip}`) and compared them against the reference MIDI files (`midis/GroundTruth.zip`). We can observe that, although JDC-STP performed slightly better than the other models, the transcription capability of every model is still far from satisfactory, so there is significant room for improvement in humming melody transcription.
| Model | Precision (Valid) | Recall (Valid) | F1 (Valid) | Precision (Test) | Recall (Test) | F1 (Test) |
|---|---|---|---|---|---|---|
| VOCANO | 3.270 | 3.314 | 3.194 | 3.384 | 3.329 | 3.352 |
| Sheet Sage | 2.757 | 2.656 | 2.702 | 3.039 | 2.982 | 3.005 |
| MIR-ST500 | 6.258 | 6.448 | 6.341 | 5.686 | 5.853 | 5.755 |
| JDC-STP | 6.777 | 6.785 | 6.741 | 5.844 | 5.620 | 5.667 |
To compute the evaluation metrics for one set of predictions against the ground truth (here, VOCANO on the validation set), run:

```bash
python calc_transcription_eval_metric.py valid_keys.txt midis/GroundTruth/valid midis/VOCANO/valid
```
Here, `valid_keys.txt` contains the list of name keys of the validation set, `midis/GroundTruth/valid` is the reference MIDI folder, and `midis/VOCANO/valid` is the predicted MIDI folder. The script outputs three numbers: the precision, recall, and F1-score of the compared pair of folders.
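For reference, the sketch below shows how note-level precision, recall, and F1 can be computed for a single pair of MIDI files using `pretty_midi` and `mir_eval`. This is only an illustration: the file paths are placeholders, it assumes the melody is stored in the first instrument track, and the repository's `calc_transcription_eval_metric.py` may use different matching rules or tolerances.

```python
# Minimal sketch of note-level transcription scoring for one file pair.
# Assumptions: the melody is in the first instrument track of each MIDI file,
# and mir_eval's default 50 ms onset tolerance is acceptable.
import numpy as np
import pretty_midi
import mir_eval


def midi_to_notes(path):
    """Return (intervals, pitches_hz) for the first instrument of a MIDI file."""
    midi = pretty_midi.PrettyMIDI(path)
    notes = midi.instruments[0].notes
    intervals = np.array([[n.start, n.end] for n in notes])
    pitches = np.array([mir_eval.util.midi_to_hz(n.pitch) for n in notes])
    return intervals, pitches


# Placeholder file names: replace with a real key from valid_keys.txt.
ref_intervals, ref_pitches = midi_to_notes("midis/GroundTruth/valid/example.mid")
est_intervals, est_pitches = midi_to_notes("midis/VOCANO/valid/example.mid")

# Onset-only matching (offset_ratio=None ignores note offsets).
precision, recall, f1, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches, offset_ratio=None
)
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```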
The file `train_valid_test_keys.json` contains the official split of this dataset; if you train your own model, please use this official split for a fair comparison.
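As a starting point for loading the split, the sketch below assumes `train_valid_test_keys.json` maps the split names `"train"`, `"valid"`, and `"test"` to lists of name keys; check the actual field names in the file before relying on it.

```python
# Minimal sketch of reading the official split (assumed JSON structure).
import json

with open("train_valid_test_keys.json") as f:
    split = json.load(f)

# Assumed keys; adjust if the file uses different names.
train_keys = split["train"]
valid_keys = split["valid"]
test_keys = split["test"]
print(len(train_keys), len(valid_keys), len(test_keys))
```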