This repository includes source code for training and evaluating the neural infinite impulse response filter field (NIIRF) proposed in the following ICASSP 2024 paper:
@InProceedings{Masuyama2024ICASSP_niirf,
author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, Fran\c{c}ois G. and Pan, Zexu and Khurana, Sameer and Hori, Chiori and {Le Roux}, Jonathan},
title = {NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization},
booktitle = {Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year = 2024,
month = apr
}
- Environment setup
- Training and evaluating single-subject neural fields
- Training and evaluating multi-subject neural fields
- Contributing
- Copyright and license
The code has been tested using python 3.10.9 on Linux.
Necessary dependencies can be installed using the included requirements.txt:
pip install -r requirements.txt
In order to train and evaluate neural fields on the CIPIC dataset, please execute run_cipic.sh after specifying EXPDIR.
The dataset, model checkpoints, and training/evaluation log files will be stored in EXPDIR.
run_cipic.sh consists of the following four stages:
- Stage 1:
  - This stage downloads the dataset and converts the sofa files to pickles.
  - The sofa files of the 44 released subjects will be downloaded from the sofa conventions.
  - The pickles will be stored in EXPDIR/stage1/s{subject_id}.
- Stage 2:
  - This stage splits the data into training, development, and test sets by
    python pyscripts/stage2_split_dataset.py EXPDIR SIDX NDATA_TR NDATA_DEV NDATA_TT
  - The sum of NDATA_TR, NDATA_DEV, and NDATA_TT must equal the number of HRTF measurements for each subject, 1250.
  - The created training, development, and test sets are saved in EXPDIR/stage2/s{subject_id}.
Stage 3:
- This stage trains the proposed neural field, NIIRF, for each subject with different amounts of measurements and evaluates its interpolation capability.
- As a default,
config.yaml
in exp_example will be copied to yourMODELDIR
and used for the training. - Log-spectral distortion (LSD) on the entire test set will be reported in
EXPDIR/stage3/MODELNAME/s{subject_id}/{amount_of_measurements}/log
.
- Stage 4:
  - This stage trains a neural field that directly estimates the magnitude HRTF for each subject with different amounts of measurements and evaluates it.
  - By default, config.yaml in exp_example will be copied to your MODELDIR and used for training.
  - Log-spectral distortion (LSD) on the entire test set will be reported in EXPDIR/stage4/MODELNAME/s{subject_id}/{amount_of_measurements}/log.
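The size constraint in Stage 2 can be illustrated with a minimal sketch. This is not the code in pyscripts/stage2_split_dataset.py: the function name split_indices, the seed argument, the random shuffle, and the example sizes 1000/125/125 are all assumptions made for illustration. Only the total of 1250 measurements per subject comes from the description above.

```python
import random

N_MEASUREMENTS = 1250  # HRTF measurements per CIPIC subject, as stated above


def split_indices(ndata_tr, ndata_dev, ndata_tt, seed=0):
    """Hypothetical helper: split measurement indices into three disjoint sets."""
    total = ndata_tr + ndata_dev + ndata_tt
    if total != N_MEASUREMENTS:
        raise ValueError(f"split sizes sum to {total}, expected {N_MEASUREMENTS}")
    indices = list(range(N_MEASUREMENTS))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    train = indices[:ndata_tr]
    dev = indices[ndata_tr:ndata_tr + ndata_dev]
    test = indices[ndata_tr + ndata_dev:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_indices(1000, 125, 125)  # hypothetical sizes
    print(len(train), len(dev), len(test))  # 1000 125 125
```

Rejecting sizes that do not sum to 1250 up front avoids silently dropping or reusing measurements across the three sets.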
The following table shows LSD for different numbers of measurements. These results are for subject 3 only, whereas the paper reports LSD averaged over all subjects.
method | 10 | 20 | 30 | 50 | 70 | 100 | 150 |
---|---|---|---|---|---|---|---|
Mag. NF | 6.6 | 5.5 | 4.6 | 3.9 | 3.6 | 3.3 | 3.1 |
NIIRF (K=32) | 6.1 | 5.0 | 4.4 | 3.9 | 3.5 | 3.4 | 3.1 |
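For reference, LSD between a measured and an estimated magnitude response is commonly computed as the root-mean-square of their level difference in dB across frequency bins, then averaged over directions. The sketch below assumes that common definition. The exact formula, frequency range, and averaging used by this repository should be checked against the paper and the evaluation scripts, and the function names here are hypothetical.

```python
import math


def lsd_single(mag_ref, mag_est, eps=1e-12):
    """LSD in dB for one direction, given linear magnitudes per frequency bin."""
    if len(mag_ref) != len(mag_est):
        raise ValueError("magnitude responses must have the same length")
    squared_db_errors = [
        (20.0 * math.log10((r + eps) / (e + eps))) ** 2
        for r, e in zip(mag_ref, mag_est)
    ]
    return math.sqrt(sum(squared_db_errors) / len(squared_db_errors))


def lsd_mean(refs, ests):
    """Average the per-direction LSD over a set of directions."""
    values = [lsd_single(r, e) for r, e in zip(refs, ests)]
    return sum(values) / len(values)


if __name__ == "__main__":
    ref = [1.0, 0.5, 0.25]
    est = [1.0, 0.5, 0.25]
    print(lsd_single(ref, est))  # identical responses give 0.0
```

Under this definition, an estimate that is off by a constant factor of 10 in magnitude at every bin yields an LSD of about 20 dB.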
In order to train and evaluate our method on the HUTUBS dataset, please execute run_hutubs.sh after specifying EXPDIR.
run_hutubs.sh consists of four stages similar to those of run_cipic.sh, but this experiment pre-trains a multi-subject neural field and adapts it to another subject in Stage 3 and Stage 4.
For the adaptation method, the current implementation supports only LoRA and BitFit.
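Both LoRA and BitFit are parameter-efficient fine-tuning methods: LoRA keeps the pre-trained weight matrix frozen and learns a low-rank update, while BitFit updates only bias terms. The sketch below counts trainable parameters for a single fully connected layer to illustrate the difference. The layer sizes and the rank are hypothetical and are not taken from this repository's configuration.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole dense layer is updated."""
    return d_in * d_out + d_out  # weight matrix plus bias vector


def lora_params(d_in, d_out, rank):
    """LoRA learns B (d_out x rank) and A (rank x d_in); W and bias stay frozen."""
    return rank * (d_in + d_out)


def bitfit_params(d_out):
    """BitFit trains only the bias vector."""
    return d_out


if __name__ == "__main__":
    d_in, d_out, rank = 512, 512, 4  # hypothetical sizes
    print(full_finetune_params(d_in, d_out))  # 262656
    print(lora_params(d_in, d_out, rank))     # 4096
    print(bitfit_params(d_out))               # 512
```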
- Stage 3:
  - NIIRF is pre-trained with HRTFs of multiple subjects by
    python pyscripts/stage3a_train_niirf.py MODELDIR pretrain
  - The pre-trained model is adapted to each subject with different amounts of measurements by
    python pyscripts/stage3a_train_niirf.py SDIR/_AMOUNT adaptation
  - The adapted model will be evaluated on both seen and unseen directions.
    - Seen directions (100 directions by default): HRTFs of non-target subjects for these directions are used in the pre-training.
    - Unseen directions (100 directions by default): HRTFs for these directions are used in neither pre-training nor adaptation.
- Stage 4:
  - Similar to Stage 3, a magnitude neural field is pre-trained with multiple subjects and adapted to the target subject.
The following table shows LSD for different numbers of measurements, where LoRA is used for the adaptation. These results are for subject 89 only, whereas the paper reports LSD averaged over subject IDs 89 to 95.
method | directions | 10 | 20 | 30 | 50 | 100 |
---|---|---|---|---|---|---|
Mag. NF | seen | 4.0 | 3.8 | 3.8 | 3.7 | 3.5 |
NIIRF (K=32) | seen | 3.9 | 3.7 | 3.7 | 3.6 | 3.6 |
Mag. NF | unseen | 4.9 | 4.7 | 4.7 | 4.6 | 4.6 |
NIIRF (K=32) | unseen | 4.2 | 3.9 | 3.9 | 3.8 | 3.7 |
See CONTRIBUTING.md for our policy on contributions.
Released under AGPL-3.0-or-later license, as found in the LICENSE.md file.
All files, except as noted below:
Copyright (c) 2023 Mitsubishi Electric Research Laboratories (MERL)
SPDX-License-Identifier: AGPL-3.0-or-later
The following files:
pyscripts/models/hiir_downstream.py
pyscripts/utils/extract_iir_params.py
were adapted from https://github.com/yoyololicon/hrtf-notebooks (license included in LICENSES/MIT.md):
Copyright (c) 2023 Mitsubishi Electric Research Laboratories (MERL)
Copyright (c) 2023 Chin-Yun Yu