First of all, thank you for sharing your work. It's great to see an open-source project covering everything from training to deployment. 👍👍
We're working on making HILCodec run in web browsers. However, ONNX with WebAssembly did not reach real-time processing, and WebGPU didn't help either.
I think the computational cost needs to be reduced to less than half.
So, we're trying to reduce the model size at the expense of bitrate.
Is it enough to just reduce channels_enc and channels_dec, or are there other tuning points I should look into?
Also, if you have any ideas about deploying in web applications, please share. 🙏
Thank you!
Regarding training,
HILCodec is composed of a small encoder and a large decoder. I suggest reducing channels_dec to 32 and using a higher bitrate. (If mixed-precision training fails, you may try reducing the learning rate.)
Also, our colleagues found that applying finite scalar quantization (FSQ) to HiFi-Codec improves the speech quality at ultra-low bitrates. It would be non-trivial, but I think it is worth trying to replace RVQ with FSQ in HILCodec. (https://arxiv.org/pdf/2309.15505)
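For readers unfamiliar with FSQ, here is a minimal pure-Python sketch of the quantization step for a single latent vector. It is not HILCodec code: the function name `fsq_quantize` and the example level counts are hypothetical, only odd level counts are handled to keep the rounding symmetric, and the straight-through gradient estimator needed for training is left out.

```python
import math

def fsq_quantize(z, levels):
    """Quantize one latent vector with finite scalar quantization.

    z      : list of floats, one per latent dimension
    levels : list of odd ints, number of quantization levels per dimension
             (odd-only is a simplifying assumption of this sketch)
    Returns (codes, index): the per-dimension integer codes and a single
    flat codebook index in [0, prod(levels) - 1].
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    if any(n < 1 or n % 2 == 0 for n in levels):
        raise ValueError("this sketch only supports odd level counts")

    codes = []
    index = 0
    basis = 1
    for value, n in zip(z, levels):
        half = (n - 1) // 2
        # Bound the value to (-half, half) with tanh, then round to an integer.
        q = round(math.tanh(value) * half)
        codes.append(q)
        # Shift to [0, n - 1] and accumulate a mixed-radix index.
        index += (q + half) * basis
        basis *= n
    return codes, index

# Example with three dimensions of 5 levels each (implicit codebook of 125 entries).
codes, index = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
print(codes, index)
```

Unlike RVQ, there is no learned codebook here: the codebook is implied by the level counts, so the bitrate per frame is set by the product of `levels`.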
Regarding inference,
By default, HILCodec encodes and decodes 320 samples at a time at 24 kHz. This introduces a delay of 13.3 ms plus computation time. If a longer delay is acceptable, changing the codec to encode and decode more samples at once will improve the real-time factor (RTF). (The number of samples processed at a time must be a multiple of 320, such as 640 or 960. If you want to process a different number of samples at once, you can change the strides from [2, 4, 5, 8] to other values and then retrain. Currently, 1 frame corresponds to 2 x 4 x 5 x 8 = 320 samples.)
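The arithmetic above can be checked with a short sketch. The helper name `frame_stats` and its parameters are hypothetical, not part of the HILCodec code base; it only reproduces the frame-size and delay calculation described in the reply.

```python
from math import prod

def frame_stats(strides, sample_rate=24000, frames_per_step=1):
    """Return (samples_per_step, algorithmic_delay_ms).

    strides         : downsampling strides of the encoder, e.g. [2, 4, 5, 8]
    sample_rate     : audio sample rate in Hz
    frames_per_step : how many codec frames are processed per call
    The delay excludes computation time.
    """
    hop = prod(strides)                      # samples per frame
    samples = hop * frames_per_step          # samples handled per call
    delay_ms = 1000.0 * samples / sample_rate
    return samples, delay_ms

# Default setting: one frame of 320 samples, about 13.3 ms of delay.
print(frame_stats([2, 4, 5, 8]))
# Processing three frames at once: 960 samples, 40 ms of delay.
print(frame_stats([2, 4, 5, 8], frames_per_step=3))
```

Larger steps trade latency for fewer per-call overheads, which is why they can help in a browser runtime.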