[BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length #10306

syntaxticsugr · 2024-12-19T17:53:41Z

What does this PR do?

Passing a torch.Tensor as the initial_audio_waveforms parameter, as described here, raises TypeError: new_zeros(): argument 'size' failed to unpack the object at pos 3 with error "type must be tuple of ints,but got float"

Notebook to Reproduce the Error

In function prepare_latents:

audio_vae_length = self.transformer.config.sample_size * self.vae.hop_length

audio_vae_length evaluates to a float because self.transformer.config.sample_size returns a float

audio = initial_audio_waveforms.new_zeros(audio_shape)

torch.Tensor.new_zeros() accepts a single argument size – a list, tuple, or torch.Size of integers defining the shape of the output tensor. Source

audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)

But audio_shape has type (int, int, float), because audio_vae_length is a float.

The proposed fix is to wrap self.transformer.config.sample_size with int().
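The type problem can be seen without loading the pipeline. The sketch below is only an illustration in plain Python: sample_size, hop_length, and the batch values are hypothetical stand-ins rather than values read from a real model config, and hop_length is assumed to be an int. It shows that the shape tuple ends up as (int, int, float) before the change and (int, int, int) once sample_size is wrapped with int().

```python
# Hypothetical stand-ins for self.transformer.config.sample_size and
# self.vae.hop_length; the real values come from the loaded model config.
sample_size = 1024.0  # a float, as this PR reports for the config value
hop_length = 2048     # assumed to be an int

# Hypothetical call arguments.
batch_size, num_waveforms_per_prompt, audio_channels = 2, 1, 2

# Before the fix: float * int gives a float, so the last dimension is a float.
audio_vae_length = sample_size * hop_length
audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)
shape_types_before = [type(dim).__name__ for dim in audio_shape]

# After the fix: int() around sample_size keeps every dimension an int.
fixed_length = int(sample_size) * hop_length
fixed_shape = (batch_size // num_waveforms_per_prompt, audio_channels, fixed_length)
shape_types_after = [type(dim).__name__ for dim in fixed_shape]

print(shape_types_before)  # ['int', 'int', 'float']
print(shape_types_after)   # ['int', 'int', 'int']
```

Only the second tuple satisfies the "tuple of ints" requirement quoted in the error message.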

Notebook with the Applied Fix

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Do not work on the main branch. 🤦‍♂️
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2024-12-19T18:27:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hlky

Thanks @syntaxticsugr!

syntaxticsugr changed the title ~~[BUG FIX] [Stable Audio Pipeline] initial_audio_waveforms raises TypeError: new_zeros()~~ [BUG FIX] [Stable Audio Pipeline] Resolve TypeError in function prepare_latents caused by new_zeros() Dec 19, 2024

syntaxticsugr changed the title ~~[BUG FIX] [Stable Audio Pipeline] Resolve TypeError in function prepare_latents caused by new_zeros()~~ [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_shape Dec 19, 2024

Merge branch 'main' into main

77f55d2

hlky approved these changes Dec 19, 2024

Merge branch 'main' into main

a626ab4

hlky merged commit 9020086 into huggingface:main Dec 20, 2024
12 checks passed
