What if audio could be converted to an image, and then seam carving could be used on the image, and then the image could be converted back to audio. Would it sound interesting? Would it be useful somehow as a sound design tool?
It didn't quite work out.
With some string, glue, experience with Go and some output from GPT4, I created the wavecarve
package for Go. This package provides these functions:
- A function for reading a
.wav
file:ReadWavFile(filePath string) ([]int16, WAVHeader, error)
- A function for creating and writing to a
.wav
file:WriteWavFile(filePath string, int16s []int16, header WAVHeader)
- A function for converting audio to an image (more or less, the conversion is a bit lossy, unfortunately):
CreateSpectrogramFromAudio(int16s []int16) (*image.RGBA, error)
- A function for removing the least interesting parts of the image, using the excellent github.com/esimov/caire package:
CarveSeams(img *image.RGBA, newWidthInPercentage float64) (*image.RGBA, error)
- And finally, a function for converting the image back to audio:
CreateAudioFromSpectrogram(img *image.RGBA) ([]int16, error)
These functions are used by the utilities that are included in the cmd
directory, which are:
cmd/spectrogram
- a utility that readsinput.wav
, creates a visual representation of the audio (a spectrogram with phase information) and outputs the image tospectrogram.png
.cmd/recreate
- a utility that readsinput.wav
, creates a visual representation of the audio, uses this representation to try to re-create the audio (a lossy process), and outputsoutput.wav
.cmd/carve
- a utility that readsinput.wav
, creates a visual representation, seams carves the image to remove the least interesting parts, writes this image tocarved.png
and then creates audio from the image and outputsoutput.wav
.
Note that the generated .wav
files are unesessarily large with a little bit of audio at the start and a lot of silence at the end and needs to be trimmed down manually after having being generated. This might be fixed in a future version.
Here is the example audio I used as an input file:
https://github.com/xyproto/wavecarve/raw/main/wav/example.wav
Here is a spectrogram created with cmd/spectrogram
:
Here is a seam carved version of the spectrogram, reduced to 50% of the width:
And here is the re-created audio from example.wav
, created with cmd/recreate
and then converted to .mp3
. The audio has lost quite a bit of quality in the process and is not particularly pleasing to listen to:
https://github.com/xyproto/wavecarve/raw/main/mp3/output.mp3
If the carved image is used to re-create audio instead, this is the result:
https://github.com/xyproto/wavecarve/raw/main/mp3/carved.mp3
Even though the audio is of low quality, one can get a hint of which effect seam carving has on audio. It's not particularly pleasing.
- A spectrogram + phase information is not a great representation of audio, since converting audio to this format and back is a very lossy process.
- Seam carving does not produce a particularly interesting effect on the audio, using these types of spectrograms.
- There might be other visual representations of audio that gives much better results, though.
- It's not a great sound design tool, unless a lot of filters are applied afterwards, perhaps.