AI turns your vocal imitations into pro sound effects.

An AI to Transform Voice into Realistic Sound Effects

“Sketch2Sound” is a research project led by Adobe Research in collaboration with Northwestern University. The authors of the study are Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, and Prem Seetharaman. Their goal? To let creators generate realistic sound effects from a simple vocal imitation and a text description.

Imagine saying "pfff boom" and getting a believable explosion sound effect for a video game, or humming a "whoosh" to illustrate a spaceship passing by in a film. This is exactly what Sketch2Sound allows: it treats the human voice as a sonic “sketch”, which artificial intelligence then transforms into a finished sound.

A Hybrid Technique: Diffusion and Voice Control

Technically, Sketch2Sound builds on a latent diffusion model with a transformer backbone, known as a DiT (Diffusion Transformer). The model is conditioned on three signals extracted from the vocal imitation: loudness, spectral centroid (brightness), and pitch. These signals act as time-varying guides while the model turns random noise into realistic audio.
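
To make the three control signals more concrete, here is a minimal Python sketch that computes a frame-by-frame loudness, spectral centroid, and pitch curve with NumPy only. It is an illustration under stated assumptions, not the extraction code used by the researchers: the function name `control_signals`, the frame and hop sizes, and the autocorrelation pitch estimate are all hypothetical choices made for this example.

```python
import numpy as np

def control_signals(audio, sr, frame=1024, hop=256, fmin=80.0, fmax=1000.0):
    """Return per-frame loudness (dB), spectral centroid (Hz), and pitch (Hz).

    Hypothetical helper for illustration; parameters are assumptions.
    """
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    # Search range for the pitch period, expressed in samples.
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    loudness, centroid, pitch = [], [], []
    for start in range(0, len(audio) - frame + 1, hop):
        x = audio[start:start + frame]
        # Loudness: RMS energy of the frame, in decibels.
        rms = np.sqrt(np.mean(x ** 2))
        loudness.append(20.0 * np.log10(rms + 1e-9))
        # Brightness: magnitude-weighted mean frequency of the spectrum.
        mag = np.abs(np.fft.rfft(x * window))
        centroid.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-9))
        # Pitch: lag of the strongest autocorrelation peak in [fmin, fmax].
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        pitch.append(sr / lag)
    return np.array(loudness), np.array(centroid), np.array(pitch)

# Usage: a one-second 220 Hz tone stands in for a recorded vocal imitation.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)
loud_db, bright_hz, pitch_hz = control_signals(tone, sr)
```

Each returned array has one value per frame, so the three curves can be read as the kind of temporal guides described above: how loud, how bright, and how high the imitation is at each moment.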

An encoding in the style of CLAP (Contrastive Language-Audio Pretraining) also lets the system take into account a text description of the desired sound. By combining these two modalities, vocal imitation and natural language, the system becomes both powerful and intuitive, even for non-musicians and non-technical users.

A Promising Tool… and Worrying for Some Professions

The advancement is undeniable. Sketch2Sound could radically simplify the sound creation process, enabling a larger number of people to design custom sound effects without access to expensive sound libraries or recording studios. For sound designers, it’s a revolution: a tool capable of quickly generating high-quality sounds directly inspired by their intentions.

But this automation also raises legitimate concerns, especially in the sound design profession. If an AI can turn a vocal “click” into a gunshot sound, what will remain of the meticulous work of foley artists? These often invisible artists, who manipulate objects and materials to create a coherent sound universe, could see their expertise challenged or even replaced.

What Now?

Sketch2Sound is still a research prototype, not a commercial product, so it is not yet available to the general public. However, Adobe has presented convincing demonstrations and appears to be considering future integration into its creative tools.

Artificial intelligence is getting better every day at interpreting our creative intentions. The question remains whether it will stay a tool in service of artists… or profoundly reshape the boundaries of sound professions.

And you, do you think Sketch2Sound is a useful tool for creators… or a threat to sound artisans?

Sources: arXiv.org, hugofloresgarcia.art

21/05/2025