Sine-Wave Speech

Estimated read time (minus contemplative pauses): 5 min.

Look familiar? For a visual hint, check out An Introduction to Sine-Wave Speech.

Ever notice that when a familiar song is playing in another room, you can clearly hear the words, but when the song is unfamiliar, you can’t make them out?

I’ve experienced something similar with self-recording musicians—especially voice-shy ones—who think they’ve set their vocal loud in a mix, but it’s barely audible to anyone else. This has happened to me personally with backing parts as well (oboe parts, vocal harmonies, background guitar melodies, etc.).

Familiarity seems to result in experiencing more than is there, more than is actually making it through the walls and the noise—as if your brain, in order to finish constructing the mental content you (unconsciously) expect to experience, is pulling from memory the building materials your environment is failing to provide.

It turns out we can home in on and experiment with this phenomenon using a fascinating technique called sine-wave speech, which I first learned about on the excellent Brain Science Podcast, in the episode “Andy Clark on Prediction, Action, and the Embodied Mind” (BSP 126; January 28, 2016). Listen to the first 2 minutes and 15 seconds to hear a sine-wave speech demonstration (and I of course recommend the whole episode, especially if you’re interested in embodied cognition, a rising, and I think very promising, interdisciplinary field occupying philosophers, psychologists, linguists, AI specialists, and others):

You’ll first hear a short, digitally altered recording of a human voice. The recording has been stripped of much of the acoustic information that makes human voices perceptually intelligible to us—in other words, that gives our brains something to turn into an experience of meaningful human speech. Next, you’ll hear the original, unaltered recording. Finally, you’ll again hear the stripped—i.e., sine-wave speech—version, but this time your memory is stocked so that it can fill in the gaps.
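For the curious, here is a rough sketch of what the “stripped” version is made of. As I understand it, sine-wave speech replaces the voice with just a few pure tones (typically three) that follow the main resonances of the original recording. The code below is only an illustration under that assumption: the frequency tracks are made up rather than measured from real speech, and the names `glide` and `synthesize` are my own.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def glide(start_hz, end_hz, steps):
    """Return `steps` frequencies moving linearly from start_hz to end_hz."""
    return [start_hz + (end_hz - start_hz) * i / (steps - 1) for i in range(steps)]


def synthesize(frames, frame_seconds=0.01, sample_rate=SAMPLE_RATE):
    """Turn per-frame (frequency_hz, amplitude) pairs into audio samples.

    Each frame holds one pair per sinusoid. Phase is accumulated per
    sinusoid so the tones stay continuous as their frequencies change.
    """
    samples_per_frame = int(round(frame_seconds * sample_rate))
    phases = [0.0] * len(frames[0])
    samples = []
    for frame in frames:
        for _ in range(samples_per_frame):
            value = 0.0
            for k, (freq_hz, amplitude) in enumerate(frame):
                phases[k] += 2.0 * math.pi * freq_hz / sample_rate
                value += amplitude * math.sin(phases[k])
            samples.append(value / len(frame))  # keep the mix within [-1, 1]
    return samples


# Made-up tracks (NOT measured from real speech): three tones gliding over 0.5 s.
STEPS = 50
tone_1 = glide(300.0, 700.0, STEPS)
tone_2 = glide(2200.0, 1200.0, STEPS)
tone_3 = glide(3000.0, 2500.0, STEPS)
frames = [[(a, 1.0), (b, 0.5), (c, 0.25)] for a, b, c in zip(tone_1, tone_2, tone_3)]

signal = synthesize(frames)
print(len(signal), "samples,", len(signal) / SAMPLE_RATE, "seconds")
```

A real demonstration would first estimate those tracks from a recording of someone speaking; that analysis step is the hard part and is left out here.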

For a longer example, check out this “Untitled Test Sequence,” a “Speech-synthesis and sine-wave speech demonstration video, prepared for the artist project Disinformation, premiered in the PoetryFilm ‘Sounds of Love’ event, at the Southbank Centre, London, 19 July 2014”:

Part of what fascinates me here is that at first glance it seems to involve a mechanism for the brain to produce a more accurate representation of the world. But in reality it’s inaccurate: you’re perceiving things that aren’t there. One way to emphasize this discrepancy would be to make a sine-wave speech recording that could be experienced as either of two different sets of words, depending on which set the listener is primed for. That must be possible, right?

Maybe the listener could even be taken back and forth. I don’t think getting both at once would be possible; instead, one interpretation will always win out at a given moment. Maybe which one wins could come down to indirect priming: e.g., given a sentence that starts with “the envelope” and one that starts with “beyond the hope,” the former might get the edge because you were briefly shown a mailbox an hour earlier.

I also wonder if hearing the unaltered recording is necessary: Would it be enough, at least with short sequences, to read a transcript of the original recording? If so, would this work for words you’ve never heard before? If so, this could have interesting implications for the sorts of memory involved in this process (these words should be in a language you know well, so that your memory can draw on past experience to assemble the sounds).

And does it work on other animals? Would appropriately primed dogs respond to commands from a stripped voice? (Perhaps that voice would only need to be altered from about 67 Hz upward, the low end of what dog brains turn into sound.)  Given that words must be very different sorts of objects for dogs than they are for humans, would a dog even need to be primed with the original recording? Thinking about this might tell us something about the role meaning plays here. Speaking of which, would it work on a human in a language that person doesn’t understand? With nonsense syllables?

Lots to play with. For a more thorough survey of what’s been studied about sine-wave speech processing, check out An Introduction to Sine-Wave Speech (which also includes a visual example!). I haven’t yet explored everything there myself, but see that some of my questions are addressed there or in linked pages—e.g., nonsense word examples are included in a writeup about another distortion technique: An Introduction to Noise-Vocoded Speech.

Finally, I’m reminded here of a more rigid, or predictable, sort of phenomenological filling-in, called restoration of the missing fundamental: when exposed to a set of frequencies spaced to duplicate all but the fundamental of a note (i.e., the overtones without the fundamental), neurons will fire at the frequency of the “missing” fundamental. It strikes me that this is a kind of long-term or embedded version of sine-wave speech: the brain, due to nature and nurture, has been structured to mechanize a certain experience when given the relevant harmonic content. With sine-wave speech, that structure is established in the short term (though we can get better at filling in with practice).
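To make “the overtones without the fundamental” concrete, here is a small sketch. It builds a tone from the 2nd through 5th harmonics of a 200 Hz note, checks that there is essentially no energy at 200 Hz itself, and then shows that the waveform as a whole still repeats 200 times per second. The numbers (200 Hz, four harmonics, an 8000 Hz sample rate) and the function names are my own choices for illustration, and this says nothing about how neurons actually do it; it only shows that the “missing” pitch is still implied by the signal.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this sketch)
F0 = 200.0           # the fundamental we will leave out
HARMONICS = [2, 3, 4, 5]  # overtones at 400, 600, 800, 1000 Hz


def harmonic_tone(f0, harmonics, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of equal-amplitude sine waves at the given multiples of f0."""
    return [
        sum(math.sin(2.0 * math.pi * h * f0 * n / sample_rate) for h in harmonics)
        for n in range(n_samples)
    ]


def magnitude_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of the component at freq_hz (a single-frequency DFT)."""
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


def repetition_rate_hz(signal, min_hz, max_hz, sample_rate=SAMPLE_RATE):
    """Find the lag at which the signal best matches a shifted copy of itself."""
    best_lag, best_score = None, float("-inf")
    for lag in range(int(sample_rate / max_hz), int(sample_rate / min_hz) + 1):
        overlap = len(signal) - lag
        score = sum(signal[n] * signal[n + lag] for n in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


tone = harmonic_tone(F0, HARMONICS, n_samples=800)  # 0.1 seconds

energy_at_fundamental = magnitude_at(tone, F0)        # effectively zero
energy_at_second_harmonic = magnitude_at(tone, 2 * F0)  # about 1.0
implied_pitch_hz = repetition_rate_hz(tone, min_hz=150.0, max_hz=1000.0)

print(energy_at_fundamental, energy_at_second_harmonic, implied_pitch_hz)
```

The search range stops at 150 Hz on purpose: a waveform that repeats every 1/200 of a second also repeats every 1/100 of a second, so a wider range would produce a tie between the two.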

It’s interesting to consider how the role of memory differs in these scenarios, which likely involve, in some respects, entirely different sorts of memory. In the restoration case, it’s a physical response, like ripples in water: when hit with certain frequencies, neurons fire at a certain rate, etc. Sine-wave speech, like any cognitive process, must also involve “mechanized” physical responses, but I presume meaning and short-term (i.e., echoic) memory play interesting roles as well. I’d wager, for example, that, with fundamental restoration, neurons would fire as expected in a sleeping or comatose (unconscious) person, but we couldn’t expose that person to the unaltered recording, then observe a gap-filling response to the sine-wave speech version upon waking the person [1].

Daniel Levitin describes a fascinating demonstration of fundamental restoration in his book This Is Your Brain on Music [2]: Petre Janata uses electrodes to connect an owl’s brain to an amplifier; the owl is played the melody of “The Blue Danube Waltz,” but with the fundamentals removed; the amplifier, receiving signals from the owl’s firing neurons, plays back the missing fundamentals of the melody. (I believe this is also how we’re able to hear, or at least vaguely sense, low notes whose fundamentals extend very low into, or even below, the audible range of human hearing.)



  1. What about if the person happened to incorporate the recorded voice into her dream? (Something known as sensory incorporation.)
  2. Daniel Levitin, This Is Your Brain on Music (2007), page 43.
