Human senses are remarkable. We all take for granted how very easily we can pick out a familiar face from a crowd. It's so simple, in fact, we hardly consider it. But how easily can we spot a voice we know in the herd? A new study from Canada has found that — more than 99 percent of the time — two words are enough for people with normal hearing to distinguish the voice of a close friend or relative among other voices.
Conducted by Julien Plante-Hébert, a doctoral student in voice recognition at the Department of Linguistics and Translation at the University of Montreal, the study involved playing multiple recordings to 44 Canadian French-speaking people aged between 18 and 65. The subjects were then asked which of the 10 male voices they had heard was familiar to them. It turned out that, to make an accurate identification, "merci beaucoup" was all they needed to hear.
"The auditory capacities of humans are exceptional in terms of identifying familiar voices," says Plante-Hébert. "At birth, babies can already recognize the voice of their mothers and distinguish the sounds of foreign languages." To evaluate these skills, he created a series of voice "lineups" — a similar technique to the visual identification procedures used by police in that a witness is asked to look at number of individuals sharing similar physical traits and pick out the one most familiar.
"A voice lineup is an analogous procedure in which several voices with similar acoustic aspects are presented," says Plante-Hébert. "In my study, each voice lineup contained different lengths of utterances varying from one to eighteen syllables." It was found that the degree to which the voice could be recognized was directly related to the length of phrase heard. "Familiarity between the target voice and the identifier was defined by the degree of contact between the interlocutors," says Plante-Hébert.
The researcher found that familiarity with the speaker had little effect on the ability to identify short utterances: even a familiar voice was hard for participants to pick out from so brief a phrase. However, with phrases of four or more syllables, such as "merci beaucoup," there was an almost 100 percent success rate for very familiar voices. Despite needing that little bit more data, the human ear (and brain) is remarkably more skilled than our best voice processing instruments.
"Identification rates exceed those currently obtained with automatic systems," says Plante-Hébert. The best speech recognition apparatus, he suggests, is much less efficient than our own auditory systems — having only a 92 percent success rate compared to more than 99.9 percent for humans. Even in a noisy environment, humans can exceed machine-based recognition because of our brain's ability to filter out ambient noise.
"Automatic speaker recognition is in fact the least accurate biometric factor compared to fingerprints or face or iris recognition," Plante-Hébert says. "While advanced technologies are able to capture a large amount of speech information, only humans so far are able to recognize familiar voices with almost total accuracy." So it seems our ears are just as good as our eyes at picking out of the crowd the familiar nuances, the inflections, tones and patterns of speech, we are used to. You heard it here first.