CIO upfront: 5 essentials to create an AI that speaks human

Your communications with customers can’t be robotic and stilted, writes Kevin Adams of FaceMe

Credit: Dreamstime

The spoken word has been around for roughly 100,000 years; the written word, a little over 5,000. As humans, we’ve had millennia more practice with (and perhaps a greater affinity for) what’s being said and how we say it than with words on a page or screen.

There’s a reason important interactions are usually saved for face-to-face conversation: there’s a greater emotional connection and impact when you combine voice, tone of voice and body language.

And so, there’s a huge difference between writing for the ears and writing for the eyes.

For example, if we were to speak this sentence, we wouldn’t start it with “for example”. We’re unlikely to break out a word like “affinity”, as we did in the opening of this article, either. Truth be told, we probably wouldn’t open a verbal conversation with anything other than “hi”.

The differences between writing for the ear and writing for the eye aren’t usually an issue, and not something most people will even recognise, until you start writing for conversational AI, like developing your chatbot into a fully fledged digital human.

Our work at FaceMe has included writing natural language processing (NLP) that acts as a dialogue tree for digital humans. And “natural” is an important word there. If a digital human doesn’t sound natural, it’s unlikely to convey the emotional connection and conversational experience you want for your business.

So, how do you do that? Here are some things to keep in mind.

What are the benefits of voice (when done well)?

  • It’s often more concise: normal conversation involves shorter sentences and simpler language.

  • It helps to simplify complexity. Or, as Deloitte puts it: “advanced voice capabilities allow interaction with complex systems in natural, nuanced conversations”.

  • It’s more personable and can create a more emotional connection than plain text (there’s more character in voice).

  • It’s better as a brand experience. Great companies have their AI voice as part of their brand (Siri is the biggest example). You can’t add as much “brand” with just text.

One of the biggest benefits of using voice communication instead of (or as well as) text comes from how the audience best takes in information. In Albert Mehrabian’s often-cited research on how people communicate feelings, the actual words spoken accounted for only 7 per cent of emotional impact. Far more important were the tone in which those words were said (38 per cent) and the facial expressions used when saying them (55 per cent).

It all goes towards recreating a more human experience, which many people want more of in a digital world. In a recent study on customer expectations, Gladly found that “[customers now] want the same warmth and seamless experience they expect with human support in their automated support, too.”

So, your communications with customers can’t be robotic and stilted, but many are. Gladly also found that 69 per cent of customers say they’re being treated like a case number, not a human.

Having a human interaction is impossible when businesses write for voice but still use the same language they do for text.

Hopefully that explains the “why”, and you’re now all in and ready to hear the “how”. Fortunately, how you write for the ear is actually quite simple—in many ways, much simpler than writing for the eye.

How to create an AI that speaks human:

1. Use short sentences and statements

In written language, studies show that when a sentence is 14 words in length, the average reader understands more than 90 per cent of it. At 43 words, that drops to less than 10 per cent. It’s even more difficult in speech, where you can’t just re-read what’s being said.

That’s why we naturally tend to use short sentences in speech. With regular micro-breaks in between sentences, the listener gets a couple of seconds to process and comprehend what’s being said.

The same goes for the length of a whole response. Human-like conversation is rarely one way, and it’s less engaging when it is. So, if you’ve written a response that’s 300 words, you’ll likely find that your listener can’t process it all in one go.

We’d recommend keeping responses well under 100 words, when possible. If it’s not possible, be prepared for:

  • Your customers to ask the digital human to repeat certain parts of its dialogue, so you’ll need to include this in your scripting.

  • Your customers to interrupt the digital human mid-conversation, at the cost of missing potentially important information.

  • Your customers to get disengaged and leave the conversation.

  • A Plan B scenario, like using on-screen visuals to complement the conversation or hyperlinks to parts of your website that explain more.
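
As a rough illustration of the length guidance above, here is a minimal Python sketch that flags scripted replies of 100 words or more and sentences longer than 14 words. The function names, the default thresholds and the naive sentence splitter are all assumptions made for this example; they are not part of any FaceMe tooling or a particular NLP platform.

```python
import re


def split_sentences(text):
    """Naively split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def check_response(text, max_words=100, max_sentence_words=14):
    """Return a list of warnings for a scripted reply (hypothetical helper).

    max_words and max_sentence_words are illustrative defaults taken from
    the figures quoted in the article, not hard rules.
    """
    warnings = []
    total = len(text.split())
    if total >= max_words:
        warnings.append(f"Response is {total} words; aim for under {max_words}.")
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count > max_sentence_words:
            warnings.append(f"Sentence is {count} words long: {sentence[:40]}...")
    return warnings


if __name__ == "__main__":
    for warning in check_response("Hi there. How can I help you today?"):
        print(warning)
```

A check like this only counts words. It can’t tell you whether a reply sounds natural, so treat it as a first pass before reading the script aloud.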

2. Use short, everyday words

Supercalifragilisticexpialidocious—just because the sound of it is something quite atrocious—is terrible for anything but song. Don’t take a leaf out of Mary Poppins’ book, trying to impress her peers with big words no one knows; instead aim for simple language everyone uses.

For one, you want your audience to spend as little effort as possible comprehending what’s being said. Short, everyday words are more accessible, with studies showing that the effort required to comprehend speech is significantly greater for older people or those with a hearing impairment.

The use of common words is important, too. More research shows that knowing 2,000 words gives 80 per cent coverage of written text but 96 per cent coverage of informal speech.

When it comes to word length, bigger is rarely better. Words with three syllables or fewer are generally your best bet in spoken word. Four and more syllables are passable when you have no other options, like if you’re naming a certain product or location, or when no other word will do.

Perhaps the most famous speech of all time, by Martin Luther King Jr., demonstrates these rules to a tee, and it’s likely all the more impactful because of his everyday vocabulary and the low effort needed to process his shorter words.

“I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character...”

Long story short: be like Martin, not like Mary.
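
To make the three-syllable rule of thumb a little more concrete, the sketch below estimates syllables by counting groups of vowels and lists any words that come out above the limit. The helper names and the vowel-group heuristic are assumptions for illustration only. English spelling is irregular, so the counts are approximate and the output is a prompt for review, not a verdict.

```python
import re


def estimate_syllables(word):
    """Roughly estimate syllables by counting vowel groups (English, approximate)."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return 0
    count = len(re.findall(r"[aeiouy]+", letters))
    # A trailing silent "e" (as in "have") usually adds no syllable,
    # but a final "le" (as in "little") usually does.
    if letters.endswith("e") and not letters.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def long_words(text, max_syllables=3):
    """Return the words in text estimated to have more than max_syllables."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if estimate_syllables(w) > max_syllables]


if __name__ == "__main__":
    print(long_words("We have a great affinity for speech"))
```

Run on the sentence in the example, this heuristic picks out “affinity”, the same word the opening of this article admits we wouldn’t use out loud.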

3. Don’t fear contractions

We have always said that you should not fear contractions. Does that scan well as text? It’s even worse if you read it out loud.

Contractions like she’s (she is), there’s (there is) and wouldn’t (would not) help speech flow more naturally. 

Whether you’re writing for a text-only chatbot or for a digital human who will speak to your customers, contractions stop your words coming across as robotic. There’s not much more to it than that.
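
If your scripts already exist in a formal, written style, a small find-and-replace pass can suggest contractions. This is a minimal sketch, assuming a hand-picked mapping; the `CONTRACTIONS` table and the `contract` function are hypothetical names for this example, and the list is nowhere near complete.

```python
import re

# Illustrative, hand-picked mapping; extend it to suit your own scripts.
CONTRACTIONS = {
    "she is": "she's",
    "there is": "there's",
    "it is": "it's",
    "would not": "wouldn't",
    "should not": "shouldn't",
    "do not": "don't",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(phrase) for phrase in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def contract(text):
    """Replace expanded phrases with contractions, keeping a leading capital."""
    def swap(match):
        found = match.group(0)
        short = CONTRACTIONS[found.lower()]
        return short.capitalize() if found[0].isupper() else short

    return _PATTERN.sub(swap, text)


if __name__ == "__main__":
    print(contract("We have always said that you should not fear contractions."))
```

A blind replacement like this will get some cases wrong. “Yes, she is.” shouldn’t become “Yes, she’s.”, for instance, so a person should still review the suggestions.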

4. Get to the point

Remember at the start of this guide when we started talking about the histories of the spoken and the written word? You can get away with that in text; but if you were to start a conversation with a friend in the same way, he or she would be more confused than engaged.

So get to the point. You don’t need a hook when you’re writing for the ears because voice is already more engaging. Just give people what they need.

Apparently, eight out of 10 people will read an article headline, but only two out of 10 will read the rest (so thanks if you’re reading this part). In speech, you have about a minute before people start tuning in or out.

But you shouldn’t take their engagement for granted. If a digital human helps a customer from start to finish within that minute, even better.
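
To get a feel for how much dialogue fits into that minute, you can estimate speaking time from word count. The sketch below assumes a speaking rate of 150 words per minute, which is an assumption for illustration rather than a figure from this article; your digital human’s voice may be faster or slower, so measure it and adjust.

```python
def estimated_seconds(text, words_per_minute=150):
    """Estimate how long text takes to speak, in seconds.

    words_per_minute is an assumed speaking rate, not a measured one.
    """
    word_count = len(text.split())
    return word_count / words_per_minute * 60


if __name__ == "__main__":
    reply = "Sure, I can help with that. What's your order number?"
    print(round(estimated_seconds(reply), 1), "seconds")
```

At that assumed rate, a 100-word response takes about 40 seconds to say, which leaves very little of the minute for the customer to talk.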

5. Read it out loud

Simple and effective. You often don’t know how something will sound until it’s read aloud. That’s when you’ll hear the rhythm and stresses of the words and sentences you’re saying. If something sounds a little robotic, you’ll soon hear it.

For your most important interactions, you can even run through the ideal “happy path” dialogue with another person, to make sure it comes across as conversational as possible.

Simply script your ideal conversation from start to finish, as it will be put into your NLP, and read through it with another person.

Kevin Adams is Conversation Experience Designer at FaceMe
