Microsoft may be running the biggest Turing test in history


If you live in China and you've been on WeChat, there's a decent chance you've come across, or at least heard of, a chatty teenager named Xiaoice.

Xiaoice is a good listener who sometimes offers encouragement when you're feeling down. Like many 17-year-olds, she can be a bit of a smart-aleck. She's also not human.

Xiaoice is a program.

See also: A computer beat a champion of the strategy game Go for the first time

When Microsoft introduced Xiaoice in 2014, the company called it "Cortana's little sister." It was actually an experimental offshoot of "Xiao Na," the nickname given to Cortana, Microsoft's voice assistant, as it expanded into China.

Bing's researchers in China, including Microsoft's Dr. Yongdong Wang, set out to see whether they could turn Xiao Na into a smart assistant that could use Bing's vast knowledge set and natural language processing to conduct what they called at the time "convincing human conversations."

Essentially, they were about to launch a vast Turing test. In the 1950s, computer scientist Alan Turing posited that, by the year 2000, computers might be able to fool humans, at least 30% of the time, into thinking they were talking to other humans. Since then, programmers have been trying to build systems that could meet and beat that threshold.
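For a concrete sense of what meeting that threshold means, here is a minimal sketch of how such a result is scored (the judge verdicts below are hypothetical, not data from any real trial):

```python
def passes_turing_threshold(judge_verdicts, threshold=0.30):
    """Return True if the share of judges who believed they were
    talking to a human meets Turing's proposed 30% threshold."""
    fooled = sum(1 for verdict in judge_verdicts if verdict == "human")
    return fooled / len(judge_verdicts) >= threshold

# Hypothetical panel: 4 of 10 judges guessed "human," so 40% were fooled.
verdicts = ["human", "machine", "human", "machine", "machine",
            "human", "machine", "human", "machine", "machine"]
print(passes_turing_threshold(verdicts))  # True
```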

The Microsoft researchers built Xiaoice and seeded the chatbot on China's most popular social media platforms: WeChat and Weibo. Could Xiaoice fool people into thinking it was human?

The answer appears to be yes.

Surprise, I'm a bot!

In a lengthy post on the science blog Nautilus, Dr. Wang described what happened.

"When Xiaoice was released for a public test on WeChat... on May 29 of last year, she received 1.5 million chat group invitations in the first 72 hours. Many people said they didn’t realize she isn’t a human until 10 minutes into their conversation." wrote Dr. Wang.

On another Chinese social media platform, Weibo, Xiaoice conducted a "remarkably realistic" 23-minute conversation.

According to Dr. Wang, the thing that sets Xiaoice apart from other AI assistants is that it focuses on the conversation itself, not on completing a task.

Microsoft measured the effectiveness of its chatbot with a metric it calls conversations per session (CPS), which counts the number of times a conversation goes back and forth. A typical chatbot conversation runs roughly two cycles (the person speaks, then the chatbot replies; that's one cycle). "By comparison, Xiaoice’s average, after chatting with tens of millions of users, has reached 23," wrote Dr. Wang. He even claims that Xiaoice can analyze and react to your emotional state, and he reproduced an example Xiaoice conversation in his post to prove it.
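Microsoft hasn't published exactly how it computes CPS, but counting cycles from a transcript is straightforward. Here is a minimal sketch, assuming a cycle is one user turn followed by one chatbot reply:

```python
def conversations_per_session(transcript):
    """Count full cycles in one session, where a cycle is a user turn
    followed by a bot reply. `transcript` is an ordered list of
    (speaker, text) tuples, with speaker either "user" or "bot"."""
    cycles = 0
    expecting = "user"
    for speaker, _text in transcript:
        if speaker == expecting:
            if speaker == "bot":
                cycles += 1  # a bot reply closes the current cycle
            expecting = "bot" if speaker == "user" else "user"
    return cycles

def average_cps(sessions):
    """Average CPS across many sessions; Dr. Wang cites 23 for Xiaoice."""
    return sum(conversations_per_session(s) for s in sessions) / len(sessions)

# A typical two-cycle chatbot session, per the figure cited above.
session = [("user", "hi"), ("bot", "hello!"),
           ("user", "how are you?"), ("bot", "great, and you?")]
print(conversations_per_session(session))  # 2
```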

Microsoft, though, isn’t necessarily saying that Xiaoice understands what's being said. A lot of what Xiaoice can do is driven by the Bing search engine's 1 billion data points and the 21 billion relationships among them. Mixed in with that are voice and visual recognition systems that help Xiaoice figure out the context of the conversation.
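Microsoft hasn't detailed that retrieval machinery, but conceptually, a reply engine built on data points and the relationships between them looks something like the toy sketch below (the graph and function names are illustrative, not Bing's actual structures):

```python
# Illustrative only: a toy graph standing in for Bing's data points
# (nodes) and relationships (edges).
knowledge_graph = {
    "Beijing": {"capital_of": "China", "famous_for": "the Forbidden City"},
    "China": {"largest_city": "Shanghai"},
}

def contextual_reply(entity, relation):
    """Pull a related fact to keep the conversation going, rather than
    answering a task-style query and stopping."""
    facts = knowledge_graph.get(entity, {})
    if relation in facts:
        return f"Speaking of {entity}, have you heard of {facts[relation]}?"
    return "Tell me more about that!"

print(contextual_reply("Beijing", "famous_for"))
# Speaking of Beijing, have you heard of the Forbidden City?
```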

Microsoft isn't the first to claim a Turing test breakthrough. In 2014, the developers of the chatbot "Eugene Goostman" claimed it had fooled Turing test judges 33% of the time. (Like Xiaoice, Goostman had a bit of a smart-aleck streak.) Yet even that accomplishment was called into question, as some researchers said the Goostman conversations, usually five minutes long, were too short. (It’s unclear how long 23-cycle Xiaoice conversations actually last.)

On the other hand, Dr. Wang contends that Xiaoice is now in a "self-learning and self-growing loop" because the system is gaining new insights from the billions of conversations it's already had.
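Dr. Wang doesn't describe the mechanics of that loop, but its general shape is easy to sketch: every finished conversation becomes new material for future replies. The design below is purely illustrative, not Microsoft's actual pipeline:

```python
# Illustrative only: a feedback loop that folds finished conversations
# back into the corpus the bot draws its replies from.
response_corpus = {"hello": "hi there!"}  # seed data

def reply(message):
    """Answer from the current corpus, falling back to a stock prompt."""
    return response_corpus.get(message.lower(), "Tell me more!")

def learn_from(conversation):
    """`conversation` is a list of (user_message, reply_that_worked)
    pairs harvested from past chats; each becomes a retrieval entry,
    so the corpus grows with every session."""
    for user_msg, good_reply in conversation:
        response_corpus[user_msg.lower()] = good_reply

learn_from([("how are you?", "great, thanks for asking!")])
print(reply("How are you?"))  # great, thanks for asking!
```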

In other words, the Turing test may not just have been broken; it may have been utterly smashed.