AI's Next Frontier: Machines That Understand Language

With the help of neural networks---vast networks of machines that mimic the web of neurons in the human brain---Facebook can recognize your face. Google can recognize the words you bark into an Android phone. And Microsoft can translate your speech into another language. Now, the task is to teach online services to understand natural language, to grasp not just the meaning of words, but entire sentences and even paragraphs.

At Facebook, artificial intelligence researchers recently demonstrated a system that can read a summary of The Lord of The Rings, then answer questions about the books. Using a neural networking algorithm called Word2Vec, Google is teaching its machines to better understand the relationship between words posted across the Internet---a way of boosting Google Now, a digital assistant that seeks to instantly serve up the information you need at any given moment. Yann LeCun, who oversees Facebook's AI work, calls natural language processing "the next frontier."

Working toward this same end, the AI startup MetaMind has published new research detailing a neural networking system that uses a kind of artificial short-term memory to answer a wide range of questions about a piece of natural language. According to MetaMind, the system can answer everything from very specific queries about what the text describes to more general questions like "What's the sentiment of the text?" or "What's the French translation?" The research, due to appear Wednesday at arXiv.org, a popular online repository for academic papers, echoes similar research from Facebook and Google, but it takes this work a step further.

"This is a very hot topic, on which the authors of this paper approach or pass the state-of-the-art results on several benchmarks," says Yoshua Bengio, a professor of computer science at the University of Montreal who specializes in artificial intelligence and has reviewed the MetaMind paper. "Their architecture is also interesting in that it is aiming at something potentially very ambitious, trying to sequentially parse a large amount of facts---hopefully one day the whole of Wikipedia and more---in such a way, via a learned semantic representation, that one can answer questions about them."

A Tool for Every Job

Typically referred to as "deep learning," modern neural networking algorithms are so powerful in part because they can handle so many different tasks. Other researchers are using these same algorithms to improve autonomous vehicles and build robots that can learn to screw a cap on a bottle. According to Google engineer Jeff Dean, neural networking systems drive dozens of online services across the company, from Google+ to Google Now to Street View. With its paper, MetaMind shows how effective these algorithms can be when applied to a wide range of natural language tasks. "That is precisely what makes the beauty and the interest and importance of machine learning," Bengio says. "It is about generic ways to learn tasks."

MetaMind, which builds deep learning systems for other businesses, describes what it calls a Dynamic Memory Network. On one level, it mirrors work from Facebook, providing a way for machines to answer questions about what's said in a particular piece of text. Richard Socher, MetaMind's co-founder and CEO, and his colleagues have demonstrated their Q&A model on the same dataset as Facebook's system. "This is similar to web search," Socher says, "except you give the actual answer rather than just a bunch of links."

According to the paper, you can feed the system the following piece of text:

Jane went to the hallway.
Mary walked to the bathroom.
Sandra went to the garden.
Daniel went back to the garden.
Sandra took the milk there.

And when you ask "Where is the milk?" it will respond: "garden."
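To make that concrete, here is a minimal hand-coded Python sketch of the reasoning involved. It illustrates the logic the network must learn from data; it is not MetaMind's model, which learns this behavior end to end rather than from rules like these:

    # Hand-coded illustration, not the neural model: resolving "there"
    # requires remembering the most recently mentioned location.
    story = [
        "Jane went to the hallway.",
        "Mary walked to the bathroom.",
        "Sandra went to the garden.",
        "Daniel went back to the garden.",
        "Sandra took the milk there.",
    ]

    last_place = None
    milk_place = None
    for sentence in story:
        words = sentence.rstrip(".").split()
        if words[-1] == "there":
            milk_place = last_place   # "there" points at the last place mentioned
        else:
            last_place = words[-1]    # every other sentence ends with a place

    print(milk_place)  # garden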

All Just Questions

At the same time, the system can judge sentiment (that is, the general feeling the words express). It can identify parts of speech. It can determine the referent of a particular pronoun. And it can translate from one language to another. Basically, the system treats these tasks as additional questions that need answering. Is the text positive or negative? What are the parts of speech? What does "their" or "that" or "him" refer to? What is the French translation of the entire text?
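As a toy illustration of that framing, the Python sketch below routes different tasks through a single answer(text, question) entry point. The hard-coded rules are stand-ins for behavior the real network learns end to end:

    # Toy dispatcher illustrating the "every NLP task is a question" framing.
    # These hand-written rules stand in for what the network learns from data.
    def answer(text, question):
        words = [w.strip(".?!,").lower() for w in text.split()]
        if question == "What's the sentiment of the text?":
            return "positive" if set(words) & {"good", "great", "love"} else "neutral"
        if question.startswith("Where is the"):
            return words[-1]   # crude: places come last in these toy stories
        return "unknown"

    text = "Sandra took the milk to the garden."
    print(answer(text, "Where is the milk?"))                 # garden
    print(answer(text, "What's the sentiment of the text?"))  # neutral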

"The insight---and it's almost trivial---is that every task in NLP is actually a question-and-answer task," says Metamind co-founder and CEO Richard Socher, whose Stanford University PhD focused on machine learning, computer vision, and natural language processing.

The system does all this using what Socher calls "episodic memory." If a neural network is analogous to the cerebral cortex, the means of processing information, its episodic memory is something akin to the hippocampus, which provides short-term memory in humans. In the example of the garden and the milk, the system must "remember" that Daniel is in the garden before determining where the milk is. "You can't do transitive reasoning without episodic memory," Socher says.
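A rough sketch of that mechanism, using NumPy with made-up toy dimensions: the memory makes repeated attention passes, or "episodes," over pre-encoded sentence vectors, and each pass conditions on what the previous one retrieved. The update rule here is deliberately simplified; the paper's model uses learned encoders and gated recurrences:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                # toy embedding size (assumed)
    facts = rng.normal(size=(5, d))      # one vector per sentence, assumed pre-encoded
    question = rng.normal(size=d)        # encoded "Where is the milk?"

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    memory = question.copy()
    for episode in range(2):
        # Score facts against both the question and the current memory, so a
        # second pass can surface facts (Daniel went to the garden) that only
        # matter once the first pass has found "Sandra took the milk there."
        weights = softmax(facts @ question + facts @ memory)
        context = weights @ facts           # weighted summary of the attended facts
        memory = np.tanh(memory + context)  # simplified stand-in for a GRU update

    # A decoder (not shown) would map the final memory vector to the answer.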

And, he explains, you can use much the same setup to analyze sentiment or translate words into a new language. "One model---one dynamic memory network---can solve these very different problems," he says. Bengio points out that MetaMind actually trained a slightly different model for each of the tasks. But the ultimate aim is to unify all those tasks. It's another step into the new frontier.