Wikipedia Deploys AI to Expand Its Ranks of Human Editors

Using artificial intelligence to check Wikipedia entries' quality might seem to mean less work for humans. But it's meant to get more humans to stick around.
Evan Mills/WIRED

Aaron Halfaker just built an artificial intelligence engine designed to automatically analyze changes to Wikipedia.

Wikipedia is the online encyclopedia anyone can edit. In crowdsourcing the creation of an encyclopedia, the not-for-profit website forever changed the way we get information. It's among the ten most-visited sites on the Internet, and it has swept tomes like World Book and Encyclopedia Britannica into the dustbin of history. But it's not without flaws. If anyone can edit Wikipedia, anyone can mistakenly add bogus information. And anyone can vandalize the site, purposefully adding bogus information. Halfaker, a senior research scientist at the Wikimedia Foundation, the organization that oversees Wikipedia, built his AI engine as a way of identifying such vandalism.

In one sense, this means less work for the volunteer editors who police Wikipedia's articles. And it might seem like a step toward phasing these editors out, another example of AI replacing humans. But Halfaker's project is actually an effort to increase human participation in Wikipedia. Although some predict that AI and robotics will replace as much as 47 percent of our jobs over the next 20 years, others believe that AI will also create a significant number of new jobs. This project is at least a small example of that dynamic at work.

"This project is one attempt to bring back the human element," says Dario Taraborelli, Wikimedia's head of research, "to allocate human attention where it's most needed."

Don't Scare the Newbies

In the past, if you made a change to an important Wikipedia article, you often received an automated response saying you weren't allowed to make the change. The system wouldn't let you participate unless you followed a strict set of rules, and according to a study by Halfaker and various academics, this rigidity prevented many people from joining the ranks of regular Wikipedia editors. A 2009 study indicated that participation in the project had started to decline, just eight years after its founding.

"It's because the newcomers don't stick around," Halfaker says. "Essentially, Wikipedians had traded efficiency of dealing with vandals and undesirable people coming into the wiki for actually offering a human experience to newcomers. The experience became this very robotic and negative experience."

With his new AI project, dubbed the Objective Revision Evaluation Service, or ORES, Halfaker aims to boost participation by making Wikipedia more friendly to newbie editors. Using scikit-learn, a set of open source machine learning algorithms whose code is freely available to the world at large, the service seeks to automatically identify blatant vandalism and separate it from well-intentioned changes. With a more nuanced view of new edits, the thinking goes, these algorithms can continue cracking down on vandals without chasing away legitimate participants. It's not that Wikipedia needs to do away with automated tools to attract more human editors. It's that Wikipedia needs better automated tools.

"We don't have to flag good-faith edits the same way we flag bad-faith damaging edits," says Halfaker, who used Wikipedia as the basis for his PhD work in the computer science department at the University of Minnesota.

In the grand scheme of things, the new AI algorithms are rather simple examples of machine learning. But they can be effective. They work by identifying certain words, variants of certain words, or particular keyboard patterns. For instance, they can spot unusually large blocks of characters. "Vandals tend to mash the keyboard and not put spaces in between their characters," Halfaker says.
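To make the signals described above concrete, here is a minimal Python sketch of how an edit's added text might be turned into a few numeric features: the longest unbroken block of characters, the longest run of one repeated character, and a count of flagged words, including stretched variants. This is an illustration only, not the service's actual code. The word list, the function names, and the 30-character threshold are all assumptions made for the example.

```python
import re

# Hypothetical word list for illustration; the article does not say
# which words the real service looks for.
BAD_WORDS = {"stupid", "dumb", "sucks"}


def longest_unbroken_run(text: str) -> int:
    """Length of the longest block of characters with no whitespace."""
    return max((len(token) for token in text.split()), default=0)


def longest_repeated_char(text: str) -> int:
    """Length of the longest run of one character repeated in a row."""
    longest = 0
    for match in re.finditer(r"(.)\1*", text):
        longest = max(longest, len(match.group(0)))
    return longest


def bad_word_count(text: str) -> int:
    """Count flagged words, treating stretched spellings as variants."""
    # Collapse runs of three or more identical characters to one,
    # so "stuuupid" is matched as "stupid".
    collapsed = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    words = re.findall(r"[a-z]+", collapsed)
    return sum(1 for word in words if word in BAD_WORDS)


def edit_features(added_text: str) -> dict:
    """Bundle the signals into a feature dictionary for a classifier."""
    return {
        "longest_token": longest_unbroken_run(added_text),
        "longest_repeat": longest_repeated_char(added_text),
        "bad_words": bad_word_count(added_text),
    }


def looks_like_keyboard_mashing(added_text: str, max_token_len: int = 30) -> bool:
    """Crude stand-alone check; the threshold of 30 is an assumed value."""
    return longest_unbroken_run(added_text) > max_token_len
```

In a real system, features like these would more likely feed a trained classifier than a fixed threshold, so that no single signal decides whether an edit is flagged.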

Halfaker acknowledges that the service isn't going to catch every piece of vandalism, but he believes it can catch most. "We're not going to catch a well-written hoax with these strategies," he says. "But it turns out that the vast majority of vandalism is not very clever."

Wikipedia Articles That Write Themselves?

Elsewhere, the giants of the Internet—Google, Facebook, Microsoft, and others—are embracing a new breed of machine learning known as deep learning. Using neural networks—networks of machines that approximate the web of neurons in the human brain—deep learning algorithms have proven adept at identifying photos, recognizing spoken words, and translating from one language to another. By feeding photos of a dog into a neural net, for instance, you can teach it to identify a dog.

With these same algorithms, researchers are also beginning to build systems that understand natural language—the everyday way that humans speak and write. By feeding neural nets scads of human dialogue, you can teach machines to carry on a conversation. By feeding them myriad news stories, you can teach machines to write their own articles. In these cases, neural nets are a long way from real proficiency. But they point towards a world where, say, machines can edit Wikipedia.

Halfaker believes that such a world is a long way off. And even if it arrives, he says, Wikipedia will still need humans to guide those neural networks. "I'm not sure we'll ever get to the place where an algorithm will beat human judgment—or we won't get there anytime soon," he says. "But even in that case, we'll still want human judgment as part of the process." That's why, ironically, he has built an AI service that can expand the ranks of Wikipedia editors.

It's telling that he and the Wikimedia Foundation are not implementing these algorithms from a central location. They're offering the algorithms as an online service that the broader Wikipedia community can use as it sees fit. "We're making it easy to experiment and easy to critique the algorithms," he says. "We want to enable a conversation, to move us towards a place where we're dealing with new content and new editors in better ways." It's AI. But, then again, it's also very human.