Vanessa Bates Ramirez

Innovation

Meta Is Building an AI to Fact-Check Wikipedia

That’s right—all 6.5 million articles

Published: Thursday, September 29, 2022 - 12:02

Most people older than 30 probably remember doing research with good old-fashioned encyclopedias. You’d pull a heavy volume from the shelf, check the index for your topic of interest, then flip to the appropriate page and start reading. It wasn’t as easy as typing a few words into the Google search bar, but on the plus side, you knew that the information you found in the pages of the Britannica or the World Book was accurate and true.

Not so with internet research today. The overwhelming multitude of sources is confusing enough, but with the proliferation of misinformation it’s a wonder any of us believe a word we read online.

Wikipedia is a case in point. As of early 2020, the site’s English version was averaging about 255 million page views per day, making it the eighth-most-visited website on the internet. As of last month, it had moved up to spot No. 7, and the English version currently has more than 6.5 million articles.

But as high-traffic as this go-to information source may be, its accuracy leaves something to be desired. The site’s page on its own reliability states, “The online encyclopedia does not consider itself to be reliable as a source and discourages readers from using it in academic or research settings.”

Meta—formerly Facebook—wants to change this. In a blog post published last month, the company’s employees describe how AI could help make Wikipedia more accurate.

Though tens of thousands of people participate in editing the site, the facts they add aren’t necessarily correct; even when citations are present, they’re not always accurate or even relevant.

Meta is developing a machine learning model that scans these citations and cross-references their content to Wikipedia articles to verify that not only the topics line up, but specific figures cited are also accurate.
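The most basic version of that cross-check is easy to picture: pull the figures out of a claim and out of the passage it cites, then see whether they agree. The sketch below shows that naive baseline purely for illustration; the texts and the regular expression are assumptions, not Meta’s code.

```python
import re

def extract_figures(text: str) -> set[str]:
    """Pull numeric figures (e.g., '6.5', '255') out of a text span."""
    return set(re.findall(r"\d+(?:\.\d+)?", text.replace(",", "")))

# Hypothetical claim and cited passage, for illustration only.
claim = "The English Wikipedia has more than 6.5 million articles."
cited_passage = "As of 2022, the English-language edition contained over 6.5 million articles."

# Naive check: does every figure in the claim also appear in the cited source?
missing = extract_figures(claim) - extract_figures(cited_passage)
print("figures supported" if not missing else f"unsupported figures: {missing}")
```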

This isn’t just a matter of picking out numbers and making sure they match. Meta’s AI will need to “understand” the content of cited sources. As complexity theory researcher Melanie Mitchell would tell you, though, “understand” is something of a misnomer: AI is still in its “narrow” phase, a tool for highly sophisticated pattern recognition, whereas understanding in the human sense of the word remains a very different thing.

Meta’s model will “understand” content not by comparing text strings and making sure they contain the same words, but by comparing mathematical representations of blocks of text, which it arrives at using natural language understanding (NLU) techniques.

“What we have done is to build an index of all these web pages by chunking them into passages and providing an accurate representation for each passage,” Fabio Petroni, Meta’s Fundamental AI Research tech lead manager, tells Digital Trends. “That is not representing word-by-word the passage, but the meaning of the passage. That means that two chunks of text with similar meanings will be represented in a very close position in the resulting n-dimensional space where all these passages are stored.”
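To make that idea concrete, here is a minimal sketch of passage-level semantic comparison using an off-the-shelf sentence-embedding library. It is not Meta’s model; the model name and the example texts are assumptions, chosen only to show how passages with similar meanings land close together in the embedding space while unrelated text does not.

```python
# Minimal sketch of comparing passages by meaning rather than exact wording.
# Assumes the open-source sentence-transformers library and a small public model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

claim = "Wikipedia's English version averaged about 255 million page views per day in early 2020."
cited = "In the first months of 2020, the English-language Wikipedia drew roughly 255 million daily page views."
unrelated = "Print encyclopedias were once sold door to door in heavy multi-volume sets."

# Each text becomes a fixed-length vector; similar meanings produce nearby vectors.
vectors = model.encode([claim, cited, unrelated], convert_to_tensor=True)

print("claim vs. cited passage:", util.cos_sim(vectors[0], vectors[1]).item())
print("claim vs. unrelated text:", util.cos_sim(vectors[0], vectors[2]).item())
```

The higher the first score and the lower the second, the more confident a verifier can be that the citation is actually talking about the same thing as the article text.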

The AI is being trained on a set of four million Wikipedia citations, and besides picking out faulty citations on the site, its creators would like it to eventually be able to suggest accurate sources to take their place, pulling from a massive index of data that’s continuously updating.
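One way to picture the “suggest a better source” step, again as a toy sketch rather than Meta’s production system: embed a pool of candidate passages once, keep the vectors in a nearest-neighbor index, and at verification time retrieve the passages that sit closest to the claim. The libraries, model, and passages below are assumptions; Meta’s actual index is web-scale and continuously refreshed.

```python
# Toy passage index for retrieving candidate citations (illustrative only).
import faiss  # nearest-neighbor search library
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "English Wikipedia passed 6.5 million articles in 2022.",
    "Print encyclopedias were sold as heavy multi-volume sets.",
    "Wikipedia ranks among the ten most-visited websites in the world.",
]

# Embed every passage once and store the normalized vectors in the index.
vectors = model.encode(passages, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])  # inner product == cosine on unit vectors
index.add(vectors)

# At verification time, embed the claim and pull back the closest passages.
claim = "The English-language Wikipedia now has more than 6.5 million articles."
query = model.encode([claim], normalize_embeddings=True)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {passages[i]}")
```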

One big issue left to work out is a grading system for sources’ reliability. A paper from a scientific journal, for example, would receive a higher grade than a blog post. The amount of content online is so vast and varied that you can find “sources” to support just about any claim. Parsing misinformation from disinformation (the former is merely incorrect, while the latter deliberately deceives), the peer-reviewed from the non-peer-reviewed, and the fact-checked from the hastily slapped together is no small task, but it’s a crucial one when it comes to trust.

Meta has open-sourced its model, and those who are curious can see a demo of the verification tool. Meta’s blog post noted that the company isn’t partnering with Wikimedia on this project, and that it’s still in the research phase and not currently being used to update content on Wikipedia.

If you imagine a not-too-distant future where everything you read on Wikipedia is accurate and reliable, wouldn’t that make doing any sort of research a bit too easy? There’s something valuable about checking and comparing various sources ourselves, isn't there? It was a big leap to go from paging through heavy books to typing a few words into a search engine and hitting the “Enter” key. Do we really want Wikipedia to move from a research jumping-off point to a gets-the-last-word source?

In any case, Meta’s AI research team will continue working toward a tool to improve the online encyclopedia. “I think we were driven by curiosity at the end of the day,” Petroni says. “We wanted to see what was the limit of this technology. We were absolutely not sure if [this AI] could do anything meaningful in this context. No one had ever tried to do something similar.”

First published August 26, 2022, on Singularity Hub.

About The Author

Vanessa Bates Ramirez

Vanessa Bates Ramirez is senior editor of Singularity Hub. She’s interested in biotechnology and genetic engineering, the nitty-gritty of the renewable energy transition, the roles technology and science play in geopolitics and international development, and countless other topics.