By John Borland | Wired.com | Aug. 14, 2007
On November 17th, 2005, an anonymous Wikipedia user deleted 15 paragraphs from an article on e-voting machine vendor Diebold, excising an entire section critical of the company's machines. While anonymous, such changes typically leave behind digital fingerprints offering hints about the contributor, such as the location of the computer used to make the edits.
In this case, the changes came from an IP address reserved for the corporate offices of Diebold itself. And it is far from an isolated case. A new data-mining service launched Monday traces millions of Wikipedia entries to their corporate sources, and for the first time puts comprehensive data behind longstanding suspicions of manipulation, which until now have surfaced only piecemeal in investigations of specific allegations.
Wikipedia Scanner -- the brainchild of Caltech computation and neural-systems graduate student Virgil Griffith -- offers users a searchable database that ties millions of anonymous Wikipedia edits to the organizations where those edits apparently originated, by cross-referencing the edits with data on who owns the associated block of IP addresses.
Inspired by news last year that Congress members' offices had been editing their own entries, Griffith says he got curious, and wanted to know whether big companies and other organizations were doing things in a similarly self-interested vein.
"Everything's better if you do it on a huge scale, and automate it," he says with a grin.
This database is possible thanks to a combination of Wikipedia policies and (mostly) publicly available information.
The online encyclopedia allows anyone to make edits, but keeps detailed logs of all these changes. Users who are logged in are tracked only by their user name, but anonymous changes leave a public record of their IP address.
The organization also allows downloads of the complete Wikipedia, including records of all these changes.
Griffith thus downloaded the entire encyclopedia, isolating the XML-based records of anonymous changes and IP addresses. He then correlated those IP addresses with public net-address lookup services such as ARIN, as well as private domain-name data provided by IP2Location.com.
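The cross-referencing step described above can be illustrated with a minimal sketch. This is not Griffith's code: the XML excerpt, the function names, and the ownership table are hypothetical stand-ins. The excerpt is shaped like a MediaWiki export, in which an anonymous revision records an IP address where a logged-in revision records a user name, and the addresses come from ranges reserved for documentation rather than from any real organization. A real lookup would draw on registry data such as ARIN's instead of a hand-written table.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical excerpt shaped like a MediaWiki XML export (namespaces omitted).
# Anonymous revisions carry an <ip> element; logged-in ones carry a <username>.
SAMPLE_DUMP = """
<mediawiki>
  <page>
    <title>Example article</title>
    <revision>
      <timestamp>2005-01-01T00:00:00Z</timestamp>
      <contributor><ip>192.0.2.44</ip></contributor>
    </revision>
    <revision>
      <timestamp>2005-01-02T00:00:00Z</timestamp>
      <contributor><username>SomeEditor</username></contributor>
    </revision>
    <revision>
      <timestamp>2005-01-03T00:00:00Z</timestamp>
      <contributor><ip>203.0.113.9</ip></contributor>
    </revision>
  </page>
</mediawiki>
"""

# Hypothetical ownership table standing in for registry data.
# The blocks are documentation-only ranges, not real allocations.
IP_BLOCK_OWNERS = {
    "192.0.2.0/24": "Example Corp",
    "198.51.100.0/24": "Example University",
}


def anonymous_edits(dump_xml):
    """Yield (title, timestamp, ip) for each revision made without a login."""
    root = ET.fromstring(dump_xml.strip())
    for page in root.iter("page"):
        title = page.findtext("title")
        for revision in page.iter("revision"):
            ip = revision.findtext("contributor/ip")
            if ip is not None:  # logged-in edits have no <ip> element
                yield title, revision.findtext("timestamp"), ip


def owner_of(ip, owners=IP_BLOCK_OWNERS):
    """Return the organization whose address block contains ip, or None."""
    address = ipaddress.ip_address(ip)
    for block, organization in owners.items():
        if address in ipaddress.ip_network(block):
            return organization
    return None


def attribute_edits(dump_xml):
    """Pair every anonymous edit with the apparent owner of its address."""
    return [
        (title, timestamp, ip, owner_of(ip))
        for title, timestamp, ip in anonymous_edits(dump_xml)
    ]


if __name__ == "__main__":
    for row in attribute_edits(SAMPLE_DUMP):
        print(row)
```

As in the service itself, a match shows only that an edit came from an address inside an organization's block, not who typed it or on whose behalf.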
The result: A database of 34.4 million edits, performed by 2.6 million organizations or individuals ranging from the CIA to Microsoft to Congressional offices, each now linked to the edits made from its net address.
Some of this appears to be transparently self-interested, either adding positive, press release-like material to entries, or deleting whole swaths of critical material.
Voting-machine company Diebold provides a good example of the latter, with someone at the company's IP address apparently deleting long paragraphs detailing the security industry's concerns over the integrity of its voting machines, and information about fund-raising by the company's CEO for President Bush.
The text, deleted in November 2005, was quickly restored by another Wikipedia contributor, who advised the anonymous editor, "Please stop removing content from Wikipedia. It is considered vandalism."
A Diebold Election Systems spokesman said he'd look into the matter but could not comment by press time.
Wal-Mart has a series of relatively small changes in 2005 that burnish the company's image on its own entry while often leaving criticism in, changing a line saying its wages are lower than those at other retail stores to a note that it pays nearly double the minimum wage, for example. Another leaves activist criticism on community impact intact, while citing a "definitive" study showing Wal-Mart raised the total number of jobs in a community.
As has been previously reported, politicians' offices are heavy users of the system. Former Montana Sen. Conrad Burns' office, for example, apparently changed one critical paragraph headed "A controversial voice" to "A voice for farmers," with predictably image-friendly content following it.
Perhaps interestingly, many of the most apparently self-interested changes come from before 2006, when news of the Congressional offices' edits reached the headlines. This may indicate a growing sophistication with the workings of Wikipedia over time, or even the rise of corporate Wikipedia policies.
Wikipedia founder Jimmy Wales told Wired News he was aware of the new service, but needed time to experiment with it before commenting.
The vast majority of changes are fairly innocuous, however. Employees at the CIA's net address, for example, have been busy -- but with little that would indicate their place of apparent employment, or a particular bias.
One entry on "Black September in Jordan" contains wholesale additions, with specific details that read like a popular history book or an eyewitness' memoir.
Many more are simple copy edits, or additions to local town entries or school histories. One CIA entry deals with the details of lyrics sung in a Buffy the Vampire Slayer episode.
Griffith says he launched the project hoping to find scandals, particularly at obvious targets such as Halliburton. But there's a more practical goal, too: By exposing the anonymous edits that drug makers and big pharmaceutical companies make in entries that affect their businesses, it could help experts check up on the changes and make sure they're accurate, he says.
For now, he has just scratched the surface of the database of millions of entries. But he's putting it online so others can look too.
The nonprofit Wikimedia Foundation, which runs Wikipedia, did not respond to e-mail and telephone inquiries Monday.