A perplexing and potentially very troublesome problem affecting Xerox scanners has been explained and fixed, thanks to some sleuthing by a savvy software engineer—and a bit of viral attention on the Internet.
D. Kriesel, a German Ph.D. student studying computational geometry, encountered a strange problem when scanning a blueprint on a common Xerox office scanner. The numbers denoting the square footage of rooms were totally wrong, and what's more, they changed when he scanned the blueprint again.
Intrigued, Kriesel tried scanning a table of costs and figures. Numbers changed again—but not wildly, just by a little bit: 54.60 became 54.80, for instance. And it wasn't just a blurry scan or a misplaced pixel—these were fully formed, unmistakable characters.
This was concerning. Kriesel wrote a blog post about it (after alerting Xerox), and before long the post was being linked to by others who had encountered or could reproduce the problem. Xerox, he said, thought at first that he was just playing a joke. And indeed, the whole situation has an air of satire—machines switching digits around just to vex their humans.
But it was real, and looking at all the data, it quickly became clear what the culprit was: an image compression algorithm called JBIG2, built into the scanner as the "normal" quality option for those who wanted to save a bit of space on their hard drive (versus "high" and "higher," which made for much bigger files).
Unlike an analog photocopier, or a digital one that simply records the black-and-white values of pixels, a scanner using JBIG2 examines the whole image, finds pieces that are highly similar and replaces them with a sort of clone-stamped version that saves space. Examples of such pieces might be the pattern on some wallpaper or the top of a fence, or, as it turns out, small letters and numbers that look similar, like 6s and 8s.
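For readers who want to see the idea in miniature, here is a rough Python sketch of that kind of pattern matching and substitution. It is a toy under stated assumptions, not the JBIG2 algorithm itself and not Xerox's code: the 5x7 bitmaps, the function names and the `max_mismatch` threshold are all invented for illustration. Each character is compared against a dictionary of shapes already seen, and if it differs by no more than a set number of pixels, the stored shape is reused in its place.

```python
# Toy illustration only: hypothetical 5x7 bitmaps, '#' = black pixel.
EIGHT = (".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###.")
SIX   = (".###.", "#....", "#....", ".###.", "#...#", "#...#", ".###.")
ONE   = ("..#..", ".##..", "..#..", "..#..", "..#..", "..#..", ".###.")

NAMES = {EIGHT: "8", SIX: "6", ONE: "1"}


def mismatch(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs, max_mismatch):
    """Store each distinct-looking shape once; reuse it for near matches.

    max_mismatch is an assumed tuning knob: the number of differing
    pixels tolerated before a shape is treated as new. This sketch takes
    the first acceptable match rather than the best one.
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, symbol in enumerate(dictionary):
            if mismatch(glyph, symbol) <= max_mismatch:
                indices.append(i)  # reuse the stored shape (may be lossy)
                break
        else:
            dictionary.append(glyph)  # nothing close enough: store as new
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary, indices):
    """Rebuild the page by stamping stored shapes back in order."""
    return [dictionary[i] for i in indices]


def read(glyphs):
    """Turn bitmaps back into a digit string for easy comparison."""
    return "".join(NAMES[g] for g in glyphs)


PAGE = [EIGHT, SIX, ONE, SIX]  # the scanned digits "8616"

loose = read(decompress(*compress(PAGE, max_mismatch=3)))
strict = read(decompress(*compress(PAGE, max_mismatch=0)))
```

In this toy, the 6 and the 8 differ by only two pixels, so a tolerance of three turns "8616" into "8818", while a tolerance of zero reproduces the page exactly but has to store one more shape. The substituted digits come out as clean, fully formed characters, which is why such errors are so hard to spot.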
Perhaps not a big deal when it causes a difference of 20 cents, though you can imagine the havoc if something were off by 20 meters or $20,000. And if the problem has existed for years, that may well have happened already.
But it isn't a bug, just a poorly done feature, Kriesel emphasized in an email to NBC News. "The algorithm itself is common, yet, like all algorithms, it can be parametrized well or badly." In fact, the JBIG2 website mentions such errors as a potential problem if it isn't set up right.
There's even a warning that the quality setting might cause "character substitution errors," but this small warning only appears once, when configuring the scanner via a Web interface.
"Remember, these are business machines," wrote Kriesel. "Their settings are likely to be changed, and besides the small notice in the admin panel, the machine doesn't tell anybody later on it will possibly be mangling numbers."
Hackers and software engineers deplored the use of such an error-prone compression setting for document scanning, and the blog post garnered hundreds of thousands of hits.
Before long, Xerox realized it was no joke. The company organized a call with Kriesel, which he recounted on his blog in detail. The Xerox representatives agreed that the warning was too easy to miss considering the potential consequences, and that perhaps the algorithm itself could be tweaked to avoid this kind of thing. They published a post on the official Xerox blog to acknowledge the issue, and may update the scanners' software or at least make the warning more obvious.
In the end, the mystery was solved without too much damage done—at least that anyone knows of. But if you use a scanner frequently for documents (Xerox or otherwise), it might be wise to double-check those settings.
—Devin Coldewey, NBC News.