Copier conundrum: Xerox machines swap numbers during scans

Devin Coldewey, NBC News
Wednesday, 7 Aug 2013 | 10:44 AM ET

A perplexing and potentially very troublesome problem affecting Xerox scanners has been explained and fixed, thanks to some sleuthing by a savvy software engineer—and a bit of viral attention on the Internet.

D. Kriesel, a German Ph.D. student studying computational geometry, encountered a strange problem when scanning a blueprint on a common Xerox office scanner. The numbers denoting the square footage of rooms were totally wrong, and what's more, they changed when he scanned the blueprint again.

Intrigued, Kriesel tried scanning a table of costs and figures. Numbers changed again—but not wildly, just by a little bit: 54.60 became 54.80, for instance. And it wasn't just a blurry scan or a misplaced pixel—these were fully formed, unmistakable characters.

This was concerning. Kriesel wrote a blog post about it (after alerting Xerox), and before long the post was being linked to by others who had encountered or could reproduce the problem. Xerox, he said, thought at first that he was just playing a joke. And indeed, the whole situation has an air of satire—machines switching digits around just to vex their humans.

But it was real, and looking at all the data, it quickly became clear what the culprit was: an image compression algorithm called JBIG2, built into the scanner as the "normal" quality option for those who wanted to save a bit of space on their hard drive (versus "high" and "higher," which made for much bigger files).

Unlike an analog photocopier, or a digital one that simply records the black-and-white value of each pixel, a scanner using JBIG2 examines the whole image, finds pieces that are highly similar, and replaces them all with a single stored copy, a sort of clone stamp that saves space. Such pieces might be the pattern on some wallpaper or the top of a fence, or, as it turns out, small letters and numbers that look similar, like 6s and 8s.
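To make the mechanism concrete, here is a toy sketch in Python of the pattern-matching-and-substitution idea described above. It is not the real JBIG2 codec or anything taken from Xerox firmware: the 3x5 glyph bitmaps, the function names, and the `max_mismatch` threshold are all assumptions made up for illustration. Each scanned patch is compared with the patches already stored, and if one is "close enough," the stored copy is reused in its place.

```python
# Toy sketch of pattern matching and substitution; NOT the real JBIG2 codec.
# The glyph bitmaps, names, and max_mismatch parameter are illustrative assumptions.

GLYPHS = {
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "6": ("###", "#..", "###", "#.#", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
}
BITMAP_TO_CHAR = {bitmap: char for char, bitmap in GLYPHS.items()}


def mismatched_pixels(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def encode(patches, max_mismatch):
    """Store each distinct-looking patch once; reuse it for near matches."""
    symbols = []   # the stored patches (the "dictionary")
    indices = []   # for each patch on the page, which stored patch to draw
    for patch in patches:
        for i, symbol in enumerate(symbols):
            if mismatched_pixels(patch, symbol) <= max_mismatch:
                indices.append(i)          # close enough: reuse stored patch
                break
        else:
            symbols.append(patch)          # no match: store a new patch
            indices.append(len(symbols) - 1)
    return symbols, indices


def decode(symbols, indices):
    """Rebuild the page by drawing the stored patch at every position."""
    return [symbols[i] for i in indices]


def scan(text, max_mismatch):
    """Render text as patches, compress, decompress, and read it back."""
    patches = [GLYPHS[char] for char in text]
    symbols, indices = encode(patches, max_mismatch)
    return "".join(BITMAP_TO_CHAR[p] for p in decode(symbols, indices))


print(scan("816", max_mismatch=0))  # 816 (exact matches only, nothing changes)
print(scan("816", max_mismatch=1))  # 818 (the 6 is one pixel from the stored 8)
```

In this sketch the 6 and the 8 differ by a single pixel, so a threshold that tolerates one mismatched pixel silently draws the stored 8 where the 6 was. The output is a crisp, fully formed character, only the wrong one.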

A 20-cent discrepancy is perhaps not a big deal, but imagine the havoc if a figure were off by 20 meters or $20,000. And if the problem has existed for years, that may well have happened already.

But it isn't a bug, just a poorly done feature, Kriesel emphasized in an email to NBC News. "The algorithm itself is common, yet, like all algorithms, it can be parametrized well or badly." In fact, the JBIG2 website mentions such errors as a potential problem if it isn't set up right.

There's even a warning that the quality setting might cause "character substitution errors," but this small warning only appears once, when configuring the scanner via a Web interface.

"Remember, these are business machines," wrote Kriesel. "Their settings are likely to be changed, and besides the small notice in the admin panel, the machine doesn't tell anybody later on it will possibly be mangling numbers."

Hackers and software engineers deplored this use of a totally unsuitable algorithm for document scanning, and the blog post garnered hundreds of thousands of hits.

Before long, Xerox realized it was no joke. The company organized a call with Kriesel, which he recounted in detail on his blog. The Xerox representatives agreed that the warning was too easy to miss considering the potential consequences, and that perhaps the algorithm itself could be tweaked to avoid this kind of thing. Xerox published a post on its official blog acknowledging the issue, and may update the scanners' software or at least make the warning more obvious.

In the end, the mystery was solved without too much damage done—at least that anyone knows of. But if you use a scanner frequently for documents (Xerox or otherwise), it might be wise to double-check those settings.

—Devin Coldewey, NBC News.
