Slashdot Mirror


Xerox Photocopiers Randomly Alter Numbers, Says German Researcher

First time accepted submitter sal_park writes "According to a report from German computer scientist D. Kriesel, some Xerox WorkCentre copiers and scanners may alter numbers that appear in scanned documents. Having analyzed the output of two such devices, the Xerox WorkCentre 7535 and 7556, Kriesel found that "patches of the pixel data are randomly replaced in a very subtle and dangerous way": in particular, some numbers appearing in a document may be replaced by other numbers when it is scanned."

38 of 290 comments

  1. These numbers are not the true numbers by hawkinspeter · · Score: 4, Funny

    So, it has come to this.

    --
    You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    1. Re: These numbers are not the true numbers by rickb928 · · Score: 4, Funny

      The Dark Lord uses SAP to interact with our world. You know nothing?

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    2. Re:These numbers are not the true numbers by Zaatxe · · Score: 5, Insightful

      Too much XKCD?

      There is no such thing as "too much XKCD".

      --
      So say we all
    3. Re:These numbers are not the true numbers by davidbrit2 · · Score: 4, Funny

      Maybe before we rush to adopt XKCD, we should stop to consider the consequences of blithely giving this technology such a central position in our lives.

  2. Slashdot affected as well by Anonymous Coward · · Score: 5, Funny

    Kriesel found that âoepatches of the pixel data are randomly replaced in a very subtle and dangerous wayâ

    Slashdot users are advised not to use Xerox copiers for submissions.

    1. Re:Slashdot affected as well by J'raxis · · Score: 4, Informative

      That bug is caused by Slashdot still refusing to implement this 20-year-old technology. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

    2. Re:Slashdot affected as well by intermodal · · Score: 5, Funny

      Especially with such an international audience.

      --
      In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
    3. Re:Slashdot affected as well by Mr+Z · · Score: 4, Informative

      No, just significantly harder to filter effectively. Also, there was a rash of troll accounts with names that looked like the various Slashdot editors, only using accented variants of letters, such as 'tÍmothy'. All those shenanigans added up to where we are today.

    4. Re:Slashdot affected as well by Anonymous Coward · · Score: 4, Insightful

      Especially with such an international audience.

      You must have missed the memo. Slashdot is a US site that tolerates international visitors. These are not, however, encouraged to return.

    5. Re:Slashdot affected as well by tibit · · Score: 4, Informative

      Just in case people miss the obvious: The differing opening and closing quotes are the correct punctuation marks. It was only due to the typewriters and teletypes that the mangling into one quote has begun. The MS Office quotes are not "smart", they are merely correct.

      --
      A successful API design takes a mixture of software design and pedagogy.
    6. Re:Slashdot affected as well by J'raxis · · Score: 4, Informative

      The typo in the article evidences that they were using UTF-8. If a quotation mark is turned into three separate characters, that's the tell-tale that it was UTF-8 (multibyte) and not a Windows code page (all single-byte encodings).
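
      The three-character tell-tale can be reproduced in a few lines. This is only a sketch, assuming the reader on the other end decoded the bytes as Windows-1252 (cp1252); which single-byte code page was actually involved is a guess, and the variable names are made up for illustration.

```python
# Hypothetical demo: a curly opening quote (U+201C) is written as UTF-8,
# then read back as if every byte were one Windows-1252 character.
left_quote = "\u201c"

utf8_bytes = left_quote.encode("utf-8")   # multibyte: E2 80 9C
mojibake = utf8_bytes.decode("cp1252")    # each byte becomes its own character

print(utf8_bytes.hex(" "))   # e2 80 9c
print(len(mojibake))         # 3
print(mojibake)              # a-circumflex, euro sign, oe ligature
```

      One quotation mark in, three unrelated characters out, which is what you would not see if the original had been a single-byte encoding.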

    7. Re:Slashdot affected as well by nmb3000 · · Score: 5, Funny

      "Smart parentheses" add no value to a document, either. They're just fluff. We should start using | for both opening and closing parentheses, no?

      Wow, you've somehow managed to make Lisp even more difficult to read

      |defun proj |y x||+|*|flet ||ip |x y||sum |* x y|||||* |/|ip x y||ip x x||x||x|y||

      Congratulations are in order, but I'm sure people will still keep using it :|

      --
      "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
      /)
  3. oh man, what a mess by Trepidity · · Score: 5, Informative

    Some of these machines have been used for digitizing documents whose originals were later shredded, so some people now have subtly wrong "original" digital copies. It's particularly problematic because of the nature of the degradation: the usual lossy degradation of images is non-semantic and just produces blurring, blocking, or other kinds of artifacts, not OCR-error-style mistakes.

    The issue here seems to be the lossy mode of JBIG2, which tries to find patches of the image that approximately match, and consolidates them. The idea seems to be that if the letter "e" appears 5000 times in a document in the same typeface, you just store some version of it once, and then reference it everywhere it appears. But now you get OCR-style errors, if you end up matching some patches to incorrect partners. You have your lightly printed "8" replaced by the "0" patch now and then, that kind of thing. And unlike people doing OCR, who know they need to take this into account, the operators of these machines likely had no idea this was even a possible failure mode to watch for, so who knows how many numbers are wrong in miscellaneous documents (letters are a little less problematic, because most random letter mutations don't destroy meaning).
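
    To make the failure mode concrete, here is a toy sketch of "store a patch once, reference it everywhere". It is not the JBIG2 format and not whatever Xerox's encoder does; the 3x5 bitmaps, the Hamming-distance test and the max_diff threshold are all assumptions picked for illustration.

```python
# Toy patch-matching compressor. Glyphs are tiny bitmaps; '#' is ink.
SIX = ("###",
       "#..",
       "###",
       "#.#",
       "###")
EIGHT = ("###",
         "#.#",
         "###",
         "#.#",
         "###")

def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(patches, max_diff):
    """Store each 'new' patch once; reuse a stored one if it is close enough."""
    symbols, refs = [], []
    for patch in patches:
        for idx, sym in enumerate(symbols):
            if hamming(patch, sym) <= max_diff:
                refs.append(idx)          # treated as "the same glyph"
                break
        else:
            symbols.append(patch)         # genuinely new symbol
            refs.append(len(symbols) - 1)
    return symbols, refs

def decompress(symbols, refs):
    """Rebuild the page by pasting the stored symbol at every reference."""
    return [symbols[i] for i in refs]

page = [SIX, EIGHT, SIX]
exact = decompress(*compress(page, max_diff=0))
lossy = decompress(*compress(page, max_diff=1))
print(exact == page)               # True
print(lossy == [SIX, SIX, SIX])    # True: the 8 silently became a clean 6
```

    With the threshold at zero the page round-trips; allow a single differing pixel and the "8" comes back as a perfectly crisp "6", which is the kind of error described above.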

    Blargh.

    1. Re:oh man, what a mess by Trepidity · · Score: 5, Informative

      Yeah, it's not OCR per se, but it operates on a somewhat similar principle to OCR, identifying which numbers are which and consolidating things it thinks are the same glyph. I agree it's much worse, because it alters the actual image. And it does so in a way that still looks plausible and "clean". Really bad lossy compression that just produced a lot of artifacts so that certain numbers were unreadable would at least telegraph that you shouldn't trust the result, but the numbers here look clean and artifact-free, they just happen to be wrong.

    2. Re:oh man, what a mess by iguana · · Score: 4, Insightful

      Could also be a problem with an overly aggressive hole filling algorithm. http://www.mathworks.com/help/images/ref/imfill.html

      I'd expect there's nothing nefarious going on. It's very likely an overly aggressive image processing algorithm.

    3. Re:oh man, what a mess by Trepidity · · Score: 4, Informative

      Ran some numbers to check, and with some assumptions your estimate seems pretty close.

      The modern standard "postscript point" is 1/72 in, so a 7-point font has a height 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of the height, so let's say ~0.025 inches. If the print/scan cycle roundtrips at somewhere in the range 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.
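
      The arithmetic, as a quick sketch. The 1/4 stroke fraction is the same rough guess as above, and the function and parameter names are just for illustration.

```python
POINTS_PER_INCH = 72  # PostScript point: 1/72 inch

def stroke_pixels(font_pt, dpi, stroke_fraction=0.25):
    """Pixels covering the stroke that tells a 6 from an 8.

    stroke_fraction is an assumed share of the glyph height.
    """
    height_in = font_pt / POINTS_PER_INCH
    return height_in * stroke_fraction * dpi

for dpi in (75, 150, 300):
    print(dpi, round(stroke_pixels(7, dpi), 1))
# 75 1.8
# 150 3.6
# 300 7.3
```

      So roughly 2-4 pixels in the 75-150 dpi range and a bit over 7 at 300 dpi.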

    4. Re:oh man, what a mess by Anonymous Coward · · Score: 5, Interesting

      While it isn't nefarious so far as a deliberate plot to destroy documents and their integrity, it is a bug that is of concern for those who want to preserve documents for long-term storage in an archival situation.... such as was the case with the architectural documents being scanned.

      Keep in mind that in some archival situations, the original paper documents are destroyed, so the scanned versions in these files are all that remains of those documents. Ultimately, having the numbers change like this, regardless of why it is happening, throws serious doubt on the validity of any of the numbers in that document. This can have an enormous set of consequences if you are using this scanned document as a receipt, for banking purposes (aka the check amount might have a different number than was originally used) or other similar kinds of situations. Engineering offices, banks, and a great many other businesses are shredding mountains of paper and archiving those documents electronically, so it is a big deal.

      I guess it really boils down to understanding the limitations of compression algorithms, and not buying into a vendor's hype that you can save all kinds of storage space with this incredible algorithm.... only to find out that all of your documents are worthless when you try to submit them to a judge & jury in a lawsuit as evidence. Perhaps an engineer needs to find the dimensions and tolerance limits of a bolt in an obscure subsystem... and the numbers change? Do you really want to fly in an airplane where the parts specifications have changed because of an error like this? Do you mind if a few hundred or even thousand dollars are taken out of your bank account that you didn't authorize?

    5. Re:oh man, what a mess by Hatta · · Score: 5, Funny

      That's what she said.

      --
      Give me Classic Slashdot or give me death!
    6. Re:oh man, what a mess by Trepidity · · Score: 4, Interesting

      It could just be a particularly poor JBIG2 implementation: the format and decompressor are standardized, but the standard doesn't specify how to find the matches, so various companies have their own proprietary versions.

  4. JBIG2 by Anonymous Coward · · Score: 5, Insightful

    Caused by misconfigured JBIG2 compression. When pixel error rate is low enough, similar looking features get printed with the same subimage.

  5. Re:Mission Impossible 4? by Entropius · · Score: 5, Funny

    That's Xenu, not Xerox.

  6. Re:Some image smoothing algorithm... by Sponge+Bath · · Score: 4, Informative

    This is not smoothing, distortion or individual bad pixels. Entire image patches are copied incorrectly, essentially repeating a scanned section containing one number over another part of the image containing a different number.

  7. Re:Anti-counterfeiting by J'raxis · · Score: 4, Insightful

    Maybe you should read the article.

  8. Re:Really? by Sponge+Bath · · Score: 5, Insightful

    Scanning an article without comprehension and you're complaining about your misinterpretation. Really?

  9. Re:Really? by fuzzyfuzzyfungus · · Score: 5, Informative

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    These 'errors' are substantially worse than ordinary scanner suckitude or lossy-compression legovision: JBIG2's pixel-block matching creates the potential for a block containing one character to be mis-identified and replaced with a block containing a different character.

    The replaced character will be exactly as legible as text elsewhere on the page, just entirely incorrect.

    If it were just the scan quality being lousy, or somebody turning, say, JPEG compression up to the point of pain, mangled characters would be obviously mangled. Not as good as being legible; but the issue is obvious. In this case, the errors will look as good as the rest of the document.

  10. see the Xerox user manual by mejustme · · Score: 5, Informative

    Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

    Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    1. Re:see the Xerox user manual by Atzanteol · · Score: 5, Insightful

      That's "Normal" quality? That could be *very* misleading. If you have an option that has negative side-effects such as this then the option should be titled something to indicate the risk - "Super-compressed", "dangerously small" or the like.

      Though I'm surprised Xerox would even allow such a compression if such an obvious issue occurs. People would expect image quality to suffer - but full character substitution?

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
    2. Re:see the Xerox user manual by Rob+the+Bold · · Score: 4, Informative

      Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.

      AND I'd be wrong, it's in all three sections. Ctrl-F'ing in Okular only finds "character substitution" when the words are side-by-side, not split by a line break as they appear in the copying and scanning sections.

      That's way worse. Xerox knows about this, and just puts in a little note, rather than a big old: "WARNING: Normal/Small mode may produce undetectable text errors."

      And that type of warning should be defined in the beginning of the manual as "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".

      --
      I am not a crackpot.
    3. Re:see the Xerox user manual by Rob+the+Bold · · Score: 5, Insightful

      Seems a little dangerous for that algorithm to be the default, doesn't it? Plus, burying the warning deep in the documentation.

      And an insufficient warning, at that.

      Something more like:

      Normal/Small Mode may not be suitable for documents where faithful reproduction of the original text, numbers or illustrations is critical. Examples would include legal documents (contracts, wills, articles of incorporation, etc.), medical documents (patient charts, orders, medication lists, etc.), financial documents (bills, invoices, statements, reconciliations, etc.), business documents (HR records, meeting minutes, memoranda, etc.), engineering documents (drawings, plans, change orders, instructions, bills of material, etc.) or any other document where incorrect data could result in financial loss, injury, death, property damage or destruction, legal liability, loss of reputation or other harm. These examples should not be considered an exhaustive list of documents not suited for scanning, copying or faxing using Normal/Small mode.

      would be more appropriate.

      --
      I am not a crackpot.
  11. Free Speech by BradyB · · Score: 4, Funny

    Hey, even photo copiers and faxes need freedom of speech.

    --

    Good is never enough, when you dream of being great!
  12. Re:Really? by xaxa · · Score: 4, Informative

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    Consumer level? This isn't a home, or even home-office, machine. It's sold on the website under the office section.

  13. Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 5, Informative

    If you read the documentation from XEROX... it claims that on scanning it is a known problem that "Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals." page 107 from http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    also on page 129 we have the following: "Quality / File Size: The Quality / File Size settings allow you to choose between scan image quality and file size. These settings allow you to deliver the highest quality or make smaller files. A small file size delivers slightly reduced image quality but is better when sharing the file over a network. A larger file size delivers improved image quality but requires more time when transmitting over the network. The options are: Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals."

    1. Re:Known Xerox Issue..... in documentation by Chris+Mattern · · Score: 4, Insightful

      Now the question becomes: what moron made this setting the default? Maybe a setting that can undetectably corrupt your data can be provided if appropriate warnings are given, but it sure as hell should never be the default. I would've thought that was obvious.

  14. Re:Really? by UnknowingFool · · Score: 4, Informative

    If you read the article you would see it's not a simple case of scan error where a "13" appears blurry and looks like "B". Whole numbers are changed: 21.11 --> 17.42. This is a major issue if it was on a construction drawing for example. A beam 4m too short would be a problem. Even if caught the engineer signing off might have to go through a whole audit process.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  15. Self-Correcting Bug by JeanCroix · · Score: 4, Funny

    I printed out the article in order to hang it on the wall above my office's Workcentre as a warning to coworkers. But apparently printing it fixed the problem, because the article headline became:

    "Xerox scanners/photocopiers Scan Documents Flawlessly and are the Best in the Industry"

  16. Re:Anti-counterfeiting by Anubis+IV · · Score: 5, Informative

    That's all I did, and I learned what they were talking about pretty quickly.

    It's actually pretty insane. They had architectural diagrams that had the square meters for the rooms copy/pasted by the scanner into other rooms. For instance, here were the room sizes for the three rooms on the diagram as reported on the original diagram and various scans of it (I've bolded incorrect values):
    Original Diagram: 14.13m^2, 21.11m^2, 17.42m^2
    Xerox WorkCentre 7535 scan: 14.13m^2, 14.13m^2, 14.13m^2
    Xerox WorkCentre 7556 scan 1: 14.13m^2, 14.13m^2, 14.13m^2
    Xerox WorkCentre 7556 scan 2: 17.42m^2, 21.11m^2, 17.42m^2
    Xerox WorkCentre 7556 scan 3: 14.13m^2, 14.13m^2, 17.42m^2

    They have images of this happening. It's just outright substituting blocks of text from one part of a scanned image into an entirely separate part. Not just mangling pixels or uniformly displacing each by a few mm, but outright moving them into a different part of the image that was similar, yet slightly different. Maybe it's some sort of optimization or compression gone wrong? I.e. They detected a block that appeared to be the same as a previous one, so assumed they were the same and only kept one copy of that data?

    It's bizarre.

  17. This is HUGE! by tekrat · · Score: 4, Interesting

    This is how people get shot, because the police are given the wrong address to raid a house. This is how people get foreclosed on because a few account numbers are switched.

    Holy crap. That makes me never want to go near a copier again.

    --
    If telephones are outlawed, then only outlaws will have telephones.
  18. Re:Anti-counterfeiting by Anubis+IV · · Score: 5, Funny

    You came up with the exact same conclusion as the author of the article you just read:

    Hey now, there's no need to accuse me of reading the article just because I looked at the pictures.