Slashdot Mirror


Xerox Photocopiers Randomly Alter Numbers, Says German Researcher

First time accepted submitter sal_park writes "According to a report from German computer scientist D. Kriesel, some Xerox WorkCentre copiers and scanners may alter numbers that appear in scanned documents. Having analyzed the output of two such devices, the Xerox WorkCentre 7535 and 7556, Kriesel found that "patches of the pixel data are randomly replaced in a very subtle and dangerous way": in particular, some numbers appearing in a document may be replaced by other numbers when it is scanned."

290 comments

  1. These numbers are not the true numbers by hawkinspeter · · Score: 4, Funny

    So, it has come to this.

    --
    You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    1. Re:These numbers are not the true numbers by durrr · · Score: 3, Funny

      The dark lord is touching the world, and he's doing it through photocopy machines.

      I would've expected printers or those cheap ISP-provided routers to be his preferred way of evildoing, though I guess even he/it couldn't get those to work properly.

    2. Re:These numbers are not the true numbers by somersault · · Score: 1

      What exactly are you referring to with your "they" and "their"? Because his post was implying that a bunch of Europeans imported a bunch of Africans. Though actually the Europeans constitute a fair amount of diversity too, so bringing slavery into it was just trolling.

      --
      which is totally what she said
    3. Re:These numbers are not the true numbers by Joce640k · · Score: 1, Informative

      Too much XKCD?

      https://xkcd.com/1022/

      --
      No sig today...
    4. Re:These numbers are not the true numbers by Anonymous Coward · · Score: 0

      The Dark One was bound outside of time at Shayol Ghul at the moment of creation.

    5. Re: These numbers are not the true numbers by rickb928 · · Score: 4, Funny

      The Dark Lord uses SAP to interact with our world. You know nothing?

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    6. Re:These numbers are not the true numbers by Zaatxe · · Score: 5, Insightful

      Too much XKCD?

      There is no such thing as "too much XKCD".

      --
      So say we all
    7. Re:These numbers are not the true numbers by nedlohs · · Score: 1

      They and their are both "you Muricans".

    8. Re:These numbers are not the true numbers by davidbrit2 · · Score: 4, Funny

      Maybe before we rush to adopt XKCD, we should stop to consider the consequences of blithely giving this technology such a central position in our lives.

    9. Re:These numbers are not the true numbers by somersault · · Score: 1

      Well the most ignorant and opinionated Americans often make a massive deal out of patriotism and their history - so I think slapping them in the face with it usually works pretty well in one form or another :p

      --
      which is totally what she said
    10. Re: These numbers are not the true numbers by riT-k0MA · · Score: 1

      Quickly! We must train you to seize the Java Source so that you may defeat The Dark One.

    11. Re:These numbers are not the true numbers by Anonymous Coward · · Score: 0

      You must be new here.

    12. Re: These numbers are not the true numbers by Anonymous Coward · · Score: 0

      These are not the numbers you are looking for. You can go about your business. Move along.

    13. Re:These numbers are not the true numbers by maxwells_deamon · · Score: 1

      So it has come to this!

  2. They hashed the scanned image blocks? by Anonymous Coward · · Score: 0

    OOPS

  3. The Pentium Bug strikes again by Anonymous Coward · · Score: 1

    Now, in a more subtle way.

    1. Re:The Pentium Bug strikes again by stjobe · · Score: 2

      Ah, my favourite Star Trek / Computer nerd pastiche:

      "I am Pentium of Borg. Division is futile, you will be approximated".

      Caused me endless mirth in the early nineties - and still does, although these days it's nostalgic more than funny.

      --
      "Total destruction the only solution" - Bob Marley
    2. Re:The Pentium Bug strikes again by PReDiToR · · Score: 1

      I was very pleased with myself when I came up with the line:

      "We are tech support. You will be assisted. Impatience is futile."

      Still am.

      --

      Do not meddle in the affairs of geeks for they are subtle and quick to anger
  4. Slashdot affected as well by Anonymous Coward · · Score: 5, Funny

    Kriesel found that âoepatches of the pixel data are randomly replaced in a very subtle and dangerous wayâ

    Slashdot users are advised not to use Xerox copiers for submissions.

    1. Re:Slashdot affected as well by J'raxis · · Score: 4, Informative

      That bug is caused by Slashdot still refusing to implement this 20-year-old technology. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

    2. Re:Slashdot affected as well by intermodal · · Score: 5, Funny

      Especially with such an international audience.

      --
      In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
    3. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      I am not sure what language this site uses, but this may be useful:
      http://stackoverflow.com/questions/9394210/smart-quotes-not-converting-properly-into-utf8

    4. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      They used to support unicode. But people used to make goatse "ascii" art from the unicode characters.

    5. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      I mean, this being some sort of cutting-edge tech blog

      Slashdot is a news service, not a blog!

      Anyway, completely agree with the Unicode thing. Millions of websites implement full character support - it's not that hard or expensive to implement.

    6. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      So? is that somehow more offensive than goatse "art" in normal ascii characters?

    7. Re:Slashdot affected as well by Mr+Z · · Score: 4, Informative

      No, just significantly harder to filter effectively. Also, there was a rash of troll accounts with names that looked like the various Slashdot editors, only using accented variants of letters, such as 'tÍmothy'. All those shenanigans added up to where we are today.

    8. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      Meanwhile, that whooshing sound you just heard was caused by the joke sailing over your head. ;)

    9. Re:Slashdot affected as well by xaxa · · Score: 1

      No, just significantly harder to filter effectively. Also, there was a rash of troll accounts with names that looked like the various Slashdot editors, only using accented variants of letters, such as 'tÍmothy'. All those shenanigans added up to where we are today.

      So filter usernames and email addresses for ASCII, perhaps filter comments for UTF8 basic type 'Graphic' and \n.

      Problem solved? http://slashdot.jp/ supports Unicode.
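
      The filtering idea above can be sketched roughly as follows. This is only an illustration, not anything Slashdot actually runs: the function names are made up, and "Graphic" is assumed to mean the Unicode general categories L, M, N, P, S and Zs.

```python
import unicodedata

# Major general-category classes treated as "Graphic" (an assumption for this sketch).
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """True for letters, marks, numbers, punctuation, symbols and plain spaces."""
    cat = unicodedata.category(ch)
    return cat[0] in GRAPHIC_MAJOR_CLASSES or cat == "Zs"


def username_ok(name: str) -> bool:
    """Hypothetical username rule: non-empty, printable, ASCII only."""
    return name != "" and name.isascii() and name.isprintable()


def filter_comment(text: str) -> str:
    """Hypothetical comment rule: keep graphic characters and newlines, drop the rest."""
    return "".join(ch for ch in text if ch == "\n" or is_graphic(ch))
```

      With these rules 'tÍmothy' is rejected as a username, and control or formatting characters such as the right-to-left override (U+202E) are stripped from comments while curly quotes survive. Combining marks still pass, so stacked-accent abuse would need a separate limit.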

    10. Re:Slashdot affected as well by NatasRevol · · Score: 2

      Sorry, but you have that exactly backwards.

      Online publishing is a blight on smart quotes.

      If your publishing can't handle smart quotes, then stop publishing. All they are is a different character. Deal with it properly or GTFO.

      --
      There are two types of people in the world: Those who crave closure
    11. Re:Slashdot affected as well by slashmydots · · Score: 0

      Because some people are viewing with open source, half working, alpha release hippie browsers that don't support UTF-8 so they can't implement it.

    12. Re:Slashdot affected as well by NatasRevol · · Score: 1

      That sounds like a whole lot of whining on the editors' part.

      --
      There are two types of people in the world: Those who crave closure
    13. Re:Slashdot affected as well by Mr+Z · · Score: 1

      Quite possibly. *shrug* I find it very difficult to actually care.

    14. Re:Slashdot affected as well by operagost · · Score: 2

      ISO should release a UTF-8.1 standard. They'll all adopt it immediately.

      "My browser uses UTF-8.1. You probably haven't heard of it."

      "I used UTF-8.1 before it was cool."

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    15. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      Kriesel found that âoepatches of the pixel data are randomly replaced in a very subtle and dangerous wayâ

      Slashdot users are advised not to use Xerox copiers for submissions.

      LOL! You should be working with me with your humor. What do I do? Air traffic control.

      Nicely done

    16. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      Actually the goatse ascii art is done completely in ASCII. I've never seen a UTF-8 version of the goatman.

    17. Re:Slashdot affected as well by nospam007 · · Score: 1

      "Slashdot users are advised not to use Xerox copiers for submissions."

      Imagine what Excel, rounding problems and this copy-machine could do to the economy of your enemies.

    18. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      oh yeah, my browser uses UTF-9!

    19. Re:Slashdot affected as well by zieroh · · Score: 1

      Sorry, but you have that exactly backwards.

      Online publishing is a blight on smart quotes.

      If your publishing can't handle smart quotes, then stop publishing. All they are is a different character. Deal with it properly or GTFO.

      Why bother? Smart quotes add no value to the document. They are just fluff. I would think that any self-respecting slashdot reader would immediately see through such silliness.

      Oh, wait...

      --
      People who say "sheeple" have about as much sophistication as an AOL user, and in fact are probably actually AOL users.
    20. Re:Slashdot affected as well by gl4ss · · Score: 1

      Actually the goatse ascii art is done completely in ASCII. I've never seen a UTF-8 version of the goatman.

      yeah.. and what matters to ascii art is getting a monospace font. and they do provide that. so what the fuck. at least have them on the story submissions.
       

      /xenu\. . . . .
      \rulz/ ' ' ' ' '

      --
      world was created 5 seconds before this post as it is.
    21. Re:Slashdot affected as well by omnichad · · Score: 1

      And how will they keep up with making all the dupe posts? The summary is supposed to be grossly wrong here anyway.

    22. Re:Slashdot affected as well by stewsters · · Score: 0

      It's not a valid ASCII character. Windows-1251 character codes do not belong on the internet. Many people nowadays don't use Windows.
      Look at your smartphone. Does it say "win8" on it? If it does, I'm sorry. Nothing I can say will help you now. If it doesn't then you are affected by this.

      Ellipses, smart quotes, en dash, em dash and non breaking spaces should either use their plain ASCII alternatives, html character codes, or UTF-8. Don't keep using Windows-1251.

    23. Re:Slashdot affected as well by omnichad · · Score: 1

      It actually looks better. We shouldn't go back 100 years or more on typography just to placate technical users who can't be bothered to deal with it. They're only "smart" quotes if the publishing software picks the left or right quotation mark for you automatically. They are just standard left quote and right quote characters otherwise. Characters that weren't on early computers because we were so bit-frugal, not because they weren't already in use in typography.

      The only time it's a bad thing is if you're trying to copy snippets of code from a Wordpress blog. And a lot of people on Wordpress are publishing without realizing that quotes are converted and don't have any plugin installed to allow code blocks that go unchanged.

      Maybe you just need to get/make a clipboard utility that will correct this for you if it bothers you so much.

    24. Re:Slashdot affected as well by Minwee · · Score: 1

      If that technology is too arcane, perhaps this helpful tool might be useful.

      On the other hand, it might backfire and wipe out half of the site's users, so maybe that's not such a good idea.

    25. Re:Slashdot affected as well by NatasRevol · · Score: 1

      Yeah, it's not a Windows only thing.

      It's a UTF thing.

      http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

      It belongs on the internet, regardless of your opinion.

      Here's how to even do them in HTML! Gasp!

      http://www.dwheeler.com/essays/quotes-in-html.html

      --
      There are two types of people in the world: Those who crave closure
    26. Re:Slashdot affected as well by Anonymous Coward · · Score: 4, Insightful

      Especially with such an international audience.

      You must have missed the memo. Slashdot is a US site that tolerates international visitors. These are not, however, encouraged to return.

    27. Re:Slashdot affected as well by dolmen.fr · · Score: 3, Informative

      Slashdot uses Perl, which is the programming language that has the best support for Unicode (while PHP's support for this is comparatively almost nonexistent).
      But that doesn't make Unicode work magically. The slashcode has to take it into account.

    28. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      Microsoft can afford to use the same character for the apostrophe and for the closing single quote because they can afford the one engineer-month it takes to implement an algorithm that can distinguish them with 99.9% accuracy based on context. Rendering the damn character is easy, but other word processors are fucked, so no one should play along out of solidarity.

    29. Re:Slashdot affected as well by tibit · · Score: 4, Informative

      Just in case people miss the obvious: The differing opening and closing quotes are the correct punctuation marks. It was only due to the typewriters and teletypes that the mangling into one quote has begun. The MS Office quotes are not "smart", they are merely correct.

      --
      A successful API design takes a mixture of software design and pedagogy.
    30. Re:Slashdot affected as well by Anonymous Coward · · Score: 1

      You must have missed the memo. Slashdot is a US site that depends upon sarcasm for its survival. Those who do not have a sense of humour are not, however, encouraged to return.

    31. Re:Slashdot affected as well by dmbasso · · Score: 1

      I know you aimed at being sarcastic, but I bet the number of non-US users is quite significant. Regardless, your post would benefit from using the U+2E2E character.

      --
      `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
    32. Re:Slashdot affected as well by J'raxis · · Score: 2

      Yes, professional-looking typography is such a blight. Instead we should use kludges invented for typewriters and held over since the 1960s in computer charsets because of 7-bit character size limitations.

      Perhaps we should go back to using 'O' for '0' and 'l' for '1', too.

    33. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      There were both ASCII and UTF-8 versions. There were other things being done, too. The goatse one was the one I happened to remember.

    34. Re:Slashdot affected as well by J'raxis · · Score: 2

      "Smart parentheses" add no value to a document, either. They're just fluff. We should start using | for both opening and closing parentheses, no? We could even use the same symbol in place of "smart brackets" and "smart braces."

    35. Re:Slashdot affected as well by J'raxis · · Score: 4, Informative

      The typo in the article evidences that they were using UTF-8. If a quotation mark is turned into three separate characters, that's the tell-tale that it was UTF-8 (multibyte) and not a Windows code page (all single-byte encodings).
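
      A quick way to see the three-character tell-tale, as a minimal sketch. It assumes the page was decoded as Windows-1252, which is a guess; the helper name is made up for the example.

```python
LEFT_DOUBLE_QUOTE = "\u201c"  # the opening curly quote


def misdecode_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


utf8_bytes = LEFT_DOUBLE_QUOTE.encode("utf-8")        # three bytes: E2 80 9C
mojibake = misdecode_as_cp1252(LEFT_DOUBLE_QUOTE)      # three separate characters

# Reversing the mistake recovers the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")
```

      One quotation mark becomes three bytes in UTF-8, and a single-byte code page renders each byte as its own character. A single-byte original could never produce three characters this way.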

    36. Re:Slashdot affected as well by Plumpaquatsch · · Score: 1

      Sorry, but you have that exactly backwards.

      Online publishing is a blight on smart quotes.

      If your publishing can't handle smart quotes, then stop publishing. All they are is a different character. Deal with it properly or GTFO.

      Why bother? Smart quotes add no value to the document. They are just fluff. I would think that any self-respecting slashdot reader would immediately see through such silliness.

      Oh, wait...

      Hell yeah - if it can't be displayed on a teletype, it's not worth reading.

      --
      Of course news about a fake are Fake News.
    37. Re:Slashdot affected as well by J'raxis · · Score: 1

      Based on the quality of some of the articles that appear on here nowadays, no, it's a blog. Or maybe it's an advertising venue.

    38. Re:Slashdot affected as well by dnaumov · · Score: 1

      That bug is caused by Slashdot still refusing to implement this 20-year-old technology. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

      Obviously, since Slashdot is cutting-edge, there is no way in hell it's going to be implementing technology that old.

    39. Re:Slashdot affected as well by AmiMoJo · · Score: 1

      Interestingly Slashdot Japan does use UTF-8, so clearly the underlying code can handle it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    40. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      What is wrong with "..."? It's simple to write, displays everywhere. FFS!

    41. Re:Slashdot affected as well by tlhIngan · · Score: 1

      That bug is caused by Slashdot still refusing to implement this 20-year-old technology. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

      /. does support Unicode (UTF-8 sucks, btw - it's a compatibility hack). It's just that /. decided to whitelist Unicode characters rather than allow them all, because plenty of people have screwed around with Unicode control characters to completely destroy the page layout.

      Yes, it's one thing that makes Unicode annoying to deal with - the fact you can copy one character of text, but actually end up copying a half dozen or more because Unicode codepoints encompass decorations to said characters as well (many non-printing).

      And yes, UTF-8 is an annoyance to deal with because you can only parse it unidirectionally. If you need to go backwards through text, you're better off converting it first to UTF-16 or UTF-32.

    42. Re:Slashdot affected as well by Samantha+Wright · · Score: 1

      For what little it's worth, the "twenty-year-old" figure is a bit misleading; many software platforms didn't endorse or include Unicode support for another five to ten years after that, when many web technologies and popular operating systems were going through their awkward teenage phases. If UTF-8 really had become popular immediately after it was created, we might have much deeper native Unicode support in everything.

      --
      Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
    43. Re:Slashdot affected as well by nmb3000 · · Score: 5, Funny

      "Smart parentheses" add no value to a document, either. They're just fluff. We should start using | for both opening and closing parentheses, no?

      Wow, you've somehow managed to make Lisp even more difficult to read

      |defun proj |y x||+|*|flet ||ip |x y||sum |* x y|||||* |/|ip x y||ip x x||x||x|y||

      Congratulations are in order, but I'm sure people will still keep using it :|

      --
      "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
      /)
    44. Re:Slashdot affected as well by epine · · Score: 1

      /. does support Unicode (UTF-8 sucks, btw - it's a compatibility hack).

      I was guessing your house wine was UTF-32 even before the last paragraph. Unfortunately it lacks compatibility with the size of existing Google datacenters, though it's nothing that couldn't be solved with more circuitry and a beefier power feed.

      You absolutely can parse UTF-8 backwards: "continuation bytes all have '10' in the high-order position". How much easier does it have to get? Please inform me how your pushmepullyou parsing system is defined such that all code points are palindromes with no loss of space efficiency.

      Ken Thompson of the Plan 9 operating system group at Bell Labs then made a small but crucial modification to the encoding, making it very slightly less bit-efficient than the previous proposal but allowing it to be self-synchronizing, meaning that it was no longer necessary to read from the beginning of the string to find code point boundaries. Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike. The following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout, and then communicated their success back to X/Open.

      Good grief, if Thompson and Pike are the scourge of right thinking, our species is doooooomed. However you describe it, the present state of Slashdot's Unicode handling is a disgrace to God, geek, and man.
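
      For anyone who wants to see the backwards walk spelled out, here is a minimal sketch. It assumes the input is already valid UTF-8, and the function names are invented for the example.

```python
def prev_codepoint_start(buf: bytes, end: int) -> int:
    """Return the index of the lead byte of the code point that ends just before `end`."""
    i = end - 1
    # Continuation bytes look like 10xxxxxx; step back until we hit a lead byte.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def iter_codepoints_reversed(buf: bytes):
    """Yield the characters of a valid UTF-8 byte string from last to first."""
    end = len(buf)
    while end > 0:
        start = prev_codepoint_start(buf, end)
        yield buf[start:end].decode("utf-8")
        end = start
```

      Stepping back never takes more than three extra byte checks per character, since a UTF-8 code point is at most four bytes long.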

    45. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      The one I remember was penisbird, think it used UTF-8

    46. Re:Slashdot affected as well by aliquis · · Score: 1

      They whitelist?

      Yeah, my åäö must be really dangerous Ã¥ÃÃ.

    47. Re:Slashdot affected as well by ais523 · · Score: 1

      INTERCAL uses the same character for opening and closing parenthesis (' or ", the programmer can choose, and occasionally has to mix them to resolve ambiguity). This is not particularly easy to read, although it is normally unambiguous; look at my signature for an example.

      --
      (1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
    48. Re:Slashdot affected as well by Macgrrl · · Score: 1

      Congratulations are in order, but I'm sure people will still keep using it :|

      Bonus points for the emoticon.

      --
      Sara
      Designer, Gamer, Macgrrl in an XP World
    49. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      Things that annoy me about myself #2313094:

      |defun proj |y x||+|*|flet ||ip |x y||sum |* x y|||||* |/|ip x y||ip x x||x||x|y||

      is still readable, and I'm more interested in newlines and indentations than the shape of the parens.

      Oh, no, I'm gonna start liking Python now. :-( :-(

    50. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      8.1?? that's so outdated: http://en.wikipedia.org/wiki/UTF-32

    51. Re:Slashdot affected as well by Reziac · · Score: 1

      Completely OT, when I exercised curiosity about your sig link, it took me to what looked like a malware download page that tried very hard to not let me leave the page. ???

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    52. Re:Slashdot affected as well by rastos1 · · Score: 1

      You know why you know that? Because your /. id has 4 digits. The users with 4 digits in id have seen those shenanigans. Or more likely heard about them from sub-4-digit users when they get high.

    53. Re:Slashdot affected as well by J'raxis · · Score: 1

      Used to be a trustworthy interstitial ad provider. Apparently they no longer are; I switched the link to the direct link now.

    54. Re:Slashdot affected as well by Reziac · · Score: 1

      Ah, good. (And interesting, too)

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    55. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      I've come across similar problems before, and it seems to me that there should be a simple solution: when UTF-8 was defined, there should also have been a standard mapping defined that maps each UTF-8 character to its closest ASCII equivalent. That would allow graceful conversion of UTF-8 strings to ASCII for near-collision-detection.

      As far as I know, though, no such mapping exists. Is there a reason for this, other than lack of foresight?
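
      The nearest standard building block I know of is Unicode compatibility decomposition (NFKD), which splits accented letters into a base letter plus combining marks. A rough sketch of using it for near-collision detection, with the caveat that it only covers characters that decompose to ASCII and the function name is made up:

```python
import unicodedata


def ascii_skeleton(name: str) -> str:
    """Approximate a name in ASCII: decompose, drop non-ASCII leftovers, fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.casefold()


def names_collide(a: str, b: str) -> bool:
    """Treat two names as colliding when their ASCII skeletons match."""
    return ascii_skeleton(a) == ascii_skeleton(b)
```

      This catches 'tÍmothy' against 'timothy', but it does nothing for look-alikes from other scripts (a Cyrillic 'а' is simply dropped rather than mapped), so it is a partial answer at best.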

    56. Re:Slashdot affected as well by couchslug · · Score: 1

      "Slashdot users are advised not to use Xerox copiers for submissions."

      Yo dawg, we herd you like dupes in your dupes so we put dupes in your dupe so you can dupe while you dupe.

      --
      "This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
    57. Re:Slashdot affected as well by Anonymous Coward · · Score: 0

      I believe the use of "smart" in the name of the Office feature refers to the fact that it uses one key to produce both characters depending on context.

  5. oh man, what a mess by Trepidity · · Score: 5, Informative

    Some of these machines have been used for digitizing documents whose originals were later shredded, so some people now have subtly wrong "original" digitals. It's particularly problematic because of the nature of the degradation: ordinary lossy image compression degrades in a non-semantic way, producing blurring, blocking, or other visible artifacts, not OCR-error-style mistakes.

    The issue here seems to be the lossy mode of JBIG2, which tries to find patches of the image that approximately match, and consolidates them. The idea seems to be that if the letter "e" appears 5000 times in a document in the same typeface, you just store some version of it once, and then reference it everywhere it appears. But now you get OCR-style errors, if you end up matching some patches to incorrect partners. You have your lightly printed "8" replaced by the "0" patch now and then, that kind of thing. And unlike people doing OCR, who know they need to take this into account, the operators of these machines likely had no idea this was even a possible failure mode to watch for, so who knows how many numbers are wrong in miscellaneous documents (letters are a little less problematic, because most random letter mutations don't destroy meaning).

    Blargh.
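
    To make the failure mode concrete, here is a toy sketch of patch matching and substitution. It is not the real JBIG2 codec: the 3x5 glyph bitmaps, the Hamming-distance test and the `max_diff` threshold are all assumptions chosen for illustration.

```python
# Toy 3x5 glyphs flattened row by row; '#' is ink, '.' is paper.
EIGHT = "###" "#.#" "###" "#.#" "###"
SIX = "###" "#.." "###" "#.#" "###"   # differs from EIGHT by a single pixel


def hamming(a: str, b: str) -> int:
    """Count the pixels at which two equally sized patches differ."""
    return sum(x != y for x, y in zip(a, b))


def encode(patches, max_diff):
    """Store each patch once; reuse a stored symbol if it is within max_diff pixels."""
    dictionary, refs = [], []
    for patch in patches:
        for idx, symbol in enumerate(dictionary):
            if hamming(patch, symbol) <= max_diff:
                refs.append(idx)          # "close enough": reference the old symbol
                break
        else:
            dictionary.append(patch)      # genuinely new symbol
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decode(dictionary, refs):
    """Rebuild the page by pasting the referenced symbols back in."""
    return [dictionary[i] for i in refs]
```

    With `max_diff=0` the round trip is exact. With `max_diff=1` a page containing an 8 followed by a 6 decodes as two crisp 8s, which is exactly the clean-but-wrong output described above.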

    1. Re:oh man, what a mess by zAPPzAPP · · Score: 0

      From the article (yeah, i know...):

      "This is not an OCR problem (as we switched off OCR on purpose), it is a lot worse"

      The machines are altering the scanned pictures.
      And they seem to do this in locations where there are numbers in the picture.
      AND they seem to do it so that the altered image still contains numbers at the same location. Just different ones.

    2. Re:oh man, what a mess by Trepidity · · Score: 5, Informative

      Yeah, it's not OCR per se, but it operates on a somewhat similar principle to OCR, identifying which numbers are which and consolidating things it thinks are the same glyph. I agree it's much worse, because it alters the actual image. And it does so in a way that still looks plausible and "clean". Really bad lossy compression that just produced a lot of artifacts so that certain numbers were unreadable would at least telegraph that you shouldn't trust the result, but the numbers here look clean and artifact-free, they just happen to be wrong.

    3. Re:oh man, what a mess by iguana · · Score: 4, Insightful

      Could also be a problem with an overly aggressive hole filling algorithm. http://www.mathworks.com/help/images/ref/imfill.html

      I'd expect there's nothing nefarious going on. It's very likely an overly aggressive image processing algorithm.

    4. Re:oh man, what a mess by Anonymous Coward · · Score: 0, Flamebait

      He said "ocr style" not "ocr". God damn you and everyone else who thinks your reading comprehension problem is someone else's mistake. Wastes so many posts.

      Other posts are right there on your screen. You can read them ten times if that's what it takes for your stupid ass to comprehend what they do and don't say.

      How's it feel to miss something that schoolchildren are expected to get right? Does it make you feel stupid? It should. It really should.

    5. Re:oh man, what a mess by sh00z · · Score: 2, Insightful

      The issue here seems to be the lossy mode of JBIG2

      combined with the fact that he's complaining about errors in scans of a 7-point font. At that size, it probably only takes two erroneous pixels to change a 6 to an 8.

    6. Re:oh man, what a mess by Anonymous Coward · · Score: 1

      According to his later posts, the HTML settings page for the scanner warns that character substitution can occur with the default compression setting. Of course, with it being the default, you'd only ever see that warning if you were going in there to change it to something else...

    7. Re:oh man, what a mess by Trepidity · · Score: 4, Informative

      Ran some numbers to check, and with some assumptions your estimate seems pretty close.

      The modern standard "postscript point" is 1/72 in, so a 7-point font has a height 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of the height, so let's say ~0.025 inches. If the print/scan cycle roundtrips at somewhere in the range 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.
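
      The same back-of-the-envelope numbers as a short script, in case anyone wants to plug in other sizes. The one-quarter stroke fraction is the rough guess from above, not a measured value.

```python
POINTS_PER_INCH = 72          # PostScript point
STROKE_FRACTION = 1 / 4       # assumed share of glyph height that separates a 6 from an 8


def stroke_pixels(font_points: float, dpi: float) -> float:
    """Pixels covering the distinguishing stroke at a given effective resolution."""
    glyph_height_in = font_points / POINTS_PER_INCH
    return glyph_height_in * STROKE_FRACTION * dpi


estimates = {dpi: round(stroke_pixels(7, dpi), 1) for dpi in (75, 150, 300)}
```

      For a 7-point font this gives roughly 1.8, 3.6 and 7.3 pixels at 75, 150 and 300 dpi.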

    8. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      I am not far from Xerox, they are an evil incompetent company that has outsourced everyone and their mother. I have no idea what they really make anymore other than the CEO's ugly mug is constantly sucking BO's C**K. I've interviewed at that hell hole and it was the rudest most annoying interview I've ever been on. Xerox the outsourced affirmative action company.

    9. Re:oh man, what a mess by Anonymous Coward · · Score: 5, Interesting

      While it isn't nefarious so far as a deliberate plot to destroy documents and their integrity, it is a bug that is of concern for those who want to preserve documents for long-term storage in an archival situation.... such as was the case with the architectural documents being scanned.

      Keep in mind that in some archival situations, the original paper documents are destroyed where the scanned versions in these files are all that remains of those documents. Ultimately, by having the numbers change like this, regardless of why it is happening, now throws serious doubt as to the validity of any of the numbers in that document. This can have an enormous set of consequences if you are using this scanned document as a receipt, for banking purposes (aka the check amount might have a different number than was originally used) or other similar kinds of situations. Engineering offices, banks, and a great many other businesses are shredding mountains of paper and archiving those documents electronically, so it is a big deal.

      I guess it really boils down to understanding the limitations of compression algorithms, and not buying into a vendor's hype that you can save all kinds of storage space with this incredible algorithm.... only to find out that all of your documents are worthless when you try to submit them to a judge & jury in a lawsuit as evidence. Perhaps an engineer needs to find the dimensions and tolerance limits of a bolt in an obscure subsystem... and the numbers change? Do you really want to fly in an airplane where the parts specifications have changed because of an error like this? Do you mind if a few hundred or even thousand dollars are taken out of your bank account that you didn't authorize?

    10. Re:oh man, what a mess by Hatta · · Score: 5, Funny

      That's what she said.

      --
      Give me Classic Slashdot or give me death!
    11. Re:oh man, what a mess by dj245 · · Score: 3, Interesting

      Ran some numbers to check, and with some assumptions your estimate seems pretty close.

      The modern standard "postscript point" is 1/72 in, so a 7-point font has a height 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of the height, so let's say ~0.025 inches. If the print/scan cycle roundtrips at somewhere in the range 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.

      Why wouldn't you use at least 300dpi?

      Most "serious" office printers print at 600dpi or better, so the information is there. Even my $100 Brother laser printer defaults to 600dpi. Every recent office multifunction I have seen can scan at 200, 300, or 600dpi, but every single one defaults to 200dpi. 200dpi scans are hard on the eyes. I always scan at 600dpi, the file size isn't bad in the age of 300GB laptop hard drives, and if I need to send it to someone external to the company, I can always reduce the size.
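
      The stroke-size estimate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the 1/4-of-glyph-height stroke fraction is the assumption made in the quoted post, and the function name is made up for illustration.

```python
# Rough estimate of how many pixels cover the stroke that separates a 6 from an 8.
# The default stroke_fraction (1/4 of the glyph height) is the quoted post's assumption.

POINTS_PER_INCH = 72  # one PostScript point is 1/72 inch


def stroke_pixels(font_pt: float, dpi: float, stroke_fraction: float = 0.25) -> float:
    """Return the approximate stroke height in pixels at the given resolution."""
    glyph_height_in = font_pt / POINTS_PER_INCH
    return glyph_height_in * stroke_fraction * dpi


if __name__ == "__main__":
    # Compare the resolutions mentioned in the thread for 7-point text.
    for dpi in (75, 150, 200, 300, 600):
        print(f"{dpi:>3} dpi: {stroke_pixels(7, dpi):.1f} px")
```

      Under these assumptions the distinguishing stroke is roughly 5 pixels tall at the 200dpi default and roughly 15 at 600dpi, which is the gap the post is complaining about.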

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    12. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      I found his approach amusing: read the article carefully and skim the highest ranked post to find a place where he can "correct" it to appear even more deserving of mod points.

    13. Re:oh man, what a mess by N1AK · · Score: 2

      I have to admit I'm actually really surprised by this. The idea and technology are good but I would think it fundamentally breaks a key feature of digitising a document: removing the need to keep the hard copy. The moment the digitised copy is more than an electronic representation of the physical document then the authenticity of anything in the digitised document is in doubt. Can it really be used to prove what someone read and signed for example, even if the chance of an error in any case is 1/10,000?

    14. Re:oh man, what a mess by N1AK · · Score: 1

      I always scan at 600dpi, the file size isn't bad in the age of 300GB laptop hard drives, and if I need to send it to someone external to the company, I can always reduce the size.

      In which case it raises the question: why bother using an algorithm that substitutes the real content to save space, if space isn't an issue regardless of what DPI you use? Clearly space saving was a consideration for someone ;)

    15. Re:oh man, what a mess by nine-times · · Score: 1

      Thanks for the quick explanation. This is kind of hilariously unfortunate, since it has the potential to undermine the reliability of lots of documents.

    16. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      I do think you are holding a very naive view of the world. This kind of shit and much worse happens every day. Money corrupts. As engineers need money for themselves and their families, they have become whores to those who control the money.

      Somebody needs to die because of this before anybody important gets actually upset. If at all.

    17. Re:oh man, what a mess by omnichad · · Score: 1

      And yet these machines have OCR built-in. So they could use the OCR as a sanity check to make sure that the algorithm didn't replace any characters. If the OCR of the original and the JBIG2 version match, you're OK. If not, automatically increase the quality until you get a match.
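
      The retry loop suggested here might look something like the sketch below. Everything in it is hypothetical: `compress` and `ocr` stand in for whatever the device firmware actually provides, and `compress` is assumed to return the image as it would look after decoding. It also assumes the OCR is run on the uncompressed scan first, and OCR can itself misread characters, so a match is evidence rather than proof.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def compress_with_ocr_check(
    page: Image,
    compress: Callable[[Image, int], Image],  # hypothetical: (page, quality) -> decoded result
    ocr: Callable[[Image], str],              # hypothetical: image -> recognised text
    qualities: Iterable[int],                 # quality settings to try, lowest first
) -> Optional[Tuple[int, Image]]:
    """Return the first (quality, result) whose OCR text matches the original scan."""
    reference = ocr(page)
    for quality in qualities:
        candidate = compress(page, quality)
        if ocr(candidate) == reference:
            return quality, candidate
    # No setting preserved the text; the caller should fall back to lossless output.
    return None
```

      Returning `None` instead of the best failing attempt is deliberate in this sketch: silently handing back a result that changed the text would be the same failure the thread is about.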

    18. Re:oh man, what a mess by v1 · · Score: 2

      I think the problem isn't so much the recognition, but the reproduction. It may be looking at two numbers that both look about the same, and using the same compressed data to draw both of them back out, making them look identical. So if you started with two numbers, say one that was 70% like a 6 and 30% like an 8, and another that was 40% like a 6 and 60% like an 8, it's deciding they're "close enough" and is drawing the 70/30 image in both places. A human could figure out the second one was supposed to be an 8 before, but now both of them look like 6's with the same 70/30 confidence.

      If they're going to use such generous "similar consolidation" they've got to be doing a better job of figuring out if that part of the image needs to maintain its high resolution. I think that's what's going on here... it's probably got an algo that's looking for fine text and sharp images, and going with higher quality algorithms in that region of the page, and going with less accurate but more efficient ones in other places where it doesn't think it needs to waste the storage on low res. But that method is failing to find fine text when placed around rough shapes. (and I suspect it's affected by the rest of the content of the page) The article didn't show entire pages well, just small excerpts.

      --
      I work for the Department of Redundancy Department.
    19. Re:oh man, what a mess by sribe · · Score: 1

      The issue here seems to be the lossy mode of JBIG2...

      Maybe, maybe not. I don't know if that's a contributor or not. But I do know how many copiers scan in black-and-white ;-)

      What they do is scan in a lower resolution of grayscale, then interpolate upwards by pattern-matching small blocks of grayscale and substituting the "most likely" higher-resolution block of black-and-white pixels, from a smallish subset of all possible combinations, that subset being based on the most common patterns of pixels in text documents. You can see then, how a small bit of noise, could cause a match to the wrong pattern, and change whether or not a 6 is an 8, or some such.

      The thing is, that many of the other things being discussed by /.ers in this thread (smoothing algorithm filling holes, JBIG, and so on) are more likely to make a somewhat more random substitution that to a human might still look like noise. But when this algorithm goes wrong it can produce something that looks like perfectly clean text.

      This is what happens when they try to be too clever and pretend that they're scanning at substantially higher resolution than the optics of the device allow; these algorithms are intended to produce sharper cleaner results than merely using some form of curve-fitting interpolation. (Also, they're intended to require extremely minimal processing, usually being implemented in custom tiny hardware instead of even having a CPU involved.)

      And back full circle to your suggestion, my guess would be that if JBIG is contributing here, it's only with documents where the initial gray -> b&w upsampling already fucked up the image.

    20. Re:oh man, what a mess by omnichad · · Score: 1

      And with it being on an HTML settings page and not in the dialog of the actual scan process...not great. It's funny that it has built-in OCR, but it won't use that to determine if the character substitution DID occur after compressing and warn you of it (or silently recompress it at a higher setting). Machines aren't being made smart enough.

    21. Re:oh man, what a mess by sribe · · Score: 1

      The issue here seems to be the lossy mode of JBIG2...

      Maybe, maybe not. I don't know if that's a contributor or not. But I do know how many copiers scan in black-and-white ;-)

      Well, shit, should have RTFA. So, it doesn't happen if you don't get PDF out, but just take TIFF, which, IIRC, does not support JBIG.

      Still, I wonder if there's any processing internal before compression that makes it more likely for JBIG to find false matches.

    22. Re:oh man, what a mess by omnichad · · Score: 1

      I keep posting variations of this, but I think it's ridiculous that the machine has OCR in it, but they didn't use that OCR to verify that the compressed version read as the same characters as the original. There's really no excuse for this. And why would the default settings be so low as to cause character substitution? I guess it's relative to font size, but maybe it should identify high-detail areas and use a higher match threshold for those when making the patch library.

    23. Re:oh man, what a mess by Trepidity · · Score: 4, Interesting

      It could just be a particularly poor JBIG implementation: the format and decompressor are standardized, but the standard doesn't specify how to find the matches, so various companies have their own proprietary versions.

    24. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      What if the OCR is done on the JBIG2 version only?
      I am not really sure, but it seems pretty much the first thing the scanner does is to compress it before any other function is done.
      When you are photocopying you expect the raw data to be printed, not a compressed version.

    25. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      Because, in my experience, the people scanning documents for companies don't really give a shit. They have a pile of docs to scan, and want to get done as quickly as possible. Scanning at 600 dpi takes longer than 300 dpi, which takes longer than 150 dpi. They will bitch and moan about scan times, and then tweak the scanner to throw pages through as quickly as possible.

      At least that's the way it works in my company.

    26. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      When operating an overly aggressive hole filling algorithm, you should always yell "surprise!" first. Then it won't be rape if you fill a hole that you didn't have permission to fill.

    27. Re:oh man, what a mess by Agent0013 · · Score: 2

      Can't be a hole filling algorithm. The 8 that replaces the 6 still has the little dent on the left between the two round parts. It isn't just filling in the 6 to make an 8, it is actually replacing the 6 with a copy of the 8 from elsewhere on the page.

      --

      -- ssoorrrryy,, dduupplleexx sswwiittcchh oonn.. -Quote found on actual fortune cookie.
    28. Re:oh man, what a mess by Agent0013 · · Score: 1

      Filling in the 6 to make an 8 would not leave the dent on the left side of the eight between the two round parts. This is actually replacing the 6 with an 8 from elsewhere on the page.

      --

      -- ssoorrrryy,, dduupplleexx sswwiittcchh oonn.. -Quote found on actual fortune cookie.
    29. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      ...and given most companies' tendency to use low-paid contractors/temps in file rooms, they'll probably be long gone when the company gets sued for charging 27m^2 worth of rent on an office that's really only 21m^2.

    30. Re:oh man, what a mess by sribe · · Score: 1

      It could just be a particularly poor JBIG implementation: the format and decompressor are standardized, but the standard doesn't specify how to find the matches, so various companies have their own proprietary versions.

      Yes, excellent point which I had overlooked.

    31. Re:oh man, what a mess by jensend · · Score: 1

      In particular, there's no excuse for using bitonal. Bitonal documents at 300dpi are 100kb or less when using reasonable lossless compression (lossless jb2/ jbig2, CCITT Group 4 TIFF, or even just PNG).

    32. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      I think so. In my company we have our own JBIG2 compressor implementation in an in-house information system, and it never makes any errors. All files are stored at 150dpi but scanned and processed at 600dpi (the shapes are recognized at 600dpi but stored at 150dpi).

    33. Re:oh man, what a mess by Anonymous Coward · · Score: 0
    34. Re:oh man, what a mess by sjames · · Score: 1

      I think this particular incompetence rises to the level that it is indistinguishable from malice.

      Even where a document hasn't been screwed up, it can no longer be trusted.

      Even documents scanned with some other technology will be suspect until that can be proven.

    35. Re:oh man, what a mess by Compaqt · · Score: 1

      But are the file sizes so small that this even occurs as a problem?

      When I scan from Ubuntu with Simple Scan, by default, the filesize is over 1MB.

      Every single character is huge (viewed at 100% size).

      If you're sending an email, reduce the size. But why reduce when you're archiving?

      --
      I'm not a lawyer, but I play one on the Internet. Blog
    36. Re:oh man, what a mess by Anonymous Coward · · Score: 0

      nobody uses it like that

  6. JBIG2 by Anonymous Coward · · Score: 5, Insightful

    Caused by misconfigured JBIG2 compression. When pixel error rate is low enough, similar looking features get printed with the same subimage.
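
    A toy model makes the failure mode concrete. The sketch below is not real JBIG2 and not Xerox's encoder: the 5x5 bitmaps, the Hamming-distance test and the `max_mismatch` threshold are all simplifications chosen for illustration. It only shows how "similar enough" glyphs end up drawn from one stored subimage.

```python
# Toy model of lossy symbol matching. Glyphs are tiny 1-bit bitmaps; the encoder
# stores each "new" shape once and reuses it for anything within max_mismatch pixels.

GLYPH_6 = ("01110", "10000", "11110", "10001", "01110")
GLYPH_8 = ("01110", "10001", "01110", "10001", "01110")
GLYPH_1 = ("00100", "01100", "00100", "00100", "01110")


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs, max_mismatch):
    """Return (dictionary, placements): stored shapes plus one index per glyph."""
    dictionary, placements = [], []
    for glyph in glyphs:
        for index, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_mismatch:
                placements.append(index)  # reuse an existing shape
                break
        else:
            dictionary.append(glyph)      # no close match: store a new shape
            placements.append(len(dictionary) - 1)
    return dictionary, placements


def decode(dictionary, placements):
    """Redraw the page from the stored shapes."""
    return [dictionary[i] for i in placements]


if __name__ == "__main__":
    page = [GLYPH_6, GLYPH_8, GLYPH_1]
    strict = decode(*encode(page, max_mismatch=0))
    loose = decode(*encode(page, max_mismatch=2))
    print("strict round trip intact:", strict == page)
    print("loose: second glyph is now a 6:", loose[1] == GLYPH_6)
```

    In this toy the 6 and the 8 differ in only 2 of 25 pixels, so a threshold of 2 redraws the 8 as a perfectly clean 6, while the 1 is far enough away to survive.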

  7. Some image smoothing algorithm... by Nutria · · Score: 0

    which kicks in when saving to PDF, and doesn't handle low image resolution very well?

    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:Some image smoothing algorithm... by Sponge+Bath · · Score: 4, Informative

      This is not smoothing, distortion or individual bad pixels. Entire image patches are copied incorrectly, essentially repeating a scanned section containing one number over another part of the image containing a different number.

  8. Re:Mission Impossible 4? by Entropius · · Score: 5, Funny

    That's Xenu, not Xerox.

  9. Re:Really? by Anonymous Coward · · Score: 0

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    when other brands don't do that, yes, yes we are.

  10. Re:Anti-counterfeiting by J'raxis · · Score: 4, Insightful

    Maybe you should read the article.

  11. Re:Really? by Sponge+Bath · · Score: 5, Insightful

    Scanning an article without comprehension and you're complaining about your misinterpretation. Really?

  12. Re:Really? by Anonymous Coward · · Score: 0

    Did you even read the blog post?

  13. Problem with JBIG2, not OCR by Anonymous Coward · · Score: 3, Insightful

    Before anyone spreads wrong information: The problem is with the JBIG2 image compression algorithm used when scanning to PDF format. OCR has nothing to do with this. Also, TIFF format images are not affected as they don't use JBIG2.

    1. Re:Problem with JBIG2, not OCR by Anonymous Coward · · Score: 0

      Also, TIFF format images are not affected as they don't use JBIG2.

      Given that TIFF is just a container format for many image compression formats (similar to how .AVI and .MKV are just container formats for many audio-video compression formats) I wouldn't make that assumption if I were you. TIFF portability has always been a pain-in-the-ass with different vendors using different sets of compression formats in their TIFF implementations.

    2. Re:Problem with JBIG2, not OCR by barlevg · · Score: 2

      He's not making an assumption--it says so right in the article.

    3. Re:Problem with JBIG2, not OCR by charles2678 · · Score: 1

      That's raw TIFFs. TIFF also supports compression, including JBIG2. Whether these devices support JBIG2 in TIFF is less clear, though indeed, as it says in the article, they definitely support raw TIFFs, which come out clean.

    4. Re:Problem with JBIG2, not OCR by charles2678 · · Score: 1

      ...anyhow -- the parent didn't say that these devices didn't exhibit the problem in TIFF, but that TIFF itself was innately immune to the problem. That's a considerably more sweeping -- and, frankly, unfounded -- claim.

  14. Re:Mission Impossible 4? by Anonymous Coward · · Score: 0

    Tom Cruise is NOT a role-model! Shame on you Xerox!

    you actually watch that shit? hahahahaha. no wonder you posted ac.

  15. Machine Awakening by Anonymous Coward · · Score: 0

    It's the first subtle warning of the machine awakening. ...It's coming...

  16. Re:Really? by fuzzyfuzzyfungus · · Score: 5, Informative

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    These 'errors' are substantially worse than ordinary scanner suckitude or lossy-compression legovision: JBIG2's pixel-block matching creates the potential for a block containing one character to be mis-identified and replaced with a block containing a different character.

    The replaced character will be exactly as legible as text elsewhere on the page, just entirely incorrect.

    If it were just the scan quality being lousy, or somebody turning, say, JPEG compression up to the point of pain, mangled characters would be obviously mangled. Not as good as being legible; but the issue is obvious. In this case, the errors will look as good as the rest of the document.

  17. see the Xerox user manual by mejustme · · Score: 5, Informative

    Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

    Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    1. Re:see the Xerox user manual by Racemaniac · · Score: 1, Insightful

      thanks for mentioning where in the 328 page document you linked that is written :)

    2. Re:see the Xerox user manual by mejustme · · Score: 3

      That is why keyboards have CTRL+F. (Top of page 107.)

    3. Re:see the Xerox user manual by Anonymous Coward · · Score: 3, Informative

      Interesting, since as far as I remember from reading about this issue yesterday, Xerox had not yet responded to this issue. Strange, since it's in the documentation.

      But then, reading the manual in context, the quote appears on pages 107, 129, and 179, which are in the chapters "Fax", "Workflow Scanning", and "Save and Reprint Jobs" respectively.

      It's not in the chapter "Copying" (pages 39..63), so there is no excuse that this issue occurs in simple copy mode.

    4. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      Ctrl + F you

    5. Re:see the Xerox user manual by mwvdlee · · Score: 1

      Page 107.

      It literally took longer to download the PDF than it took to find the page by Ctrl+S.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    6. Re:see the Xerox user manual by timeOday · · Score: 1

      Seriously, how did you happen to know about that?

    7. Re:see the Xerox user manual by h4rr4r · · Score: 1

      Try searching for that phrase. Should be pretty simple.

    8. Re:see the Xerox user manual by Rob+the+Bold · · Score: 1

      Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

      Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

      Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.

      --
      I am not a crackpot.
    9. Re:see the Xerox user manual by Atzanteol · · Score: 5, Insightful

      That's "Normal" quality? That could be *very* misleading. If you have an option that has negative side-effects such as this then the option should be titled something to indicate the risk - "Super-compressed", "dangerously small" or the like.

      Though I'm surprised Xerox would even allow such a compression if such an obvious issue occurs. People would expect image quality to suffer - but full character substitution?

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
    10. Re:see the Xerox user manual by Rob+the+Bold · · Score: 4, Informative

      Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.

      AND I'd be wrong, it's in all three sections. Ctrl-F'ing in Okular only finds "character substitution" when the words are side-by-side, not split by a line break as they appear in the copying and scanning sections.

      That's way worse. Xerox knows about this, and just puts in a little note, rather than a big old: "WARNING: Normal/Small mode may produce undetectable text errors."

      And that type of warning should be defined in the beginning of the manual as "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".
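
      The missed search hits come from matching a literal space against a line break. A small sketch of a search that tolerates this, assuming the manual's text has already been extracted to a plain string (the function name is made up, and words hyphenated across lines are not handled):

```python
import re
from typing import List


def find_phrase(text: str, phrase: str) -> List[int]:
    """Return start offsets of phrase in text, letting any run of whitespace
    (spaces, tabs, line breaks) stand in for the gaps between words."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return []
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "errors such as character\nsubstitution may occur; character substitution again"
    print(sample.count("character substitution"))           # plain substring search
    print(len(find_phrase(sample, "character substitution")))  # whitespace-tolerant search
```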

      --
      I am not a crackpot.
    11. Re:see the Xerox user manual by petermgreen · · Score: 2

      The problem is that most people only read the manual when they discover something is wrong and there is no immediately obvious problem with the results of these scans. The problem only gets noticed much later when someone tries to work with the scanned information and discovers that it is readable but doesn't make sense.

      I also notice that the manual says that the other options give larger files with better image quality but does not state clearly whether compression algorithms that can cause character substitution are disabled in those modes or whether substitution is just less likely due to higher quality settings.

      When a development of a technology introduces new failure modes great care needs to be taken to inform users of those modes. Just burying it deep in a manual that people only read when things go wrong is not sufficient.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    12. Re:see the Xerox user manual by NatasRevol · · Score: 1

      If their response is anything other than RTFM, they're dying.

      --
      There are two types of people in the world: Those who crave closure
    13. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      Strange, my photocopied version says "Image quality is acceptable and no quality degradation or character substitution errors will occur with any originals"

    14. Re:see the Xerox user manual by operagost · · Score: 1

      Seems a little dangerous for that algorithm to be the default, doesn't it? Plus, burying the warning deep in the documentation.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    15. Re:see the Xerox user manual by Rob+the+Bold · · Score: 5, Insightful

      Seems a little dangerous for that algorithm to be the default, doesn't it? Plus, burying the warning deep in the documentation.

      And an insufficient warning, at that.

      Something more like:

      Normal/Small Mode may not be suitable for documents where faithful reproduction of the original text, numbers or illustrations is critical. Examples would include legal documents (contracts, wills, articles of incorporation, etc.), medical documents (patient charts, orders, medication lists, etc.), financial documents (bills, invoices, statements, reconciliations, etc.), business documents (HR records, meeting minutes, memoranda, etc.), engineering documents (drawings, plans, change orders, instructions, bills of material, etc.) or any other document where incorrect data could result in financial loss, injury, death, property damage or destruction, legal liability, loss of reputation or other harm. These examples should not be considered an exhaustive list of documents not suited for scanning, copying or faxing using Normal/Small mode.

      would be more appropriate.

      --
      I am not a crackpot.
    16. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      Warning: Coffee may be hot.
      Warning: Water may be wet.
      Warning: Low quality scans may lose data.
      Warning: Low IQ posters may be modded +5 Insightful.
      Warning: Knife may be sharp.
      Warning: Nuclear reactor may not be immune to unprecedented earthquakes.
      Warning: Spy agencies may be spying.
      Warning: Anonymous Cowards may be trolling.
      Warning: Gasoline is not approved by the FDA as a beverage.
      Warning: Overclocking laptops and resting them on your lap can result in broiled testicles.
      Warning: Slashdot doesn't like Unicode.
      Warning: Don't forget to breathe.

      Did I miss anything?

    17. Re:see the Xerox user manual by omnichad · · Score: 1

      Oh, yes I did find it. "It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'."

      Yes, it's documented. That's not really an acceptable excuse to make this setting the default.

    18. Re:see the Xerox user manual by tibit · · Score: 1

      This shouldn't be buried on page 107. It should be a warning on the first page. Heck, on top of the box the device is shipped in, for all I care. The scanned document should have that setting embedded as both metadata and a visible tag, so that it would be obvious on third generation documents (printouts from PDFs!) that the source could have had such errors. At least it's possible to decode JBIG2 and visually mark all blocks on the page that have a reference count larger than one. That way, if you still have the PDFs, you know what areas to audit. It's still a big snafu if you ask me.
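
      The audit idea can be sketched as follows, assuming some JBIG2 decoder (not shown, and not any particular library) has already produced a list of symbol placements for a page. The `Placement` type and its fields are hypothetical. Note that legitimate reuse is common, since every clean "e" may share one symbol, so this marks areas to review rather than confirmed errors.

```python
from collections import Counter
from typing import List, NamedTuple


class Placement(NamedTuple):
    """One drawn instance of a stored symbol (hypothetical decoder output)."""
    symbol_id: int  # index into the page's symbol dictionary
    x: int          # position of the instance on the page, in pixels
    y: int


def placements_to_audit(placements: List[Placement]) -> List[Placement]:
    """Return every placement whose symbol is referenced more than once."""
    counts = Counter(p.symbol_id for p in placements)
    return [p for p in placements if counts[p.symbol_id] > 1]


if __name__ == "__main__":
    page = [Placement(0, 10, 10), Placement(1, 20, 10), Placement(0, 30, 10)]
    for p in placements_to_audit(page):
        print(f"review symbol {p.symbol_id} at ({p.x}, {p.y})")
```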

      --
      A successful API design takes a mixture of software design and pedagogy.
    19. Re:see the Xerox user manual by Anonymous Coward · · Score: 1

      How did saving help you find the correct page?

    20. Re:see the Xerox user manual by NeverVotedBush · · Score: 1

      I would so mod this insightful if I had mod points.

    21. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      Warning: Low quality scans may lose data.

      I'm pretty sure most users of scanning technology have a reasonable concept of the proportional relationship between quality and filesize.

      However the lossy nature of JBIG2 is quite different from most compression schemes and character substitution is not the kind of error most users would anticipate.

      But hey, don't let me sidetrack your trolling.

    22. Re:see the Xerox user manual by Sir+Holo · · Score: 1

      And that type of warning should be defined in the beginning of the manual as "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".

      If it's in a multi-user office, how many of the users are going to sit down and read the whole 328-page manual before making a copy?

      Heck, how many will even know where the manual is kept?

    23. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      No. tl;dr...

    24. Re:see the Xerox user manual by hawguy · · Score: 1

      The problem is that most people only read the manual when they discover something is wrong and there is no immediately obvious problem with the results of these scans. The problem only gets noticed much later when someone tries to work with the scanned information and discovers that it is readable but doesn't make sense.

      I think the problem is that almost no one reads the manual for the office copier at all - I've used dozens of large office copiers over the years and haven't read (or have even seen) the manual of a single one of them (at most, I've looked at the "quick-start" guide taped on the wall so I know how to use a few of the more arcane features, but even that seems to have gone away with "smart" touch-screen copiers that are supposed to be intuitive).

    25. Re:see the Xerox user manual by hawguy · · Score: 1

      Seems a little dangerous for that algorithm to be the default, doesn't it? Plus, burying the warning deep in the documentation.

      And an insufficient warning, at that.

      Something more like:

      Normal/Small Mode may not be suitable for documents where faithful reproduction of the original text, numbers or illustrations is critical. Examples would include legal documents (contracts, wills, articles of incorporation, etc.), medical documents (patient charts, orders, medication lists, etc.), financial documents (bills, invoices, statements, reconciliations, etc.), business documents (HR records, meeting minutes, memoranda, etc.), engineering documents (drawings, plans, change orders, instructions, bills of material, etc.) or any other document where incorrect data could result in financial loss, injury, death, property damage or destruction, legal liability, loss of reputation or other harm. These examples should not be considered an exhaustive list of documents not suited for scanning, copying or faxing using Normal/Small mode.

      would be more appropriate.

      Or maybe it should say "We don't recommend that you use "Normal" mode for "Normal" scanning, we only call it "Normal" and make it the default so we can tout the image compression gains you get by using it, but you should only use it if you don't care if your scanned document is different than the original in subtle and hard to find ways. If you're ok with the "Normal" mode caveats, you might like our "Super-tiny" mode, where it turns every scanned document into a blank white document. This mode may not retain any of the original document's information, but hey, at least it's guaranteed to not have transcription errors and it has the best compression ratio in the industry!".

    26. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      That's way worse. Xerox knows about this, and just puts in a little note, rather than a big old: "WARNING: Normal/Small mode may produce undetectable text errors."

      And that type of warning should be defined in the beginning of the manual as "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".

      What, the first page of boilerplate common sense legal disclaimer stuff people skip over because they think they're not stupid?
      That's another way of blaming the user for doing it wrong, and in an office environment, NOBODY will read the manual before ramming jobs through it. That's just a legal tool, it's not a solution. It's also hard to believe Xerox doesn't already have their ducks in a row at the legal department, but who knows.

      This is how you wind up with the McDonalds Hot Coffee incident, IMO. If you were serious about digital archiving, you wouldn't just HOPE an average multi-function copier on default settings cuts it, no more than you'd HOPE a perfectly average spilled cup of coffee wouldn't hurt like all hell. That said, many will, and a warning blurb in the manual will NOT stop any of the hundreds of people working around such a machine from using it 'wrong' any more than printing 'Hot' on a cup of coffee prevents spills.

      The real solution is changing the default compression to something with better fidelity, and lowered coffee temps.

    27. Re:see the Xerox user manual by gl4ss · · Score: 1

      Try searching for that phrase. Should be pretty simple.

      I doubt that's the point..

      the point is more probably that who the fuck reads a 328 page manual.

      anyone remember that Ask Slashdot article from an archivist asking if it's ok to throw away the originals? well fuck no it's not ok. you might change somebody's birth parents, their assets, their criminal record and all kinds of shit if you do..

      --
      world was created 5 seconds before this post as it is.
    28. Re:see the Xerox user manual by Chelloveck · · Score: 2

      "I think it'd be more appropriate if the box bore a great red label WARNING: LARK'S VOMIT!"

      I'm boggled. I can't believe any copier maker would use this algorithm for its default mode. Disk and bandwidth are cheap, the space savings can't possibly be worth the risk.

      --
      Chelloveck
      I give up on debugging. From now on, SIGSEGV is a feature.
    29. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      "Did I miss anything?"

      Yes: the point. You missed the entire fucking point of the summary, the article, and the foregoing discussion.

      You missed the point that errors arising from copying may in the realms of engineering, finance, law, medicine, data analysis, mathematics, physics, chemistry, biology - any fucking science, tech or human endeavor - readily lead to loss of money, liberty, or life.

      So yes, you missed not only anything, but you missed everything. Give the man a cigar.

    30. Re:see the Xerox user manual by Anonymous Coward · · Score: 1

      It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'."

    31. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      LOOK! Someone read the fucking manual for once! Now if only I could get people to read the fucking display too. Hey look, it says add toner! That's why it must have stopped printing.

    32. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      My God! I didn't know the find function that only works in some applications was actually built into my CTRL and F key!
      Fuckin keyboards? How do they work?

    33. Re:see the Xerox user manual by Anonymous Coward · · Score: 0

      Makes me think. If Xerox wants to sell this as 'normal', then what is the font size of legal fine print?

    34. Re:see the Xerox user manual by gigaherz · · Score: 1

      If your application does not have a working find functionality, consider an alternative application.

    35. Re:see the Xerox user manual by lxs · · Score: 1

      "It's not my fault that Buttle's heart condition didn't appear on Tuttle's file!"

      It's getting to the point where the movie Brazil has an appropriate quote for every Slashdot article.

  18. Proofreading @ Xerox Development? by BoRegardless · · Score: 1

    How could Xerox make copiers for this length of time and not have a proofreading algorithm that works with a super-resolution scan & no interpolation to "machine check" the final commercial copier as a way of quickly finding errors?

    Internatlly, Xerox engineering had to know they were "correcting" pixels, rather than just "copying" them, so how did they verify their software?

    1. Re:Proofreading @ Xerox Development? by Fnord666 · · Score: 2

      How could Xerox make copiers for this length of time and not have a proofreading algorithm that works with a super-resolution scan & no interpolation to "machine check" the final commercial copier as a way of quickly finding errors?

      Internatlly(sic), Xerox engineering had to know they were "correcting" pixels, rather than just "copying" them, so how did they verify their software?

      They do know about it.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    2. Re:Proofreading @ Xerox Development? by tibit · · Score: 1

      What they do is they use a canned jbig2 image compression library, without much understanding. Very typical, after all the use of a library is supposed to not have you understand the tricky parts of image compression, right? Um, yeah, right.

      --
      A successful API design takes a mixture of software design and pedagogy.
  19. Its the NSA by Anonymous Coward · · Score: 0

    NSA strikes again.

  20. my guess by Anonymous Coward · · Score: 1

    my guess is that since digitized copies of documents that are later destroyed are treated as originals, this will be used to cast uncertainty and doubt on information that would otherwise be essential to bringing accountability to large organizations that used these machines.

    People: 0 , Big brother: 999999999999999999999999

  21. Free Speech by BradyB · · Score: 4, Funny

    Hey, even photo copiers and faxes need freedom of speech.

    --

    Good is never enough, when you dream of being great!
    1. Re:Free Speech by HyperQuantum · · Score: 1

      But they should not be allowed to print FIRE on a sheet that's to be used in a theater.

      Right?

      --
      I am not really here right now.
    2. Re:Free Speech by Anonymous Coward · · Score: 0

      Dear Mister SEAGER,

      We write to inform you of the unfortunate XEROX RULEZ!!.
      It is currently SCREW YOU power to resolve this situation.
      If you CENSORINGZ information then please feel free GOD YES FREEDOM to contact us.
      We whish I'M GAY with your loss.

      Sincerely,
      I WANT A DAY OFF DAMNIT!!!!!

  22. Re:Really? by xaxa · · Score: 4, Informative

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    Consumer level? This isn't a home, or even home-office, machine. It's sold on the website under the office section.

  23. Makes copies that are better than the original by Anonymous Coward · · Score: 0

    This Xerox product was popular on Wall Street a few years ago, especially those dealing in mortgage-backed securities.

  24. Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 5, Informative

    If you read the documentation from XEROX... it claims that on scanning it is a known problem that "Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals." page 107 from http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    also on page 129 we have the following: "Quality / File Size
    The Quality / File Size settings allow you to choose between scan image quality and file size. These settings allow you to deliver the highest quality or make smaller files. A small file size delivers slightly reduced image quality but is better when sharing the file over a network. A larger file size delivers improved image quality but requires more time when transmitting over the network. The options are:
      Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals."

    1. Re:Known Xerox Issue..... in documentation by Chris+Mattern · · Score: 4, Insightful

      Now the question becomes: what moron made this setting the default? Maybe a setting that can undetectably corrupt your data can be provided if appropriate warnings are given, but it sure as hell should never be the default. I would've thought that was obvious.

    2. Re:Known Xerox Issue..... in documentation by Nimey · · Score: 2

      So you're telling us this is a problem caused by a user not RTFMing and Slashdot sensationalized it?

      Surely you're joking. :P

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
    3. Re:Known Xerox Issue..... in documentation by MozeeToby · · Score: 3, Insightful

      Substitution errors shouldn't happen in corporate level scanning hardware, even if you bury a warning about it 107 pages into the 350 page manual. You can't have something that fundamentally makes your product not fit for purpose and claim that it's ok just because it's a known issue.

    4. Re:Known Xerox Issue..... in documentation by Rob+the+Bold · · Score: 3, Interesting

      So you're telling us this is a problem caused by a user not RTFMing and Slashdot sensationalized it?

      Surely you're joking. :P

      I admit that I, for one, don't usually RTFM before using the copier. Certainly not when I'm using the copier in "Normal" mode.

      And don't call me "Shirley".

      --
      I am not a crackpot.
    5. Re:Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 0

      are you serious or are you just screwing with us???

      compression is one thing but no one should ever have to deal with a scanner changing the document's content.... Do you read your car's manual to make sure turning on the wipers doesn't disable the brakes?

      dam u neckbeards r dim...

    6. Re:Known Xerox Issue..... in documentation by tibit · · Score: 1

      Never mind that this is like usability 101.

      --
      A successful API design takes a mixture of software design and pedagogy.
    7. Re:Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 0

      Lucky you. My boss doesn't let me fix bugs that way. I actually have to fix the bugs.

      Otherwise he'll do some worker substitution on me.

      FWIW we use Canon not Xerox.

    8. Re:Known Xerox Issue..... in documentation by EvilSS · · Score: 1

      My question is how is this ever acceptable? If I am scanning a document (with OCR turned off) to PDF, I would expect the device to always (no matter the settings I used) give me a faithful copy of the document. The possibility that it may randomly substitute letters or numbers makes this completely useless since I can't trust the accuracy of the output. This is especially true for numbers, which may not have enough contextual information in the document to make it obvious that the number is incorrect.

      --
      I browse on +1 so AC's need not respond, I won't see it.
    9. Re:Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 0

      The moment they realized that the manual needed to include this warning *should have been* the moment they removed this feature from their product.

      (coincidentally amusing captcha: "expunge")

    10. Re:Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 0

      The real question is: what moron still uses the default setting when he wants to copy a document with a 7 point font? I know we live in a world where a company has to put a warning saying you should remove the cap of the antiperspirant before using it, but I do think we should stop caring that much for stupid people.

    11. Re:Known Xerox Issue..... in documentation by SleazyRidr · · Score: 1

      I'd expect the default to not fuck with my document too much, so if I'm trying to make it work with something like a 7 point font, I'm not going to go messing around and risk messing things up.

    12. Re:Known Xerox Issue..... in documentation by Control-Z · · Score: 1

      My further question would be what moron thought this setting should be associated in any way with a COPIER? A copier should produce copies of the original. If I just wanted random nonsense to print I'd hire monkeys to sit at keyboards.

    13. Re:Known Xerox Issue..... in documentation by Anonymous Coward · · Score: 0

      Do you mean to imply that they wrote the documentation before they designed the copier? My guess is that they figured out the problem afterwards and added it to the documentation to cover their arses.

    14. Re:Known Xerox Issue..... in documentation by deadweight · · Score: 1

      I just walk up to the copier and hit the button. I *never* check the 1,000 menus on the thing unless the copy is illegible. I might expect smeared or blurred text, but I sure as hell don't expect 888 to turn into 666!

    15. Re:Known Xerox Issue..... in documentation by Demonantis · · Score: 1

      Why wouldn't you proof all copies before distributing them? The print head could have failed and cut off important text. It's a pain but it mitigates liability.

    16. Re:Known Xerox Issue..... in documentation by ZosX · · Score: 1

      Agreed. This is simply unacceptable. No other way about it.

  25. Re:Anti-counterfeiting by Anonymous Coward · · Score: 3, Funny

    Huh?

    I'm sorry. I understand those 6 words individually. But when you put them in that order, they don't make any sense.

    Read? The? Article? You are not making any sense, man!

  26. Re:Anti-counterfeiting by Anonymous Coward · · Score: 3, Funny

    I lack the proper attention span to read the article. Let's make a deal: I quickly skim through it, and soon return here with another completely wrong conclusion. Be back in 30 seconds.

  27. Wub fur by Errol+backfiring · · Score: 1

    They probably have some parts made of wub fur. Those machines are more advanced than I thought!

    --
    Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
  28. No need to be a scientist.... by Anonymous Coward · · Score: 0

    to RTFM

  29. Re:Really? by Atzanteol · · Score: 2

    A $12,000 scanner/printer is "consumer level?"

    --
    "Ignorance more frequently begets confidence than does knowledge"

    - Charles Darwin
  30. I recognise the algorithm that gives those errors. by SuricouRaven · · Score: 1

    I just spent ten minutes describing exactly how JBIG works here before noticing someone already realised what is happening and put it up on the page.

  31. ImageRunner by poofmeisterp · · Score: 3, Funny

    OMG, my Canon ImageRunners are doing the same thing! It must be a virus!

    I'd better write up a research document on this and request some grant money.

    1. Re:ImageRunner by ZosX · · Score: 1

      Are they? That would be news if they also substituted characters.

  32. Re:oh man, what a mess, Obama birth certificate to by Anonymous Coward · · Score: 0

    This problem showed up a while ago, when Obama's birth certificate was released. Some doofus scanned it using some overblown Adobe product, which probably without asking, did OCR on it and added layers of gray OCR'ed text.

    That set off a spitstorm of wingnuts posting smarmy YouTube videos where they showed how "intelligent" they were at "detecting" that the image was so, so, so "manipulated".

  33. RTFM? WTF? by Anonymous Coward · · Score: 0

    Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

    Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    Page 129 for those incapable of searching a PDF.

    But, seriously dude, this is scientific research! You can't seriously expect the man to RTFM.

  34. Interesting by jones_supa · · Score: 3, Interesting

    The things you learn. I never knew before about JBIG2 and how scanners use it to repeat pieces of image. Seems to me that the JBIG2 parameters are tuned incorrectly in these scanners.

  35. Corporate decision by Dunbal · · Score: 3, Funny

    This was a decision by Xerox to get around ever being sued for copyright violations...

    --
    Seven puppies were harmed during the making of this post.
  36. NSA BUG by Sentrion · · Score: 1, Funny

    It's just a bug in the NSA eavesdropping algorithm.

  37. I can't understand by joh · · Score: 2

    how a compression that may lead to documents altered in such a way (numbers replaced by other numbers) can be considered fit for use in a photocopier. This can lead to very real, expensive and even dangerous problems down the line.

    1. Re:I can't understand by Anonymous Coward · · Score: 0

      DjVu, which is being used for archiving large digitized libraries, uses DjVuBitonal (also known as JB2), which is similar to JBIG2.

  38. Re:Really? by jeffmeden · · Score: 1

    Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?

    These 'errors' are substantially worse than ordinary scanner suckitude or lossy-compression legovision: JBIG2's pixel-block matching creates the potential for a block containing one character to be mis-identified and replaced with a block containing a different character.

    The replaced character will be exactly as legible as text elsewhere on the page, just entirely incorrect.

    If it were just the scan quality being lousy, or somebody turning, say, JPEG compression up to the point of pain, mangled characters would be obviously mangled. Not as good as being legible; but the issue is obvious. In this case, the errors will look as good as the rest of the document.

    After actually looking at the images in TFA, it does seem like there is a problem with the way 6/8 and 4/7 are interpreted. However, you can't say that the results aren't quite noisy; I would look at a scan like that with a squinty eye and be super annoyed at the jerk who couldn't just procure the *original* electronic format. Just because the scanner "seems to do ok" on other equally tiny numbers doesn't make it right. Get the goddamn original file.

  39. Re:Really? by UnknowingFool · · Score: 4, Informative

    If you read the article you would see it's not a simple case of scan error where a "13" appears blurry and looks like "B". Whole numbers are changed: 21.11 --> 17.42. This is a major issue if it was on a construction drawing for example. A beam 4m too short would be a problem. Even if caught the engineer signing off might have to go through a whole audit process.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  40. Re:Really? by Anonymous Coward · · Score: 1

    A $12,000 scanner/printer is "consumer level?"

    What are we? Farmers?

  41. Re:Anti-counterfeiting by niftydude · · Score: 1

    That is asking too much of him - maybe if he just looked at the pictures in the article?

    --
    You can never know everything, and part of what you do know will always be wrong. Perhaps even the most important part.
  42. digital is not always better by Anonymous Coward · · Score: 0

    duh..

    but it's really the embedded serial numbers in scanned and printed documents that are getting in the way.

  43. Re:Mission Impossible 4? by CanHasDIY · · Score: 1

    That's Xenu, not Xerox.

    Xenu... Xerox... Xenu-Rox?

    --
    An enigma, wrapped in a riddle, shrouded in bacon and cheese
  44. Security "feature" by Anonymous Coward · · Score: 0

    There was a trend a while back for copiers and printers to put fingerprints on output to make police investigations easier and to prevent effective counterfeiting. I think it may still be happening industry wide.

    JJ

  45. Re:Mission Impossible 4? by Anonymous Coward · · Score: 0

    Except Xenu doesn't. Crazy lunatics!

  46. Surprised nobody asked this... by ZorinLynx · · Score: 3, Informative

    Why do we need such aggressive compression algorithms, algorithms that can make the data WRONG, in this day and age when storage and memory are so incredibly cheap?

    This is not 1987 when every byte was precious and 1MB of RAM cost a hundred bucks. There is NO EXCUSE for this these days; just use PNG or JPG compression; at least those don't freaking CHANGE THE DATA!!

    1. Re:Surprised nobody asked this... by Anonymous Coward · · Score: 0

      There is NO EXCUSE for this these days; just use PNG or JPG compression; at least those don't freaking CHANGE THE DATA!!

      Although lossless variants exist, standard JPG sure as hell does change the data.

  47. Self-Correcting Bug by JeanCroix · · Score: 4, Funny

    I printed out the article in order to hang it on the wall above my office's Workcentre as a warning to coworkers. But apparently printing it fixed the problem, because the article headline became:

    "Xerox scanners/photocopiers Scan Documents Flawlessly and are the Best in the Industry"

    1. Re: Self-Correcting Bug by Anonymous Coward · · Score: 0

      Sounds great! Now I can skip the article and the other comments.

  48. "Windows" and "American" etymological fallacies by tepples · · Score: 1

    Windows-1251 character codes do not belong on the internet. Many people now a days don't use windows.

    For one thing, using Windows code pages does not require Windows. They are well-defined encodings of a subset of Unicode. If I were to apply the same etymological fallacy to your suggestion to stick to the American Standard Code for Information Interchange, it might look like this: "Many people now a days don't live in America." A lot of languages don't easily map to just the Basic Latin block (U+0020 through U+007E). For example, in Spanish, "esta" means "this" while "está" means "is currently" or "is located".

    1. Re:"Windows" and "American" etymological fallacies by Alioth · · Score: 1

      A better example is "año" means year and "ano" means anus.

    2. Re:"Windows" and "American" etymological fallacies by tepples · · Score: 1

      Good point. So would "feliz ano nuevo" mean "congratulations on your imperforate anus repair"?

    3. Re:"Windows" and "American" etymological fallacies by stewsters · · Score: 1

      So why can't you use UTF-8 for those? Windows-1251 and UTF-8 are different things.

      I'm not advocating for the removal of all characters but 26 upper-case Latin characters, what I'm saying is that there is a more universal standard for that. Does Windows-1251 work for Japanese?

      If Microsoft goes off and implements their own character encoding that's fine, but it's not going to work well with UTF-8 unless everyone is really explicit about character encoding. And Slashdot seems to choose UTF-8:
      Content-Type:text/html; charset=utf-8
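
      The Japanese question above can be checked directly. The following is only a rough sketch using Python's built-in codecs; the helper name can_encode and the sample strings are made up for illustration and are not from the thread.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

japanese = "日本語"
russian = "Привет"

# UTF-8 covers all of Unicode; Windows-1251 is a single-byte Cyrillic code page.
print(can_encode(japanese, "utf-8"))    # True
print(can_encode(japanese, "cp1251"))   # False
print(can_encode(russian, "cp1251"))    # True

# The same text yields different bytes in each encoding, so the receiver
# has to be told which one was used (e.g. via the charset in Content-Type).
print(len(russian.encode("cp1251")))    # 6  (one byte per letter)
print(len(russian.encode("utf-8")))     # 12 (two bytes per letter)
```

      The byte counts differ because each Cyrillic letter takes one byte in Windows-1251 and two in UTF-8, which is why a mismatched charset declaration garbles the text.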

  49. Directionality override by tepples · · Score: 1

    I've explained this several times. Slashdot introduced a code point whitelist after past abuses of bidirectional override characters.

    1. Re:Directionality override by J'raxis · · Score: 1

      Except it's not stripping characters out, it's misinterpreting UTF-8 as single-byte characters. Seeing a quotation mark turn into three characters is indicative of this. Although I do see two of the characters then lost their accent marks, so some kind of stripping or substitution is going on.

      So it's not just complete ignorance on the part of Slashdot's coders. It's intentional, but half-assed, so it makes things even uglier.
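
      For anyone who wants to see the "one quotation mark turns into three characters" effect, here is a minimal sketch. It assumes the single-byte encoding involved is Windows-1252, which is a guess; the thread does not say what Slashdot actually does internally.

```python
# A left curly quote is one code point but three bytes in UTF-8.
original = "\u201c"                      # “
raw = original.encode("utf-8")           # b'\xe2\x80\x9c'

# Decoding those bytes as a single-byte code page yields one character per byte.
mangled = raw.decode("cp1252")           # 'â€œ'

print(len(original), len(raw), len(mangled))   # 1 3 3

# The damage is reversible as long as nothing strips or substitutes characters.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)              # True
```

      Once a filter removes accents or drops characters from the mangled form, the round trip in the last two lines no longer works, which matches the "even uglier" result described above.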

    2. Re:Directionality override by Plumpaquatsch · · Score: 1

      I've explained this several times. Slashdot introduced a code point whitelist after past abuses of bidirectional override characters.

      So either whitelist known characters, or simply fucking ban bidirectional override characters. Or is the codebase stuck at the same time the cookies at the bottom of the page are?

      --
      Of course news about a fake are Fake News.
  50. Re:Mission Impossible 4? by AJH16 · · Score: 2

    That's what he said before he scanned it on his WorkCentre.

    --
    AJ Henderson
  51. Re:Anti-counterfeiting by Anonymous Coward · · Score: 0

    You must be new here...

  52. The codecs commonly used with a container by tepples · · Score: 2

    In theory, TIFF is a container format for any image codec that has a TIFF embedding defined. In practice, TIFF is a container format only for those codecs supported by common TIFF viewers. To use your analogy to AVI, when people see "AVI", they think of the codecs commonly used with an AVI container, such as MPEG-4 ASP video and MPEG-1 Layer III audio back in the DivX era. I could wrap the obscure codec of PlayStation 1 or Game Boy Advance FMV in an AVI or MKV container, but there'd be no use because next to nothing that supports such a container also supports those codecs. WAV is also a container, but over 9 times out of 10, the compression features aren't used.

    1. Re:The codecs commonly used with a container by sjames · · Score: 1

      I still have nightmares about TIFF from the '80s. Each and every piece of sh^wsoftware out there thought it was the keeper of the one true TIFF format and none would dirty themselves by reading anyone else's TIFF images.

      Save to TIFF was just an elaborate way of saying "exit without saving"

  53. THIS! by Anonymous Coward · · Score: 0

    Why would a copier do OCR + compression? To store and to transmit.

    We knew already that copiers save all material to hard disk. We now know they OCR+compress. What better way to maximize disk usage and save transmission time to the Mother Ship (NSA Utah).

  54. Lossy Compression by steamraven · · Score: 1

    ....AND that's why you don't use a lossy compression for your important text documents.

  55. Re:Anti-counterfeiting by Anubis+IV · · Score: 5, Informative

    That's all I did, and I learned what they were talking about pretty quickly.

    It's actually pretty insane. They had architectural diagrams that had the square meters for the rooms copy/pasted by the scanner into other rooms. For instance, here were the room sizes for the three rooms on the diagram as reported on the original diagram and various scans of it (I've marked incorrect values with an asterisk):
    Original Diagram: 14.13m^2, 21.11m^2, 17.42m^2
    Xerox WorkCentre 7535 scan: 14.13m^2, 14.13m^2*, 14.13m^2*
    Xerox WorkCentre 7556 scan 1: 14.13m^2, 14.13m^2*, 14.13m^2*
    Xerox WorkCentre 7556 scan 2: 17.42m^2*, 21.11m^2, 17.42m^2
    Xerox WorkCentre 7556 scan 3: 14.13m^2, 14.13m^2*, 17.42m^2

    They have images of this happening. It's just outright substituting blocks of text from one part of a scanned image into an entirely separate part. Not just mangling pixels or uniformly displacing each by a few mm, but outright moving them into a different part of the image that was similar, yet slightly different. Maybe it's some sort of optimization or compression gone wrong? I.e. They detected a block that appeared to be the same as a previous one, so assumed they were the same and only kept one copy of that data?

    It's bizarre.

  56. Henceforth - ban 7pt text by Anonymous Coward · · Score: 0

    Especially for the checksum, which is required to be printed in bold 18 point text, and corresponds to the provided electronic format of the document.

  57. Re:Really? by NeverVotedBush · · Score: 1

    What you see here is that the copiers have achieved the singularity and are now posting defensively. Guytoronto is a network-connected WorkCentre copier that is using JBIG in everything including its thought processes and is thus misinterpreting everything.

  58. This is HUGE! by tekrat · · Score: 4, Interesting

    This is how people get shot, because the police are given the wrong address to raid a house. This is how people get foreclosed on because a few account numbers are switched.

    Holy crap. That makes me never want to go near a copier again.

    --
    If telephones are outlawed, then only outlaws will have telephones.
    1. Re:This is HUGE! by Anonymous Coward · · Score: 0

      This is how people get shot, because the police are given the wrong address to raid a house. This is how people get foreclosed on because a few account numbers are switched.

      Holy crap. That makes me never want to go near a copier again.

      More frequently, accidents happen because the gap between mouths and ears has MUCH worse fidelity than this compression technique.

      Your house can be foreclosed on if you don't pay HOA fees or the company that supplied rocks to your landscaper wasn't paid, and lots of other absurd reasons.
      Most people HOPE there are errors in the bank's record keeping in regards to foreclosures, just sayin.

  59. Re:Anti-counterfeiting by hawguy · · Score: 1

    They have images of this happening. It's just outright substituting blocks of text from one part of a scanned image into an entirely separate part. Not just mangling pixels or uniformly displacing each by a few mm, but outright moving them into a different part of the image that was similar, yet slightly different. Maybe it's some sort of optimization or compression gone wrong? I.e. They detected a block that appeared to be the same as a previous one, so assumed they were the same and only kept one copy of that data?

    It's bizarre.

    You came up with the exact same conclusion as the author of the article you just read:

    Edit: It seems that the above thought was not that wrong at all. Several mails I got suggest that the xerox machines use JBIG2 for compression. This algorithm creates a dictionary of image patches it finds “similar”. Those patches then get reused instead of the original image data, as long as the error generated by them is not “too high”. Makes sense.
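
    The dictionary-of-patches idea quoted above can be illustrated with a toy. This is a rough sketch only, not the real JBIG2 codec and not Xerox's implementation; the 5x5 glyphs SIX and EIGHT, the pixel-count distance and the max_diff threshold are all made-up assumptions.

```python
# Toy lossy symbol matching: a patch is replaced by an already-stored
# patch whenever the two differ in at most max_diff pixels.

def hamming(a, b):
    """Count differing pixels between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(patches, max_diff):
    """Return (dictionary, indices), reusing stored patches that look 'similar'."""
    dictionary, indices = [], []
    for patch in patches:
        for i, known in enumerate(dictionary):
            if hamming(patch, known) <= max_diff:
                indices.append(i)          # reuse: the original pixels are discarded
                break
        else:
            dictionary.append(patch)       # no close match: store a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    return [dictionary[i] for i in indices]

# Hypothetical low-resolution glyphs; they differ in only 2 pixels.
SIX = (".###.",
       "#....",
       "####.",
       "#...#",
       ".###.")
EIGHT = (".###.",
         "#...#",
         ".###.",
         "#...#",
         ".###.")

lossless = decompress(*compress([SIX, EIGHT], max_diff=0))
lossy = decompress(*compress([SIX, EIGHT], max_diff=3))
print(lossless == [SIX, EIGHT])   # True
print(lossy == [SIX, SIX])        # True: the 8 silently became a crisp 6
```

    The substituted glyph is a clean copy of another glyph, which is why the output looks perfectly legible rather than blurred.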

  60. Re:Anti-counterfeiting by Anubis+IV · · Score: 5, Funny

    You came up with the exact same conclusion as the author of the article you just read:

    Hey now, there's no need to accuse me of reading the article just because I looked at the pictures.

  61. We are Farmers by tepples · · Score: 1

    What are we? Farmers?

    Bum, badum, bum bum bum bum.

  62. Probably an image quality enhancement fix. by jellomizer · · Score: 3, Insightful

    I expect the bug is because it is trying to clean up the scanned image, trying to account for what it thinks is missing data.
    14.13m^2, 21.11m^2, 17.42m^2

    It sees 3 blocks of information that probably look roughly the same to the software once it accounts for errors. The number of pixels used in each is fairly close. I expect the scanner sees the three blocks and thinks they are the same, and tries to find the block that seems the sharpest and reproduces it over the other spots.

    Scanning isn't pixel perfect, so you get a different match each time. So the image cleaning processor will probably try to clean the numbers differently.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  63. Anything Similar in their RedLight/Speed Cam SW? by Anonymous Coward · · Score: 0

    Perhaps, their traffic cam software should be checked for "substitution errors."

  64. confirmed by Anonymous Coward · · Score: 1

    We just completed some testing using a 7535 and got the same 8/6 mixing issue.

    1. Re:confirmed by Russ1642 · · Score: 1

      I think you mean model 7536. Slashdot tends to change numbers on you every now and again.

  65. Re:Really? by Agent0013 · · Score: 1

    We scanned in the originals and then shredded them. This is now the official copy!

    --

    -- ssoorrrryy,, dduupplleexx sswwiittcchh oonn.. -Quote found on actual fortune cookie.
  66. Insane! by bradley13 · · Score: 1

    Anyone who hasn't RTFA really ought to at least look at the example. This is not only a case of a blurry 6 being replaced with a blurry 8, which would be bad enough. If surrounding context matches, it will replace numbers with completely different text! In the first example given, the number 14.13 is scanned in one place, and used to replace the numbers 21.11 and 17.42 in two other places. In all cases, the numbers are perfectly legible.

    In what world is this acceptable? To actually document this on page 129 of the handbook (that almost no one will ever read) and deliver the product - insane!

    --
    Enjoy life! This is not a dress rehearsal.
  67. It does not look better. by Anonymous Coward · · Score: 0

    Then again, "better" is a subjective statement based on the perception and criteria of the observer.

    However YOU propose it as some sort of Universal Truth.

    Either you're a frigging moron or you think that you are God.

  68. Xerox's Official Response by Anonymous Coward · · Score: 2, Informative

    http://realbusinessatxerox.blogs.xerox.com/2013/08/06/always-listening-to-our-customers-clarification-on-scanning-issue/?CMP=SMO-EXT#.UgEhdRgk98F

    By Francis Tse, principal engineer, Xerox

    Recently there have been articles about Xerox devices randomly altering numbers in scanned documents. We take this issue very seriously.

    The problem stems from a combination of compression level and resolution setting. The devices mentioned are shipped from the factory with a compression level and resolution that produces scanned files which are optimized for viewing or printing while maintaining a reasonable file size. We do not normally see a character substitution issue with the factory default settings however, the defect may be seen at lower quality and resolution settings.

    The Xerox design utilizes the recognized industry standard JBIG2 compressor which creates extremely small file sizes with good image quality, but with inherent tradeoffs under low resolution and quality settings.

    For data integrity purposes, we recommend the use of the factory defaults with a quality level set to “higher.” In cases where lower quality/higher compression is desired for smaller file sizes, we provide the following message to our customers next to the quality settings within the device web user interface: “The normal quality option produces small file sizes by using advanced compression techniques. Image quality is generally acceptable, however, text quality degradation and character substitution errors may occur with some originals.”

    Xerox is totally committed to customer satisfaction and with this feedback we will look for ways to help our customers better manage their scanning application needs.

    For more information, contact Xerox Support at http://www.xerox.com/perl-bin/world_contact.pl#0.

    1. Re:Xerox's Official Response by guruevi · · Score: 1

      It's been a long time since I've seen a website with a working Perl script generating content. I wonder what the guts of the Perl script does.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
  69. Re:Anti-counterfeiting by Gr8Apes · · Score: 1

    That's exactly what it did - JBIG - compression algorithm. Why on earth would a scanner be using a compression algorithm? Memory is cheap, do a pixel scan and send me that, let me deal with compressing it, if I want to.

    --
    The cesspool just got a check and balance.
  70. Hash collisions anybody? by John+Allsup · · Score: 1

    If you break apart data into chunks, hash each chunk to a hash smaller than a chunk, make an incorrect assumption about lack of collisions and then try to reconstruct, this is what you'd come across.
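
A minimal sketch of the failure mode described above, not of what the copiers actually do (JBIG2 matches bitmap patterns rather than hashes). The 8-bit `tiny_hash`, the `dedupe` and `reconstruct` helpers, and the sample "price" chunks are all hypothetical names chosen for illustration.

```python
import hashlib

def tiny_hash(chunk: bytes, bits: int = 8) -> int:
    """Deliberately short hash: keep only the low `bits` bits of SHA-256."""
    digest = hashlib.sha256(chunk).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def dedupe(chunks):
    """Store one chunk per hash value, assuming (wrongly) that hashes never collide."""
    store = {}
    refs = []
    for chunk in chunks:
        h = tiny_hash(chunk)
        store.setdefault(h, chunk)  # the first chunk seen with this hash wins
        refs.append(h)
    return store, refs

def reconstruct(store, refs):
    """Rebuild the chunk sequence from the stored chunks and the hash references."""
    return [store[h] for h in refs]

# 1000 distinct 5-byte chunks ("10.00" .. "19.99"), but only 256 possible hash values.
chunks = [f"{n / 100:05.2f}".encode() for n in range(1000, 2000)]
store, refs = dedupe(chunks)
restored = reconstruct(store, refs)
wrong = sum(a != b for a, b in zip(chunks, restored))
print(f"{wrong} of {len(chunks)} chunks came back as a different number")
```

Every restored chunk is still a well-formed number, which is what makes this kind of corruption hard to spot by eye.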

    --
    John_Chalisque
  71. I call BS by Anonymous Coward · · Score: 0

    I work for Xerox. I specifically support these machines in a tier 3 capacity. I have not seen or heard a single case of this. My group handles calls from all of North America, and some South.

    1. Re:I call BS by itsdapead · · Score: 2

      I work for Xerox. I specifically support these machines in a tier 3 capacity. I have not seen or heard a single case of this. My group handles calls from all of North America, and some South.

      Perhaps they're all trying to call the support number on the user guide that they just printed out... :-)

      --
      In a survey of 100 programmers, 111111 thought that duck-typing was a good idea.
    2. Re:I call BS by Guy+Harris · · Score: 3, Informative

      I work for Xerox. I specifically support these machines in a tier 3 capacity. I have not seen or heard a single case of this.

      So does Francis Tse, and he's apparently heard of it.

      My group handles calls from all of North America, and some South.

      You might want to talk to somebody who handles calls from Western Europe - Germany, in particular.

  72. Re:Anti-counterfeiting by Beardo+the+Bearded · · Score: 1

    How do you think I feel?

    I work for an engineering company and we've got Xerox workstations, so this basically has "long day" written all over it.

    It would be great if they had a few test sheets we could run.

    --

    ---
    ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
  73. Life imitates art: Dilbert by time961 · · Score: 1

    See They're photocopies! You don't need to proofread each one!

    Unbelievable software incompetence. Not only did they do this, but they knew about it and documented it!

  74. So that's why BLS, CPI, and revenue projections by DCFusor · · Score: 0

    Are all wrong. What a convenient excuse for the liars in government to put out ridiculously wrong numbers. "Who could have known?" There's no inflation right? Unemployment (if you count part time jobs designed to eliminate need for obamacare)...and so on. This seems in the examples to likely print lower numbers...How handy for the liars to have an excuse for it.

    --
    Why guess when you can know? Measure!
  75. Re:Anti-counterfeiting by nytes · · Score: 2

    I have some test sheets that should do the job for you.

    I'll scan them and send you the images.

    --
    -- I have monkeys in my pants.
  76. Trying again by jensend · · Score: 1

    In particular, there's no excuse for using < 300 dpi when using bilevel. Bilevel documents at 300dpi are 100kb or less when using reasonable lossless compression (lossless jb2/ jbig2, CCITT Group 4 TIFF, or even just PNG).

    Using lossy jb2/jbig2 like these copiers were doing is at most going to save you a couple dozen kb per page. Not worth the problems in many cases.
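
For scale, here is a rough calculation of the uncompressed size of a bilevel page. It assumes a US Letter page (8.5 x 11 inches) and rows padded to whole bytes; the function name `raw_bilevel_bytes` is made up for this sketch.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit-per-pixel scan, with rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px

letter_300 = raw_bilevel_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
letter_200 = raw_bilevel_bytes(8.5, 11, 200)  # 1700 x 2200 pixels
print(letter_300, letter_200)  # prints: 1052700 468600
```

So a 300 dpi bilevel page is about 1 MB raw, and the 100 kB figure quoted above corresponds to roughly 10:1 lossless compression.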

    (doing this reply again since I forgot slashdot eats less than signs unless you use html entities. Man, what an anachronism.)

  77. this should be in the FAQ by Anonymous Coward · · Score: 0

    Zalgo and page-widening trolls, beeotches.

    Because Unicode validation & sanity checking is harrrrdddddd...

  78. Re:Anti-counterfeiting by AK+Marc · · Score: 1

    It's probably a setting. Scan to TIFF is generally turned off because TIFFs are big. The work around he found was to change the coarse/fine setting.

    Given the included images, nobody should be using the copies for official work. The ones that weren't changed were mostly unreadable (or at least partially ambiguous). Just print 3 originals, rather than one original and making three copies of it.

    And despite the use of the word "random" it didn't appear random at all, and the author even said so himself.

  79. Re:Anti-counterfeiting by aix+tom · · Score: 2

    Well, that's the thing with the "all rolled up into one" solutions. Those scanners scan/mail/archive/etc... all by themselves, without further user intervention and without the need for an additional computer attached.

    The big foul-up is that they use JBIG, not a more sensible compression algorithm like LZW or JPEG where "too small to read" stuff really gets "too small to read" in the scan, too, not "improved" to something else. The foul-up would have been exactly the same if someone later in the tool chain had used JBIG.

    They probably had a test run by a marketing drone that found that JBIG "looks so much clearer" ;-P

  80. Hello? QA? Hello!!! by nanospook · · Score: 2

    Something like this shouldn't have passed QA.. did we outsource or what?

    --
    Have you fscked your local propeller head today?
  81. Re:Really? by Anonymous Coward · · Score: 0

    Not only that, this is probably a 6-10 thousand dollar machine (Depending on options).

    It isn't some home office multifunction.

  82. Intermittent is the worst by Tablizer · · Score: 1

    I once worked at a place that had a printer that intermittently flipped characters. It was difficult to solve because it was so intermittent. It couldn't be recreated in the lab to test without blowing thru thousands of sheets of paper, and that wasn't enough to isolate the problem or prove to the supplier. It drove everybody crazy.

    Rumor has it that a technician secretly sabotaged it by juicing it with too much voltage so that the whole thing had to be replaced after a "mysterious failure". Sometimes you welcome dishonesty.

  83. JBIG2 by GoRK · · Score: 1

    I haven't even read this article and I know the culprit exactly: JBIG2.

    The compression algorithm operates on binary (2 color) images and has two modes, a lossless mode which is sort of like the love child of RLE and JPEG and a higher compression mode which operates by running the lossless blocks through a comparison routine and discarding and replacing any blocks that are sufficiently similar with references to the first copy. It's actually a good algorithm, but you have to understand how it works to implement it properly. When you have a perfect storm of certain fonts (especially small ones where a glyph can fit perfectly inside a block), have some noise in the bitonal images and have the compression threshold too high you can get some real zingers.. 9, 6, 0, 3, and 8 can all easily get muddled up, not to mention what happens to letters like e o c etc. The key to the whole thing is having good algorithms that can produce quality bitonal images from poor originals and scanning at sufficient resolution (or lowering the compression threshold enough) that blocks cannot hold an entire glyph.
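
A toy sketch of the block-substitution step described above, assuming the page has already been cut into equal-sized bitonal blocks that each hold one glyph. The 5x7 bitmaps, the pixel-difference `distance`, and the `threshold` parameter are illustrative assumptions; a real JBIG2 encoder uses its own segmentation, matching, and entropy coding.

```python
# Hypothetical 5x7 bitonal glyphs; '#' is a black pixel.
GLYPHS = {
    "6": (".###.", "#....", "#....", "####.", "#...#", "#...#", ".###."),
    "8": (".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###."),
    "1": ("..#..", ".##..", "..#..", "..#..", "..#..", "..#..", ".###."),
}

def distance(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode(blocks, threshold):
    """Return (dictionary, refs); blocks within `threshold` pixels share one entry."""
    dictionary, refs = [], []
    for block in blocks:
        for index, symbol in enumerate(dictionary):
            if distance(block, symbol) <= threshold:
                refs.append(index)  # reuse the first sufficiently similar block
                break
        else:
            dictionary.append(block)
            refs.append(len(dictionary) - 1)
    return dictionary, refs

def decode(dictionary, refs):
    """Rebuild the page from dictionary entries only."""
    return [dictionary[i] for i in refs]

def read(blocks):
    """Map decoded bitmaps back to characters by exact match."""
    lookup = {bitmap: char for char, bitmap in GLYPHS.items()}
    return "".join(lookup[b] for b in blocks)

page = [GLYPHS[c] for c in "861"]
lossless = read(decode(*encode(page, threshold=0)))
lossy = read(decode(*encode(page, threshold=4)))
print(lossless, lossy)  # prints: 861 881
```

The "6" and "8" bitmaps here differ in only 3 pixels, so a threshold of 4 silently replaces the 6 with a clean copy of the 8, while the "1" is far enough away to survive.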

    As to why the copier is using the lossy mode of JBIG2 internally is a mystery, especially in the "copy" pipeline. I can think of no good reason that it should use anything other than the lossless mode or uncompressed data.

  84. Re:Anti-counterfeiting by Dogtanian · · Score: 1

    You came up with the exact same conclusion as the author of the article you just read:

    Actually, his conclusion *was* very different, but Slashdot's malfunctioning compression algorithm didn't realise this and inadvertently replaced it with a duplicate of the original instead.

    --
    "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  85. Re:Anti-counterfeiting by Sabriel · · Score: 1

    Any chance of a public link? Xerox here too, and I'd put short odds that more than a few slashdotters are affected.

  86. Re:Anti-counterfeiting by Sabriel · · Score: 1

    *facepalm* Aargh, I just saw what you did there. +1 Internets to you, sir/madam/other.

  87. Re:Anti-counterfeiting by Sabriel · · Score: 1

    There's a pre-error PNG of the drawing sample and a pre-error TIF of the number table sample they used in the original article. Perhaps try scanning printouts of them, it appears to be how some of the readers are reproducing the error?

    The blog's also had a few updates, indicating affected models known so far and a possible workaround.

  88. What a bug.. by StewartD · · Score: 1

    Well I'm glad my business uses another printer brand. I don't entirely understand the cause of the problem that's described in the original article but I can't believe it took this long for this bug to be found.

  89. Update by hweimer · · Score: 1

    Now the question becomes: what moron made this setting the default? Maybe a setting that can undetectably corrupt your data can be provided if appropriate warnings are given, but it sure as hell should never be the default. I would've thought that was obvious.

    The guy who came up with this posted several updates to his blog.

    1. The setting is not the default.
    2. There is a warning when you change the settings in the web frontend.
    3. Xerox's support staff was not aware of this problem and could not come up with a solution.

    --
    OS Reviews: Free and Open Source Software
  90. Honestly... by Anonymous Coward · · Score: 0

    This is EXACTLY why I take the disk space premium and only archive stuff in lossless formats (unless the original was encoded in lossy format: I'm looking at you DV/h264 camcorder formats!)

    I made the mistake years ago of ogg-encoding a bunch of audio files instead of flac. While I didn't shred the originals, as soon as I started using quality speakers I noticed the terrible quality compared to the source CDs. Needless to say I went to the trouble of re-ripping them once I had enough hard disk space (Thus allowing me to avoid further wear to the discs due to placing them into cd players.)

  91. Re:Really? by Anonymous Coward · · Score: 0

    well that engineer needs to compare original and copy

    thats part of his job

  92. What other devices do this ? by ToddInSF · · Score: 1

    And how far back are we talking ?

    Somebody would have noticed, surely...

    Unless, of course, the resulting errors were errors people accepted/wanted.

  93. JPG encryption ... pixelation artifact? by FreedomFirstThenPeac · · Score: 1

    Is the system using JPG compression ? We have all seen boxes of pixels move around when the DVD player gets confused.

    --
    "There is no god but allah" - well, they got it half right.
  94. Re:Anti-counterfeiting by Smallpond · · Score: 1

    I would mod this "Funny" but my auto-correct substituted "F***ing genius"

  95. Bullshit by Anonymous Coward · · Score: 0

    There is no use case for compression gain over semantic fidelity.

    Period.

    Please make sure you've understood my comment before attempting a rebuttal. This is very much a case of "just because you can, doesn't mean you should".

  96. Xerox copiers changing numbers - vs. previous NSA? by Dmpstrdvr · · Score: 1

    Does no one remember the previous revelation that Xerox color printers were printing "serial number" coded "not visible to the naked eye" codes on all color prints? This seems to have been secretly installed in the equipment for the convenience of some government agency.. (think Treasury, i.e. copying currency images)

    How could a .jpg algorithm only substitute numbers for numbers - vs. random alphanumeric characters? unless the machine was converting the contents via OCR.. possibly to forward to? Most of those machines do now have internet connections..