Slashdot Mirror


Why Netscape shows ? instead of '

RandySC writes " Demoronizer is a Perl program which corrects numerous errors and incompatibilities in HTML generated by, or edited with, Microsoft applications. The demoroniser keeps you from looking dumber than a bag of dirt when your Web page is viewed by a user on a non-Microsoft platform. A little detective work revealed that, as is usually the case when you encounter something shoddy in the vicinity of a computer, Microsoft incompetence and gratuitous incompatibility were to blame. Western language HTML documents are written in the ISO 8859-1 Latin-1 character set, with a specified set of escapes for special characters. Blithely ignoring this prescription, as usual, Microsoft use their own "extension" to Latin-1, in which a variety of characters which do not appear in Latin-1 are inserted in the range 0x82 through 0x95--this having the merit of being incompatible with both Latin-1 and Unicode, which reserve this region for additional control characters. " So now we know what happened to Jon.

104 comments

  1. Apple do this, as well as Microsoft by Anonymous Coward · · Score: 0

    Many Mac web sites have the same problem.

    Smart quotes don't come out properly in IE/Win.

  2. Dammit! by Anonymous Coward · · Score: 0

    We WANT them to look stupid! Now someone goes and writes something that allows them to clean up after Microsoft's crap. It should put a little tag in there someplace that says "Cleaned up after Microsoft by Demoronizer!"

    I liked it better when I could spot the people using Microsoft tools on their web pages ... like Scriptics.

  3. Weep! CmdrTaco will never make NY Times Editor. by Anonymous Coward · · Score: 0


    The Grey Old Lady will never see the likes of
    the inhabitants of the Little Green Whelp.

  4. Good god, that is lame by Anonymous Coward · · Score: 0

    I've heard of picky, but this is crazy. If it took you that long to figure it out, then it's a pretty obscure fact and not that *that* unreasonable that they missed it. Everything is a Microsoft conspiracy, right?

    Chill!

  5. Impartial? ...huh... by Anonymous Coward · · Score: 0
    Dude, if you want unbiased editorial content, go somewhere else. I read Slashdot partly because they post some cool opinions and stuff. I dig reading it, they seem to like posting it. If you don't... bye.

    Some other news sites that you may like:

    Oh, and since when is Slashdot a news portal? Slashdot is News for Nerds. Stuff that Matters. Don't try to kludge that into some mainstream media website classification... jeeze..

  6. we don't need such a program by Anonymous Coward · · Score: 0

    When a program claims to follow some standards and it doesn't, I call this a BUG.

    What this programmer should have done was send MS a bug report and then download the corrected
    versions of Excel, PowerPoint, IE, etc. from the MS website next week.

  7. There are more bugs in text documents! by Anonymous Coward · · Score: 0

    And here's another BIG BUG in MS software, which not only renders text unreadable, but even changes its semantic content/meaning!!!

    If you ever got an e-mail in German from someone using Outlook, and wondered why he didn't use umlauts where he should have, here's the reason:

    By default, Outlook deliberately substitutes umlauts and other characters with "ascii" characters.

    So ä (ä) becomes a, and ü (ü) becomes u.

    You may wonder why we read "Diese Fehler bereiten mir viel Arger" (=worse) instead of "...viel Ärger" (=annoyance); and a "Bär" (=bear) is not a "Bar" (=pub); and I prefer using a "Mull" (=gauze) to bandage a wound instead of "Müll" (=garbage).

    Yeah, I also noticed the starting and ending "double quotes", Attachments which don't use MIME and lot of other BUGS which made reading e-mails and webpages produced by MS-products impossible. Often these errors are introduced simply by stupid default configurations.

    PLEASE CALL NON-STANDARD PRODUCTS ILLEGAL, so we will get rid of this MS-crap.

    :-)
    Markus

  8. Just now figuring it out? by Anonymous Coward · · Score: 0
    G'z, I've been complaining about this for at least a year, if not two. It's why I boycott abcnews.com and The Economist's web site, among others. The pages say they're HTML but they're not--they're an HTML-like text that can be viewed correctly only on a Microsoft platform.

    I've written to the webmaster of every site I find doing this, but so far they've all ignored me. Nobody cares about standards while BillyG is in the world.

  9. This is well known by Anonymous Coward · · Score: 0

    and has been for ages.

    Sheesh.

    Seems that /. is descending into the pits of crappiness.

  10. Not partisan enough... by Anonymous Coward · · Score: 0


    for me anyway. I have a very personal bone to pick with MS. I suspect a whole lot of others do too.

  11. Nonsense by Anonymous Coward · · Score: 0

    > Well, after this I don't think you guys should claim impartial editorial content.

    What Microsoft is doing in this case amounts to either gratuitous or clueless incompatibility and that's a fact. The author simply calls them on it. Even if those had been a /. editor's words: they're well-deserved.

    I would expect the same regardless of the source of the garbage.

  12. Hmm! by Anonymous Coward · · Score: 0

    How about a proxy to filter and fix content as you browse?

  13. A Related URL by Anonymous Coward · · Score: 0

    See also: "On the use of some MS Windows characters in HTML" at http://www.hut.fi/u/jkorpela/www/windows-chars.html

  14. time to wipe out windows from my hard drive by Anonymous Coward · · Score: 0

    I like to give Microsoft the benefit of the doubt, but this is just the piece of news to push me over the edge and wipe out the Windows partition of my dual boot Linux machine. When will Microsoft learn to compete by trying to make good quality software and not by dirty tricks?

  15. How about a demoronizer for MS Access? by Anonymous Coward · · Score: 0

    I've been inflicted with an MS Access database where the person who originally set it up used the table-driven mechanism to write queries.

    If you've ever seen the SQL this generates, you know my horror. A simple select of a couple rows that should be a couple easily understandable lines turns into 6-12 lines of gibberish with heavy use of "Inner" "Outer" "Right" and "Left" joins.

    Worst of all, if you write a query in SQL and decide to view it in the table view, you're in danger of your simple small query turning into one of these monsters -- and it often does not work the same again! :-(

    Alongton@clark.net

  16. Wait! I can do this with active-X!! by Anonymous Coward · · Score: 0
    Hold on, why use perl. You can do this with a $99 Active-X gizmo that only works on IIS!

    http://www.cnn.com/TECH/computing/9902/05/empty.idg/

    -- cary

  17. Partial solution for Courier by Anonymous Coward · · Score: 0

    Go into Netscape preferences and change the Fixed Width Font from courier to something less objectionable.

    Now, if someone wants to tell *me* how to get a broader list of fonts there, I'd appreciate it. Right now I have about six choices. The least objectionable is Fixed(misc) which isn't bad but isn't all that great either.

  18. HoohoooO! by Anonymous Coward · · Score: 0

    You're joking... tell me you're joking...

  19. absolutely correct by Anonymous Coward · · Score: 0

    and i wish they died slowly and painfully, which, of course, will happen anyway as access to information becomes less and less controlled.

  20. Dammit! by Anonymous Coward · · Score: 0

    Now that's the OSS spirit, working together to improve the end-user experience. NOT..

  21. This is well known by Anonymous Coward · · Score: 0

    Ya I was thinking the same thing. I had found the demoronizer a LONG time ago. But it seems (judging from the comments) that a lot of people didn't know about it, so it seems like a Good Thing to report.

  22. HTML 4.0 has all the characters you want by Anonymous Coward · · Score: 0

    See http://www.htmlhelp.com/reference/html40/entities/special.html

    HTML 4.0 includes characters like the em dash and double quotation mark. But unfortunately most browsers don't support them. (BTW, IE4 is better than Netscape 4 when it comes to support for these HTML 4.0 entities.)

  23. it removes </p> by Anonymous Coward · · Score: 0
    last time i checked, </p> was a required counterpart for <p>.

  24. Send email to webmaster by Anonymous Coward · · Score: 0

    and ask them to fix those ? and post their reply.

  25. It?s not Microsoft?s software?s problem by Anonymous Coward · · Score: 0

    It is?nt microsoft?s fault, it?s that my keyboard?s messed up. I?ll see you guys at the plane?arium.

    --root

  26. Computers that _don't_ run Windows? Go awaaay... by Anonymous Coward · · Score: 0

    A proxy isn't necessary. Because of the way X handles font names, you can have aliases. Just as you can specify "fixed" or "variable" as a font name, you can alias, say, Arial to Helvetica and Verdana to Lucida. This is what I did. You can get this fonts.alias file and append it to your /usr/X11R6/lib/X11/fonts/75dpi/fonts.alias file. Just restart X and you have "new" fonts Arial and Verdana mapped to Helvetica and Lucida.

  27. True but you shouldn't need them by Anonymous Coward · · Score: 0

    They are defined in the character set. The character set of HTML 4.0 is Unicode.

  28. Apple does this, as well as Microsoft by Anonymous Coward · · Score: 0

    The smart quote problem is because smart quotes aren't ASCII characters. Characters beyond the basic ASCII 128 character set have different numbers on different OS's.

  29. use a text editor by Anonymous Coward · · Score: 0

    Simple solution.

    Hand code it yourself w/ vi, pico, notepad, whatever.



  30. MS Fixed This in Office SR-1 by Anonymous Coward · · Score: 0

    The standard MS-Office HTML converter is indeed horrible.

    MS fixed many of the bugs that the 'demoroniser' [sic] aims to correct in its "Service Release 1" patch for Office 97. Most importantly, with this patch, Office:

    • properly escapes characters not in the ISO-8859-1 character set
    • correctly replaces special symbols with their proper html equivalents (e.g. &lt;, &gt;, &amp;, &quot;, etc.)

    After installing this patch, I tried to make Word create a html file that the demoroniser would correct. Aside from whitespace changes (changing CR/LF pairs into LF and word-wrapping), I couldn't get the demoroniser to make any improvements to the Word-generated html. I didn't poke around the demoroniser source, though, so I might have missed something.

    You can download Office SR-1 for free from http://officeupdate.microsoft.com/Articles/sr1howtoget.htm. It's worth mentioning that it not only fixes bugs with the html converter, but a number of other really nasty office bugs as well.

    You might also want to download Office SR-2, from http://officeupdate.microsoft.com/articles/sr2fact.htm. Of course, in a less-than-brilliant move by MS, you need to download and install SR-1 before you can install SR-2.

    Of course, even with the patch, the HTML generator of office (while now compliant) is still pretty brain-dead.

    Your Helpful Anonymous Coward

  31. Slashdot is guilty as well by Anonymous Coward · · Score: 0

    I've seen several articles on /. with the ?'s. How about fixing things at home first, eh?

  32. Isn't this a little partisan? by Anonymous Coward · · Score: 0

    I agree that the comment sounds very biased, but personally, I'll make my own mind up, I don't need Sengan or anyone else to do that for me. I suspect a lot of people here will do the same.

    Cheers

    AndyM

  33. Character set != character encoding by Anonymous Coward · · Score: 0

    You're confusing the concepts of character set and character encoding. In HTML 4.0, the character set is fixed--it is always Unicode. The character encoding defines how the bytes transmitted should be converted to Unicode characters.

    So if you use ISO-8859-1 as your encoding, then you can only represent an em dash with the HTML 4.0 entity or a numeric character reference (ampersand-#). If you use UTF-8 as your encoding, you can represent any Unicode character without the need for entities or numeric character references.

    The following document is worth reading for understanding some of the issues:

    http://www.hut.fi/u/jkorpela/chars.html
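
    The encoding point above can be tried out with a small Python sketch. It is only an illustration: the function names are made up for this example, and it relies on nothing beyond Python's built-in codecs. With ISO-8859-1 as the encoding, an em dash has to travel as a numeric character reference; with UTF-8 it travels as raw bytes.

```python
# Illustration only: function names are hypothetical, codecs are built in.
EM_DASH = "\u2014"  # the Unicode em dash

def encode_for_latin1_page(text):
    """Encode for an ISO-8859-1 page.

    Characters outside ISO-8859-1 are replaced with numeric character
    references (ampersand-#) by the xmlcharrefreplace error handler.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

def encode_for_utf8_page(text):
    """Encode for a UTF-8 page; every Unicode character fits directly."""
    return text.encode("utf-8")

if __name__ == "__main__":
    sample = "a" + EM_DASH + "b"
    print(encode_for_latin1_page(sample))  # numeric reference form
    print(encode_for_utf8_page(sample))    # raw UTF-8 bytes
```

    Either way the browser ends up with the same Unicode character; only the bytes on the wire differ.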

  34. some facts (I know you're not used to those here) by Anonymous Coward · · Score: 0


    It's amazing to me how even labelling of charsets turns into a Microsoft flame fest here every time! Don't you people ever get tired of gazing into your own navels?

    For the record, this other character set that Microsoft uses (that is a superset of Latin1) is, in fact, an ISO-registered character set. It is properly known as ISO-8859-1-Windows-3.1-Latin-1.

    In this particular case, Apple's poor decisions have been more of a problem than Microsoft's, because the character set that Macintosh tools use (MacRoman, which is not ISO-registered) is not even a superset of Latin1 (ISO-8859/1.)

    If you're interested in facts, you can see an article I posted to comp.infosystems.* about this way back in 1997.

    If you were really interested in doing something productive, you could contribute to Mozilla so that future web browsers could (as the old saying goes) be lenient in what they accept, and in a cross-platform way.

    Or, you could just continue your endless whining here, feeling superior without actually doing anything, and wait for someone else to fix it all for you.

    -- jwz
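
    The superset claim above can be checked roughly with Python's built-in codecs. This is a sketch under one assumption: that Python's "cp1252" codec is a fair stand-in for the Windows character set being discussed. The helper name is invented for the example.

```python
# Sketch: compare how Latin-1 and the Windows code page read the same bytes.
# Assumption: Python's "cp1252" codec represents the Windows character set.

def decode_both(raw):
    """Return (Latin-1 reading, cp1252 reading) of the same byte string."""
    return raw.decode("latin-1"), raw.decode("cp1252")

# The printable upper range 0xA0-0xFF reads identically in both encodings.
high = bytes(range(0xA0, 0x100))
same_high = high.decode("latin-1") == high.decode("cp1252")

# Byte 0x93 is a control character in Latin-1 but a curly quote in cp1252.
as_latin1, as_cp1252 = decode_both(b"\x93")

if __name__ == "__main__":
    print(same_high)
    print(repr(as_latin1), repr(as_cp1252))
```

    The two readings only diverge in the 0x80-0x9F block, which is exactly where the smart quotes live.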

  35. Not partisan enough... by Anonymous Coward · · Score: 0

    Well said!

    Don't give those b*stards one cent of your money.

    I want to see them GONE.

  36. Do you bother to read before you post? by Anonymous Coward · · Score: 0

    I've known about this for a long time, and I'm sure you have too, which is why I never would have even thought to post it to /.

    But if you'd have even bothered to read even one or two of the other replies before you posted, you might have noticed that for many people this was new.

    Or are you just trying to show everyone how smart you are that you already knew this?

  37. We still have a question to answer... by Anonymous Coward · · Score: 0

    Does this StupidQuotes thingy turn Jon's 1s into ls also?

  38. There are more bugs in text documents! by Anonymous Coward · · Score: 0

    That's MIME encoding, you idiot. An equals sign at the end of a line in MIME encoding means a soft break.

    Try learning a bit about Internet standards and how they work.

    If your mail reader can't handle MIME, then you're screwed. BUT THIS IS THE STANDARD.

    By default, MS uses MIME encoding. Umlauts, etc, don't get removed. They're kept.

    *IFF* you switch to ASCII encoding instead of MIME encoding, it'll strip things like umlauts.

    Learn to use the tools. You people don't have problems with .config files, but you can't drive a fucking GUI based tool?
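
    For anyone who wants to see the soft line break in action, here is a minimal Python sketch using the standard quopri module. The sample message bytes and the ISO-8859-1 charset are assumptions made up for the demonstration, not taken from any real mail.

```python
import quopri

# Made-up quoted-printable body: =C4 and =FC are escaped umlaut bytes, and
# the "=" at the end of the first line is a soft line break.
ENCODED = b"Viel =C4rger und M=FCll in einer sehr langen Zei=\nle"

def decode_body(encoded, charset="iso-8859-1"):
    """Undo quoted-printable, then decode the bytes with the given charset."""
    return quopri.decodestring(encoded).decode(charset)

if __name__ == "__main__":
    print(decode_body(ENCODED))
```

    The umlauts survive the round trip and the soft break disappears, provided the reader actually decodes the MIME encoding.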

  39. use a text editor by Anonymous Coward · · Score: 0

    Yeah, and then throw one of these on the bottom of the page...

    http://www2.gvsu.edu/~adamscr/picts/pico2.gif

    (Not my site, but I can't remember where my copy of it went...)

  40. Bill stole our web. by Anonymous Coward · · Score: 0

    I do think, that instead of following MS, we need to attack them.
    We all use the MS compatible standards because everyone else does,
    and that's why we're losing the internet to them.

    [snip]

    Standard did not exist for Microsoft to ruin,
    but for us to use.

    --------

    Learn a little fucking history - HTML incompatibilities started with NETSCAPE. At least Microsoft participates in the W3C. Remember the layer tag?

    What gets me is that all you fucking idiots would start screaming in *JUST* the same way if MS made its tools use UNICODE for all webpages.

  41. Excuse me, FUDmeister? by Anonymous Coward · · Score: 0

    You should check your history.

    NS started the incompatibilities - beginning with the tag.

    Before then, there was Arena and Mosaic. Both of which were HTML standard browsers. It doesn't matter whether IE was around or not; when Netscape 1.0 came out, the W3C already existed.

    And Netscape wasn't following the HTML 1.0 spec.

    Netscape doesn't support W3C standards - this only changed with the Mozilla project, and is STILL a work in progress. Until Gecko comes out in a fully-fledged form that a non-developer user can download, Netscape's browsers are highly incompatible - and always have been.

    When Gecko comes out, this will no longer be the case - thank the goddess.

    Oh, and re: unicode: are you telling me that if I telnet into my Linux box, and use Lynx to view a UNICODE web page, it'll render correctly?

    Yeah, right, pull the other one. Or provide a link to some proof.

  42. More mal-rendered or mal-encoded HTML... by Anonymous Coward · · Score: 0

    An aside - there used to be a problem with older
    netscapes that some links to cgi's wouldn't work,
    because they looked like:
    http://wherever/a.cgi?foo=bar(ampersand)quotation=marx

    The 'quotation' would cause it to render like
    foo=bar"ation=marx
    so clicking on the link wouldn't work.

    Most frustrating. I agree with the
    original argument though, the problem is that
    MS HTML generators are stating that they use
    ISO-8859-1 when they don't - there is no problem
    with the smart quotes per se.

    -Baz

  43. A story from real life... by Anonymous Coward · · Score: 0

    Sounds more like you have a lazy admin than a problem with IIS. Admittedly, IIS is full of bugs, but this ain't one of them. It'll quite happily accept the default page being anything you care to type into the admin console. default.htm, index.html or whatever you like :)

    A.

  44. Accept only correct HTML on your WebServer! by Patrik+Nordebo · · Score: 1

    Do it as a cron job that checks all new files to see if they are correct HTML and sends mail to the owner if they're not. Or do it in the web server.
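
    One narrow piece of such a check could look like the Python sketch below. It is nowhere near a full HTML validator: it only flags bytes in the 0x80-0x9F block, which a page labelled ISO-8859-1 should not contain. The function name is hypothetical, and the cron and mail plumbing is left out.

```python
# Sketch of one narrow check, not a real validator.
# Flags bytes in 0x80-0x9F, the block where the Windows-only characters sit.

def find_c1_bytes(raw):
    """Return a list of (offset, byte value) pairs for bytes in 0x80-0x9F."""
    return [(i, b) for i, b in enumerate(raw) if 0x80 <= b <= 0x9F]

if __name__ == "__main__":
    page = b"ok \x93bad\x94"
    for offset, value in find_c1_bytes(page):
        print("suspicious byte 0x%02X at offset %d" % (value, offset))
```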

  45. Yes! by Matrix · · Score: 1

    If those ugly sites start looking decent, I will be very happy.. The question marks make me stop reading when it's more than a page, because it's such a pain to interpret what they should be.

  46. Good god, that is lame by drwiii · · Score: 1
    Everything is a Microsoft conspiracy, right?

    Yes, including your post.

  47. No quotes??? by chuck · · Score: 1

    Check ``this'' out. I don't know what it looks like on your screen, but on mine (Netscape 4.05 on Linux) it looks pretty ``nice.''

  48. Netscape and IE bug only. by Trepidity · · Score: 1

    Opera 3.5 here, running on Win95, displays the quotes correctly according to the standard as ? while Netscape and IE both display them as ' (incorrectly, though how the author of the page intended them to be seen). So it can't be merely Windows's fault if Opera on Windows displays the pages according to the standards.

    Perhaps somebody needs to fix this in mozilla.

  49. That's not quite what the problem is by J.+J.+Ramsey · · Score: 1

    The problem is that when the application is turning the document into HTML, it doesn't turn the 'smart quotes' into either the appropriate ASCII character or the appropriate HTML enties for left and right quotes. In short, the filter for saving to HTML is badly designed.
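
    A filter of the kind described above can be sketched in a few lines of Python. This is not the demoroniser's actual logic, and the replacement table is a hypothetical one chosen for the example; it maps a handful of Windows-only bytes to plain ASCII stand-ins and leaves every other byte alone.

```python
# Hypothetical replacement table: Windows-only byte -> plain ASCII stand-in.
# Not the demoroniser's real table; chosen only to illustrate the idea.
REPLACEMENTS = {
    0x85: b"...",  # ellipsis
    0x91: b"'",    # left single quote
    0x92: b"'",    # right single quote / apostrophe
    0x93: b'"',    # left double quote
    0x94: b'"',    # right double quote
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}

def clean_smart_quotes(raw):
    """Replace the bytes listed in REPLACEMENTS; pass all others through."""
    out = bytearray()
    for byte in raw:
        out += REPLACEMENTS.get(byte, bytes([byte]))
    return bytes(out)

if __name__ == "__main__":
    print(clean_smart_quotes(b"It\x92s \x93nice\x94"))
```

    Working on raw bytes sidesteps the question of which encoding the page claims to use; a real filter would also have to deal with the rest of the 0x80-0x9F block.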

  50. What happened to Jon Katz... by gavinhall · · Score: 0

    Posted by The ULTIMATE Crippler:

    ...well it could be that or he really is dumber than a bag of dirt.

  51. we don't need such a program by gavinhall · · Score: 1

    Posted by Jeremy Witt:

    I remember reading an article a while back where Gates told the reporter that MS wasn't in the business of fixing bugs and handling bug reports..

    "User's don't want bug fixes, they want new features"

    This could be a hoax and all, but it seems valid enough given their BugFix Track record.

    My Point is: Post your bug report, but download demoronizer anyway, because it's gonna be quite a wait.

    JWitt

  52. MS post a patch in a week????!! by shaldannon · · Score: 1

    Good luck! Every time someone has posted a bug report on Windows or Internet Explorer to the Microsoft web site, Microsoft has tried its usual strategy of

    1. Ignore it, hoping it goes away

    2. Deny the existence of said bug

    3. Promise a patch in the next release

    4. Maybe fix, maybe not; definitely introduce new bugs



    # find /dev/brain
    find: cannot open /dev/brain: No such file or directory

    --


    What is your Slash Rating?
  53. Cold hard facts make MS look bad. by DunbarTheInept · · Score: 1

    When you have a cold hard fact that makes MS look bad, is it 'partisanship' or 'anti-Microsoft FUD' to point this out? Hell no. To ignore it would be partisanship in favor of MS. To point it out is neutral honesty. The fact that neutral honesty usually makes MS look bad is MS's fault, not ours.

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  54. Bad examples. by DunbarTheInept · · Score: 1
    In general, you have one good point: that informing users of alternatives politely is smarter and more effective in the long run than belittling them.

    But, you belittle slashdotters in so saying, by assuming that we are idiots who don't already know this simple fact. Thus you are not following your own advice.

    And your examples were terrible. ODBC?! Puh-lease! ODBC is very hard to use compared to the SQL it is trying to supplant. It is *not* an example of an improvement.

    And why on earth do you assume that a world in which the linux machines are only on the server and the clients are all Windows is a good goal? This is not a worthy goal. We can already do that today. Allowing Linux on the clients is a good goal. Making MS coexist with Linux is also an impossible goal unless we can raise awareness of MS's incompatibilities so the public no longer tolerates them. Getting Windows to play nice with other platforms is essentially what this article was about, if you will recall.

    In an ideal world, the choice of what OS to have on the desktop would *not* have to be dictated from on high just to get stuff to be compatible. I have no problem with systems where the users *can* use Windows and most end up choosing to use Windows. I have a problem with systems where the users have no choice but to use Windows in order to work with their Windows-using co-workers. (This is also why I don't like the proposed network for the ISS - allowing Windows is one thing, but forcing it on people for artificial reasons is another entirely.)

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  55. Isn't this a little partisan? by pingouin · · Score: 1
    Yeah, it's partisan, but a little commentary adds some color. I wouldn't want /. to become too bland in its intros - there's plenty enough blandness out there as it is.

    Having said that, maybe RandySC's bon mots are a wee bit OTT, but if you've experienced any of today's Threads From Hell (or read the non-OTT part of his submission), maybe you can excuse someone's MS-Disgust-o-Meter® running in the red.

    Disclaimer: I'm asleep.


    Rants Are Optional

    --

    =8^

  56. It looks awful... by Craig · · Score: 1
    ... under Linux & Netscape 4.08, default font either (TT) Verdana or (Adobe) Helvetica, 1280x1024. (TT font support for X at http://airnet.net/craig/linux and links from there) --

    Craig

  57. Blame W3C, too... by Craig · · Score: 2
    OK, Microsoft should be condemned for their HTML-crunching products for using characters in that numeric range. (They should also be condemned for hard-coding absolute font sizes instead of +1, -2 etc.)

    But the HTML spec has a glaring lack that motivates this violation in the first place: no curved quotes and apostrophes, and no em-dash.

    Now, HTML is supposed to display by default in a proportional font, like printed matter (it's easier to read, among other advantages). But proportional fonts always use curved, symmetric double and single quotes.

    Likewise proportional fonts always distinguish between a hyphen and a dash; most, in fact, have two dashes (the endash and the emdash) of slightly different widths, in addition to the hyphen.

    But the HTML spec (and ISO8859-1) assumes the broken ASCII/Typewriter usage, which in proportional fonts is jarring and ugly. Font specs should be designed by people who know something about fonts, not by engineers!

    The situation is potentially worse in other languages, though I'm not sure how the other ISO-8859-x specs handle it. In German, for example, the opening double and single quotes are traditionally at the bottom of the print line rather than the top, in addition to being reverse-curved, and French uses "guillemets", which look like doubled angle marks.

    Search the Web for things like ampersand-emdash-semicolon and ampersand-lquot-semicolon -- which are attempts to address the problem -- and you'll see that this gaping mistake in HTML/ISO8859-1 bothers a lot of people.

    So yeah, blame Microsoft for a kluge that works on only maybe four out of five of the web-surfing PCs out there. But complain to the ISO and to W3C for their oversight, too.

    Craig

  58. Good god, that is lame by ninjaz · · Score: 1
    I've heard of picky, but this is crazy. If it took you that long to figure it out, then it's a pretty obscure fact and not that *that* unreasonable that they missed it. Everything is a Microsoft conspiracy, right?
    Yeah, just like it was a crazy little oversight when IBM created EBCDIC as an attempt to decommoditize the character set. Since the "smart quotes" are listed as a feature, it's painfully obvious they were purposely breaking standards. Besides, why give any benefit of the doubt to someone who repeatedly does this type of thing? "Cross me once, shame on you, cross me twice, shame on me" comes to mind..
  59. The Windows-1004 codepage -- not on Linux either by Robin+Hood · · Score: 1

    And on my Linux box, running Netscape Communicator 4.5, they show up as accented vowels -- the old, familiar extended-ASCII character set that I remember from my DOS days.
    -----

    --
    The real meaning of the GNU GPL:
    "The Source will be with you... Always."
  60. Not true at all by Matts · · Score: 1

    I'm afraid you don't know what you're talking about. HTML 4 is internationalised. The character encoding is specifiable by either the web server, the script that produces the code, or in a META tag. That means you can specify utf-8 or utf-16 or ISO-8859-2 or whatever. utf-8 and 16 contain those characters, so it wouldn't be a problem at all for HTML 4 compliant browsers. Both Netscape 4 and IE 4 support this. Of course, whether or not you have a full unicode font or not is another matter.
    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  61. Poor M$ HTML!=Perl by Daniel · · Score: 1

    Yeah, but this means I can make my sister's Web pages (which she creates in Windows and stores on my computer) look nice. :-)

    Daniel

    --
    Hurry up and jump on the individualist bandwagon!
  62. Convert to a mod_perl script, and use it on /. by fizbin · · Score: 1

    Actually, this seems like a starting point for a full-fledged HTML cleaner.

    Now, most of the HTML produced by Rob's slash scripts is pretty good, but there are a few complaints I have (basically, most of the page winds up on one bigass-long line, making debugging the HTML code (as when I was trying to figure out which Netscape bug the "ask slashdot" header triggers) annoying in the extreme) that this seems to clear up - it doesn't just go through and convert windows-specific quotation characters to standards-based equivalents; among other things, it will wrap HTML lines so that you can read the code later.

    Actually, if some enhancements were made to this (say, applying a standard sorting to text-level markup and eliminating consecutive open/close tags), it could be used as a rather nice HTML cleaner; for example, getting something that turns:
    This is a test
    into:
    This is a test
    Or even something as awful as:
    This is a test
    would be very nice - I've noticed many odd tag-ordering problems in all sorts of auto-generated HTML. Turning crufty HTML into something that renders identically but will pass a weblint test would be very nice indeed.

  63. No it doesn't. by fizbin · · Score: 1
    I suppose I can't blame you for thinking that it does - "Nugatory" is not a word in my vocabulary either - but it only removes </P> tags that are used as follows:

    <P><UL>This text is in a paragraph.
    </UL></P>

    becomes:

    <p>This text is in a paragraph.

    It leaves other occurrences of </P> alone. (well, it also removes empty <P></P> paragraphs entirely) I agree, though, that it probably shouldn't be removing the </P> in this case; as other people have pointed out, it's explicitly not required in HTML 4.0 or 3.2, but I think that in the 3.0 draft (that all the browser vendors rejected because they thought it was too hard - such a shame too, because I was really looking forward to MathML) you are correct. Even if it was never required, </P> is a good idea.

    Also, after looking at the actual script, I take back my comment about it being a good starting point for a general HTML-cleaner. Such an HTML cleaner is still a good idea, but it should probably be written from scratch.
  64. Isn't this a little partisan? by maynard · · Score: 1

    Well, after this I don't think you guys should claim impartial editorial content. While I realize Sengan didn't write this, he certainly published it.

    " A little detective work revealed that, as is usually the case when you encounter something shoddy in the vicinity of a computer, Microsoft incompetence and gratuitous incompatibility were to blame. "

    True or not, I think this may be pushing an anti-MS viewpoint too far. Not that I support MS. Yeah, I mostly agree with the DOJ view and consider that MS has legally stepped over the line. But a news portal ought to be a little more careful about presenting fair and consistent views from the editorial staff, and they ought to be as impartial as possible. This was nowhere near that goal.

  65. More mal-rendered or mal-encoded HTML... by Mawbid · · Score: 1

    Some sites use entities without the semicolon, like this:

    Barnes &amp Noble

    instead of

    Barnes &amp; Noble

    which seems to work ok, until you remove the space following the entity:

    Barnes&ampNoble

    is sometimes rendered as

    Barnes&ampNoble

    Who is to blame in this case?

    PS. It sure is fun to describe HTML in HTML, especially when the slashdot code enters into the picture.

    PPS: The following is encoded as Barnes&ampNoble. What does your browser do with it?

    Barnes&ampNoble
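
    As one data point for the question above, here is what Python's standard html.unescape does with these strings. It follows the HTML5 rules for entities missing their semicolon, which is an assumption about present-day parsing and not a claim about what any browser of the time did.

```python
import html

# Three spellings from the discussion above: correct, missing semicolon
# before a space, and missing semicolon with no space at all.
SAMPLES = [
    "Barnes &amp; Noble",
    "Barnes &amp Noble",
    "Barnes&ampNoble",
]

def decode_all(samples):
    """Return the unescaped form of each sample, in order."""
    return [html.unescape(s) for s in samples]

if __name__ == "__main__":
    for raw, decoded in zip(SAMPLES, decode_all(SAMPLES)):
        print(repr(raw), "->", repr(decoded))
```

    Under those rules all three decode to an ampersand, so a lenient parser hides the mistake that a stricter one would expose.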


    --
    Fuck the system? Nah, you might catch something.
  66. Ohhh, I've wondered about this for a while by MinusOne · · Score: 1

    I'm glad someone finally figured this out. One of my favorite pages suffers from this problem, but I never told the author, because I had no idea if it was my browser or her page. Now I know, and I can tell her how to fix it! Thanks!

  67. I've been wondering what the damn ? thing is. by fiid · · Score: 1

    I keep seeing this damned phenomenon all over the net.

    I should have known Micro$loth would have something to do with it.

    --
    Fiid - Rhymes with Squid. Software Engineer
  68. Broken by Mindphunk · · Score: 1

    The story of OSS, always new versions to fix deliberate windows incompatibilities.

    New filesystems, new encrypted passwords, new efforts to block *nix dns servers, incompatible fonts, office 97 file formats.

    Linux would be less a threat to MS if they acted like part of the community instead of trying to grind it into the mud methinks.

  69. distinguishing between the sinner and the sin by sdw · · Score: 1

    ODBC is just about the worst and most proprietary way to accomplish the task it tries to solve. It's idiotic and has cost the industry a ridiculous amount of money and time needlessly.

    The Problem it is trying to solve: standard access to multiple databases so that a program can be configured for any particular database needed.

    The MS solution: Design only the library interface and the SQL administration level and require custom, database specific drivers for each database a program has to talk to. Absolutely avoid creating a standard protocol since that would mean that anyone could write a non-proprietary driver. Charge money for ODBC drivers and encourage database vendors to do the same.

    PLEASE! ARGHHHH.....

    We are finally seeing this madness come to an end slowly, but it has been a nightmare! How hard do you have to make databases??? Login, send a query, get results and status messages, bind data to variables, close.
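
    For comparison, the five steps listed above (login, query, results, bind, close) look roughly like this in a call-level database API. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in because it needs no server, it is not ODBC, and the table `mytable` and its rows are made up for illustration.

```python
import sqlite3

# "Login": open a connection (an in-memory database, so no credentials here).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, created only so the query has data to return.
cur.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

# Send a query, binding an input value to the "?" placeholder.
cur.execute("SELECT id, name FROM mytable WHERE id = ?", (2,))

# Get the results and bind each column to a local variable.
rows = cur.fetchall()
for row_id, name in rows:
    print(row_id, name)

# Close.
conn.close()
```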

    Message Oriented (MOM) or Message Passing (MPI, Voyager, etc.) with XML requests and results will soon obliterate the stupidity... I hope!

    sdw

    --
    Stephen D. Williams
  70. Prefer it in Mozilla... by tilly · · Score: 1

    If you see those quotes, clean them up, and pop up a warning.

    On any platform.

    (However if you try to use the locale settings on Microsoft platforms, I bet that the quotes in question are pushed on you by default. So it may not be so simple..)

    Regards,
    Ben Tilly

    --
    My usual seat in the cluetrain is at A HREF="http://pub4.ezboard.com/biwethey.ht
  71. Bad journalism by Nelson+Minar · · Score: 1

    The fun thing about this bug in Microsoft HTML production is it damages the quality of news sites like MSNBC. It's a lot harder to read a story when you can't tell what parts of text are quotes, and what parts are the author's own text. It was worse in Netscape 3.0 - it just displayed a space, not a question mark.

    I like the idea of an Apache module that fixes things on the way out.

  72. distinguishing between the sinner and the sin by Woodie · · Score: 1

    ODBC -

    I think a lot of people here are confusing ODBC with SQL. ODBC isn't SQL. ODBC is a layer that allows you to talk with a variety of data sources - SQL or not. You can use it to access a variety of data sources - Oracle, Sybase, Informix, SQL-Server, Access, FoxPro, etc. All of which provide for very different protocols, and flavors of SQL.

    In the end, it provides you with an environment where you can be less concerned about the protocol and how you've linked to the data source - and less concerned about how you structure your queries (in that they are not custom coded for it).

    Sure, you take a performance hit. But, it works pretty good, and allows you to quickly and easily change the data source on the back end of your application. Makes it nice for prototyping an App using a local db schema defined in Access, and then for roll out actually pointing at a db in Oracle, or SQL server...

    And - there are some ODBC drivers that are freely available.

    - Porter

  73. Ever hear of "embrace and extend"? by dido · · Score: 1

    It's a classic example of the embrace and extend tactic Microsoft just loves to use. I suppose this is what "de-commoditizing protocols and services" in Halloween I means. They take a simple standard like HTML and try to add their own incompatible extensions to it, using their OS monopoly as leverage so the rest of the world has to bend over. They tried to do this to Java and now they're trying to do it to HTML. Gates really is brilliant, thinking up these new ways to screw the rest of the computing world over for a quick buck. And to hell with innovation and progress.
    --

    --
    Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
  74. Bill stole our web. by scrytch · · Score: 1

    Something such as a "warning: non-standard page" on every non-standard page should be required by law.

    God knows the net sure could use more regulation and laws, right?
    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
  75. </p> is not required by stevenj · · Score: 1

    Actually </p> is (and always has been) optional. The HTML 4.0 spec says:

    You can omit the end tag, which is then implied by the next block-level start tag. It is also implied by the end tag of the element that encloses the P element.
    --
    If a thing is not diminished by being shared, it is not rightly owned if it is only owned & not shared. S. Augustine
  76. Isn't this a little partisan? by orabidoo · · Score: 1

    it's partisan, but it's also quite accurate... otherwise there wouldn't be such a thing as the demoronizer!

  77. Isn't this a little partisan? by Syberghost · · Score: 1

    But a news portal ought to be a little more careful about presenting fair and consistent views from the editorial staff, and they ought to be as impartial as possible.

    Screw news; Slashdot is entertainment.

  78. How bout a version for apache's mod_proxy by Ex+Machina · · Score: 1

    So it will fix itself! God how tacky! grr.


    Ex Machina "From the Machine"
    xm@GeekMafia.dynip.com [http://GeekMafia.dynip.com/]

  79. email LAME sites by alenp · · Score: 1

    From now on, ALWAYS send email to the lame pages, pointing them to the demoronizer. They should thank us for fixing their pages.

  80. New Term -- "Microsoft-Enhanced Superior Standard" by TrentC · · Score: 1

    We should coin a new term for Microsoft's other-product-breaking enhancements. Something resembling Snafu would be nice... maybe MSSE -- Microsoft Standard Enhancement or something to that effect.

    How about MESS?

    (Microsoft-Enhanced Superior Standard)

    Jay (=

  81. Good god, that is lame by dxkelly · · Score: 1

    How do you miss using reserved characters for your own use?

  82. Isn't this a little partisan? by FreeUser · · Score: 1

    Partisan? Perhaps.

    But absolutely accurate.

    If the anti-microsoft bias of slashdot annoys you, go somewhere else. Pro-Microsoft news outlets may become more difficult to find as MS's behavior becomes more flagrant and public ("you reap what you sow, Billy Boy"), but right now they're a dime a dozen and would love to add your mouse clicks to their statistics.

    --
    The Future of Human Evolution: Autonomy
  83. distinguishing between the sinner and the sin by rdsmith · · Score: 1
    Hmm, a well written, thoughtful post... could I possibly be on the wrong site?

    I would rather raise awareness of the limitations of Microsoft stuff even as I help people work with it. It's on their desks, they need to do work, and they need to exchange stuff with other people.

    This is probably one of the larger points that most people seem to be overlooking. I'm currently contracting with a nationwide bank that has over 30,000 employees. All evil-MSisms aside, the TCO of switching (total cost of ownership: support, installation, training, NOT merely the cost of the software) would be staggering. You cannot merely absorb such a cost because of the Microsoft bad, Linux good mentality.

    Phred is taking the proper approach. Informing people, showing them that there are better, cheaper alternatives. Contrary to popular belief, people aren't all lemmings. Show them, don't berate them, they will learn and they will become better for it.

  84. Ah yes, the point of the article... by rdsmith · · Score: 1
    The point of the article is not that users can successfully use Front Page to create web pages, but Microsoft's campaign to hijack open standards.

    Well, if you had bothered to read all of the subsequent posts as the person you refer to had, you would have realised that the discussion had, as per the norm here on /., turned into MS-bashing and was no longer referring to the original topic.

    You are just a weak imitation of your evil twin in that campaign - Jackass

    This is too easy... but I will say that the simpleminded, when unable to defend an idea with intelligent conversation resort to such tactics thus undermining any respect that they would have garnered.

  85. Dammit! by mattc · · Score: 1

    You can still tell because their file extensions will be .htm, right?

  86. Isn't this a little partisan? by afc · · Score: 1

    As you have observed, these are not comments posted by Sengan (or whoever posted it) but the opinions of John Walker, the owner of Fourmilab.
    He of course doesn't have to be impartial, PC or place whatever other form of blandness constraint upon the writings he publishes on his own Web site, don't you agree?

    Regardless, this feature of Microsoftware is indeed brain damaged and gratuitously incompatible with the existing standards.

    --
    Information wants to be beer, or something like that.
  87. The Windows-1004 codepage. by jerodd · · Score: 1
    This codepage is also known as Windows-1004. I used it on my system because so many people send me email in Windows-1004 encoding, which mangles all the commas and quotes. (Actually, its set of quote symbols is something for which there is a need; while `` and '' are fine when using a proportional font, `` and '' don't look so nice when using a monospaced font.) I believe that Aldus came up with this scheme along with Microsoft for their PageMaker for Windows product.

    Without further ado, here it is. If your system is using the Windows-1004 codepage, you should see the actual symbols. Otherwise, you'll see garbage.

    é 130 - slightly shortened comma
    â 131 - forte (looks like an italic `f')
    ä 132 - double comma (like double quote; like `,,')
    à 133 - ellipsis (like `...')
    å 134 - dagger (like a cross or a `t')
    ç 135 - double dagger (has two crosses in it)
    ê 136 - circumflex or caret (like `^')
    ë 137 - perthousand (like a `%' but two circles on bottom)
    è 138 - uppercase sh (an `S' with an upside down `^' on top of it)
    ï 139 - less than (like `<')
    î 140 - uppercase osh (ligature of `O' and `E'; looks like `OE')
    ì 141 - unused/unknown
    Ä 142 - uppercase zh (a 'Z' with an upside down `^' on top of it)
    Å 143 - unused/unknown
    É 144 - unused/unknown
    æ 145 - open single quote (like a ` on Unix systems)
    Æ 146 - close single quote (like a ' on Unix systems)
    ô 147 - open double quote (like a `` on Unix systems)
    ö 148 - close double quote (like a '' on Unix systems)
    ò 149 - dot; reminds me of 007 on IBM codepage 437 and friends
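
    The table can be cross-checked with a short sketch in Python. It assumes the code page described here corresponds to what Python's codecs call `cp1252`; that mapping is an assumption, not something the post states. The ASCII replacements are illustrative choices and are not taken from the Demoroniser's own substitution table.

```python
def cp1252_table(first=130, last=149):
    """Return (code, character) pairs; character is None where cp1252 has no mapping."""
    rows = []
    for code in range(first, last + 1):
        try:
            rows.append((code, bytes([code]).decode("cp1252")))
        except UnicodeDecodeError:
            rows.append((code, None))
    return rows

# Illustrative plain-ASCII stand-ins for the punctuation in this range.
ASCII_FALLBACK = str.maketrans({
    "\u201a": ",", "\u201e": ",,", "\u2026": "...", "\u2039": "<",
    "\u2018": "`", "\u2019": "'", "\u201c": '"', "\u201d": '"', "\u2022": "*",
})

def to_plain_ascii_punctuation(raw: bytes) -> str:
    # Decode as cp1252 (not Latin-1), then swap the typographic marks for ASCII.
    return raw.decode("cp1252", errors="replace").translate(ASCII_FALLBACK)

for code, char in cp1252_table():
    # Print code points rather than glyphs so the output survives any terminal encoding.
    print(code, f"U+{ord(char):04X}" if char is not None else "(unassigned)")

print(to_plain_ascii_punctuation(b"\x93Smart\x94 quotes aren\x92t Latin-1\x85"))
```

    Codes 141, 143 and 144 come back unassigned in this codec, which matches the "unused/unknown" rows above.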

    --
    --jon. Postel is dead. May we all mourn his, and our, loss.
  88. ODBC is good? by jerodd · · Score: 1
    I have had to use ODBC in the past, and I found it quite inferior to more mature and sophisticated methods like static SQL or even dynamic SQL. It is just so much easier to write EXEC SQL SELECT * FROM MYTABLE than have to wrestle with the ODBC commands to do the same thing. In addition, I've found that ODBC programs are significantly slower than a bound static SQL program.

    On the other hand, ODBC does adhere to the SQL standard in terms of the actual query language, which is nice for a change. I would have expected Microsoft to use something like FoxPro's old query language.

    That said, I do not like ODBC.

    --
    --jon. Postel is dead. May we all mourn his, and our, loss.
  89. Even if you don't want to use xfstt... by jerodd · · Score: 1
    You can still get pages that insist on using Arial to render correctly by editing your fonts.dir file (usually in /usr/lib/X11/fonts/75dpi) and simply taking a font you like (such as Century Schoolbook or Helvetica), duplicating those lines, and changing the name to Arial.
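
    The duplication step can be scripted. Below is a minimal sketch, assuming the usual fonts.dir layout (an entry count on the first line, then one "filename font-name" pair per line) and lower-case family names; the sample entries are made up. The function name `alias_family` is hypothetical, and the X server would still need to reread its font path afterwards (for example with `xset fp rehash`).

```python
def alias_family(fonts_dir_text: str, source: str, alias: str) -> str:
    """Duplicate every entry of family `source` under the family name `alias`."""
    lines = fonts_dir_text.splitlines()
    entries = lines[1:]                      # line 0 holds the entry count
    needle = f"-{source}-"
    extra = []
    for entry in entries:
        filename, font_name = entry.split(" ", 1)
        if needle in font_name:
            # Same font file, advertised a second time under the alias family.
            extra.append(f"{filename} {font_name.replace(needle, f'-{alias}-', 1)}")
    all_entries = entries + extra
    # The count on the first line has to match the new number of entries.
    return "\n".join([str(len(all_entries))] + all_entries) + "\n"

# Made-up sample with two entries.
sample = (
    "2\n"
    "helvR12.pcf.gz -adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1\n"
    "courR12.pcf.gz -adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1\n"
)
result = alias_family(sample, "helvetica", "arial")
print(result)
```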

    xfstt also works nicely, and on Linux, it doesn't use very much memory, because it uses memory-mapped files efficiently.

    My port of xfstt to OS/2 (which you can get from hobbes.nmsu.edu when my website is down) isn't as efficient because OS/2 has no easy mmap(4) facility.

    TrueType fonts render so much more nicely than Adobe Type 1 fonts or Speedo fonts at low resolutions. Baskerville 8 point actually looks nice when done with TrueType. Of course, you only get this benefit if your TrueType fonts have hints in them--in other words, using a tool to convert from Type 1 to TrueType won't help. You can read more about ``hints'' on Microsoft's website; search for ``fonts''.

    --
    --jon. Postel is dead. May we all mourn his, and our, loss.
  90. OxymoronsI a little by CopiceC · · Score: 1

    I don't think anyone should claim impartial editorial opinion. It's an oxymoron, like "unbiased opinion". In fact it's a double oxymoron. "Impartial" with either "editorial" or "opinion" forms an oxymoron.

  91. Woolly character set control by CopiceC · · Score: 1

    Those of us who look at sites in languages other than English have far more annoyance with character sets. A large percentage of non-English pages don't specify the character set, so you have to keep telling the browser how to decode the pages. Don't the site designers ever browse their own sites? Maybe they only ever browse their own site, and have their default character set set to the appropriate value. It's just one more example of sloppiness that makes the Web a pain to browse, and a far from idiot-proof medium.
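
    What the declaration buys the reader can be shown in a few lines. This is a sketch, not how any particular browser works: it reads the charset from a Content-Type value with Python's standard `email.message.Message` and falls back to Latin-1 when none is given. The helper names and the single sample byte are made up for illustration.

```python
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset named in a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def decode_page(body: bytes, content_type: str, fallback: str = "iso-8859-1") -> str:
    # Without a declared charset the decoder can only guess; here it guesses Latin-1.
    return body.decode(declared_charset(content_type) or fallback)

body = b"\xbe"  # one byte whose meaning depends entirely on the character set

# Declared as ISO-8859-2 it is a z with caron; undeclared, the Latin-1 guess gives a fraction sign.
print(ascii(decode_page(body, "text/html; charset=ISO-8859-2")))
print(ascii(decode_page(body, "text/html")))
```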

  92. distinguishing between the sinner and the sin by phred · · Score: 1

    I don't belittle people who use Microsoft stuff for the Web, or otherwise for that matter. I'm
    basically a pretty satisfied user of NT 3.51 and only run 4.0 because it's needed for a lot of the
    new apps. But *I* am the boss of my machine, not Microsoft.

    What I observe is that people feel obligated to stick with Microsoft and the whole range of their software. They kind of know what they'll get --
    a lot of fairly complete but complicated pieces that are difficult to manage and offer far more functionality in a "flat" undifferentiated space
    than they really know what to do with.

    They then start to discover all the little gotchas, like Microsoft's shall we say cavalier attitude about the TCP/IP stack, their sincere but misguided attempt to steer HTML and so on.

    Actually, there is one thing Microsoft has done that I have few qualms with, and that is ODBC. Sure it's slow and not well documented. It's almost like Microsoft didn't really want it to be accepted so that they could clear the way for their various lunges at "distributed" "object" management (ho ho). But ODBC was designed by some good people at Redmond who played nice with the other kids, paid attention to existing standards and user preferences, and produced something that has joined CGI and Perl as a true Web standard.

    Sure, this is like Sun sponsoring Tcl. A pretty good analogy, that.

    I would rather raise awareness of the limitations of Microsoft stuff even as I help people work with it. It's on their desks, they need to do work, and they need to exchange stuff with other people.

    I just let them find out that I was running Perl scripts to manage Novell servers in 1993, and that there is a whole world of other alternatives out there. If they don't want Linux on their desktop -- and I don't want it there yet either -- I can have them gracefully accept it off in the corner, serving their offices by being a good-neighbor fence and conduit for the net. Plus it lets me use all the nice post-1993 486 machines that are orphaned out there.

    Microsoft is big and round and we can throw rocks at it and stamp our little feet, or we can work around them and help people get actual value from their computer systems -- a big anxiety out there about that, by the way, if you haven't noticed. At least, unlike ten years ago, people aren't putting them in the closet because they can't use them. Now they just get stuck when Win 95/98 crashes a lot and starts not running the programs they used to.

    Eventually we might even have Linux file and program servers running Samba over Fast Ethernet
    networks (which eliminate the perceived delay in transferring files across a normal office net), and thin Microsoft clients that have just enough
    local stuff to boot up and do basic things.

    Sure, Front Page is a dog, but people are using it successfully to create Web pages, and who am I to get in the way of that. We have a big educational process ahead of us, that's for sure. But we don't get anywhere by asking everyone to join our Microsoft Whine Club first.

    Just show 'em how it's done. They'll figure it out.


    --------

    --
    Bill Gates Is My Evil Twin.
  93. DeBillAlizer by BiGGO · · Score: 1

    It's really fun to have standards, isn't it?
    "One computer, one language - except we don't use it at all."

    I tell you what,
    Maybe let MS stop using HTML.
    They will use MSML which is only viewable with MS apps.
    IE 5 will use it and HTML,
    but if a website wants to be viewed with IE6, it will have to be MSML.

    It will have "the best license they can offer": GPF
    That means you can use it for free unless it presents a competition to them.

    not only that,
    the new IIS servers will check each file they are transferring to see if it's HTML compatible.
    If it is, it will automatically make it into MSML, for the users to enjoy it better!
    (and of course the IIS server will be 7 times faster)

    Next thing to do,
    is to create a new language for people,
    "MS-english" and they will talk in that only.
    It will be slower and sometimes people will die when using it and have to reboot themselves.

    Go Microsoft!
    Make our lives easier by making new and improved standards!!!

    --


    ---
    I'm going to live forever, or die in the attempt.
  94. Bill stole our web. by BiGGO · · Score: 1

    I do think, that instead of following MS, we need to attack them.
    We all use the MS compatible standards because everyone else does,
    and that's why we're losing the internet to them.

    They say HTML is now an MS thing; we say OK, let's quickly change Netscape to fit.
    But HTML was ours a long time before MS had it.

    We cannot afford to lose anything.

    I see people say that without Microsoft there would be no Internet!
    I see people thinking MS invented email!
    People that tell me that HTML was introduced by IE!

    We should not allow non-standard to win.
    We need to enforce the standard and take our internet back.

    Something such as a "warning: non-standard page"
    on every non-standard page should be required by law.
    The same goes for "This product does not generate HTML but our version of HTML".

    Standards do not exist for Microsoft to ruin,
    but for us to use.

    --


    ---
    I'm going to live forever, or die in the attempt.
  95. Good idea! by BiGGO · · Score: 1

    If you make such a thing,
    please post it to freshmeat or anywhere,
    I'd like to have it too... :-)

    --


    ---
    I'm going to live forever, or die in the attempt.
  96. Computers that _don't_ run Windows? Go awaaay... by mdingler · · Score: 1

    Yeah, that's the nice thing about having almost a monopoly, you can just forget cross-platform compatibility. Forget the fact that you should write your HTML in plain ASCII and escape special characters, surely the viewer has the same codepage... BTW, does someone know how to translate Arial into 'Helvetica' or at least tell Netscape not to use Courier as default font? Gee, it looks like I'd have to install a proxy or true-type server...

    --
    ...Michael...
  97. Arial vs. Helvetica, the battle continues... by mdingler · · Score: 1
    Yeah, that's the author's point of view. BTW, the font tag is deprecated in the current standard, so I'd just use style-sheets, but that's not the point.

    We're talking about stupid web authors who think the whole world is ruled by Microsoft and think that 'Arial' has existed since before time began and isn't just a rip-off of Helvetica to circumvent copyright-laws (like the whole True-Type stuff).

    Netscape 3.0 substituted the current proportional font when it didn't find the indicated one, but the 4.0 browsers just use Courier. Not quite nice, if you ask me...

    There's got to be a way to tell Netscape what font to use, but I didn't find anything in the Netscape.ad file. I guess you could tell X to map 'Arial' to 'Helvetica', but I seriously don't know how...

    --
    ...Michael...
  98. Poor M$ HTML!=Perl by Sircus · · Score: 1

    If someone's...

    1. Stupid enough to be using an M$ app to generate their HTML in the first place.

    2. Lazy enough not to worry about the bugs in it, and consequently their HTML.

    ...then, nice and accurate as it might be, I fear a Perl script won't be what they're looking for. A huggy, friendly, 4.5Mb Winbloze app with 15 options screens might just whet their appetite :-)

    --
    PenguiNet: the (shareware) Windows SSH client
  99. A story from real life... by Fellgus · · Score: 1

    I am currently developing some Web stuff for a small Internet company. The manager of this company has made a lot of websites, but never used anything but Microsoft FrontPage. Thus, he always talks about HTM documents and not HTML documents, because MSFP defaults to the .htm extension even though .html would work. When I confronted him with it, he just said, "But if it works, I don't see the problem!". - Hmm... If we didn't have Microsoft, we wouldn't have all these incompatibilities that other applications have to be aware of. Follow the standards.

    --

    -larsch

  100. Computers that _don't_ run Windows? Go awaaay... by Taral · · Score: 1

    Well, I run xfsft, and have all those Microsoft truetype fonts available in X :)

    --
    Taral

    WARN_(accel)("msg null; should hang here to be win compatible\n");
    -- WINE source code

  101. Isn't this a little partisan? by Vidar+Hokstad · · Score: 1
    Where have they claimed to be impartial?

    And why should they be impartial?

    It is interesting to see that in the last few decades, it has been increasingly touted that news sources should be impartial. It first affected newspapers with close affiliations with political parties, which in large numbers have claimed independence, and claimed to be impartial.

    It has spread out to all kinds of news sources.

    But a news source is NEVER impartial.

    It always to some extent reflects whatever the owners, and their appointed editors, want it to. No matter how much they try to pretend they're impartial, what they do will always have a slant depending on their view of the world.

    And claiming to be impartial just makes the bias harder to see. In that respect, claiming to be impartial does more harm than it does good: it makes a lot of people swallow biased news because they aren't aware of the political or commercial affiliations of the news source.

    If you want to look for an impartial news source, why don't you search for the fountain of youth while you're at it - you're just as likely to find either.

  102. nothing new by ozone · · Score: 1

    flashback to 1992: i'm trying to print a postscript document generated by m$word, and guess what? some printers hang, some mangle the document, few print it correctly. the problem was unique to micro$oft, i've never seen anyone else's postscript do this, ever.

  103. De-commoditized(tm) characters by ColonelPanic · · Score: 1
    Well, duh! Those de-commoditized(tm) characters and HTML are integrated(tm) features of Frontpage(tm). I will personally videotape a demonstration (well, a simulation - hehe) to prove that the Demoronizer program does not completely remove these features. No wait, that's perjury. I think I'll have a few of my dolts (I mean, employees) do all the dirty work instead. Then I'll send a senior exec to testify and say things like "Your honor, I think that's the truth, and..." Well, you get the point. I'm just so smart!

    - Bill G~++@_&^*&$3
    NO CARRIER

    --
    "Skill shows through where genius wears thin." -Wittgenstein || Religion: uniting aviation and architecture.
  104. Not just Microsoft by foo · · Score: 1

    Netscape on Windows platforms (and apparently on the Mac as well) is also following the Windows-1252 character set instead of ISO-8859-1. Therefore it is not just Microsoft products; Netscape is not following the standard either.

    This will increase our effort to educate the public :-(

    Whenever I see "Web Tips" columns saying you can put in quotes and dashes in a web page without resorting to Unicode or GIF, I write a letter to the author. Unfortunately, last time I did this the author replied, promised a correction, but the correction never showed up in print.