Domain: fourmilab.ch
Stories and comments across the archive that link to fourmilab.ch.
Comments · 750
-
Old News
Here is an article about the same sort of concept except with a solution from 1993.
-
emmett, you need this
-
Slashdot == Windows Kiddies?
From the article: "As if you couldn?t tell..." (Note the ? where the ' belongs) This is a trait of MS... maybe y'all should demoronize before you post...
~rwm -
Re:Say it ain't so!
demoroniser: DEMORONISER Correct Moronic Microsoft HTML
This page describes, in Unix manual page style, a Perl program available for downloading from this site which corrects numerous errors and incompatibilities in HTML generated by, or edited with, Microsoft applications. The demoroniser keeps you from looking dumber than a bag of dirt when your Web page is viewed by a user on a non-Microsoft platform.
NAME
demoroniser - correct moronic and gratuitously incompatible HTML generated by Microsoft applications
SYNOPSIS
demoroniser [ -u ] [ -w cols ] [ infile ] [ outfile ]
DESCRIPTION
Many slick, high profile corporate Web sites I visit seemed to exhibit terrible grammar completely inconsistent with the obvious investment in graphics and design. Apostrophes and quote marks were frequently omitted, and every couple of paragraphs words were run together which should have been separated by a punctuation mark of some kind.
This remained a mystery to me until I wanted to convert a presentation I'd developed in 1996 using Microsoft PowerPoint into a set of Web pages. A friend was kind enough to run the presentation through PowerPoint's "Save as HTML" feature (I have abandoned all use of Microsoft products, so I did not have a current version of PowerPoint which includes this feature). When I got the PowerPoint-generated HTML back and viewed it in my browser, I discovered that it contained precisely the same grammatical errors I'd noted on so many Web sites, and which certainly were not present in my original presentation.
A little detective work revealed that, as is usually the case when you encounter something shoddy in the vicinity of a computer, Microsoft incompetence and gratuitous incompatibility were to blame. Western language HTML documents are written in the ISO 8859-1 Latin-1 character set, with a specified set of escapes for special characters. Blithely ignoring this prescription, as usual, Microsoft use their own "extension" to Latin-1, in which a variety of characters which do not appear in Latin-1 are inserted in the range 0x82 through 0x95--this having the merit of being incompatible with both Latin-1 and Unicode, which reserve this region for additional control characters.
These characters include open and close single and double quotes, em and en dashes, an ellipsis and a variety of other things you've been dying for, such as a capital Y umlaut and a florin symbol. Well, okay, you say, if Microsoft want to have their own little incompatible character set, why not? Because it doesn't stop there--in their inimitable fashion (who would want to?)--they aggressively pollute the Web pages of unknowing and innocent victims worldwide with these characters, with the result that the owners of these pages look like semi-literate morons when their pages are viewed on non-Microsoft platforms (or on Microsoft platforms, for that matter, if the user has selected as the browser's font one of the many TrueType fonts which do not include the incompatible Microsoft characters).
You see, "state of the art" Microsoft Office applications sport a nifty feature called "smart quotes." (Rule of thumb--every time Microsoft use the word "smart," be on the lookout for something dumb). This feature is on by default in both Word and PowerPoint, and can be disabled only by finding the little box buried among the dozens of bewildering option panels these products contain. If enabled, and you type the string,
"Halt," he cried, "this is the police!"
"smart quotes" transforms the ASCII quote characters automatically into the incompatible Microsoft opening and closing quotes. ASCII single and double quotes are similarly transformed (even though ASCII already contains apostrophe and single open quote characters), and double hyphens are replaced by the incompatible em dash symbol. What other horrors occur, I know not. If the user notices this happening at all, their reaction might be "Thank you Billy-boy--that looks ever so much nicer," not knowing they've been set up to look like a moron to folks all over the world.
You see, when you export a document as text for hand-editing into HTML, or avail yourself of the "Save as HTML" features in newer versions of Office applications, these incompatible, Microsoft-specific characters remain in place. When viewed by a user on a non-Microsoft platform, they will not be displayed properly--most browsers seem to just drop them, as opposed to including a symbol indicating an undisplayable character. Hence, the apparently ungrammatical text, which the author of the page, editing on a Microsoft platform, will never be aware of.
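The character substitution described above can be sketched in a few lines. This is a minimal Python sketch, not the demoroniser's Perl code: the table name `CP1252_FIXES`, the function `replace_smart_chars`, and the particular ASCII replacements are assumptions made for illustration. The byte values are those of the Windows-1252 code page.

```python
# Hypothetical sketch: replace a few Windows-1252 "smart" characters
# with plain ASCII equivalents. Not taken from the demoroniser source.
CP1252_FIXES = {
    0x82: b",",    # low single quote
    0x84: b",,",   # low double quote
    0x85: b"...",  # ellipsis
    0x91: b"`",    # opening single quote
    0x92: b"'",    # closing single quote / apostrophe
    0x93: b'"',    # opening double quote
    0x94: b'"',    # closing double quote
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}


def replace_smart_chars(data: bytes) -> bytes:
    """Return data with the mapped bytes replaced; other bytes pass through."""
    out = bytearray()
    for byte in data:
        out += CP1252_FIXES.get(byte, bytes([byte]))
    return bytes(out)
```

Working on raw bytes rather than decoded text keeps the sketch independent of whatever encoding the rest of the file claims to use.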
Having no desire to hand-edit the HTML for a long presentation to correct a raft of Microsoft-induced incompatibilities, I wrote a Perl program, the demoroniser, to transform Microsoft's "junk HTML" into at least a starting point for something I'd consider presentable on my site. In addition to replacing the incompatible characters with HTML-compliant equivalents wherever possible (a few rarely-encountered characters which can't be translated result in warning messages if encountered), the following sloppy or downright wrong HTML is corrected.
- The missing semicolon at the end of numeric character escapes (for example, &#61 written without its closing semicolon) is supplied.
- Numeric renderings of special characters (&#60; &#62; &#38;) are replaced with readable equivalents (&lt; &gt; &amp;).
- Unquoted <table> tags containing non-alphanumeric characters are quoted.
- PowerPoint's mis-nesting of <font> and <strong> tags is corrected.
- PowerPoint's boneheaded use of <ul> and </ul> tags to accomplish paragraph breaks is corrected and the proper <p> tags inserted.
- Missing <tr> tags in text-only slides are inserted.
- Nugatory </p> tags are removed.
- Unmatched <li> tags in headings are removed.
- Idiot "paragraph-long lines" are broken into something suitable for editing with a normal text editor.
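The first two fixes in the list above lend themselves to a short illustration. The following is a rough Python sketch under stated assumptions, not the demoroniser's own Perl: the function name `fix_numeric_escapes` and the `READABLE` table are hypothetical, and the regular expressions are only one way such a repair might be written.

```python
import re

# Hypothetical table: numeric codes for < > & and their named entities.
READABLE = {"60": "&lt;", "62": "&gt;", "38": "&amp;"}


def fix_numeric_escapes(html: str) -> str:
    """Supply missing semicolons, then use named entities for < > &."""
    # Add a semicolon when a numeric escape is not followed by one.
    html = re.sub(r"&#(\d+)(?![\d;])", r"&#\1;", html)
    # Replace the numeric forms of < > & with readable equivalents.
    return re.sub(r"&#(60|62|38);", lambda m: READABLE[m.group(1)], html)
```

For example, `fix_numeric_escapes("a &#61 b")` yields `"a &#61; b"`, while an escape that already ends in a semicolon is left alone.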
-w cols
Wrap output lines at column cols. By default, lines are wrapped at column 72. A cols specification of 0 disables line wrapping. demoroniser attempts to wrap lines so as to preserve their meaning. Lines are broken at white space whenever possible. If this cannot be done, a line longer than the cols specification will remain in the output HTML.
BUGS
demoroniser is a Perl script. In order to use it, you must have Perl installed on your system. demoroniser was developed using Perl 4.0, patch level 36.
FILES
If no outfile is specified, output is written to standard output. If no infile is specified, input is read from standard input.
SEE ALSO
perl(1)
Download demoroniser.zip
AUTHOR
John Walker
http://www.fourmilab.ch/
This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided "as is" without express or implied warranty.
by John Walker
January 16th, 1998 -
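The -w cols behaviour described in the man page quoted above (break at white space, leave an over-long word intact, 0 disables wrapping) can be approximated with Python's standard `textwrap` module. This is only a sketch of the described behaviour, not a port of the Perl script; the function name `wrap_html_line` is an assumption.

```python
import textwrap


def wrap_html_line(line: str, cols: int = 72) -> str:
    """Wrap one long line at white space; cols == 0 disables wrapping."""
    if cols == 0:
        return line
    # Never split inside a word: a word longer than cols stays whole,
    # so the resulting line may exceed cols, as the man page describes.
    return textwrap.fill(
        line,
        width=cols,
        break_long_words=False,
        break_on_hyphens=False,
    )
```

Note that `textwrap.fill` treats HTML as ordinary text, so it may break inside a tag where white space occurs; a real tool would likely need to be more careful there.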
Cold Fusion Weapon ... an old idea
I thought up the conspiracy theory...and have found I'm not alone. Nuclear Pipe Bomb anybody?
--Mike-- -
Re:Linux and commercial software
If Autodesk would port AutoCAD to Linux I'd buy it so fast it would make your head spin. And AutoCAD costs $2500 on the street.
But I suspect, though of course I haven't any hard evidence at all, that Autodesk has a private arrangement with Microsoft. At that time AutoCAD was supported on MS-DOS, Macintosh, and several flavors of Unix. But five or six years later that list of platforms had been whittled down to one: Win32. Just recently Microsoft assimilated Visio, but in the process, for some mysterious reason or another, as part of the deal Visio spun off its Intellicad division, which had been selling a clone of AutoCAD for about a tenth of AutoCAD's price. Now why did they do that? I'm just guessing, but I think that Microsoft long ago agreed not to gobble up Autodesk as an appetizer, as they pretty obviously have been able to do for the last decade, in return for Autodesk dropping all other platforms.
And I also suspect that it's lots of under-the-table business like this, rather than some impalpable attitude problem amongst Linux users with regard to commercial software, that explains why so few vendors of commercial software for Win32 are willing to port their products to Linux. If this is only a paranoid fantasy, then it's one I share with Judge Jackson and the DOJ antitrust division.
Yours WDK - WKiernan@concentric.net
-
Re:Very Likely This is IMPOSSIBLE
All one has to do is realize that the pads numbers can not be purely random.
Look at The Hotbits RNG: genuine random numbers, generated by radioactive decay. Government agencies' budgets are in billions of pounds/dollars. I think they could stretch to something like this. Or they could be using a die. They could have a guy throw a die and read out the number that comes up. Or they might have a big bucket of numbered balls they shake about and then take a ball out at random. If you can predict that, you should have no trouble telling me next week's lottery numbers.
When you factor in the observation that if a random data set contains fewer numbers than it has possible combinations it will not show a pattern, it is fair to assume the data is random, or at least the Real pads are random.
Just my thoughts.
Michael Tandy -
Blame Univac
Hahahahahaah! You Andover folks are more 1337 than I thought. Not only do you have uber-hacker John Walker on your team, you're running the site on a Univac 1107 -- say, you have any of those old 2 1/4 ton 100MB hard disks?
-
Re:Blame Canada
Exodus is getting $1 million/year from us so they let us do whatever we want. The only thing they won't let us do is take a picture of our cage -- no cameras allowed anywhere in the facility! I guess they're afraid we're going to steal their soul. We were able to smuggle out this picture of PatG, PatL, Martin, and the Arrowpoint rep. Behind them you can see the current Slashdot setup.
-
More on Zvezda
John Walker's Website contains information on the Salyut 3 (Almaz) station that apparently carried an automatic cannon (like on a MiG fighter). Cosmonaut Pavel Popovich (the ?4?th person to fly in space) claims he flew on the station with the weapon aboard, but it was only fired once, as a test, when no cosmonauts were on board. They had to hook the station's thrusters into the controls so they would fire simultaneously to offset the thrust of the gun. Weird stuff, eh? Strange but (apparently) true.
-
Re:Good God
Here's a link to the company that is using this patent. They actually have some products, even. If you just want to try this stuff out, I would recommend checking out the RPKP Project.
-
Analytical Engine Emulator
Here's a page full of links and info on not only Babbage and his engine but also emulation of his Analytical Engine. There are also links to download the source for the mathematical function library and the java.
-
Re:Everybody take a breather
> You want to run Linux, that?s fine. I like running Windows 2000. You wanna use Netscape fine. I like running IE 5 - it has a tendency not to crash my system like Netscape. We both have a choice.
My system (Linux) has a tendency not to crash at all, no matter what Web browser I happen to be running. It also has a tendency not to litter the Web with broken HTML. (By contrast, your broken browser/OS, which you used to post your comments, gives the rest of us the gift of non-standard "smart quote" codes every time you use quotes or apostrophes.)
Such considerations might prove helpful in our rational thinking about what would happen if Microsoft were shut down.
--
-
Re:MS Locks IE to EVERYTHING
The line is fuzzier than I first thought.
Nonetheless, the decision was business, not technology. They did this to work around the consent-decree against bundling, though how 'integration into the OS' isn't 'bundling with the OS' is beyond me. A classic example of Microspeak.
The threat is that if they achieve true browser dominance, they can leverage it to achieve web server/service dominance, just as they have leveraged desktop dominance to achieve (or nearly achieve) browser dominance. They have no compunction in using their 'embrace, extend, destroy' schemes on any standards that get in their way (witness Java; also, see http://www.fourmilab.ch/webtools/demoroniser/).
So the fact that they have a plausible-sounding explanation for locking IE into everything is no reason for us not to oppose/forbid it. If they are forced into some slightly less than optimal code resharing in order to protect us from their monopoly tactics, so be it.
They have acted in bad faith many times. We are not obligated to give them the benefit of the doubt anymore. -
Univac example
In the early '70s, the U. of Maryland had a timesharing Univac 1106 for most of its compute power. The story I heard was this:
When Univac built the 1108, they priced it to recover their engineering costs with relatively low sales levels (read: high cost).
It turned out to be much more successful than their forecasts predicted, so the fixed costs were amortized quickly, and they could drop the price and sell lots more at a still-tidy profit. However, they didn't want to rile those who'd paid the big bucks. So, they introduced the 1106---same machine, but slower clock speed. Lots of folks bought it, thought it was neat, and Univac was happy.
Of course, the time came (sooner rather than later) when someone poked around enough to notice that the only difference was a resistor and a crystal. So, after a trip to Radio Shack with a few bucks, your low-budget machine would run just as well as the high-priced spread: double the speed. Lots of those got used at schools and anywhere else hackers congregated, until Univac cheapened enough components to make the 1106 a truly slow machine.
See note 6 to this document for some corroborative detail, but a quick search reveals nothing else on the web, durnit.
-
Re:*sniff* they had 17K in the late 1950s
UNIVAC then went on to build, in the 1970s, the biggest drum memory device of all time, the FASTRAND. 90 megabytes. 880 RPM. This monster had two drums six feet long, and weighed two and a half tons. Special cast concrete mounting pads were required.
UNIVAC stayed with drums longer than any other vendor, but the FASTRAND was just too much. They bought a disk manufacturer (ISS) and went to disks. So the FASTRAND ended the drum era.
-
Re:You know slashdot is going downhill...
to be fair, the slashdot guys didn't put those in, the guy who posted the story did. but mayb slashdot should have some way to fix that
It's called the demoroniser. It's been around for quite a while, so I'm a little surprised to see that the Slashdot folks don't use it to clean up messy input.... -
Re:demoroniser, anyone?
Yes... the quotes above display properly under Linux/Netscape.
There's more to the problem than a simple matter of supporting smart quotes in the font sets on UNIX, however - try to view some of the same MS-generated HTML when using TrueType fonts on Windows and MacOS for example; you will have the same problem. Better yet, just follow the link and read the author's reasoning. I tend to agree with him, as you might imagine.
The real problem is that ASCII is ASCII and Unicode is Unicode. You extend either one at the peril of alienating those users who do not use software which supports the "solution". Real, honest-to-God, Open Standards should be adhered to in situations such as these, IMHO.
-
demoroniser, anyone?
Attention all MS employees/sycophants/victims:
Please run your Word(tm) mangled HTML through demoroniser before submitting your posts to /. Otherwise, it makes it damn hard for the rest of us to follow your questions.
Thank you.
-
Re:what, no ISO-standard character set?
I agree! Jon K's misuse of the character set is a crime.
But don't lay the blame on UNIX boxes. On this unix box, the apostrophes are superscript `1's, which is a refreshing change from the usual question marks, at least. It is most likely a faulty translation to HTML by a Microsoft product.
Jon knows about this problem. I have personally mailed him twice about it. I even sent him the URL for the de-moronizer.
Jon has known about this problem since at least fall 1998. I first mailed him about it last June. Yet he refuses to fix it.
Why Jon? Does the demoronizer not work for you? Too much trouble to use?
-
Re:What's up with question marks?
The linuxnewbie.org web page has the ? problem. All the single quotes appear as question marks. This happens if the web page was edited using any Micro$oft software, because M$ once again decided to bastardize the standard (unicode in this case).
For anyone interested, there is a solution to this problem. There's a public domain program called demoroniser that parses html and cuts out Microsoft's bogus unicode mangling. It's conveniently written in Perl, so you can modify it if you don't particularly like the way that the original author decided to correct MS's mistakes.
-
Mirror this side of the pond
I always thought the Fourmilab was in Switzerland
;-) -
Internet Explorer 5 still has a Y2K bug?
Check this out:
http://www.fourmilab.ch/documents/calendar/
In my copy of Netscape Communicator 4.7, the current date is filled in correctly. In my copy of Internet Explorer 5.0, the date is filled in with the year 3900. Could it be a bug in getYear()? Microsoft's support site says I am up-to-date with patches. And I haven't seen any company put out more Y2K patches than Microsoft.
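One plausible explanation, offered as an assumption rather than something confirmed from the page's source: browsers of that era disagreed about getYear(), with some returning the year minus 1900 (100 for the year 2000) and others returning the full four-digit year, so a script that always adds 1900 would show 3900 in the latter. The Python sketch below only illustrates that arithmetic; `normalise_get_year` is a hypothetical helper, not code from the calendar page.

```python
def normalise_get_year(raw: int) -> int:
    """Turn a getYear()-style value into a four-digit year."""
    # Values below 1900 are treated as offsets from 1900 (100 -> 2000);
    # anything else is assumed to be a full year already.
    return raw + 1900 if raw < 1900 else raw


# A script that blindly adds 1900 to a full-year result:
naive_year = 2000 + 1900  # 3900, the year reported in the comment
```

Checking the range before adding 1900 gives the same answer under either convention.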
-
Patents as a strong defense
I believe that if you have a patent you're required to enforce it and protect it against violations and other infringements. Otherwise, you may be deemed to have surrendered or otherwise nullified it. That doesn't mean that they couldn't have licensed it to Barnes and Noble for a token sum, of course, but I fail to see why they'd want to.
A point made by John Walker (founder of Autodesk) in his Autodesk File (North American mirror) is that software companies are regrettably low on patents when compared to industrial or hardware companies of similar size. These patents are used defensively, in a cross-licensing scheme, if a violation is made.
Consider this example: company A uses technology possibly patented by company B. Company B sues. The lawyers will work out a deal where company B is licensed technologies of equal value from company A's patent portfolio - it may go all the way to a full exchange of licenses for all marketable technologies from both companies. Intel and Digital did this relatively recently.
The problem is that if company A doesn't have a strong or viable patent portfolio, it cannot protect itself against patent infringement suits. It may be required to actually shell out cash to settle a suit, which is against the interest of the shareholders (and may lead to the sacking of the management, besides).
While Bezos may be the largest single shareholder, he isn't the only one, and his share will decrease over time. Not to mention that he probably has no desire to lose his shirt in the short term, either.
-- -
Oceania has *always* been at war with EastAsia!
Last fall one of our local Democrat Congresscritters, either Anna Eshoo or Zoe Lofgren, commented that the Republicans had discovered that there were still *Commies* in *China*, and they'd be trying to make a big deal about it once they were done embarrassing themselves with Monicagate. ["Oh, boy - *Commies* outside of Berkeley and Hahvahd! We haven't known what to do since the Evil Empire fell!"] Always helps to have an occasional Yellow Commie Spy around when you don't have an Arab Terrorist Mastermind handy, and it's so convenient to have Foreigners giving money to AlGore in violation of the campaign finance laws the Demos are trying to embarrass you into passing. (Personally, I think the First Amendment is a good campaign finance law, but we're talking about Republicans who don't like level playing fields any more than Democrats do.)
So BE AFRAID! BE VERY AFRAID!
- Commies under the bed!
- Terrorism and Extremism Must Be Stomped Out!
- Latino NarcoCryptographers who didn't go to the School Of The Americas!
- and some spare UnAmerican foreigners (the Republicans don't have Pat Buchanan in their primaries, but may have to deal with him if he somehow tricks the Reform Party into nominating him.)
- Commies under the bed!
-
Re:X-files?
>Y2K movie, or new X-Files?
This week's X-Files was just Brain-Eating Monster Of The Week. But the Preview of NeXT Week'S X-Files has them meeting Frank Black of Millennium, Chris Carter's Other Show!
John Walker's Millenium Screen Saver -
there is a secure (internet) phone for Linux
It's called Speakfreely and you have your choice of encryptions. And it actually runs on different platforms (so you can talk to Win*-users too).
More can be found at the author's site here
Roland -
Re:AutoCAD was originally a unix app, time for ret
Wrong.
Read this. The Autodesk Files by John Walker, one of the company's founders.
It's an interesting look at corporate culture and the history of computing in the '80s regardless of your interest in CAD/CAM.
k. -
A few things
"it seems like a year ago, a big ol? long year"
"ol?"? oh Jon, Jon, you were gaining such credibility, then you post something with the tell-tale stigmata of Microsoft.
Be a good lad and buzz it through the Demoronizer first. That way you'll get away with it.
cheers
jsm -
Re: Linux sound support
"Lucky to get it working on anything apart from a particular spec of machine"?
If you've got a sound app working in one place using the standard OSSLite subsystem, it should work everywhere (including on the systems of those enlightened folks using ALSA, which is a far better system, for their sound). Look at Speak Freely -- I can't see ViaVoice requiring many sound system features it doesn't support, and it works quite well. -
Re:Audio/video tools
Well, there is something called speakfreely which exists both for unix and windos. If it is compatible with anything else I don't know.
Here is the unix version and windos version. -
MS-Demoroniser
Katz forgot to use the Demoroniser, the premier tool for correcting moronic Micros~1 HTML.
-
Re:Babbage's analytical engine
The analytical engine was never built. Babbage's difference engine, a much more primitive special purpose machine, was built several times, even during his lifetime.
--
-
Re:The book has an essential flaw
The book has at least one essential flaw. The first working, fully programmable general purpose computer was Konrad Zuse's Z3 (Germany, 1941).
The Z3 was not fully programmable: Zuse forgot about the Jump and If-then-else instructions. No loops possible, oops. If you wanted the Z3 to do lots of calculations, you had to feed it a long program on punched tape :-) A German description of the Z3's architecture is here.
The first fully programmable computer was Babbage's Analytical Engine. Steam power rules! Of course, it was never built. Vaporware rules twice! By the way, his GF Ada Lovelace invented the loop. Chicks rule thrice!
The first operational fully programmable computer was ENIAC (you programmed it by replugging cables). Colossus was a secret special purpose cracking machine for the German Lorenz code and had limited programmability.
--
-
Did you notice the Microsoft brain-damage?
The Unisys web page contains instances of question marks where apostrophes should be. This indicates that some Microsoft program was used to construct the HTML. Not only are they assholes, but they look like morons too.
-
It's a small nit, but someone's gotta pick it
Interesting article, but you've got those annoying question marks again. Might I suggest The Demoroniser.
---- -
Hardly a new idea
As an idea, it's over six years old
-
Fix your HTML...
http://www.fourmilab.ch/webtools/demoroniser
Please... I can?t stand reading HTML with a bunch of ?question marks? littering it. It?s very annoying, don?t you think?
-
Anti-Microsoft Rant
It's amusing to find an Anti-Microsoft Rant in the middle of a book on weight loss. Can you find similar monologues in other books, hopefully in the Introduction rather than the middle of a chapter? Is this kind of thing even appropriate in a book like this, no matter how true?
-
speakfreely - cross-platform telephony available
here's the link
This FREE software is available on *many* platforms.
(Video support is still in the works, I think.) -
Re:And of course, it's non-free
There is already work in progress to build an Open Source solution.
Examples are Speak Freely, Nautilus and Whisper. Check it out and improve it!
Further information might become available under www.linuxtelephony.com or linuxtelephony.org
so long
... -
no, John uses M$ word or make M$-HTML
no, John uses M$ word or make M$-HTML.
Now does this mean that John is working for M$? No, just that he must not care that he is making BAD HTML, 'cause there could be no way he wouldn't know that MOST M$ software isn't friendly to non-M$ software, and write for /.
Now John, is it that hard to use some other software? Or at least use the DEMORONISER. Now don't get me wrong, I like John, well, most of the time... but this M$-HTML is a bad thing. Well, time to get some work done...
nmarshall
#include "standard_disclaimer.h"
R.U. SIRIUS: THE ONLY POSSIBLE RESPONSE -
Re:ZDNet Questionable Reporting
No, it is not the question mark (Information) icon. In this article it is nonstandard characters which my browser shows as question marks around those words. Obviously quotation marks are intended, but quotation mark characters or encoding were not used.
These pages explain this MS problem:
-
funky HTML
What's with all the question marks where there should be apostrophes? Looks like Katz needs the demoroniser. I don't tend to be critical of others but this article is a complete waste of space, as are most of Katz's other articles. Like we need someone to tell us what we already know, and do a piss poor job at that. It just comes off as a feeble attempt to score brownie points with computer geeks/nerds. He's trying to enter a "gift culture" by handing out last year's fruitcake in a new wrapper. Try coming up with something original and then maybe people would respect you. If I wanted rehashed crap I'd go back to high school.
-
Re: Physics of Immortality
-
Been done before
Check out John Walker's web site at:
http://www.fourmilab.ch/cship/cship.html
He started posting this stuff a couple of years ago. I believe he has made the resources available. It's a wonderful site.