Re:DOM is hell. on DOM Scripting · Score: 3, Insightful
No-one in their right mind writes code like that.
Generally, the only reason to use DOM for output is when you need to be able to reuse the generated node tree in some other local function, and it is too expensive to use a DOM parser to read in XML data. Client-side in a browser, it is just about *never* too expensive to use a parser instead of code like yours.
When people do generate DOM like that, the calls are generally more spread out inside application code, where functions receive a parent node to append children to.
Also, why would anyone create 2 <li> elements in inline code? It seems much more likely that someone would be looping over a result set, creating 1 <li> per loop.
You're looking at DOM all wrong. DOM is amazing for scanning through a parsed XML tree. Creating the output tree can be done in a hundred uninteresting ways.
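To make that concrete, here's a minimal sketch in Python. I'm using the standard library's xml.dom.minidom only because it exposes the same DOM calls you'd use in a browser; the helper names append_items and build_list are made up for illustration. One function receives a parent node and appends one <li> per row in a loop:

```python
from xml.dom.minidom import getDOMImplementation


def append_items(parent, rows):
    """Append one <li> child to `parent` for every entry in `rows`."""
    doc = parent.ownerDocument
    for row in rows:
        li = doc.createElement("li")
        li.appendChild(doc.createTextNode(row))
        parent.appendChild(li)
    return parent


def build_list(rows):
    """Create a <ul> document and fill it from a result set (hypothetical helper)."""
    doc = getDOMImplementation().createDocument(None, "ul", None)
    append_items(doc.documentElement, rows)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(build_list(["a", "b"]))  # prints <ul><li>a</li><li>b</li></ul>
```

The point is only that the create/append calls live in one small loop that takes the parent as an argument, rather than being written out inline per element.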
I'd like to take issue with an idea that I caught glimpses of in the earliest authors, and which one man then thrust into the spotlight:
ARNOLD TREHUB Psychologist, University of Massachusetts, Amherst; Author, The Cognitive Brain
Modern science is a product of biology
The entire conceptual edifice of modern science is a product of biology. Even the most basic and profound ideas of science -- think relativity, quantum theory, the theory of evolution -- are generated and necessarily limited by the particular capacities of our human biology. This implies that the content and scope of scientific knowledge is not open-ended.
Wow. Only a psychologist would come up with an idea like this. It's clearly a straw-man argument. The simpler version we've all heard for years: if a tree falls in the forest and no one is around to hear it, does it make a sound? The answer is of course it does. The weight of the tree crashing against the ground via the force of gravity sends a shockwave through the air. Whether or not a person is in range of the shockwave is completely irrelevant.
This is the highest form of hubris: the idea that it takes people/intelligence for quantifications to have meaning. Bullshit.
Take a universe exactly like ours in every respect with the very minor alteration that life never got started on Earth. Well guess what? It still takes a minimum threshold of matter to condense and form a burning star. The label we've given to that threshold is nothing; a mere convenience. The really important fact is that matter *can* condense into a burning star, and it will do so even if there are no humans around to pontificate.
I was just idly hitting alt-x (random article) on Wikipedia last week and I came across this great page.
It reminded me why I got into computer programming in the first place. D&D modules were the 'software' of games.
I'm not sure kids playing today have this same experience. It seemed to me for a long time that modern D&D adventures were played in cheap card games (Magic: The Gathering) and in RPG computer/console games.
It's great to hear that, far from being dead and gone, D&D is actually still a great pastime. Now if we could just get WOTC to hire Gary Gygax...
"The source code has not been released yet. The winners will be notified by EMail soon. They will be given a chance to review the write-up of their entry. Once this process is complete the source code will be made available on the winning entries web page. We anticipate that this will be in mid-December."
Can I buy some pot from these moderators and the parent poster?
If I have an (otherwise proprietary) web application that makes a call to a GPL3'd grep command then I'd have to distribute grep to people if they asked.
No. No, no and no! First of all, a) the current requirement is a linking requirement - if you linked your application to libGrep (is there such a thing?) you would be responsible for distributing *your* code under the GPL; and b) the discussion here is about a proposal, still under consideration, to extend that GPL requirement to applications that use the output of a GPL'd program. The only way you'd have to redistribute grep is if you forked the grep project for some reason and made your own modifications. And then you'd only have to give the source code to someone in a reasonable format *if they asked*.
GPL3'd applications that aren't web-apps won't suddenly require distribution if they are used in a web-app, only applications coded with such use and distribution in mind will.
This was never a requirement. We're talking about *your* code which wraps the output of GPL'd programs - not someone else's code that you must redistribute. Wherever did you get that idea?
That's true, but I must point out here that this interoperability matters less these days than it used to. There are nice bindings for HTML Tidy for all the popular server-side languages. The client side is even starting to look better, with (mostly) complete DOM support everywhere. Browsers of course must present the HTML page as a DOM, hence every browser is a fairly clean HTML parser with the same API you'd use for XML.
You know what's really funny about your rant against XML right after complaining that search engines won't be able to index? This Google tool gives you *exact* control over what is indexed on your site. You provide Google with data in guess what format?
The future is XML for data and HTML/DHTML/CSS for presentation. Plain HTML circa 1995 was the big hackish joke. The architecture of the web is coming of age.
You know what I find especially lame about the now infamous "NOOOOOOOO!!!" scene? It was delivered by James Earl Jones, not Hayden Christensen. James of course was responsible for making Darth Vader such a badass in Ep. 4-6. The fact that a distinguished professional like him could have delivered such a horrid, stinky scene is highly disappointing.
Oh well, personally I'll just continue to enjoy those 3 great original movies and ignore the latest 3 stinkers.
I'm hoping for XULRunner to come out so we can start dev'ing our XUL apps:D
Umm, waiting? You can build it now, you know. I have actually built an administration tool for a client using a custom-built XULRunner with SVG support.
As others have pointed out, that money would be much better spent on other, actual scientific work. Why not just give NASA the cash and allow them to prioritize their own work? Or do you really think George and Co. are more qualified to do so?
Look, this is a simple ploy by Bush to not look like a complete asshole in the eyes of history. I sincerely hope it will not work.
SlashHack is a cool example of an app written on top of the Mozilla platform.
The article is correct: Firefox (really Moz, as others pointed out) is a fantastic development platform.
The technology is especially cool for me: I wrote a system in 2000 for a client that positions Java Swing widgets using XML, so that the app could support pluggable skins. I view XUL as the ultimate application of that architecture. A fantastic decoupling of logic and presentation.
Mozilla parts (Firefox and Thunderbird) are under the Netscape Public License
I hate to be pedantic (well, ok, no I don't, this is Slashdot...) but Mozilla is now released under the MPL, the Mozilla Public License. The NPL is considered a "historic document". Grok.
I won't go into why validation is important; others have covered that well. Instead, here are just a couple of thoughts from the trenches.
Compliance is simple when you have full control over the site and all data that is input. In business reality, this is impractical. Fact is, editors, salespeople and the CEO will want to make website changes. Now, you could make it your job to clean up the garbage HTML sent by these folks. What you'll figure out quickly, though, is that suicide is a more attractive option. So of course we now have content management systems, so our bosses et al. can change what they like at 2 in the morning.
The thing is, these people will do horrible, horrible things. They will paste the most evil non-ASCII characters you could ever imagine into your lovely system. If you've only (gasp) given them a textarea in which to paste HTML, things are even uglier - they will paste the worst hackjob code you can imagine into there. Or worse, they'll paste the output of the MS Word HTML export (yikes!). So now what you have is a lovely framework / skin for your site with pristine tags for navigation and advertisements, with a nice steaming heap of dog doo in the middle of it.
So now you're not compliant. Not because of anything you did directly yourself, but because you just handed the keys to the kingdom over to the village idiot.
Here's how I deal...
1) DON'T ALLOW FOREIGN HTML. This can easily be achieved if your CMS provides an in-page HTML editor which produces valid code. You may be able to upgrade an existing CMS with something like "HTMLArea": http://sourceforge.net/projects/itools-htmlarea which is a replacement for a textarea tag.
Failing #1,
2) Run W3C's HTML Tidy (as others have mentioned) on the INPUT to your CMS. Give the jackass a preview. If it's borked, they'll try to fix it or call you if they really can't do it.
Happy webmastering!
--graveyhead
(cred)
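For the "check the INPUT and show a preview" step, here's a rough sketch of the idea. To be clear, this is NOT Tidy: it's a crude stand-in written with nothing but Python's standard html.parser, and the names NestingChecker and check_fragment are hypothetical. It only catches tag-nesting problems (and is stricter than HTML itself, e.g. it wants every <li> closed), but it shows where such a check would sit before anything gets saved:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Collects tag-nesting problems found in a pasted HTML fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.problems = []   # messages to show next to the preview

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"</{tag}> has no matching opening tag")
            return
        # Anything opened after the matching tag was left unclosed.
        while self.open_tags[-1] != tag:
            self.problems.append(f"<{self.open_tags.pop()}> is never closed")
        self.open_tags.pop()


def check_fragment(html):
    """Return a list of nesting problems; an empty list means it looks sane."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    for tag in reversed(checker.open_tags):
        checker.problems.append(f"<{tag}> is never closed")
    return checker.problems
```

In a CMS you'd call something like check_fragment on the submitted text and render the returned messages alongside the preview; a real deployment would hand the input to Tidy instead, which also repairs what it finds.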
valgrind -v ./myapp [args]
It gives you massive amounts of great information about the memory usage of your program.
The other day I spent nearly 3 hours trying to decode what was happening from walking the backtrace in gdb. Couldn't for the life of me figure out what was happening. Valgrind figured out the problem on the first run and after that, I had a solution in a few minutes.
Highly recommended software, and installed by default on several distributions, AFAIK.
Enjoy!
<offtopic>Man, I can't believe I'm giving up the chance to moderate CmdrTaco, but here we go anyway ;)</offtopic>
Check out Beatles-Beatles' user page. Of the last 20 submitted articles, 18 of them were submitted by ScuttleMonkey, one by Taco, and one by samzenpus.
Now this could just be a statistical divergence, but somehow I doubt it. I think ScuttleMonkey has some 'splaining to do...
End rant.
Thanks for the moderator tip
(sorry, couldn't resist)
Right at the top of the page it says:
Can I buy some pot from these moderators and the parent poster?
Yes, toasters do not rebel. They just annoy :)
Toaster: You know the last time you had toast? 18 days ago. 11:36, Tuesday the 3rd. Two rounds.
Lister: Ssshhh!
Toaster: I mean, what's the point of buying a toaster with artificial intelligence if you don't like toast?
Lister: I do like toast!
Toaster: I mean, this is my job! This is cruel! Just cruel!
Lister: Look, I'm busy!
Toaster: Oh, you're not busy eating toast, are you?
Lister: I don't want any!!
Toaster: I mean, the whole purpose of my existence is to serve you with hot, buttered, scrummy toast. If you don't want any, then my existence is meaningless.
Lister: Good.
Toaster: I toast, therefore I am.
Lister: Will you shut up?!
I wonder if he's being forced to say that by one of these guys who is secretly building a giant bomb on top of a time-fissure in Cardiff!
Nah, that couldn't happen. It's about as likely as, hmm, a Doctor Who spinoff series starring a bisexual army captain. Oh wait, never mind.
I entirely agree with your post, but I just have a minor nitpick:
An XML document can be well-formed and make it through the parser stage, but be invalid from the point of view of the schema. 'Validation' is the process of comparing a document against a schema or DTD and checking that a) tags and text are nested properly and b) attribute values are legal. What you're thinking of here is actually a parser error, not a validation one.
XML parser libraries usually offer both a "validating parser" and a "non-validating parser". The default is usually the non-validating one, so you can just invent tags on the fly and the parser won't choke, because no validation ever happens.
Like I said, just a nitpick. Good point though.
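If you want to see the difference for yourself, here's a small sketch using Python's standard xml.etree.ElementTree, which is a non-validating parser. The helper name is_well_formed is mine, purely for illustration. Invented tags sail through; only broken nesting produces an error, and that error comes from the parser, not from any validation step:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML at all (no schema or DTD involved)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Tags nobody ever declared anywhere: fine for a non-validating parser.
invented = "<madeup><whatever/></madeup>"

# Overlapping tags: rejected by the parser itself, before validation
# could even begin.
overlapping = "<a><b></a></b>"
```

Checking a document against a schema or DTD would need a validating parser on top of this, which the standard library module used here does not provide.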
Possibly the only true statement in your post.
Bad bad slashdot for modding up this tripe.
As long as the HTML isn't seriously screwy, these systems make it just as easy to read HTML as XML. Even if the HTML is really screwed up, these tools will make a best guess anyway, and will occasionally work as expected in the extreme case.
Even so, I'm more of an advocate of everyone publishing their data in a custom XML format (as long as you don't go changing it on me!). Then, it's easy to grab whatever you want off the web using a quick get/parse, and transform it or dig down into it using XSLT / XPath. The semantics of XML tags are so much smarter. Example:
In HTML:
<ol>
<li>vanilla ice-cream</li>
<li>chocolate</li>
<li>whip-cream</li>
<li>cherry</li>
</ol>
In XML:
<recipe id='graveyhead-sunday'>
<ingredients>
<li>vanilla ice-cream</li>
<li>chocolate</li>
<li>whip-cream</li>
<li>cherry</li>
</ingredients>
</recipe>
Now, let's say you're writing some code to dig into this structure and get the ingredients list. Let's say that this is the fourth ordered list tag on the page. The XPath code for the HTML version looks something like this:
//ol[4]/li
or, if you're lucky, something like this may work:
Now, in XML, the same XPath expression looks more like this:
Admittedly it's more verbose. It's also *much* easier to read and maintain.
Anyhow, that's just my 2 pennies.
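For anyone who wants to try the comparison, here's a sketch using Python's standard xml.etree.ElementTree and its limited XPath support. Several things here are assumptions for the sake of a runnable example: the surrounding page with three other lists, the <cookbook> wrapper element, the function names, and above all the recipe path itself, which is just my guess at what such an expression might look like. ElementTree also only reads well-formed markup, so the HTML page is written XHTML-style:

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the ingredients happen to be the fourth <ol>.
HTML_PAGE = """<html><body>
<ol><li>first list</li></ol>
<ol><li>second list</li></ol>
<ol><li>third list</li></ol>
<ol>
<li>vanilla ice-cream</li>
<li>chocolate</li>
<li>whip-cream</li>
<li>cherry</li>
</ol>
</body></html>"""

# The recipe format from the example, wrapped in an assumed <cookbook> root.
XML_DOC = """<cookbook>
<recipe id='graveyhead-sunday'>
<ingredients>
<li>vanilla ice-cream</li>
<li>chocolate</li>
<li>whip-cream</li>
<li>cherry</li>
</ingredients>
</recipe>
</cookbook>"""


def ingredients_from_html(page):
    """Positional lookup: breaks as soon as someone adds a list above it."""
    root = ET.fromstring(page)
    return [li.text for li in root.findall(".//ol[4]/li")]


def ingredients_from_xml(doc):
    """Lookup by meaning: the path says which recipe and which part of it."""
    root = ET.fromstring(doc)
    path = ".//recipe[@id='graveyhead-sunday']/ingredients/li"
    return [li.text for li in root.findall(path)]
```

Both functions return the same four ingredients for the data above; the difference is that only the second one keeps working when the page layout changes.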
If you want something even lower level, there's the GD library. There are lovely GD bindings for PHP, Perl and others.
Happy command-line drawing!
One thing you can say for sure now about Xena, Santa, and the Easter Bunny is: they definitely exist. :)
Nice response to the GP's flamebait, Bogtha.
I'd just like to add one other thing: he is also VERY wrong about XSLT. It's a fantastic language if you know what you're doing. You can take any XML-based format and transform it to any other with a minimum of hassle - much less hassle in fact than transforming a data format from one type to another in *any other language*.
In addition to transforming one XML format to another, XSLT can also produce HTML and plain text output. It can turn XML into virtually anything - and it does so with a highly elegant syntax (alongside XPath).
It's even possible to get it to produce some binary formats. For example, you can transform your own XML format into XSL formatting objects and then use a tool like Apache FOP to make the whole thing into a PDF. It's possible with this combination of technologies to build a website in which every single page is available as a PDF.
So, grandparent poster - please get a clue. I don't understand where your obsessive hatred of XML comes from. Perhaps you've had bad teachers or a bad experience with XML code written by an amateur.
Quit waiting and start building! You can even do it with the free MSVC command line tools.
Yes, the instructions are for building Firefox, but it is a simple matter to switch targets if you've followed the rest of those build instructions.
Whatever...
"Violence is the last refuge of the incompetent."
Where are these phrases rooted, anyone know?
More to the point: what do you do with this machine once it's up and running?
;)
A machine that takes 20 minutes to drag the mouse from 0 to 640 is not exactly a useful piece of machinery
(see my sig)