Bogtha · Slashdot Mirror

Re:Valid Markup != Good Code on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 06:28 · Score: 1

Google, in particular, has stripped down their web code to the bare minimum

I don't believe that's true. Please see this comment. I'd be interested to hear of any evidence otherwise though.

Apathy is being indifferent; this is an intentional decision.

Apathy isn't something that happens by mistake, you know; you can be intentionally apathetic. They made the decision to pay no heed to the specifications. That's apathy. Feel free to disagree, but there's not much point in arguing word definitions; my meaning is clear by now.

You might say that browsers change, thus breaking these non-fully-standard sites, but it's a spurious argument: everyone *will* make the changes, it won't take them very long, and they *still* won't care about the deviations from the new spec.

So syntax mistakes you made earlier for no good reason may cause more work at an undetermined point in the future, and you feel confident in saying that it will be easy to fix. Care to back that up? For a lot of organisations, making any changes to their sites whatsoever entails hiring a consultant. That's not an expense you want popping up at inopportune moments.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 06:12 · Score: 1

Citation, please?

If you want the authoritative source, you'll have to buy the ISO 8879:1986 standard; it costs around EUR 140. Aren't "open standards" like HTML great?

I've read the HTML spec and see nothing of the sort.

The HTML specification defines the content model, not how the syntax should be parsed. It gives a brief overview in the introductory material for people unfamiliar with SGML, but it's incomplete and refers readers to the SGML standard I just mentioned. It does, however, briefly mention the shorthand syntax in the appendix, which is probably why you missed it.

The purpose for checking attributes is to avoid running into conflicts with future attributes that might be declared as part of the standard.

No, that might be your reason to use a validator, but it's not the validator's purpose in checking them. The validator's purpose in checking them is because it is a syntax checker, and the syntax defines which attributes are acceptable. To complain that a syntax checker is pointing out syntax errors is ludicrous. If you don't want to know about syntax errors, don't use a syntax checker.

strict attribute checking is a waste of time and effort

This has nothing to do with strict attribute checking. The validator doesn't reject XHTML-style empty elements because it thinks the slash is an attribute it doesn't recognise; it rejects them because the slash opens the element and the greater-than sign that follows becomes character data. This gives rise to problems such as having character data within the <head> element where it isn't permissible, which is a problem quite different to an incorrect attribute.

the HTML standard is fundamentally flawed in that it did not provide a clean, standard way of providing arbitrary tagging of elements with additional information except through attributes, and a strict attribute check makes that impossible.

Look up the class attribute. That's exactly what it's for.

I don't know any definition of pedantic that strict attribute validation doesn't meet

But of course the validator is being pedantic! That's the entire purpose of its existence! What good would it be if it didn't pedantically go through your markup, looking for all the errors it could find?

I'm not disagreeing about the validator being called pedantic, I'm pointing out that complaining about it is like complaining water is wet. It's part of its fundamental nature.

If you want some kind of checker that doesn't check validity, but uses heuristics to point out potential problems, then you want a linter, not a validator.

I find it very obnoxious that the W3C validator doesn't allow you to disable that check

I haven't looked at the source in quite a while, but last time I checked it would be pretty difficult to do so, because the SGML feature that allows those shortcuts is also the SGML feature that allows minimised attributes.

But the validator is open-source, so if you think it's easy, download it and do it yourself. It's pretty obnoxious to use their free service all the time when you could be running it locally anyway.

Re:Valid Markup != Good Code on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 05:40 · Score: 1

You missed the "apathy" part. I don't believe they have even attempted to make their code valid. I do believe that you could take anybody from each of those teams and have them make their code valid with ease, without sacrificing any functionality. If you disagree, please give examples of problematic code.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 05:29 · Score: 1

How many of these sites you speak of where people had to go back and make changes after each consecutive version of a browser came out actually kept their sites looking the same over that time?

Netscape were releasing a new major version about once per year at that point, so I'd say approximately 100% of them.

In the real world, corporations often update the look and feel of their site on a fairly regular basis so as to keep the site "fresh".

Yes, and when they do so, it's incredibly rare for them to recode all their content. Mistakes like unencoded ampersands tend to stick around.

in the real world

You've said that twice now. I'm a web developer. I deal with real-world web development every day. Saying "in the real world" doesn't magically make your argument any more valid. As far as I can tell, it's code for "I haven't experienced what you are talking about, so it doesn't exist". More experience tends to fix that opinion.

It is much more likely in the real world to go back and completely redo the look and often the content of the site, than to have to go back to fix ampersand issues to work with new versions of browsers.

Oh, it's not just ampersands. They are just an example of a very long trend that started in the early 90s and continues to this very day. It was only recently that I heard somebody complaining that their invalid code was breaking due to a software upgrade.

IMHO, websites are changed more rapidly in drastic ways more often than browser revisions come out.

It's not the frequency that matters; it's the latency between discovering you have a problem and fixing it. Outside of people who redesign their blog once a week, you cannot simply wait and hope the redesign you have planned fixes your broken site; you need it fixed immediately. Let's pick a ludicrous value, and say you redesign every month. The worst case scenario is that you have to wait a month until your site gets fixed. Think that's acceptable?

Another thing you are forgetting is that it's not just browsers that deal with HTML. The person I just mentioned complaining that their invalid code was breaking — that was down to a change in a webmail provider. Search engines change their parsers on a regular basis too. Not to mention feed readers, aggregators, etc that your HTML might end up in.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 05:08 · Score: 1

It's only 95% valid HTML according to the strict specs, but it is sufficient for now." It is not laziness to say this.

That depends on what the 5% is. If it's something that actually requires work, fair enough. If it's something trivial that can be fixed in ten seconds, that shows that they haven't even bothered to look what errors there are. That's lazy. It's one thing to prioritise other tasks higher, it's another thing entirely to ignore a handy list of errors altogether.

Pick any other random industry and tell them you have a magic device that will automatically catch them when they make a mistake, that it's free, and that it works instantly, and they'd welcome it with open arms. It amazes me that some developers resent such a useful tool so much.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 04:55 · Score: 1

your slam against Americans makes you sound like a bigot.

I didn't slam Americans. Read it again.

The American stereotype is that they are arrogant and ignorant of the rest of the world. Do you disagree?

An American correcting an English person on his use of the English language — when in fact it is proper English and only incorrect American English — reinforces that stereotype. Do you disagree?

Acknowledging that a stereotype exists and pointing out when somebody is making it worse does not mean that you agree with it.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 03:57 · Score: 1

since you can be sure that any " & " is an ampersand

That's one of the few cases where you don't need to encode the ampersand.

Time consuming, yes, but far less than using vi/emacs/nano/pico on every file.

But far more than doing things correctly in the first place.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 03:54 · Score: 1

I'm afraid not. The semicolon is not always required, it depends on what immediately follows. For instance, if it's whitespace, the semicolon is not necessary.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 03:11 · Score: 1

Only if the people publishing content haven't been told not to use them

And why would you tell them any such thing? By what mechanism would you suggest they input special characters, and why is implementing such a mechanism better than simply encoding ampersands correctly in the first place?

you can incorporate that into the training course

You seem to want to go to an awful lot of trouble to avoid typing &amp;.

Re:Valid Markup != Good Code on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 02:33 · Score: 1

Have you ever tried to validate Google's homepage?

Would that be the famously sparse Google homepage? Regardless, what works for Google and what works for everybody else are two very different things. If Microsoft released a version of Internet Explorer that choked on Google's syntax errors, there would be a huge outcry, a Microsoft manager would get a bollocking, and a new version of Internet Explorer would be promptly released. Do your clients have the kind of popularity that can make Microsoft jump through hoops?

It fails miserably, specifically because they removed every unnecessary bit of markup and javascript to save on their bandwidth bill.

This has been received wisdom for years, but I haven't seen any evidence of it. A few years back I went through the code and saw many obvious places where they could save a hell of a lot more bandwidth than the fraction saved by invalid code, but they chose not to. Sure, the code's compact and clearly not intended for human consumption, but that could just as easily be an artefact of code generation or similar. Even if they do bother with minimal optimisation, that doesn't mean they would go to the lengths of using syntax errors for that purpose. Do you have anything to back up this claim?

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 02:30 · Score: 1

Have you ever heard of a content management system.

Yes, I've worked on my fair share of them.

That's where you handle things like translating & to &amp;

Doing so would immediately result in incorrect, ugly code being presented to the end user, as character entities get double encoded.
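
To make the double-encoding problem concrete, here is a minimal Python sketch. The function cms_blanket_escape is hypothetical; it stands in for a CMS output step that escapes every ampersand without knowing whether the text was already encoded. html.unescape is used only to approximate what a browser would display after one round of entity decoding.

```python
import html

def cms_blanket_escape(text: str) -> str:
    # Hypothetical CMS step: escape every ampersand, encoded or not.
    return text.replace("&", "&amp;")

raw = "Fish & chips"
already_encoded = "Fish &amp; chips &mdash; fresh"

once = cms_blanket_escape(raw)
twice = cms_blanket_escape(already_encoded)

# Approximate what the reader sees after the browser decodes entities once.
shown_once = html.unescape(once)
shown_twice = html.unescape(twice)

print(shown_once)
print(shown_twice)
```

The raw string comes out fine, but the already-encoded string is shown to the reader with a literal &amp; and a literal &mdash; in it, which is the ugly output described above.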

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 02:21 · Score: 1

On a website, especially for a newspaper or a news feed, there is a distinction between the framework (code) and the content (articles). It's a good programming paradigm used not only on websites but often when building traditional programs. Make sense now?

Of course I know about the separation between layers. But I fail to see why he thinks all ampersands in the content need encoding and all ampersands in the code should be left alone.

Because maybe he doesn't have a magic time machine to go back and fix code written 1, 2, 5 or 10 years ago?

I think you might have lost track of the context. This sub-thread was caused by somebody responding to me saying:

When you can do things correctly right now for no effort, why on earth would you risk incurring extra work in the future?

Of course if you are stuck with a load of legacy data a regexp can come in handy. That doesn't mean that it's easy or that it's a good alternative to doing things properly.

And it really isn't that hard to separate code from content, which makes find-replace and regexp replacement straightforward, which you seem to have a hard time wrapping your head around.

The separation is obvious, it's the bald assertion that it makes find-replace straightforward that I have the problem with. Ampersands can appear in code and content in both encoded and unencoded forms. Separating the two doesn't help.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 02:14 · Score: 1

Per the specification, omitted end tags are valid.

No, omitted end tags are valid only for some element types, as the section you link to clearly says. The HTML specification lists exactly which element types it is permissible to omit end tags for. The NYTimes are omitting end tags for <div> elements, which are required.
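
As a rough illustration of that rule, the sketch below hard-codes the element types whose end tags HTML 4.01 marks as optional. The set is transcribed from my reading of the element index in the specification and should be checked against it before being relied on; may_omit_end_tag is a hypothetical helper, not part of any validator.

```python
# Elements whose end tag HTML 4.01 lists as optional (assumed transcription
# of the spec's element index; verify before relying on it).
OPTIONAL_END_TAG = {
    "body", "colgroup", "dd", "dt", "head", "html", "li", "option",
    "p", "tbody", "td", "tfoot", "th", "thead", "tr",
}

def may_omit_end_tag(element: str) -> bool:
    # Hypothetical helper: True only for elements in the optional set.
    return element.lower() in OPTIONAL_END_TAG

p_ok = may_omit_end_tag("p")
div_ok = may_omit_end_tag("DIV")

print(p_ok, div_ok)
```

A <p> may be left unclosed; a <div> may not, which is the NYTimes case.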

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 01:51 · Score: 1

Only if you don't know how to write a proper regular expression.

No, the problem with using a regular expression is the ambiguity, not writing the regexp. When faced with &shy do you leave it alone or escape the ampersand? It looks like a soft-hyphen, so it should be left alone, right? But then you run it over your personal ads and you miss things like "Looking for somebody cute&shy". The reverse is also a problem. If you do account for situations like that, then you can end up double-escaping, which ends up displaying unintelligible code to the end-user. Are you sure you're going to consider all possible combinations of encoded/unencoded/trailing-whitespace/trailing-characters/etc before running the regexp?

If software could reliably tell the difference, there wouldn't be a need to encode them in the first place, would there? If software could reliably tell the difference, you wouldn't be scrambling to deal with the browser failing to do so, would you? It's a harder problem than you think and just blindly running a regexp you cooked up in an hour over hundreds of thousands of pages of content is a recipe for disaster.
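
The ambiguity can be sketched in a few lines of Python. The pattern below is an illustrative heuristic, not a recommended fix: it escapes an ampersand unless what follows looks like a reference, and the short list of entity names in it is an assumption made for brevity. guess_escape is a hypothetical name.

```python
import re

# Heuristic: treat "&" as already encoded if it is followed by something
# that looks like an entity name or a numeric reference (assumed short list).
BARE_AMPERSAND = re.compile(r"&(?!amp|lt|gt|quot|shy|mdash|#\d+|#x[0-9a-fA-F]+)")

def guess_escape(text: str) -> str:
    # Hypothetical clean-up pass over legacy content.
    return BARE_AMPERSAND.sub("&amp;", text)

fine = guess_escape("Fish & chips")
kept = guess_escape("co&shy;operate")
missed = guess_escape("Looking for somebody cute&shy")

print(fine)
print(kept)
print(missed)
```

The first two cases behave as intended. The third is left untouched because "&shy" looks like a soft hyphen, even though the author meant a literal ampersand, and the pattern has no way to know that.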

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 01:34 · Score: 1

It's not obvious what you mean by "keep ampersands out", the distinction you draw between HTML code and content, or why you would do so when the alternative of doing things correctly from the beginning is so simple. Please clarify.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 01:31 · Score: 1

For example IE6 doesn't understand auto margins for centering, but setting text-align to center will center divs.

Internet Explorer 6 works just fine in this respect unless you kick it into quirks mode. It's 5.5 and below that can't handle auto margins.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-30 01:28 · Score: 4, Informative

It may or may not be improper American English, but "misspelt" is certainly correct English. Consult the OED if you don't believe me.

This is far from the first time I've had an ignorant American attempt to "correct" my proper English into your regional dialect. It's pretty annoying and reinforces negative aspects of your national stereotype.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:58 · Score: 1

I have one thanks. Regular expressions don't fix things either. At best, you'd end up with a crude heuristic that would result in you reviewing each document to make sure it hadn't screwed anything up, and that's after you came up with the regexp to try to guess at what's appropriate. Heh. You'd quite literally now have two problems.

Re:Valid Markup != Good Code on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:53 · Score: 3, Informative

An & sign in a link to a URL isn't a syntax error

Yes, it is. Don't just take my word for it, take a look at what the HTML specification has to say on the matter.

treating it as such would nullify all GET parameters after the first one.

You are confusing a URI with the representation of that URI within an HTML document. Just because it appears as &amp; in the document, it doesn't mean that's what you end up with after it has been parsed.
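
A small Python sketch shows the difference between the two. It uses the standard library's html.parser and urllib.parse; the /search URL and the HrefCollector class are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

class HrefCollector(HTMLParser):
    # Collects the parsed href value of every <a> start tag.
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if tag == "a" and name == "href":
                self.hrefs.append(value)

collector = HrefCollector()
# In the document source the ampersand is written as &amp;
collector.feed('<a href="/search?q=html&amp;page=2">next</a>')

href = collector.hrefs[0]
params = parse_qs(urlsplit(href).query)

print(href)
print(params)
```

The parser hands back the URI with a plain ampersand, so both GET parameters survive; the &amp; exists only in the markup.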

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:41 · Score: 1

Search and replace doesn't cut it. It would screw up all the character entity references and numeric character references on your site. That's even more of a problem for a newspaper site than other sites, as they usually have decent typography, like proper dashes, etc, which are often implemented with character entity references.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:36 · Score: 3, Informative

while XHTML syntax is not strictly speaking correct HTML it is still valid HTML.

This is simply not true. It's incorrect and invalid.

What you may be thinking of is Appendix C of the XHTML 1.0 specification. It lays out a series of guidelines that minimise incompatibility with legacy user-agents. This means that it is relatively safe to transmit XHTML 1.0 documents following these guidelines as text/html. What it does not mean is that those XHTML 1.0 documents magically become valid HTML documents. They are not.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:32 · Score: 1

One additional thing:

all HTML browsers have to ignore unknown properties in tags

HTML doesn't define error handling. It offers non-normative suggestions, but HTML parsers aren't required to follow them.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:28 · Score: 3, Insightful

all HTML browsers have to ignore unknown properties in tags

That reasoning would work if the people behind XML had chosen any other character to indicate empty elements. But unfortunately, they chose the slash. Not many people realise because browser support is rare, but a slash inside an opening tag means that it is the end of the tag and the contents follow. Basically, <foo/>x/ is equivalent to <foo>>x</foo> .

So no, while parsers that don't implement HTML fully might mistakenly treat it like an attribute, a parser that fully implements HTML cannot do so, and a validator certainly shouldn't.

the W3C validator is being way too pedantic (as usual).

What on earth do you think a validator is for, if not to point out syntax errors? Do you complain that your spelling checker is being pedantic when it tells you that you have misspelt something?

Re:Valid Markup != Good Code on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:17 · Score: 2, Interesting

I am willing to bet that out of the top 100 sites on the internet, the front page of all of them will produce Markup validation errors. The reason is simple: The validation rules are so restrictive that there is no point even worrying about them.

You're right about valid code being rare, but wrong about the reason. Sturgeon's Revelation applies to developers.

It would be impossible to make a working website by being totally loyal to the markup rules.

That's not even close to being true. Take the NYTimes for example. Would you care to point out a syntax error they've made that is actually necessary, where the valid alternative wouldn't work?

The same goes for those "100 top sites" you mentioned. They aren't invalid because valid code is impossible to get working, they are invalid due to apathy and ignorance. In practically every case, you could take a mildly competent developer, throw the code at him, and have it valid in next to no time. Hell, in many cases, a program can do it automatically! The cases where invalid code is actually required to achieve a particular effect are few and far between these days.

Especially with the validator's stupidity in treating & signs in the href attribute of my a elements as the beginning of an entity which it's not!

The validator is completely correct. That's a syntax error and the job of a validator is to point out syntax errors to you.

Re:W3C on NYTimes.com Hand-Codes HTML & CSS · 2008-04-29 16:06 · Score: 4, Informative

<br /> is XHTML standard and <br> is the regular HTML 4 standard. Both are correct

No, one is correct for XHTML and incorrect for HTML, and one is incorrect for XHTML and correct for HTML. The NYTimes use HTML. That means the XHTML syntax is incorrect.
