With a quoted phrase '"u tube"', you are correct. With the much more likely unquoted two-word search phrase 'u tube', Universal Tube & Rollform Equipment Corporation's utube.com is the second result, behind youtube.com.
Yes, you are right. I'm just so used to searching with quotes, I guess, that I didn't think about not doing it. That doesn't change the fact that their markup is horrendous, though my original point about that being the reason for their absence from Google's index is now proven to be partly incorrect. My research could have been better, I admit.
Didn't suggest that it was. Only that your statement that "Non-semantic code is basically just gibberish for search engines" is about an order of magnitude too strong.
Well, it doesn't make any more sense to a search engine than a PDF document or Flash file without any accessibility voodoo applied to it. It's basically just a huge blob of text, where no parts of the text are weighed or indexed any differently than the other. And that's not very search engine — nor user — friendly.
What are you talking about? utube.com is the very first result on a Google search for "utube", and the second for "u tube".
"u tube" does not list the domain utube.com in the first 100 search results. Neither does u tube. I wrongly suggested that utube didn't either, and for that I am sorry.
And Google is smart enough to understand code that doesn't pass w3c validation - which is good, because only a vanishingly small percent of sites pass.
Just because Google vaguely understands egregiously broken HTML doesn't mean that producing it is good practice. It's a fact that Google and any other search engine ranks pages with less markup and more content higher than pages with more markup and less content. They also rank pages with semantic markup (which uses <h1>...</h1> for headings instead of <font size="4"><b>...</b></font>, <p> for paragraphs instead of <br><br>, <ul> and <li> for lists instead of <br>-separated lines with <img src="bullet.gif">, etc.) higher than pages with complete rubbish tagsoup markup like the one utube.com employs. They also fancy regular HTTP URIs better than JavaScript function calls as links. A unique title on each page wouldn't hurt either. I could go on and on, but I shouldn't have to; this is so obvious that it should be screaming holes in your eyes just by looking at the source code.
I still don't understand how you consider "invalidity of HTML is one of the reasons for utube's absence from Google's index" a valid statement. It has nothing to do with the fact that it doesn't validate; that it doesn't is only a hint that there *might* be some other flaws that prevent Google from indexing it or ranking it better (no proof shown yet). I do not like such attempts to give W3C HTML validation more importance than it deserves.
Besides being a troll or as incompetent as the developers behind utube.com, what other good reason do you have for supporting its extremely poor coding and non-existent search engine optimization? Do you have economic interests in the company? Did you develop the code? If not, what?
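To put a rough number on "more markup and less content", here is a minimal sketch that compares the characters spent on tags with the characters of visible text. The function name and both snippets are made up for illustration, the tag matcher is deliberately naive, and this is only a way to see the difference, not a description of how any search engine actually scores a page.

```javascript
// Rough markup-to-content ratio: characters spent on tags versus visible text.
// Illustration only; not a model of any search engine's ranking.
function markupToContentRatio(html) {
  // Naive tag matcher, good enough for small hand-written snippets.
  const tags = html.match(/<[^>]*>/g) || [];
  const markupLength = tags.join("").length;
  const textLength = html.replace(/<[^>]*>/g, "").trim().length;
  return textLength === 0 ? Infinity : markupLength / textLength;
}

// Hypothetical snippets carrying the same visible text.
const tagSoup = '<font size="4"><b>Tube mills</b></font><br><br>';
const semantic = "<h1>Tube mills</h1>";

console.log("tag soup:", markupToContentRatio(tagSoup).toFixed(2));
console.log("semantic:", markupToContentRatio(semantic).toFixed(2));
```

The same heading costs several times more markup in the tag soup version, and that is before any nested layout tables are wrapped around it.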
The fact is that the HTML code (and anyone with any experience and competence in this field will tell you the same, including any Google employee) is so bad that Google can't tell the difference between 2380 pages, although all of those pages (in theory) are unique. There are so many problems with the markup besides it not being valid, that I can't bring myself to understand why you want to debate it. The markup is of 1997-quality and you know it. At least if you know semantic HTML from tagsoup. If you don't, I see no point in continuing this discussion.
Let's view this from another point: Explain to me how the markup can be excellent and yet, for some reason, utube.com still doesn't manage to get within the first 100 (or even 500) search results on Google. Is it a conspiracy? If not, what?
Can you give me an authoritative source with information about Google's using W3C's HTML validation in the PageRank calculation?
No, I can't, because they don't. But valid pages are more often than not also semantic and structured, so it's just one side of the same issue. If you read my whole comment, you would have seen that the invalidity of U Tube's HTML is not the only cause for their (as good as) complete absence from Google's index. Searching for pages indexed on the utube.com domain yields exactly one search result, namely the front page. None of the sub-pages are indexed at all. That's proof enough that the quality of their markup is so poor that Google doesn't understand much of it at all.
It's no wonder that U Tube loses customers. A Google search for "U Tube" or search for "utube" doesn't yield Universal Tube's homepage as any of the first 100 results. Why is this? Well, here's one reason. Another one is that the web page uses completely non-semantic markup without a single H1 element in the source code. Non-semantic code is basically just gibberish for search engines, so they could just as well have riddled their pages with gobbledygook. The search results would be the same.
Yet another reason is that "utube" or "u tube" is mentioned nowhere on their pages. The design is built with endless amounts of nested tables. The markup-to-content ratio is sky-high. And they're using JavaScript as the main means for navigation. It almost looks like they're actively blocking users and search engines from making any sense of their web pages. A redesign with a solid implementation of semantic and accessible markup would increase the searchability and usability of their pages by an order of magnitude, and they would get both happier customers and less reason to go to court.
But it's of course much easier to just sue YouTube for their own incompetence. Or ignorance. Or both.
Yes, there is something wrong with the code. It is written in a way that is almost equivalent to var 1 = 2; which of course doesn't work. That object property names are parsed and interpreted differently than variable names in browsers other than Opera is the real bug here.
Opera is behaving perfectly while the others are not. And the D2 code is egregiously and horribly broken. That the Slashdot developers aren't willing to do some string concatenation to fix the bug in their code because it supposedly hurts performance (I'd bet $10 that it doesn't) is just ignorant and stubborn.
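For anyone wondering what that looks like in practice, here is a minimal sketch. It does not reproduce the actual D2 code, which isn't quoted here; the object, the id and the "c" prefix are all hypothetical. It only shows that a number is rejected as a variable name, and that a key built by string concatenation and set with bracket notation doesn't depend on how a bare name in the source gets parsed.

```javascript
// A number is not a valid variable name, so this source text fails to parse.
// It is wrapped in new Function so the error can be caught at run time.
let parseFailed = false;
try {
  new Function("var 1 = 2;");
} catch (err) {
  parseFailed = err instanceof SyntaxError;
}
console.log("var 1 = 2; rejected by the parser:", parseFailed);

// Building the property name as a string and using bracket notation
// sidesteps the question of how a bare name is parsed.
const comments = {};
const commentId = 1234; // hypothetical id
comments["c" + commentId] = "first post";
console.log(comments.c1234);
```

One concatenation per lookup is about as cheap as an operation gets, which is why I doubt the performance argument.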
If you had the chance to rewrite the whole CSS 1 specification (and thus the later CSS 2 and CSS 3 specifications which depend on it), what would you like to change? Does the inconsistency of property names like 'color' (which applies to text) and 'background-color' (which applies to elements), the (in)famous box model (don't you agree that IE's quirky implementation is more intuitive?) and other things like that bother you, or do you feel like most of it is as perfect as it can get? If there was anything you could change, what would it be?
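To make the box model part of the question concrete, here is a small sketch of the arithmetic, horizontal dimension only, with made-up function names and example numbers. Under the CSS box model the declared width is the content width and padding and border are added on top; under IE's quirks model the declared width already includes them.

```javascript
// Total rendered width of a box under the two models (horizontal only, in px).
function renderedWidth(model, width, padding, border) {
  if (model === "w3c") {
    // CSS box model: width is the content width; padding and border are added.
    return width + 2 * padding + 2 * border;
  }
  if (model === "ie-quirks") {
    // IE quirks model: the declared width already includes padding and border.
    return width;
  }
  throw new Error("unknown model: " + model);
}

// Width left over for the content itself.
function contentWidth(model, width, padding, border) {
  if (model === "w3c") return width;
  if (model === "ie-quirks") return Math.max(0, width - 2 * padding - 2 * border);
  throw new Error("unknown model: " + model);
}

// Example: width: 200px; padding: 10px; border: 2px
console.log(renderedWidth("w3c", 200, 10, 2), contentWidth("w3c", 200, 10, 2));
console.log(renderedWidth("ie-quirks", 200, 10, 2), contentWidth("ie-quirks", 200, 10, 2));
```

With the quirks model, a box declared 200px wide takes up 200px on screen, which is the behaviour a lot of people find more intuitive.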
Well, what you use on paper and what you use on screen differs because the DPI differs. When monitors sport 300 DPI and above, serifs will look good on screen too. Until then, sans-serifs are easier on the eye and thus also easier to read. Reading large blocks of text on-screen sucks no matter what typeface you use; serif or sans-serif.
I always print out web pages if they contain more text to read than what could fit on one A4 sheet of paper, and when it's printed, I prefer to read it in a seriffed font. On screen, though, I definitely prefer sans-serif.
Although I don't agree with you or this campaign, I understand your position. However, as a web master with knowledge in W3C's specifications and IE's conditional comments, I hope that you see the value of writing good code. If you have had a peek at Explorer Destroyer's code, wouldn't you agree that it is not good? In fact, won't you agree that it's some of the worst browser-detection JavaScript code you've ever seen? Even though you agree with this campaign, can't you admit that Explorer Destroyer is perhaps the worst way to go through with it?
You are wrong on so many levels I don't even know where to start. First, it's obvious that you don't develop web pages a lot and if you do, haven't really started using CSS to a greater extent than setting font and background colors. If you knew what you were talking about, you would know that Internet Explorer has okay support for HTML (there are still a lot of tags it doesn't support there, like ABBR), good support for CSS Level 1, but terrible (and I mean excruciatingly, painfully, wreckingly bad) support for CSS Level 2 (note that the original CSS2 specification is found here and is 8 years old).
You talk about "The W3C specifications" and what good support IE has for them without being specific about which it supports well. This also leads me to believe that you have no idea of what you are talking about. There are a lot of specifications W3C has released that Internet Explorer either supports half-way with a lot of bugs (like CSS2, PNG, DOM Level 2) or doesn't support at all (like XHTML, SVG, XForms and DOM Level 3). The greatest problem with IE is not the lack of support for newer (well, "newer" is relative, considering that the PNG specification is 10 years old and the SVG specification is 5 years old) specifications, but the half-way support it has for standards like CSS2 and PNG.
Had it not supported it, and supported HTML the way it's defined (like OBJECT for example), creating fallbacks would be easy. With half-way support, you need to resort to all sorts of hacks to make something work, because IE claims to support it fully, but doesn't, and thus breaks completely if you try to follow the standard.
Claiming that Microsoft is active in any of these working groups is either a truth with modifications or a blatant lie. Had Microsoft been active in developing any of these specifications, wouldn't you agree that it's a bit odd that the company as a whole and Internet Explorer particularly support some of them so badly and yet more of them not at all? The obvious fact is that Microsoft haven't been involved in developing any of these specifications and still after almost 9 years haven't managed to read and understand the first HTML 4.0 Specification completely.
No, Internet Explorer does not have good support for "the W3C specifications". It supports some specifications okay, some badly and some not at all. Not having full support for HTML 4.01 and CSS2 in 2006 is just embarrassing. Oh, and both background and valign are defined in the HTML 4.01 specification (i.e. they're "standard"), and they're attributes, not tags.
On a last note, I'd like to point out what's been pointed out many times already, namely that the method Explorer Destroyer uses to detect Internet Explorer and all the code surrounding it is horrific, terrible, very fragile and, simply put, very bad. Don't use it.