Tim Berners-Lee's List
weink writes "Tim Berners-Lee has made a career out of resolving Internet pet peeves. Ten years after he invented the Web, making the Internet user-friendly, he is still drafting lists of things that could work better."
TBL applied hypertext to the internet and invented the Web at CERN. He did actually invent it. He didn't invent the internet; that was the DoD.
You misspelled "Al Gore."
The chief trouble with trusting metadata is that page owners and maintainers who are paid by advertisers for eyeballs will have every reason to label their pages with false metadata in order to attract clicks.
Consider the problem that search engines faced when they indexed solely on the basis of textual relevance: pr0n sites filled their pages with the same words, repeated over and over again: "teen sex xxx porn pictures teen lesbian sex erotic sex xxx porn porn xxx sex teen girl babe sex sex xxx" and so forth. This made their pages more likely to turn up at the top of a search, and thus garnered more eyeballs for their advertisers. Who suffered? Teenage lesbians (etc.) looking for informative sites about issues related to their lives, not for hetero-oriented pr0n.
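To make the stuffing problem concrete, here is a minimal sketch of a purely textual relevance score, assuming the simplest possible ranking: raw counts of the query terms, with no length normalization. The function `naive_relevance` and the sample pages are hypothetical, and real engines used more elaborate formulas, but the incentive is the same: repeating a term raises the score.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_relevance(query: str, page_text: str) -> int:
    # Hypothetical scorer: total raw occurrences of the query terms in the page.
    counts = Counter(tokenize(page_text))
    return sum(counts[term] for term in tokenize(query))

honest = "A guide to configuring the Linux kernel for embedded boards."
stuffed = "linux linux kernel free linux kernel download linux kernel linux"

print(naive_relevance("linux kernel", honest))   # 2
print(naive_relevance("linux kernel", stuffed))  # 8
```

Normalizing by page length or capping per-term counts blunts this, but any score computed only from text the page owner controls leaves some incentive to stuff.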
Metadata systems are just as exploitable. Anyone familiar with the Prisoner's Dilemma will recognize the following: because these systems (like pure textual relevance search systems) reward "defecting" behaviors such as deliberately false labeling, they cannot solve the problems that those behaviors create.
Even setting aside the problem of dishonest behavior, there remains the problem of clueless or simply self-aggrandizing behavior: users labeling their pages as more relevant to a given topic than they really are, or not understanding distinctions among topics. A marketer at Dell might not know what "computer science" is, and insist that "computer science" be added to the metadata of Dell's e-commerce site. "After all, we sell very scientifically-designed computers. Isn't that what computer science means?" Cluelessness reigns supreme.
Until these problems can be solved, human-indexed sites like yahoo.com and dmoz.org will have some huge advantages over spider-powered search engines.
There are a lot of interesting things out there. In particular, I think XML and DOM could be the basis for a very good component framework in which powerful components would be easy to write, and would integrate nicely without a lot of hassle. I'm looking at RDF as a piece of this.
But as far as I can tell, the problem that RDF solves is a bit different from the one mentioned in this article. RDF is a way of representing information about documents as graph structures (statements linking resources to properties and values), allowing individual files to contain both local and external pieces without everything getting tangled up.
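RDF's underlying data model is a set of subject-predicate-object statements, which together form a directed labeled graph. The toy sketch below uses plain Python sets rather than an RDF library, and shows why local and external pieces don't tangle: merging two sources is just a union of statements (ignoring blank nodes, which need renaming when real RDF graphs are merged). The Dublin Core element URIs are the published ones; the `example.org` resources, the `vocab#name` property, and the `match` helper are hypothetical.

```python
from typing import Optional

# A statement is (subject, predicate, object); a graph is a set of statements.
Triple = tuple[str, str, str]

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
NAME = "http://example.org/vocab#name"  # hypothetical property

def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    # Return matching statements, with None acting as a wildcard.
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Statements kept in this project's own file.
local = {
    ("http://example.org/proj", DC_TITLE, "My Free Software Project"),
    ("http://example.org/proj", DC_CREATOR, "http://example.org/people/alice"),
}
# Statements published elsewhere about a resource the local file points to.
external = {
    ("http://example.org/people/alice", NAME, "Alice"),
}

graph = local | external  # merging is set union (no blank nodes here)
creator = match(graph, s="http://example.org/proj", p=DC_CREATOR)[0][2]
print(match(graph, s=creator, p=NAME)[0][2])  # Alice
```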
The problem of representing metadata unambiguously is a tricky one, and it is not yet solved. The RDF spec presents an interesting outline of how this might be done, but it doesn't quite tell me what I need to do to get my own Web pages to be correctly meta'ed. If I were a library, then the Dublin Core would start to give me the specific markup I needed, but that's just for libraries. What do I use as metadata for my free software efforts?
It seems like the combination of XML plus XML Namespaces plus Dublin Core plus all the other recommendations, specifications, and standards analogous to the Dublin Core but for domains other than libraries might cohere into a workable metadata system for the Web. On the other hand, the complexity and fuzziness of the specifications could very easily prevent the beast from flying.
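For a sense of what the XML + namespaces + Dublin Core combination looks like in practice, here is a sketch that emits an RDF/XML description with Python's standard `xml.etree.ElementTree`. The RDF and Dublin Core namespace URIs are the published ones; the project URL, the field values, and the `describe` helper are hypothetical. Dublin Core only supplies generic elements such as title, creator, and subject, so anything specific to free software (license, version, implementation language) would need some other vocabulary, which is the gap described above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def describe(about: str, fields: dict[str, str]) -> str:
    # Build <rdf:RDF><rdf:Description rdf:about=...><dc:NAME>value</dc:NAME>...
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

rdf_xml = describe("http://example.org/myproject/", {
    "title": "MyProject",
    "creator": "Jane Hacker",
    "subject": "free software",
})
print(rdf_xml)
```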
When you're dealing with software, precise specification is key. Some metadata standards have succeeded pretty well in this regard - take MIME content types, for example. If you have a JPEG image, you know that the content type should be "image/jpeg". But the XML crew hasn't even managed a consistent namespace name for HTML 4.0 (I've seen "urn:w3-org-ns:HTML", "http://www.w3.org/TR/REC-html40" and others).
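The practical cost of inconsistent namespace names shows up in any namespace-aware parser: an element's identity is the pair (namespace name, local name), so two documents with identical markup but different namespace strings contain different elements. A small sketch with `xml.etree.ElementTree`, using two of the HTML 4.0 namespace names quoted above; `count_paragraphs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two namespace names seen in the wild for "HTML 4.0".
NS_A = "urn:w3-org-ns:HTML"
NS_B = "http://www.w3.org/TR/REC-html40"

doc_a = f'<html xmlns="{NS_A}"><p>hello</p></html>'
doc_b = f'<html xmlns="{NS_B}"><p>hello</p></html>'

def count_paragraphs(xml_text: str, ns: str) -> int:
    # Namespace-aware lookup: a child matches only if its tag is exactly {ns}p.
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{ns}}}p"))

print(count_paragraphs(doc_a, NS_A))  # 1
print(count_paragraphs(doc_b, NS_A))  # 0: same markup, different namespace name
```

A MIME type, by contrast, is a single agreed string, so software can compare it directly.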
For those hoping for a more technical discussion of RDF, I recommend the Mozilla page on RDF and of course the specification itself.
LILO boot: linux init=/usr/bin/emacs
You should know that XML (and SGML) are only standards for defining markup languages. RDF is an XML-compliant markup language.
Looking at the quality of 90% of the web pages out there, I think it is probably unrealistic to expect people to apply RDF in an intelligent way.
In fact, using RDF in a fractured or improper way may do more harm than good ol' heuristics. Carelessly written RDF will feed syntactically correct but semantically incorrect metadata to any search engine equipped to handle it. This is a dangerous combination: it makes bad search results more precisely wrong. I'd rather have a good guess than a precisely wrong answer.
It ultimately boils down to whether you trust users to describe their own metadata. I don't. Perhaps a good approach is to have centralized servers attempt to create correct RDF files based on a set of common criteria. While this is still a flawed approach, I would rather have search results that are consistent (consistently wrong or consistently right) than try to get inside the psychology of each individual web designer's implementation of RDF metadata. This approach might also cut down on metadata abuse (trying to bump a page up in searches where it should not rank highly, etc.).
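A minimal sketch of the "common criteria" idea, assuming the central server ignores whatever keywords an author declares and derives them from the page text by one fixed rule. The stopword list, the `TOP_N` cutoff, and `derive_keywords` are all hypothetical choices; the point is only that every page is judged the same way, so the results are at least consistent.

```python
import re
from collections import Counter

# Hypothetical fixed criteria, applied identically to every page.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "for", "in", "is", "on", "with"})
TOP_N = 3

def derive_keywords(page_text: str, top_n: int = TOP_N) -> list[str]:
    # Keep words longer than two letters that are not stopwords.
    words = [w for w in re.findall(r"[a-z0-9]+", page_text.lower())
             if w not in STOPWORDS and len(w) > 2]
    counts = Counter(words)
    # Highest count first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

page = "Dell sells desktop computers and laptop computers. Order a desktop today."
print(derive_keywords(page))  # ['computers', 'desktop', 'dell']
```

Applied to a sales page like the Dell example earlier in the thread, a rule like this never yields "computer science" unless the text actually discusses it, though it can still be gamed by stuffing the body text.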
In other words, I think we're way, way off from solving the metadata/search issue. For now, the best answers seem to be human categorization (Yahoo) or smart heuristics (Google, Inktomi).
I'm sorry to say this, but I hate the idea of META tags on the web. I have been an avid net user since '93, just before the browser boom. When the browsers first came out, search engines were still based on the text of a web page. Today, you are lucky to find such a tool. Most popular engines rely on META tags, as if the general population had some kind of superior abstraction skills.
At one time, you could force AltaVista to show only pages containing certain text or URLs. While those options are still accepted by the engine, they are largely ignored. As a user, I am annoyed when I ask a search engine to show me only pages that actually contain certain strings, then navigate to a result and turn up empty on a find.
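The complaint above suggests a simple consistency check: compare the keywords a page declares in its META tag with the text a reader would actually find on the page. Here is a sketch using the standard library's `html.parser`, assuming a plain case-insensitive substring match (roughly what a browser's find does) and skipping `<script>`, `<style>`, and `<title>` content. The class `MetaAndText`, the function `unsupported_keywords`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class MetaAndText(HTMLParser):
    # Collects META keywords and the text a reader would see on the page.
    HIDDEN = {"script", "style", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []
        self.text_parts: list[str] = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                content = attr.get("content") or ""
                self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_parts.append(data)

def unsupported_keywords(html: str) -> list[str]:
    # Declared keywords that never appear (case-insensitively) in the visible text.
    parser = MetaAndText()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.text_parts).lower()
    return [k for k in parser.keywords if k not in visible]

page = ('<html><head><title>Cheap PCs</title>'
        '<meta name="keywords" content="computer science, laptops, free mp3">'
        '</head><body><p>We sell laptops and desktops.</p></body></html>')
print(unsupported_keywords(page))  # ['computer science', 'free mp3']
```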
I have actually gone from 'portal' surfing (early Yahoo) to search-based surfing, and back to 'portal' surfing. The web is hairy enough that I actually *do* want someone to filter out the crap for me, unless I am looking for something extremely specific and am unhappy with the portal-based results.
I do agree with some of his other thoughts about form submission and URL changing...
"My husband invented the internet, and I censored all the naughty stuff on it."
People will and do mark up their sites in ways that they think will attract eyeballs to them, regardless of whether this has anything to do with the site's actual content. This is true by abundant empirical evidence.
On the other hand, metatags make a lot of sense for huge commercial empires (Amazon, eBay, Buy.com, etc.), which will be willing to maintain reasonably accurate markings. I have a suspicion that in the not-so-distant future we will have a situation where the big search engines accept (i.e., believe) metatags from big commercial sites but ignore them from small fry. There may develop a "club" whose metatags Yahoo, AltaVista, Lycos, etc. will believe.
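The "club" idea amounts to an allowlist: believe self-declared metatags from a fixed set of domains and fall back to text-derived keywords for everyone else. A minimal sketch, assuming a hypothetical `TRUSTED_DOMAINS` set built from the sites named above; the helpers `is_trusted` and `effective_keywords` are also hypothetical. Matching on the parsed hostname, rather than searching the URL string, keeps look-alike hosts such as `amazon.com.evil.example` out of the club.

```python
from urllib.parse import urlparse

# Hypothetical club whose self-declared metatags an engine chooses to believe.
TRUSTED_DOMAINS = frozenset({"amazon.com", "ebay.com", "buy.com"})

def is_trusted(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Accept the domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

def effective_keywords(url: str, declared: list[str], derived: list[str]) -> list[str]:
    # Club members are taken at their word; everyone else is judged by their text.
    return declared if is_trusted(url) else derived

print(is_trusted("http://www.amazon.com/books"))           # True
print(is_trusted("http://amazon.com.evil.example/books"))  # False
```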
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
Sorry, that mantra doesn't hold water in this discussion. Tim Berners-Lee is single-handedly responsible for the creation of the World Wide Web. While working at CERN he envisioned an Internet service/protocol based on a worldwide network of hyperlinked pages (the concept of which was around before he came up with it) and went on to CREATE the HTML specification, the HTTP protocol, and the Uniform Resource Locator (URL). He also wrote the first web browser, and in NeXTSTEP at that. Therefore he did invent the World Wide Web. Don't believe me? Ask the W3C.
Will
Did anybody else find it amusing to hear that the Net is too complicated, when the story was surrounded by the incredibly obnoxious over-designed 'how many ads can we fit in a window?' PC World 'interface'?
GeneHack {--(bioinfo*linux*opinion)