Tim Berners-Lee and the Semantic Web

You don't want a "single" web... by Pig+Hogger · 2004-09-27 05:53 · Score: 3, Insightful

You don't want a "single" web... You want a multitude of them, and carefully isolate them (beyond normal information reading and referencing).

This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...

Re:You don't want a "single" web... by JimDabell · 2004-09-27 06:03 · Score: 3, Insightful

This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...

Windows executes stuff. The semantic web is just data. Your warnings about a monoculture apply to the semantic web about as much as they apply to text files.

Re:You don't want a "single" web... by JimDabell · 2004-09-27 06:16 · Score: 3, Insightful

Remember when you couldn't get a virus just by reading an e-mail?

Yes, and again, the problem is when the stuff that executes has a monoculture. It's not like you see Pine users or KMail users infected by emails with Outlook viruses in.

The rest of us call this... by Amiga+Lover · 2004-09-27 05:55 · Score: 1, Insightful

The rest of us call this... GOOGLE.

works for me.

Re:The rest of us call this... by BigGerman · 2004-09-27 06:01 · Score: 2, Insightful

Exactly.
And here is the problem: what are "the rest of us" going to do when Google goes south? When it either collapses under its own weight or is finally broken by its corporate overlords?
Can't put all the eggs in one basket. The only sane future is the one with unified, object-driven search and retrieval methods distributed amongst information consumers and producers.

Re:The rest of us call this... by bongoras · 2004-09-27 06:21 · Score: 4, Insightful

The Semantic web represents relationships between data based on metadata (i.e. data about data). This is a far more powerful way to describe the meaning of data.

And this is what makes me wonder if this will amount to much more than an interesting research project for grad students. In order for the SemWeb to amount to anything useful, everyone is going to have to include the metadata necessary to integrate their data into the Semantic Web. How's that going to work? Who's going to make it work?

Two major problems to a semantic web by levram2 · 2004-09-27 06:00 · Score: 5, Insightful

The extra work required to put data into a standard data format won't be done. People can't be bothered to make their pages W3C compliant (even Slashdot). The second problem is that data formats can rarely be agreed upon by a large community. Look at how many calendar event and news feed formats there are.

Re:Two major problems to a semantic web by jilles · 2004-09-27 07:29 · Score: 2, Insightful

The reason people don't bother with W3C-compliant webpages is that there is no obvious advantage. Slashdot works fine in all modern browsers, and aside from some bandwidth that could be saved by going fully XHTML/CSS there is little to be gained (well, there are a number of advantages but they're obviously lost on the editors).
With data it is different; just look at how quickly RSS and Atom are being adopted. There's an obvious advantage because having a feed on your site makes it easier for readers to learn about new content on your site. It doesn't matter that there are multiple competing standards because the tools that matter are standards neutral (most feed readers can handle most RSS and Atom variants). If there is a sufficiently large group of people using a particular (open) format, it is worthwhile to program functionality to do stuff with this data.
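To make the "standards neutral" point concrete, here is a minimal sketch of a reader that pulls item titles out of either an RSS 2.0 or an Atom document. The function name and sample feeds are made up for illustration, and the namespace used is the Atom 1.0 one, which postdates this thread; earlier Atom drafts used a different namespace, so treat that constant as an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace. Earlier Atom drafts used a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entry_titles(feed_xml):
    """Return the item titles from either an RSS 2.0 or an Atom document."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0 layout: <rss><channel><item><title>
        return [item.findtext("title", default="") for item in root.iter("item")]
    if root.tag == ATOM_NS + "feed":
        # Atom layout: <feed><entry><title>, all in the Atom namespace
        return [entry.findtext(ATOM_NS + "title", default="")
                for entry in root.iter(ATOM_NS + "entry")]
    raise ValueError("unrecognised feed root: " + root.tag)

# Hypothetical sample feeds, one per format.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title></item>
<item><title>Second post</title></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title></entry></feed>"""

print(entry_titles(RSS_SAMPLE))
print(entry_titles(ATOM_SAMPLE))
```

The caller never needs to know which format the site chose; the format difference is absorbed in one small function, which is roughly why competing feed formats have not hurt adoption.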

The RSS world is also spawning some interesting semantic things such as track back links and perma links. Not all of these things will survive but there already are these mini semantic webs emerging. These networks are growing in size and scope. People write tools to search and navigate them in various and sometimes unexpected ways. Whenever one tool involves multiple networks, effectively a larger one emerges.

IMHO the semantic web is not something that will be released by some big software company or standards body like the W3C but rather something that will emerge out of the chaos of different standards and formats that are out there today. There will not be some monolithic ontology that explains everything but rather there will be many domain-specific, simple ontologies that may be abstracted from by tools so that relations between datasets may be established and explored without requiring many changes to the data. Where meaningful relations exist, tools and standards will emerge to exploit these relations.

--

Jilles

Obvious candidate for massive abuse by gammelby · 2004-09-27 06:08 · Score: 2, Insightful

How is the semantic web going to handle abuse like pr0nn g_annotation>...? I mean, anybody can put up bogus annotations to promote their filthy business, as we saw in the days before Google and PageRank.

Ulrik

Re:Obvious candidate for massive abuse by KjetilK · 2004-09-27 08:46 · Score: 2, Insightful

I suspect the answer to that one is immense social networks, user participation and webs of trust.
The W3C also has Annotea, to allow people to submit annotations. Now, you can imagine lots of people having a simple way to rate pages; a rating option could for example be "Supplied metadata are bad/fraudulent", or something like that.
You would first and foremost make decisions based on ratings from people you trust. That is, people who are close to you in your FOAF-based social network.
When every Internet user becomes a reviewer, and people are well connected in a social network, so that there is a review available of most pages, there is going to be a very strong incentive for authors to supply accurate metadata. Think of it as moderation.
Face it, although it happens that you stumble upon pr0n involuntarily, the vast majority of pr0n surfers do it on purpose. Pr0n0graphers (this is getting a bit too leet for me...) then will have a strong incentive to refrain from such tactics: they will be modded into oblivion anyway, and accurate metadata is going to bring them traffic, since they are modded up by those who actually surf pr0n.
So, unless the goatse guy is a friend of yours, I don't think it is a big worry.
Provided SW becomes a reality that is.
FOAF is a really good start, though, go create it now!
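One way to picture "ratings from people you trust" is to weight each reviewer by how many hops away they are in the social graph. The sketch below is only an illustration: the `knows` graph, the 1/hops weighting and the three-hop cutoff are all assumptions chosen for the example, not anything FOAF or Annotea specifies.

```python
from collections import deque

def hops_from(graph, start):
    """Breadth-first search: number of 'knows' hops from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist

def trusted_score(graph, me, ratings, max_hops=3):
    """Weighted average of ratings; each reviewer counts 1/hops.

    Reviewers who are unreachable, further than max_hops away, or the
    user themselves are ignored. Returns None if nobody trusted has rated.
    """
    dist = hops_from(graph, me)
    total = 0.0
    weight_sum = 0.0
    for reviewer, rating in ratings.items():
        hops = dist.get(reviewer)
        if hops is None or hops == 0 or hops > max_hops:
            continue
        weight = 1.0 / hops
        total += weight * rating
        weight_sum += weight
    return total / weight_sum if weight_sum else None

# Hypothetical data: 1.0 means "supplied metadata is accurate", 0.0 means "fraudulent".
knows = {"me": ["alice"], "alice": ["bob"], "bob": []}
ratings = {"alice": 1.0, "bob": 0.0, "spammer": 1.0}

print(trusted_score(knows, "me", ratings))
```

Here the spammer's glowing self-review never counts, because nobody in the user's network knows them. Whether real social graphs are dense enough for this to cover most pages is exactly the open question.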

--
Employee of Inrupt, Project Release Manager and Community Manager for Solid

Statistical text analysis killed semweb by Ars-Fartsica · 2004-09-27 06:12 · Score: 5, Insightful

As has been stated many times, content producers will spoof semantic data just like they used to with the META tag...which is why no one uses the META tag anymore. Relevance algorithms take into account link analysis and statistical text analysis to provide a much more truthful representation of what data is there. Sorry Tim.
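As a toy illustration of why declared metadata gets checked against the visible text, the sketch below flags declared keywords that never appear in the page body. It is a deliberately crude stand-in for "statistical text analysis", not a description of how any real search engine works; the function names and sample data are invented, and multi-word keywords are not handled.

```python
import re
from collections import Counter

def keyword_support(declared_keywords, body_text):
    """For each declared keyword, count how often it occurs in the body text."""
    words = Counter(re.findall(r"[a-z0-9]+", body_text.lower()))
    return {kw: words[kw.lower()] for kw in declared_keywords}

def unsupported(declared_keywords, body_text):
    """Keywords the page declares but never uses in its visible text."""
    support = keyword_support(declared_keywords, body_text)
    return [kw for kw, count in support.items() if count == 0]

# Hypothetical page body and declared keywords.
body = "Census tables for 2003: population by region and by age group."
print(unsupported(["census", "population", "casino"], body))
```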

Re:Statistical text analysis killed semweb by Anonymous Coward · 2004-09-27 09:14 · Score: 1, Insightful

Except if you read the rest of the post.

BTW, most relevance engines give weight to META tags. Only Google, and those who take their results from Google, don't use them.

You don't have to believe me, I mean I only spend all day listening to the leading experts in search and search technology discussing the best methods for finding relevant information. The consensus around the world is that text-based indexing and link analysis is the weakest method of all, although it does help to cut through the "noise" created by the dot-com-boom-style webmaster and SEO tricks driven by the pursuit of $.

However, in your meta tag example, what is the problem? If my site is full of "pussy sex tight virgin", then what is wrong? If it describes the content, then yes, I do find it useful. Just because you tend to search for that shit doesn't make the tags irrelevant. Or are you suggesting that tags like that are often used by, say the library of congress?

You like harping on about abused metatags, and true enough, they are abused when commercial interests are at hand. However, you cannot discount the fact that there is no better way to describe the content. Sir Tim is bang on right. You may think that this opinion has been wrong "for at least eight years", but trust me, it is the direction that everyone is going. If you make meaningful content that has value as information, you use meta tags.

Now, for your garage ecommerce site, ya, you probably spam the living hell out of your meta tags, but you have nothing meaningful or useful. Now go and search around for national statistics on something, loaded with metadata. And if you don't see it, then the search engine is actually spidering the metadata repository behind the scenes.

Metadata is still the best method to use to describe your data. Plus, I notice you chose a very limited portion of my post to respond to. Why not try out

"to be or not to be"

A little harder to dispute, maybe?

Not doing it right by vigyanik · 2004-09-27 06:17 · Score: 4, Insightful

The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient. He is pitching the idea just like a startup would, giving cool examples and everything. But in practice, all he is doing is proposing and overseeing standards. Developing standards for an idea is not what is required to prove that an idea works. Standards should follow successful technology, not vice versa. You need to have companies that make products professionally and offer complete solutions (i.e. make it work in real-life situations). Doing it for a very simple example that he quotes ("find pictures taken on sunny days") itself is a big, big deal. Perhaps Tim should get involved with companies in this field as an advisor/consultant. You know, there are enough smart people out there who could develop the standards. But there are very few people with his name and recognition to truly ignite commercial interest in his ideas.

Re:Not doing it right by dubious9 · 2004-09-27 08:04 · Score: 4, Insightful

Perhaps Tim should get involved with companies in this field as an advisor/consultant.

Um... he invented the WWW and started the W3C. I'd say he's had some experience with companies as an advisor. Take a look at some of the W3C recommendations and look for corporate involvement.

But in practice, all he is doing is proposing and overseeing standards.

That's kinda what the W3C *does*.

Standards should follow successful technology, not vice versa.

XHTML, XML, XSLT and a lot of other recommendations started as standards that *later* had robust implementations. Technology that starts without standards is often not fully thought out and awkward, and at worst, proprietary. Waiting for technology before standards will only inhibit interoperability and adoption of the standard.

The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient.

I suppose that it has nothing to do with the fact that it's a tremendously difficult and ambitious project. You're right. Anything that takes 15 years to develop should be scrapped.

--
Why, o why must the sky fall when I've learned to fly?

Second System Effect by xleeko · 2004-09-27 06:31 · Score: 4, Insightful

I've been hearing noise about the semantic web, RDF, and what not for years now, and every time I do, the first thing that pops into my head is "Second System Effect".

He got lucky once, because he put together some tools that were simple and straightforward enough for people to pick it up quickly, thereby avoiding the fate of the dozens of other hypertext systems going back to the late 1980's.

Now, like all second systems, he wants to "do it right", over-engineering away all of the things that made the first one take off ...

Just my opinionated rant ...

Re:Opposing view by mr_majestyk · 2004-09-27 06:32 · Score: 2, Insightful

semantic web allows people to publish their own ontologies, and the best tools should be those that learn to extract interesting info from various sources.

That's right. More to the point, the system supports many ontologies, and allows the best ontologies to rise to the top.

Re:Opposing view by Allen+Zadr · 2004-09-27 06:36 · Score: 2, Insightful

Having read both of your articles, I do not see them as opposite, but rather complementary.

All information that is subjective is a poor candidate for the semantic web. All information that is quickly subject to change is a poor candidate for the semantic web. When mixing subjective (verb) pointers to a given truth on a large scale, modified by objective pointers, where even one of many thousands is false (or mis-keyed), the overall meaning can become quickly subverted.

In other words, if I get enough people to post somewhere that Allen Zadr lives in New Mexico, the multiple verbs that would otherwise point to the actual fact -- there is no Allen Zadr -- would be subverted. That is, unless you could syntactically link Allen Zadr to an actual human being.

Even more simply, the semantic web is only as good as the data. It's not very difficult to get a well-trusted source to make an assertion of a truth while avoiding the linking details, thus presenting the users with a subverted view of reality. It has many flaws, and many promises. It won't fail, but it will never be better or worse than the existing systems, just different.

--
Kinetic stupidity has a new brand leader: Allen Zadr.

Re:What Does 42 Mean for Privacy? by cynic10508 · 2004-09-27 06:44 · Score: 2, Insightful

Seriously though, this could be really cool, but I imagine that this could have some very adverse effects on privacy given the amount of information that finds itself on the web. Items that are linked by obscurity in disparate places would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...

People would do well to note the principle: Security by obscurity isn't.

Re:Opposing view by Sique · 2004-09-27 06:48 · Score: 3, Insightful

No, computers don't need meaning to handle data. Computers need syntax and rules how to act at syntactic structures. The semantic web is founded on the hope that enough syntax thrown at huge amounts of data turns magically into semantics.

It's based on the assumption that all semantics can be explained by syntax. So far this has not been proven, and all attempts to get there got stuck somewhere and turned into something different, sometimes useful (Chomsky's grammars), sometimes not so useful.

The semantic web would have to deal with the laziness of people who can't be bothered to write meaningful ALT attributes for tags. It can try to guess at some of the semantics, but it can also easily be fooled. Everyone who has ever tried to use content filters for an internet connection knows what I am talking about. Lots of false positives are rejected and hundreds of questionable sites get through, because the syntax of a site alone doesn't help with evaluating the semantics (the meaning) of this site.

--
.sig: Sique *sigh*

Re:What Does 42 Mean for Privacy? by Allen+Zadr · 2004-09-27 06:57 · Score: 4, Insightful

Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles.

That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?

Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.

--
Kinetic stupidity has a new brand leader: Allen Zadr.

Re:What Does 42 Mean for Privacy? by cynic10508 · 2004-09-27 07:10 · Score: 2, Insightful

Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles.

I would disagree. I would say privacy is more like cryptography in that privacy is the ability to control who knows certain information. So privacy is confidentiality.

That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?

Well, this goes off on a tangent. I would argue that you're making an incorrect metaphysical and/or epistemological distinction in dividing your "virtual" and "real" personas. What is ethical in one is ethical in another and vice-versa.

Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.

Kind of another tangent. If you're using your employer's network then legally you've pretty much given up the right to privacy. My suggestion would be not to use company computers to do anything that you wouldn't want them looking at.

Re:Opposing view by Fnkmaster · 2004-09-27 07:22 · Score: 2, Insightful

While I understand where you are coming from, let me present the parts of his arguments that do seem to hold water to me.

1. The Semantic Web (or rather, ontology construction and construction of relationships between your local ontology and other ontologies) is complicated and time consuming, and requires you to decipher lots of other people's stuff to connect your stuff to it. Ultimately any new technology, especially one that requires widespread adoption to be useful, must be easy enough to adopt that people adopt it. RSS, HTML and other successful technologies allow you to focus your effort on the local endeavour and don't require tons of formalized, structured organization of data, which runs somewhat counter to human nature. They are thus substantially less labor intensive to implement, and have therefore been taken up quite rapidly. This argument I consider to be perfectly valid and fairly strong.

2. Trust of ontological data is a critical issue because lots of false assertions and mediocre data will inevitably creep into a large, distributed "semantic web". This is a problem with the web currently, and you definitely have to take everything you read with a grain of salt, trust certain sources more than others, and so on. I think this argument holds some water, but I think this problem is addressable.

Personally, I think it will ultimately be easier to implement something like Cyc to build structured knowledge networks from information in human grokkable form. The internal representation of a Cyc-like machine will probably look quite similar to the semantic web, including the ability to adjust world view, evaluate source material reliability, etc. Getting a machine to build this knowledge representation, despite all the ambiguities of human expression, is more likely to succeed and be useful to humanity (IMHO) than getting lots of humans to interact with computers and technology in a structured, logical fashion. This is not to say that there aren't applications where structured ontological data would work well.

I particularly like the idea of auto-translation between different structured data formats, but I do agree with Clay that it's more likely that businesses will construct isolated "island" ontologies (such as a specific XML schema for describing formatted data) and deal with translation to other formats on an ad-hoc basis, for simple resource allocation and cost reasons.

Your argument (pro) seems to rely on the idea that tools will make things easier. I can't help but think of 4GL programming, SQL and attempts to make programming accessible to "average" people. The fact is good tools make things easier, but only certain people or people trained to do so can really think in a structured, logical fashion and express that in a way that a computer understands. No effort to handwave away that issue to "tools" has ever succeeded. Tools can help, but they are not a panacea. HTML is so successful and widespread because it's simple to edit, as it only requires basic visual thinking to understand, and tools let you skip the intermediate step and edit the visual representation directly.

The concept of editing semantic information is fundamentally not so simple, because humans don't formalize their thinking about relationships on a day-to-day basis. Like visual mapping tools for XML, they may make things slightly easier, but I wouldn't expect any magic. Like I said, I think that we will ultimately end up there, but I believe it will be approached from the other direction.

Why this is a bad idea - it's a taxonomy by Animats · 2004-09-27 07:42 · Score: 4, Insightful

The big problem with the so-called "semantic web" is that trying to taxonomize ideas doesn't work very well. Full-text search works much better.

In the beginning, we had library card catalogs, with their painful attempts to index and cross-reference books. That works well in some areas, typically ones where names of people are significant. Attempts to apply the same approaches to technical papers worked less well.

There's a very elaborate classification system for patents. When you had to look through patents on paper or microfilm, it was essential. Now that we have full text search, it's used less and less.

A modern example of this approach is the ACM Taxonomy, a structure into which all computer science can be fitted. (As an exercise, try to put the current Slashdot stories into that taxonomy.) Nobody actually uses that taxonomy to find anything.

As to data interchangeability, that's a separate issue, and more of a standards one. The big problem for publicly available data is that the cost of encoding the data is borne by different people than those who benefit from the encoding. Many companies don't like having all their product and pricing information easily searchable by price. (Froogle may change this, because Google has so much clout.)

I've spent some time dealing with public financial reporting. There's opposition to detailed disclosure in a standardized format. Many companies don't want their detailed information to be too easily analyzed. Embarrassing results show up.

The future is better search engines, not user-created indexing data. As we've painfully learned, a search engine must look at the same data a human reader would, or it will be lied to. Lied to to the point of uselessness.

Re:No, there's something there by Allen+Zadr · 2004-09-27 07:52 · Score: 2, Insightful

Your faith in computational logic is astounding. Not to say that you may not be right, but you can't simply dismiss the possibility that 'shady' logic relationships such as this one will occur, especially when there are billions of similar relationships.

Your declaring such functionality to be an error of logic does not (in my view) make it less likely.

Back to my very example... the 'scams and cheats' property assertion of an online gamer against my account number is, by definition, a semantic inference, unless a human follows the various links that make up the conclusion. Couple this with the very fact that my fictional search would be along the lines of 'transaction trust', and the property does apply to the query.

Basically that is the point. It is broken beyond usable functionality. It cannot make the conclusions advertised. It can link to points to help a human create valid conclusions.

--
Kinetic stupidity has a new brand leader: Allen Zadr.

Nice Try, Tim by Master+of+Transhuman · 2004-09-27 08:16 · Score: 2, Insightful

As you do note in your comments, however, it's not really doable without a good simulation of conceptual processing.

Still, every little bit helps. Certainly a "Semantic Web" would be more useful than the current one.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!

Re:Opposing view by Thuktun · 2004-09-27 10:12 · Score: 4, Insightful

If you'd like an opposing view, make sure to read Clay Shirky's take on the semantic web.

His writings appear to have some uncorrected logical fallacies.

Consider the following assertions:
Count Dracula is a Vampire
Count Dracula lives in Transylvania
Transylvania is a region of Romania
Vampires are not real
You can draw only one non-clashing conclusion from such a set of assertions -- Romania isn't real.

You can conclude the following from those statements:

Count Dracula is not real
Count Dracula lives in a region of Romania

I'd like to see the mystery step that combines these to conclude that Romania isn't real; at most, you could say that Romania houses something that isn't real. The conclusion he makes isn't supported by any logic.

More importantly, these are dumbed-down semantics. The assertion that a fictional character lives somewhere real needs to be qualified that this occurs in a certain set of fictional stories, not real life. The fact that these unqualified statements are represented in this example ontology means that the ontology is insufficient, not that this method isn't useful.

Another example in that article:

US citizens are people
The First Amendment covers the rights of US citizens
Nike is protected by the First Amendment
You could conclude from this that Nike is a person, and of course you would be right.

This is even factually incorrect. The First Amendment doesn't actually say anything about US citizens; it restricts the US Congress from certain actions, period, not for certain people.

Ignoring this, you can make one conclusion and reduce this to the following:

the First Amendment covers the rights of people
Nike is protected by the First Amendment

Concluding that Nike is a person from this is a logical fallacy. (Nothing in these logical statements says the First Amendment might not also cover the disposition of small peanut butter sandwiches with blueberry jam, which set Nike might then be an element of.)
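The fallacy can be shown with a single counterexample: one possible world where both premises hold and the conclusion does not. The sketch below builds such a world with plain sets; the member names are hypothetical placeholders.

```python
# One possible world in which both premises are true.
people = {"alice", "bob"}                      # hypothetical people
covered = people | {"nike", "sandwich"}        # hypothetical extra things the amendment covers

premise_1 = people <= covered    # the First Amendment covers the rights of people
premise_2 = "nike" in covered    # Nike is protected by the First Amendment
conclusion = "nike" in people    # Nike is a person

print(premise_1, premise_2, conclusion)
```

Both premises come out true while the conclusion is false, so the conclusion cannot follow from the premises alone. Being in the larger set says nothing about being in the smaller one.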

I find it hard to treat this article with much weight, given its fast-and-loose treatment of logic and ontological assertions.

need standardization? by yonyonson · 2004-09-27 10:14 · Score: 2, Insightful

for data to be shared and recognized as distinct fields of information, won't there need to be standardization across all hosts in order to use the data in any comprehensible way?

ie.

<product> Acme(tm) xxxxx </product>

on host #1
while on host #2 the same item is recognized as:

<saleitem> Acme(tm) xxxxx </saleitem>

how will the semantic web describe and relate items which are recognized as an item for sale but under different labels?
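One common answer is that nobody has to agree on the tag names, as long as someone publishes a statement that the two names mean the same thing (OWL, for instance, has owl:equivalentClass for this purpose). The sketch below fakes that with a plain dictionary; the `SAME_AS` table and the shared concept name `ItemForSale` are hypothetical, invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical published mapping: each host's local tag -> a shared concept name.
SAME_AS = {"product": "ItemForSale", "saleitem": "ItemForSale"}

def shared_concept(fragment):
    """Parse one XML element and return (shared concept, text content)."""
    element = ET.fromstring(fragment)
    concept = SAME_AS.get(element.tag)
    if concept is None:
        raise KeyError("no mapping published for <%s>" % element.tag)
    return concept, (element.text or "").strip()

host1 = "<product> Acme(tm) xxxxx </product>"
host2 = "<saleitem> Acme(tm) xxxxx </saleitem>"

# Both hosts end up describing the same thing once the mapping is applied.
print(shared_concept(host1) == shared_concept(host2))
```

The hard part is not the lookup but getting someone to write and maintain the mapping, which is the adoption problem raised elsewhere in this thread.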

Re:Opposing view by david.given · 2004-09-27 11:32 · Score: 2, Insightful

The conclusion is invalid because YOU happen to know that it's invalid. It certainly could be valid given only the rules presented. As an example, if you used Superman and Metropolis in the above example, it would work fine.

Rule 2 does not provide any information about the reality of its parameters. Stating things a bit more formally:

isA(dracula, vampire)
locatedIn(dracula, transylvania)
locatedIn(transylvania, romania)
~isReal(vampire)

These aren't rules, they're statements providing one-way inferences. You may only create forward logic chains. There aren't really any interesting conclusions you can come up with from this, apart from being able to state that some unreal things live in Romania.
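A minimal sketch of what can and cannot be derived from those four statements, using the same predicate names. The two inference steps used here (locatedIn is transitive, and an instance of an unreal class is unreal) are assumptions added for the illustration; the statements themselves do not supply them.

```python
# The statements above, as (subject, predicate, object) triples.
facts = {
    ("dracula", "isA", "vampire"),
    ("dracula", "locatedIn", "transylvania"),
    ("transylvania", "locatedIn", "romania"),
}
unreal_classes = {"vampire"}   # ~isReal(vampire)

def located_in(thing, facts):
    """All places reachable from thing by following locatedIn edges (assumed transitive)."""
    places = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in facts:
            if subject == current and predicate == "locatedIn" and obj not in places:
                places.add(obj)
                frontier.append(obj)
    return places

def unreal_things(facts, unreal_classes):
    """Instances of an unreal class are unreal; nothing else is inferred."""
    return {s for s, p, o in facts if p == "isA" and o in unreal_classes}

print(located_in("dracula", facts))
print(unreal_things(facts, unreal_classes))
```

Unreality only flows along isA, never along locatedIn, so Romania is never marked unreal. Getting "Romania isn't real" would need an extra rule that nobody stated.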

Shirky gives examples of some of Dodgson's syllogisms (and Dodgson is a master among logicians). Dodgson's syllogisms are interesting because they're based around rules. Take the one about poems:

No interesting poems are unpopular among people of real taste.
No modern poetry is free from affectation.
All your poems are on the subject of soap-bubbles.
No affected poetry is popular among people of real taste.
No ancient poetry is on the subject of soap-bubbles.

He uses generic statements, rather than absolute statements. You can see this if I restate it:

isInteresting(X) IMPLIES isPopular(X)
isModern(X) IMPLIES isAffected(X)
isYours(X) IMPLIES isAboutBubbles(X)
isAffected(X) IMPLIES ~isPopular(X)
~isModern(X) IMPLIES ~isAboutBubbles(X)

Notice that all these rules have to be specified in generic terms. We have equations we can manipulate. This means we can use them. There's a rule that A IMPLIES B == ~B IMPLIES ~A (the contrapositive) which lets us restate as follows:

~isPopular(X) IMPLIES ~isInteresting(X)
isModern(X) IMPLIES isAffected(X)
isYours(X) IMPLIES isAboutBubbles(X)
isAffected(X) IMPLIES ~isPopular(X)
isAboutBubbles(X) IMPLIES isModern(X)

And from here it's just a matter of chaining, since (A IMPLIES B) together with (B IMPLIES C) gives (A IMPLIES C). This means that we can prove that your poems are modern, affected, unpopular among people of real taste, and therefore uninteresting.

You need the statements to provide the fundamental information, and the rules to let you manipulate that information. (Dodgson avoids needing a statement by using rule 3 instead; it would work just as well had rule 3 been isAboutBubbles(yourPoem), but that would only let you prove that yourPoem was uninteresting, not that all your poems are uninteresting.)
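The poem syllogism can be run mechanically with a few lines of forward chaining. This is a sketch under stated assumptions: the rule encoding is one reading of the five premises (in particular, "no interesting poems are unpopular" is read as interesting IMPLIES popular, and "ancient" as not modern), and the helper names are invented for the example.

```python
# Each rule is (premise, conclusion); a leading "~" means "not".
RULES = [
    ("isInteresting", "isPopular"),       # no interesting poems are unpopular
    ("isModern", "isAffected"),           # no modern poetry is free from affectation
    ("isYours", "isAboutBubbles"),        # all your poems are about soap-bubbles
    ("isAffected", "~isPopular"),         # no affected poetry is popular
    ("~isModern", "~isAboutBubbles"),     # no ancient poetry is about soap-bubbles
]

def negate(prop):
    """Toggle the leading "~" on a proposition name."""
    return prop[1:] if prop.startswith("~") else "~" + prop

def with_contrapositives(rules):
    """A IMPLIES B also licenses ~B IMPLIES ~A, so add that for every rule."""
    return rules + [(negate(b), negate(a)) for a, b in rules]

def forward_chain(start, rules):
    """Everything derivable from `start` by repeatedly applying the rules."""
    known = {start}
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

derived = forward_chain("isYours", with_contrapositives(RULES))
print(sorted(derived))
```

Starting from isYours, the chain runs through soap-bubbles, modern, affected and not popular before landing on not interesting. Every step is a generic rule, which is the point being made here: without rules there is nothing to chain.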

Shirky's trying to discredit the Semantic Web by using a syllogism of his own, that goes like this:

Syllogisms that don't contain rules are useless.
The Semantic Web is constructed out of syllogisms.

From this he's trying to draw the erroneous conclusion that the Semantic Web is useless. I leave the problem with this as an exercise to the reader.

Seeing as he is apparently trained in this stuff, which I am not, this makes me think that he is either (a) incompetent or (b) deliberately trying to mislead people. Either way, I don't trust his logic.

Re:Ontology by dodongo · 2004-09-27 12:18 · Score: 2, Insightful

Well... I actually wrote a paper lambasting the ontology for precisely what you bring up here. Specifically, I wrote working from a draft of Adele Goldberg & Ray Jackendoff's paper "The English resultative as a family of constructions" (_Language_ vol. 80 no. 3, September 2004). It deals with strange things like

"The trolley rumbled through the city"

and led me to believe Victor's ontological approach would have some serious problems encoding this if it didn't have a more attuned syntax processor. It wasn't a good paper, but I made my point, and you bring up a similar idea on a more basic (and thus, even more problematic) level.

Anything remotely "idiomatic" (specifically, where the combinatoriality of semantics fails, as it does in your example, where time does not "fly" in the sense that it does not move through the air held aloft by differences in air pressure) starts to generate serious problems.

Your problem could be solved if the lexicon had in it information about common idioms, which it presumably would, to be functional on any level more colloquial than academic writing. Most linguists would tell you the lexicon really does encode idioms in some fashion too, so this wouldn't be some sort of computational stop-gap.

So the lexicon has in it "time flies" or something. The parser (or some sublevel of it) would then identify "like" as a metaphorical comparison to the following predicate "an arrow."

Thus, the TMR would have something to do with time moving briskly towards a target, perhaps.

I'm not saying this is an entirely feasible option, but read what Tim Berners-Lee is proposing, and see if you find it much more plausible. The amount of information out there people would have to manually encode would preclude the system from having any real functionality beyond keyword search. While I'm not a huge fan of the current implementation of the ontology, I do think future generations could start to sort things out. Its advantage is that once the concept database, the onomasticon, is complete, it should be mostly self-trainable, which is what Berners-Lee's solution lacks.

Re:What Does 42 Mean for Privacy? by blue+trane · 2004-09-27 13:45 · Score: 2, Insightful

Didn't they come up with a few viruses for it though?

Slashdot Mirror
