Berners-Lee On The Semantic Web

Parsing natural language into semantics by streetmentioner · 2001-04-10 16:36 · Score: 4

Some systems exist to extract facts from language into semantic knowledge representations, and they're surprisingly good.

SNOWY is a system that "reads" the World Book Encyclopaedia and stores each fact about a concept into a hierarchic memory based on that concept. It's sufficiently sophisticated to be able to realise that "The bear digs up the nut" implies that the bear eats the nut, while "The miner digs up the coal" doesn't imply that. You can then ask it "what eats nuts" and it will reply correctly. (At least, this is my impression - I haven't used it, sadly.) As I remember it can fully understand 50-60% of the sentences in the bits of the encyclopaedia that it has been commanded to parse.
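
To make the idea concrete, here is a toy sketch in Python of a fact store indexed by concept. I haven't seen SNOWY's internals, so none of this is its real design: the class name `ConceptMemory`, the little is-a hierarchy, and the hand-written "an animal that digs up food eats it" rule are all assumptions for illustration.

```python
from collections import defaultdict


class ConceptMemory:
    """Toy fact store indexed by concept. Hypothetical; not SNOWY's real design."""

    def __init__(self):
        # concept -> set of (verb, object) facts known about that concept
        self.facts = defaultdict(set)
        # concept -> parent concept, e.g. "bear" -> "animal"
        self.parent = {}

    def add_kind(self, concept, parent):
        self.parent[concept] = parent

    def is_a(self, concept, kind):
        # Walk up the hierarchy until we hit `kind` or run out of parents.
        while concept is not None:
            if concept == kind:
                return True
            concept = self.parent.get(concept)
        return False

    def read(self, subject, verb, obj):
        """Store one parsed sentence, plus one hand-written inference."""
        self.facts[subject].add((verb, obj))
        # Assumed rule: an animal that digs up food eats it.
        if verb == "digs up" and self.is_a(subject, "animal") and self.is_a(obj, "food"):
            self.facts[subject].add(("eats", obj))

    def who(self, verb, obj):
        """Answer questions like 'what eats nuts?'."""
        return sorted(s for s, fs in self.facts.items() if (verb, obj) in fs)


memory = ConceptMemory()
memory.add_kind("bear", "animal")
memory.add_kind("miner", "person")
memory.add_kind("nut", "food")
memory.add_kind("coal", "mineral")
memory.read("bear", "digs up", "nut")
memory.read("miner", "digs up", "coal")
print(memory.who("eats", "nut"))   # ['bear']
print(memory.who("eats", "coal"))  # []
```

The hard part, of course, is the bit this sketch skips: getting from an English sentence to the (subject, verb, object) call in the first place.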

The language it works on is fairly simple, but is nevertheless text designed for humans as opposed to computers. Systems like this could be a good bridge between language and semantic based representations.

This is the best link I can find, unfortunately.

There are also, of course, dozens of systems designed to work on English text that has been specifically created to be computer-parsable, but still readable by humans.

I'm incredibly sceptical about all this sort of technology, but if the systems continue to evolve, the agents might be able to glean much of their knowledge from existing web pages.

Re:Parsing natural language into semantics by Salieri · 2001-04-10 17:10 · Score: 3

A separate, commercial common sense project is called CYC. It's been going since 1984 but is just starting to scratch the surface of the encyclopedia. It's incredible how dependencies can get you: for instance, you can't program what an aardvark is without also going into what it means to be a mammal, the geography of Africa, basic anatomical pieces, and behavioral traits -- all of which have their own dependencies, and so on. (Don't quote me on that though, "dependencies" probably isn't the right word.)
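
A rough Python sketch of how quickly those prerequisites pile up. The dependency table is entirely made up for illustration (it is not taken from CYC), and `concepts_needed` is just a name I picked for a plain graph traversal.

```python
def concepts_needed(concept, requires):
    """Return every concept that must be defined before `concept` can be."""
    needed = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for dep in requires.get(current, ()):
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    return needed


# Made-up dependency table, only for illustration.
requires = {
    "aardvark": ["mammal", "Africa", "snout", "burrowing"],
    "mammal": ["animal", "warm-blooded"],
    "Africa": ["continent"],
    "burrowing": ["digging", "soil"],
    "animal": ["living thing"],
}

print(len(requires["aardvark"]))                   # 4 direct prerequisites
print(len(concepts_needed("aardvark", requires)))  # 10 once you follow the chain
```

Even in this tiny table, four direct prerequisites turn into ten, and every leaf here ("soil", "continent") would need its own entries in a real knowledge base.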

Here is the link to CYC, an interesting read about knowledge representation. It's also pretty timely, since they are about to release some of the project after 17 years of development. Might make a good story.

By the way, how many people posting now are from Australia? My sympathies for any other -500 students whose homework also kept 'em up tonight.

--------------------------------

You ain't seen nothin' yet... by s390 · 2001-04-10 17:27 · Score: 3

because that light at the end of the tunnel is a train called "pervasive computing" and it will be here soon. Ubiquitous connectivity, XML/SOAP protocols (assuming M$ doesn't hijack these), more capable and standardized interfaces to extensive backend data warehouses, IPv6 addressability and service level discrimination, smarter Java-based intelligent agents, speech recognition, natural language processing - these will all contribute to the second networked revolution in the ways we work and interact online. Berners-Lee has an academic vision of how some of this might work, and I applaud his courage for sharing his ideas.

Today, you can be driving on the freeway and using speech recognition to look up and call colleagues through your handsfree cellphone. It's not much of a stretch to add calendar administration and other interfaces with intelligent agents to this.

Scenario: You're flying down I405 in SoCal (in the carpool lane, with coworkers aboard) some beautiful late afternoon in the not too distant future:

"Princess (you've named your general digital assistant Princess Leia, for some reason), please check on improving my car insurance rates."

Princess: "Connecting insurance agent..."

Fred (you call your intelligent insurance agent program "Fred"): "Fred here..."

You: "Fred, please see if I can get a better rate on my car insurance this year."

Fred: "OK, I'm looking..."

You: "Princess, please tell my wife I'll be home early."

Princess: "What's your ETA, please..."

You: "Sixish Princess, thanks."

Princess: "Thank you, will do."

Fred: "You have six quotes, two of which are at lower premiums than your existing contract. Do you have any recent tickets or accidents to add?"

You: "No, thank you. What's my best choice?"

Fred: "Suckem-Dumpem Mutual offers you a $300 annual premium savings counting the good driver discount."

[I405 stops dead as it's wont to do randomly, including the carpool lane.]

Screeech Crash Tinkle. [a moment of dead air...]

You: "Fred, forget it. Princess, please tell my wife I'll be a little late."

Fred: "Request closed, no action. Bye."

Princess: "OK, will do."

Standards by Azza · 2001-04-10 17:12 · Score: 3

Excellent article. I agree this is the way things should be heading, but the biggest problem is going to be defining standards for the information. The largest information providers will be trying hard to hold on to, and control access to, their so-called intellectual property.

The semantic web depends on universal open standards for access to this information. MS's solution, HailStorm, already tells you what they think of that idea. Let's hope that we can avoid another browser (agent) war...

Not just yet by matthew.thompson · 2001-04-10 16:21 · Score: 3

This utopian information access idea is great in principle but at the moment we just don't have the always on style internet access available.

A similar idea is being touted by Orange, whose grand plan is to use an always-on mobile terminal device with their Wildfire personal assistant, which will listen to your day and arrange things to happen, inform you of information and collate calls and messages, whether they are voicemail, email or faxes.

But until we have always-on, always-connected devices we're still going to be pretty much tethered to our desks.

--
Matt Thompson - Actuality - Insert product here.

Rational Programming vs Semantic Web by Baldrson · 2001-04-10 16:58 · Score: 4

As I posted to Slashdot a year ago on the topic:

The future of the Internet is in what I call "rational programming" derived from a revival of Bertrand Russell's Relation Arithmetic. Rational programming is a classically applicable branch of relation arithmetic's sub theory of quantum software (as opposed to the hardware-oriented technology of quantum computing). By classically applicable I mean it applies to conventional computing systems -- not just quantum information systems. Rational programming will subsume what Tim Berners-Lee calls the semantic web. The basic problem Tim (and just about everyone back through Bertrand Russell) fails to perceive is that logic is irrational. John McCarthy's signature line says it all about this kind of approach: "He who refuses to do arithmetic is doomed to talk nonsense." More on this a bit later, but first some history, because he who fails to learn from history is doomed to repeat its nonsense:

When I invented the precursor to Postscript (an audacious claim that I can back up -- it started as a replacement for NAPLPS which I proposed while Manager of Interactive Architectures for Viewdata Corp of America back in November of 1981 -- the Xerox PARC guys found my approach of what they called a "tokenized Forth" communication protocol to be an intriguing way to encode text and graphics), I was interested in having a Forth virtual machine migrate into silicon (ala Novix) so it could evolve from mere graphics rendering into a distributed Smalltalk VM environment (ala Squeak) as videotex terminal/personal computer capacities increased. But I was _not_ interested in object-oriented programming as the long-term semantics of distributed programming environments. (I still have some of the hardcopy of the communiques with Xerox PARC and others from this period.)

Rather, relational semantics were what I saw as the ultimate direction for distributed programming. I had a bit of a go at Tony Hoare's "communicating sequential processes" paradigm and its Transputer realization because he was, at least, starting with the hard problem of parallelism rather than making like the drunk looking for his keys under the light post the way everyone else seemed to be doing (and still are, save for Mozart, since threads, etc. are always an afterthought). But, because there were other hard problems like abstraction, transactions and persistence that he ignored, I christened his approach "Occam's Chainsaw Massacre" in my communiques (in honor of his distributed programming language "Occam") and dropped it in favor of relational programming, which has inherent parallelism resulting from both dependency and indeterminacy. (BTW: Dr. Hoare seems to have finally come to his senses about this issue.)

Unfortunately, the only researcher doing hardcore work on relational programming (meaning, getting to the root of relational semantics in a way that Codd had failed to do) at the time was Bruce MacLennan, then, of The Naval Postgraduate School, and he just didn't have the glamour of Alan Kay at places like Xerox PARC to attract the attention of guys like Steve Jobs. Bruce had a bit of a blind-spot, too, when it came to transactions and persistence, which I attempted to remedy by bringing David P. Reed's work on distributed transactions for the ARPAnet to him, but although he wrote a white paper on a predicate calculus (close to a relational) implementation of Reed's thesis (MIT/LCS/TR-205), he didn't really "get it", IMHO. Reed and MacLennan abandoned their work for other pursuits (ironically, Reed was chief scientist at Lotus while Notes was being developed but did not contribute his ideas on distributed synchronization to that development despite the fact that we had a mutual acquaintance from my Plato days by the name of Ray Ozzie -- so, I share some of the blame for this failure) even as Steve Jobs botched the embryonic object oriented world by abandoning Smalltalk and giving us, instead, a lineage consisting of Object Pascal on the Lisa/Mac which begat Objective C on Jobs's NeXT which begat Java at Sun via Naughton and Gosling's experience with NeXT.

This brings us to the present -- a world in which Javascript-based technologies like Tibet promise to not only salvage the object oriented aspect of the Internet from the birth defects of Jobs's spawn, but actually provide an advance over Smalltalk in the same lineage as CLOS and Self. But it is also a world in which there is growing confusion over the proper role of "metadata" in the form of XML -- particularly when it comes to speech acts and distributed inference. I would call Tibet "the next major Internet advance" except for the fact that the basic idea for a Tibet-like system has been around and well understood since the early 1980's. When it is finally released, Tibet (or a system like it) will put the Internet back on track. I call that a "recovery", not an "advance".

We are now poised to move forward with type inference based on full blown inference engines, thereby dispensing with the nonterminating arguments over statically vs dynamically typed languages that allowed Steve Jobs's spawn to get its nose in the tent. If you want to declare a "type" in a declarative language, just make another declaration and let the inference engine figure out what it can do with that information prior to run time. See how easy that was? Well, there is more to it than that, but not that much: Assertions have implications and assertions made prior to run time have implications prior to run time. Live with it and don't repeat the mistakes of the past.

The confusion over semantic webs, and the reason Berners Lee et al will fail, is essentially the same as the confusion that has beleaguered all inferential systems such as logic programming and "artificial intelligence" over the years: logic is irrational and the real world demands rationality -- otherwise nothing makes sense. By "rationality" I mean that reasoning must literally incorporate "ratios" -- or, as John McCarthy would put it, doing arithmetic so things make sense. By making sense, I mean there is a sense in which one interprets the sea of assertions that clearly dominates for a particular purpose. With logic not only are you limited to 0 and 1 as effective quantities; you have no adequate theoretic basis from which to derive more accurate quantities with which to make sense by taking ratios and determining which inferences are dominant.

Fuzzy logic and expert systems incorporating probabilities have typically failed because they are not based in the first principles of probability and statistics. As Gauss, the premier probability theorist, put it, "Mathematics is the study of relations." He didn't say, "Mathematics is the study of multisets." There are good reasons that relational databases, and not set manipulation languages, have come to dominate business applications -- and Gauss was aware of these differences when he began to derive his laws of probability. Subsequent axiomatizations of mathematics based on set theory were similarly misguided and have led to the idea that "fuzzy sets" are the way to introduce rationality into programming. Rather than sets, relations are the foundation, not just of mathematics but of rationality in the same sense that Gauss realized when he derived his theory of probability from the study of relations.

Rationality allows for judgment which is recognized as inherently fallible -- but which allows one to proceed without exponentiating all possible paths of inference. Judgment also allows various identities to limit sharing of information to that needed -- thereby creating speech acts and a basis for rational measures of credibility associated with those identities. Since credit-rating is a degeneration of credibility, it should come as no shock that the invention of negative numbers, originating as they did with the Arabic invention of double entry account keeping, has its analog in something that might be called "logical debt" with which negative probabilities are associated.

And now we have come to the "quantum" aspect of rational programming. It is precisely the "credibility debt" aspect of rational programming that corresponds, in mathematical detail, to the various equations of quantum mechanics and their negative probability amplitudes. (Von Neumann's quantum logic failed to properly incorporate logical debt which has led to much confusion.) Logical debt is important to distributed programming for the same reason debt is important to financial networks. Logical debt is a way of handling poor synchronization of information flow in the same way that financial debt is a way of handling poor synchronization of cash flow. As in any rational system, there are both limits to credit and limits to credibility that influence one's judgments and actions, including speech acts.

The object oriented folks may, in a sense, have the last laugh here because when we divide up inference into identities that engage in speech acts, we are reintroducing the notion of objects that hide information via exchange of speech act messages that can be thought of as "setters" (assertions) and "getters" (queries). However, I believe it is only fair to recognize that the excellent intuitions of Johan Dahl and Kristen Nygaard did need the added insights and rigor of philosophers like J. L. Austin and T. Etter.
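
A minimal Python sketch of the "setters as assertions, getters as queries" picture, under loud assumptions: the `Identity` class and its method names are hypothetical, and this toy only records who said what. It does nothing with the ratios, credibility measures or logical debt discussed above.

```python
class Identity:
    """An agent that keeps its beliefs private and talks only via messages."""

    def __init__(self, name):
        self.name = name
        self._beliefs = {}  # hidden state: statement -> name of first asserter

    def assert_(self, speaker, statement):
        # "Setter": another identity tells us something; we record the source.
        self._beliefs.setdefault(statement, speaker.name)

    def query(self, statement):
        # "Getter": answer only the question asked, never expose the whole store.
        source = self._beliefs.get(statement)
        return (source is not None, source)


alice, bob = Identity("alice"), Identity("bob")
bob.assert_(alice, "meeting is at 3pm")
print(bob.query("meeting is at 3pm"))  # (True, 'alice')
print(bob.query("meeting is at 4pm"))  # (False, None)
```

Keeping the source of each assertion is the hook where a credibility weighting per identity could later be attached.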

--
Seastead this.

Meaningful Web Content by MattGWU · 2001-04-10 16:15 · Score: 4

>>A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities

Good ideas, but I think we first need to make Web content that is meaningful to Humans before we start worrying about our Computers

(Yeah, I know...not *that* kind of meaningful, but it had to be said, what with all the worthless drivel on the Internet and all)

--
"These people look deep within my soul and assign me a number based on the order in which I joined" --Homer re:

Re:Sounds like rdf... by dingbat_hp · 2001-04-10 20:14 · Score: 3

Yes, it is RDF. There are many areas of the SW work where it's not clear what the final technology will be (notably the schema expression tools, such as RDF Schema vs. OIL or DAML or DAML+OIL), but RDF itself seems almost certain to be used - there's just nothing else offering itself as a competitor in that niche.

Some clarifications: XML isn't RDF, and RDF isn't XML. RDF is fundamentally a data model, whereas XML is just a serialisation of a much simpler infoset model. As RDF doesn't have its own serialisation (how you write it down), then the convention has been that it's done in XML. You could serialise RDF into anything you like, but I've yet to see a non-XML one.
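
To illustrate the model-versus-serialisation point, here is a small Python sketch. The statements are held as plain (subject, predicate, object) tuples and written out two ways. Everything in it is an assumption for illustration: the example.org URIs are made up, the tab-separated format is ad hoc, and the XML layout is deliberately not the real RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

# The data model: a set of (subject, predicate, object) statements.
# The URIs are hypothetical examples.
triples = {
    ("http://example.org/page", "http://example.org/terms/author", "Alice"),
    ("http://example.org/page", "http://example.org/terms/title", "Home"),
}


def to_lines(triples):
    # One statement per line, tab-separated (an ad hoc format, not a standard).
    return "\n".join("\t".join(t) for t in sorted(triples))


def from_lines(text):
    return {tuple(line.split("\t")) for line in text.splitlines()}


def to_xml(triples):
    # An ad hoc XML layout, deliberately NOT the real RDF/XML syntax.
    root = ET.Element("statements")
    for s, p, o in sorted(triples):
        ET.SubElement(root, "statement", subject=s, predicate=p).text = o
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return {(e.get("subject"), e.get("predicate"), e.text)
            for e in ET.fromstring(text)}


# Both serialisations round-trip to the same model.
print(from_lines(to_lines(triples)) == triples)  # True
print(from_xml(to_xml(triples)) == triples)      # True
```

The two files on disk look nothing alike, but what an application reasons over is the same set of statements either way.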

XML Schema isn't the same as most other schema languages in this field. XML Schema is concerned about structure and operational matters, not about expressing semantics. XML Schema would be a very bad choice for expressing the semantics of the SW. It works OK for Ariba and XrML, because they're quite limited applications of discussion (an invoice is an invoice is an invoice). Even with MPEG-7, XML Schema has run out of steam and the MPEG group have had to invent their own schema expression language. Using XML Schema for bureaux like BizTalk is extremely limiting, and a bad move long-term.

DTDs are dead. Use XML Schema instead.

RSS (the site-summary format used for Moreover newsfeeds and to make the Slashboxen work) isn't RDF. It's expressed in RDF and defined in RDF Schema, but it's just one RDF application out of many.

Behold the FUTURE of WEB typoGRAPHY by table+and+chair · 2001-04-10 17:00 · Score: 4

In the FUTURE random words in BLOCKS of text displayed ON THE web WILL be inexplicably highlighted IN A stylish PINK-ORANGE several point SIZES LARGER THAN the rest of the body text. This will come to be known as bernersing, and will BECOME a standard control in GUI web-design APPS, WITH options for frequency, DENSITY , and with the advent of the Semantic Web, relevance TO content (default for the latter = 0).

THOUGH this destroys the FLOW of the TEXT by wrenching the READER'S eye about and causing IT to pause, rather than travel naturally FROM WORD to word, this typographic treatment WILL BE hailed as a BREAKTHROUGH in internet desigN and will unleash a revolution OF NEW possibilities.

...Why all the fuss? by erikkemperman · 2001-04-10 19:16 · Score: 3

So far I haven't read a post that addresses the other side of the matter: You might not even want to overcome the barrier between human and machine readable languages, at least not in some cases. I have some limited knowledge of work by the likes of Chomsky etc., as well as supposedly "culturally neutral" and "unambiguous" languages such as Loglan/Lojban. I feel most people, techies leading the pack, tend to forget that, often, the meaning of language can be effectively tweaked, stressed, or even negated precisely because it's ambiguous or culturally predisposed. Think of all the problems, for instance, that would arise if you want to teach a machine the meaning of phrases like "Indian summer" or "Poetry in motion".

In general, natural language is to me a wonderful "protocol" because it forces participants to make the effort of understanding each other's customs, ethics, interests and interhumane sensitivities. Moreover, the natural language that people speak in some region always reflects that region to some extent, in terms of politics, history or even climate. I'm Dutch for instance, but can you understand what I mean by "How a cow catches a rabbit" (which is a literal translation of a Dutch phrase -- guesses anybody?)

The gurus and tech developers should throw the de facto standard philosophy "if it can be automated, automate!" out the window, and face the fact that, whether you like it or not, natural language is in fact a very powerful semantical framework, all in itself - it's "standardized" (vocabularies, dictionaries etc.) "backwards compatible" (languages mostly evolve quite organically) - its practice is just not so readily automated.

regards, EK

--
Gosh, thanks. That must be why the other ships call me Meatfucker -- GCU Grey Area (Eccentric)

Too much technology.... by Peridriga · 2001-04-10 16:25 · Score: 5

I will be the first to say it.... I love technology... But, reading this makes me wonder what is enough...

Alas, voice activated and personalized networks are going to aid in everyday life (especially for those who are physically handicapped), but they remove the most developed and complex form of communication... Human Interaction..

This is becoming less and less a factor in the average human's life.. With business going paperless and friends going wireless, when does someone really have to talk to someone... If you telecommute and email your family, do you really have to talk to anyone, besides maybe your coffee maker when you get up in the morning..

I don't want to be an anti-technology advocate, but merely to express an idea that we are excluding the most needed facet of human life... Interaction...

Prisoners are isolated for punishment... We are isolating ourselves for convenience?..

Well... my two cents.. y'all can make as much change out of it as possible...

--- My Karma is bigger than your...
------ This sentence no verb

What the hell, let's just merge them. by Flying+Headless+Goku · 2001-04-10 16:26 · Score: 4

The human-readable and computer-readable stuff, that is.

How? Lojban, a constructed language designed to be absolutely consistent and logical. You might know it in its earlier incarnation of Loglan, which was mentioned in passing as a language used for conversing with computers in Heinlein's "The Moon is a Harsh Mistress."

Certainly, you could structure a valid Lojban statement to be unreadable to computers, but it isn't that way by default. If you state things directly, the computer can extract useful information.

This is why I'm absolutely 100% certain that we'll all learn Lojban soon. Yup, there is no doubt in my mind. None at all...
[rolls eyes,whistles a little tune]

What's the internet for? A more realistic example: by Flying+Headless+Goku · 2001-04-10 17:44 · Score: 5

The entertainment system was belting out "Put 'Em on the Glass" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His mistress, Lucy, was on the line from the office: "I think we need to see a specialist and then have a series of physical sessions. Bi or something. I'm going to have my agent set up the appointments." Pete immediately agreed to pay the fees, after confirming that she meant a chick.

At her "advisor"'s office, Lucy instructed her Semantic Web agent through her vibrowser. The agent promptly retrieved information about the "treatment" from her advisor's agent, looked up several lists of providers, and checked for the ones within budget and a 20-mile radius of her home and with a rating of triple-H (Hot, Horny, and Healthy) on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.

In a few minutes the agent presented them with a plan. Pete didn't like it. The university student housing was all the way across town from Lucy's place, and he'd be driving back in the middle of rush hour. He set his own agent to redo the search with stricter preferences about location and time. Lucy's agent, having complete trust in Pete's agent in the context of the present task, automatically assisted by supplying access certificates and shortcuts to the data it had already sorted through.

Almost instantly the new plan was presented: a much closer brothel and earlier times--but there were two warning notes. First, Pete would have to reschedule a couple of his less important appointments. He checked what they were--not a problem. The other was something about his STD checker's list failing to include this provider: "Non-contagiousness securely verified by other means," the agent reassured him. "(Details?)"

Lucy registered her assent at about the same moment Pete was muttering, "Spare me the details," and it was all set. (Of course, Pete couldn't resist the kinky details and later that night had his agent explain how it had found that provider even though it wasn't on the proper list.)

Slashdot Mirror
