Going from a 'Web of links' to a 'Web of meaning'
neutron_p writes "Computer scientists from Lehigh University are building the Semantic Web, which will handle more data, resolve contradictions and draw inferences from users' queries. The new improved Web will also combine pieces of information from multiple sites in order to find answers to questions."
When will we be dropping HTTP and HTML in favor of more metadata-friendly protocols and file formats? I can see huge potential in a system built specifically for getting data out there and linking it all together.
Disconnect and self-destruct, one bullet at a time.
Sounds like a recipe for disaster to me.
Does everything include nothing?
Covered not long ago - an interview with Berners-Lee regarding the Semantic Web.
People at DERI in Galway, Ireland are also working on the Semantic Web (see http://www.deri.ie/). I thought lots of people were...
I'll have to rtfa to see what they propose, but just the principle of resolving contradictions is a really difficult one. Most theories of knowledge (which are essentially networks of facts) aren't terribly robust, and contradiction repair, which involves running the entire network to find invalid assumptions and then propagating the changes, is NP-complete :| I'm not positive that contradiction resolution is a reasonable thing to expect out of a massive distributed network.
There are lives at stake here!
with their favourite mode of publication being the press release.
You gotta understand that "meaning" has no meaning at all to machines, at least not yet.
And even for humans, the "meaning" of a certain thing can be a different thing to different people!
Although I applaud the job they are doing for the Semantic Web, I wonder how they can inject "meaning" into the whole thing.
My biggest fear is the 1984-like "my meaning is THE meaning and you canna have any other meaning" thing.
Muchas Gracias, Señor Edward Snowden !
..from user's queries.
Clippy..? Is that you?
There's a Starman, waiting in the sky / He'd like to come and meet us, but he hasn't got the time.
The same can be said about any semantic web technology - whether it's FOAF (an RDF vocab for describing people and their interests) or a vocabulary for reviews. As soon as major authoring tools (i.e. both web editors and content management systems) start integrating these technologies, people will use them if they are useful. Do not expect web designers or bloggers to have a clue about all the great things that the semantic web can do - give them one useful thing which they understand, package it in a pretty UI, and they will start using it.
I guess that the Semantic Web would need HTML documents to meet strict requirements when it comes to validation, use of logical instead of physical markup and so on. This could be an incentive for people to use HTML the way it was intended, instead of the crapload of pages that don't close tags, use hundreds of redundant FONT tags, use the H1..H6 elements to control font size instead of using them to indicate headings, and so on. Strangely enough, all "beginner's" HTML books still teach people to code this way.
The semantic web is a pretty popular area of research right now and it's far from being "built by computer scientists at Lehigh University". In fact I could have done an undergrad dissertation on the semantic web, and there were numerous PhD positions being advertised at unis around the world for research on the semantic web.
Whichever Lehigh professor submitted this is stooping pretty low trying to raise publicity (and hence finance), I would think!
I have discovered a truly remarkable sig which this post is too small to contain.
Am I the only one who recognized the main graphic for the story as a lifted screencap from the movie Hackers? That movie's SOLE redeeming quality was Angelina Jolie...
Well, ok, that and the laugh factor. Not quite as much fun as MST3K'ing The Mummy with about a half dozen friends though.
Please help metamoderate.
- We need an ontology that will cover many if not all aspects of human experience. And this experience has been evolving dramatically and will continue to evolve. This ontology is probably a moving target. This task alone of creating the ontology has been, and still is, the holy grail of AI and Knowledge Management.
- The amount of time we will have to invest in adding metadata to the data will dramatically increase over time. We will need a way to automate the filling of the metadata layer. This is where automatic image recognition and classification, speech to text, text summarizers and meaning extractors kick in (here, Copernic is in the right direction). Maybe the librarian profession will be the next hot job...
- Almost every application will have to adapt and inter-communicate. No big deal, RDF will probably become the new data bus anyway.
That will be interesting!!! There's absolutely loads of it around... especially as people are starting to use more generated websites (like slashdot for example).
If you search for *.rdf maybe you won't find as much... a lot of it is *.rss, *.xml and other things.
Also, google doesn't index them.
Hasn't everyone heard of this already?
W3C semantic web activity from 2001.
Heflin's Thesis from 2001.
I'm rather skeptical of the whole thing. It seems to me to be like "Wouldn't it be nice if people documented their web page content better? Then we could do all these neat things." The second statement is right, but I fear the first statement is intractable.
This could create huge problems for people to stay on the right side of copyright law. A medium that pulls information from several different sources could potentially make it much harder to avoid copyright infringement. For example, you pull from a Wikipedia entry, a NY Times entry and a Reason editorial. You better keep track of where you got each part if you use them in any of your own research, commentary, etc.
How does it combine information from different sources in a way that keeps the user knowledgeable about where the data came from? How do you know who to cite, or whether something you're excerpting can be used in the context you want, when your "semantic web browser" pulled the data and combined it coherently or incoherently into a mish mosh of data sources?
Am I the only one who thinks that this could be an IP trial lawyer's wet dream?
Click here or a puppy gets stomped!
It seems to be a common mistake for computer scientists to think that it's possible to make systems that "understand" the world (both real and abstract knowledge), with all its complexity and ambiguity, in the same way that humans do. I feel that there is a fundamental difference between using computers to enable humans to organize stuff, and having computers automatically do it. Every single attempt at getting computers to be "smart" about inferring human intentions has ended up as an irritating impediment to using the system - look at clippy, Bob, "intelligent" voice systems that try to "help" you by stopping you from talking to a real person... What computers are very, very good at is amplifying and enabling human intelligence. Computers are not themselves intelligent, and (my personal opinion) I don't think they ever will be - unless we manage to "grow" them using processes that we probably won't fully understand. You can't construct something that is as complex as the human mind through deterministic (i.e. consciously designed architectural) means - all you'll end up with, at best, is a very complex rule inference engine that is limited by the rules you gave it. Every "holy grail" of intelligent programming that has come along - neural nets, genetic programming etc - has turned out to be very limited (though very useful in special situations).
I also feel that talking about automatically organizing the world's knowledge in a semantic web is just more of the same hot air that we've been hearing from AI departments for the last few decades. You can't automatically allocate meaning to something unless you have the capability for "common sense" reasoning, and the world knowledge at your fingertips to be able to interpret the data intelligently, like a human would. And even then, different humans would interpret it differently... so there are multiple meanings, and anyway, how to allocate "meaning" to something abstract such as a poem or piece of art?
And if we require real people to add metadata to everything... well, it just ain't going to happen, in my humble opinion. Adding metadata is a pain in the ass, since you have to define the categories of object, agree on meanings for all the different taxonomies that will have to be used to describe the world... then there's the potential for abuse, as spammers will inevitably seed their documents with inappropriate metadata. So, the "honest" people can't be bothered, and the dishonest people will wreck anything that does get built. So, it ain't gonna happen.
The beauty of google (not that I love google, but they did hit a nail on the head) is that it requires no effort or "machine intelligence", beyond a very simple algorithm that depends not on AI but rather real, tangible relationships between words and documents (proximity and links). This is something that computers can be really good at.
Just my opinion... obviously there will be others out there who will vehemently disagree, and that's fine! Go ahead and try, you'll learn a lot in the process and you will probably come out with some tangential technology that you never thought of initially but is useful nonetheless.
Meaning is always "in context". Human communication always requires a "transmitter -> medium -> receiver" structure. Some say the universe is fundamentally structured on that model. When these semantic systems are overlaid on content, there are always these slippery, unresolvable mismatches of "intent" and "understanding", those "semantic arguments" that drive likeminded people crazy. Content searching is extremely powerful, without creating the "cracks" into which meanings can irretrievably fall. As long as there are alternative semantic indices to content still available "raw", semantics will just help. When we move to wrap all content entirely in semantics, we'll live in the "map is not the territory" problem forever. Ask CORBA programmers and EU language translators about the death of meaning by means of the dictionary. If we need to add semantics as a tool, we still get under the hood at the actual content.
--
make install -not war
A message has "meaning" if you can make special use of it.
Normal web pages have meaning for browsers, it's just that that meaning is limited to "how to draw words for the user."
What we're doing, is making it so that your computer can make special use of messages on the web, to do smarter things.
It would be scary if the Semantic Web were about "my meaning is THE meaning." But it is explicitly not like that. In fact, one of the main things about it is that anyone can make up their own languages, their own way of modelling the world.
There are tools that make it so you can say, "My word X is sort of like their word Y," but it's acknowledged that such translations will be imperfect. Likely, fuzzy logic, and systems that are able to ask for clarification (and remember responses), will be used to mediate that sort of thing.
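To make the "my word X is sort of like their word Y" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions: the terms, the confidence scores, the `ask_below` threshold and the `translate` helper are all invented here and do not come from any actual Semantic Web tool.

```python
# Hypothetical mapping table: my_term -> (their_term, confidence in [0, 1]).
# Every term and score below is made up for illustration.
MAPPINGS = {
    "cost": ("hasPrice", 0.9),
    "maker": ("manufacturer", 0.6),
}


def translate(term, mappings, ask_below=0.7):
    """Return (translated_term, needs_clarification).

    A mapping whose confidence is under `ask_below` is still applied,
    but flagged so the system can ask the user whether it is right.
    """
    if term not in mappings:
        # Unknown term: pass it through unchanged and flag it.
        return term, True
    their_term, confidence = mappings[term]
    return their_term, confidence < ask_below


if __name__ == "__main__":
    for word in ("cost", "maker", "colour"):
        print(word, "->", translate(word, MAPPINGS))
```

A system that remembers the user's answers could raise or lower the stored confidence over time, which is roughly the "ask for clarification and remember responses" part.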
You may also be interested in my favorite page on AI by Open Mind. The Semantic Web isn't explicitly about AI, but it opens the door for a lot of AI work.
A semantic web is only as useful as the metadata, and people go to great lengths to mislead and disguise.
People who think they know everything really piss off those of us that actually do.
Semantic Web is the most ridiculous idea I've ever heard. The problem with meaning isn't representation -- English represents meaning just fine. The problem is meaning itself -- it doesn't matter if you figure out a way to encode it in some XML language, for every bit that it's easier for computers to use, it will carry that much less meaning.
Another way of putting it is, any program capable of extracting the same meaning from XML that humans can, should be able to understand English without much trouble. It's the whole "Intelligence-complete" thing. Like NP-complete, there seems to be a class of problems which can only be solved by real intelligence, and they're all pretty much equivalent in that with real intelligence, you can solve them all.
"If you look 'round the table and can't tell who the sucker is, it's you." -- Quiz Show
Great. An Expert System to do your google searches based on what it thinks you meant. The giant Semantic 'Clippy' knows what's best when it pops up to say:
''Here are the results to the question you should have asked.''
Maybe next they'll have the Semantic Web manage the way electronic voting is counted. Semantic Clippy will count your 'intent' instead of your actual vote.
Google works because it is largely a statistical tool that uses some meta-information.
I could see frameworks being used for very specific purposes, like searching a homogeneous web site (e.g., slashdot, pubmed, nytimes) where all content is controlled. But extending these ideas to a heterogeneous web, where people would no doubt take advantage of such a volunteer system, is ludicrous.
I also take issue with the top-down mind-state that they will be able to predict what is useful to the user. This is why statistical importance and quantity are the only realistic method for such a massive undertaking (which google is still actively researching).
I think that the only useful research to come out of such an endeavor would be to have news sites, as mentioned above, implement and be scanned using an ontological browser. Of course, I am not sure how this would be different than LexisNexis.
Anybody remember the demise of META keywords?
I think we could run into the same problem with the Semantic Web, as it too allows web developers to attach arbitrary metadata to their pages. The only way to prevent unscrupulous web developers from embedding inaccurate RDF in their pages in hopes of attracting more hits is by establishing a web-of-trust framework.
Google implements a very crude version of web-of-trust that assumes "incoming hyperlinks==trust". I think that in order for the Semantic Web to be something that is usable by web-wide search engines like Google, we will need a much more robust and fine-grained system of trust. The user should be able to specify some of the entities that they trust and the search engine will deduce the rest.
However, without an adequate trust framework, the Semantic Web will just be a new fertile ground for keyword spam and search engine "optimization".
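As a rough sketch of "the user specifies some entities they trust and the search engine deduces the rest", here is one possible way to propagate trust in Python. Everything in it is an assumption for illustration: the names, the weights, and the rule that trust in a node is the best product of endorsement weights along any path from a seed. It is not how Google or any real web-of-trust system works.

```python
import heapq

# Who vouches for whom, with a weight in [0, 1]. All names are invented.
ENDORSEMENTS = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9, "spammer": 0.1},
    "dave": {},
    "spammer": {"spammer-friend": 1.0},
}


def derive_trust(endorsements, seeds):
    """Propagate trust outward from the user's own seed scores.

    Trust in a node is the highest product of edge weights over any
    path from a seed. Weights never exceed 1, so a best-first search
    (Dijkstra-style) settles each node at its maximum score.
    """
    best = {}
    # heapq is a min-heap, so store negated scores to pop the highest first.
    heap = [(-score, node) for node, score in seeds.items()]
    heapq.heapify(heap)
    while heap:
        neg_score, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with an equal or higher score
        score = -neg_score
        best[node] = score
        for neighbour, weight in endorsements.get(node, {}).items():
            if neighbour not in best:
                heapq.heappush(heap, (-(score * weight), neighbour))
    return best


def trusted_sources(trust, threshold=0.5):
    """Sources whose derived trust meets the (assumed) threshold."""
    return {node for node, score in trust.items() if score >= threshold}


if __name__ == "__main__":
    trust = derive_trust(ENDORSEMENTS, {"alice": 1.0})
    print(trust)
    print(trusted_sources(trust))
```

A search engine could then ignore or down-rank RDF published by anything below the threshold, so a spammer who is only weakly vouched for gains little by stuffing metadata.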
pi = 3.141592653589793helpimtrappedinauniversefactory7
The World Wide Web cannot "at its core handle inconsistent information" yet it seems to lurch along okay.
The Semantic Web is not some attempt at global knowledge, perfect knowledge, perfect reasoning, or anything of the sort, regardless of what many posters, including yourself, seem to have construed it as.
It is intended to be an analogue of the World Wide Web, which is primarily consumed by humans, that is instead primarily consumed by computers.
Can it know everything? Of course not! But it can make it so computers "understand" a heck of a lot more than they do today.
For instance: right now an everyday computer (or more accurately, the web browser) "understands" that (absent styling) a given HTML tag is presented in a certain way. The Semantic Web wants to make it so the triple GallonOfMilk hasPrice $1.25 (this would actually be expressed in several triples about a product with a certain id, probably, but you get the idea) can be "understood" by a program in the same way across multiple sources.
Just as a person does not automatically assume a site is an absolute authority on the price of milk, semantic web enabled programs would not assume that this information was absolute (nor would it likely be presented as such). However, imagine how powerful it would be if you could give your browser the address of the RDF interfaces for local grocery stores (or it might autodiscover them, at least in part), and it could then find out what the price of milk (and other groceries) is at each one of them.
That sort of thing is already possible today without the Semantic Web (or other semantic frameworks), but only with methods that either require heavy lifting on the part of the client system (such as web scraping every grocery store site, killing extensibility and easy implementation) or aren't cross-domain (perhaps I want to chart the price of milk (from some milk-price-archive) vs real dollar value -- now my client has to understand two possibly very different ways of presenting information, not just one integrated way).
The Semantic Web (and associated technologies) is an enabling framework that frees programmers from doing a lot of the heavy lifting involved in discovering meaning and relating meaning, just as SQL is an enabling framework that frees programmers from doing a lot of the heavy lifting involved in storing data and relating data.
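The grocery example above can be sketched in a few lines of Python. This is a toy under stated assumptions: the store addresses, the product id and the second price are invented, and the triples are plain tuples held in memory rather than real RDF fetched from anywhere. The point is only that one query works across sources once they share the same (subject, predicate, object) shape.

```python
from decimal import Decimal

# Each source publishes (subject, predicate, object) triples.
# Store names, the product id and the 1.40 price are invented.
STORE_TRIPLES = {
    "store-a.example": [
        ("product:milk-1gal", "hasName", "Gallon of Milk"),
        ("product:milk-1gal", "hasPrice", Decimal("1.25")),
    ],
    "store-b.example": [
        ("product:milk-1gal", "hasName", "Gallon of Milk"),
        ("product:milk-1gal", "hasPrice", Decimal("1.40")),
    ],
}


def prices_by_source(sources, subject):
    """Return {source: price} for every source that states a price."""
    found = {}
    for source, triples in sources.items():
        for s, p, o in triples:
            if s == subject and p == "hasPrice":
                # Keep the claim tied to its source; no store is
                # treated as the absolute authority on the price.
                found[source] = o
    return found


def cheapest(sources, subject):
    """Return (source, price) for the lowest stated price, or None."""
    prices = prices_by_source(sources, subject)
    if not prices:
        return None
    return min(prices.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(prices_by_source(STORE_TRIPLES, "product:milk-1gal"))
    print(cheapest(STORE_TRIPLES, "product:milk-1gal"))
```

Adding a third store means adding another entry to the dictionary; the query code does not change, which is the contrast with writing a new scraper per site.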
For to end yet again.
There has been a considerable amount of work on ontology mapping within the knowledge engineering community, but the evolutionary aspects of ontologies have been largely overlooked. Ontology mapping is a harder problem than graph isomorphism, since classes from different ontologies may have extensions that overlap rather than cover each other. It's a difficult problem, certainly, but it's worth noting that game theory isn't applied here.
Game theory tends to appear more within the multi-agent systems community than the semantic web community; they've been looking at the social models for trust for some years now.