Semantic Search Points To Better Relevancy
ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.
From TFA:
"There are so many ways of doing it improperly, and only one way of doing it right."
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.
Just my 2 cents, anyway.
.: Max Romantschuk
Hear the outlandish claims ladies and gentlemen, of how the brave doctor wants us just to have better searches.
There are places where the networks are not touching, and there are places where they are - Boeing's Lori Gunter
IMHO semantics don't work on a global scale; they do work if you only check trusted sources. If everyone can create data and place semantics on it, it becomes useless. You can't trust everyone to place correct semantics on it: either they don't have the knowledge to place correct semantics on data, or they maliciously place the wrong semantics on it.
09 f9 11 02 9d 74 e3 5b d8 41 56 c5 63
You're confusing the word "metadata" with the HTML <meta> tag. In this case (the semantic web) metadata would be in RDF. More clues here. What TFA is proposing is to semantically process and index websites' content, rather than have the websites (or a third party) tag the content with RDF. What both of them are lacking is any kind of universal ontology (or even standardized specialty ontologies).
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting results than Google in some cases (i.e. more accurate), while in others it produces worse results. But the company's young.
Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.
Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.
You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs. Not to mention the abuse potential.
I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology which implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.
BTW, Google already extracts semantic information from both the site and query, but this is quite primitive compared to the potential mentioned in the article. Google looks for term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
Yes, people will abuse it in any way they can. Mostly to try and get higher up in the search engines. But this does not mean it is by definition useless. It is useless for ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web-stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar)... This would e.g. allow for a "copy address to contacts" button. Make an easy (web based) program to generate the HTML so mom&pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be ok.
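To make the "copy address to contacts" idea concrete, here is a rough Python sketch of what such a button could do behind the scenes: take a few structured fields and emit a vCard that an address book can import. The field names (`name`, `phone`, `street`, `hours` and so on) and the sample shop are made up for illustration, and putting opening hours in a NOTE line is just an assumption, since vCard has no dedicated property for them.

```python
def escape(value):
    """Escape characters that are special inside vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_vcard(shop):
    """Build a vCard 4.0 string from a dict of (hypothetical) shop metadata."""
    # ADR has seven components: PO box, extended address, street,
    # locality, region, postal code, country. Missing ones stay empty.
    adr_keys = ("po_box", "extended", "street", "city",
                "region", "postcode", "country")
    adr = ";".join(escape(shop.get(key, "")) for key in adr_keys)
    lines = [
        "BEGIN:VCARD",
        "VERSION:4.0",
        "FN:" + escape(shop["name"]),
        "TEL:" + escape(shop["phone"]),
        "ADR:" + adr,
        # Assumption: opening hours go into a free-text NOTE.
        "NOTE:" + escape("Opening hours: " + shop["hours"]),
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"


# Hypothetical metadata a mom&pop shop might publish on its page.
SHOP = {
    "name": "Corner Bakery",
    "phone": "+1-555-0100",
    "street": "1 Main St",
    "city": "Springfield",
    "postcode": "12345",
    "hours": "Mo-Fr 08:00-17:00",
}

if __name__ == "__main__":
    print(to_vcard(SHOP))
```

Nothing here touches ranking; the data is only used once the site is already listed.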
10 ?"Hello World" life was simple then
There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?
To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralistic in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space with the dataspace outside of our local control, but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication, it becomes obvious that what is necessary is data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.
There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
I've had a wonderful time, but this wasn't it -- Groucho Marx
MovieLens is perhaps kind of similar-but-different. You go there and rate movies. Based on similarities to how other people rated movies, it then suggests movies for you and your likely rating of them. It's pretty neat actually -- my wife and I both have accounts there, and you can cross-reference with other people. So now when we go to the video store, instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like. Seems fairly accurate so far.
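For the curious, the general technique can be sketched in a few lines of Python. This is only a toy user-based collaborative filter, not MovieLens' actual algorithm (which isn't described here); the ratings, the user names and the similarity formula are all made up for illustration. It predicts a rating from what similar users gave, then picks the unseen movie whose worse-off viewer is still happiest.

```python
# Hypothetical ratings on a 1-5 scale; users and movies are placeholders.
RATINGS = {
    "alice": {"A": 5, "B": 1, "C": 4},
    "bob":   {"A": 1, "B": 5, "C": 4},
    "carol": {"A": 5, "B": 1, "D": 5, "E": 4},
    "dave":  {"A": 1, "B": 5, "D": 1, "E": 4},
}


def similarity(ratings, a, b):
    """1 / (1 + mean absolute rating difference) over movies both rated."""
    shared = ratings[a].keys() & ratings[b].keys()
    if not shared:
        return 0.0
    mad = sum(abs(ratings[a][m] - ratings[b][m]) for m in shared) / len(shared)
    return 1.0 / (1.0 + mad)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for a movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        weight = similarity(ratings, user, other)
        num += weight * theirs[movie]
        den += weight
    return num / den if den else None


def pick_for_both(ratings, a, b):
    """Pick a movie neither has rated, maximising the lower predicted rating."""
    seen = ratings[a].keys() | ratings[b].keys()
    everything = set().union(*(r.keys() for r in ratings.values()))
    scored = {}
    for movie in everything - seen:
        pa, pb = predict(ratings, a, movie), predict(ratings, b, movie)
        if pa is not None and pb is not None:
            scored[movie] = min(pa, pb)
    return max(scored, key=scored.get) if scored else None


if __name__ == "__main__":
    print(pick_for_both(RATINGS, "alice", "bob"))
```

Maximising the minimum is one arbitrary way to encode "a movie we will both like"; averaging the two predictions would be another.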
Due to circumstances beyond my control, I am master of my fate and captain of my soul.
Quick! Tag this story as "Goldfish" and "Hairdressing".
Semantic Web = the promise that never quite delivers
Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?
And by the time any of this is solved Google will have evolved so it can understand plain text better than markup. How do you mark up something as ambiguous? Unsure? Rumor? It's pretty easy in plain English:
"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.
monk.e.boy
Open source, flash charts
Is 'Semantic Web' already included in Web 2.0? Or will that be the 3.0 version?
BWAAAHAHAHAAAAHAAA
How do you propose enforcing any sort of universal or specialized ontology?
If I have a turd, and I add metadata to it that says it's pure gold, it's still a turd; you have to trust me to trust my metadata. That's what the op is talking about, not the container.
Nerd rage is the funniest rage.
"Susan saw the dog in the window. She pressed her nose against it. She wanted to buy it."
The SW project exists *because* machines are too dumb to read English. Or Chinese. And will probably stay that way for the foreseeable future.
So W3C's RDF is positioned half-way between the world of dumb computers and smart people. It structures data in terms of classes and properties, and allows different groups to define sets of class and property names that can be freely mixed together without the need for heavyweight standardisation. And it gives us an SQL-ish querying framework, SPARQL, for asking questions of this data, and getting back tables of results. Despite the myths, RDF doesn't oblige people to put metadata "inside every Web page". It just defines a common data model that information from various sources and formats can be mapped to, so that what they say can be processed with less regard for fiddly detail of file formats and encodings. And RDF certainly doesn't require that you believe everything you read: the SPARQL spec, unlike SQL, provides built-in machinery for querying properties of the data source, inline in your query, so you can filter the data down to the bits you decide to trust in some specific app.
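A toy illustration of that last point, in plain Python rather than real RDF or SPARQL: each statement is stored together with the source it came from, and the query filters on that source. The `ex:` names, the example.org/example.net URLs and the `match` helper are all invented for this sketch; in actual SPARQL the comparable job is done with named graphs and the GRAPH keyword.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A subject/predicate/object statement plus the source that asserted it."""
    subject: str
    predicate: str
    obj: str
    source: str


# Hypothetical data: two sources disagree about where Joe lives.
QUADS = [
    Quad("ex:joe", "ex:livesIn", "ex:Cornwall", "http://example.org/joe/profile"),
    Quad("ex:joe", "ex:livesIn", "ex:Atlantis", "http://example.net/rumours"),
    Quad("ex:joe", "ex:name", "Joe", "http://example.org/joe/profile"),
]


def match(quads, subject=None, predicate=None, obj=None, trusted=None):
    """Return quads matching the pattern; None is a wildcard.

    If `trusted` is given, only statements from those sources are kept.
    """
    results = []
    for quad in quads:
        if subject is not None and quad.subject != subject:
            continue
        if predicate is not None and quad.predicate != predicate:
            continue
        if obj is not None and quad.obj != obj:
            continue
        if trusted is not None and quad.source not in trusted:
            continue
        results.append(quad)
    return results


if __name__ == "__main__":
    everything = match(QUADS, subject="ex:joe", predicate="ex:livesIn")
    vetted = match(QUADS, subject="ex:joe", predicate="ex:livesIn",
                   trusted={"http://example.org/joe/profile"})
    print([q.obj for q in everything])  # both claims
    print([q.obj for q in vetted])      # only the claim from the trusted source
```

The data model doesn't decide who to believe; the app asking the question does.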
If you are interested in a real solution to semantic web markup that works (and is being used) right now, you might want to check out the Microformats website. There is a growing following that is working on getting the semantic web working properly. The Firefox and Songbird guys are looking at using Microformats to make browsing the web a much richer experience - NOW, not 10 years from now.
There are currently Microformats for marking up people, places, events, geographic locations, music, and many other widely used data items on the web. For more information on what Microformats are, check out the info page on Microformats.
-- manu
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
Some (if not all) of the concept relation semantics needed for doing "semantic search"
or "machine comprehension" of text on the web can be gleaned by
doing statistical analysis of the relationships between words and phrases
across the entire web. Aggregating across a large corpus eliminates "noise"
in usage and draws out the semantic "signal" about how people relate the
concepts to each other.
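A minimal Python sketch of that kind of analysis, assuming
document-level co-occurrence and pointwise mutual information (PMI)
as the statistic. Both are choices made for this example, not
something specified above, and the six-line "corpus" is invented.

```python
import math
from collections import Counter
from itertools import combinations

# Tiny made-up corpus; a real one would be web-scale.
CORPUS = [
    "palladium catalyst metal",
    "palladium metal price",
    "gold metal price",
    "microsoft windows drm",
    "microsoft drm music",
    "windows music player",
]


def pmi_table(documents):
    """Map each co-occurring word pair (sorted tuple) to its PMI in bits."""
    word_docs = Counter()   # number of documents containing each word
    pair_docs = Counter()   # number of documents containing both words
    for text in documents:
        words = sorted(set(text.lower().split()))
        word_docs.update(words)
        pair_docs.update(combinations(words, 2))
    n = len(documents)
    table = {}
    for (a, b), count in pair_docs.items():
        p_pair = count / n
        p_a, p_b = word_docs[a] / n, word_docs[b] / n
        table[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return table


def related(word, table):
    """Words that co-occur with `word`, strongest association first."""
    scores = {}
    for (a, b), score in table.items():
        if word == a:
            scores[b] = score
        elif word == b:
            scores[a] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    table = pmi_table(CORPUS)
    print(related("palladium", table))
```

In this toy corpus "palladium" only ever ends up near metal-related
words and never near "drm"; with a big enough corpus you would expect
both senses to show up as separate clusters of neighbours.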
Where are we going and why are we in a handbasket?
If it weren't for my wife, my media consumption would consist entirely of science fiction and WWI/II movies; thanks to my wife, I've been exposed to a much broader swath of media genres -- some of which has been painful, and some of which I've regretted... but in the balance, I think I'm a better person for it. But, then, I possess an abundance of room for improvement.
Actually, this issue is something that bothers me. This increasing ability to narrow our exposure to data which we find unpleasant, to filter out the world so that we only see what we want to see, is vaguely disturbing. I see what I think are consequences of this increasingly in my own country, and evidence of it in the form of rising fundamentalism around the world. I'm afraid that I do it, too. It is limiting and dangerous, and increasingly easy to do.
I don't have a solution, and maybe there isn't one. Perhaps, someday, we'll all live in virtual realities where all of the facts are shaped to what we want to believe, and we'll never have to interact with anybody who disagrees with us, and we'll find that this is the utopia that humans have been searching for.
Maybe.
--- SER