Semantic Search Points To Better Relevancy

Posted by ryuzaki0 on Tuesday May 29, 2007 @09:44PM from the retrieve-what-I-mean dept.

ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.

Man promotes own company by DrSkwid · 2007-05-29 22:19 · Score: 2, Insightful

Hear the outlandish claims, ladies and gentlemen, of how the brave doctor just wants us to have better searches.

--
There are places where the networks are not touching, and there are places where they are -- Boeing's Lori Gunter
Semantics don't work on a global scale by FredDC · 2007-05-29 22:31 · Score: 3, Insightful

IMHO semantics don't work on a global scale; they only work if you restrict yourself to trusted sources. If everyone can create data and attach semantics to it, the semantics become useless. You can't trust everyone to get them right: either they lack the knowledge to label the data correctly, or they maliciously label it wrongly.

--
09 f9 11 02 9d 74 e3 5b d8 41 56 c5 63
1. Re:Semantics don't work on a global scale by epine · 2007-05-30 08:35 · Score: 2, Insightful
  
  This society goes to great lengths to cultivate learned helplessness. Attitudes toward brands are a good example. Many people wish to simplify their decision making by forming an emotional bond with their favorite brands, rather than exercising rational judgement, which involves wading into the frustrations involved in finding information you can trust about the products you wish to purchase.
  
  I have no time for Sanger, either, who is busy trying to brand knowledge with the warm glow of credentialed expertise.
  
  If the purpose of semantic search is to return search results that lull the sleepy sheep into the warm glow of suspended judgement, it will be a long time coming, and the road will be paved with broken promises.
  
  The reason Google already works so well is that many of us actually *want* to enter into the larger context of the search terms we query. The various manifestations of my keywords are of interest to me. Once I've dialed into the subcontext I'm most interested in, it's usually an easy matter to refine the search. In rare instances, such as the metabolic cofactor SAMe, it proves almost impossible. This is a highly specialized meaning, masked by an everyday word.
  
  It's also annoying that Google won't accept roots, or form clusters of common spellings / misspellings. When I was working with the HC12 microcontroller, I wanted to search all the forms as a set, which included variant forms such as MC68HC12 and HCS12 and 68HC12 as well as forests of related part numbers, all of which specified an HC12 variant. Sometimes I wish to search "color/colour" as equivalent lexemes.
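
A rough sketch of the "search all the forms as a set" idea, assuming a hand-built table of equivalent forms. The `EQUIVALENTS` table, the function names, and the generic `OR` grouping are all illustrative assumptions, not the query syntax of any particular search engine.

```python
# Hypothetical equivalence classes: each set lists surface forms that the
# searcher wants treated as one lexeme. The groupings are illustrative only.
EQUIVALENTS = [
    {"hc12", "68hc12", "mc68hc12", "hcs12"},
    {"color", "colour"},
]


def expand_term(term):
    """Return the sorted variants treated as equivalent to `term`."""
    key = term.lower()
    for group in EQUIVALENTS:
        if key in group:
            return sorted(group)
    return [key]


def expand_query(query):
    """Rewrite each query term as an OR-group of its known variants."""
    parts = []
    for term in query.split():
        variants = expand_term(term)
        if len(variants) > 1:
            parts.append("(" + " OR ".join(variants) + ")")
        else:
            parts.append(variants[0])
    return " ".join(parts)


print(expand_query("HC12 colour timer"))
# (68hc12 OR hc12 OR hcs12 OR mc68hc12) (color OR colour) timer
```

The table has to come from somewhere; building it by hand works for a part family like the HC12, but a general engine would need to derive such clusters from data.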
  
  Google already works spectacularly well for any purpose except selling learned helplessness. Many weaknesses exist, and as these weaknesses become more apparent, the worst of the problems ought to be addressed by pragmatic refinements (of the existing search algorithms). Google already has the "google suggests" mechanism to propose more specialized search in the cases where they develop the capacity to support this.
  
  The other problem with the semantic grail is that even after undesired contexts are filtered out, you still don't have a unique answer. Now the question becomes "whose answer?". There are good business models to be had in controlling the answer to that question, and you might still get away with calling it "search", but it would totally suck as an instrument for harvesting knowledge.
  
  I did a lot of work in the nineties in the area of statistical NLP, and I spent a lot of time wrestling with the boundary between what statistical methods could ultimately accomplish, and what the allure of semantic methods really amounted to. Often the "long tail" itself is a fiction of surface forms. For example, "fuchsia deck chair" might be a statistical singleton on surface form, but if colour words are clustered it becomes [colour-word] deck chair, which probably isn't a statistical singleton. This level of statistical analysis is rarely employed, because the payoffs are marginal, which is yet again a testament to how well the basic (Google) algorithm already works.
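
The singleton point can be shown with a toy count. This is a minimal sketch under stated assumptions: the phrase list and the `COLOUR_WORDS` set are made up for illustration, and a real system would learn word classes from data rather than list them by hand.

```python
from collections import Counter

# Illustrative word class; a real system would induce this statistically.
COLOUR_WORDS = {"fuchsia", "red", "green", "teal", "blue"}


def normalise(phrase):
    """Replace every colour word with a single class token."""
    words = phrase.lower().split()
    return " ".join("[colour-word]" if w in COLOUR_WORDS else w for w in words)


def singletons(counts):
    """Return the phrases that occur exactly once."""
    return {phrase for phrase, n in counts.items() if n == 1}


# Made-up sample of query phrases.
phrases = [
    "fuchsia deck chair",
    "red deck chair",
    "teal deck chair",
    "green garden hose",
]

surface = Counter(phrases)
clustered = Counter(normalise(p) for p in phrases)

print(len(singletons(surface)))               # 4: every surface form is unique
print(clustered["[colour-word] deck chair"])  # 3: no longer a singleton
```

On surface forms every phrase here is part of the long tail; after clustering, only the garden hose phrase still is.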
  
  One of the reasons statistical methods have proven so successful is that these methods nicely complement what the brain already does well (unless disabled by brand preferences). Humans don't have the patience to scan millions of documents to establish statistical patterns, but we do excel at filtering a nugget of usefulness out of a small pool of crap. This is the biological reality of Sturgeon's law. Any organism that can't identify the one nugget out of ten worth pursuing has relinquished self-destiny.
  
  If Google attacks the clustering and disambiguation problems, slowly but surely one thing will lead to another, and a semantic-like system will finally emerge, but one quite different than one might discover having set out to achieve the semantic grail by direct means. As Douglas Adams put it "I may not have gone where I intended to go, but I think I have ended up where I intended to be."
Re:The semantic web is still a Good Thing by kahei · 2007-05-29 22:54 · Score: 5, Insightful

Honestly, if some Marxist state from the 60s produced propaganda like that, everyone would laugh:

"The People's Revolution is about more than nationalism! New communal agricultural techniques will enable a standard of living of a completely different nature than today! Manufacturing and distributing goods for the Workers could be taken to a whole new level!"

It's the same fallacy: "If only everyone spontaneously got together and did what I think they should, all problems would go away!"

Yet just because the fictional utopia in question is the 'Semantic Web' rather than the 'Workers Paradise', everybody takes it really seriously. And nobody mocks it at all. Nope, nobody ever laughs at the Semantic Web.

Ok, ok, I'm just being mean, I should go and do something useful.

--
Whence? Hence. Whither? Thither.
Re:So what does he offer? by suv4x4 · 2007-05-29 22:56 · Score: 1, Insightful

But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.

Why, what did you expect, a link to their full source code? The article's about the direction the engines are taking, the way those appear in userland. If you asked Google about specifics of their algorithm, they'd also go quite silent all of a sudden.
Tiresome and wrong by dread · 2007-05-29 23:07 · Score: 5, Insightful

There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?

To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralistic in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space with the dataspace outside of our local control but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication it becomes obvious that what is needed are data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.
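
A toy illustration of a user-side "context model", assuming the role, the ongoing conversation, and the user's interests can each be reduced to a set of terms. The `SENSES` table, the keyword sets, and the overlap score are hypothetical simplifications, nowhere near a real model of intent.

```python
# Hypothetical sense inventory: each sense of a term carries a few keywords.
SENSES = {
    "palladium": {
        "chemical element": {"metal", "chemistry", "catalyst", "element"},
        "Microsoft DRM technology": {"drm", "microsoft", "mp3", "security"},
    }
}


def build_context(role_terms, conversation_terms, interest_terms):
    """Merge the three user-side signals into a single set of terms."""
    return set(role_terms) | set(conversation_terms) | set(interest_terms)


def rank_senses(term, user_context):
    """Rank the senses of `term` by keyword overlap with the user's context."""
    senses = SENSES.get(term.lower(), {})
    scored = [(len(keywords & user_context), sense)
              for sense, keywords in senses.items()]
    # Highest overlap first; ties are broken alphabetically by sense name.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [sense for _, sense in ordered]


# A user chatting over IM about DRM on their mp3 collection.
chatting = build_context(["citizen"], ["drm", "mp3"], ["music"])
print(rank_senses("Palladium", chatting))
# ['Microsoft DRM technology', 'chemical element']

# A chemist reading about catalysts.
chemist = build_context(["chemist"], ["catalyst"], ["chemistry"])
print(rank_senses("Palladium", chemist))
# ['chemical element', 'Microsoft DRM technology']
```

The same query string gets a different top sense depending on who is asking, and the context stays with the user rather than with a central index.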

There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.

--
I've had a wonderful time, but this wasn't it -- Groucho Marx
1. Re:Tiresome and wrong by PPH · 2007-05-30 04:06 · Score: 2, Insightful
  
  Well, humans don't understand intent. They have to ask. Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions. I don't expect a semantic search engine to do any better.
  There are two parts to this problem. The UI, or how a user will interact with the system to describe the context within which a search is to be performed, and the web crawler, which must extract semantics from web pages based on either metadata, linking algorithms (à la Google), natural language processing, or some combination of these.
  Within restricted knowledge domains, some of these techniques work quite well already. Document management systems can enforce metadata and linking conventions and the knowledge domain is already understood to some extent. Transferring this to the WWW might be simpler than many people imagine. Just crawl the pages with the same techniques and index those where the metadata/language/linking is consistent. Ignore the rest as garbage. Odds are that what is most easily parsed and properly tagged will be the most useful to the end user. Owners of pages who wish them to be found will clean them up so as to make them appear in searches.
  
  --
  Have gnu, will travel.
2. Re:Tiresome and wrong by dread · 2007-05-30 09:44 · Score: 2, Insightful
  
  Humans certainly understand intent. They will - as you point out - ask if they don't know the intent. You always know what you intend. If someone you know asks you a question, chances are you will have enough commonality, so to speak, to intuitively grasp the intent (or context). Your example with the librarian is interesting but pointless since you are talking about another centralised knowledge solution whereas I am talking about a decentralised model that starts with the user and - if you will - a "context model".
  
  --
  I've had a wonderful time, but this wasn't it -- Groucho Marx
Re:Would someone please cut and paste here... by srussell · 2007-05-30 03:52 · Score: 2, Insightful

instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like.

If it weren't for my wife, my media consumption would consist entirely of science fiction and WWI/II movies; thanks to my wife, I've been exposed to a much broader swath of media genres -- some of which has been painful, and some of which I've regretted... but in the balance, I think I'm a better person for it. But, then, I possess an abundance of room for improvement.
Actually, this issue is something that bothers me. This increasing ability to narrow our exposure to data which we find unpleasant, to filter out the world so that we only see what we want to see, is vaguely disturbing. I see what I think are consequences of this increasingly in my own country, and evidence of it in the form of rising fundamentalism around the world. I'm afraid that I do it, too. It is limiting and dangerous, and increasingly easy to do.
I don't have a solution, and maybe there isn't one. Perhaps, someday, we'll all live in virtual realities where all of the facts are shaped to what we want to believe, and we'll never have to interact with anybody who disagrees with us, and we'll find that this is the utopia that humans have been searching for.
Maybe.
--- SER
Re:The semantic web is still a Good Thing by frank_adrian314159 · 2007-05-30 08:05 · Score: 2, Insightful

Ok, ok, I'm just being mean, I should go and do something useful.
No. Actually, you're being accurate. Unless folks can solve the multiple taxonomy problem (and, no, deciding on a common taxonomy and taxonomy translation approaches have not worked in the past) and the metadata cheating problem, the "Semantic Web" is BS promulgated by someone who probably doesn't know the history of epistemology, taxology, or why hard AI problems really are hard, even if he has been knighted. And the people who think that this is worthwhile are the same techno-utopians who probably don't know much about the problem either. When you have a robot that can actually return a Dewey Decimal System classification to four digits to the right of the decimal for a set of randomly selected web pages (and, no, just returning the word "pr0n" doesn't count, although it would probably have the best score of most algorithms you can think of) then you can come and talk about having a start. Otherwise, it's all just BS.

--
That is all.