Semantic Search Points To Better Relevancy

Posted by ryuzaki0 on Tuesday May 29, 2007 @09:44PM from the retrieve-what-I-mean dept.

ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.

So what does he offer? by javilon · 2007-05-29 22:03 · Score: 4, Interesting

From TFA:

"There are so many ways of doing it improperly, and only one way of doing it right."

But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.

--

When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
The semantic web is still a Good Thing by Max+Romantschuk · 2007-05-29 22:11 · Score: 3, Interesting

The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.

Just my 2 cents, anyway.

--
.: Max Romantschuk :: http://max.romantschuk.fi/
That's good by suv4x4 · 2007-05-29 22:52 · Score: 4, Interesting

While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting (i.e. more accurate) results than Google in some cases, while in others it produces worse results. But the company's young.

Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.

Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.

You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs. Not to mention the abuse potential.

I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology which implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.

BTW, Google already extracts semantic information from both the site and the query, but this is quite primitive compared to the potential mentioned in the article. Google looks at term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that has only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
1. Re:That's good by suv4x4 · 2007-05-29 23:56 · Score: 2, Interesting
  
  Adhering to standards and accessibility may give you the edge in the business while letting a hundred monkeys bang away in Frontpage '97 won't. It's probably more arrogant to say you don't need that edge.
  
  There are two things here: actually there isn't a "business" behind every page. This is like saying we should all have proper automated phone answering systems on our phones, as this gives us an edge in our business: but phones are used for more than business, and I certainly don't need all those fancy things on my home phone.
  
  The web is large enough that there's room for all kinds of sites: amateur sites with poor code and interesting content, web dev blogs with ultra accurate code and amusingly somewhat boring content, huge site portals with terrible code but a strong CMS system to make up for it, huge site portals with great code, a bad CMS system and hundreds of monkeys who do manual edits on the pages every day.
  
  Standards, as defined by the W3C, are just a way to make multiple agents compatible (search engines, clients, servers). If they are compatible, you've achieved the goal of a standard. A standard isn't the goal itself, it's the means. And sometimes you need to be more flexible about the means.
  
  Now, I'm authoring pages strictly compliant with the standards; it's more of a geek-ish inner requirement, since I have a good knowledge of how the browsers handle all this internally (and by the time the browsers change drastically, the site will have been redesigned a few times already, or be dead). I don't however care about inserting empty alt attributes on images without meaning, or avoiding "target" since it was supposedly bad for something. I need to use a feature, it works, it's not going away: I use it. It's my means. I achieved my goal, on time, and with great results.
2. Re:That's good by fermion · 2007-05-30 01:55 · Score: 2, Interesting
  
  The points are valid within a certain context, but we have to define what that context is. First, who is going to pay for the service? Second, who is going to use the service? Third, how is the service actually going to be built? Fourth, how is the profit going to be derived?
  In the Google model, advertising pays the bill, the masses use it, the service is built on sound statistical principles, and profit is driven by focusing on making the process relatively simple and cheap. The web is crawled, links are counted, a bit of intelligence is added, and results are displayed.
  Overall this method has proven useful. The main problem is that PageRank has proven easy to hack. I do not believe the problem is that users look for Madonna and get the pop star by mistake. Since Google is meant to be used by the masses, as it is the cheap mass searches that generate revenue, the popularity ranking is not an issue. Make no mistake, Google results are ofttimes crap, but they are still usable for common searches.
  The semantic web, as discussed, seems to be something different. It in fact seems to be the standard revolt of the linguist against the mathematician. The linguist says translation must be in meaning. The statistician says I can do it without understanding anything. They are both correct, but Google has shown the latter can provide reasonable and cheap results. Likewise, this guy tries to compare the long tail to the iceberg. Of course, the long tail is the underserved minority, who are underserved because they lack the means or desire to pay for the service. The hidden iceberg is the majority that sinks large ships. Not someone who understands statistics, or, for that matter, is likely to make a generous profit.
  What I think this guy is talking about is the specialized services that people might pay for directly, not a booming industry, as the nation provides librarians for free. It would be a program that takes a search (we assume the user is competent enough to form the search in valid English, as there is no librarian to help construct it) and knows enough about the language, about the context, and about the subject matter to return exactly the proper few results. It would then have to do this cheaply enough to drive a profit. This would in fact be a grand piece of software, but would it compete with Google or MSN or any mass search engine?
  I am disappointed, as even simple semantic search engines could get rid of the clutter we have on Google, and if someone were willing to invest, even MS for that matter, the link farms could be a thing of the past. A lot of this, I believe, is due to the battle between the mathematicians and the linguists.
  
  --
  "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
in the defense of meta-data by spectrokid · 2007-05-29 22:54 · Score: 4, Interesting

Yes, people will abuse it in any way they can, mostly to try and get higher up in the search engines. But this does not mean it is by definition useless. It is useless for ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar)... This would e.g. allow for a "copy address to contacts" button. Make an easy (web-based) program to generate the HTML so mom & pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be OK.
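
A rough sketch of what such a generator could look like, in Python. The class names vcard, fn, org, adr, street-address, locality, postal-code and tel come from the hCard microformat; the function name make_hcard, the sample shop and the "opening-hours" class are made up for illustration and are not part of any spec.

```python
from html import escape


def make_hcard(name, street, city, postal_code, phone, hours=None):
    """Return an HTML snippet for a shop, marked up with hCard class names."""
    lines = [
        '<div class="vcard">',
        f'  <span class="fn org">{escape(name)}</span>',
        '  <div class="adr">',
        f'    <span class="street-address">{escape(street)}</span>',
        f'    <span class="locality">{escape(city)}</span>',
        f'    <span class="postal-code">{escape(postal_code)}</span>',
        '  </div>',
        f'  <span class="tel">{escape(phone)}</span>',
    ]
    if hours:
        # ASSUMPTION: "opening-hours" is a hypothetical class name,
        # not defined by hCard. It only illustrates the idea.
        lines.append(f'  <span class="opening-hours">{escape(hours)}</span>')
    lines.append('</div>')
    return "\n".join(lines)


# Hypothetical example data.
snippet = make_hcard(
    "Mom & Pop Bakery", "1 Main St", "Springfield", "12345",
    "+1-555-0100", hours="Mon-Sat 7:00-18:00",
)
print(snippet)
```

A search engine or browser could read the address back out of such a block for display only, without letting it influence ranking.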

--
10 ?"Hello World" life was simple then
Re:Would someone please cut and paste here... by regular_gonzalez · 2007-05-29 23:17 · Score: 4, Interesting

MovieLens is perhaps kind of similar-but-different. You go there and rate movies. Based on similarities to how other people rated movies, it then suggests movies for you and your likely rating of them. It's pretty neat actually -- my wife and I both have accounts there, and you can cross-reference with other people. So now when we go to the video store, instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like. Seems fairly accurate so far.
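
For the curious, here is a toy sketch of the general idea (user-based collaborative filtering) in Python. This is not MovieLens's actual algorithm; the users, movies, ratings and function names are all invented, and cosine similarity on raw star ratings is about the crudest similarity measure one could pick.

```python
from math import sqrt

# Hypothetical 1-5 star ratings.
ratings = {
    "me":     {"Alien": 5, "Amelie": 2, "Heat": 4},
    "wife":   {"Alien": 2, "Amelie": 5, "Heat": 4},
    "user_a": {"Alien": 5, "Amelie": 1, "Heat": 4, "Brazil": 5, "Fargo": 4},
    "user_b": {"Alien": 1, "Amelie": 5, "Heat": 5, "Brazil": 2, "Fargo": 5},
}


def similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for a movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        weight = similarity(user, other)
        num += weight * theirs[movie]
        den += weight
    return num / den if den else None


def pick_for_both(a, b):
    """Pick the unseen movie whose worse predicted rating is highest."""
    seen = set(ratings[a]) | set(ratings[b])
    candidates = {m for r in ratings.values() for m in r} - seen
    scored = {}
    for movie in candidates:
        pa, pb = predict(a, movie), predict(b, movie)
        if pa is not None and pb is not None:
            # Score by the lower prediction so neither person suffers.
            scored[movie] = min(pa, pb)
    return max(scored, key=scored.get) if scored else None


print(pick_for_both("me", "wife"))
```

Scoring each candidate by the lower of the two predictions is one way to capture the "neither of us has to suffer through it" requirement; averaging the two would favour movies one person loves and the other hates.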

--
Due to circumstances beyond my control, I am master of my fate and captain of my soul.
Re:metadata worst idea ever by monk.e.boy · 2007-05-29 23:43 · Score: 2, Interesting

Semantic Web = the promise that never quite delivers

Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?

And by the time any of this is solved Google will have evolved so it can understand plain text better than markup. How do you mark up something as ambiguous? Unsure? Rumor? It's pretty easy in plain English:

"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.

monk.e.boy

--
Open source, flash charts
Re:Tiresome and wrong by Anonymous Coward · 2007-05-29 23:50 · Score: 1, Interesting

I got my master's at one of the schools getting the bulk of the research money, and we made that same argument there, to deaf ears. Namely that students and professors were solving the easy "peripheral" problems related to semantic web, and just ignoring the 13,125,732-lb gorillas in the room.
Re:Tiresome and wrong by illaqueate · 2007-05-30 00:33 · Score: 2, Interesting

Yeah, pretty much. I set out to make a data assistant program in high school (c. 1996-1999) and was thinking about how to get a correspondence between what I was thinking and how data would be retrieved, and figured it would have to be so generic as to be worthless. And then I read Hilary Putnam's Representation and Reality and felt sick about the entire thing. But now that I think back on it I did have a lot of fun testing out different kinds of data retrieval on structured and unstructured data (and thinking up weird semantic hypertext languages).

http://slashdot.org/comments.pl?sid=142985&cid=11986906 -- lol
Re:metadata worst idea ever by danbri · 2007-05-30 00:38 · Score: 2, Interesting

"Susan saw the dog in the window. She pressed her nose against it. She wanted to buy it."

The SW project exists *because* machines are too dumb to read English. Or Chinese. And will probably stay that way for the foreseeable future.

So W3C's RDF is positioned half-way between the world of dumb computers and smart people. It structures data in terms of classes and properties, and allows different groups to define sets of class and property names that can be freely mixed together without the need for heavyweight standardisation. And it gives us an SQL-ish querying framework, SPARQL, for asking questions of this data, and getting back tables of results. Despite the myths, RDF doesn't oblige people to put metadata "inside every Web page". It just defines a common data model that information from various sources and formats can be mapped to, so that what they say can be processed with less regard for fiddly detail of file formats and encodings. And RDF certainly doesn't require that you believe everything you read: the SPARQL spec, unlike SQL, provides built-in machinery for querying properties of the data source, inline in your query, so you can filter the data down to the bits you decide to trust in some specific app.
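
To make the data model concrete, here is a toy illustration in plain Python. It is a sketch of the idea only, not an RDF or SPARQL implementation: statements are (subject, predicate, object) triples tagged with the source they came from, and a query is a pattern with wildcards plus an optional list of trusted sources. foaf:Person and foaf:name are real FOAF terms; the ex: names, the example.org URLs and the match function are invented for illustration.

```python
from collections import namedtuple

# A triple plus the document it was read from.
Quad = namedtuple("Quad", "subject predicate obj source")

# Hypothetical data merged from two sources.
data = [
    Quad("ex:joe", "rdf:type", "foaf:Person", "http://example.org/joe.rdf"),
    Quad("ex:joe", "foaf:name", "Joe", "http://example.org/joe.rdf"),
    Quad("ex:joe", "ex:livesIn", "Cornwall", "http://example.org/rumours.rdf"),
    Quad("ex:joe", "ex:livesIn", "Devon", "http://example.org/joe.rdf"),
]


def match(pattern, quads, trusted=None):
    """Yield quads matching a (subject, predicate, obj) pattern.

    None in the pattern is a wildcard. If trusted is given, only
    statements from those sources are considered.
    """
    for quad in quads:
        if trusted is not None and quad.source not in trusted:
            continue
        if all(want is None or want == got
               for want, got in zip(pattern, quad[:3])):
            yield quad


where_is_joe = ("ex:joe", "ex:livesIn", None)

# Everything anyone has said:
print([q.obj for q in match(where_is_joe, data)])

# Only what Joe's own document says. In SPARQL terms this is roughly a
# GRAPH <http://example.org/joe.rdf> { ex:joe ex:livesIn ?place } pattern.
print([q.obj for q in match(where_is_joe, data,
                            trusted={"http://example.org/joe.rdf"})])
```

The rumour and the first-hand statement sit side by side in the same store; which one you act on is decided at query time by the application.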
Re:Missing the target by msporny · 2007-05-30 01:18 · Score: 2, Interesting

If you are interested in a real solution to semantic web markup that works (and is being used) right now, you might want to check out the Microformats website. There is a growing following that is working on getting the semantic web working properly. The Firefox and Songbird guys are looking at using Microformats to make browsing the web a much richer experience - NOW, not 10 years from now.

There are currently Microformats for marking up people, places, events, geographic locations, music, and many other widely used data items on the web. For more information on what Microformats are, check out the info page on Microformats.
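
As a rough idea of how little machinery a consumer needs, here is a simplified Python sketch that pulls a few hCard fields out of ordinary HTML using only the standard library. The class names fn, adr, locality and tel are from the hCard microformat; the HCardParser class, the sample page and the person in it are invented, and a real parser would also handle nesting, multiple cards and the rest of the spec.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect the text of elements carrying a few hCard class names."""

    WANTED = {"fn", "street-address", "locality", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # hCard property we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._current = next((c for c in classes if c in self.WANTED), None)

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


# Hypothetical page fragment.
page = """
<div class="vcard">
  <span class="fn">Joe Bloggs</span>
  <div class="adr"><span class="locality">Cornwall</span></div>
  <span class="tel">+44 1234 567890</span>
</div>
"""

parser = HCardParser()
parser.feed(page)
print(parser.card)
```

The page stays ordinary HTML that any browser renders as usual; the class attributes are the only addition.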
-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.