Semantic Search Points To Better Relevancy

Nonsense by Anonymous Coward · 2007-05-29 21:51 · Score: -1, Offtopic

Fists of fury!

Re:Nonsense by frisket · 2007-05-29 21:57 · Score: 1, Funny

Oh look, three pink pigs just flew past my window.

metadata worst idea ever by timmarhy · 2007-05-29 21:57 · Score: -1, Troll

people just spam total crap in their metadata headers. searching based on it is a total fucking waste of time.

--
If you mod me down, I will become more powerful than you can imagine....

Re:metadata worst idea ever by Threni · 2007-05-29 22:50 · Score: 1

What are people having trouble finding at the moment anyway?
Re:metadata worst idea ever by $RANDOMLUSER · 2007-05-29 22:51 · Score: 4, Informative

You're confusing the word "metadata" with the HTML <meta> tag. In this case (the semantic web) metadata would be in RDF. What TFA is proposing is to semantically process and index websites' content, rather than have the websites (or a third party) tag the content with RDF. What both of them are lacking is any kind of universal ontology (or even standardized specialty ontologies).

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Re:metadata worst idea ever by monk.e.boy · 2007-05-29 23:43 · Score: 2, Interesting

Semantic Web = the promise that never quite delivers

Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?

And by the time any of this is solved google will have evolved so it can understand plain text better than mark up. How do you markup something as ambiguous? Unsure? Rumor? It's pretty easy in plain English:

"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.

monk.e.boy

--
Open source, flash charts
Re:metadata worst idea ever by maxume · 2007-05-30 00:34 · Score: 2, Informative

How do you propose enforcing any sort of universal or specialized ontology?

If I have a turd, and I add metadata to it that says it's pure gold, it's still a turd; you have to trust me to trust my metadata. That's what the op is talking about, not the container.

--
Nerd rage is the funniest rage.
Re:metadata worst idea ever by danbri · 2007-05-30 00:38 · Score: 2, Interesting

"Susan saw the dog in the window. She pressed her nose against it. She wanted to buy it."

The SW project exists *because* machines are too dumb to read English. Or Chinese. And will probably stay that way for the foreseeable future.

So W3C's RDF is positioned half-way between the world of dumb computers and smart people. It structures data in terms of classes and properties, and allows different groups to define sets of class and property names that can be freely mixed together without the need for heavyweight standardisation. And it gives us an SQL-ish querying framework, SPARQL, for asking questions of this data, and getting back tables of results. Despite the myths, RDF doesn't oblige people to put metadata "inside every Web page". It just defines a common data model that information from various sources and formats can be mapped to, so that what they say can be processed with less regard for fiddly detail of file formats and encodings. And RDF certainly doesn't require that you believe everything you read: the SPARQL spec, unlike SQL, provides built-in machinery for querying properties of the data source, inline in your query, so you can filter the data down to the bits you decide to trust in some specific app.
Re:metadata worst idea ever by Anonymous Coward · 2007-05-30 00:46 · Score: 0

You're stuck on this idea of how the data is formatted, not the very very very important question of where it comes from. Who cares if the metadata is embedded in the web page, delivered via RDF, or tied to a brick and thrown through your window? I can still deliver inaccurate metadata. I can still be an asshat and SPAM you with it. "The semantic web" and RDF don't solve this problem at all.
Re:metadata worst idea ever by danbri · 2007-05-30 02:12 · Score: 1

Re the "very very very important question of where it comes from" and RDF, ...

See the RDF query spec, SPARQL, specifically the "FROM" clause in the query language.

http://www.w3.org/TR/rdf-sparql-query/#specDataset

Section "8.3.1 Accessing Graph Names" ...take a look at the example query there:

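PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?src ?bobNick
FROM NAMED <http://example.org/foaf/aliceFoaf>
FROM NAMED <http://example.org/foaf/bobFoaf>
WHERE
  {
    GRAPH ?src
    { ?x foaf:mbox <mailto:bob@work.example> .
      ?x foaf:nick ?bobNick
    }
  }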

The spec gives the resultset table, which basically says that according to http://example.org/foaf/aliceFoaf the nickname is "Bobby", and according to http://example.org/foaf/bobFoaf the nickname is "Robert".
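
For anyone who would rather see the idea than read the spec, here is a rough Python sketch of what that query does: keep each source's statements in its own named graph, then report which source said what, optionally restricted to sources you trust. It is not SPARQL and uses no RDF library; the dict-of-sets dataset, the function name nicks_by_source, the blank-node labels and the mailbox address are all made up for illustration.

```python
# Hypothetical in-memory stand-in for a dataset of named graphs.
# Each graph name (the source URL) maps to a set of
# (subject, predicate, object) triples. All values are illustrative.
ALICE = "http://example.org/foaf/aliceFoaf"
BOB = "http://example.org/foaf/bobFoaf"
MBOX = "mailto:bob@work.example"  # placeholder mailbox, an assumption

dataset = {
    ALICE: {
        ("_:a", "foaf:mbox", MBOX),
        ("_:a", "foaf:nick", "Bobby"),
    },
    BOB: {
        ("_:z", "foaf:mbox", MBOX),
        ("_:z", "foaf:nick", "Robert"),
    },
}


def nicks_by_source(dataset, mbox, trusted=None):
    """Return sorted (source, nick) pairs for the owner of `mbox`.

    If `trusted` is given, graphs whose name is not in it are skipped,
    which is the "filter down to what you trust" step.
    """
    rows = []
    for src, triples in dataset.items():
        if trusted is not None and src not in trusted:
            continue
        # Subjects that this graph says own the mailbox.
        owners = {s for (s, p, o) in triples if p == "foaf:mbox" and o == mbox}
        for (s, p, o) in triples:
            if p == "foaf:nick" and s in owners:
                rows.append((src, o))
    return sorted(rows)


all_rows = nicks_by_source(dataset, MBOX)
trusted_rows = nicks_by_source(dataset, MBOX, trusted={BOB})
```

The point of the sketch is only that provenance travels with every answer, so the decision about whom to believe stays with the application.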

It's a mistake (although understandable ... better tutorials and demos are needed) to assume that RDF and SemWeb ignore this problem space.

There's an online SPARQL demo at http://xmlarmyknife.org/api/rdf/sparql/query and another at http://librdf.org/query to get a feel for how some of this stuff works. There are also tools like SquirrelRDF and D2RQ that wrap existing (SQL, LDAP, ...) datasources and make them look like SPARQL too, so your apps can be couched in terms of globally-used schemas rather than per-datasource schemas. It's also worth keeping an eye on what Oracle have been up to ... http://www.oracle.com/technology/tech/semantic_technologies/index.html ... no SPARQL yet but some serious RDF support.
Re:metadata worst idea ever by danbri · 2007-05-30 02:14 · Score: 1

Oops, should've previewed my post. Angle brackets in my query examples. See http://www.w3.org/TR/rdf-sparql-query/#specDataset for the real thing plus the corresponding resultsets.
Re:metadata worst idea ever by hoojus · 2007-05-30 03:53 · Score: 1

There are many researchers who are approaching this problem and aiming to put trust measures in place (PDF file).

While it may not be the holy grail in the format that is described in the article, it is always possible to add additional social measures to the framework (similar to what has been done to filter out trolls and off-topic posts here in /.)
That is why research is important, and although you only read about how it is formatted, there is definitely more research addressing the credibility problem. It is just that you need to have a common basis of understanding (i.e. data format) clarified first before you can extend it to the next problem. Of course the problem with the Semantic Web is deciding on exactly how and what needs to be represented.
Re:metadata worst idea ever by dosquatch · 2007-05-30 04:07 · Score: 1

My understanding is that you aren't tagging an item with metadata, rather the search engine is tagging your item with metadata on its end based on the linguistic context of the page. Meaning, based on context, it would understand that there is a difference between the word "server" on a page about restaurants vs. the word "server" on a page about office equipment, so you won't get links to Hooter's and Jimmy's Seafood Hut mixed in with your results for equipment. Ideally, any metadata tags you throw will be flat ignored. Context is king.
That's the dream, anyway. The reality is that it doesn't work yet, and may not for a long, long time. This is not an easy trick to pull off.
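
To make the "server" example concrete, here is a toy Python sketch of context-based disambiguation: score each candidate sense by how many of its cue words appear on the page. This is a deliberately naive overlap count, nowhere near what a real engine would need; the sense names, the cue-word lists and the function guess_sense are all invented for illustration.

```python
# Hypothetical sense inventory for the word "server". Each sense lists
# words that tend to appear near it; a real system would learn these.
SENSES = {
    "restaurant": {"menu", "table", "tip", "waiter", "food", "order"},
    "computer": {"rack", "network", "cpu", "uptime", "hosting", "database"},
}


def guess_sense(page_text, senses=SENSES):
    """Pick the sense whose cue words overlap most with the page.

    Returns None when nothing matches or when the top score is tied,
    i.e. when the context does not settle the question.
    """
    words = set(page_text.lower().split())
    scores = {name: len(cues & words) for name, cues in senses.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

Even this toy version shows why it is hard: punctuation, inflections and pages that mention both kinds of server all break it immediately.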

--
"Hey, the third matrix movie would have been good except for the plot,story, and acting." --AC
Re:metadata worst idea ever by maxume · 2007-05-30 04:29 · Score: 1

But that's contextual analysis, not metadata. Yes, the search engine is storing its analysis of the content in conjunction with the content, but as far as I know, no one in rdf land ever meant that when they said 'metadata', they meant direct input.

And really, why not just server -food -menu? It would be nice if, when confronted with a search for 'server', the engine noticed that there were several broad interpretations of it and offered to narrow by context (Google sometimes does this *right now*), but just having the user actually input some context works pretty well.

--
Nerd rage is the funniest rage.
Re:metadata worst idea ever by DragonWriter · 2007-05-30 04:40 · Score: 1

Such a good idea in theory, but where does trust come from?

Trust comes from the user's decision to trust a particular source of information, the same as anywhere else.

Who can we trust to mark anything?

Who can you trust to tell you anything?

And by the time any of this is solved google will have evolved so it can understand plain text better than mark up.

"Computers will understand natural language so that specialized vocabularies for interacting with them are no longer necessary or beneficial" has been the leading "promise that never delivers" for far longer than the Semantic Web has been an idea.

So what does he offer? by javilon · 2007-05-29 22:03 · Score: 4, Interesting

From TFA:

"There are so many ways of doing it improperly, and only one way of doing it right."

But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.

--

When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."

Re:So what does he offer? by suv4x4 · 2007-05-29 22:56 · Score: 1, Insightful

But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.

Why, what did you expect, a link to their full source code? The article's about the direction the engines are taking, the way those appear in userland. If you ask Google about specifics in their algorithm, they'll also be quite silent all of a sudden.
Re:So what does he offer? by Rei · 2007-05-30 05:06 · Score: 1

What we really need is for Wikipedia to move over to Semantic MediaWiki; it should be a painless transition. I really think that it would be widely used -- once people see it in use in some articles, they're more likely to use it in other articles, in the same way that people learn most of Wikipedia's formatting. With wide use of semantic tags (esp. if an ontology was used as well), the entire knowledge base of Wikipedia could be intelligently queried. Want to know all trees that can grow to more than 60 meters high? Want to know what insects have yellow heads and four wings? Want to know what countries have a GDP between 5 and 10 billion dollars? You could do it.
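
Roughly what I mean by "intelligently queried", as a Python sketch: once articles carry typed properties, a question like the tall-trees one becomes a filter over those properties. This is not Semantic MediaWiki's actual query syntax; the page records, the property names (type, max_height_m), the heights and the query helper are all illustrative placeholders, not reference data.

```python
# Hypothetical annotated articles. Property names and values are
# illustrative placeholders, not taken from any real wiki.
pages = [
    {"title": "Coast redwood", "type": "tree", "max_height_m": 115},
    {"title": "Apple tree", "type": "tree", "max_height_m": 12},
    {"title": "Giraffe", "type": "animal", "max_height_m": 6},
]


def query(pages, **conditions):
    """Return titles of pages that satisfy every condition.

    Each keyword names a property and maps to a predicate on its value;
    pages missing the property simply do not match.
    """
    return [
        page["title"]
        for page in pages
        if all(key in page and test(page[key]) for key, test in conditions.items())
    ]


tall_trees = query(
    pages,
    type=lambda t: t == "tree",
    max_height_m=lambda h: h > 60,
)
```

The hard part is not the filter, it is getting editors to agree on what the properties are called and what units they use.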

--
GIVE US THE CUTTLEFISH!
Re:So what does he offer? by celtic_hackr · 2007-05-30 06:25 · Score: 1

I'm not so sure this guy is bright enough to come up with a right answer.

How does this guy know there is only one solution?

It may be that there are an infinite number of right solutions. Or it may be that there are a dozen right solutions. It's very rare to find a problem in the universe that has one and only one right solution. It could even be that there is no right solution. In which case, mathematics can come to the rescue, yet again, and provide us with a very large number of solutions approaching "rightness".

BAH! Sounds like FUD to me.
One solution, give me a break!

Would someone please cut and paste here... by jg21 · 2007-05-29 22:05 · Score: 1

...the best example/s they know of a definition (or better still a demonstration) of "social search." Thanks much.

Re:Would someone please cut and paste here... by Anonymous Coward · 2007-05-29 22:15 · Score: 0

Kevin Ryan has a go here. He mentions Swickis as a good use case.
Re:Would someone please cut and paste here... by regular_gonzalez · 2007-05-29 23:17 · Score: 4, Interesting

MovieLens is perhaps kind of similar-but-different. You go there and rate movies. Based on similarities to how other people rated movies, it then suggests movies for you and your likely rating of them. It's pretty neat actually -- my wife and I both have accounts there, and you can cross-reference with other people. So now when we go to the video store, instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like. Seems fairly accurate so far.
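
I have no idea what MovieLens actually runs, but the general idea described above (predict my rating from people who rated things the way I did) can be sketched in a few lines of Python. This is plain user-based collaborative filtering with cosine similarity, picked because it is the simplest thing that fits the description; the names, movies and ratings are invented.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; users and scores are made up.
ratings = {
    "ann": {"Alien": 5, "Amelie": 2, "Heat": 4},
    "bob": {"Alien": 4, "Amelie": 1, "Heat": 5, "Brazil": 5},
    "cat": {"Alien": 1, "Amelie": 5, "Heat": 2, "Brazil": 2},
}


def similarity(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`.

    Returns None when nobody comparable has rated the movie.
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        weight = similarity(ratings[user], theirs)
        num += weight * theirs[movie]
        den += weight
    return num / den if den else None


ann_brazil = predict("ann", "Brazil", ratings)
```

Here ann's predicted rating for Brazil lands nearer bob's 5 than cat's 2, because her past ratings look more like bob's. For the two-accounts case you could predict for both people and take the lower of the two numbers as the score for the pair.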

--
Due to circumstances beyond my control, I am master of my fate and captain of my soul.
Re:Would someone please cut and paste here... by pffft · 2007-05-30 03:02 · Score: 1

www.pandora.com

Along the same lines, except for music. You rate songs and then get recommendations based on the characteristics of the music you rate highly. An interesting idea, but not 100% effective in my opinion.
Re:Would someone please cut and paste here... by srussell · 2007-05-30 03:52 · Score: 2, Insightful

instead of each of us picking one movie we like and potentially forcing the other person to suffer through it, we can find a movie that (in theory) we will both like.

If it weren't for my wife, my media consumption would consist entirely of science fiction and WWI/II movies; thanks to my wife, I've been exposed to a much broader swath of media genres -- some of which has been painful, and some of which I've regretted... but in the balance, I think I'm a better person for it. But, then, I possess an abundance of room for improvement.
Actually, this issue is something that bothers me. This increasing ability to narrow our exposure to data which we find unpleasant, to filter out the world so that we only see what we want to see, is vaguely disturbing. I see what I think are consequences of this increasingly in my own country, and evidence of it in the form of rising fundamentalism around the world. I'm afraid that I do it, too. It is limiting and dangerous, and increasingly easy to do.
I don't have a solution, and maybe there isn't one. Perhaps, someday, we'll all live in virtual realities where all of the facts are shaped to what we want to believe, and we'll never have to interact with anybody who disagrees with us, and we'll find that this is the utopia that humans have been searching for.
Maybe.
--- SER
Re:Would someone please cut and paste here... by Anonymous Coward · 2007-05-30 05:22 · Score: 0

MovieLens is a statistical approach that Berkan says is all wrong. Not that I agree with him, and it's working for you.
Re:Would someone please cut and paste here... by regular_gonzalez · 2007-05-30 07:43 · Score: 1

Interesting. I've pondered the algorithm that MovieLens might use -- I can think of many possible ways to do it, with a pure statistical approach being the simplest and probably least interesting. I'm sure it will never happen, but it'd be nice to have a look at it myself.

--
Due to circumstances beyond my control, I am master of my fate and captain of my soul.

The semantic web is still a Good Thing by Max+Romantschuk · 2007-05-29 22:11 · Score: 3, Interesting

The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.

Just my 2 cents, anyway.

--
.: Max Romantschuk :: http://max.romantschuk.fi/

Re:The semantic web is still a Good Thing by Anonymous Coward · 2007-05-29 22:16 · Score: 0

as fun as http://cdrpg.com/ :)
Re:The semantic web is still a Good Thing by kahei · 2007-05-29 22:54 · Score: 5, Insightful

Honestly, if some Marxist state from the 60s produced propaganda like that, everyone would laugh:

"The People's Revolution is about more than nationalism! New communal agricultural techniques will enable a standard of living of a completely different nature than today! Manufacturing and distributing goods for the Workers could be taken to a whole new level!"

It's the same fallacy: "If only everyone spontaneously got together and did what I think they should, all problems would go away!"

Yet just because the fictional utopia in question is the 'Semantic Web' rather than the 'Workers Paradise', everybody takes it really seriously. And nobody mocks it at all. Nope, nobody ever laughs at the Semantic Web.

Ok, ok, I'm just being mean, I should go and do something useful.

--
Whence? Hence. Whither? Thither.
Re:The semantic web is still a Good Thing by ThirdPrize · 2007-05-29 22:58 · Score: 0

It's a bit like using XML to describe a rich source of data. I mean you can but it probably won't happen.

--
I have excellent Karma and I am not afraid to Troll it.
Re:The semantic web is still a Good Thing by jfengel · 2007-05-30 02:08 · Score: 1

In fact, Semantic Web isn't even vaguely about search. Semantic Web doesn't index text. It's much closer to a database, with a stronger ability to define relationships between fields than you can do with data schemas. (It's the sort of work you used to have to do with SQL, and some capabilities you couldn't do with SQL.)

So it's Web only in the sense that we're sharing data over port 80; it's not any sort of add-on to HTML. As for Semantic... well, we can debate FOL vs DL vs whatever you want in a different thread.
Re:The semantic web is still a Good Thing by frank_adrian314159 · 2007-05-30 08:05 · Score: 2, Insightful

Ok, ok, I'm just being mean, I should go and do something useful.
No. Actually, you're being accurate. Unless folks can solve the multiple taxonomy problem (and, no, deciding on a common taxonomy and taxonomy translation approaches have not worked in the past) and the metadata cheating problem, the "Semantic Web" is BS promulgated by someone who probably doesn't know the history of epistemology, taxology, or why hard AI problems really are hard, even if he has been knighted. And the people who think that this is worthwhile are the same techno-utopians who probably don't know much about the problem either. When you have a robot that can actually return a Dewey Decimal System classification to four digits to the right of the decimal for a set of randomly selected web pages (and, no, just returning the word "pr0n" doesn't count, although it would probably have the best score of most algorithms you can think of) then you can come and talk about having a start. Otherwise, it's all just BS.

--
That is all.

Eat my goatse'd penis! by Anonymous Coward · 2007-05-29 22:13 · Score: -1, Troll

Fuck you you fucking faggots.

Man promotes own company by DrSkwid · 2007-05-29 22:19 · Score: 2, Insightful

Hear the outlandish claims ladies and gentlemen, of how the brave doctor wants us just to have better searches.

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter

Re:Man promotes own company by IwantToKeepAnon · 2007-05-30 03:05 · Score: 0

> There are 11 types of people in the world,
> those who know binaries and those who don't.

So what's the 3rd type of person?

--
"Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy

Semantics don't work on a global scale by FredDC · 2007-05-29 22:31 · Score: 3, Insightful

IMHO semantics don't work on a global scale, it does work if you only check trusted sources. If everyone can create data and place semantics on it, it becomes useless. You can't trust everyone to place correct semantics on it, either they don't have the knowledge to place correct semantics on data, or they maliciously place the wrong semantics on it.

--
09 f9 11 02 9d 74 e3 5b d8 41 56 c5 63

Re:Semantics don't work on a global scale by Anonymous Coward · 2007-05-29 23:13 · Score: 1, Funny

Let's see how many buzzwords I can cram into a single sentence...

Okay, so let's mash up this semantic idea with the whole social news concept to create an innovative, synergized social semantic system.
Re:Semantics don't work on a global scale by Anonymous Coward · 2007-05-30 00:15 · Score: 0

I think you'll find that's a "social semantic Web 2.0 system". And possibly also very AJAX-y.
Re:Semantics don't work on a global scale by PPH · 2007-05-30 03:32 · Score: 1

They do if you are searching for pr0n. Any key word you can think of will lead you to an XXX site.

--
Have gnu, will travel.
Re:Semantics don't work on a global scale by veganboyjosh · 2007-05-30 03:55 · Score: 1

They do if you are searching for pr0n. Any word you can think of will lead you to an XXX site.

fixed that for you...
Re:Semantics don't work on a global scale by DragonWriter · 2007-05-30 04:34 · Score: 1

IMHO semantics don't work on a global scale, it does work if you only check trusted sources.

The first part is not true, the second part is. Of course, one of the key applications for semantic technology is "web of trust" kind of systems that provide the infrastructure for dealing with the question "who is a trusted source and to what degree?"

There is no requirement that semantic tags from different sources be treated equally (and the distinction isn't just between "trust" and "ignore", you can do a lot more based on the source of information than that.)
Re:Semantics don't work on a global scale by epine · 2007-05-30 08:35 · Score: 2, Insightful

This society goes to great lengths to cultivate learned helplessness. Attitudes toward brands are a good example. Many people wish to simplify their decision making by forming an emotional bond with their favorite brands, rather than exercising rational judgement, which involves wading into the frustrations involved in finding information you can trust about the products you wish to purchase.

I have no time for Sanger, either, who is busy trying to brand knowledge with the warm glow of credentialed expertise.

If the purpose of semantic search is to return search results that lull the sleepy sheep into the warm glow of suspended judgement, it will be a long time coming, and the road will be paved with broken promises.

The reason Google already works so well is that many of us actually *want* to enter into the larger context of the search terms we query. The various manifestations of my keywords are of interest to me. Once I've dialed into the subcontext I'm most interested in, it's usually an easy matter to refine the search. In rare instances, such as the metabolic cofactor SAMe, it proves almost impossible. This is a highly specialized meaning, masked by an everyday word.

It's also annoying that Google won't accept roots, or form clusters of common spellings / misspellings. When I was working with the HC12 microcontroller, I wanted to search all the forms as a set, which included variant forms such as MC68HC12 and HCS12 and 68HC12 as well as forests of related part numbers, all of which specified an HC12 variant. Sometimes I wish to search "color/colour" as equivalent lexemes.
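
What I am asking for amounts to query expansion over equivalence classes, which is easy to sketch in Python. This is just an illustration of the wish, not how any search engine does it; the hand-built variant groups and the helper names expand and matches are assumptions.

```python
# Hypothetical hand-built equivalence classes of spellings / part names.
VARIANT_GROUPS = [
    {"hc12", "68hc12", "mc68hc12", "hcs12"},
    {"color", "colour"},
]


def expand(term, groups=VARIANT_GROUPS):
    """Return the set of forms treated as equivalent to `term`."""
    term = term.lower()
    for group in groups:
        if term in group:
            return group
    return {term}


def matches(query, document):
    """True if every query term, or one of its variants, is in the document."""
    words = set(document.lower().split())
    return all(expand(term) & words for term in query.split())
```

With this, a query for "HC12 timer" would hit a page that only ever says "MC68HC12", and "colour chart" would hit "color chart". The forests of related part numbers are the real problem, since nobody is going to maintain those groups by hand.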

Google already works spectacularly well for any purpose except selling learned helplessness. Many weaknesses exist, and as these weaknesses become more apparent, the worst of the problems ought to be addressed by pragmatic refinements (of the existing search algorithms). Google already has the "google suggests" mechanism to propose more specialized search in the cases where they develop the capacity to support this.

The other problem with the semantic grail is that even after undesired contexts are filtered out, you still don't have a unique answer. Now the question becomes "whose answer?". There are good business models to be had in controlling the answer to that question, and you might still get away with calling it "search", but it would totally suck as an instrument for harvesting knowledge.

I did a lot of work in the nineties in the area of statistical NLP, and I spent a lot of time wrestling with the boundary between what statistical methods could ultimately accomplish, and what the allure of semantic methods really amounted to. Often the "long tail" itself is a fiction of surface forms. For example, "fuchsia deck chair" might be a statistical singleton on surface form, but if colour words are clustered it becomes [colour-word] deck chair, which probably isn't a statistical singleton. This level of statistical analysis is rarely employed, because the payoffs are marginal, which is yet again a testament to how well the basic (Google) algorithm already works.
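
The deck chair point is easy to show with counts. A minimal Python sketch, assuming a tiny made-up phrase list and a hand-picked set of colour words: count phrases once by surface form and once with every colour word replaced by a class label.

```python
from collections import Counter

# Hypothetical word class and phrase sample, both invented for illustration.
COLOUR_WORDS = {"fuchsia", "red", "green", "blue"}

phrases = [
    "fuchsia deck chair",
    "red deck chair",
    "green deck chair",
    "blue garden hose",
]


def normalise(phrase):
    """Replace each colour word with the class label [colour-word]."""
    return tuple(
        "[colour-word]" if word in COLOUR_WORDS else word
        for word in phrase.lower().split()
    )


# Counts on raw surface forms: every phrase is a singleton.
surface_counts = Counter(tuple(p.lower().split()) for p in phrases)

# Counts after clustering colour words: the deck chairs collapse together.
clustered_counts = Counter(normalise(p) for p in phrases)
```

On the surface every phrase occurs once; after clustering, [colour-word] deck chair occurs three times, so the apparent long tail shrinks.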

One of the reasons statistical methods have proven so successful is that these methods nicely complement what the brain already does well (unless disabled by brand preferences). Humans don't have the patience to scan millions of documents to establish statistical patterns, but we do excel at filtering a nugget of usefulness out of a small pool of crap. This is the biological reality of Sturgeon's law. Any organism that can't identify the one nugget out of ten worth pursuing has relinquished self-destiny.

If Google attacks the clustering and disambiguation problems, slowly but surely one thing will lead to another, and a semantic-like system will finally emerge, but one quite different than one might discover having set out to achieve the semantic grail by direct means. As Douglas Adams put it "I may not have gone where I intended to go, but I think I have ended up where I intended to be."

That's good by suv4x4 · 2007-05-29 22:52 · Score: 4, Interesting

While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting results than Google in some cases (i.e. more accurate). While in others it produces worse results. But the company's young.

Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.

Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.

You can't ask of people who can't make the difference between typing a query in the search engine and typing an URL in the address bar, to add proper meta information on his blog. Not to mention the abuse potential.

I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology which implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.

BTW, Google already extracts semantic information from both the site and query, but this is quite primitive compared to the potential mentioned in the article. Google looks for term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced pagerank).

Re:That's good by Yoozer · 2007-05-29 23:31 · Score: 1

I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant.
No, it's overdone, but certainly not arrogant. Not knowing everything by heart is not a problem when there's a reference nearby, as long as you consistently follow it (which is why it helps to know stuff by heart, so you don't have to look it up). Even then, in the end it makes your job easier instead of harder; CSS saves you a lot of headaches with consistency in design.

Adhering to standards and accessibility may give you the edge in the business while letting a hundred monkeys bang away in Frontpage '97 won't. It's probably more arrogant to say you don't need that edge.
Re:That's good by suv4x4 · 2007-05-29 23:56 · Score: 2, Interesting

Adhering to standards and accessibility may give you the edge in the business while letting a hundred monkeys bang away in Frontpage '97 won't. It's probably more arrogant to say you don't need that edge.

There are two things here: actually there isn't a "business" behind every page. This is like saying we should all have proper automated phone answer systems on our phones, as this gives us edge in our business: but phones are used for more than business, and I certainly don't need all those fancy things on my home phone.

The web is large enough, there's place for all kinds of sites: amateur sites with poor code and interesting content, web dev blogs with ultra accurate code and amusingly somewhat boring content, huge site portals with terrible code but a strong CMS system to make up for it, huge site portals with great code, bad CMS system and hundreds of monkeys who do manual edits on the pages every day.

Standards, as defined by W3C are just a way to make multiple agents compatible (search engines, clients, servers). If they are compatible, you've achieved the goal of a standard. A standard isn't the goal itself, it's the means. And sometimes you need to be more flexible about the means.

Now, I'm authoring pages strictly compliant with the standards; it's more of a geek-ish inner requirement, since I've a good knowledge of how the browsers handle all this internally (and by the time the browsers change drastically, the site would be redesigned a few times already, or dead). I don't however care about inserting empty alt tags on images without meaning, or avoiding "target" since it was supposedly bad about something. I need to use a feature, it works, it's not going away: I use it. It's my means. I achieved my goal, on time, and with great results.
Re:That's good by Yoozer · 2007-05-30 01:32 · Score: 1

There are two things here: actually there isn't a "business" behind every page.
Agreed - there isn't. But what's the goal of the business on the web? To get attention. That is exactly the same goal of that large number of people who make everything themselves - the difference being that they aren't designers, SEO specialists, server-side scripters or what-have-you and have to become a jack of all trades in the time it takes to browse through a Teach-Yourself-X-in-Y-minutes. You want your little place to be attention-generating or -friendly.

Ideally these people don't have to worry about the standards - just about the content. Problem is that the vast majority with their own webpage just don't see that they're reinventing the square wheel with the crappy HTML, because they don't know about the existing solutions or they insist that they're unique snowflakes not deserving a cookie-cutter solution. The solution for them would be to have a visual programming environment that worries about the standards where you can just type, dump pictures, etc - and like a bungee rope, you're dragged back to the crappy HTML again because the IDE is clueless and doesn't use the standards because making it look exactly like in design view is more important. The CMS/blog/whatever is just too hard to install, but uploading Word documents converted to HTML resembles at least a little bit of the paradigm they're used to (count the links that start with "file:///C:/My Documents/John Doe" instead of "http://")

As for "target", it has its use - it Just Works when you don't want to send anyone away from the original page and you don't want to make a nifty Web 2.0-ish fade-to-black unclickable screen with a centered big version of the picture. It will only irritate if you're spawning a mass of new windows or tabs for no good reason.
Re:That's good by fermion · 2007-05-30 01:55 · Score: 2, Interesting

The points are valid within a certain context, but we have to define what that context is. First, who is going to pay for the service. Second, who is going to use the service. Third how is the service actually going to be built. Fourth how is the profit going to be derived.
In the Google model, advertising pays the bill, the masses use it, the service is built on sound statistical principles, and profit is driven by focusing on making the process relatively simple and cheap. The web is crawled, links are counted, a bit of intelligence is added, and results are displayed.
Overall this method has proven useful. The problems are mainly that the pagerank has proven easy to hack. I do not believe the problem is that users look for Madonna and get the pop star by mistake. Since google is meant to be used by the masses, as it is the cheap mass searches that generate revenue, the popularity ranking is not an issue. Make no mistake, google results are ofttimes crap, but they are still usable for common searches.
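
For reference, the link counting behind pagerank can be sketched in Python as the textbook power iteration: a page's score is the chance that a random surfer following links ends up there. This is the published idea only, not whatever Google runs in production; the four-page web, the damping value of 0.85 and the fixed iteration count are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
```

Page "c" comes out on top because everything links to it, and "a" beats "b" and "d" only because "c" vouches for it. That is also why it is easy to hack: build enough pages like "b" and "d" and you can manufacture a "c".
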
The semantic web, as discussed, seems to be something different. It in fact seems to be the standard revolt of a linguist against the mathematician. The linguist say translation must be in meaning. The statistician says I can do it without understanding anything. They are both correct, but Google has shown the later can provide reasonable and cheap results. Likewise, this guy tries to compare the long tail to the iceberg. Of course, the long tail are the minority underserved, who are underserved because the lack the means or desire to pay for the service. The hidden iceberg is the majority that sinks large ships. Not someone who understands statistics, or, for that matter, is likely to make a generous profit.
What I think this guy is talking about is the specialized services that people might pay for directly, not a booming industry, as the nation provides librarians for free. A program that will take a search, and we assume that the user is competent enough to form the search using valid english, as there is no librarian to help construct the search, and know enough about the language, about the context, and about the subject matter, to return the exactly proper few results. It would then have to do this cheaply enough to drive a profit. This would in fact be a grand piece of software, but would it compete with Google or MSN or any mass search engine?
I am disappointed as even simple semantic search engines could get rid of the clutter we have on google, and if someone were willing to invest, even MS for that matter, the link farms could be a thing of the past. A lot of this, I believe, is due to the battle between the mathematicians and the linguants.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Re:That's good by hoojus · 2007-05-30 04:00 · Score: 1

You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs. But that is a good thing, as the semantic search won't return their blog... and really, do you want to read the blog of a person who types queries in the address bar?
Re:That's good by Anonymous Coward · 2007-05-30 04:05 · Score: 0
Before you praise Hakia too much, you might do some experiments and discover that it behaves very much as if it were doing this:
- You enter a search.
- Hakia passes the search to search.msn.com and ask.com.
- Hakia permutes the top couple dozen results from MSN and Ask and passes them back to you.
If your search is really common, they may return a stock answer. But for uncommon searches, their results coincide very curiously with MSN and Ask.

Let's try an example together; then you can try more for yourself. Enter the search, "is jesus really the son of god" into hakia, MSN, and Ask. Here's what I found:
- result 1 from hakia = result 17 from msn
- result 2 from hakia = result 6 from ask
- result 3 from hakia = result 3 from ask
- result 4 from hakia = result 9 from ask
- result 5 from hakia = result 11 from ask
- result 6 from hakia = result 11 from msn
- result 7 from hakia = result 7 from ask
- result 8 from hakia = result 8 from ask
- result 9 from hakia = result 2 from ask
Beware that search engines don't always return results in the same order; for example, when I redid the search on MSN, the result 1 from Hakia came up as result 29 on MSN. But you get the idea.

When I write "=" up above, I don't just mean that Hakia is returning the same URL as MSN or Ask. That might be coincidence. What I mean is that every single word in the title and description on Hakia also appears on Ask or MSN. Hakia is frequently missing some words, but never has any extra words.

Anyway, I don't know for sure what's going on, but that is a whopping big coincidence, I think.
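
If you want to repeat the experiment without eyeballing every result, the "=" test above (no extra words, possibly some missing) could be sketched roughly like this. It is only a sketch: the function names and the sample snippets are made up for illustration, and you would have to paste in the real titles and descriptions by hand.

```python
import re

def words(text):
    """Lowercased set of words in a result's title plus description."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def looks_copied(candidate, sources):
    """True if every word of `candidate` appears in one single source snippet.

    This mirrors the "=" test: the candidate may be missing words,
    but it must not contain any word the source snippet lacks.
    """
    cand = words(candidate)
    return bool(cand) and any(cand <= words(src) for src in sources)

# Hypothetical snippets, not real output from any search engine.
other_engines = ["Palladium - uses and properties of the element palladium"]
print(looks_copied("Palladium - properties of the element", other_engines))
print(looks_copied("Palladium - our own guide to the element", other_engines))
```

A word-set test like this ignores word order, so it is a weaker check than comparing the snippets character by character, but it matches the "missing some words, never any extra" pattern described above.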
Re:That's good by DragonWriter · 2007-05-30 04:26 · Score: 1

Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.

The kind of "semantic search" laid out in the paper is at least as much of a "shiny dream" as the Semantic Web, pretty much by definition. It requires an extreme version of exactly the same technology that would be used by a "semantic factory" that would take user-created content and add semantic markup automatically; the only difference is that, instead of pushing the information back for storage in or with the web page it describes, the information is stored remotely, can't be tailored, and can't be accessed except through the central service that created the information.

If you've got the technology to make this "semantic search", you've got the technology to let people author "Semantic Web" pages completely seamlessly the same way WYSIWYG tools let them do now without needing to know much, or even any, HTML, CSS, etc. If you apply that technology in the Semantic Web sense rather than the centralized semantic search model, people who do understand the technology and have the initiative can improve the semantic representation associated with their pages, rather than relying on the autogenerated semantic representation, plus you don't rely on a central search service, and can create applications that rely on semantic content without going through that separate service.

in the defense of meta-data by spectrokid · 2007-05-29 22:54 · Score: 4, Interesting

Yes, people will abuse it in any way they can. Mostly to try and get higher up in the search engines. But this does not mean it is by definition useless. It is useless to do ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web-stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar )... This would e.g. allow for a "copy address to contacts" button. Make an easy (web based) program to generate the HTML so mom&pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be ok.

--

10 ?"Hello World" life was simple then

Re:in the defense of meta-data by suv4x4 · 2007-05-29 22:59 · Score: 1

but once you (the search engine) have decided to list a site, you could use the metadata for semantic web-stuff

Yup.. this is how microformats work. Something which a lot of the top companies seem to be interested in (including Microsoft).
Re:in the defense of meta-data by maxume · 2007-05-30 00:45 · Score: 1

As people figure out that adding metadata makes their data more useful for themselves, they will add more and more of it. If it's out there, it will get used.

--
Nerd rage is the funniest rage.

Looking for Mr/Ms Right by AndroidCat · 2007-05-29 22:57 · Score: 1

[..] but it's not a sure thing that the researchers now developing the idea will get it right.

Well, is there anyone in their Friend Of A Friend RDFweb that might know how to get it right?

--
One line blog. I hear that they're called Twitters now.

Tiresome and wrong by dread · 2007-05-29 23:07 · Score: 5, Insightful

There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of illgotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?

To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is usercentric rather than centralistic in the current Googlefashion. Desktop search is interesting to some extent as a way of tying our personal space with the dataspace outside of our local control but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication it becomes obvious that what is necessary is data structures that learn from ongoing conversation, eg the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.

There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.

--
I've had a wonderful time, but this wasn't it -- Groucho Marx

Re:Tiresome and wrong by Anonymous Coward · 2007-05-29 23:50 · Score: 1, Interesting

I got my master's at one of the schools getting the bulk of the research money, and we made that same argument there, to deaf ears. Namely that students and professors were solving the easy "peripheral" problems related to semantic web, and just ignoring the 13,125,732-lb gorillas in the room.
Re:Tiresome and wrong by illaqueate · 2007-05-30 00:33 · Score: 2, Interesting

Yeah, pretty much. I set out to make a data assistant program in high school (c 1996-1999) and was thinking about how to get a correspondence between what I was thinking and how data would be retrieved and figured it would have to be so generic to be worthless. And then I read Hilary Putnam's Representation and Reality and felt sick about the entire thing. But now that I think back on it I did have a lot of fun testing out different kinds of data retrieval on structured and unstructured data (and thinking up weird semantic hypertext languages).

http://slashdot.org/comments.pl?sid=142985&cid=11986906 -- lol
Re:Tiresome and wrong by PPH · 2007-05-30 04:06 · Score: 2, Insightful

Well, humans don't understand intent. They have to ask. Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions. I don't expect a semantic search engine to do any better.
There are two parts to this problem. The UI, or how a user will interact with the system to describe the context within which a search is to be performed, and the web crawler, which must extract semantics from web pages based on either metadata, linking algorithms (ala Google), natural language processing, or some combination of these.
Within restricted knowledge domains, some of these techniques work quite well already. Document management systems can enforce metadata and linking conventions and the knowledge domain is already understood to some extent. Transferring this to the WWW might be simpler than many people imagine. Just crawl the pages with the same techniques and index those where the metadata/language/linking is consistent. Ignore the rest as garbage. Odds are that what is most easily parsed and properly tagged will be the most useful to the end user. Owners of pages who wish them to be found will clean them up so as to make them appear in searches.

--
Have gnu, will travel.
Re:Tiresome and wrong by dread · 2007-05-30 09:44 · Score: 2, Insightful

Humans certainly understand intent. They will - as you point out - ask if they don't know the intent. You always know what you intend. If someone you know asks you a question, chances are you will have enough commonality, so to speak, to intuitively grasp the intent (or context). Your example with the librarian is interesting but pointless since you are talking about another centralised knowledge solution whereas I am talking about a decentralised model that starts with the user and - if you will - a "context model".

--
I've had a wonderful time, but this wasn't it -- Groucho Marx
Re:Tiresome and wrong by Pollardito · 2007-05-30 12:24 · Score: 1

There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of illgotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?

the main point of the article is that semantics offers improved searching over link statistics, and i'm not sure how any of this relates to that since none of this context is being supplied or used by a search engine like Google either. but like you said it's interesting that a lot of the questions above could be answered with a semantic search of the user's own hard drive, which would be probably more useful than a link statistics index of that same device.

but doing a search on your own machine and providing that context to the search engine to help it identify context is a huge privacy issue. my browser could aid the search engine in establishing the context by transferring 1. information about me to establish what my "role" is, 2. information about ongoing conversations to establish my "context" for the search terms, and 3. information about other things that i've been browsing to determine my "personality". but do you really want your browser to convey any of that to the search engine? and if you're not going to be passing that information to the server, how can your computer use that information locally to match up against various server-supplied choices to make a more exact match while keeping your privacy intact?

unfortunately none of this relates to automatic versus manual tagging and indexing of documents semantically, which i figured would be the focus of an article by someone with a stake in that argument. how the web of relations gets built isn't really based on your personal situation. this all has to do with how we decide which part of that semantic web we want to offer the user as the most useful starting points for finding more information on a topic.
Re:Tiresome and wrong by PPH · 2007-05-30 13:01 · Score: 1

Assuming the pre-existence of a shared 'context model' is cheating, sort of. The reference librarian example is valid from the point of view of having to establish this context model upon contacting the librarian 'cold' so to speak. The exchange that must occur when you contact this librarian, or any other human may seem trivial. But for an API to a semantic database, centralized or otherwise, this exchange must be formalized. Once that's done, semantic processing isn't terribly difficult. It has been a solved problem for over a decade within restricted problem domains.

--
Have gnu, will travel.
Re:Tiresome and wrong by nuzak · 2007-05-30 15:49 · Score: 1

How's this differ from humans? If you asked me what I think of Palladium, I'd say they made some pretty fun RPG's. A semantic web search will at least be able to separate the distinct definitions from each other, which is something you don't get with current lexical searches.

--
Done with slashdot, done with nerds, getting a life.

Who actually asks search engines questions? by ThirdPrize · 2007-05-29 23:23 · Score: -1, Offtopic

I only ever google on the main words of the subject i am interested in. "Warez" as opposed to "where can i get the latest warez from?". Seriously, does google do anything with the "where can i get" part?

--
I have excellent Karma and I am not afraid to Troll it.

Re:Who actually asks search engines questions? by endianx · 2007-05-30 02:20 · Score: 1

That is a good question. The answer, I think, is "sometimes".

For example, google "USA", vs. "Where are the USA?" and you will get different results. If you really wanted to know where the USA are, the second query will be far more useful, giving you the desired information in the first link.

The "according to" links seem to be more sensitive to natural speech.

Semantic Search Points To Better Relevancy by robably · 2007-05-29 23:28 · Score: 4, Funny

Quick! Tag this story as "Goldfish" and "Hairdressing".

Re:Semantic Search Points To Better Relevancy by neersign · 2007-05-30 02:46 · Score: 1

cue Master Shake: "Search for `Tooth Fairy` and ` Tooth Conspiracy`....and `Metallica`"

Hey, kids! Test your Wikipedia street smarts! by Anonymous Coward · 2007-05-29 23:40 · Score: -1, Offtopic

Which of the three passages below is the authentic excerpt from Wikipedia?

Conan Christopher O'Brien, 44, is the comedian and the host of The Tonight Show With Jay Leno. He is Scottish, as were his parents, as well as his three brothers and two siblings. He has no relation to CNN anchor Soledad O'Brien.

O'Brien, who is 43, is commonly thought by television audiences to be of diminutive stature, though some journalists and alternative biographers dispute this claim.

As of 2007, O'Brien has been confirmed dead of tuberculosis. His hair color was red. He was 45.

—
{This page is currently protected from editing until disputes have been resolved.}

[Image:KarlMarx.jpg]
United States President Abraham Lincoln was President of the United States during the Revolutionary War, and a well-known Libertarian.[1] Though some historians see Objectivist tendencies in his greatness.[2][3] Many of his most generous qualities can be traced back to the philosophy of Ayn Rand.[4]

Lincoln is now known to have suffered a mild form of Autism known as Asperger's Syndrome.[5][6][7]

Assassinated at 54 by a vandal known as Jon Harvey Booth,[8] or some say by political crony Edwin Stanton, Lincoln would have been 187 years old today (as of 2005)[original research?] had he not been assassinated in the prime of his life at the age of 45 by unemployed actor Juliette Lewis Botch.[9]

—
The Pokédex (Pokemon Zukan[?], lit. "Pokémon Encyclopedia") is an electronic device designed to catalogue and provide information regarding the various species of Pokémon featured in the Pokémon video game and anime series. The name Pokédex is a neologism including Pokémon (which itself is a portmanteau of pocket and monster) and index. The Japanese name is simply "Pokémon Encyclopedia" in Japanese.

In the video games, whenever a Pokémon is first captured, its data will be added to a player's Pokédex. In the anime the Pokédex is a comprehensive electronic reference encyclopedia, usually referred to in order to deliver exposition. There are four differently numbered Pokédex modes to date: the Kanto Pokedex, introduced in Pokémon Red and Blue; the Johto Pokédex, introduced in Pokémon Gold and Silver; the Hoenn Pokédex, introduced in Pokémon Ruby and Sapphire and expanded upon in Pokémon FireRed and LeafGreen; and the Sinnoh Pokédex, introduced in Pokémon Diamond and Pearl.

Availablilty? by Anonymous Coward · 2007-05-29 23:46 · Score: 2, Funny

Is 'Semantic Web' already included in Web 2.0? Or will that be the 3.0 version?

BWAAAHAHAHAAAAHAAA

Re:Availablilty? by Professeur+Shadoko · 2007-05-30 01:20 · Score: 1

I heard somewhere that it will be in Longhorn when it ships.

it could be semantic already by Anonymous Coward · 2007-05-29 23:54 · Score: 0

It just needs to be used right. We have had the "keywords" HTML tag for years, but it has been abused and subsequently abandoned.

If search engines would give pages (and domains) a score distributed among its keywords and other metadata, metadata spamming would soon be over.

Let's start properly using the tools we have available now, what is ignored today might be "semantic" tomorrow.
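
To make the "score distributed among its keywords" idea concrete, here is a minimal sketch. It assumes each page gets a fixed budget that is split evenly over the keywords it declares; the function name and the even split are my own assumptions, not anything a real search engine is known to do.

```python
def keyword_weights(keywords, budget=1.0):
    """Split a fixed per-page budget evenly across the declared keywords.

    Declaring more keywords does not add weight, it only dilutes it,
    so stuffing the keywords field stops paying off.
    """
    unique = sorted({k.strip().lower() for k in keywords if k.strip()})
    if not unique:
        return {}
    share = budget / len(unique)
    return {k: share for k in unique}

honest = keyword_weights(["goldfish", "aquarium"])
spammy = keyword_weights(["goldfish"] + ["spam%d" % i for i in range(99)])
print(honest["goldfish"], spammy["goldfish"])
```

With two keywords each one carries half the budget; with a hundred, "goldfish" carries one percent of it.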

Missing the target by sane? · 2007-05-30 00:35 · Score: 1

Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, what would actually be useful, would be a way to separate out the low value sites from those of relevance FOR THE SEARCH I'M DOING.

If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that may mention the word, even if it does have high page rank. If I'm looking for a site that gives me lots of links to connected sites, I don't want one that thinks it's an island on its own.

Stop trying to classify the small scale, focus on getting the broad scale right and on classifying the search first. It's an easier and more important question.

Re:Missing the target by msporny · 2007-05-30 01:18 · Score: 2, Interesting

If you are interested in a real solution to semantic web markup that works (and is being used) right now, you might want to check out the Microformats website. There is a growing following that is working on getting the semantic web working properly. The Firefox and Songbird guys are looking at using Microformats to make browsing the web a much richer experience - NOW, not 10 years from now.

There are currently Microformats for marking up people, places, events, geographic locations, music, and many other widely used data items on the web. For more information on what Microformats are, check out the info page on Microformats.
-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
Re:Missing the target by Anonymous Coward · 2007-05-30 01:47 · Score: 0

If you're looking for a definition of a term, you could try "define:(term here)" in Google (and other SEs prolly have similar shorthands).

Just mentioning it on the off-chance that you don't know already, not trying to be smart.

doesn't sound like TBL's semantic web to me.. by rogueuk · 2007-05-30 00:42 · Score: 1

This sounds like yet another company doing something like Latent Semantic Indexing or some sort of context processing on the text rather than using RDF markup to decide the semantics. To me, this isn't the semantic web..just another fancy search company trying to jump on the bandwagon.

relevance by ohell · 2007-05-30 01:18 · Score: 1

Is there a relevance to the display of ignorance in the title?

--
Three o'clock is always too late or too early for anything you want to do. - Jean-Paul Sartre

how long by Anonymous Coward · 2007-05-30 01:40 · Score: 0

before google buys hakia.com and Dr. Riza C. Berkan finally becomes rich and is able to buy this jeep and his friend Dr. Rizzla will be jealous that he didn't want to give him ink for the toner the other day?

how about another example? by Anonymous Coward · 2007-05-30 01:54 · Score: 0

wikipedia....cures cancer, gives the blind sight, teaches small children in Africa that 2+2=5.

semantic search, tagging, and social search by jonharel · 2007-05-30 02:01 · Score: 1

Together with a friend from Caltech, I've helped create a social content network for food information which supports semantic search for food information. For example, you can go to efoodi.com and search for 'meat', 'vegetable', or 'Mediterranean' to get a glimpse of the concepts it understands. It also supports social search and tag-based browsing. These technologies are powerful and it's surprising they're not more commonplace on the web.

WWOT? f3p by Anonymous Coward · 2007-05-30 02:35 · Score: -1, Offtopic

aal over America HOT ON THE hEELS OF

Semantic Web by DragonWriter · 2007-05-30 03:03 · Score: 1

This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata;

In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the semantic web to start with, was an inaccurate gratuitous attack on the Semantic Web necessary? (Yes, yes, it mirrors the gratuitous attack on the idea made by the author of TFA.)

Also, semantic search has a harder problem than getting people to start using metadata (which only requires demonstrating utility so that it becomes attractive to adopt): it requires developing a system to understand natural language, including understanding which of many diverse senses of a word is intended in context on a page.

Yeah, so Semantic Web requires getting some web authors to put structured information in their pages, and for that to spread as utility is demonstrated. Semantic search requires, per the author of TFA, "a system which understands both the user's query and the Web text using cognitive algorithms similar to that of the human brain, then brings results that are dead on target (right context) at first glance (not requiring to open the Web page for further investigation.)" (emphasis added)

Compared to that, the Semantic Web is easy.

Semantics derivable from web corpus statistics by presidenteloco · 2007-05-30 03:51 · Score: 2, Informative

Some (if not all) of the concept relation semantics needed for doing "semantic search"
or "machine comprehension" of text on the web can be gleaned by
doing statistical analysis of the relationships between words and phrases
across the entire web. Aggregating across a large corpus eliminates "noise"
in usage and draws out the semantic "signal" about how people relate the
concepts to each other.
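
A toy sketch of what "statistical analysis of the relationships between words" can mean, using pointwise mutual information (PMI) over document-level co-occurrence. PMI is my choice of statistic for the illustration, not necessarily what any real system uses, and the four "documents" are invented.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(documents):
    """PMI for every word pair that co-occurs in at least one document.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated as the fraction of documents containing the word(s).
    """
    n = len(documents)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing both words
    for doc in documents:
        vocab = set(doc.lower().split())
        word_df.update(vocab)
        pair_df.update(combinations(sorted(vocab), 2))
    return {
        (a, b): math.log2((together * n) / (word_df[a] * word_df[b]))
        for (a, b), together in pair_df.items()
    }

# Invented mini corpus; a real one would be millions of pages.
docs = ["palladium catalyst", "palladium catalyst",
        "theater tickets", "palladium theater"]
table = pmi_table(docs)
print(table[("catalyst", "palladium")], table[("palladium", "theater")])
```

On this corpus "palladium" is more strongly tied to "catalyst" than to "theater". With only four documents that means nothing, which is the point about needing a large corpus for the noise to average out.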

--

Where are we going and why are we in a handbasket?

Dogma by Lodragandraoidh · 2007-05-30 03:55 · Score: 1

"There are so many ways of doing it improperly, and only one way of doing it right."

That is a fallacy - you can not know that there is 'only one way of doing it right', if you don't know what that 'right' way is to begin with - you are dealing with unknowns. In truth there are very few systems that collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making the statement is more interested in controlling the method, rather than pursuing investigation for the sake of advancing our fundamental understanding. Every problem is not a nail, and every tool is not a hammer.

Given the variations evident (particularly when trying to propagate the 'one true' ontology), I see the semantic web as a utopia that is unapproachable. The reality will be some hybrid of the best ideas to come out of this research, coupled with (or layered above/below) the practicalities inherent with multiple ontologies/tagging systems, human interpretations and how to resolve/share those differences for each person. That is where the real solution set lies.

--

Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain

Question-answering systems by Animats · 2007-05-30 04:06 · Score: 1

"Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Green's "Baseball", which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not out of reach technically, because the underlying data is well-structured. Sports fans would use it.

What something like this is really doing is translating natural language to SQL. "How many games did the Orioles play in June?" translates to something like SELECT COUNT(*) FROM games.baseball WHERE (hometeam="Orioles" OR awayteam="Orioles") AND month(gamedate) = 6 AND baseballseason(gamedate) = baseballseason(NOW()); There are existing tools for this, and there have been for years.
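
For illustration only, here is a toy version of that translation step in Python with SQLite. It handles exactly one question shape via a regular expression, the table and rows are invented, and it leaves out the "current season" filter from the query above because SQLite has no baseballseason() function. Real natural-language front ends are far more involved.

```python
import re
import sqlite3

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

PATTERN = re.compile(r"how many games did the (\w+) play in (\w+)\??$",
                     re.IGNORECASE)

def question_to_sql(question):
    """Translate one fixed question shape into (sql, params), else None."""
    m = PATTERN.match(question.strip())
    if not m or m.group(2).lower() not in MONTHS:
        return None
    team = m.group(1).capitalize()
    month = "%02d" % (MONTHS.index(m.group(2).lower()) + 1)
    sql = ("SELECT COUNT(*) FROM games "
           "WHERE (hometeam = ? OR awayteam = ?) "
           "AND strftime('%m', gamedate) = ?")
    return sql, (team, team, month)

def answer(conn, question):
    """Run the translated query; None means the question was not understood."""
    translated = question_to_sql(question)
    if translated is None:
        return None
    sql, params = translated
    return conn.execute(sql, params).fetchone()[0]

# Hypothetical schema and rows, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (hometeam TEXT, awayteam TEXT, gamedate TEXT)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    ("Orioles", "Yankees", "2007-06-01"),
    ("Yankees", "Orioles", "2007-06-02"),
    ("Orioles", "Yankees", "2007-05-30"),
])
print(answer(conn, "How many games did the Orioles play in June?"))
```

Anything that does not fit the pattern comes back as None, which is the honest version of "I don't understand the question".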

"Semantic search" is a dumbed down version of that because it doesn't try to answer the question. It just tries to spew back material which appears to contain an answer to the question. It's like talking to a politician, sales rep, or Jesus freak. "Ask Jeeves" was about as close as we ever got in the WWW era.

The problem with semantic search is that standalone queries have to be stated with more clarity and precision than most users are likely to achieve. The original article suggested "What is palladium used for?" as a query. That's a completely different query from "What is the Palladium used for?". As a standalone query, the best answer is probably "Worship of the goddess Pallas Athene". Which is probably not what the user wanted. With location hints, one might guess that the user wanted information about some theater or nightclub named the Palladium. But that's a guess; sometimes it will be wrong.

This leads to systems that engage in dialogue with the user. Probably by asking the user multiple choice questions. That's quite feasible, but it usually just means funneling the user into some kind of "wizard"-like sequence of dialog boxes. Many sites have "product selectors" like that.

Another approach, which seems to be where Google is going, is to collect vast amounts of information about the user's previous behavior, which can be used as additional context for search requests. That's likely to help, but it has downsides. If everybody gets a different answer when searching for something, you can't tell other people what to search for to find something. Asking the same question again, after doing other things, might get you a different answer. It's probably going to do the wrong thing some of the time. Given the model that "search is a box into which you type in what you want, more or less", that could drive users nuts.

And none of this really applies to shopping-related searches, which aren't formal queries at all.

Re:Question-answering systems by msbmsb · 2007-05-30 05:06 · Score: 1

"Used to be called 'natural language question answering systems'"? NLP and Question Answering are still very very active fields of research with many conferences, workshops and evaluations going on - not only in the US but also internationally - encompassing multi-lingual QA and reasoning-based QA. Ask Jeeves was not real QA, it was based more on manual annotations than open-domain NLP.

User generated meta data? by blurryrunner · 2007-05-30 04:17 · Score: 1

I think that maybe the community could come up with some kind of user-generated meta data system. For example, someone could create a site similar to StumbleUpon, but have it be just a general meta-data service. So when you visit a page, if you feel like it, you can tag it with certain meta data. This could be helpful, for example, in blocking AND finding porn.

Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags. You might even put in a rating on the content (I'm thinking along the lines of PG, PG-13, R, etc). The validity of certain meta data could be based on the frequency of the reported meta data.
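
A rough sketch of the "validity based on frequency" part, assuming each user report is just a set of tags for one page. The thresholds (at least 3 reports, at least half of the reporters agreeing) are arbitrary numbers picked for the example.

```python
from collections import Counter

def accepted_tags(reports, min_share=0.5, min_reports=3):
    """Keep a tag only if enough of the reporting users applied it.

    reports: list of tag sets, one per user who tagged the page.
    Pages with too few reports get no accepted tags at all.
    """
    if len(reports) < min_reports:
        return set()
    counts = Counter(tag for tags in reports for tag in set(tags))
    return {tag for tag, c in counts.items() if c / len(reports) >= min_share}

reports = [{"news"}, {"news", "satire"}, {"news"}, {"spam"}]
print(accepted_tags(reports))
```

Here only "news" survives, since the lone "spam" and "satire" votes fall under the half-of-reporters cutoff.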

Essentially, it's like a wiki-meta-data system. You could make a great search engine out of it. You could make good content control systems with it. If you made the data available through a web service, you can put the control for its user in the hands of the user. The meta-data rating software wouldn't be for the average joe, but you could motivate people to rate using systems like what's used in the google image labeler http://images.google.com/imagelabeler/. Or you could require the user to rate a page to "pay" for each search they do. People could also submit their site to be rated.

It would probably be hard to get wide participation, but it would be cool if it could be done.

-br

Re:User generated meta data? by DragonWriter · 2007-05-30 05:15 · Score: 1

Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags.

You mean, you could have something represented like:
- subject: http://www.somewebsite.com/ predicate: contains object: pr0n
- subject: http://www.otherwebsite.com/ predicate: discusses object: pr0n

With a UI that lets you tag a page so that the current page is the subject, and you choose the predicate describing the relationship and the object it relates to?

Sounds like a perfect application for RDF.
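A rough sketch of the subject/predicate/object idea in plain Python, using the two example statements above. This is not an RDF library and the names `Triple` and `blocked_sites` are made up for illustration; real RDF would normally identify predicates and objects with URIs as well.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> obj (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("http://www.somewebsite.com/", "contains", "pr0n"),
    Triple("http://www.otherwebsite.com/", "discusses", "pr0n"),
]


def blocked_sites(triples, topic):
    """Sites that contain the topic; sites that merely discuss it are kept."""
    return {t.subject for t in triples
            if t.predicate == "contains" and t.obj == topic}


print(blocked_sites(triples, "pr0n"))
```

Because the predicate is stored separately from the object, a filter can block pages that contain something while leaving pages that only discuss it alone, which a flat tag cannot express.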

Re:User generated meta data? by blurryrunner · 2007-05-30 08:25 · Score: 1

Thanks, you're absolutely right. I will admit, I'm not all that familiar with proposals surrounding the semantic web (I just looked up RDF) and it looks like RDF is what I'm looking for.

In terms of the idea, I was also thinking that, even better than just getting input from users, you could do a hybrid: have the content provider supply the meta data and have users verify it. It would probably get adopted faster. Also, the search engine could penalize sites if their self-provided meta data varied too far from what users thought. That would prevent providers from gaming the system.
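One way the penalty could be sketched, assuming both the provider's meta data and the users' view are reduced to sets of tags. Jaccard distance is a swapped-in measure of how far the two sets "vary"; the comment does not name one, and `ranking_penalty` and its `tolerance` parameter are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; 0.0 means identical sets, 1.0 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def ranking_penalty(provider_tags, user_tags, tolerance=0.5):
    """Penalize only the disagreement beyond `tolerance` (assumed policy)."""
    distance = jaccard_distance(provider_tags, user_tags)
    return max(0.0, distance - tolerance)


# Provider and users agree: no penalty.
print(ranking_penalty({"recipes", "cooking"}, {"recipes", "cooking"}))
# Provider claims recipes, users report something else entirely.
print(ranking_penalty({"recipes"}, {"casino", "spam"}))
```

The tolerance leaves room for honest differences in wording, so only providers whose self-description is far from what users report would lose ranking.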

I think that it could easily be implemented as a Firefox plugin and made into a toolbar (though I don't have experience in that area). It would also have high requirements for servers, so the cost would be high. It would also only be as valuable as the availability of the meta data and user-contributed data. So it's a tough sell. You would have to be big to make it work.

-br

The Quantum Bookkeepers by Jimekai · 2007-05-30 04:17 · Score: 1

Chapter 2007 Ingrid 7.3.01 Graphics processing is based on a linear database kernel re-engineered from Patrick Slater's psychological repertory grid subroutine of the same name. Ingrid v7.3 will hopefully lay semantic long-tail search plans to put a dynamically flexible, graphically acoustic, externally scheduled version of the RadioChomsky4pp.exe into a global grid computer. This and the instructions to get the latest Ingrid On Winamp software are ready for download now at http://ingridx.dyndns.org/download.html#download Now for my perennial request for help in identifying the subject of what I described, to a Rent-A-Spy Inc., on their web form today. I only told them of the discovery I made of the young version of a very powerful person c.1970. The subject is shouting a criminally compromising line in an obscure unnamed 16mm film. They are to assume that I'm fearing for my safety should they become involved. Cautiously, while hoping they might help, I only gave my first name (Last Name Withheld) and an unidentified prepaid Simcard number for them to text me their investigator's email contact. Knowing for sure in which film exactly where he is to find the approximate timecode of the 30 frames in question, an independent investigator will pass this info onto an IAI investigator, whose foreign associates can then retrieve the original. My point is that, knowing in advance only that this may identify a very high level alleged crime, but not yet the personage, any curious investigator can be proven to start out uncontaminated, as the evidence demands. Thus I hope to remain on the other side of a legal "Chinese Wall" from thence on. Finally to remove my evidence from my website. The inducing Ingrid software includes complete open source code under its own license. There is a discussion at comp.software about future license changes for those wanting an OP Client or to protect against nano-terrorist use of Ingrid.
Such dual licenses can be introduced under the present terms of *The Strong inGridX Free Public License 1.1*, available at http://homepages.ihug.co.nz/~income/ingridx/ingridx_free_public_license.htm Unfortunately the quid pro quo is that Ingrid must be installed before the source code directory is created. It does, however, not need to be run to view the source, at which point we'll anyway be speaking by phone, I hope. Best ever

--
Argumentum ad Probabilitum

Librarians are slow by fyoder · 2007-05-30 04:20 · Score: 1

Call up a reference librarian and ask for information on "Palladium". Odds are s/he will reply with one or more questions.

That's because librarians are slow. Those questions can lead to saving a great deal of time off the top.

A search engine is fast, effectively providing a ton of answers in seconds or fractions of seconds. The problem then is that we are slow. We can't go through all the hits as fast as the search engine spits them out.

What would be helpful would be if the search engine clustered results as if in response to the sorts of questions our hypothetical librarian might ask. The Clusty search engine attempts this.

Palladium

--
Loose lips lose spit.

Charlotte's Semantic Web by datastrategy · 2007-05-30 16:19 · Score: 1

A search on Charlotte's semantic web turned up "SOME PIG", whose real name was Wilbur, a sweet little porker who the locals grew very fond of, especially as he brought fame (and a bit of fortune) to their little town. Thanks to Wilbur's great and true friend Charlotte, Wilbur's essence of character was boiled down to one short phrase, making the search results highly relevant and easily accessible by all of God's creatures -- including spiders, of course. E.B. White's creative mind gave us a fascinating character in Charlotte whose "web services" are the kind every child (and most adults too) can appreciate and enjoy.

Slashdot Mirror

90 comments