Semantic Search Points To Better Relevancy
ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.
Oh look, three pink pigs just flew past my window.
From TFA:
"There are so many ways of doing it improperly, and only one way of doing it right."
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
...the best example/s they know of a definition (or better still a demonstration) of "social search." Thanks much.
The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.
Just my 2 cents, anyway.
.: Max Romantschuk
Hear the outlandish claims ladies and gentlemen, of how the brave doctor wants us just to have better searches.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
IMHO semantics don't work on a global scale; they do work if you only check trusted sources. If everyone can create data and place semantics on it, it becomes useless. You can't trust everyone to place correct semantics on it: either they don't have the knowledge to place correct semantics on data, or they maliciously place the wrong semantics on it.
09 f9 11 02 9d 74 e3 5b d8 41 56 c5 63
What are people having trouble finding at the moment anyway?
You're confusing the word "metadata" with the HTML <meta> tag. In this case (the semantic web) metadata would be in RDF. What TFA is proposing is to semantically process and index website content, rather than have the websites (or a third party) tag the content with RDF. What both of them are lacking is any kind of universal ontology (or even standardized specialty ontologies).
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting (i.e. more accurate) results than Google in some cases, while in others it produces worse results. But the company's young.
Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.
Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.
You can't ask people who can't tell the difference between typing a query into the search engine and typing a URL into the address bar to add proper meta information to their blogs. Not to mention the abuse potential.
I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology which implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.
BTW, Google already extracts semantic information from both the site and the query, but this is quite primitive compared to the potential mentioned in the article. Google looks for term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that has only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
Yes, people will abuse it in any way they can. Mostly to try and get higher up in the search engines. But this does not mean it is by definition useless. It is useless for ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web-stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar)... This would e.g. allow for a "copy address to contacts" button. Make an easy (web based) program to generate the HTML so mom&pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be ok.
10 ?"Hello World" life was simple then
One line blog. I hear that they're called Twitters now.
There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?
To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralized in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space to the dataspace outside of our local control, but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication, it becomes obvious that what is necessary is data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.
There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
I've had a wonderful time, but this wasn't it -- Groucho Marx
Quick! Tag this story as "Goldfish" and "Hairdressing".
Semantic Web = the promise that never quite delivers
Such a good idea in theory, but where does trust come from? Who can we trust to mark anything?
And by the time any of this is solved google will have evolved so it can understand plain text better than mark up. How do you markup something as ambiguous? Unsure? Rumor? It's pretty easy in plain English:
"I hear Joe is living in Cornwall". There you go, easy to use and no angle brackets.
monk.e.boy
Open source, flash charts
Is 'Semantic Web' already included in Web 2.0? Or will that be the 3.0 version?
BWAAAHAHAHAAAAHAAA
It just needs to be used right. We have had the "keywords" HTML tag for years, but it has been abused and subsequently abandoned.
If search engines gave pages (and domains) a score distributed among their keywords and other metadata, metadata spamming would soon be over.
Let's start properly using the tools we have available now; what is ignored today might be "semantic" tomorrow.
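A minimal sketch of the "score distributed among keywords" idea, assuming each page gets one fixed budget that is split evenly across whatever keywords it declares. The function name, the even split, and the budget of 1.0 are illustrative assumptions, not how any real search engine weighs metadata.

```python
def keyword_weights(keywords, budget=1.0):
    """Split a fixed relevance budget evenly across a page's declared keywords.

    Hypothetical scheme: the more keywords a page claims, the less each counts.
    """
    # Normalise and de-duplicate while keeping the declared order.
    unique = list(dict.fromkeys(k.strip().lower() for k in keywords if k.strip()))
    if not unique:
        return {}
    share = budget / len(unique)
    return {k: share for k in unique}


# An honest page declares two keywords; a spammy page stuffs in a hundred.
honest = keyword_weights(["goldfish", "aquarium"])
spammy = keyword_weights(["goldfish"] + [f"term{i}" for i in range(98)] + ["aquarium"])

print(honest["goldfish"], spammy["goldfish"])
```

Under this assumption keyword stuffing dilutes itself: every extra keyword lowers the weight of all the others, so there is little to gain from listing terms the page is not really about.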
How do you propose enforcing any sort of universal or specialized ontology?
If I have a turd, and I add metadata to it that says it's pure gold, it's still a turd; you have to trust me to trust my metadata. That's what the op is talking about, not the container.
Nerd rage is the funniest rage.
Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, what would actually be useful, would be a way to separate out the low value sites from those of relevance FOR THE SEARCH I'M DOING.
If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that may mention the word, even if it does have high page rank. If I'm looking for a site that gives me lots of links to connected sites, I don't want one that thinks it's an island on its own.
Stop trying to classify the small scale, focus on getting the broad scale right and on classifying the search first. It's an easier and more important question.
"Susan saw the dog in the window. She pressed her nose against it. She wanted to buy it."
The SW project exists *because* machines are too dumb to read English. Or Chinese. And will probably stay that way for the foreseeable future.
So W3C's RDF is positioned half-way between the world of dumb computers and smart people. It structures data in terms of classes and properties, and allows different groups to define sets of class and property names that can be freely mixed together without the need for heavyweight standardisation. And it gives us an SQL-ish querying framework, SPARQL, for asking questions of this data, and getting back tables of results. Despite the myths, RDF doesn't oblige people to put metadata "inside every Web page". It just defines a common data model that information from various sources and formats can be mapped to, so that what they say can be processed with less regard for the fiddly detail of file formats and encodings. And RDF certainly doesn't require that you believe everything you read: the SPARQL spec, unlike SQL, provides built-in machinery for querying properties of the data source, inline in your query, so you can filter the data down to the bits you decide to trust in some specific app.
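To make the "filter down to the bits you decide to trust" point concrete, here is a rough sketch in plain Python rather than real SPARQL. It assumes each statement is stored together with the source it came from, which is roughly what SPARQL's named graphs give you. The Quad type, the function name, and the sample data are all made up for illustration.

```python
from collections import namedtuple

# One statement plus the source (graph) that asserted it.
Quad = namedtuple("Quad", "subject predicate obj source")

# Illustrative data: two sources disagree about the same person's nickname.
QUADS = [
    Quad("_:a", "foaf:mbox", "mailto:bob@work.example", "http://example.org/foaf/aliceFoaf"),
    Quad("_:a", "foaf:nick", "Bobby", "http://example.org/foaf/aliceFoaf"),
    Quad("_:b", "foaf:mbox", "mailto:bob@work.example", "http://example.org/foaf/bobFoaf"),
    Quad("_:b", "foaf:nick", "Robert", "http://example.org/foaf/bobFoaf"),
]


def nicknames_by_source(quads, mbox, trusted=None):
    """Return (source, nickname) rows for whoever owns `mbox`, one source at a time.

    If `trusted` is given, statements from any other source are ignored.
    """
    rows = []
    for src in sorted({q.source for q in quads}):
        if trusted is not None and src not in trusted:
            continue
        in_graph = [q for q in quads if q.source == src]
        # Within this source only, find the subjects that have the mailbox...
        people = {q.subject for q in in_graph
                  if q.predicate == "foaf:mbox" and q.obj == mbox}
        # ...and report the nicknames the same source gives them.
        rows += [(src, q.obj) for q in in_graph
                 if q.predicate == "foaf:nick" and q.subject in people]
    return rows


print(nicknames_by_source(QUADS, "mailto:bob@work.example"))
print(nicknames_by_source(QUADS, "mailto:bob@work.example",
                          trusted={"http://example.org/foaf/bobFoaf"}))
```

The data model doesn't decide whom to believe. It just keeps "who said it" attached to "what was said", so the application can make that call.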
This sounds like yet another company doing something like Latent Semantic Indexing or some sort of context processing on the text rather than using RDF markup to decide the semantics. To me, this isn't the semantic web..just another fancy search company trying to jump on the bandwagon.
You're stuck on this idea of how the data is formatted, not the very very very important question of where it comes from. Who cares if the metadata is embedded in the web page, delivered via RDF, or tied to a brick and thrown through your window? I can still deliver inaccurate metadata. I can still be an asshat and SPAM you with it. "The semantic web" and RDF don't solve this problem at all.
Is there a relevance to the display of ignorance in the title?
Three o'clock is always too late or too early for anything you want to do. - Jean-Paul Sartre
before google buys hakia.com and Dr. Riza C. Berkan finally becomes rich and be able to buy this jeep and his friend Dr. Rizzla will be jealous that he didn't want to give him ink for the tonner the other day?
wikipedia....cures cancer, gives the blind sight, teaches small children in Africa that 2+2=5.
Together with a friend from Caltech, I've helped create a social content network for food information which supports semantic search for food information. For example, you can go to efoodi.com and search for 'meat', 'vegetable', or 'Mediterranean' to get a glimpse of the concepts it understands. It also supports social search and tag-based browsing. These technologies are powerful and it's surprising they're not more commonplace on the web.
Re the "very very very important question of where it comes from" and RDF, ...
See the RDF query spec, SPARQL, specifically the "FROM" clause in the query language.
http://www.w3.org/TR/rdf-sparql-query/#specDataset
...take a look at the example query there:
Section "8.3.1 Accessing Graph Names"
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?src ?bobNick
FROM NAMED <http://example.org/foaf/aliceFoaf>
FROM NAMED <http://example.org/foaf/bobFoaf>
WHERE
{
  GRAPH ?src
  { ?x foaf:mbox <mailto:bob@work.example> .
    ?x foaf:nick ?bobNick
  }
}
The spec gives the resultset table, which basically says that according to http://example.org/foaf/aliceFoaf the nickname is "Bobby", and according to http://example.org/foaf/bobFoaf the nickname is "Robert".
It's a mistake (although understandable ... better tutorials and demos are needed) to assume that RDF and SemWeb ignore this problem space.
There's an online SPARQL demo at http://xmlarmyknife.org/api/rdf/sparql/query and another at http://librdf.org/query to get a feel for how some of this stuff works. There are also tools like SquirrelRDF and D2RQ that wrap existing (SQL, LDAP, ...) datasources and make them look like SPARQL too, so your apps can be couched in terms of globally-used schemas rather than per-datasource schemas. It's also worth keeping an eye on what Oracle have been up to ... http://www.oracle.com/technology/tech/semantic_technologies/index.html ... no SPARQL yet but some serious RDF support.
Oops, should've previewed my post. Angle brackets in my query examples. See http://www.w3.org/TR/rdf-sparql-query/#specDataset for the real thing plus the corresponding resultsets.
That is a good question. The answer, I think, is "sometimes".
For example, google "USA", vs. "Where are the USA?" and you will get different results. If you really wanted to know where the USA are, the second query will be far more useful, giving you the desired information in the first link.
The "according to" links seem to be more sensitive to natural speech.
In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the semantic web to start with, was an inaccurate gratuitous attack on the Semantic Web necessary? (Yes, yes, it mirrors the gratuitous attack on the idea made by the author of TFA.)
Also, semantic search has a harder problem than getting people to start using metadata (which only requires demonstrating utility so that it becomes attractive to adopt): it requires developing a system to understand natural language, including understanding which of many diverse senses of a word is intended in context on a page.
Yeah, so Semantic Web requires getting some web authors to put structured information in their pages, and for that to spread as utility is demonstrated. Semantic search requires, per the author of TFA, "a system which understands both the user's query and the Web text using cognitive algorithms similar to that of the human brain, then brings results that are dead on target (right context) at first glance (not requiring to open the Web page for further investigation.)" (emphasis added)
Compared to that, the Semantic Web is easy.
Some (if not all) of the concept relation semantics needed for doing "semantic search"
or "machine comprehension" of text on the web can be gleaned by
doing statistical analysis of the relationships between words and phrases
across the entire web. Aggregating across a large corpus eliminates "noise"
in usage and draws out the semantic "signal" about how people relate the
concepts to each other.
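A toy sketch of that kind of statistical analysis, assuming the simplest possible setup: count which words turn up in the same document and score each pair with pointwise mutual information (PMI). The four-line corpus and the choice of PMI are illustrative assumptions. A real system would work over a vastly larger corpus and probably use something more robust.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(documents):
    """Score word pairs by PMI, based on how often they share a document."""
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))  # pairs are alphabetically ordered
    return {
        (a, b): math.log2(joint * n_docs / (word_df[a] * word_df[b]))
        for (a, b), joint in pair_df.items()
    }


# Tiny made-up corpus in which "palladium" shows up in two different senses.
CORPUS = [
    "palladium catalyst metal",
    "palladium metal price",
    "palladium theater tickets",
    "theater tickets tonight",
]

scores = cooccurrence_pmi(CORPUS)
print(scores[("metal", "palladium")], scores[("palladium", "theater")])
```

With only four documents the numbers are mostly noise, which is rather the point of the comment above: the "signal" only emerges once you aggregate over a very large body of text.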
Where are we going and why are we in a handbasket?
There are many researchers who are approaching this problem and aiming to put trust measures (PDF file) into the framework.
While it may not be the holy grail in the format that is described in the article, it is always possible to add additional social measures to the framework (similar to what has been done to filter out trolls and off-topic posts here on /.)
That is why research is important, and although you only read about how it is formatted, there is definitely more research addressing the credibility problem. It is just that you need to have a common basis of understanding (i.e. data format) clarified first before you can extend it to the next problem. Of course the problem with the Semantic Web is deciding on exactly how and what needs to be represented.
"There are so many ways of doing it improperly, and only one way of doing it right."
That is a fallacy - you can not know that there is 'only one way of doing it right', if you don't know what that 'right' way is to begin with - you are dealing with unknowns. In truth there are very few systems that collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making the statement is more interested in controlling the method, rather than pursuing investigation for the sake of advancing our fundamental understanding. Every problem is not a nail, and every tool is not a hammer.
Given the variations evident (particularly when trying to propagate the 'one true' ontology), I see the semantic web as a utopia that is unapproachable. The reality will be some hybrid of the best ideas to come out of this research, coupled with (or layered above/below) the practicalities inherent with multiple ontologies/tagging systems, human interpretations and how to resolve/share those differences for each person. That is where the real solution set lies.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
"Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Bobrow's "Baseball", which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not out of reach technically, because the underlying data is well-structured. Sports fans would use it.
What something like this is really doing is translating natural language to SQL. "How many games did the Orioles play in June?" translates to something like SELECT COUNT(*) FROM games.baseball WHERE (hometeam="Orioles" OR awayteam="Orioles") AND month(gamedate) = 6 AND baseballseason(gamedate) = baseballseason(NOW()); There are existing tools for this, and there have been for years.
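A bare-bones sketch of that translation step, assuming a single hard-coded question shape and a made-up games table in an in-memory SQLite database. The table layout, the regex, and the function names are illustrative assumptions. It also leaves out the "current season" filter from the query above and only copes with one-word team names.

```python
import re
import sqlite3

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# The only question shape this sketch understands.
PATTERN = re.compile(r"how many games did the (\w+) play in (\w+)\??$", re.IGNORECASE)


def question_to_sql(question):
    """Translate the question into (sql, params), or None if it doesn't fit."""
    match = PATTERN.match(question.strip())
    if not match:
        return None
    team, month = match.group(1), match.group(2).lower()
    if month not in MONTHS:
        return None
    sql = ("SELECT COUNT(*) FROM games "
           "WHERE (home_team = ? OR away_team = ?) "
           "AND CAST(strftime('%m', game_date) AS INTEGER) = ?")
    return sql, (team, team, MONTHS[month])


def answer(conn, question):
    """Run the translated query; None means the question wasn't understood."""
    translated = question_to_sql(question)
    if translated is None:
        return None
    sql, params = translated
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (home_team TEXT, away_team TEXT, game_date TEXT)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    ("Orioles", "Yankees", "2007-06-01"),
    ("Red Sox", "Orioles", "2007-06-15"),
    ("Orioles", "Yankees", "2007-07-04"),
    ("Yankees", "Red Sox", "2007-06-20"),
])

print(answer(conn, "How many games did the Orioles play in June?"))
```

The SQL is the easy half. Everything hard lives in the pattern matching, which is why this only works when the underlying data is well-structured and the range of questions is narrow.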
"Semantic search" is a dumbed down version of that because it doesn't try to answer the question. It just tries to spew back material which appears to contain an answer to the question. It's like talking to a politician, sales rep, or Jesus freak. "Ask Jeeves" was about as close as we ever got in the WWW era.
The problem with semantic search is that standalone queries have to be stated with more clarity and precision than most users are likely to achieve. The original article suggested "What is palladium used for?" as a query. That's a completely different query from "What is the Palladium used for?". As a standalone query, the best answer is probably "Worship of the goddess Pallas Athene". Which is probably not what the user wanted. With location hints, one might guess that the user wanted information about some theater or nightclub named the Palladium. But that's a guess; sometimes it will be wrong.
This leads to systems that engage in dialogue with the user. Probably by asking the user multiple choice questions. That's quite feasible, but it usually just means funneling the user into some kind of "wizard"-like sequence of dialog boxes. Many sites have "product selectors" like that.
Another approach, which seems to be where Google is going, is to collect vast amounts of information about the user's previous behavior, which can be used as additional context for search requests. That's likely to help, but it has downsides. If everybody gets a different answer when searching for something, you can't tell other people what to search for to find something. Asking the same question again, after doing other things, might get you a different answer. It's probably going to do the wrong thing some of the time. Given the model that "search is a box into which you type in what you want, more or less", that could drive users nuts.
And none of this really applies to shopping-related searches, which aren't formal queries at all.
My understanding is that you aren't tagging an item with metadata, rather the search engine is tagging your item with metadata on its end based on the linguistic context of the page. Meaning, based on context, it would understand that there is a difference between the word "server" on a page about restaurants vs. the word "server" on a page about office equipment, so you won't get links to Hooter's and Jimmy's Seafood Hut mixed in with your results for equipment. Ideally, any metadata tags you throw will be flat ignored. Context is king.
That's the dream, anyway. The reality is that it doesn't work yet, and may not for a long, long time. This is not an easy trick to pull off.
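For what it's worth, the crudest version of the "server" trick can be sketched in a few lines, assuming you already have a hand-made list of cue words for each sense. The cue lists and the function name are invented for illustration. Building those lists automatically, for every word, is the part that doesn't work yet.

```python
# Hypothetical cue words for two senses of "server".
SENSES = {
    "restaurant staff": {"menu", "restaurant", "tip", "table", "food", "waiter"},
    "computer hardware": {"rack", "cpu", "network", "office", "equipment", "uptime"},
}


def guess_sense(page_text, senses=SENSES):
    """Pick the sense whose cue words overlap most with the page's words.

    Returns None when no cue word appears at all. Ties go to the sense
    listed first, which is arbitrary.
    """
    words = set(page_text.lower().split())
    overlaps = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


print(guess_sense("our server brought the menu to the table"))
print(guess_sense("rack mounted server for office network"))
```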
"Hey, the third matrix movie would have been good except for the plot,story, and acting." --AC
I think that maybe the community could come up with some kind of user-generated meta data system. For example, some one could create a site similar to StumbleUpon, but have it be just a general meta-data service. So when you visit a page, if you feel like it, you can tag it with certain meta data. This could be helpful, for example, in blocking AND finding porn.
Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags. You might even put in a rating on the content (I'm thinking along the lines of PG, PG-13, R, etc). The validity of certain meta data could be based on the frequency of the reported meta data.
Essentially, it's like a wiki-meta-data system. You could make a great search engine out of it. You could make good content control systems with it. If you made the data available through a web service, you can put the control for its user in the hands of the user. The meta-data rating software wouldn't be for the average joe, but you could motivate people to rate using systems like what's used in the google image labeler http://images.google.com/imagelabeler/. Or you could require the user to rate a page to "pay" for each search they do. People could also submit their site to be rated.
It would probably be hard to get wide participation, but it would be cool if it could be done.
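A rough sketch of the "validity based on frequency" part, assuming votes arrive as (user, tag) pairs for a single page and a tag is accepted once a large enough share of the distinct raters agree on it. The thresholds and the function name are arbitrary assumptions for illustration.

```python
from collections import Counter


def accepted_tags(votes, min_share=0.5, min_raters=3):
    """Return the tags that enough distinct raters applied to one page.

    votes: iterable of (user, tag) pairs. A user repeating the same tag
    counts once, so one person can't stuff the ballot.
    """
    unique_votes = set(votes)
    raters = {user for user, _ in unique_votes}
    if len(raters) < min_raters:
        return set()  # too few raters to trust anything yet
    per_tag = Counter(tag for _, tag in unique_votes)
    return {tag for tag, n in per_tag.items() if n / len(raters) >= min_share}


votes = [
    ("alice", "recipe"), ("bob", "recipe"), ("carol", "recipe"),
    ("carol", "spam"), ("carol", "spam"),  # repeated vote, counted once
    ("dave", "news"),
]
print(accepted_tags(votes))
```

This does nothing about coordinated groups of raters, so some notion of rater reputation would still be needed on top.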
-br
Chapter 2007 Ingrid 7.3.01 Graphics processing is based on a linear database kernel re-engineered from Patrick Slater's psychological repertory grid subroutine of the same name. Ingrid v7.3 will hopefully lay semantic long-tail search plans to put a dynamically flexible, graphically acoustic, externally scheduled version of the RadioChomsky4pp.exe into a global grid computer. This and the instructions to get the latest Ingrid On Winamp software are ready for download now at http://ingridx.dyndns.org/download.html#download Now for my perennial request for help in identifying the subject of what I described, to a Rent-A-Spy Inc., on their web form today. I only told them of the discovery I made of the young version of a very powerful person c.1970. The subject is shouting a criminally compromising line in an obscure unnamed 16mm film. They are to assume that I'm fearing for my safety should they become involved. Cautiously, while hoping they might help, I only gave my first name (Last Name Withheld) and an unidentified prepaid Simcard number for them to text me their investigator's email contact. Knowing for sure in which film exactly where he is to find the approximate timecode of the 30 frames in question, an independent investigator will pass this info onto an IAI investigator, whose foreign associates can then retrieve the original. My point is that, knowing in advance only that this may identify a very high level alleged crime, but not yet the personage, any curious investigator can be proven to start out uncontaminated, as the evidence demands. Thus I hope to remain on the other side of a legal "Chinese Wall" from thence on. Finally to remove my evidence from my website. The inducing Ingrid software includes complete open source code under its own license. There is a discussion at comp.software about future license changes for those wanting an OP Client or to protect against nano-terrorist use of Ingrid.
Such dual licenses can be introduced under the present terms of *The Strong inGridX Free Public License 1.1*, available at http://homepages.ihug.co.nz/~income/ingridx/ingridx_free_public_license.htm
Unfortunately the quid pro quo is that Ingrid must be installed before the source code directory is created. It does, however, not need to be run to view the source, at which point we'll anyway be speaking by phone, I hope.
Best ever
Argumentum ad Probabilitum
That's because librarians are slow. Those questions can lead to saving a great deal of time off the top.
A search engine is fast, effectively providing a ton of answers in seconds or fractions of seconds. The problem then is that we are slow. We can't go through all the hits as fast as the search engine spits them out.
What would be helpful would be if the search engine clustered results as if in response to the sorts of questions our hypothetical librarian might ask. The Clusty search engine attempts this.
Palladium
Loose lips lose spit.
But that's contextual analysis, not metadata. Yes, the search engine is storing its analysis of the content in conjunction with the content, but as far as I know, no one in RDF land ever meant that when they said 'metadata'; they meant direct input.
And really, why not just server -food -menu? It would be nice if, when confronted with a search for 'server', the engine noticed that there were several broad interpretations of it and offered to narrow by context (Google sometimes does this *right now*), but just having the user actually input some context works pretty well.
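The "server -food -menu" style of query is simple enough to sketch, assuming plain whitespace-separated terms where a leading minus means "must not appear". The function names are made up, and real engines handle quoting, stemming and ranking that this ignores.

```python
def parse_query(query):
    """Split a query into required terms and excluded (minus-prefixed) terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def matches(page_text, query):
    """True if the page has every required term and none of the excluded ones."""
    words = set(page_text.lower().split())
    required, excluded = parse_query(query)
    return (all(term in words for term in required)
            and not any(term in words for term in excluded))


print(matches("rack server for sale", "server -food -menu"))
print(matches("our server will bring the menu", "server -food -menu"))
```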
Nerd rage is the funniest rage.
Trust comes from the user's decision to trust a particular source of information, the same as anywhere else.
Who can you trust to tell you anything?
"Computers will understand natural language so that specialized vocabularies for interacting with them are no longer necessary or beneficial" has been the leading "promise that never delivers" for far longer than the Semantic Web has been an idea.
A search on Charlotte's semantic web turned up "SOME PIG", whose real name was Wilbur, a sweet little porker who the locals grew very fond of, especially as he brought fame (and a bit of fortune) to their little town. Thanks to Wilbur's great and true friend Charlotte, Wilbur's essence of character was boiled down to one short phrase, making the search results highly relevant and easily accessible by all of God's creatures -- including spiders, of course. E.B White's creative mind gave us a fascinating character in Charlotte whose "web services" are the kind every child (and most adults too) can appreciate and enjoy.