Semantic Search Points To Better Relevancy
ReadWriteWeb writes in to tell us about an article by Dr. Riza C. Berkan, founder and CEO of hakia.com, describing the promise of and potential for semantic search. This approach to providing more on-target search results contrasts with the dream of the semantic Web. Semantic search doesn't require all the Web page authors in the world to begin adding metadata; but it's not a sure thing that the researchers now developing the idea will get it right.
Fists of fury!
People just spam total crap in their metadata headers. Searching based on it is a total fucking waste of time.
If you mod me down, I will become more powerful than you can imagine....
From TFA:
"There are so many ways of doing it improperly, and only one way of doing it right."
But he doesn't say what the right way is, or how it could be, or even if he thinks his company is on the right track. There is no information at all.
When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
...the best example/s they know of a definition (or better still a demonstration) of "social search." Thanks much.
The semantic web is about more than search. Rich semantics will enable applications of a completely different nature than today. Aggregating and mashing up data could be taken to a whole new level. Just because someone comes up with better indexing we shouldn't give up on the semantic web.
Just my 2 cents, anyway.
.: Max Romantschuk
Fuck you you fucking faggots.
Hear the outlandish claims ladies and gentlemen, of how the brave doctor wants us just to have better searches.
There are places where the networks are not touching, and there are places where they are. - Boeing's Lori Gunter
IMHO semantics don't work on a global scale; they do work if you only check trusted sources. If everyone can create data and place semantics on it, it becomes useless. You can't trust everyone to place correct semantics on it: either they don't have the knowledge to place correct semantics on data, or they maliciously place the wrong semantics on it.
09 f9 11 02 9d 74 e3 5b d8 41 56 c5 63
While this is not strictly a PR piece for Hakia.com, it mentions the site (and some others) and I just had to try it. I gotta be honest, it does produce more interesting (i.e. more accurate) results than Google in some cases, while in others it produces worse results. But the company's young.
Overall, this is the direction we should be taking. The semantic web is indeed just that: a shiny dream.
Today, we're talking about anyone having the ability to create a web page, using pre-made online page/blog tools, or easy to use WYSIWYG desktop apps.
You can't ask people who can't tell the difference between typing a query in the search engine and typing a URL in the address bar to add proper meta information to their blogs. Not to mention the abuse potential.
I can already hear someone saying "If you don't know the XHTML/CSS specs by heart you shouldn't be making pages" but that's just arrogant. Technology should destroy barriers, not create them; the technology that implements this idea better will succeed. Look at Google: it will parse even the most horrendous code and extract proper information from it. This is why they are number 1.
BTW, Google already extracts semantic information from both the site and query, but this is quite primitive compared to the potential mentioned in the article. Google looks for term context, meaning context, synonyms, related words etc. I hope Hakia.com and businesses like them take this idea further, so there's finally some innovation happening in search (something that has only enjoyed gradual and minuscule improvements for the last 9 years, since Google introduced PageRank).
Yes, people will abuse it in any way they can, mostly to try and get higher up in the search engines. But this does not mean it is by definition useless. It is useless for ranking, but once you (the search engine) have decided to list a site, you could use the metadata for semantic web stuff. How about allowing for a physical address, phone number, opening hours (for brick & mortar)... This would e.g. allow for a "copy address to contacts" button. Make an easy (web based) program to generate the HTML so mom & pop shops can include it in their website, and refrain from using it for ranking purposes, and you should be ok.
10 ?"Hello World" life was simple then
One line blog. I hear that they're called Twitters now.
There is a huge problem with the argument made in the article - one which is plainly visible in the "Palladium" example. The meaning of "Palladium" is related to an internal state (i.e. my internal state). What am *I* thinking about when I write "Palladium"? Am I referring to the element Palladium? Am I referring to the DRM technologies from Microsoft? This is dependent on three things primarily:
1: my "role". What am I? Am I a journalist at a newspaper? Am I a private citizen with a large collection of ill-gotten mp3s?
2: my "context". Am I discussing something? Is this a query related to a conversation I am having with someone else? God only knows how many Google queries actually stem from ongoing IM-conversations where a, to the reader, previously unknown term/subject is brought forward.
3: my "personality". What am I primarily interested in? What is my preferred format of consumption? If I am 7 years old - what the hell does "Palladium" really mean?
To me it is obvious that the idea of a semantic web, the promise if you will, can never be delivered upon without a framework that is user-centric rather than centralized in the current Google fashion. Desktop search is interesting to some extent as a way of tying our personal space with the dataspace outside of our local control, but that is still a very limited tool. Since much of what is very simplistically covered in 1 and 2 above is related to interpersonal communication, it becomes obvious that what is necessary is data structures that learn from ongoing conversation, e.g. the intersection of Person A and Person B is described in a way that can give us guidance as to what the appropriate (or most likely) interpretation of the term used is.
There is much that can be said about this but suffice to say that the semantic web people are ignoring the real needs that have to be met in order to create something that is truly semantic and carries a knowledge of what the end user actually intends. Because if we don't understand the intent, we don't really understand anything.
I've had a wonderful time, but this wasn't it -- Groucho Marx
I only ever google the main words of the subject I am interested in: "Warez" as opposed to "where can i get the latest warez from?". Seriously, does Google do anything with the "where can i get" part?
I have excellent Karma and I am not afraid to Troll it.
Quick! Tag this story as "Goldfish" and "Hairdressing".
Which of the three passages below is the authentic excerpt from Wikipedia?
Conan Christopher O'Brien, 44, is the comedian and the host of The Tonight Show With Jay Leno. He is Scottish, as were his parents, as well as his three brothers and two siblings. He has no relation to CNN anchor Soledad O'Brien.
O'Brien, who is 43, is commonly thought by television audiences to be of diminutive stature, though some journalists and alternative biographers dispute this claim.
As of 2007, O'Brien has been confirmed dead of tuberculosis. His hair color was red. He was 45.
—
{This page is currently protected from editing until disputes have been resolved.}
[Image:KarlMarx.jpg]
United States President Abraham Lincoln was President of the United States during the Revolutionary War, and a well-known Libertarian.[1] Though some historians see Objectivist tendencies in his greatness.[2][3] Many of his most generous qualities can be traced back to the philosophy of Ayn Rand.[4]
Lincoln is now known to have suffered a mild form of Autism known as Asperger's Syndrome.[5][6][7]
Assassinated at 54 by a vandal known as Jon Harvey Booth,[8] or some say by political crony Edwin Stanton, Lincoln would have been 187 years old today (as of 2005)[original research?] had he not been assassinated in the prime of his life at the age of 45 by unemployed actor Juliette Lewis Botch.[9]
—
The Pokédex (Pokemon Zukan[?], lit. "Pokémon Encyclopedia") is an electronic device designed to catalogue and provide information regarding the various species of Pokémon featured in the Pokémon video game and anime series. The name Pokédex is a neologism including Pokémon (which itself is a portmanteau of pocket and monster) and index. The Japanese name is simply "Pokémon Encyclopedia" in Japanese.
In the video games, whenever a Pokémon is first captured, its data will be added to a player's Pokédex. In the anime the Pokédex is a comprehensive electronic reference encyclopedia, usually referred to in order to deliver exposition. There are four differently numbered Pokédex modes to date: the Kanto Pokedex, introduced in Pokémon Red and Blue; the Johto Pokédex, introduced in Pokémon Gold and Silver; the Hoenn Pokédex, introduced in Pokémon Ruby and Sapphire and expanded upon in Pokémon FireRed and LeafGreen; and the Sinnoh Pokédex, introduced in Pokémon Diamond and Pearl.
Is 'Semantic Web' already included in Web 2.0? Or will that be the 3.0 version?
BWAAAHAHAHAAAAHAAA
It just needs to be used right. We have had the "keywords" meta tag in HTML for years, but it has been abused and subsequently abandoned.
If search engines gave each page (and domain) a fixed score to be distributed among its keywords and other metadata, metadata spamming would soon be over.
Let's start properly using the tools we have available now, what is ignored today might be "semantic" tomorrow.
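A rough sketch of the fixed-score idea above, in Python. The function name `keyword_weights` and the even split of the score are assumptions made purely for illustration; no search engine is known to work exactly this way. Each page gets one score budget, so every extra keyword dilutes the rest.

```python
def keyword_weights(keywords, budget=1.0):
    """Split a fixed score budget evenly across a page's declared keywords.

    Hypothetical scoring rule: stuffing in more keywords lowers the
    weight of each one, which removes the incentive to spam.
    """
    unique = sorted({k.strip().lower() for k in keywords if k.strip()})
    if not unique:
        return {}
    share = budget / len(unique)
    return {k: share for k in unique}


# An honest page declares two keywords; a spammy page declares a hundred.
honest = keyword_weights(["palladium", "catalyst"])
spammy = keyword_weights(["palladium"] + [f"spam{i}" for i in range(99)])

print(honest["palladium"])  # 0.5
print(spammy["palladium"])  # 0.01
```

With this rule the spammy page's claim on "palladium" is fifty times weaker than the honest page's, even though both list the word.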
Look, I'm not interested in the academic niceties of semantic searching and metadata. What I want, what would actually be useful, would be a way to separate out the low value sites from those of relevance FOR THE SEARCH I'M DOING.
If I'm looking for a review of a product, I don't want 50 shops trying to sell me one. If I'm looking for an explanation/definition of a term, I don't want a page that may mention the word, even if it does have high page rank. If I'm looking for a site that gives me lots of links to connected sites, I don't want one that thinks it's an island on its own.
Stop trying to classify the small scale, focus on getting the broad scale right and on classifying the search first. It's an easier and more important question.
This sounds like yet another company doing something like Latent Semantic Indexing or some sort of context processing on the text rather than using RDF markup to decide the semantics. To me, this isn't the semantic web, just another fancy search company trying to jump on the bandwagon.
Is there a relevance to the display of ignorance in the title?
Three o'clock is always too late or too early for anything you want to do. - Jean-Paul Sartre
before Google buys hakia.com and Dr. Riza C. Berkan finally becomes rich and is able to buy this jeep, and his friend Dr. Rizzla will be jealous that he didn't want to give him ink for the toner the other day?
wikipedia....cures cancer, gives the blind sight, teaches small children in Africa that 2+2=5.
Together with a friend from Caltech, I've helped create a social content network for food information which supports semantic search for food information. For example, you can go to efoodi.com and search for 'meat', 'vegetable', or 'Mediterranean' to get a glimpse of the concepts it understands. It also supports social search and tag-based browsing. These technologies are powerful and it's surprising they're not more commonplace on the web.
In which respect it does not contrast with the Semantic Web, which doesn't require that any more than the regular Web required every computer attached to the internet to start running a web server. Since this article wasn't about the semantic web to start with, was an inaccurate gratuitous attack on the Semantic Web necessary? (Yes, yes, it mirrors the gratuitous attack on the idea made by the author of TFA.)
Also, semantic search has a harder problem than getting people to start using metadata (which only requires demonstrating utility so that it becomes attractive to adopt): it requires developing a system to understand natural language, including understanding which of many diverse senses of a word is intended in context on a page.
Yeah, so Semantic Web requires getting some web authors to put structured information in their pages, and for that to spread as utility is demonstrated. Semantic search requires, per the author of TFA, "a system which understands both the user's query and the Web text using cognitive algorithms similar to that of the human brain, then brings results that are dead on target (right context) at first glance (not requiring to open the Web page for further investigation.)" (emphasis added)
Compared to that, the Semantic Web is easy.
Some (if not all) of the concept relation semantics needed for doing "semantic search"
or "machine comprehension" of text on the web can be gleaned by
doing statistical analysis of the relationships between words and phrases
across the entire web. Aggregating across a large corpus eliminates "noise"
in usage and draws out the semantic "signal" about how people relate the
concepts to each other.
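A toy sketch of that kind of statistical analysis, in Python. It uses pointwise mutual information (PMI) over document-level co-occurrence, which is just one possible measure and not necessarily what any real engine uses; the four "documents" are made up for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(documents):
    """Return PMI (log base 2) for every word pair sharing a document."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        # Count each word once per document; sort so pairs have a stable order.
        words = sorted(set(text.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    pmi = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        pmi[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return pmi


# Made-up miniature corpus standing in for "the entire web".
docs = [
    "palladium catalyst reaction",
    "palladium catalyst metal",
    "palladium theater london",
    "theater london tickets",
]
pmi = cooccurrence_pmi(docs)

# Positive PMI: the words appear together more often than chance predicts.
print(pmi[("catalyst", "palladium")] > 0)  # True
# Negative PMI: "palladium" and "london" co-occur less than chance predicts.
print(pmi[("london", "palladium")] < 0)    # True
```

On a corpus this small the numbers mean little; the point is only that the association between "palladium" and "catalyst" falls out of counting, with no markup from page authors.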
Where are we going and why are we in a handbasket?
"There are so many ways of doing it improperly, and only one way of doing it right."
That is a fallacy - you can not know that there is 'only one way of doing it right', if you don't know what that 'right' way is to begin with - you are dealing with unknowns. In truth there are very few systems that collapse to a solution set of one. The phrase 'there is more than one way to skin a cat' comes to mind. This one statement tells me the person making the statement is more interested in controlling the method, rather than pursuing investigation for the sake of advancing our fundamental understanding. Every problem is not a nail, and every tool is not a hammer.
Given the variations evident (particularly when trying to propagate the 'one true' ontology), I see the semantic web as a utopia that is unapproachable. The reality will be some hybrid of the best ideas to come out of this research, coupled with (or layered above/below) the practicalities inherent with multiple ontologies/tagging systems, human interpretations and how to resolve/share those differences for each person. That is where the real solution set lies.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
"Semantic search" is actually a dumbed-down version of what, in AI, used to be called "natural language question answering systems". The first one that was sort of useful was Bobrow's "Baseball", which, unlike Eliza, actually did something useful. "Baseball" had a small database of baseball statistics, and could answer questions like "How many games did the Orioles play in June?". I'm surprised that someone doesn't have a natural language query system for sports statistics on the web today. It's not out of reach technically, because the underlying data is well-structured. Sports fans would use it.
What something like this is really doing is translating natural language to SQL. "How many games did the Orioles play in June?" translates to something like SELECT COUNT(*) FROM games.baseball WHERE (hometeam="Orioles" OR awayteam="Orioles") AND month(gamedate) = 6 AND baseballseason(gamedate) = baseballseason(NOW()); There are existing tools for this, and there have been for years.
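To make the translation step concrete, here is a minimal Python sketch using the standard `sqlite3` module. Everything in it is an assumption for illustration: the single regex pattern, the flat `games` table, the sample rows, and the month filter via SQLite's `strftime`. It handles exactly one question shape and leaves out the season check from the query above.

```python
import re
import sqlite3

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# One hard-coded question shape; a real system would need many more.
PATTERN = re.compile(
    r"how many games did the (\w+) play in (\w+)\??$", re.IGNORECASE)


def question_to_sql(question):
    """Translate one supported question into (sql, parameters)."""
    match = PATTERN.match(question.strip())
    if not match:
        raise ValueError("unsupported question")
    team, month = match.group(1), match.group(2).lower()
    if month not in MONTHS:
        raise ValueError("unknown month")
    sql = ("SELECT COUNT(*) FROM games "
           "WHERE (hometeam = ? OR awayteam = ?) "
           "AND CAST(strftime('%m', gamedate) AS INTEGER) = ?")
    return sql, (team, team, MONTHS[month])


# Hypothetical schema and made-up rows, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (hometeam TEXT, awayteam TEXT, gamedate TEXT)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    ("Orioles", "Yankees", "2007-06-01"),
    ("Yankees", "Orioles", "2007-06-15"),
    ("Orioles", "Yankees", "2007-07-04"),
    ("Yankees", "Mets", "2007-06-20"),
])

sql, params = question_to_sql("How many games did the Orioles play in June?")
count = conn.execute(sql, params).fetchone()[0]
print(count)  # 2
```

The team name goes in as a bound parameter rather than being pasted into the SQL string, which matters as soon as the question text comes from a stranger on the web.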
"Semantic search" is a dumbed down version of that because it doesn't try to answer the question. It just tries to spew back material which appears to contain an answer to the question. It's like talking to a politician, sales rep, or Jesus freak. "Ask Jeeves" was about as close as we ever got in the WWW era.
The problem with semantic search is that standalone queries have to be stated with more clarity and precision than most users are likely to achieve. The original article suggested "What is palladium used for?" as a query. That's a completely different query from "What is the Palladium used for?". As a standalone query, the best answer is probably "Worship of the goddess Pallas Athene". Which is probably not what the user wanted. With location hints, one might guess that the user wanted information about some theater or nightclub named the Palladium. But that's a guess; sometimes it will be wrong.
This leads to systems that engage in dialogue with the user. Probably by asking the user multiple choice questions. That's quite feasible, but it usually just means funneling the user into some kind of "wizard"-like sequence of dialog boxes. Many sites have "product selectors" like that.
Another approach, which seems to be where Google is going, is to collect vast amounts of information about the user's previous behavior, which can be used as additional context for search requests. That's likely to help, but it has downsides. If everybody gets a different answer when searching for something, you can't tell other people what to search for to find something. Asking the same question again, after doing other things, might get you a different answer. It's probably going to do the wrong thing some of the time. Given the model that "search is a box into which you type in what you want, more or less", that could drive users nuts.
And none of this really applies to shopping-related searches, which aren't formal queries at all.
I think that maybe the community could come up with some kind of user-generated metadata system. For example, someone could create a site similar to StumbleUpon, but have it be just a general metadata service. So when you visit a page, if you feel like it, you can tag it with certain metadata. This could be helpful, for example, in blocking AND finding porn.
Probably something more effective would be something a little more complex than just tags. Using the porn example, you wouldn't want articles talking about porn to be blocked (if you were blocking porn) because it actually wasn't porn. So you might have a couple different categories of tags. You might even put in a rating on the content (I'm thinking along the lines of PG, PG-13, R, etc). The validity of certain meta data could be based on the frequency of the reported meta data.
Essentially, it's like a wiki-meta-data system. You could make a great search engine out of it. You could make good content control systems with it. If you made the data available through a web service, you could put control over how it is used in the hands of the user. The meta-data rating software wouldn't be for the average joe, but you could motivate people to rate using systems like what's used in the google image labeler http://images.google.com/imagelabeler/. Or you could require the user to rate a page to "pay" for each search they do. People could also submit their site to be rated.
It would probably be hard to get wide participation, but it would be cool if it could be done.
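For what it's worth, the "validity based on frequency" part could be sketched like this in Python. The function name, the two thresholds and the sample tags are all made up for illustration; a tag only counts once enough independent visitors have reported it.

```python
from collections import Counter


def accepted_tags(reports, min_reports=3, min_share=0.5):
    """Keep tags reported often enough to be trusted.

    reports: one collection of tags per visitor who rated the page.
    Returns {tag: share of visitors who reported it}.
    Both thresholds are arbitrary assumptions, not tuned values.
    """
    if not reports:
        return {}
    total = len(reports)
    # set() ensures one visitor cannot vote for the same tag twice.
    counts = Counter(tag for tags in reports for tag in set(tags))
    return {tag: n / total for tag, n in counts.items()
            if n >= min_reports and n / total >= min_share}


# Five visitors rate an article that discusses porn but isn't porn.
reports = [
    {"about-porn", "news"},
    {"about-porn", "news"},
    {"about-porn"},
    {"porn"},                # a lone, wrong (or malicious) report
    {"news", "about-porn"},
]
result = accepted_tags(reports)
print(sorted(result))  # ['about-porn', 'news']
```

The single "porn" report is ignored, so a filter built on the accepted tags would not block the article.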
-br
Chapter 2007 Ingrid 7.3.01 Graphics processing is based on a linear database kernel re-engineered from Patrick Slater's psychological repertory grid subroutine of the same name. Ingrid v7.3 will hopefully lay semantic long-tail search plans to put a dynamically flexible, graphically acoustic, externally scheduled version of the RadioChomsky4pp.exe into a global grid computer. This and the instructions to get the latest Ingrid On Winamp software are ready for download now at http://ingridx.dyndns.org/download.html#download
Now for my perennial request for help in identifying the subject of what I described, to a Rent-A-Spy Inc., on their web form today. I only told them of the discovery I made of the young version of a very powerful person c.1970. The subject is shouting a criminally compromising line in an obscure unnamed 16mm film. They are to assume that I'm fearing for my safety should they become involved. Cautiously, while hoping they might help, I only gave my first name (Last Name Withheld) and an unidentified prepaid Simcard number for them to text me their investigator's email contact. Knowing for sure in which film exactly where he is to find the approximate timecode of the 30 frames in question, an independent investigator will pass this info onto an IAI investigator, whose foreign associates can then retrieve the original. My point is that, knowing in advance only that this may identify a very high level alleged crime, but not yet the personage, any curious investigator can be proven to start out uncontaminated, as the evidence demands. Thus I hope to remain on the other side of a legal "Chinese Wall" from thence on. Finally to remove my evidence from my website.
The inducing Ingrid software includes complete open source code under its own license. There is a discussion at comp.software about future license changes for those wanting an OP Client or to protect against nano-terrorist use of Ingrid.
Such dual licenses can be introduced under the present terms of *The Strong inGridX Free Public License 1.1*, available at http://homepages.ihug.co.nz/~income/ingridx/ingrid x_free_public_license.htm
Unfortunately the quid pro quo is that Ingrid must be installed before the source code directory is created. It does, however, not need to be run to view the source, at which point we'll anyway be speaking by phone, I hope.
Best ever
Argumentum ad Probabilitum
That's because librarians are slow. Those questions can lead to saving a great deal of time off the top.
A search engine is fast, effectively providing a ton of answers in seconds or fractions of seconds. The problem then is that we are slow. We can't go through all the hits as fast as the search engine spits them out.
What would be helpful would be if the search engine clustered results as if in response to the sorts of questions our hypothetical librarian might ask. The Clusty search engine attempts this.
Palladium
Loose lips lose spit.
A search on Charlotte's semantic web turned up "SOME PIG", whose real name was Wilbur, a sweet little porker whom the locals grew very fond of, especially as he brought fame (and a bit of fortune) to their little town. Thanks to Wilbur's great and true friend Charlotte, Wilbur's essence of character was boiled down to one short phrase, making the search results highly relevant and easily accessible by all of God's creatures -- including spiders, of course. E.B. White's creative mind gave us a fascinating character in Charlotte, whose "web services" are the kind every child (and most adults too) can appreciate and enjoy.