Using the Semantic Web to Enhance Search
RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal, but can also answer some more complex ones. We also have a smart autocomplete system; type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites, and by putting their data on the Semantic Web. Come check it out!"
Semantic-driven search engines have awesome potential. However, they place a lot of demand on the content provider, who must either supply metadata-rich content or provide intelligent mining tools that create metadata from existing sites.
This is definitely one to watch...
That's nice and all, but who shot first, and is there a mash-up of both scenes with crazy alien bar music mixed with 20's sinister piano?
Autocomplete is a useless feature that nobody wants to see when they type "a"...and see it load everything that begins with "a". The user is not interested in items starting with "a". Perhaps they're interested in terms beginning with "anon" or something, which has many fewer items to load, therefore making the load time much faster and not annoying the user in the process.
Or, even better, never have any autocomplete turned on automatically. Do a VB-like idea, where if you want to see possibilities at a certain point, hit a specific key that will register for the list to pop down.
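For what it's worth, both ideas above (wait for a longer prefix, or only pop the list on an explicit keypress) fit in a few lines. This is only a sketch under my own assumptions: the `suggest` function, the `min_chars` threshold of 3, and the `force` flag (standing in for the "hit a specific key" case) are all made up for illustration and have nothing to do with how the Stanford autocomplete actually works.

```python
from bisect import bisect_left

def suggest(terms, prefix, min_chars=3, limit=10, force=False):
    """Return up to `limit` terms that start with `prefix`.

    `terms` must be a sorted list of lowercase strings.
    Short prefixes return nothing unless `force` is True, which
    models the user pressing a dedicated "show me the list" key.
    """
    prefix = prefix.lower()
    if not force and len(prefix) < min_chars:
        return []
    # Binary search for the first term that could match the prefix.
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:start + limit]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Hypothetical term list, sorted once up front.
terms = sorted(["anon", "anonymous", "apple", "tom hanks",
                "tom hanks birth date", "tomato"])

print(suggest(terms, "a"))              # too short, nothing is loaded
print(suggest(terms, "anon"))           # long enough, small result set
print(suggest(terms, "a", force=True))  # explicit request from the user
```

Because the list is sorted, each lookup costs one binary search plus at most `limit` comparisons, so typing "a" never drags the whole "a" section across the wire.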
The Stanford research is interesting, but I'm still trying to make up my mind about the Semantic Web, learning about RDF, and whether I need to bake in ways of handling these kinds of assertions in my web app. The Stanford group writes, "Our hope is that our search application spurs development of the Semantic Web, and leads to sites publishing their data in this format so that we don't have to." It obviously takes more work to encode such information and to get user contributions marked up automatically for the Semantic Web. For a counter viewpoint, take a look at some of Clay Shirky's work -- in particular:
Will the semantic web be supported by future versions of Drupal, phpBB, and other grass-roots content management web apps? Not sure. Since a lot of the content is visitor generated, you would have to build in ways of providing easy markup. Would be interested to hear /. thoughts on the matter.
No, 'works best with Firefox' is just as bad as 'works best with IE'. What would be nice would be to see 'works best with any standards-compliant browser'.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth.
What truth?
There is no dupe
faster than a thousand speeding gazelles
$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.
While the idea of the semantic web has been legitimately lambasted, I think it's a bit far from DOA. While I agree that it's not exactly practical, I think that if you get enough sites displaying their content in such a manner, you'll eventually reach a point at which others will do the same.
I mean, think about it this way - while laziness or inertia might initially win out, once someone's competitors start to explore the idea of the semantic web, interest will start to be shown in it, especially once it becomes profitable to do so.
concrete5: a cms made for marketing, but strong enough for geeks.
Secondly, scraping doesn't always work and you will surely have low-grade porno and get-rich-quick schemes/scams littering your semantic data.
But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pretend they have something in stock but don't and yet still charge your credit card].
It just won't work.
It will always be a rough approximation of reality.
It's just a bad way of caching the results of scraping.
The Semantic Web appears to be a budding server-side solution to the paradigm of information glut online. Social bookmarking appears to be a client-side solution to the paradigm of information glut online.
It is refreshing to see exciting new solutions to the problems we have at present of targeted information retrieval on the internet. I can remember years of stagnation in this field (read: early 90's), and any change from today's google-and-pray searching mentality among the majority of end-users will be welcome.
The Crimson Dragon
...now I can finally search for "images of women with breasts larger than 36D"!
This looks like it will broaden the volume of useful searches. Right now, there are at least two limits that show up when searching:
1. For really popular subjects, the useful links are swamped in the noise of sites trying to make a buck off of getting you to look at their ads before directing you to somewhere else, that might have the actual content or might not.
2. For many less popular subjects, there is some oddity, like an unusual term being borrowed by some other field, so that it is something most people have never heard of, but people in two or more specialties use it frequently, in very different ways, resulting in strangeness. (e.g. the search engine throws up 23,003 links for a search on "Sator Resartus". 30% are esoteric literary criticism, 20% relate to apoptosis (cell biology), 20% relate to building moral inhibitions into A.I., 10% to Keith Laumer novels, and the rest are probably noise).
(I'm sure there are more than these two limits. Someone else may want to comment on some others).
This is likely to help with the second case, oddities in the data set grouping. (It could sort links into the larger sub-categories, ask the user which one(s) seem most applicable, and maybe even sort out a small set of links that explain, for the previous example, how a highbrow literary term got borrowed by the other fields.)
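A rough sketch of that "sort into sub-categories, then ask the user" step, assuming each result already carries a topic label (which is exactly the metadata the Semantic Web is supposed to supply). The `group_by_topic` name, the example.org URLs, and the labels are all invented for illustration.

```python
from collections import defaultdict

def group_by_topic(results):
    """Group (url, topic) pairs by topic, largest group first.

    Ties between equally sized groups are broken alphabetically
    so the ordering is stable from one search to the next.
    """
    groups = defaultdict(list)
    for url, topic in results:
        groups[topic].append(url)
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical labelled results for one ambiguous query.
results = [
    ("http://example.org/a", "literary criticism"),
    ("http://example.org/b", "cell biology"),
    ("http://example.org/c", "literary criticism"),
    ("http://example.org/d", "artificial intelligence"),
]

# Show the user one line per sense and let them pick.
for topic, urls in group_by_topic(results):
    print(f"{topic}: {len(urls)} link(s)")
```

The hard part is obviously getting trustworthy topic labels in the first place; the grouping itself is trivial once they exist.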
It's not as likely it would help with the first case, though, as sites that don't have actual content are actively duplicitous. Something that is actively trying to fool humans is still likely to be very successful at fooling our tools.
Who is John Cabal?
Does it have a countermeasure against 'semantic spam'?
That second link goes to http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707
How is that different to linking to http://www.w3.org/2001/sw/?
Is Slashdot trying to improve someone's Google ranking?
(Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)
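For the curious, the redirect link just carries the real destination in its `q` query parameter, so you can pull it back out with the standard library. A small sketch; the `unwrap_redirect` helper is my own name, and I'm assuming the target always lives in `q`, which is only what this one URL shows.

```python
from urllib.parse import urlsplit, parse_qs

def unwrap_redirect(url, param="q"):
    """Return the target stored in a redirect URL's query string.

    Falls back to the original URL when the parameter is absent.
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get(param)
    return values[0] if values else url

wrapped = "http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707"
print(unwrap_redirect(wrapped))
```

So the page you land on is the same either way; the difference is that the click passes through Google first.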
# cat
Damn, my RAM is full of llamas.
Nice straw man argument. How many people making their own personal site are going to dedicate 2/3 of their time to tagging their content? The only people who are going to tag their content are those looking to abuse the system. No sane individual is going to spend 3 months going back and editing all their pages with tags. Even then, you still have the problem of conflicting categories (aka ontologies). There will never be a globally accepted set of ontologies. It's all a pipe dream. Why should users spend hours and hours tagging their site when Google is already doing a good job of indexing pages?
...not only what the Semantic Web is about, but more pragmatically why this is in "Hardware." :)
Against stupidity the gods themselves contend in vain.
The Semantic Web is about describing resources, not tagging pages.
Indeed, you might output RDF from your processing of Web pages.
Extracting information from semi-structured text is very different to making logical assertions about resources.
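To make the distinction concrete, here is a toy version of "assertions about resources" using plain Python tuples as subject/predicate/object triples. Everything under `http://example.org/` is a made-up vocabulary for illustration, not the one used in the Stanford data set, and `match` is a hypothetical helper rather than any real RDF library's API.

```python
# Hypothetical namespace; real data would use a published vocabulary.
EX = "http://example.org/"

# Each triple asserts one fact about a resource. Note that the
# person and the web page describing him are separate resources.
triples = {
    (EX + "TomHanks", EX + "type", EX + "Actor"),
    (EX + "TomHanks", EX + "name", "Tom Hanks"),
    (EX + "TomHanks", EX + "birthDate", "1956-07-09"),
    (EX + "page/tom-hanks.html", EX + "describes", EX + "TomHanks"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# A query like "tom hanks birth" becomes a pattern over the graph.
for _, _, value in match(triples, s=EX + "TomHanks", p=EX + "birthDate"):
    print(value)
```

A tag on a page says "this page is about X"; a triple says something about X itself, which is why a query can be answered without ever looking at the page text.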
If it would mean that their sites would rank higher in the search results, I'd say that they all would...
Isn't this basically what HTML is supposed to do, kind of?