The Web of Data, Beyond What Google and Yahoo Show

Posted by timothy on Sunday July 26, 2009 @11:00AM from the thought-symantic-was-just-some-company dept.

jccq writes "Both Google and Yahoo have been supporting Semantic Web markup (RDFa, RDF and Microformats) for weeks and months respectively. What they do, at the moment, is use the markup only for visual feedback by returning better looking, more functional 'page snippets.' But how would it look if you could get all these bits and compose them automatically to form a single structured information page about what you're searching for? The folks at the DERI institute have just released Sig.ma, a visual browser and mashup generator that will go all over the web of data and find dozens of sources to combine together when answering a user query. It also comes in API mode to reuse the information Sig.ma finds inside applications. Here are a screencast and a blog post, with semantic-web-geek details."

50 comments

I for one by Anonymous Coward · 2009-07-26 11:04 · Score: 0

I for one wouldn't want to. I'd much rather search and find a good site on the topic myself.
as someone who was involved with them by ionix5891 · 2009-07-26 11:15 · Score: 4, Interesting

and studied at nearby uni,
DERI is a money black hole. Most of the people there know that the semantic web has many, many issues and will probably never bear fruit, but choose not to speak up, so as not to damage their academic careers and to keep their cushy "research" positions.
1. Re:as someone who was involved with them by Sique · 2009-07-26 11:29 · Score: 1
  
  Hey, but going sledge riding in the Alps with some of the women from DERI was nice though.
  
  --
  .sig: Sique *sigh*
2. Re:as someone who was involved with them by jccq · 2009-07-26 22:15 · Score: 1
  
  The semantic web has many issues, but these are not denied.
  Quite evidently, however, there are people who put a lot of work into this, finding the right balance between technology and socially sustainable models.
  But go ahead, just diss everybody :-) Why not, I mean.
3. Re:as someone who was involved with them by commodore64_love · 2009-07-27 00:22 · Score: 1
  
  The summary calls this a "visual browser" but I don't see any downloadable browser programs??? All I see is a *search engine*. Oh well. I guess that's to be expected in a world where people think Google/Yahoo are browsers.
  I typed my name into this Sig.ma search engine, and it turned up virtually nothing. So yes, I'd say this approach has serious problems. Using my name in Google turns up all kinds of dirt... er, information about myself. I'll stick with Google.
  
  --
  "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
4. Re:as someone who was involved with them by commodore64_love · 2009-07-27 01:06 · Score: 1
  
  P.S.
  Anyone have suggestions on how I can remove the "dirt" off my self-search google results? I've deleted some of the original messages from the 1980s and 90s, but for some reason they keep hanging around in archives.
  Could I claim "copyright" over my own words, and issue a DMCA takedown notice? Hmmm.
  
  --
  "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
5. Re:as someone who was involved with them by MarkWatson · 2009-07-27 02:38 · Score: 1
  
  DERI has produced some good stuff: I particularly like D2R which is a wrapper providing a SPARQL endpoint around a relational database. Both cool and useful (if you know how to use SPARQL queries).
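  For readers who have not used a SPARQL endpoint, here is a minimal sketch of what querying one over HTTP looks like. The endpoint URL is an assumption (a local server, not a real DERI service), and the query is a generic "first ten triples" pattern rather than anything specific to D2R. The sketch only builds the request URL, using the `query` parameter from the SPARQL protocol, and does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for illustration only; point this at your own server.
ENDPOINT = "http://localhost:2020/sparql"

# Generic SPARQL query: return any ten subject/predicate/object triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

def build_request_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_request_url(ENDPOINT, QUERY)

# What a server would recover after URL-decoding the request.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

  Fetching `url` with any HTTP client would return the result set, in whatever format the endpoint supports.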
6. Re:as someone who was involved with them by TranscenDev · 2009-07-27 05:39 · Score: 1
  
  I was not involved with them, but I agree with what you're saying!
  ~Ami
  Chicago Web Design
7. Re:as someone who was involved with them by Anonymous Coward · 2009-07-27 10:25 · Score: 0
  
  Well, seems you don't talk about the institute today, but that of several years ago. The ideas of the Web of Data have - as that work - matured since then and are being deployed... even google and yahoo index RDF now, so wake up and rethink "will never bear fruit". Anyways...
8. Re:as someone who was involved with them by Anonymous Coward · 2009-08-04 21:14 · Score: 0
  
  DERI in Galway, Ireland (where Sig.ma originated) has produced a lot of good stuff with dedicated people.
  There used to be another institute called DERI in Innsbruck, Austria.
  Which one are you talking about?
  You should get your facts right!
Copyright law completely unfit for this by Anonymous Coward · 2009-07-26 11:23 · Score: 0

Facts aren't copyrighted. At what point does a result of combining facts become a copyrighted document?
Fixed for you... by ghostis · 2009-07-26 11:24 · Score: 2, Funny

The folks at the DERI institute used to have Sig.ma, a visual browser and mashup generator that will go all over the web of data and find dozens of sources to combine together when answering a user query.

--

Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
1. Re:Fixed for you... by ghostis · 2009-07-26 11:26 · Score: 2, Interesting
  
  As I wrote the above I realized that "used to {verb}" is a really odd idiom. Can anyone explain?
  
  --
  
  Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
2. Re:Fixed for you... by djfuq · 2009-07-26 11:30 · Score: 1, Informative
  
  I used to smoke - now I'm smoking
  
  --
  Dj fuQ: djfuq urges you to listen to the beats http://djfuq.org
3. Re:Fixed for you... by ghostis · 2009-07-26 12:38 · Score: 1
  
  I was looking for the origin, actually... :-/
  
  --
  
  Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
4. Re:Fixed for you... by GigsVT · 2009-07-26 12:58 · Score: 4, Informative
  
  I was looking for the origin, actually... :-/
  http://www.englishpage.com/verbpage/usedto.html
  
  "Used to" expresses the idea that something was an old habit that stopped in the past. It indicates that something was often repeated in the past, but it is not usually done now.
  I wonder how you could ever tell a semantic search engine that you wanted the history of the idiom itself. Google picked it right up though, just had to search for "used to" quoted.
  Semantic intelligence in the form of incoming links is pretty damned powerful, anyway.
  
  --
  I've had enough abrasive sigs. Kittens are cute and fuzzy.
5. Re:Fixed for you... by Anonymous Coward · 2009-07-26 13:23 · Score: 0
  
  "Hey, bitch. I want the history of the idiom "used to". NOW!"
6. Re:Fixed for you... by brusk · 2009-07-26 13:23 · Score: 2, Informative
  
  With Google, you can search for "define:$word", which looks in dictionaries. Not perfect but for this kind of task it's helpful.
  
  --
  .sig withheld by request
7. Re:Fixed for you... by CarpetShark · 2009-07-26 21:36 · Score: 1
  
  I wonder how you could ever tell a semantic search engine that you wanted the history of the idiom itself.
  Probably much more easily than with Google. If you want to look up the etymology of "used to", a query like:
  
  "used to" etymology ?
  And it would complete your "sentence" by finding the value of "?".
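  The "fill in the ?" idea above can be sketched as pattern matching over (subject, predicate, object) triples. This is a toy in-memory illustration, not how Sig.ma or any real engine works; the triples, the predicate names and the placeholder value are all made up for the example.

```python
# Toy triple store. All data here is hypothetical placeholder content.
TRIPLES = [
    ("used to", "etymology", "<placeholder etymology text>"),
    ("used to", "part_of_speech", "idiom"),
    ("supernova", "type", "astronomical event"),
]

def match(pattern, triples=TRIPLES):
    """Return the triples matching a (subject, predicate, object) pattern.

    None in any position acts as the wildcard, like the '?' in the query.
    """
    return [
        triple for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]

# The query: "used to" etymology ?
answers = [obj for _, _, obj in match(("used to", "etymology", None))]
print(answers)
```

  The same `match` call with a wildcard in a different position answers different questions, e.g. `match((None, "type", None))` lists everything that has a type.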
8. Re:Fixed for you... by logixoul · 2009-07-28 03:15 · Score: 1
  
  "I *used to* X" - "In the past I *was used to* doing X" - "In the past I had the habit of doing X".
Markup by jefu · 2009-07-26 11:29 · Score: 3, Informative

RDF is nice, there are several syntaxes for it (including various triples formats), and it promises, if it can be built, deployed and trusted(!!!), to make the web ever so much more searchable. This will depend, though, on people writing good ontologies (not easy) and using them correctly (even less easy).
RDFa and microformats look, on the surface at least, to be nice ways to manage RDF type information in HTML. But I'm a bit more dubious - they don't, in many cases, have careful ontologies built around them - when they do (RDFa, mostly) they seem to be very resource intensive (a heavily RDFa annotated HTML page is likely to balloon to several times the same page without RDFa), and the uses of them I've seen have been less than convincingly correct. This doesn't mean that they're useless, just that they're not doing the job at the moment, or they're doing the job poorly.
The solution that seems to be favored by the semantic web types is to present RDF pages as an alternative to HTML pages when RDF is requested. This looks, by far, to be the best way to work this, but does require site builders (and CMSs and web frameworks), and content authors, to be able to build correct RDF pages that represent the information presented, often at the same time as they present HTML pages to human readers (and non-RDF search engines). This is going to be a major problem.
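Serving RDF "when RDF is requested" is usually done through HTTP content negotiation on the Accept header. A minimal sketch of the server-side choice follows, assuming a site that has exactly two representations of each page; the file names and the `choose_representation` helper are hypothetical, and real frameworks handle more cases (wildcard subtypes, specificity rules) than this does.

```python
# Hypothetical representations a site might have for one resource.
AVAILABLE = {"text/html": "page.html", "application/rdf+xml": "page.rdf"}

def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so ties keep the client's original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

def choose_representation(accept_header, default="text/html"):
    """Pick the best available media type for the given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return default
    return default

print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

The hard part the comment points at is not this dispatch step but keeping `page.rdf` and `page.html` in agreement with each other.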
1. Re:Markup by QuantumG · 2009-07-26 13:28 · Score: 1
  
  Why is it that every time someone mentions ontologies I can't help but think of pseudoscience... and that typically makes me think of scientology? Oh, that's right, because it's all bullshit. Ontological classification is completely arbitrary, and typically only helpful when it is specifically tailored to a particular application.
  
  --
  How we know is more important than what we know.
2. Re:Markup by Onymous+Coward · 2009-07-26 20:05 · Score: 1
  
  "Foo is bullshit. Foo is completely arbitrary. ... Uh, but Foo is useful when done a certain way..."
  While I get the gist of your comment (assuming you don't actually have a self-contradictingly caricaturized model in your head), it seems to me you could have put it more clearly.
ah yes, semantic web via RDF is the future by Trepidity · 2009-07-26 11:32 · Score: 2, Informative

It was the future in 2001; inspired the masses with its vision of the glorious future in 2003; and of course we are presumably right on the cusp of this golden future today.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
1. Re:ah yes, semantic web via RDF is the future by aharth · 2009-07-26 21:20 · Score: 1
  
  The field has come a long way since 2001 or 2003.
  
  The main obstacle to "this golden future" so far has been an insufficient amount of data published online. Many organisations sit on their data like hens sit on their eggs, and publishing data right requires some effort.
  
  That's slowly changing, especially with more openness and transparency -- voluntarily or forced -- in all kinds of organisations and agencies (data.un.org, data.gov, data.gov.uk... ), more people getting the idea of open data, and the establishing of simplified best practices on how to publish data on the web following the Linked Data paradigm.
  
  It's about time that Yahoo and Google finally start to take note and add open data to their systems (which don't exploit the full power of these technologies but hey you've got to start somewhere).
2. Re:ah yes, semantic web via RDF is the future by Anonymous Coward · 2009-07-27 04:50 · Score: 0
  
  No, I think the whole point is that it's the present instead of the future, now. This markup was pointless because no one would read it. But if Google and Yahoo's robots are actually reading and using this shit, then I'll write it, because in the end, we're all search engine whores.
Cat got my tongue by WiFiBro · 2009-07-26 11:33 · Score: 3, Interesting

I don't know why but their presentation pisses me off beyond reason.
Probably because it's the n-th time somebody is trying to impose some silly standard.
And pretends it's the best invention since you-know-what.
I have in real life a fairly common name, there's at least 10 of me worldwide, I recognized that they deliberately picked a unique name to show how well it works.
Ach we'll see.
1. Re:Cat got my tongue by Anonymous Coward · 2009-07-26 12:03 · Score: 1, Interesting
  
  I don't know why but their presentation pisses me off beyond reason.
  Probably because it's the n-th time somebody is trying to impose some silly standard.
  And pretends it's the best invention since you-know-what.
  I have in real life a fairly common name, there's at least 10 of me worldwide, I recognized that they deliberately picked a unique name to show how well it works.
  Ach we'll see.
  It seems trivial to add a city to go with your name and narrow it down.
2. Re:Cat got my tongue by derGoldstein · 2009-07-26 12:23 · Score: 3, Informative
  
  I managed to try it out while it was posted on the firehose, and the very initial impression was good. Gradually, however, I noticed that it was just dumping data on my lap, and left it up to me to sort it out. It reminded me a bit of Wolfram Alpha, except half of the information was wrong (and if I gave it names, most of the information was wrong).
  Even within the presentation, they point out the flaw of having to sift through the mess and pick out the irrelevant information.
  I don't think it's useless, I mean it does provide you with many links that you'd normally not get on other search engines, at least when you enter something unique as a query. But as far as actually placing relevant information in brackets (location:... history:... personal-information:...), it doesn't do a very good job.
  Also, if something is truly unique, you'll get a better result in Wikipedia anyway (in terms of how it's arranged, at least). And if you want more accurate info dumps, Wolfram Alpha currently does it better.
  
  --
  Entomologically speaking, the spider is not a bug, it's a feature.
3. Re:Cat got my tongue by Hurricane78 · 2009-07-26 12:30 · Score: 1
  
  Have you actually ever looked into the idea, or do you like to just read a summary, and then rant about how silly it is?
  Semantic data structures (ontologies) are most likely the ultimate way to structure data. If you think of the table as an advancement over the list, and the tree as one step further, then the next step, the one that contains them all, is the graph of semantically structured data.
  Tagging stories on /. is a simplified version of it. File systems with soft/hard links are another. And ultimately, I can't think of a better bridge between humans and computers.
  I really hope we will use it for file systems, wikis, documents, databases, and everything on the web and on computers. I personally developed a small lib for an internal "file manager" that used this structure. And I miss the ability to combine it with the rest of the world every single time I use it.
  I don't care who defines a general standard for it. As long as it's standardized and in broad use as fast as possible.
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
4. Re:Cat got my tongue by Jane+Q.+Public · 2009-07-26 12:47 · Score: 1
  
  Sure... once somebody gets it to actually work!
  
  And this is yet another example. I entered in a name that I know to be unique, and know also to have hundreds of listings in Google, for example, and many other sources. Yet Sig.ma came up with exactly nothing, after 5 minutes of grinding away.
  
  FAIL.
5. Re:Cat got my tongue by GigsVT · 2009-07-26 12:54 · Score: 1
  
  Maybe because web devs struggle more than necessary with simple things like maintainability while the W3C tilts at windmills and lets semantic web nerds run the show?
  
  --
  I've had enough abrasive sigs. Kittens are cute and fuzzy.
6. Re:Cat got my tongue by CarpetShark · 2009-07-26 21:56 · Score: 1
  
  I don't know why but their presentation pisses me off beyond reason.
  Because you're unreasonable? ;)
  Personally, I think it's pretty great. There have been lots of attempts at a semantic search engine, but this one looks usable. Unless it does very unpleasant things, it's going to be my default search engine from now on. The semantic web has been a long time coming, but for me, it's finally arrived.
7. Re:Cat got my tongue by WiFiBro · 2009-07-26 22:12 · Score: 1
  
  "Have you actually ever looked into the idea, or do you like to just read a summary, and then rant about how silly it is?"
  I've RTFA and watched the presentation. It is the presentation I have the biggest problem with: too much "we'll change the web experience", and pretending it is working when it is clearly in its infancy once you test it.
8. Re:Cat got my tongue by WiFiBro · 2009-07-26 22:13 · Score: 1
  
  "Because you're unreasonable? ;)"
  ouch! got me....
9. Re:Cat got my tongue by jccq · 2009-07-26 22:19 · Score: 1
  
  You're right, name disambiguation is not covered by this release.
  The truth is that, thanks to the semantic descriptions, it will be more and more possible to do disambiguation in a smarter, more precise way, e.g. using any other property you might put in any of your online presence files: homepages, work, interests, etc.
  it just takes work :-) a disambiguating sigma is expected by december.
  Cheers.
  P.S. We're not imposing any standard, really. Google and Yahoo ARE supporting RDF, RDFa and Microformats, and people ARE putting them on their pages. We only show how you can recombine them.
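  One simple way to picture property-based disambiguation is to compare the (property, value) pairs of two profiles that share a name. The sketch below uses Jaccard overlap with an arbitrary threshold; it is an illustration under those assumptions, not the method Sig.ma uses or plans to use, and the profiles are invented.

```python
def overlap(profile_a, profile_b):
    """Jaccard similarity between the (property, value) pairs of two profiles."""
    a, b = set(profile_a.items()), set(profile_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def same_person(profile_a, profile_b, threshold=0.3):
    """Guess whether two profiles describe the same person (threshold is arbitrary)."""
    return overlap(profile_a, profile_b) >= threshold

# Hypothetical profiles gathered from three different pages.
p1 = {"name": "J. Smith", "homepage": "http://example.org/~js", "workplace": "Example Corp"}
p2 = {"name": "J. Smith", "homepage": "http://example.org/~js", "interest": "cycling"}
p3 = {"name": "J. Smith", "workplace": "Other Ltd", "interest": "chess"}

print(overlap(p1, p2), same_person(p1, p2))
print(overlap(p1, p3), same_person(p1, p3))
```

  Here p1 and p2 share a homepage as well as a name, so they score higher than p1 and p3, which share only the name.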
10. Re:Cat got my tongue by ashtophoenix · 2009-07-27 03:10 · Score: 1
  
  I tried "Barack Obama" and my own name. Neither showed any results. Yes, I understand it's because there are probably no RDF/RDFa formatted pages with these names yet. But as of now I'm not sure what to do with Sig.ma.
  
  --
  Life is about being a Phoenix!
On the other hand... by xactuary · 2009-07-26 11:35 · Score: 0

Now that you've been slashdotted, I'm wondering how would it look if you could get all these bits and compose them automatically to form your home page?

--
Say hello to my little sig.
Goodbye Google by Anonymous Coward · 2009-07-26 11:42 · Score: 0

It was fun while it lasted.
1. Re:Goodbye Google by CarpetShark · 2009-07-26 22:02 · Score: 1
  
  This was exactly my thought watching the video. I'd love to make this my default search engine. However, it doesn't seem to be QUITE there yet. For example, if I search for my name, it throws in people with the same forename, but different surnames. The semantic stuff they've done beyond normal search is GREAT, but they seem to have slacked off a bit on making the plain old search stuff work well enough. Shame, as it would have easily been the successor to google for me otherwise.
  Still, I'm definitely going to play with this more, and see if I can get enough mileage out of it. It's not like google doesn't mix in things you didn't ask for I suppose, and part of the fun of the web is finding things you never expected.
My name is Chang Lee, that's "somewhat" common. by Phizzle · 2009-07-26 12:56 · Score: 1

My name brought the sig.ma server to its knees.

--
I will not be pushed, filed, stamped, indexed, briefed, debriefed or numbered. My life is my own.
Searching by popularity is good, but... by Waccoon · 2009-07-26 13:38 · Score: 1

Some people have already suggested that common names will cause problems with this system. The next big thing should be searching by context. I hate searching for "supernova" only to get a long list of songs by some band. The keyword "space" or "star" helps, but that usually results in other false hits, too. Don't even get me started on acronyms, or things that don't have anything to do with computer technology.
Would there be any way for a search engine to examine a whole bunch of keywords and content in a page, and learn the difference between the context of music and astronomy? That would be a big help.
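As a rough illustration of the idea, a page's context can be guessed by counting how many of its words fall into per-topic vocabularies. This is a deliberately naive sketch: the word lists are hand-picked assumptions, and a real engine would learn them from data rather than hard-code them.

```python
import re
from collections import Counter

# Hand-picked, hypothetical vocabularies for two contexts.
TOPIC_WORDS = {
    "astronomy": {"star", "galaxy", "explosion", "telescope", "luminosity"},
    "music": {"band", "album", "song", "lyrics", "tour"},
}

def guess_context(text):
    """Return the topic whose vocabulary appears most often, or None if none appears."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(words[word] for word in vocab)
        for topic, vocab in TOPIC_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_context("Supernova: the new album and tour dates from the band"))
print(guess_context("A supernova is the explosion of a massive star"))
```

A search for "supernova" could then filter or group hits by the guessed context of each page instead of relying on an extra keyword.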
1. Re:Searching by popularity is good, but... by Anonymous Coward · 2009-07-26 20:32 · Score: 0
  
  yeah...by typing either "music" or "astronomy". How hard is that?
I made a semantic engine once by Anonymous Coward · 2009-07-26 13:55 · Score: 0

I worked on a semantic engine out of Redmond, WA (not for MS) last year, and I'll say this: for the base components, it's only marginally more difficult to put together than a keyword-based search engine. However, it is exceptionally time-consuming to make usable (think of a 3D engine: every remaining 10% is just as much work as everything before it), and there's one fatal flaw that killed our project and will likely continue to kill any level of semantic search: with every page indexed, computation time doubles when measured per page.
This is unavoidable, because to get good results you must cross-index every last thing. You can do some neat things on grouped sets of indices by keyword to cross-reference in a more logical manner, but just because of the way people write, there's no way you're going to get 100% of the data glitch-free: you still have to interpret things well enough to pick out the keywords before you can even cross-reference. The most advancement you get toward one of those 50% jumps is by applying feedback from a known-good data source that is at least 90% accurate, which you can then key your indices off of (not in terms of keywords, but semantic footprints).
There are so many better ways to search if people just sort their data, at least until we have some sort of AI engine that can actually interpret it (yes, I am sure when it happens it will be quite complex and processor-intensive, though likely FAR FAR FAR less so than semantic search on the web in the sense the phrase "semantic search" is defined, and you would still get the same result).
Sig.ma ok interface, fails on practice by physburn · 2009-07-26 16:45 · Score: 1

It dies on common names and websites, and seems to find the wrong names most of the time. Its main info source is DBpedia, which is an ad hoc system for turning Wikipedia entries into database items (since Wikipedia isn't very semantic, DBpedia has to do some guessing). Maybe Sig.ma will get usable someday; it isn't now.
---
AI Feed @ Feed Distiller
1. Re:Sig.ma ok interface, fails on practice by jccq · 2009-07-26 22:24 · Score: 1
  
  Thanks, I tend to agree with you myself (the poster).
  This is still a demonstrator. The idea is to show that this is possible and that putting markup on your pages is useful, because eventually there will be a Sig.ma 2, 3 (or whoever else), the S/N ratio will increase, and it will be possible to reuse it with one simple HTTP call to make any SaaS software (or any software, really) do cool things automatically.
  Giovanni
Gibberish by techno-vampire · 2009-07-26 18:23 · Score: 1

I gave it my first and last name, and it came up with nothing. Then, I gave it my complete name. It took my middle name (David) added "Baltimore" as a random last name and gave me facts about somebody named David Baltimore. Absolute, utter, meaningless gibberish. I am not impressed.

--
Good, inexpensive web hosting
Combine together... by Anonymous Coward · 2009-07-26 19:39 · Score: 0

Combined together is better than combined apart I suppose.
Bingo! by Wee · 2009-07-26 20:53 · Score: 1

I have a slightly outdated buzzword bingo card, but I think I have a winner even still. So, hold your cards.

-B

--
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
worst. website. ever. by rossjp · 2009-07-27 11:08 · Score: 1

The frackin' thing doesn't even work. And when you hit the 'Contact' button to yell at the creators of the site for sucking, all you get is an Apache error. Smoooooooooooth.