Super-Fast RDF Search Engine Developed
The Register is reporting that Irish researchers have developed a new high-speed RDF search engine capable of answering search queries over more than seven billion RDF statements in mere fractions of a second. "'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI. 'These results enable us to create web search engines that really deliver answers instead of links. The technology also allows us to combine information from the web, for example the engine can list all partnerships of a company even if there is no single web page that lists all of them.'"
Here's the link to the official NUIG: DERI (omgwtfbbq) website in Ireland:
DERI
Except for the minor little problem of getting everyone to agree on the ontologies. Being able to search quickly is important, but until somebody comes up with the Dewey Decimal System for all knowledge, it won't mean much.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Is "Reality Distortion Field."
Sounds about right, in this case.
for a Radio Direction Finder?
"National Security is the chief cause of national insecurity." - Celine's First Law
I need both: answers *and* links! Many times when I search the web, I don't know for sure what I am searching for, let alone being able to ask a specific question...
May Peace Prevail On Earth
NO.
The first answer will be 42.
Now all we need to do is get everyone to start using RDF.... wait.. you don't even know what that is??
Having solved the problem of search, and provided a breakthrough product that gives consciousness to what was previously a mere series of tubes, the National University of Ireland has now announced that it is going to solve world hunger next, maybe in three months. Other projects in the pipeline include a cure for cancer and solving the full Navier-Stokes equations.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Yet another
Cool. It'll end war and bring universal freedom to all people.
Yes, creating a consistent ontology is a challenge. But the bigger challenge is the lack of incentive for ontology truthfulness. If this type of search becomes popular, ontology spam and OSEO (Ontology Search Engine Optimization) will become a booming industry.
Two wrongs don't make a right, but three lefts do.
I didn't realize Steve Jobs' Reality Distortion Field was able to be harnessed and bottled in a search engine, or any software for that matter. His abilities are boundless!
Why would anyone want to search Steve Jobs Reality Distortion Field?
"'The importance of this breakthrough cannot be overestimated,' said Professor Stefan Decker, director of DERI."
This is without a doubt the greatest invention in the history of time!
There, I just proved the professor wrong. Muahaha.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
- "The importance of this breakthrough cannot be overestimated"
The importance of any event can be overestimated and quite often is overestimated. It is called hype.
When speaking of XML, XHTML and the semantic Web, the word "overestimated" fits just fine.
If this were not the case, HTML would long since have been dead and the whole Web would be based on pure XML with meaningful tags.
-- Do not read me, I am a stupid tag
Why assume everyone knows your acronyms? To me RDF means "Reality Distortion Field". Zeesh, 7 billion triples or whatever.
There are 10 types of people in this world, those who can count in binary and those who can't.
Since everyone's being pedantic... I notice it takes more than one fraction of a second then.
Ok, I might say something very stupid here but even the best search engine still isn't a WEB search engine but a DATABASE search engine (searching in copies/excerpts from websites previously (i.e. recently) acquired).
My question: has someone ever proposed (i.e. written down) ideas/plans/designs for a live-searchable web (although something like that would seem impossible to me)? It might be a very interesting read however.
What kind of queries are they running? There are several different RDF query languages (think of SeRQL, RDQL, N3, SPARQL, etcetera) and some of them support quite complex queries. Quickly finding the answers to a simple query is just a matter of an indexed lookup and not very special. But, like in SQL, much more complex expressions can be generated that require complex index operations on the query execution level. Having implemented an RDF database that supports SPARQL queries an order of magnitude faster than the software the W3C uses for their experiments (which, admittedly, doesn't have performance as a prime requirement), I know that it's possible to do simple things fast, but the interesting part is handling RDF queries that don't easily map to efficient database operations.
Which brings me to the most important point: where is their detailed report? Can I get the software somewhere and perform my own tests? The article is too vague to draw any conclusions about what their RDF database does, and how good it is. I'd love to read up on it, but I can't seem to find the information.
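To make the "simple query is just an indexed lookup" point concrete, here is a minimal sketch in Python. It is not how SWSE or any particular RDF store works; the triples, the index layout and the function names are all made up for illustration. A single triple pattern is one dictionary hit, and a two-pattern query sharing a variable becomes a set intersection. Queries with paths, optional parts or filters are the ones that stop mapping this neatly.

```python
from collections import defaultdict

# Hypothetical toy data; none of these triples come from the article.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
]


def build_po_index(triples):
    """Index subjects by (predicate, object) so one pattern is one lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def simple_lookup(index, predicate, obj):
    """Answer the pattern (?s, predicate, obj) with a single dictionary hit."""
    return index.get((predicate, obj), set())


def two_pattern_join(index, first, second):
    """Answer (?s, p1, o1) AND (?s, p2, o2) by intersecting two lookups."""
    return simple_lookup(index, *first) & simple_lookup(index, *second)


PO_INDEX = build_po_index(TRIPLES)
who_knows_carol = simple_lookup(PO_INDEX, "knows", "carol")
knows_carol_and_works_at_acme = two_pattern_join(
    PO_INDEX, ("knows", "carol"), ("worksFor", "acme")
)
```

The join here stays cheap only because both patterns hit the same index. A pattern with an unbound predicate or object would need a different index or a scan.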
Colonel Sandurz: Prepare ship for light speed. Dark Helmet: No, no, no. Light speed is too slow. Colonel Sandurz: Light speed is too slow? Dark Helmet: Yes. We're gonna have to go right to... SUPER speed. [everybody gasps] Colonel Sandurz: SUPER speed? Sir, we've never gone that fast before. I don't know if this ship can take it. Dark Helmet: What's the matter Colonel Sandurz? Chicken? Colonel Sandurz: [Whimpering] Prepare ship! [Calms down] Colonel Sandurz: Prepare ship, for Ludicrous speed. Fasten all seat belts. [everybody fastens in their seat belts and locks all of the doors] Colonel Sandurz: Seal all entrances and exits. Lock all stores in the mall. Cancel the 3-ring circus. Secure all animals in the zoo... Dark Helmet: [Takes the intercom from Sandurz] Gimme that, you petty excuse for an officer! [speaks into the intercom as Sandurz puts on his seat belt] Dark Helmet: Now hear this, Ludicrous speed... Colonel Sandurz: [Interrupts] Sir, you better buckle up. Dark Helmet: [to Sandurz] Ah, buckle this. [Into the intercom] Dark Helmet: SUPER speed, go!
If those RDF statements are tiny and basically pointless, then searching 1 billion entries isn't all that hard. Especially if they are properly indexed. Most RDF engines suck ass at the moment. If they implement an efficient bitmap index for the RDF statements, the query times for complex n-dimensional queries should be basically constant. W3C's specs for semantic web suck ass and their approach is totally impractical.
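For anyone wondering what a bitmap index over RDF statements could look like, here is a rough sketch in Python. It uses plain integers as bitmaps. The statements, the `build_bitmaps` and `match` names, and the layout are all assumptions for illustration, not taken from any real engine. Each (position, value) pair gets a bitmap of the rows it appears in, and a multi-constraint query is a bitwise AND of those bitmaps.

```python
# Hypothetical statements; row i of the table gets bit i in each bitmap.
STATEMENTS = [
    ("doc1", "author", "ann"),
    ("doc1", "topic", "rdf"),
    ("doc2", "author", "ann"),
    ("doc2", "topic", "sql"),
    ("doc3", "author", "ben"),
]


def build_bitmaps(statements):
    """Map each (position, value) pair to an int used as a bitmap of row numbers."""
    bitmaps = {}
    for row, statement in enumerate(statements):
        for position, value in enumerate(statement):
            key = (position, value)
            bitmaps[key] = bitmaps.get(key, 0) | (1 << row)
    return bitmaps


def match(bitmaps, statements, subject=None, predicate=None, obj=None):
    """Return the statements matching every given constraint via bitwise AND."""
    result = (1 << len(statements)) - 1  # start with every row selected
    for position, value in enumerate((subject, predicate, obj)):
        if value is not None:
            result &= bitmaps.get((position, value), 0)
    return [s for row, s in enumerate(statements) if (result >> row) & 1]


BITMAPS = build_bitmaps(STATEMENTS)
ann_authored = match(BITMAPS, STATEMENTS, predicate="author", obj="ann")
```

Adding a constraint costs one more AND rather than another index traversal, which is the appeal. Each AND still touches a bitmap as long as the table, so this sketch does not by itself show the "basically constant" query time claimed above.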
Hello, I am one of the main developers of SWSE. True, the press release is vague, but there is only so much you can say in a press release aimed at the general public.
We have a Technical Report available at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf that should answer most of the technical questions.
From the abstract:
"We present the architecture of an end-to-end search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web.
In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers.
We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements."
First, giving the amount of time and the number of items searched means nothing. Are they doing it on a BlueGene or an Apple II?
Second, the problem with "the semantic web" if you're relying on people providing the metadata themselves, is the reliability (trustworthiness?) of the person creating the metadata. There's a reason the meta name="keywords" tags aren't a significant factor if at all in any of the major search engines' ranking systems.
500GB of disk, 5TB of transfer, $5.95/mo
Don't put slashdot to trouble with such messages
Of course a search based on meta data is going to be faster and more accurate, but only when the meta data is correct. We've had this since the beginning of the interweb; people would load up their pages with bogus meta data just to generate search traffic. Because of this dishonesty, search engines have had to resort to other methods of evaluating and indexing pages (for example, based on actual content).
I don't see any difference between this new RDF and that old stuff.
So now we have a search engine capable of making a godzillion searches in a data domain that does not exist yet. That's all great and dandy, and we do indeed need new models and architectures for search engines once (if) the web goes all semantic. However, when (if) the semantic web ever becomes a reality, this search engine will long be retired. So, this result is great from a research point of view, but don't expect it to leave the lab.
I am one of the developers on the project (along with user aharth), so feel free to ask any specific questions you may have here. The article is quite vague and so I refer you to a technical report at http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf/.
but why would I want to search several million statements from the Robotech Defense Force? I mean, sure I'm an Anime nerd, but there are limits...
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
Top o'the mornin' laddy! We've got this crackpot idea that doesn't work in real-world scenarios, but you see, we're out of Guinness and me welfare checks be runnin' dry. How's about a nice research grant to refill me beer fridge ?
Tally-ho!
-Billco, Fnarg.com
... for obvious reasons.
One of the misconceptions about the Semantic Web is that it's only about metadata, when in fact it's about a Web of Data, e.g., data currently locked in databases, blog engines or social software sites. (related: SemWeb FAQ entry on "Does the Semantic Web require me to manually markup all the existing web-pages ... ?")
A very, very simple example - if you enable RDF data creation in a WordPress weblog (via a WordPress SIOC plugin), all this information is generated automatically, from the data already inside a database. What you get is every blog post, etc. in a machine-readable form (RDF), ready for query and reuse.
Of course, that is very "light" semantics - expressing what the blog engine knows. As for data / structured content created by people directly - there's always a risk of someone writing lies. Then there's a need for the concept of trust (can we trust the source?) and some ranking mechanism.
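To illustrate "generated automatically from the data already inside a database", here is a small Python sketch that turns one blog-post row into N-Triples. It is not the WordPress SIOC plugin's code or its exact output. The row, its field names and the choice of properties (`sioc:Post`, `sioc:has_creator`, `sioc:content`, `dcterms:title`) are assumptions picked for the example.

```python
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def post_to_ntriples(post):
    """Describe one blog-post record as four N-Triples statements."""
    post_uri = f"<{post['url']}>"
    lines = [
        f"{post_uri} <{RDF_TYPE}> <{SIOC}Post> .",
        f"{post_uri} <{DCTERMS}title> {literal(post['title'])} .",
        f"{post_uri} <{SIOC}has_creator> <{post['author_url']}> .",
        f"{post_uri} <{SIOC}content> {literal(post['body'])} .",
    ]
    return "\n".join(lines)


# Hypothetical row, shaped like something a blog engine might already store.
EXAMPLE_POST = {
    "url": "http://example.org/blog/1",
    "title": "Hello RDF",
    "author_url": "http://example.org/author/jane",
    "body": 'My first "semantic" post.',
}

NTRIPLES = post_to_ntriples(EXAMPLE_POST)
```

Nobody typed any metadata by hand here: every statement is derived from fields the blog engine already has, which is the "light" semantics described above.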
Give me a break! Scientists produce a technology that has the potential to change the face of the web and all they get is the usual, automated Slashdot moaning.
Listen, we're not just talking about a technology for searching cookie recipes. We're talking about a technology that will match your favorite ingredients with a recipe, find pastry chefs in your area who know the recipe, weed out the ones that don't meet your quality standards, and, on your command, order the ingredients at the lowest available price, have them sent to the chef of your choice along with the payment for your order.
So, you just take a look around on Web 2.0 while eating your dry, factory produced, hard-as-a-rock cookies. I'll be enjoying my customized inexpensive home-baked quality pastries and getting real answers in a fraction of a second on the Semantic Web in the meantime.
Hey, I created RDF (Reality Distortion Field)
Cease or Desist!
Signed,
Steve Jobs
"RDF is just a way to express knowledge. In answer to "any difference between this new RDF and ..." you may take a look at the W3C Semantic Web FAQ (published very recently)."
Mozilla uses RDF under the hood.
Fuzzy Sets
Web 3.0 is near... Seriously - what does this development give us?
2 all: remove the ending slash '/' from the URL above, it will work then.
Correct link: http://www.deri.ie/fileadmin/documents/DERI-TR-2007-04-20.pdf
I just read the basics of RDF and I can see that this could be a really really bad idea. If RDF is intended as an internal data representation for a search engine company to use then this is great. The search engine company or your own company's search engine staff can police and audit your RDF data. However, if I'm reading this right RDF is *supposed* to be populated by *volunteered* data. As such you're going to suffer not just the Wikipedia effect but all the problems seen in MetaData from an internet generation ago.
You'll see RDF associations linking the president to a crass picture of a donkey or a goat of some kind. You'll see companies set up to deliberately poison RDF data with false links designed to drive traffic to a site... you'll see sock-puppets and all kinds of other attacks.
This whole effort reminds me of the "this is spam" bit that was proposed to stop spam. You can't expect spammers to say to themselves, "wait, I better flip the this-is-spam bit to true before I send this." You also can't expect people to not abuse the RDF system in similar ways.
Don't expect that if you RDF search for Stephen King that everything that comes up was actually posted by him. Imagine the pages that would get attributed to the president or Mr. T as a prank... the information would only be useful if you could verify the document as legitimate first.
The "is part of" feature is the most likely target of abuse, I think. I could say that everything I wrote is part of the New York Times or part of some official document that gets searched for often. The result would be erroneous hits in RDF search and artificial authority for my crackpot theories.
Jobs' abilities are bounded by the little pinky of Chuck Norris.
You can't handle the truth.
Hard to know if this article is worth reading or not when the summary doesn't even tell me WHAT RDF IS!! Criminy.
Comment of the year
Yeah, I was wondering why they were so pleased about searching Steve Jobs' keynote addresses (and where they found 7 billion of them too).
Why would anyone engrave "Elbereth"?
Just what the man needs, a good search engine for his mesmerizing skills. I'm leaving California now.
put this request into google
site:www.deri.ie technical report 2007 4 20
Liberty freedom are no1, not dicks in suits.
Hello Andreas,
Does Yars2 provide any interoperability with Jena (or Sesame)?
For example, I would like to create a Jena Model that has Yars2 as its underlying storage, so that on top of this model I can use everything from the ontology model API to the reasoners designed for Jena.
Is Yars2 available for download somewhere, or is it just the same Yars svn repository?
I agree. In my estimation, this could well foretell the cure to AIDS, cancer, world hunger, war, and genital warts.
No comment.
From the paper: The use-case scenario is to find mutual acquaintances between two people. More specifically, the query is as follows: give me a list of people known to both Tim Berners-Lee and Dave Beckett. If that's really all there is to it then there's nothing new here other than some acronyms. It may be interesting research but apart from the RDF syntax there is nothing new here worthy of the PR hype.
Fast queries over 7 billion triples may sound like a breakthrough but systems like http://monetdb.cwi.nl/projects/monetdb/MonetDB/Version4/Documentation/mel/index.html have been able to do that kind of thing blindingly fast for many years. Once you store everything as a BAT (binary association table), 20-way joins that would choke an ordinary RDBMS become easy.
The weakness of MonetDB-like systems is generally that they don't do well in the face of OLTP style transactional update patterns. They are optimized for loading data once and reading many times, not dealing with a steady stream of queries and updates, and typically you have no choice but to resort to http://infolab.stanford.edu/~backrub/google.html style parallel index construction, which is most of what the technical report goes into.
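For what it's worth, the "people known to both" query from the paper can be sketched in a few lines of Python over a two-column table, in the spirit of a binary association table. This is not MonetDB's actual storage format or query language. The acquaintance pairs are invented, and only the two names in the query come from the report.

```python
from collections import defaultdict

# Hypothetical (person, acquaintance) pairs; only the two queried names are from the report.
KNOWS = [
    ("Tim Berners-Lee", "Alice"),
    ("Tim Berners-Lee", "Bob"),
    ("Dave Beckett", "Bob"),
    ("Dave Beckett", "Carol"),
    ("Alice", "Bob"),
]


def build_column(pairs):
    """Group the tail values of a two-column table by their head value."""
    column = defaultdict(set)
    for head, tail in pairs:
        column[head].add(tail)
    return column


def known_to_both(column, first, second):
    """Self-join on the acquaintance column: people both heads point to."""
    return sorted(column.get(first, set()) & column.get(second, set()))


KNOWS_COLUMN = build_column(KNOWS)
MUTUAL = known_to_both(KNOWS_COLUMN, "Tim Berners-Lee", "Dave Beckett")
```

On one machine with the table loaded once, this is a pair of lookups and an intersection. The update and distribution problems mentioned above are what the sketch leaves out.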