Slashdot Mirror


WebQL Turns the Web Into A Giant Database

An anonymous reader says "This article was posted on ZDNet by Bill Machrone on a new type of query language for aggregating information from the Web." Somewhat light on the details, but definitely something to think about.

7 of 84 comments

  1. Re:Ingenius! by Shoeboy · · Score: 3

    ROTFLMAO!!!!!!!
    Good god! An "Al Gore invented the internet" joke is combined with a "stupid patent idea" joke! The originality of the average slashbot never ceases to amaze me! You should send some of your jokes to Illiad so he can put them in User Friendly!
    --Shoeboy

  2. Yeah, okay. by 1010011010 · · Score: 4

    How will this be different from Google's back-end query interface? I ask, because I can't imagine someone making a "screen-scraping" search engine that returns bits of data and not just a link. They will probably get sued by the owners of the purloined content. Plus, parsing HTML to extract one little field of data is tricky, and highly dependent on the layout of the page. I've written a number of things to do just that, from Amazon, IMDB, Borders, finance.Yahoo.com, etc., for my own purposes. I wrote them in both C and Perl. It's a job keeping the filters updated to accommodate the changes in page layout style, regardless of language. Good luck to them and all, but until we have an XML + XSL web, with standard DTDs for the XML, forget it.
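
To illustrate the layout-dependence point, here is a minimal sketch in Python (not the C or Perl mentioned above). Both page snippets and the pattern are hypothetical, made up for illustration; they are not taken from any of the sites named.

```python
import re

# Hypothetical page snippets: the same price before and after a redesign.
OLD_LAYOUT = '<td class="price"><b>$12.99</b></td>'
NEW_LAYOUT = '<span id="our-price">USD 12.99</span>'

# A filter written against the old layout only (hypothetical pattern).
OLD_PRICE_RE = re.compile(r'<td class="price"><b>\$(\d+\.\d{2})</b></td>')

def extract_price(html):
    """Return the price as a string, or None if the layout no longer matches."""
    match = OLD_PRICE_RE.search(html)
    return match.group(1) if match else None

old_result = extract_price(OLD_LAYOUT)  # the filter works on the old page
new_result = extract_price(NEW_LAYOUT)  # the redesign breaks it without any error
print(old_result, new_result)
```

The failure is silent: the filter simply returns nothing after the redesign, which is why such filters need constant upkeep.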


    ________________________________________

    --
    Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
  3. just a pretty interface by _|()|\| · · Score: 4
    I grab the html and apply various forms of on the fly text processing

    I downloaded the WebQL Business Edition manual. Here's an abbreviated version of the first example query:

    select
    text("(\(206\)\s+\d{3}-\d{4})","","","T")
    from
    http://foo/bar.html
    where
    approach=sequence("1","10","1","XX")
    The select clause accepts a variety of functions, of which text() seems to be the most useful. You can see that the first argument is a regex designed to match phone numbers. The from clause is a URL. The where clause primarily takes the approach "descriptor," which can crawl or guess new URLs.

    So basically, it doesn't do anything a Perl script can't. It just presents a simpler interface.
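
For comparison, a rough sketch of the same extraction as a script, in Python here instead of Perl. It reuses the regex from the query; the `fetch` and `extract_phones` helpers and the sample page are hypothetical, and the `approach=sequence(...)` part is left out because the manual excerpt above does not say enough about what it does.

```python
import re
from urllib.request import urlopen

# Same regex as the first argument of text() in the query.
PHONE_RE = re.compile(r"(\(206\)\s+\d{3}-\d{4})")

def fetch(url):
    """Download a page; decoding as UTF-8 with replacement is an assumption."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def extract_phones(html):
    """Return every substring of the page that matches the phone regex."""
    return PHONE_RE.findall(html)

# Hypothetical sample page, so the sketch runs without network access.
SAMPLE = (
    "<p>Sales: (206) 555-0100</p>"
    "<p>Support: (206)  555-0199</p>"
    "<p>Other: (425) 555-0123</p>"
)
phones = extract_phones(SAMPLE)
print(phones)
```

Against a live page the call would be `extract_phones(fetch(url))`; the URL in the query above is a placeholder, so it is not fetched here.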

  4. Freely-Available Web Query Languages by Ellen+Spertus · · Score: 5

    For my thesis, I created a Web query system called ParaSite. The best introduction is the paper Squeal: A Structured Query Language for the Web, which I presented at the World-Wide Web Conference. Anybody is welcome to use my code, algorithms, or ideas.

    See also WebSQL and W3QL, which also come from academia.

  5. I don't know about this by Alien54 · · Score: 3
    with all of the varying site structures, never mind security issues, pay sites, and things like Microsoft constantly rebuilding/breaking its website, it is hard to see how it would give better results than any meta search across the common search engine sites with prebuilt indexing, etc.

    especially with the web running at well over a billion pages by now. Just think of the time to query a billion pages all around the planet, never mind doing it from a small business connection, say a DSL line (forget modem!)
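
A back-of-the-envelope version of that estimate, as a sketch. Only the billion-page figure comes from the comment; the average page size and the line speed are assumptions picked for illustration, and latency and server limits are ignored.

```python
PAGES = 1_000_000_000          # "well over a billion pages", per the comment
AVG_PAGE_BYTES = 10_000        # assumption: about 10 KB of HTML per page
LINK_BITS_PER_SEC = 1_000_000  # assumption: a 1 Mbit/s DSL line

def transfer_days(pages, page_bytes, bits_per_sec):
    """Days needed just to download every page once over the given link."""
    seconds = pages * page_bytes * 8 / bits_per_sec
    return seconds / 86_400

days = transfer_days(PAGES, AVG_PAGE_BYTES, LINK_BITS_PER_SEC)
print(f"{days:.0f} days, about {days / 365:.1f} years")
```

Under these assumptions a single pass takes on the order of years, which is the case for relying on prebuilt indexes.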

    but then I don't get the big bucks for this either....

    --
    "It is a greater offense to steal men's labor, than their clothes"
  6. goof? by fluxrad · · Score: 5

    >drop table internet;
    OK, 135454265363565609860398636678346496
    rows affected.

    "oh fuck"


    FluX
    After 16 years, MTV has finally completed its deevolution into the shiny things network

    --
    "It is seldom that liberty of any kind is lost all at once." -David Hume
  7. The Semantic Web by hemul · · Score: 3
    WebQL looks like an interesting hack, but have a look at the semantic web project for people trying to do it properly.

    The Semantic Web Page is a good starting point.
    TBL's personal notes are another one. Probably the best one, actually.

    "The Semantic Web" was a term coined by Tim Berners-Lee (we all know who that is, don't we?) to describe a www-like global knowledge base, which when combined with some simple logic forms a really interesting KR system. His thesis is that early hypertext systems died of too much structure limiting scalability, and current KR systems (like CYC) have largely failed for similar reasons. The Semantic Web is an attempt to do KR in a web-like way.

    This really could be the next major leap in the evolution of the web. Do yourself a favour and check it out. And it's not based on hacks for screen-scraping HTML, it's based on real KR infrastructure.
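
As a toy illustration of "knowledge base plus some simple logic", here is a sketch that stores facts as subject-predicate-object triples and applies one inference rule. The facts, predicate names, and rule are all hypothetical; this is not how the Semantic Web project itself represents or processes data.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("WebQL", "is_a", "QueryLanguage"),
    ("QueryLanguage", "subclass_of", "Software"),
    ("Software", "subclass_of", "Artifact"),
}

def infer_types(facts):
    """Apply one rule until nothing new appears:
    X is_a C and C subclass_of D implies X is_a D."""
    known = set(facts)
    while True:
        derived = {
            (x, "is_a", d)
            for (x, p1, c) in known if p1 == "is_a"
            for (c2, p2, d) in known if p2 == "subclass_of" and c2 == c
        }
        if derived <= known:  # fixed point reached
            return known
        known |= derived

closure = infer_types(FACTS)
print(("WebQL", "is_a", "Artifact") in closure)
```

The derived fact was never stated directly; it follows from the stored triples and the rule, which is the contrast with scraping fields out of HTML.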