WebQL Turns the Web Into A Giant Database
An anonymous reader says "This article was posted on ZDNet by Bill Machrone on a new type of query language for aggregating information from the Web." Somewhat light on the details, but definitely something to think about.
How will this be different from Google's back-end query interface? I ask, because I can't imagine someone making a "screen-scraping" search engine that returns bits of data and not just a link. They will probably get sued by the owners of the purloined content. Plus, parsing HTML to extract one little field of data is tricky, and highly dependent on the layout of the page. I've written a number of things to do just that, from Amazon, IMDB, Borders, finance.Yahoo.com, etc., for my own purposes. I wrote them in both C and Perl. It's a job keeping the filters updated to accommodate the changes in page layout style, regardless of language. Good luck to them and all, but until we have an XML + XSL web, with standard DTDs for the XML, forget it.
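To illustrate the layout dependence described above, here is a minimal Python sketch, not anyone's actual scraper. The `FieldScraper` class, the `scrape_price` helper, the `class="price"` marker, and both sample pages are hypothetical, made up for the example. The same field comes back from the old layout and silently disappears after a redesign.

```python
from html.parser import HTMLParser


class FieldScraper(HTMLParser):
    """Collects the text of the first <span class="price"> element.

    The tag name and class are assumptions about one particular page
    layout, which is exactly what makes this kind of filter fragile.
    """

    def __init__(self):
        super().__init__()
        self._inside = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if self.value is None and tag == "span" and ("class", "price") in attrs:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.value = data.strip()
            self._inside = False


def scrape_price(html):
    """Return the scraped field, or None if the expected markup is missing."""
    scraper = FieldScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.value


# Hypothetical page before and after a redesign
old_layout = '<div><span class="price">$12.99</span></div>'
new_layout = '<div><b id="cost">$12.99</b></div>'

print(scrape_price(old_layout))
print(scrape_price(new_layout))
```

The data is still on the redesigned page, but the filter no longer finds it, so every layout change means another round of maintenance.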
I downloaded the WebQL Business Edition manual. Here's an abbreviated version of the first example query:
The select clause accepts a variety of functions, of which text() seems to be the most useful. You can see that the first argument is a regex designed to match phone numbers. The from clause is a URL. The where clause primarily takes the approach "descriptor," which can crawl or guess new URLs.
So basically, it doesn't do anything a Perl script can't. It just presents a simpler interface.
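For comparison, here is a rough sketch of the script equivalent, written in Python rather than Perl. It is not WebQL syntax and does not reproduce the manual's query. The `PHONE_RE` pattern, the helper names, and the sample markup are all assumptions made for illustration, and the pattern only covers a few North American formats.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for numbers like 555-123-4567 or (555) 123-4567.
# This is NOT the regex from the WebQL manual.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


class _TextExtractor(HTMLParser):
    """Gathers the text nodes of a page, roughly what a text() function might see."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    """Return the page's text content with the markup stripped."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_phone_numbers(html):
    """Return every substring of the page text that matches PHONE_RE."""
    return PHONE_RE.findall(page_text(html))


# Made-up markup standing in for a fetched page
sample = "<p>Call <b>555-123-4567</b> or (555) 987-6543.</p>"
print(extract_phone_numbers(sample))
```

To run it against a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode()`. Crawling or guessing further URLs, which the where clause apparently handles, is left out of this sketch.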
For my thesis, I created a Web query system called ParaSite. The best introduction is the paper Squeal: A Structured Query Language for the Web, which I presented at the World-Wide Web Conference. Anybody is welcome to use my code, algorithms, or ideas.
See also WebSQL and W3QL, which also come from academia.
>drop table internet;
OK, 135454265363565609860398636678346496
rows affected.
"oh fuck"
FluX