Learning About Full-text Search
An anonymous reader writes "Tim Bray, who's known for XML and has been /.'ed once or twice for that kind of stuff, actually seems to be a search geek and has been writing this endless series of essays on search technology since summer. He says he's finished now - it's like a textbook on searching."
You mean two or three times now.
Trolling is a art,
He writes about searching technology, but you can't easily search through his writings.
Finished an endless series?
i cringe at the bandwidth demands a slashdotting can bring with it. here's google's cache of the page.
Though, I'm unaware of how to apply this to my life. I think I'll take it and put it in the "Unaware of How to Apply This to My Life" Stack with The Simpsons and The Internet.
clifgriffin > blog
Why the fascination with XML? Well, I certainly know the reason why *Tim* is fascinated, but I want to know why he's seriously contemplating reinventing the wheel - namely using XML as data storage when we already have gobs and gobs of systems (think SQL DBMS products) that do this in a much faster, more compact, safer, better way.
Also, most SQL DBMS (Oracle, Sybase ASE, MS SQL, etc.) come with full-text indexing built in, so all it would take would be to chop up HTML pages and stick them in the DBMS, then you can perform rich-text queries on them with minimal effort.
Thanks,
--
Matt
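For anyone wondering what "chop up HTML pages and index them" amounts to, here is a minimal sketch in plain Python. It is not how Oracle, Sybase ASE or MS SQL implement their full-text indexes; it only illustrates the basic idea of an inverted index (word -> pages containing it). The page names, the sample HTML and the helper names (TextExtractor, build_index, search) are made up for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip markup and return the page's visible text as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in tokenize(html_to_text(html)):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample pages, purely for illustration.
pages = {
    "a.html": "<html><body><h1>Full-text search</h1>"
              "<p>Indexing XML documents.</p></body></html>",
    "b.html": "<html><body><p>SQL databases ship with "
              "full-text indexing.</p></body></html>",
}
index = build_index(pages)
print(sorted(search(index, "full text")))
print(sorted(search(index, "xml")))
```

A real DBMS adds stemming, stop words, ranking and on-disk storage on top of this, which is presumably the "minimal effort" being bought by using one.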
The essay series converges to a textbook as time tends to infinity. The proof is left as an exercise for the reader.
getSexySig();
Search technology. Hmmm. Wasn't that outsourced to India last month? Or was that last year? I just can't keep up with IT today.
"Someone" ought to start building ... I wonder why this someone isn't Tim Bray. He is one of the most well-known names in XML, and has experience under his belt with another search engine project, Antarctica ...
I just mean it in the sense that if he is having trouble getting his own ideas off the ground himself, what a challenge it will be for someone else to do so.
Mr. Bray should get the thing going like Linus did, and call in help from the Open Source Community. If he is waiting for someone with moneybags to catch the bait, and call him on the project as a highly paid consultant, maybe the approach needs to be modified.
Go Open Source Tim ... and get the ball rolling. You will be surprised how much help you will get ...
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
Mr. Simpson, this is the most blatant case of fraudulent advertising since my suit against the film, ``The Never-Ending Story''.
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
There are no trolls. There are no trees out here.
Try reading the articles/essays. Knuth's vol 3 is about comparison search, not full-text search.
"A Quantum Theory of Internet Value" by Andrew Orlowski
-- why librarians are better at finding the book you want than Google.
Maybe search technology has changed a lot since Knuth's day. If one cursorily glances through the last couple of journals on Information Search and Retrieval, one cannot help noticing the heavy influence of PageRank (Google's own technology). Thankfully the algorithm is well known. On the flip side, critics have often asked whether such algorithms should be published. The bloggers have demonstrated that even Google rankings can be rigged... Personally, I would choose the open architecture philosophy, due to parallels with the ideas of Bruce on cryptography. A peer-reviewed system is always better than a closed proprietary system.
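Since the algorithm is public, here is a rough sketch of the textbook power-iteration form of PageRank in plain Python. This is the simplified published formulation, not whatever Google actually runs in production. The damping value of 0.85 is the conventional choice, and the four-page link graph and the function name are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked from three pages, "d" from none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

It also shows why rankings can be rigged: a page's score depends only on who links to it, so anyone who controls enough linking pages can push a target up.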
I don't know... see for yourself, then come and tell us... The comment on this page suggests that you are right...
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco
Mirror #1
Mirror #2
Mirror #3
It just has a new name, and it's being developed by librarians.
http://www.dlxs.org/products/xpat.html
More search-related functions should be available in PHP and Perl, built in to them... even MySQL too...
Chris,
Php Programmers.