Google's Search Appliance
An anonymous reader noted that Google is working on a Search Engine
that you can install behind your corporate firewall for indexing
your internal documents. It's a bit thin on information, but it
looks like for as little (cough) as $20k, you can have your own
Google box. Not for everyone, obviously ;)
It's a little more in-depth than the India times article.
-- Dan
Yes, quite CLEARLY it's only for those who've got some cash to blow. If you've got a modest-sized Intranet site, I would highly recommend htDig. I've installed and configured it in several places and it works like a charm. Best of all, it's GPLed! Sure, it doesn't have all the fancy matching algorithms used by Google, but it does a damned good job nonetheless.
I only post comments when someone on the internet is wrong.
They just implemented this where I work, and it's a vast improvement over what we had before. It even includes the cache and newsgroup features!!
Two thumbs up!!
No one got beat up more often than the mimes of the old west!
Try http://www.mnogosearch.org
Brilliant search engine. It has parsers for most file formats (you can use pdf2txt to index your PDF files). It even indexes your MP3s if you happen to have some on your local net.
Free (at least as in beer) for Unix. Binaries for Windows cost between $99 and $699.
...richie - It is a good day to code.
Actually, I've seen interviews in some business magazines with their CEO. In fact, they are slightly profitable and have been for a few years.
Sig (appended to the end of comments you post, 120 chars)
Google searches .doc files.
http://www.google.com/help/faq_filetypes.html
1. What file types are returned in a Google search? There are 12 main file types searched by Google in addition to standard web formatted documents in HTML. The most common formats are PDF, PostScript, and Microsoft Office formats:
Adobe Portable Document Format (pdf)
Adobe PostScript (ps)
Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
Lotus WordPro (lwp)
MacWrite (mw)
Microsoft Excel (xls)
Microsoft PowerPoint (ppt)
Microsoft Word (doc)
Microsoft Works (wks, wps, wdb)
Microsoft Write (wri)
Rich Text Format (rtf)
Text (ans, txt)
~jeff
If you had read the entire article, you would know that there are two versions for sale: a small $20k box that can index up to 150,000 documents, and a "millions of millions" version that costs $250k.
If a large company indexes every revision of every document it has, that will be quite a lot of documents :). $250k is still quite cheap for something that will index all the electronic documents the company has ever produced.
No taint checking. (What happens if 'q' contains ";rm -rf /;"?)
No warnings.
No proper formatting of HTML on the output. If the grep matches "", then it's not going to display anything in Netscape. You need to either strip tags or force tag matches.
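To make the three complaints above concrete, here is a rough sketch in Python (not the script under discussion, just an illustration). It assumes the documents are plain .txt files under one directory; the names search_files and render_hits are made up for this example. The query is never handed to a shell, so ";rm -rf /;" is just an odd search string, and every matched line is escaped before it goes into the page.

```python
import html
from pathlib import Path


def search_files(query: str, root: str, max_hits: int = 50) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for each line containing the query.

    The query is only ever compared as a string in-process; it is never
    interpolated into a shell command, so shell metacharacters are harmless.
    """
    hits: list[tuple[str, int, str]] = []
    needle = query.lower()
    if not needle.strip():
        return hits  # refuse empty queries instead of matching everything
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    hits.append((path.name, lineno, line.rstrip("\n")))
                    if len(hits) >= max_hits:
                        return hits
    return hits


def render_hits(hits: list[tuple[str, int, str]]) -> str:
    """Render hits as an HTML list, escaping anything that came from a file."""
    items = [
        f"<li>{html.escape(name)}:{lineno}: {html.escape(line)}</li>"
        for name, lineno, line in hits
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping rather than stripping tags means a matched line that contains markup still shows up in the browser as literal text instead of vanishing.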
Actually, saying it doesn't have all the fancy matching algorithms isn't really fair.
Granted, we can't implement Google's patented things, but that's not to say we don't come close.
Indexing the text of links to documents? Yes.
http://www.htdig.org/attrs.html#description_fac
Keeping track of the weight of links pointing to a document? Yes.
http://www.htdig.org/attrs.html#backlink_factor
Probably the big "missing link" is proximity weighting. Interested? Help is always welcome!
-Geoff
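For anyone wondering what proximity weighting would mean in practice, here is one naive way to score it in Python. This is not ht://Dig code: the function name proximity_score and the 1 / (1 + gap) formula are assumptions picked for illustration, and the tokenizer is a plain whitespace split with no punctuation handling.

```python
def proximity_score(text: str, terms: list[str]) -> float:
    """Score a document higher when all query terms occur close together.

    Finds the smallest window of words containing every term, then returns
    1 / (1 + gap), where gap is the number of extra words in that window.
    Returns 0.0 if any term is missing.
    """
    words = text.lower().split()
    wanted = {t.lower() for t in terms}
    if not wanted:
        return 0.0

    # (word index, term) for every occurrence of a query term
    positions = [(i, w) for i, w in enumerate(words) if w in wanted]

    best = None  # smallest distance between first and last term in a window
    counts: dict[str, int] = {}
    left = 0
    for right in range(len(positions)):
        term = positions[right][1]
        counts[term] = counts.get(term, 0) + 1
        # Shrink from the left while the window still holds every term
        while len(counts) == len(wanted):
            span = positions[right][0] - positions[left][0]
            if best is None or span < best:
                best = span
            dropped = positions[left][1]
            counts[dropped] -= 1
            if counts[dropped] == 0:
                del counts[dropped]
            left += 1

    if best is None:
        return 0.0
    gap = best - (len(wanted) - 1)  # words sitting between the terms
    return 1.0 / (1.0 + gap)
```

A real engine would combine a score like this with its other factors rather than use it alone; the sketch only shows the "closer together ranks higher" idea.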