Slashdot Mirror


Google's Search Appliance

An anonymous reader noted that Google is working on a search engine that you can install behind your corporate firewall to index your internal documents. The article is a bit thin on information, but it looks like for as little (cough) as $20k, you can have your own Google box. Not for everyone, obviously ;)

18 of 250 comments

  1. Re:Advertising?? by djmurdoch · · Score: 2, Informative

    Go to Google, search for "google advertising", and you'll get this page near the top of the search results. Basically, they're selling people "sponsored links".

  2. article from C|Net here: by mESSDan · · Score: 4, Informative
    From C|Net.

    It's a little more in-depth than the India Times article.

    --

    -- Dan
  3. Ouch. Try HTDIG. by Kozz · · Score: 3, Informative

    Yes, quite CLEARLY it's only for those who've got some cash to blow. If you've got a modest-sized Intranet site, I would highly recommend htDig. I've installed and configured it in several places and it works like a charm. Best of all, it's GPLed! Sure, it doesn't have all the fancy matching algorithms used by Google, but it does a damned good job nonetheless.

    --
    I only post comments when someone on the internet is wrong.
  4. We're using it here...it rocks! by HRH+King+Lerxst · · Score: 4, Informative

    They just implemented this where I work, and it's a vast improvement over what we had before. It even includes the cache and newsgroup features!!

    Two thumbs up!!

    --
    No one got beat up more often than the mimes of the old west!
  5. Re:Looking for a good internal search engine by pere · · Score: 3, Informative

    Try http://www.mnogosearch.org

    Brilliant search engine. It has parsers for most file formats (you can use pdf2txt to index your PDF files). It even indexes your MP3s if you should happen to have some on your local net.

    Free (at least as in beer) for Unix. Binaries for Windows cost between $99 and $699.

  6. Re:Looking for a good internal search engine by richieb · · Score: 5, Informative
    Try htDig. It does all these things and is free software. I used it on a corporate intranet in the past. Not as good as Google, but you can't argue with the price.

    --
    ...richie - It is a good day to code.
  7. Re:Looking for a good internal search engine by NewbieSpaz · · Score: 2, Informative

    Try ht://Dig. It's free and works with *nix. Info about PDF indexing is here: http://www.htdig.org/FAQ.html#q4.9
    It's a good solution for a small- to medium-sized website. If you run Linux, it might be on your install CDs, or it might be installed already.

    --
    ------
    Random, useless fact: I type in startx entirely with my left hand.
  8. Re:Why Google Can Be So Expensive... by PoiBoy · · Score: 4, Informative

    Actually, I've seen interviews in some business magazines with their CEO. In fact, they are slightly profitable and have been for a few years.

    --
    Sig (appended to the end of comments you post, 120 chars)
  9. Re:Google enters this market at the right time by Anonymous Coward · · Score: 1, Informative

    Google already indexes PDF documents, and extracting text from a Word document isn't particularly hard. They could either treat it as a text file, use reverse-engineered file layout information, or license dewording technology from MS.

  10. Re:Google enters this market at the right time by jeffehobbs · · Score: 3, Informative

    Google searches .doc files.

    http://www.google.com/help/faq_filetypes.html

    1. What file types are returned in a Google search? There are 12 main file types searched by Google in addition to standard web-formatted documents in HTML. The most common formats are PDF, PostScript, and Microsoft Office formats:

    Adobe Portable Document Format (pdf)

    Adobe PostScript (ps)

    Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)

    Lotus WordPro (lwp)

    MacWrite (mw)

    Microsoft Excel (xls)

    Microsoft PowerPoint (ppt)

    Microsoft Word (doc)

    Microsoft Works (wks, wps, wdb)

    Microsoft Write (wri)

    Rich Text Format (rtf)

    Text (ans, txt)

    ~jeff

  11. Re:Didn't we know this all along? by neonstz · · Score: 3, Informative

    If you had read the entire article you would know that there are two versions for sale: a small $20k box which can index up to 150,000 documents, and a "millions of millions" version which costs $250k.

    If a large company puts out all the revisions of all their documents it will be quite a lot of documents :). $250k is still quite cheap for something that will index all electronic documents the company has ever produced.

  12. Re:Search engine by gorilla · · Score: 3, Informative
    What a horrible script.

    No taint checking (what happens if 'q' contains ";rm -rf /;"?).

    No warnings.

    No proper formatting of HTML on the output. If the grep matches "", then it's not going to display anything on Netscape. You need to either strip tags, or force tag matches.
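
A minimal sketch of the two fixes named above, written in Python rather than the language of the script being criticized (that script is not shown here, so the function names `search_files` and `render_results` are hypothetical). The query is treated as literal text and is never handed to a shell, and every matched line is HTML-escaped before it goes into the output.

```python
import html
from pathlib import Path


def search_files(query, root):
    """Return (path, line) pairs for lines containing query as literal text.

    Hypothetical helper: the query is only ever compared as a string and is
    never passed to a shell, so input like ';rm -rf /;' is harmless.
    """
    matches = []
    needle = query.lower()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for line in text.splitlines():
            if needle in line.lower():
                matches.append((str(path), line))
    return matches


def render_results(matches):
    """Build an HTML list with every path and matched line escaped."""
    items = [
        "<li>{}: {}</li>".format(html.escape(path), html.escape(line))
        for path, line in matches
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping rather than stripping keeps a matched tag visible to the reader as text instead of letting the browser interpret it.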

  13. Re:Google enters this market at the right time by Hallow · · Score: 2, Informative

    You probably haven't used Acrobat or Word for a while. They both can contain links.

  14. Re:Cheaper to beef up... by ghutchis · · Score: 2, Informative

    Nah. Keep in mind that the ht://Dig project has several contributors. A few contributions of code go a long way.

    Keep in mind, though, that ht://Dig already implements many "Google-like" features such as indexing the text of links to documents and keeping track of the backlink count.

    http://www.htdig.org/attrs.html#backlink_factor
    http://www.htdig.org/attrs.html#description_factor

    A proximity weighting would be nice, but there's some work to be done before that.

    -Geoff

  15. Re:Ouch. Try HTDIG. by ghutchis · · Score: 3, Informative

    Actually, saying it doesn't have all the fancy matching algorithms isn't really fair.

    Granted, we can't implement Google's patented things, but that's not to say we don't come close.

    Indexing the text of links to documents? Yes.
    http://www.htdig.org/attrs.html#description_factor

    Keeping track of the weight of links pointing to a document? Yes.
    http://www.htdig.org/attrs.html#backlink_factor

    Probably the big "missing link" is a proximity weighting. Interested? Help is always welcome!

    -Geoff
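
As a rough illustration of the two features described above, here is a hypothetical scoring sketch in Python. It is not ht://Dig's actual algorithm: the function name, the arguments, the default weights, and the way the terms are combined are all assumptions. Only the two ideas come from the comment: matching on the text of links that point to a document, and weighting a document by the links pointing to it.

```python
def score_document(query, body, link_texts, inbound, outbound,
                   description_factor=2.0, backlink_factor=0.5):
    """Hypothetical relevance score; the formula is illustrative only."""
    term = query.lower()
    # Occurrences of the term in the document's own text.
    body_hits = body.lower().split().count(term)
    # Occurrences of the term in the text of links pointing to the document.
    link_hits = sum(text.lower().split().count(term) for text in link_texts)
    # Assumed weighting: inbound links relative to outbound links.
    backlink_ratio = inbound / max(outbound, 1)
    return (body_hits
            + description_factor * link_hits
            + backlink_factor * backlink_ratio)


# A page that mentions the term once, is linked to with matching link text,
# and has 4 inbound links against 2 outbound links.
example = score_document("google", "google search box",
                         ["Google appliance"], inbound=4, outbound=2)
```

Because the factors are plain parameters, changing how much link text or backlinks count is a matter of passing different values.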

  16. Re:Looking for a good internal search engine by ghutchis · · Score: 2, Informative


    "Not as good as Google,"
    OK, fair enough. Have some suggestions for how to improve it? Unlike Google, you can tailor all the search weightings in ht://Dig.

    Either general suggestions like "titles should be weighted more" or parameter changes would be quite welcome.

    It's open source, it's yours. So don't you want to see it improve?

    -Geoff

  17. Mercedes DOES sell cheap cars by RealisticWeb.com · · Score: 2, Informative

    If you have been to Europe you know that Mercedes DOES sell cheap cars. They are like European Fords. You see Mercedes buses, tractors, compacts, everything. They are so common that that's what people think of when they see the symbol, and they can't sell as many sports cars or SUVs. So they export all the high-end cars here, where we buy them.

    Point is, I agree that this is a smart Google move. You separate the market, and give people in both places the things that they want. That's why you are never going to see an ad banner on Google trying to get the average surfer to buy their $20k engine.

    --
    Sigs are out of style, so I'm not going to use one...oh wait..
  18. Re:$20K Isn't really that much if you consider it. by LoseNotLooseGuy · · Score: 2, Informative

    The real question is how much money is the company loosing from people who have to redo misplaced documents

    I find it difficult to believe that the company would be capable of "letting loose or releasing" money from people--that would be tantamount to theft. However, it is possible that the company would fail to obtain money from these people. The word you were looking for is losing.

    Congratulations! You have been participant #27 in my campaign to rid Slashdot of this error.

    --
    Proudly correcting Slashdot's most irritating linguistic error since 2002.