Google's Search Appliance
An anonymous reader noted that Google is working on a Search Engine
that you can install behind your corporate firewall for indexing
your internal documents. It's a bit thin on information, but it
looks like for as little (cough) as $20k, you can have your own
google box. Not for everyone obviously ;)
People don't have THAT much pr0n do they?! :)
Aside from anything else, it gives Google a revenue stream so they can continue to provide their services (web, image and usenet searches) for free; they need to find a valid business model, and hopefully this can contribute.
Everywhere you look, companies are hawking products geared for searching internal documents. Google is making a good move: entering an expanding market as an established leader in searching.
hawaiianshirt
will it also index employee email?
Searched the intranet for 'herbal viagra'.
Results 1-10 of about 1,279,500. Search took 0.14 seconds.
your jesus is another mans xebu. chew on that hypocrites.
Or you could write a 10-line perl script to index the titles of all your documents. Then maybe another 10-line perl script to do searches on it.
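For what it's worth, here is roughly what that pair of scripts might look like, sketched in Python rather than Perl. It assumes "titles" means file names, and the function names `build_title_index` and `search_titles` are made up for illustration. It does not look inside the documents at all, which is the part the appliance is actually selling.

```python
import os
from collections import defaultdict


def build_title_index(root):
    """Map each lowercase word in a file name to the paths that contain it."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Assumption: the "title" is the file name without its extension.
            stem = os.path.splitext(name)[0].lower()
            for word in stem.replace("_", " ").replace("-", " ").split():
                index[word].add(os.path.join(dirpath, name))
    return index


def search_titles(index, query):
    """Return the paths whose file name contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

Calling `search_titles(build_title_index("/some/share"), "quarterly report")` would return every file under that (hypothetical) directory whose name contains both words.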
It does sound quite useful actually, if you have any serious amount of information to categorise (spy agencies, perhaps?).
I see more of this in the future - if you want a search engine, buy one and put it on the network. If you want a web server, buy one and put it on the network. You want a disk server... Well you get the point.
As hardware continues to get cheaper and software more expensive as it gets more complex it makes sense to do this rather than trying to configure multiple applications all on the same server.
And good luck to Google making money on this so they can keep their search engine fast and free of annoying advertisements.
Sig is taking a break!
find | grep missingdata
Ctrl-Esc, F, 'missing data'
I would like to find a search engine that will index:
- text files
- html files
- PDF files
- names of binary files
Unfortunately, I am not able to spend much to purchase such a search engine (say $20, not $20K). This would be for my personal use, not for any kind of commercial use, and would not be funded except by my anemic hobby budget. Does anybody have any recommendations?
Edward Burr
Having a smoking section in a restaurant is like having a peeing section in a swimming pool.
If it can find my colleagues I will definitely make my boss buy this product.
Google did exactly what we fanboys all whined and complained for: it became a company that made a good product (an awesome search engine) without selling out (no popup ads). Google offered a free service, built up an enormous following, and now offers its premium service for a premium price, while ensuring its loyal customers continued free service. Forget eBay, Google is an Internet success story worthy of such praise!
The companies that are using the appliance are large corporations with hundreds, perhaps thousands, of computers and millions of files and documents to find. The real question is how much money the company is losing from people who have to redo misplaced documents, or make new ones which are similar to a document that someone else made a while back. In a large corporation, a thousand people working at $20 an hour, each taking 1 hour to redo a document or spend time finding it, comes to $20,000, which makes up for the cost. Also, the more money it gives Google, the better the chance the search engine stays free and without a ton of annoying advertising.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
The article says: "Google's core consumer search business is free and is funded largely by advertising."
But where? Has anyone seen this advertising? It's certainly not in the form of banners... I've always wondered how Google supports itself.
Accountability on the heads of the powerful.
Power in the hands of the accountable.
I think you're going to see a LOT more of this type of 'appliance' in the future... with the ever growing masses of information exploding off the desktops in every office around the country, we are quickly approaching critical mass -- we need better ways of tracking and managing data. I think Scott Adams mocks this as a "Knowledge Management" line of thinking, but it's true.
I've dealt with this at the last three employers I've been at, and no one seems to have a good solution.
The CIA is using Northern Light for their document managing, so is the FBI going to take up the Google-cross?
"This above all, to thine own self be true"
These days companies are downsizing and cutting back. We just sold off a bunch of extra inventory and were told to power down our computers at night and especially the weekend to save electricity.
Now when I came in this morning, a bunch of furniture has been hauled out with "sold" signs tagged to them.
So they are giving out 10k for a contest and getting 20k for every copy they sell... im not sure i want to help them now :)
Little lines of text from advertisers. Sweet, huh?
I'm the stranger...posting to
Google sometimes has ads in a sidebar on the right or top. These are targeted based on your search, and are thus usually relevant enough not to be annoying (not to mention being ignorable).
I find it hard to believe the revenue from those is really significant, but who knows; I bet their clickthrough rates are much better than those damn popup ads.
Companies such as Cisco and the likes with a huge intranet, have been using Google for some time. Use the search engine on their main page to get the idea.
To me, it was only a matter of time until they port their technology to simpler environments (home users & smaller corporations) for a fraction of the cost.
(incidentally I searched for porn and still got 4 results back :)
It's a little more in-depth than the India Times article.
-- Dan
Our corporate intranet has an Excite search on it, and the intranet is not accessible from the net. I doubt they would have paid $20k for it either. Does anyone else have something like this? I was under the impression it was common to have an internal search engine.
"Da ist ein Technölüst in mein Unterpanten!"
Tom: "Can you give me Google in a box?"
Mary: "Yes, we can."
Tom: "Well, let him out!"
Sounds really cool. I think I'd like to get one of these bad boys. I wonder how the details will work from an implementation perspective.. will one have to put all of their documents on one file server, or will it span multiple machines?
Also, how will it detect relevance? I'm pretty sure right now Google analyzes hyperlinks as part of its relevance algorithm... How will that work with internal documents if they aren't hyperlinked? How useful will this thing actually be? I'm sure they will think of something.
Yes, quite CLEARLY it's only for those who've got some cash to blow. If you've got a modest-sized Intranet site, I would highly recommend htDig. I've installed and configured it in several places and it works like a charm. Best of all, it's GPLed! Sure, it doesn't have all the fancy matching algorithms used by Google, but it does a damned good job nonetheless.
I only post comments when someone on the internet is wrong.
I could see one of the advantages that this would have is the ability to index pages/emails/whatever very quickly. No need for the wait that accompanies an index request on a web search engine because the spider will be around every hour or less in an intranet.
Surprisingly few corporations are willing to spend money indexing their internal document set, as other search engine companies discovered.
Excite, Altavista, HotBot, Lycos all at one time or another tried to sell to the corporate market with little success. So either things have changed since, or Google management is repeating an old mistake from other companies...
Moreover, companies such as Verity which specialize in corporate search engines have reported falling revenues as of late...
They just implemented this where I work, it's a vast improvement over what we had before. It even includes the cache and newsgroup features!!
Two thumbs up!!
No one got beat up more often than the mimes of the old west!
... the ht://dig search engine.
In this climate of IT layoffs, I reckon it would prove cheaper and better to hire a programmer to take the GPL'ed ht://dig code and hack in some Google-like improvements.
The major improvement needed is the ability to search on phrases, and to do boolean searches.
Such a beefed up search/indexing system would not be subject to licensing fees, and would be freely redistributable (say, to other company offices).
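To give an idea of what phrase and boolean search involve, here is a minimal sketch in Python. It is not ht://dig code and says nothing about how ht://dig stores its index; the function names are invented for illustration. The usual trick is a positional inverted index: record where each word occurs in each document, then AND/OR become set operations and a phrase is a run of consecutive positions.

```python
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: text}. Returns {word: {doc_id: [word positions]}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_and(index, words):
    """Documents that contain every one of the words."""
    sets = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, words):
    """Documents that contain at least one of the words."""
    result = set()
    for w in words:
        result |= set(index.get(w.lower(), {}))
    return result


def search_phrase(index, phrase):
    """Documents in which the words appear consecutively, in order."""
    words = phrase.lower().split()
    if not words:
        return set()
    matches = set()
    # Only documents containing all the words can contain the phrase.
    for doc_id in search_and(index, words):
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                matches.add(doc_id)
                break
    return matches
```

A real engine would also need stemming, punctuation handling and an on-disk index, which is where the programmer's salary would actually go.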
-- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
At least then the search feature would work right and they can finally cache all those sites that we take down.
can't sleep slashdot will eat me
Unless Google reimplemented their own operating system, or <shudder> ported it to Win2K, they have a very expensive product, that runs on Linux, that is not GPL.
More power to Google--I'm glad to see them finding a way to make money without trashing their search engine, like happened with the previously good search engines that came before (e.g. Altavista, Lycos).
One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
Why would i need a *search engine* for my internal documents?
If there is decent hierarchy in how i organize my files i suppose it won't be hard to track down anything without the need for a very heavyweight search engine like google's.
Say i want to find an mp3. I will look under music/hard_rock/evil_peas/there_it_is.mp3
maybe it is something i haven't thought of. And my example is very silly, i know.
Looking for people to chat about multicopters, coding, music. skype: gtsiros
If they can make money off their engine, more power to them. Keeping them solvent will keep their public search up and freely running for the rest of us.
If you've never been modded as "flamebait" or "troll," you've never tried to argue a minority viewpoint here!
If the management of Google were truly enlightened OpenSource evangelists, they would have given away the software for free. Google has built its success on the back of Free Software developers, to the point that it should be called GNU/Google. Using Free and Liberated software to create a commercial monster is offensive and wrong.
I demand that Google allow Jesse Jackson, ESR and RMS on-site to persuade Google to go GPL and to investigate alleged GPL violations.
In addition, I call for the formation of the GNUggle project, an entirely Free search engine that runs on GNU/Hurd systems only.
Conformity is the jailer of freedom and enemy of growth. -JFK
Note the date, gentlemen. If Google is selling wholesale software solutions, the countdown clock to paid searches begins today. I'm betting that in less than a year's time we'll be asked to pay for Google searches. Hopefully by that time someone will have figured out a good system for micropayments.
Free is wonderful, but free doesn't scale when it comes to indexing the majority of the internet.
Part of the success of the google technology is based on the page rank system which depends on many people linking to pages and so "ranking" them. On a corporate site you don't have as many separate opinions (i.e. pages managed independently) so perhaps the page rank part of google won't be as successful. OTOH just having fast search of all the docs would be good here :)
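For anyone who hasn't seen the idea spelled out, here is a toy version of link-based ranking in Python. This is the simplified textbook form of PageRank, not Google's actual implementation; the damping value of 0.85 and the function name are assumptions made for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

With `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` comes out on top because two pages point at it. Feed it a pile of documents with no links between them, e.g. `{"a": [], "b": []}`, and every score is identical, which is exactly the worry about intranet documents.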
development.lombardi.com
It is, but it isn't. I mean I've got... about 16k html files on my one computer. 2 grand to search through them seems like a lot. Then again I'm just a dumb kid with a lot of junk. To a company in the business of information, that's prolly a pretty good deal. Boy, I wish I could make up my mind.
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
...at least compared to AltaVista - and theirs really sucks in comparison. (Speaking as one who has coded to their API)
I look forward to our license expiring so I can consider a change to google - I can't wait!
It's not that expensive, considering the amount of money a corp wastes every year. If you put it in perspective - it is half of an average worker's yearly salary - and if management thinks it will save that much money over a year. . . :)
Companies have private jets so the pres / vp can get wasted while traveling across the country - $20k is nothing.
Google roxxor!
1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcf
When I spoke with him, they were wooing some fairly high-profile clients, but I can't rightly say I know where they are right now.
This has a LOT more business application than appears on the surface. And $20K for such a solution is comparable to paying $50 for Red Hat to run a server.
Back in my systems integration days, we had very many law firm clients who used document management to organize the truly prodigious quantity of information they had to deal with. Spending $50K on the solution was not unheard of even among small firms. In fact, they usually wound up spending $20K just on third party maintenance utilities to support their document management systems!
Isn't this just confirming what we already knew?
On top of that, depending on the size of your intranet and how efficient/inefficient indexing already has been, $20K may be a bargain.
Of course, how many companies are really going to have a use for it? For giggles, let's say the entire Fortune 500. That's 500 * 20K = 10,000 K = 10 Million Dollars US. In the grand scheme of things, that's a lot of money, but not a LOT of money. Perhaps they'll add on pay-per-use functions for even ritzier search features?
Sigs? We don't need no goddamn sigs!
sig--we don't need no goddamn sig
Years ago Infoseek offered a version of their search engine to index LARGE collections of documents. We had over 500,000. It was around 15k if I remember correctly. Python on a Sparc 20 (20k itself at the time with mem, processors, array and tapes). So we had almost 4k tied up in the whole thing. There was, if I remember correctly, a per-site or per-page fee in addition over so many documents. I made an error in a config file once and allowed it to traverse links; other than filling the hard drive quickly, the additional costing we did afterwards to see how much it would be should we decide to keep those docs was hilarious.
:) Indexing LARGE repositories isn't easy and config can be a pain. 20k sounds OK to me. I have YET to see an open source solution that can handle VERY large document sets. There's ASPSeek, but it still has issues, and over about 2.5 million docs I hear it's a dead horse.
20k isn't bad at all if you're talking some serious indexing. We indexed 5 F500 companies' technical documents at the time, before they were all in house; this was 97-98. It was slick. I often wondered what happened to that software package.
Anyone know what Google is written in? I decompiled a fair bit of Infoseek's just to see what was what, and because I could.
Sig went tro...aahemmm.....fishing........
Okay, the concept is good, but I don't see anyone paying $20k for it. I think we'll see a clone of this on freshmeat.net in about two months.
In a company (shorthand here for any organization, whatever its purpose), there could be all kinds of information where you don't quite know how the categorizations work for every part of the company, or whether someone else has a document you might need ...
Being able to search for keywords within your organization might find you a lot of useful things. Have we dealt with Client X before? Is there anything on the company mailing list about a problem I'm having with remote access? Do we still have a specific report around? It doesn't mean you can't ask coworkers or send a company wide email looking for things you need, but it offers another first option that puts the time / effort burden on inanimate objects instead of people with better things to do.
timothy
jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
Wouldn't it be great for when they say "your code doesn't meet the specification of what the product needs to do" and you can use it to say "let's look to the wayback machine to see when you changed the spec but didn't bother telling me"
:-)
Demonstrant's Open Source Tools
I know I'm biased (and ignorant), but Google is probably the best general-purpose search engine out there, with truly innovative quality filtering like PageRank(tm) and other very neat tricks. They have been around long enough that even the weakest of minds know Google. If this new retail product is as efficient and clean as their websearch, and well supported, they're going to make a killing! I really hope they find huge success, they've earned it.
-Billco, Fnarg.com
I work for a firm that indexes a scholarly database of research articles in psychology. We use a controlled vocabulary to describe the content of each abstract, which can vastly simplify life (for the users who know how to use it, natch.) Does Google (or anyone else) pursue this sort of strategy?
"One empirical experiment is worth a thousand expert opinions." -Bill Nye
I don't know too much about Google's technology, but I thought it used a scheme where web pages having many referring links would score higher in the search results.
For a corporate intranet, do you have this information? I mean are there people building home pages linking to their favorite corporate policy page?
but I think I will stick to using Grep and Locate.
_______
Death wish, n.:
The only wish that always comes true, whether or not one wishes it t
Slashdot talked about this in 1999 when the patent came up. It's 2+ years later now. Google has mostly crushed the competing search engines because the results of their algorithm are preferred to other algorithms. Their revenue sources are not public, but I believe I read recently that half of their revenue is from advertisements and half from technology licensing.
So, the point for discussion...
The world's favorite search engine exists because of its software patent. This patent has caused great harm to the competing search engines. Is this ok because...
to make them profitable. Google does so many things so well, and provides it all free to the world. It's not asking too much, I think, for them to ask companies to foot the bill for something like this if that's what it takes for them to continue to stay in business and keep doing all this neat wonderful free stuff.
You see? You see? Your stupid minds! Stupid! Stupid!
Now the Crackers have an easy way to *search* for passwords and confidential docs!
"Hey dude, find anything?"....."Nothing other than a google search engine"...."Alright!!!, now let them do our work!"
Unfortunately, we've already made the investment in a SQL 2000 database. I think the Google solution would be better because the SQL database relies on people entering the data correctly (and just entering it, period) to work well. It looks like the Google product would actually search through the documents for you. Right now I keep a pile of old reports on my desk to pull out some recyclable material, but Google's search engine would eliminate that need.
Bill Clinton: Pimp we can believe in. - The Shirt!!!
They have been doing indexing of public intranet sites (like try here) but this is different since it is in the intranet and has to host the hardware.
Does anyone but me think that this may not work so great? The way that Google works for the web (filtering down way too many hits and ranking them) is quite different from an intranet, where fuzzy searching / regular expressions are a lot more necessary. The Apple Developer Site (link above) uses Google and it stinks!
I am not a number! I am a man! And don't you
Google is a great search engine for the Internet, because it ranks pages according to how many other pages link to them. It's very democratic. I don't see how Google behind the firewall would be a viable product. What will it do, rate a document on how many other company documents link to it?
There are a number of other existing indexing engines that are significantly cheaper and more mature. Google should stick to what it does best. I guess this shows they aren't very profitable and are looking for other sources of revenue.
We've already spent way too much just for the software from someone else. Still have yet to launch it though. Google should have done this long ago, as soon as they realized their software works. Well, OK, that's an oversimplification, but still, they worked on these corporate search programs before, and they just weren't up to par.
-- these are only opinions and they might not be mine.
But I was wondering, how exactly does Google make money? They serve up so many goddamn pages and their bandwidth, storage, power consumption must be through the roof; so how do they pay for it all? This is a good start, but $20k can't go too far @ Google.
Not kidding. I work for a very large multinational and the corporate search engine is an exercise in frustration. Its purpose in life seems to be to return bizarre and obscure documents as the results of its searches.
$20k is nothing to shell out[1] for the capabilities that Google has.
[1] In corporate terms.
Government of the people, by corporate executives, for corporate profits.
Yeah, I hope they're already thinking about the personal version, because I've been dreaming of Google on my machine for a long time. An intelligent search beats just about any other kind of infrequent interaction: menus, directory navigation, dialog boxes with lots of little pages on them. I want to hit ctrl-g, type in what I'm interested in, and get the right thing.
Finding that vital piece of information can be far more important than $20k, especially to a large organisation.
Government of the people, by corporate executives, for corporate profits.
The first item in your search results. Google matches up what you are searching for with a company offering a compatible service/product.
This kind of directed advertising is valuable and a good application of their service.
Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
Right now Google tends to be among the bigger darlings of Slashdot, but will they remain that way if they release this product and it's not Open Source? 'Cause they're nuts if they're planning on charging $20K for it but making it Open Source. Are they traitors to the cause, or is it just another understandable case of "Money talks, bullshit walks" when it comes to Open Source and the Real World?
I have been using ASPseek for some time. The search results are remarkably similar to Google's. If you want a libre alternative to Google for your own sites, ASPseek is probably the way to go.
Most companies don't have most of their docs
on internal websites - they are on Network Drives.
Zillions of "folders" full of ".doc" files - yuck.
Since it isn't html there are no links. I wonder
how Google would deal with that.
I see that no one has mentioned the GPL perl script Perlfect www.perlfect.com. It's a very capable search engine including PDF search. Check it out.
So what, Google isn't a 100% libre-kosher company? Name any of their competitors that is. It's called "lesser of two evils".
As far as I know, Google has never filed for frivolous "IP" lawsuits, they respect web standards, they provide gratis, decent service, they don't fuck with your browser, and they tell you who paid for word placement as opposed to just putting paying advertisers on top without mention. They also happen to use free software and give it good press.
For IBM, $20 grand is pocket change, and it is SO needed!!! Anyone who has actually tried to use the 'search' on either the public or internal network knows that the search just doesn't work.
If anyone thinks $20k is expensive for 150k documents, they haven't bought a search engine recently!
Check out prices for Inktomi. Of course the more documents you have, the lower the per-document cost, but still they charge $7500 for 10k documents.
The "average" price of a Verity K2 license is $200k (check this itworld.com link).
Good content indexing is expensive. Google will be undercutting the competition with this release. $20k really is a bargain.
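To put rough per-document numbers on that, here is the arithmetic as a few lines of Python. It uses only the two price points quoted above that come with a document count ($20k for 150k documents, $7500 for 10k documents); the Verity figure is left out because no document count was given for it, and `cost_per_document` is just a helper name made up for the example.

```python
def cost_per_document(price_usd, documents):
    """Price divided by the number of documents the license covers."""
    return price_usd / documents


google = cost_per_document(20_000, 150_000)   # about $0.13 per document
inktomi = cost_per_document(7_500, 10_000)    # $0.75 per document

print(f"Google:  ${google:.3f} per document")
print(f"Inktomi: ${inktomi:.2f} per document")
print(f"Inktomi costs about {inktomi / google:.1f}x as much per document")
```

Keep in mind the Inktomi per-document price drops at higher volumes, so the gap at 150k documents would be smaller than this comparison suggests.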
Hmmm.. Looks like an interesting concept. If you have an admin with a little time on his hands, which would probably cost you a *LOT* less than $20k, you could set up something else.
:)
:(
We've been using Namazu to make all of our documents searchable. It's shareware, and does a pretty decent job of it. If we make it public or private is just a matter of who you allow access.
I guess the days of `grep "searchstring" *` are pretty much gone..
Next thing they're going to tell me is that I should start using something more modern than Pine to read my mail..
Serious? Seriousness is well above my pay grade.
Because the majority of
and the head sheep are clueless?
Flame away.
Anonymous posts are filtered.
With all the thousands of people who view Slashdot daily we can get a few thousand (preferably 20,000) to chip in. And we'll all give each other copies and we can each have a copy. Email cmdrtaco@slashdot.org if you're interested. :)
ZDNet also has the story.
As for personal reaction, I just wonder whether the option to search emails will be available to everyone, or just a select few. In either case, I don't think I like it very much.
Altavista have had something like this available for years. It's pretty good.
Google's claim to fame is its ability to rank results properly (something no other search engine ever got right). The rank, if I recall correctly, is _mostly_ based on links from other sites.
Now, when you're indexing thousands of doc and pdf files on a company network, how many of those link to each other?
And how many companies have internal newsgroups that can be searched? (No, Exchange shared folders don't count - or can Google index those as well?)
Like duh!
*cough*
(Please think about it before you roast me.)
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
for $40000, you can get a sun e220, and run altavista's search engine on it. even then, if you want to integrate it, you still need to do 30-40 hours of work to make it all work right.
having something for $20000 or so is a godsend, especially if it comes with its own hardware (even though its hardware is probably not as nice as an e220)... throw in that they'll probably do the work when it breaks, and this is a no-brainer for anyone needing to index even as few as 25000 pages.
For high-traffic sites, the search engine can be run as a multithreaded daemon process that listens on either a Unix or TCP socket.
You could write a filter or a native module to index the names of binary files.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
now we know where /. should invest their next 20G's
Something to replace that poor poor search box at the bottom of every page.
For chrissake's it's easier to search on Google right now and browse the cache! doh!!
Mod this up! Indeed, this is a HORRIBLE script, stupid idea, lame lame lame.
This would be a great way to introduce a really NASTY security hole into your site by using this script.
I only post comments when someone on the internet is wrong.
When Slashbot's Attack...
Today's Feature:
Slashbot Moderators
Search engine on your desktop?
Joel (On Software http://www.joelonsoftware.com ) has mentioned SixDegrees as a potential Google on your Desktop. http://www.creo.com/sixdegrees
No more 25-man midnight raids that cart off your entire data center. Now the FBI or BSA can just pick up your search appliance.
You actually got results returned from your search server?
Lucky bastard. Our corporate Intranet search engine usually would just return 'Query Timed out'. Eventually they just took the search boxes off all the web pages.
I've since built a simple Harvest index for the Intranet.
It can be very interesting finding all of the 'cobweb' documents on intranet sites. Ancient documents relating to projects and managers long since vanished among other stuff that management would prefer to see forgotten...
There are some cool features that are unique to Google, but I'm not sure if 'Convert PDF to HTML' and 'highlight search terms' are worth $20K.
I do not deploy Linux. Ever.
Aside from the GNU license and association with SourceForge, I'm not sure what advantages ht://Dig has over the other free/commercial indexing products. Perhaps somebody has a comparison page?
I do not deploy Linux. Ever.
Actually, it's perfect for searches about that size, and bigger even. When you talk about Fast Find (at least the later versions), you're actually talking about the Windows Index Server in drag. Index Server is a fairly robust piece of work that allows sites to implement (as a part of Commerce Server, SQL Server, and others) full text searches across the media. Its componentized nature makes it convenient to use from VB/VBS/ASP/other COM capable languages. Not too bad actually...
The joke was about Fast Find though which, IMO, is the most crufty unfriendly piece of sh*t ever incorporated into MS Office. In Office 95, 97, and 2000 (haven't tried Office XP yet) it's something I systematically eradicate on every machine I see. It's known for firing up its re-indexing while the user is already using the machine, and it's also known for not being controllable by the user (i.e. the user can't tell it when to re-index).
Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!
I'm sorry but what exactly is wrong with software patents? Did I fall asleep and suddenly wake up in a socialist country or something?
Mac OS X and Windows XP working side by side to fight back the night.
Imagine a Beowulf Cluster of THESE!!!
Part of their business is licensing out the engine to other companies such as AOL.
Mac OS X and Windows XP working side by side to fight back the night.
A bit late to get into the game, isn't it? I mean, there are a number of document management systems already out there. PC Docs, etc. And these are VERY powerful systems. It makes you wonder how good Google's system is going to be.
And while you cough at the $20k pricetag, that seems about right for what you are looking to do.
RonB
It is human nature to take shortcuts in thinking.
Actually, saying it doesn't have all the fancy matching algorithms isn't really fair.
Granted, we can't implement Google's patented things, but that's not to say we don't come close.
Indexing the text of links to documents? Yes.
http://www.htdig.org/attrs.html#description_fac
Keeping track of the weight of links pointing to a document? Yes.
http://www.htdig.org/attrs.html#backlink_factor
Probably the big "missing link" is a proximity weighting. Interested? Help is always welcome!
-Geoff
htdig has made me a hero here. Mostly because of its reliability and price.
It astonishes me how people can sell something that's already free. Canned air will be next.
- Freddy
Google's selling this to corporations for $20,000 per two year license. Our company is hopefully about to buy one... to replace the $250,000 per year Verity product that just doesn't work that well! To be fair, the Verity engine also indexes Lotus Notes and Oracle databases, but apparently Google's about to do that too. Heheh, I guess when they add that support the only two differences between Verity and Google will be that Verity costs 25x as much (over time) and... Verity doesn't work very well!
...is it weird that Slashdot doesn't have a specific Google topic yet?
It seems to me that Google's (patent-pending) pagerank algorithm wouldn't be of much help on an intraweb. The link structure of a single website mostly reflects design decisions, and hardly says anything about the popularity/authority/value of a page. And even if it did, it wouldn't be very objective (let's call that "inter-subjective") since the site is probably maintained by a rather small group of people.
If that is so, why choose Google over a cheaper competitor?
Being well balanced is overrated. -- John Carmack
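To make the PageRank-on-an-intranet point above concrete, here is a rough sketch in Python of the textbook power-iteration form of PageRank run over a tiny made-up site. This is a simplification, not Google's actual implementation; the `intranet` graph and page names are hypothetical, and 0.85 is just the commonly cited damping value. Under these assumptions the page that the site's navigation links to from everywhere comes out on top, which says more about the site design than about what employees actually need.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.
    links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Hypothetical intranet: every page links back to "home" via the nav bar.
intranet = {
    "home": ["hr", "eng"],
    "hr": ["home", "holiday-policy"],
    "eng": ["home", "design-doc"],
    "holiday-policy": ["home"],
    "design-doc": ["home"],
}

ranks = pagerank(intranet)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:15s} {ranks[page]:.3f}")
```

Whether that ordering is useful is exactly the question raised above: the leaf documents people actually search for end up at the bottom.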
Google's Search Appliance - I thought that would be another of those internet appliance gadgets.
Think about it: one in the kitchen so I-can't-double-click-mom can get her receipts, another in the garage for AOL-dad's do-it-yourself shop, and so on...
My life's goal is to get a score of +3!
If you have been to Europe you know that Mercedes DOES sell cheap cars. They are like European Fords. You see Mercedes buses, tractors, compacts, everything. They are so common that that's what people think of when they see the symbol, and they can't sell as many sports cars or SUVs. So they export all the high-end cars here, where we buy them.
Point is, I agree that this is a smart Google move. You separate the market, and give people in both places the things that they want. That's why you are never going to see an ad banner on Google trying to get the average surfer to buy their $20k engine.
Sigs are out of style, so I'm not going to use one...oh wait..
This is a very neat free-text indexing engine. Not sure about how much it costs though...
How would this model work in an intranet setting? Would it count the number of desktop shortcuts? Yes, the algorithm works great in the internet world, but is it a universal find-all?
I noticed this last week when searching Cisco's site. The addition of the "powered by Google" snippet in the upper right hand corner of the search results threw me for a loop.
I haven't noticed much of an improvement in their search results yet - perhaps it takes time to build the link relationships index?
Cheers,
J.J.
Don't think I'll be buying one anytime soon, but they sure do look slick. Here is a picture of the beauty.
Also, here's the press release that was sent out on the googlepress group:
-------------
Media Alert
-------------
February 11, 2002
Today, Google announced the availability of the Google Search Appliance, an integrated hardware/software solution that extends the power of Google.com to corporate intranets and web servers. The Google Search Appliance simplifies corporate search for administrators and makes it fast and easy for employees to find the intranet information they need.
The new product comes in two versions: GB-1001 for departments and medium-size companies with up to 150,000 documents, and the GB-8008 for large corporations with millions of documents. Google Search Appliance features include:
- Complete solution: both hardware and software
- Easy install: up and running in less than one hour
- Simple administration: simple and intuitive browser-based admin console
- High quality: quickly delivers relevant search results
- Affordable: pricing starts at $20,000 for two years of support and software updates
The Google Search Appliance was designed to address the growing demand for simple, cost-effective search solutions within corporations. The Google Search Appliance is based on Google's award-winning search technology and provides a complete solution to companies that need search services to manage data behind the firewall.
An image of the Google Search Appliance can be found here:
http://www.google.com/press/images.html.
Additional product information can be found at:
www.google.com/appliance.
What about those folks in Norway? They did just the same thing two years ago and nobody noticed.
That's what I have been requesting from our IT guys and managers for a long time. Of course there are personal and other matters that shouldn't be spidered and indexed. I imagine a kind of email client which has some default setting, but also asks you before sending off an email: should this message be open for corporate search?
However, since I am working for a large international financial institution where client data privacy is very, very important, it's very difficult to get the attention of the management for such ideas.
For a larger site or for distributed harvesting there is Combine, which is an old one from 1996. It does text, HTML, and PDF. It's free, but takes a bit of time to set up, and it can even handle metadata (i.e. keywords). There are binaries for Linux and Solaris, but most of it is in Perl.
It's about to begin some modernization to make it easier to install and operate, perhaps even use MySQL as a backend.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
This is not including fairly high maintenance fees...
When you consider that any corporate site could be a window on a huge corporate database of information (literally, each web page could be a database record), you could blow through hundreds of thousands of documents easily.
For the uber-geeks out there, search engines work nice with configuration management systems (like Perforce) for searching source code for large projects.
By the way, my experience with the Altavista product is that it is very buggy and unreliable (they re-wrote it in Java, any surprise?), so Google's entry into the field is welcome.
That is peanuts compared to Thunderstone, although it's a special type of relational database. Thunderstone runs into the millions, and people buy it!
Check out Endeca - they use your structured data (in your case, a multi-faceted controlled vocabulary) to help you explore the results of your search on unstructured text. E.g., type "early childhood" in the search box and instantly get back thousands of matches, but placed in a precise context that tells you exactly how to refine (by facets like subjects, medicines, labs, authors, dates, etc., with only the categories valid for that unique search shown). New and really cool technology.
Really stylish. It'd probably look great next to a Cobalt.