The Google Search Server

Neat insides by AKAImBatman · 2005-09-06 03:06 · Score: 5, Insightful

Let's see here:

Took lots of pretty pictures [Check]
Tore the box apart wondering if we could finally find a flux capacitor [Check]
Tried to play with all the hardware and software we've been encouraged to leave alone. [Check]
Actually tested how the device performed doing its intended function? [Why would you want to do that?]

--
Javascript + Nintendo DSi = DSiCade

Re:Neat insides by b0r1s · 2005-09-06 03:13 · Score: 4, Informative

These are neat little boxes - we've managed 2 (the yellow appliance, and the blue mini appliance), and the performance of both was pretty nice.

The tools google provides (very easy binary updates, strong web control panel, for example) turn the relatively common task into a dead-simple, point-and-click configuration.

They even provide a decent interface for skinning the search pages, and while it's not perfect, it's certainly adequate for even the best looking sites on the internet.

--
Mooniacs for iOS and Android

Google is Dead anyway by stecoop · 2005-09-06 03:08 · Score: 3, Funny

Microsoft's Ballmer Threatened To 'Kill' Google

Re:Google is Dead anyway by Anonymous Coward · 2005-09-06 03:18 · Score: 5, Funny

"I'm going to bury that guy, I have done it before, and I will do it again. I'm going to kill Google."
- Steve Ballmer

"Whether you like it or not, history is on our side. We will bury you."
- Nikita Khrushchev

Did Ballmer take off his shoe and start banging on the podium while he talked?

Re:Google is Dead anyway by Jugalator · 2005-09-06 05:40 · Score: 5, Funny

"I'm going to bury that guy, I have done it before, and I will do it again. I'm going to kill Google."

This should clearly tell you that Google is already undead, and keeps rising again. He has already killed them before. Don't worry!

--
Beware: In C++, your friends can see your privates!

AnandTech not very search optimization savvy by DeadSea · 2005-09-06 03:12 · Score: 5, Informative

The Mini considers any unique URL string to be a unique document, which makes sense (but is a bit surprising the first time that you run an index). After four hours of indexing, the Mini had managed to reach its document limit and we had to improvise.

Anybody who doesn't know that search engines consider each URL to contain a unique document doesn't know much about getting their site properly represented in search engines.

Their solution was to create a list of urls for the appliance to crawl. If they had to do that for the search appliance, there is no way that googlebot, msnbot, or yahoo slurp is going to be able to properly index their site.

Your publicly accessible URLs need to be managed and canonicalized through judicious use of robots.txt, 302 redirects, site-wide linking, and just plain thinking out the layout of your site.
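To make the "each URL string is a unique document" point concrete, here is a minimal Python sketch of URL canonicalization. The rules and the IGNORED_PARAMS list are assumptions chosen for illustration; they are not how the Mini, Googlebot, or any other crawler actually normalizes URLs, and the parameters worth ignoring depend entirely on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}


def canonicalize(url: str) -> str:
    """Reduce equivalent URL strings to one canonical form (illustrative rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()      # host names are case-insensitive
    path = parts.path or "/"         # treat an empty path as the root
    # Drop ignorable parameters and sort the rest so ordering does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


variants = [
    "http://www.example.com/show.php?a=1&b=2",
    "HTTP://WWW.Example.com/show.php?b=2&a=1&sessionid=xyz#top",
]
# Both variants collapse to a single entry, i.e. one document instead of two.
unique_documents = {canonicalize(u) for u in variants}
```

A crawler that compares raw strings would count those two variants as separate documents; on the site itself, the same idea is usually enforced with redirects and consistent internal links rather than code on the crawler side.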

Re:AnandTech not very search optimization savvy by Anonymous Coward · 2005-09-06 04:28 · Score: 3, Informative

... which flows right into this statement:
A word to the wise: don't let the Mini crawl your entire site without keeping a close eye on it.

The same could be said of any search engine, or any automated process for that matter. We use ht://Dig and the issues are the same, except that ht://Dig can run locally on the server, which saves bandwidth and speeds up indexing: it indexes static files locally and rewrites their URLs, and goes through Apache for dynamic pages. It's free, and you aren't limited to 100,000 documents. It supports the same feature set, minus the Google GUI.

Of course, it does have a steeper learning curve... you actually need to understand how search, URL filters, regex, synonyms, etc. work.

I'd provide screenies, but most people glaze over when confronted with terminal output ; ) A shell just isn't as hip as an html gui. What else can I say?

L8,
AC

Oh come on by Black+Perl · 2005-09-06 03:19 · Score: 4, Funny

First, it wasn't a review. They didn't review anything.

Second, it was a Google Mini.

Third, they didn't "put it through its paces" at all.

Lousy article, misleading /. blurb. But it was about Google! Gooooooooogle!

--
bp

Good, but... by hazzey · 2005-09-06 03:21 · Score: 5, Interesting

While this is an interesting article, it really isn't much of a review of the Google Mini. All they do is take it apart, take pictures, and tell you that they set it up after a little bit of trouble. There is nothing about how well it actually works. No benchmarks. No comparisons. They just say that it worked well and leave it at that. Anandtech has had more in-depth reviews of mice before.

It is more information than I have seen anywhere else though.

Free Google T-Shirt by nudeatom · 2005-09-06 03:21 · Score: 5, Funny

That's it, I gotta get me one of those just for the tee.

--
Yeah right, Like Im gonna write a sig.

It's "its"! by dtmos · 2005-09-06 03:21 · Score: 5, Informative

The guys from anandtech put it through it's paces

It's really easy: It's "his", "hers", and "its". Even a flower knows!

--cycling through grammar Nazi mode. Please wait.

Re:It's "its"! by Traa · 2005-09-06 03:36 · Score: 3, Informative

Use "it's" when you can replace it with "it is"

Well, that is what someone told me anyway. English is not my primary language, if the above is not correct then please don't shoot me.

Re:It's "its"! by radishes · 2005-09-06 05:29 · Score: 4, Informative

and use its' when it's possessive
john's coming to get johns' hat

Don't listen to this guy. He has lied to you twice. 1) Its' is never valid. 2) The example with John is just so wrong it hurts. "John is coming to get John's hat." You use 's for possessive; s' is for possessive plural, like this: "Slashdotters tend to live in their parents' basement."

where's the raid? by Darth_Burrito · 2005-09-06 03:22 · Score: 5, Interesting

Did it strike anyone else as insane that this thing only had one hard drive? For $3,000, where's the raid array? Ok, sure it's a search appliance and doesn't really hold any mission critical data, but if the hard drive crashes, how long is your search functionality going to be down? You'll need to get a replacement drive and rebuild your whole database (a slow crawl process). What about your configuration settings?

Re:where's the raid? by horati0 · 2005-09-06 03:29 · Score: 5, Funny

Did it strike anyone else as insane that this thing only had one hard drive? For $3,000, where's the raid array?

Here.

--
The neutrality of this sig is disputed.

Re:where's the raid? by slim · 2005-09-06 03:32 · Score: 5, Informative

I guess if you want RAID, you pay more than $3,000.

What you're really buying here is closed-source software, wrapped in the hardware that turns it into an "appliance". Assume $2,000 of that $3,000 pays for the software.

By specifying the hardware in this way, and by keeping the BIOS and root passwords to themselves, Google greatly simplify their support role.

This is common practice: an IBM HMC (Hardware Management Console) is a 1U PC with a custom Linux distribution and the management software preinstalled. You don't get the root password; you just use the software as delivered.

Try searching the site for "google mini" by openSoar · 2005-09-06 03:24 · Score: 4, Funny

Maybe it takes a while for the documents to be indexed but you'd think they would have added it manually given the nature of the article.

Google ate my server by PIPBoy3000 · 2005-09-06 03:28 · Score: 5, Interesting

A few months ago, we asked for a demo of the product. My main involvement was to help compare with our existing search strategy. Just to cut to the chase, we generally had a very positive experience with it. Searches would bring up what we wanted more often than not. Our existing search system, which was based around IIS and custom SQL code, was pretty good, though it couldn't beat Google for pulling up relevant pages. We did have a few quirky things happen, though.

We had a couple times when the appliance locked up and had to be rebooted. That was probably the most distressing as it had to be on 24x7 to support our organization and I wasn't looking forward to the help desk calls.

More amusing, though, was the way it crawled content. Google works like any other crawler - it goes around and clicks hyperlinks. Unfortunately it's not too bright, not paying attention to the text of the hyperlink, like if it said "delete" or something like that.

Unfortunately I had a poorly secured application that Google was able to sneak into via another link I wasn't aware of. It held the custom links for each of our departments to display a personalized set of links on the home page. Unfortunately it went through the admin tool and clicked every delete link it could find. I was paged the next morning and was fairly unhappy. My fault, though.

The irony is that the budget money evaporated and we aren't getting it after all.

From TFA by Anonymous Coward · 2005-09-06 03:41 · Score: 5, Funny

The screw is threaded - it just can't be undone with a regular screwdriver.

Right.. Only unthreaded screws can be opened by a regular screwdriver.

Re:GPL compliance by Anonymous Coward · 2005-09-06 03:42 · Score: 3, Informative

http://code.google.com/mirror/gsa.html

For those who're interested... by Homicide · 2005-09-06 03:44 · Score: 5, Informative

I admin a full-blown Google Search Appliance, the Mini's big brother.

If you want the specs:
Dual Xeon 2.6GHz
12GB RAM
4 250GB HD's in RAID(something) with a hot-swap spare.

Never tried taking off the cover though, since we want to keep the warranty.

All of the money you pay is a license for the software on the box, the system itself is effectively free, so once the 2 year warranty expires, you've effectively got a nice powerful linux box for free. You can keep running the software, but without any support.

As for performance, this thing works great. We have about 250,000 pages that it can index, both public and private (and it can do searches cleverly, checking username/password to see if you should have access to certain results), and we've had nothing but positive responses from our users. The results come up quickly, they're the results people want, and the results that management think should be at the top are at the top.

Re:For those who're interested... by Homicide · 2005-09-06 03:59 · Score: 4, Informative

It submits an HTTP HEAD request for the URL to the server the page is on, with the username and password supplied, so the server at the other end decides if you should be able to see the search results, thus saving you from having to faff around telling the Google box who can get to what pages.
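For anyone curious what that check looks like on the wire, here is a rough Python sketch. It assumes HTTP Basic authentication and treats any 2xx answer as "may see this result"; the appliance's real mechanism and status handling are not documented in this thread, and the function names are made up for illustration.

```python
import base64
import urllib.error
import urllib.request


def build_access_check(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a HEAD request carrying the searching user's credentials.

    Basic auth is an assumption; a real deployment may use another scheme.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="HEAD")
    request.add_header("Authorization", f"Basic {token}")
    return request


def user_may_see(url: str, username: str, password: str) -> bool:
    """Ask the origin server whether this user can fetch the page."""
    try:
        with urllib.request.urlopen(build_access_check(url, username, password), timeout=5) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # 401, 403 and similar: the origin server said no, so hide the result.
        return False
```

HEAD is a good fit because the server runs its normal access checks but sends no body, so filtering a page of results costs a few small round trips instead of full page downloads.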

product review: the yellow GSA by msblack · 2005-09-06 03:53 · Score: 3, Informative

We evaluated one of those yellow Google search appliances (GSA) and experienced very mixed results. The appliance is very easy to set up, and it is easy to launch an initial scan of our website.

The GSA will blindly search all web servers in your domain. When setting-up the GSA, you give it an initial page from which to start crawling and baseline domains. For example:

Initial page: http://www.slashdot.org/
Domain(s): .slashdot.org,slashdot.org

The leading dot on the first domain entry says to search all hosts in the domain.
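Here is a small Python sketch of how that kind of domain list could be interpreted. The matching rule is an assumption inferred from the description above (a leading dot matches any host under the domain, a bare entry matches only that exact host); it is not the GSA's actual implementation, and host_in_scope is a made-up name.

```python
def host_in_scope(host: str, patterns: list[str]) -> bool:
    """Return True if host is covered by one of the domain patterns."""
    host = host.lower().rstrip(".")
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("."):
            # ".slashdot.org" covers www.slashdot.org, games.slashdot.org, ...
            if host.endswith(pattern):
                return True
        elif host == pattern:
            # "slashdot.org" covers only the bare host itself.
            return True
    return False


# The example configuration from above, split on the comma.
PATTERNS = ".slashdot.org,slashdot.org".split(",")

in_scope = host_in_scope("www.slashdot.org", PATTERNS)       # subdomain
also_in_scope = host_in_scope("slashdot.org", PATTERNS)      # bare domain
out_of_scope = host_in_scope("notslashdot.org", PATTERNS)    # different domain
```

Under this reading, listing both entries is what lets the crawler cover the bare domain as well as every host beneath it.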

Problem: GSA does not provide very good status of where or what it is searching. It only has a dashboard light to say it is crawling. No details.

Problem: We found that the GSA would get caught in an endless loop if it encountered a user website controlled by a database. It would endlessly follow the next and previous links to find every database entry.

Our university library subscribes to a number of electronic databases, such as, EBSCO PsychINFO, etc. The GSA indexed every possible look-up.

Our eval license was limited to 1.5 million pages. Some of these databases contain hundreds of thousands of pages. Solution: Those setting up their own web server must employ proper robots.txt files or risk having their entire server blocked from indexing.
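As a sketch of that solution, the Python below parses a robots.txt and checks which URLs a well-behaved crawler would be allowed to fetch. The paths, host name, and the "example-crawler" user agent are hypothetical; use whatever user-agent string your crawler actually announces.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping crawlers out of database-driven areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /databases/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


lookup_ok = allowed(ROBOTS_TXT, "example-crawler",
                    "http://www.example.edu/databases/lookup?id=1")
home_ok = allowed(ROBOTS_TXT, "example-crawler",
                  "http://www.example.edu/index.html")
```

Excluding the database front ends this way keeps the endless next/previous pages from eating the page limit, while the rest of the server stays indexable.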

--
signature pending slashdot approval

Re:After BIOS and before web-interface? by Homicide · 2005-09-06 03:54 · Score: 3, Interesting

If it's the same as its big brother, then it boots up into RedHat Linux. You can watch all the usual bootup things happening, just not interfere with them, as the keyboard is ignored.

It does end up at a login prompt, but you're not given any usernames or passwords to access it.

Why some places won't buy this by BenEnglishAtHome · 2005-09-06 03:55 · Score: 5, Interesting

The pictures are pretty and I'll assume the thing works. Some folks, however, won't buy it because they don't want their intranets to work like you or I might expect. Let me explain.

I work for a large TLA govt agency. I've begged our people to get something like this. I know, from working with our folks and doing my own digging, that we have a wealth of knowledge tucked away, here and there, on local group shares and out-of-the-way internal web sites. And yet our internal search function is ludicrously bad. It works off "key words" that are simply a manually maintained (I think) list of useless, often off-the-mark descriptions of approved sites of general interest. Special-interest pages are not indexed in this way. The crawler, if you want to call it that, is terrible at doing its job. Enter a string of text and get a hit on a known, universally accessible web page containing that exact string? Not a chance. I test it occasionally and find that it remains as ridiculous as ever, with a level of functionality that would have been technologically uninteresting the better part of a decade ago but is, in this day, infuriating to users.

The reason for all this is that if our intranet were automatically crawled, well indexed, and truly searchable, people would be able to find things. People in Work Area A would be able to see how they might be impacted by something going on in Work Area B. Horrors! That would mean that management would lose much of their ability to keep employees selectively in the dark.

All this came to a head a number of years ago. At that time, our intranet content was maintained by IT. Anybody that wanted a site (literally anybody) could just get their first-line manager to approve the request and they'd get server space and some help setting up a page or two. The exchange of information that started happening was highly disruptive, so a "Communications and Liaison" office was set up that wrenched control of the intranet from IT and required (what seems to be essentially political) approval of the business case for anything that went online. No web sites unless the Communications gods approved.

Nowadays, the employees of one division are only vaguely aware that other divisions exist or have web sites. Each individual fiefdom is protected from the ravages of communications that don't strictly follow the org chart lines. I guess the executives in charge are happy in their insulated little worlds.

If you're going to sell an effective intranet search tool, you're going to have to face the fact that lots of large organization leaders (and you find the same attitudes in both the public and the private sector) would recoil in horror at the thought of having their intranet be effectively searchable. It's too threatening.

Don't use GET to modify application state! by Augusto · 2005-09-06 03:56 · Score: 4, Informative

The problem is not Google, it's the way your app is designed!

Universal Resource Identifiers -- Axioms of Web Architecture : Identity, State and GET

In HTTP, GET must not have side effects.

In HTTP, anything which does not have side-effects should use GET

If somebody visited your site with a pre-fetching tool like the Google Web Accelerator, you would also find the "delete" links being followed automatically like this. Change those deletes to use POST instead.
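A minimal Python sketch of the idea, with a made-up route and in-memory data standing in for the real application: the delete action refuses anything but POST, so a crawler or prefetcher that only issues GET requests cannot trigger it.

```python
from http import HTTPStatus

# Hypothetical stand-in for the per-department links table.
links = {1: "Payroll", 2: "Facilities"}


def handle(method: str, path: str) -> HTTPStatus:
    """Dispatch a request; only POST is allowed to change state."""
    if path.startswith("/links/delete/"):
        if method != "POST":
            # GET and HEAD must be safe: refuse instead of deleting.
            return HTTPStatus.METHOD_NOT_ALLOWED
        link_id = path.rsplit("/", 1)[1]
        if not link_id.isdigit():
            return HTTPStatus.BAD_REQUEST
        links.pop(int(link_id), None)
        return HTTPStatus.NO_CONTENT
    if method in ("GET", "HEAD"):
        # Read-only pages: safe for crawlers to follow.
        return HTTPStatus.OK
    return HTTPStatus.METHOD_NOT_ALLOWED


crawler_result = handle("GET", "/links/delete/1")   # what a crawler would send
still_there = 1 in links
admin_result = handle("POST", "/links/delete/1")    # what the admin form would send
gone = 1 not in links
```

This only stops accidental deletes from well-behaved clients. It does nothing for the "poorly secured" part, so the admin tool still needs real authentication.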

--

- sigs are for wimps.

Re:I tested it.... by jshaped · 2005-09-06 04:12 · Score: 3, Interesting

offtopic?

At anandtech's website,
to test the ability of their google search server,
I searched for the title of that article.
You would think it would point me to the article;
it did not.

Carpeting by ukleafer · 2005-09-06 07:45 · Score: 3, Funny

Anyone else think the Anandtech server room has some lovely, lovely carpets?
