The Google Search Server

Neat insides by AKAImBatman · 2005-09-06 03:06 · Score: 5, Insightful

Let's see here:

Took lots of pretty pictures [Check]
Tore the box apart wondering if we could finally find a flux capacitor [Check]
Tried to play with all the hardware and software we've been encouraged to leave alone. [Check]
Actually tested how the device performed doing its intended function? [Why would you want to do that?]

--
Javascript + Nintendo DSi = DSiCade

Re:Neat insides by b0r1s · 2005-09-06 03:13 · Score: 4, Informative

These are neat little boxes - we've managed 2 (the yellow appliance, and the blue mini appliance), and the performance of both was pretty nice.

The tools google provides (very easy binary updates, strong web control panel, for example) turn the relatively common task into a dead-simple, point-and-click configuration.

They even provide a decent interface for skinning the search pages, and while it's not perfect, it's certainly adequate for even the best looking sites on the internet.

--
Mooniacs for iOS and Android
Re:Neat insides by op12 · 2005-09-06 03:16 · Score: 2, Funny

4. Actually tested how the device performed doing its intended function? [Why would you want to do that?]

Quit complaining, it's not like this was being called an in-depth review.....oh, wait.
Re:Neat insides by Anonymous Coward · 2005-09-06 03:17 · Score: 1, Funny

Actually tested how the device performed doing its intended function?
You can do this yourself; try searching the Anandtech site. It's quick, and the results look like Google results.
Re:Neat insides by Anonymous Coward · 2005-09-06 03:31 · Score: 1

I think what most people were looking for was a little blurb like, "After using it for a while, we must say that this ROCKS in comparison to our previous search engines!" Or alternatively, "After some testing, we were a bit disappointed in the quality of the results. We really expected better out of Google, and plan to contact them to see if there's anything that can be done to improve the results of the device."
Re:Neat insides by hackstraw · 2005-09-06 03:47 · Score: 2, Interesting

I wish we would get one of those google appliances instead of whatever horrible search "solution" we have now. I use google with site:mysite.com to search our website.

When looking at the google appliances, I thought it was really cool how it learns your specific terms and acronyms and it will do the "Did you mean correctspellingword?" like google does.

Pretty slick from what I gather. I have no direct experience except for google proper.
Re:Neat insides by MarkGriz · 2005-09-06 05:22 · Score: 2, Funny

2. Tore the box apart wondering if we could finally find a flux capacitor [Check]

I must say I'm disappointed that this is what Google passes off as a flux capacitor.

--
Beauty is in the eye of the beerholder.

Google is Dead anyway by stecoop · 2005-09-06 03:08 · Score: 3, Funny

Microsoft's Ballmer Threatened To 'Kill' Google

Re:Google is Dead anyway by Anonymous Coward · 2005-09-06 03:18 · Score: 5, Funny

"I'm going to bury that guy, I have done it before, and I will do it again. I'm going to kill Google."
- Steve Ballmer

"Whether you like it or not, history is on our side. We will bury you."
- Nikita Khrushchev

Did Ballmer take off his shoe and start banging on the podium while he talked?
Re:Google is Dead anyway by chris_eineke · 2005-09-06 04:01 · Score: 1

Ballmer confirms, Google is dead! ;)

--
"All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke
Re:Google is Dead anyway by Jugalator · 2005-09-06 05:40 · Score: 5, Funny

"I'm going to bury that guy, I have done it before, and I will do it again. I'm going to kill Google."

This should clearly tell you that Google is already undead, and keeps rising again. He has already killed them before. Don't worry!

--
Beware: In C++, your friends can see your privates!
Re:Google is Dead anyway by WilliamSChips · 2005-09-06 08:54 · Score: 1

Sigh... Khrushchev didn't say "We will bury you". He said something more like "We will hold your funeral".
Wikipedia

--
Please, for the good of Humanity, vote Obama.
Re:Google is Dead anyway by c0n0 · 2005-09-06 08:58 · Score: 2, Funny

Remember he does things the M$ way...the kill() routine has a bug, that's why google is still alive.
Re:Google is Dead anyway by Anonymous Coward · 2005-09-07 03:12 · Score: 0

Khrushchev actually said "We will show you Kuzma's mother", but I doubt non-Russian speakers will understand that.

Now, Ballmer really did bury Eric Schmidt two times. Eric used to work for Sun Microsystems. Sun is pwned. Eric worked for Novell and, to be honest, didn't perform so badly. But he left Novell pwned anyway.

Whether you like it or not, Ballmer is a helluva competitor. I personally like his attitude.

AnandTech not very search optimization saavy by DeadSea · 2005-09-06 03:12 · Score: 5, Informative

The Mini considers any unique URL string to be a unique document, which makes sense (but is a bit surprising the first time that you run an index). After four hours of indexing, the Mini had managed to reach its document limit and we had to improvise.

Anybody who doesn't know that search engines consider each URL to contain a unique document doesn't know much about getting their site to be properly represented in search engines.

Their solution was to create a list of urls for the appliance to crawl. If they had to do that for the search appliance, there is no way that googlebot, msnbot, or yahoo slurp is going to be able to properly index their site.

Your publicly accessible URLs need to be managed and canonicalized through judicious use of robots.txt, 301 redirects, site-wide linking, and just plain thinking out the layout of your site.
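
To picture the canonicalization advice above, here is a rough sketch of a function that maps the variants of a URL onto one form. It is only an illustration: the parameter names `print` and `sessionid` are made-up examples of view-only query parameters, not anything taken from AnandTech's site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters that change the view, not the document.
IGNORED_PARAMS = {"print", "sessionid"}

def canonicalize(url: str) -> str:
    """Collapse URL variants that point at the same document."""
    parts = urlsplit(url)
    # Drop view-only parameters and sort the rest so ordering does not matter.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Lowercase scheme and host; discard the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

urls = [
    "http://Example.com/article?id=42&print=1",
    "http://example.com/article?id=42",
    "http://example.com/article?id=42#comments",
]
unique = {canonicalize(u) for u in urls}
print(unique)
```

On the site itself, the same mapping would normally be enforced with permanent redirects so that crawlers only ever see the canonical form.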

Re:AnandTech not very search optimization saavy by Moby+Cock · 2005-09-06 03:15 · Score: 2, Funny

All of your points are valid. But you need to include countless digital photos to make sure that people think you know what it is you are talking about. Just like Anandtech.
Re:AnandTech not very search optimization saavy by Anonymous Coward · 2005-09-06 03:31 · Score: 0

you mean like a sitemap?
Re:AnandTech not very search optimization saavy by Anonymous Coward · 2005-09-06 03:59 · Score: 0

The entire POINT of a search appliance like this is that you shouldn't need to be search-optimization-savvy to get good results from it.
Re:AnandTech not very search optimization saavy by Anonymous Coward · 2005-09-06 04:28 · Score: 3, Informative

... which flows right into this statement:
A word to the wise: don't let the Mini crawl your entire site without keeping a close eye on it.

The same could be said of any search engine, or any automated process for that matter. We use ht://Dig and the issues are the same, except that ht://Dig can be run locally on the server, which saves bandwidth and speeds up indexing: static files are indexed locally with their URLs rewritten, while dynamic pages go through Apache. It's also free, and you aren't limited to 100,000 documents. It supports the same feature set, minus the Google GUI.

Of course, it does have a steeper learning curve... you actually need to understand how search, URL filters, regex, synonyms, etc. work.

I'd provide screenies, but most people glaze over when confronted with terminal output ; ) A shell just isn't as hip as an html gui. What else can I say?

L8,
AC
Re:AnandTech not very search optimization saavy by K-Man · 2005-09-06 06:00 · Score: 1

The real Google has duplicate detection to handle situations like these. Their crawl ends up with something like 30% duplicates from different sources, things like the same online manual repeated on dozens of different sites, mirrors, multiple servers, etc. They use various approximate-matching algorithms to find near-duplicates and merge them, so that the search results don't show the same document with a hundred different urls.
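
For anyone wondering what "approximate matching" might look like, here is a toy sketch of one textbook approach: word shingles compared by Jaccard similarity. This is an assumption for illustration, not a description of what Google actually runs, and the sample documents and the 0.5 threshold are invented.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs: dict, threshold: float = 0.5) -> list:
    """Return pairs of URLs whose shingle sets overlap by at least threshold."""
    sets = {url: shingles(text) for url, text in docs.items()}
    urls = sorted(sets)
    return [
        (u, v)
        for i, u in enumerate(urls)
        for v in urls[i + 1:]
        if jaccard(sets[u], sets[v]) >= threshold
    ]

# Invented sample pages: an article, its print view, and an unrelated page.
docs = {
    "/article/1": "the google mini is a small yellow search appliance for your site",
    "/article/1?print=1": "the google mini is a small yellow search appliance for your site printable",
    "/article/2": "ballmer threatened to throw a chair across the room",
}
pairs = near_duplicates(docs)
print(pairs)
```

Comparing every pair like this is quadratic in the number of documents, so a real crawl would need some hashing or sketching scheme on top; the point here is only the matching idea.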

Unfortunately feature holes like this are why the thing hasn't taken off. If I have to submit a list of urls to avoid duplication, I might as well index the stuff myself.

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
Re:AnandTech not very search optimization saavy by rsmith-mac · 2005-09-06 06:09 · Score: 2, Interesting

Keep in mind, AnandTech's previous search systems were all on the DB end, so it only counted each article once. Google Mini on the other hand counts the normal view of an article, the print view, etc. It is a very important consideration if you're moving from DB-based searching.
Re:AnandTech not very search optimization saavy by randomblast · 2005-09-06 07:50 · Score: 1

> A shell just isn't as hip as an html gui.

Get out. Wannabe!

--
...these aren't my real teeth.

Was this a review? by defkkon · 2005-09-06 03:12 · Score: 1, Informative

Was this a hardware review, or was this an instruction manual?

I gotta say, I was looking for benchmarks, usability scores, maybe some test scenarios. Even better, compare this to other products available out there.

It looked promising at the start, but when you get to the last page it leaves you wondering if they forgot the hyperlinks for the rest of the article!!

Re:Was this a review? by Chaotic+Spyder · 2005-09-06 03:17 · Score: 0, Redundant

But they took so many pretty pictures.

--
Losers whine about their best, Winners go home to fuck the prom queen
Re:Was this a review? by Knight+Thrasher · 2005-09-06 03:19 · Score: 0, Redundant

I agree, I thought I'd get a little opinion on the device itself, and ended up seeing the only opinion given was that gee, not many people use PIII processors anymore.
That aside, it was a neat looking 1U case though.
Re:Was this a review? by LiquidCoooled · 2005-09-06 03:22 · Score: 2, Funny

The Microsoft search box comes with inbuilt Ballmer power conduit!
This revolutionary interface will fire off your search responses as accurately as a plastic chair bouncing around the room.

--
liqbase :: faster than paper
Re:Was this a review? by HotNeedleOfInquiry · 2005-09-06 04:39 · Score: 1

The only question you have to ask yourself is "will this work to index Taco's porn collection?"

--
"Eve of Destruction", it's not just for old hippies anymore...

subcontractors by hey · 2005-09-06 03:17 · Score: 1, Funny

So Google subcontracted a company called GigaByte to make this box.
I was disappointed to see GigaByte didn't use MegaByte to make some subcomponent.

Re:subcontractors by schon · 2005-09-06 03:47 · Score: 2, Funny

I was disappointed to see GigaByte didn't use MegaByte to make some subcomponent.

Maybe he was too busy trying to take over Mainframe? :o)
Re:subcontractors by fimbulvetr · 2005-09-06 03:50 · Score: 0

And Byte to pay for the prostitutes.
Re:subcontractors by SCO+STINKS · 2005-09-06 04:17 · Score: 0

The problem is GigaByte's parent company TeraByte has been widely criticized for its use of MegaByte.

--
Reason #32767 not to use VB6: Integers are 2 bytes... Think about it!
Re:subcontractors by DA-MAN · 2005-09-06 09:12 · Score: 1

Maybe he was too busy trying to take over Mainframe? :o)

Wow! That's a pretty obscure Reboot reference. I had totally forgotten about that show . . .

--
Can I get an eye poke?
Dog House Forum

Oh come on by Black+Perl · 2005-09-06 03:19 · Score: 4, Funny

First, it wasn't a review. They didn't review anything.

Second, it was a Google Mini.

Third, they didn't "put it through its paces" at all.

Lousy article, misleading /. blurb. But it was about Google! Gooooooooogle!

--
bp

Re:Oh come on by Anonymous Coward · 2005-09-06 03:46 · Score: 1, Funny

Okay Steve, Steve, Steve, you can put the chair down now.

Good, but... by hazzey · 2005-09-06 03:21 · Score: 5, Interesting

While this is an interesting article, it really isn't much of a review of the Google Mini. All they do is take it apart, take pictures, and tell you that they set it up after a little bit of trouble. There is nothing about how well it actually works. No benchmarks. No comparisons. They just say that it worked well and leave it at that. Anandtech has had more in-depth reviews of mice before.

It is more information than I have seen anywhere else though.

Re:Good, but... by Anonymous Coward · 2005-09-06 04:36 · Score: 0

well put. anand's site is a piece of shit. i cant believe he's gotten rich from doing absolutely nothing
Re:Good, but... by Donny+Smith · 2005-09-06 05:16 · Score: 2, Interesting

I was surprised that they've done what they did.
The terms & conditions probably forbid reverse engineering and/or disassembly of the appliance.

It would have been veeerrry easy to rip out the HDD and mount it on a Linux box to check out its internals....
They must have thought of that. As they've already ruined the warranty (by opening the box), it was probably the EULA or something like that that made them stop short of reviewing contents of the hard disks.
Re:Good, but... by NickCatal · 2005-09-06 07:02 · Score: 1

I'm personally surprised we don't have a leaked image of a Google Mini or Search Appliance HDD somewhere in the BitTorrent world...

--
-nick
Re:Good, but... by Anonymous Coward · 2005-09-06 08:02 · Score: 0

Your wish is my command...

Free Google T-Shirt by nudeatom · 2005-09-06 03:21 · Score: 5, Funny

That's it, I gotta get me one of those just for the tee.

--
Yeah right, Like Im gonna write a sig.

It's "its"! by dtmos · 2005-09-06 03:21 · Score: 5, Informative

The guys from anandtech put it through it's paces

It's really easy: It's "his", hers", and "its". Even a flower knows!

--cycling through grammar Nazi mode. Please wait.

Re:It's "its"! by dtmos · 2005-09-06 03:32 · Score: 2, Funny

Like a perfect vacuum, I believe nature abhors a grammar Nazi post without a grammatical error.

Make that "It's "his", her", and "its".

*sigh*

--completed grammar Nazi mode. Resuming normal operation.
Re:It's "its"! by Traa · 2005-09-06 03:36 · Score: 3, Informative

Use "it's" when you can replace it with "it is"

Well, that is what someone told me anyway. English is not my primary language, if the above is not correct then please don't shoot me.
Re:It's "its"! by aug24 · 2005-09-06 03:47 · Score: 1

Sorry mate, Grammer Nazi errors are recursive... when you opened the double quote to quotate the phrase containing your own grandparent post error, you didn't close it!

Should've used single quotes there in the first place, and confused everyone cos on computers they're drawn the same as apostrophes ;-)

J.

--
You're only jealous cos the little penguins are talking to me.
Re:It's "its"! by Anonymous Coward · 2005-09-06 04:27 · Score: 0

"It's" can also be used as a replacement for "it has" - but the rule you state above is already a lot closer to correct than what most Slashdot editors seem to use.
Re:It's "its"! by Anonymous Coward · 2005-09-06 05:29 · Score: 0

Could be "it has" as well.
Re:It's "its"! by radishes · 2005-09-06 05:29 · Score: 4, Informative

and use its' when it's possesive
john's coming to get johns' hat
Don't listen to this guy. He has lied to you twice. 1) Its' is never valid. 2) The example with John is just so wrong it hurts. "John is coming to get John's hat." You use 's for possessive; s' is for possessive plural, like this: "Slashdotters tend to live in their parents' basement."

Re:It's "its"! by Anonymous Coward · 2005-09-06 05:38 · Score: 0

Use "it's" when you can replace it with "it is"

Well, that is what someone told me anyway.

I'm a native speaker. Is this true? Indeed, it's.
Re:It's "its"! by sootman · 2005-09-06 06:58 · Score: 1

Yeah, it's really easy. Use an apostrophe when you want to show possession: Bob's, Sam's, It's.

Oh, wait, I followed the wrong rule in my "proof."

At least the whole "I before E" thing has a little rhyme that usually works.

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Re:It's "its"! by Dolda2000 · 2005-09-06 09:06 · Score: 1

That's not just correct, it is correct by definition. "It's" isn't a word in itself or anything, it's just short for "it is". The apostrophe is used to signify the removal of the space and the second "i". Nothing more, nothing less.
Re:It's "its"! by sparkz · 2005-09-06 11:01 · Score: 1

I suppose that's a valid usage by the literal definition, but it's not in common usage, certainly.
In your example, you want to emphasise that it is, as opposed to isn't.
Also, "Indeed, it is" isn't a proper sentence; there's no verb.
How about "It's true that you can replace 'it is' with 'it's'"?

--
Author, Shell Scripting : Expert Re
Re:It's "its"! by WillerZ · 2005-09-06 11:41 · Score: 1

Surely "Slashdotters tend to live in their parents' basements."

Unless there's just the one big basement?

--
I guess today is a passable day to die.
Re:It's "its"! by bhiestand · 2005-09-06 22:00 · Score: 1

Its' is never valid?!

What about my cousin, John Its, and his wife, Ashley Its. Surely their house could be referred to as "The Its'".

Or should I say Its's?

Now what if I told you that, in a way, its is just short for itis? I mean, that's why we use the apostrophe in the possessive, because a letter was omitted. "John's Olde Shope" used to be written as "Johnis Olde Shope". Then you bloody "modern" people came around and decided to just up and change everything, and the possessive "is" went the way of the brave Frenchman. I guess what I was trying to get at is that the apostrophe isn't there to denote possession, but rather to denote a missing letter. It still is, really, you just stopped using the long form. Hence:

Correctly placed apostrophes:
"John's hat" (Johnis)
"It's (It is) going to rain today"
"The Its' place (referring to Mr. and Mrs. Its, short for Its's)"

Terribly wrongly incorrectly placed apostrophes:
"Put this back in it's place" (its! no missing letter)
"CD's for sale" (no missing letter. not needed)
"'" (Never put an apostrophe in quotes like that, it's hard to read in most fonts. It's better to be grammatically incorrect, damnit)

No apostrophe needed:
"Put this back in its place"
"CDs for sale"
"My cousins are coming over for dinner"

I'm out of ideas to end this post, so I will be ending it briefly and leave you hanging.

--
SWM seeks new sig for a brief fling

where's the raid? by Darth_Burrito · 2005-09-06 03:22 · Score: 5, Interesting

Did it strike anyone else as insane that this thing only had one hard drive? For $3,000, where's the raid array? Ok, sure it's a search appliance and doesn't really hold any mission critical data, but if the hard drive crashes, how long is your search functionality going to be down? You'll need to get a replacement drive and rebuild your whole database (a slow crawl process). What about your configuration settings?

Re:where's the raid? by horati0 · 2005-09-06 03:29 · Score: 5, Funny

Did it strike anyone else as insane that this thing only had one hard drive? For $3,000, where's the raid array?

Here.

--
The neutrality of this sig is disputed.
Re:where's the raid? by slim · 2005-09-06 03:32 · Score: 5, Informative

I guess if you want RAID, you pay more than $3,000.

What you're really buying here is closed-source software, wrapped in the hardware that turns it into an "appliance". Assume $2,000 of that $3,000 pays for the software.

By specifying the hardware in this way, and by keeping the BIOS and root passwords to themselves, Google greatly simplify their support role.

This is common practice: an IBM HMC (Hardware Management Console) is a 1U PC with a custom Linux distribution and the management software preinstalled. You don't get the root password; you just use the software as delivered.
Re:where's the raid? by fimbulvetr · 2005-09-06 03:54 · Score: 1

This is common practice: an IBM HMC (Hardware Management Console) is a 1U PC with a custom Linux distribution and the management software preinstalled. You don't get the root password; you just use the software as delivered.

Just an fyi: There's not much that's interesting underneath, I've looked. Though if you're still using the DVDRAM to make backups, you can put your own DVDRW in - they work much better and you don't have to purchase the cartridges:)
Re:where's the raid? by dan+the+person · 2005-09-06 04:37 · Score: 1

And what if the power supply fuzzes out?

And what if a ram chip goes faulty?

What if a capacitor on the motherboard starts leaking?

Just get two of the damn things, place them in separate data centers, and round robin them if search is a critical feature.
Re:where's the raid? by chrisd · 2005-09-06 05:34 · Score: 1

The mini doesn't have raid, you have to buy one of the higher end models for that.
Chris

--
Co-Editor, Open Sources
Open Source Program Manager, Google, Inc.
Re:where's the raid? by Darth_Burrito · 2005-09-06 08:25 · Score: 1

Insane was too strong of a word. Our university just licensed a couple of the higher end models and from what I understand, people love them.
Re:where's the raid? by Darth_Burrito · 2005-09-06 08:35 · Score: 1

That's a good point, but consider that with the things you describe above (psu, mem, mobo), the problem can be solved by replacing the part. With a dead hard drive, all of the data needs to be recrawled and all of the settings need to be restored. It could be a more painful process, but like you said, if you're really worried about it, you could buy two.
Re:where's the raid? by Adam+Wiggins · 2005-09-06 10:04 · Score: 1

> I guess if you want RAID, you pay more than $3,000.

Now that's just plain silly. A basic x86 1U server runs around $1100 with two hard drives configured in software RAID1, which works wonderfully other than not allowing hotswap and preventing boot if the first drive is the one that fails. For another $150 or so you can add a hardware RAID card to fix both of those things and get slightly better performance.

There is absolutely NO excuse for not running a raid on any modern server. Drives are the most likely component to fail, the most critical if they do (even if you replace it, you've still lost all your data), and are cheap as dirt. Any sysadmin still using single drive configs should have their head examined.
Re:where's the raid? by Knetzar · 2005-09-07 03:52 · Score: 1

That's a good point, but consider that with the things you describe above (psu, mem, mobo), the problem can be solved by replacing the part.

I think the point is that Google doesn't want you replacing parts yourself. If you can deal with sending the device back to Google for servicing, then you can deal with reindexing.

Try searching the site for "google mini" by openSoar · 2005-09-06 03:24 · Score: 4, Funny

Maybe it takes a while for the documents to be indexed but you'd think they would have added it manually given the nature of the article.

Review? & capacity by Red+Flayer · 2005-09-06 03:25 · Score: 1

From the Summary: "a reasonably indepth review of the Google search appliance."

If, by "reasonably indepth review", you mean lots of pretty pictures and a narrative about opening the box and the case, then sure.

Rather than calling this a review, perhaps it could be re-titled "One man's demonstration of the Google search appliance."

That said, I'm a little concerned about how many URLs it can handle... 100,000? According to TFA, 40,000 documents overloaded this thing.

The article did not address how this could be overcome, except by eliminating some of the URLs from the crawl. How scalable is it?

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai

Re:Review? & capacity by slim · 2005-09-06 03:35 · Score: 1

That said, I'm a little concerned about how many URLs it can handle... 100,000? According to TFA, 40,000 documents overloaded this thing.

My reading of TFA was that the Mini was encumbered with an arbitrary limit of 40,000 documents.

That is, if you want to index >40,000, Google wants more money from you. It's purely to do with software licensing.
Re:Review? & capacity by Anonymous+Custard · 2005-09-06 03:42 · Score: 1

Anandtech said "The mini allows for 100,000 documents/URLs to be stored in a collection, and AnandTech contains approximately 40,000 articles, news and blog entries."

But if each article is 3 pages long on average, that's 120,000 documents/URLs right there.

--
$8.95/mo web hosting
Re:Review? & capacity by slavemowgli · 2005-09-06 03:46 · Score: 2, Interesting

RTFA (and actually read it). The Google Mini has a built-in limit of 100,000 documents; it's not that it can't index more because of a lack of CPU power or HD space or whatever, it's just that if you want (or need) more than that, Google wants you to buy their regular Search Appliances instead.

All this info can also be gotten from http://www.google.com/enterprise/, which is exactly 1 (one) click away from Google's index page.

--
quidquid latine dictum sit altum videtur.
Re:Review? & capacity by Manitcor · 2005-09-06 04:22 · Score: 2, Insightful

If you're not careful when setting up your crawlers, many search engines will index every link they find in a document, including the headers and footers on the page that point to About, Legal, Copyright, Sponsors and Links.

Depending on how you have configured things, it may also go ahead and read your banner ads and such as well. If you haven't explicitly told your crawler to stay within someurl.com, then it will go ahead and index the links that go to outside sites as well.

The solution that was presented in the article is a very common one when you want to simply index a subset of site content. Another common method for crawl systems that support scripting (like Plumtree's Ripfire or Verity) is to explicitly parse out the various URLs you are looking for, as well as handle things like pagination.

The former is preferred as it can easily be adapted to work with other search engines without re-writing custom scripts. I would not be surprised if AnandTech now detects when GoogleBot is crawling their site and presents GoogleBot as well as other search bots with the same page that their appliance sees.
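
As a rough illustration of the "stay within someurl.com" point, a crawler's link filter might look something like the sketch below. The host comes from the example above; the excluded path prefixes and the sample links are hypothetical, and a real appliance would expose this as configuration rather than code.

```python
from urllib.parse import urljoin, urlsplit

ALLOWED_HOST = "someurl.com"  # the host used as the example above
# Hypothetical boilerplate sections that should not count as documents.
EXCLUDED_PREFIXES = ("/about", "/legal", "/sponsors")

def in_scope(base_url: str, href: str) -> bool:
    """Decide whether a link found on base_url should be crawled."""
    absolute = urljoin(base_url, href)
    parts = urlsplit(absolute)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Accept the site itself and its subdomains, nothing else.
    if host != ALLOWED_HOST and not host.endswith("." + ALLOWED_HOST):
        return False
    return not parts.path.lower().startswith(EXCLUDED_PREFIXES)

links = [
    "/articles/42",
    "/about",
    "http://ads.example.net/banner",
    "mailto:x@someurl.com",
]
to_crawl = [link for link in links if in_scope("http://someurl.com/", link)]
print(to_crawl)
```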

--
"Don't mess with him, he taunts the happy fun ball."
Re:Review? & capacity by cmallinson · 2005-09-06 04:45 · Score: 1

My reading of TFA was that the Mini was encumbered with an arbitrary limit of 40,000 documents.
The appliance can index 100,000 at the lowest licencing level. Even if you only have 40,000 documents, you need to keep an eye on the crawler, and make some changes if it starts counting pages twice (printable/alternate versions, or multiple pages of single documents perhaps).
Re:Review? & capacity by kmarius · 2005-09-06 06:06 · Score: 1

They probably did some tricks and displayed the article as one page for indexing.

But I miss a comparison with other products, both against search appliances like Thunderstone and Fast and against full-text search software.
Re:Review? & capacity by Anonymous Coward · 2005-09-06 21:35 · Score: 0

Now you've got me thinking.....

'©2005 Google - Searching 8,168,684,336 web pages'

8,168,684,336 / 100,000 = 81,686.843

81,686.843 x $2,999 = $244,978,842.157

So for less than 250 million dollars I could get Google to set me up my very own private Google? Cool. Now I've just got to find a colo facility to put the 81,687 servers. Anyone know what 1RU costs in a half decent colo these days?
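
The back-of-the-envelope figure above can be rechecked with whole appliances instead of fractional ones. The numbers are the ones quoted in this comment (page count, 100,000-document limit, $2,999 per unit); rounding up to whole units is the only change, so the total comes out slightly higher.

```python
import math

PAGES = 8_168_684_336    # page count quoted from Google's footer above
DOCS_PER_MINI = 100_000  # Mini document limit cited in the article
PRICE_USD = 2_999        # per-unit price used in the comment

# You cannot buy a fraction of an appliance, so round up.
units = math.ceil(PAGES / DOCS_PER_MINI)
total_cost = units * PRICE_USD
print(units, total_cost)
```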
Re:Review? & capacity by Anonymous Coward · 2005-09-07 18:30 · Score: 0

In a real google cluster, it only takes 2000 servers to deal with 8 billion pages.

Google ate my server by PIPBoy3000 · 2005-09-06 03:28 · Score: 5, Interesting

A few months ago, we asked for a demo of the product. My main involvement was to help compare with our existing search strategy. Just to cut to the chase, we generally had a very positive experience with it. Searches would bring up what we wanted more often than not. Our existing search system, which was based around IIS and custom SQL code, was pretty good, though it couldn't beat Google for pulling up relevant pages. We did have a few quirky things happen, though.

We had a couple times when the appliance locked up and had to be rebooted. That was probably the most distressing as it had to be on 24x7 to support our organization and I wasn't looking forward to the help desk calls.

More amusing, though, was the way it crawled content. Google works like any other crawler - it goes around and clicks hyperlinks. Unfortunately it's not too bright, not paying attention to the text of the hyperlink, like if it said "delete" or something like that.

Unfortunately I had a poorly secured application that Google was able to sneak into via another link I wasn't aware of. It held the custom links for each of our departments to display a personalized set of links on the home page. Unfortunately it went through the admin tool and clicked every delete link it could find. I was paged the next morning and was fairly unhappy. My fault, though.

The irony is that the budget money evaporated and we aren't getting it after all.

Re:Google ate my server by Anonymous Coward · 2005-09-06 03:35 · Score: 2, Insightful

Unfortunately I had a poorly secured application that Google was able to sneak into via another link I wasn't aware of. It held the custom links for each of our departments to display a personalized set of links on the home page. Unfortunately it went through the admin tool and clicked every delete link it could find.

Sounds like it wasn't much of an admin tool if it required no authorization...any employee could have done what Google did, just not as quickly.
Re:Google ate my server by iluvcapra · 2005-09-06 03:55 · Score: 2, Insightful

Don't ridicule his misery, AC, unless you're willing to post your name. Someday, once you graduate from high school, you will encounter this situation and you'll wish you weren't so critical.

--
Don't blame me, I voted for Baltar.
Re:Google ate my server by Moridineas · 2005-09-06 03:57 · Score: 1

Sigh, the exact same thing happened to me, except it was a non-google search engine (I forget which) that explicitly disobeyed robots.txt. Ditto as to my fault. Still annoying.

Thank god for backups..
Re:Google ate my server by BetterThanCaesar · 2005-09-06 04:05 · Score: 1

The HTTP spec says that a GET should do nothing beyond retrieval, i.e. it should not change data. This is why "delete" hyperlinks should at least have an "are you sure" page with a posting form before actually deleting anything. Just a hint for your next project!
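
A toy sketch of that rule, with no web framework involved: the `handle` function, the `/links/<id>/delete` path layout and the sample data are all invented for illustration. GET only ever shows the confirmation page, and nothing is removed until a POST arrives.

```python
# Hypothetical per-department links stored by the admin tool.
links = {1: "HR portal", 2: "Cafeteria menu"}

def handle(method: str, path: str) -> int:
    """Return an HTTP status code for a request against the toy admin tool."""
    if path.startswith("/links/") and path.endswith("/delete"):
        link_id = int(path.split("/")[2])
        if method == "GET":
            # Safe method: show an "are you sure?" form, change nothing.
            return 200
        if method == "POST":
            # Only an explicit form submission removes data.
            links.pop(link_id, None)
            return 303  # redirect back to the list after the change
        return 405
    return 404

# A crawler following hyperlinks only ever issues GET requests.
crawler_status = handle("GET", "/links/1/delete")
remaining_after_crawl = dict(links)

# The admin confirming the form is what actually deletes.
admin_status = handle("POST", "/links/1/delete")
print(crawler_status, remaining_after_crawl, admin_status, links)
```

This only protects against link-following; the delete page still needs proper authorization on top.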

--
"Stop failing the Turing test!" -- Dilbert
Re:Google ate my server by arkanes · 2005-09-06 05:05 · Score: 1

I offer my condolences but it's still his own damn fault. His application was written poorly, for a number of reasons.
(I'm not the AC who first posted)
Re:Google ate my server by PIPBoy3000 · 2005-09-06 05:42 · Score: 1

Yep. It was clearly "my fault" in this particular case. It was one of those applications secured by NT groups. Unfortunately we had some issues where security got screwed up by an overzealous administrator and this one didn't get fixed. I ended up changing the security model after the fact, switching to my typical database authorization method.
Re:Google ate my server by Anonymous Coward · 2005-09-06 07:36 · Score: 0

Uhh... he admitted it was his fault. Try reading, please.
Re:Google ate my server by ePhil_One · 2005-09-06 13:36 · Score: 1

it was a non-google search engine (I forget which) that explicitly disobeyed robots.txt
Robots.txt has the protective power of a big red Don't Push button on a public street. Heck, I keep an eye on anyone that comes to my datacenter, in case their eyes start to fixate on the EPO button...
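
To make the "Don't Push button" point concrete: a polite crawler checks robots.txt before fetching, roughly as in this sketch using Python's standard `urllib.robotparser`. The bot name, host and rules are invented, and nothing in the protocol stops a rude crawler from skipping the check entirely.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for a hypothetical intranet site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot asks first; the answer is purely advisory.
admin_allowed = parser.can_fetch(
    "ExampleBot", "http://intranet.example/admin/delete?id=1"
)
home_allowed = parser.can_fetch("ExampleBot", "http://intranet.example/index.html")
print(admin_allowed, home_allowed)
```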

--
You are in a maze of twisted little posts, all alike.

interesting review. by Suppafly · 2005-09-06 03:30 · Score: 1

This was an interesting review if you had never seen what a google appliance looks like, but it wasn't very in-depth at all.

I was certainly looking forward to some overclocking and Linux installing. I mean, I'm sure they voided whatever agreement they had with Google just by opening the case up, so why not go all out and give us the review we really want to read?

I didn't even realize the review was over until I realized there was no "next" button on that last page.

Re:interesting review. by herrison · 2005-09-06 03:33 · Score: 1

Add the above and get it indexing an ipod and it'd be the ultimate fanboy story...

--
You know what I miss? Leeches.

Hmm, they didn't find a sandbox? by DhinkTifferent · 2005-09-06 03:30 · Score: 1

The Google Sandbox

Who cares about the hardware, let's see the algo ;)

Re:Hmm, they didn't find a sandbox? by Anonymous Coward · 2005-09-06 05:55 · Score: 0

word...

it's by troon · 2005-09-06 03:31 · Score: 0, Redundant

Maybe CmdrTaco could use it to search for tips on apostrophe usage.

--
Ydco co ,df C erb-y go. a Ekrpat t.fxrapev

Re:it's by Saeed+al-Sahaf · 2005-09-06 05:18 · Score: 1

Better yet, maybe Slashdot could use it to reduce dupes!

--
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck

Sweet by DroopyStonx · 2005-09-06 03:32 · Score: 1

Just a matter of time before it's reverse engineered :)

--
We have secretly replaced these Slashdot mods' sense of humor with a rusty nail. Let's see if they notice!!

GPL compliance by Anonymous Coward · 2005-09-06 03:34 · Score: 0

I heard that this google mini is using a modified Version of a linux distribution. Is the source code given by google somewhere?

Re:GPL compliance by Anonymous Coward · 2005-09-06 03:42 · Score: 3, Informative

http://code.google.com/mirror/gsa.html

Save $3000 with site:anandtech.com by Anonymous Coward · 2005-09-06 03:34 · Score: 2, Informative

I can search the 63,000 online documents with http://www.google.com/search?q=site:www.anandtech.com

Re:Save $3000 with site:anandtech.com by diegocgteleline.es · 2005-09-06 04:01 · Score: 1

Except that google takes a long time to reindex recent changes and you can't "personalize" your search ie: for a given section of the web
Re:Save $3000 with site:anandtech.com by K-Man · 2005-09-06 06:26 · Score: 1

Search isn't about search, it's about advertising.

The ads on the Google page are put there by Google, not Anandtech. I assume Anand will have search-targeted ads on their own results page soon enough.

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger

does anybody know the best way by Anonymous Coward · 2005-09-06 03:35 · Score: 0

to manage/limit file access? we just got one to index our company's docs. their (the files) access is managed by permissions. i've googled the web and not found a clear "how to" doc that helps with the problem of IUSR's (yes i'm using MS IIS...:(... ) permissions opening the door for anybody who clicks a link to a doc.

OS by Anonymous Coward · 2005-09-06 03:36 · Score: 1, Insightful

So what os does this thing run and why is it not mentioned anywhere?

Too bad by sdirrim · 2005-09-06 03:41 · Score: 1

We need people to use the google toolbar, because that is one more bite out of Microsoft. Although Google works best with Microsoft, the more accessible and usable it is, the better equipped Google will be to do battle with Microsoft. It will finally be a relatively balanced (well more balanced than others) battle between Microsoft and Linux.
Read the article in PC Magazine (I think) "Why Google scares Gates".

--
Not only "land of the free" but "land of the lawyers" who love a good old 1st amendment smackdown. Shihar 153932

Re:Too bad by Anonymous Coward · 2005-09-06 05:01 · Score: 0

I found the article online here.

http://www.fortune.com/fortune/technology/articles/0,15114,1050065-1,00.html

Thanks for the lead... Here are some interesting snips...

-----
Gates says that when Microsoft is done integrating search into future versions of Windows and Office, the world will look back at the way we are now "Googling" for stuff on the Internet and laugh. "The idea that you type in these words [in the search box] that aren't sentences and you don't get any answers--you just get back all these things you have to click on--that is so antiquated," he says, later adding, "We need to take search way beyond how people think of it today and just have it be naturally available, based on the task they want to do." For example, if you wanted to look up a factoid while you were writing a document, you might search for it without ever leaving Word.
-----
In spring 2003, Payne pitched Gates on buying Overture, a move that would have given Microsoft search engine technology out of AltaVista as well as an advertising business that was generating huge profits. But Gates shot the plan down, convinced that Microsoft could do a better job for less money on its own. Instead, Yahoo bought Overture, a move that, together with its earlier purchase of Inktomi, enabled it to catapult itself successfully into the search game in a year.
-----
In fall 2003, Microsoft briefly considered buying Google, only to realize that even if Brin, Page, and their board could have been persuaded to sell--which seemed unlikely--Microsoft would have been left to explain to the world why it was now running a search engine built entirely on Linux instead of Windows.
-----
Privately, Google's executives understand exactly the impact they are having on Gates and his team. They project a carefree image in part because it makes business sense. One blunder by Netscape was that it let Andreessen tell the world how he intended to put Microsoft out of business. Count on Google not to repeat that mistake.
-----

That last part seems to be only a theory that the reporter came up with, a pretty valid one, but just not one with any factual support.

Anyway a very worthwhile article.
Re:Too bad by Anonymous Coward · 2005-09-06 06:17 · Score: 0

What about Yahoo? It's not good to focus on only one competitor. Yahoo has a much broader range of products and is in a much better position to compete with Microsoft.

Just because Google has a high geek-factor (Yahoo is not far behind Google in search statistics), doesn't mean it's the best choice for everyone.

very google-like by teodz · 2005-09-06 03:41 · Score: 1

go to http://search.anandtech.com/ and do some googling. a bug??

http://search.anandtech.com/search?q=hardware

To access the search results, you must issue a GET request to the Google Search Appliance via a search box. You can do this by copying and pasting the following HTML code into a Web page. Enter your server name and your collection name where indicated in the code.

<form method="get" action="http://enteryourservernamehere/search">
<table>
<tr>
<td>
<input type="text" name="q" size="25" maxlength="255" value=""/>
<input type="submit" name="btnG" value="Google Search"/>
<input type="hidden" name="site" value="ENTER_COLLECTION_NAME"/>
<input type="hidden" name="client" value="ENTER_COLLECTION_NAME"/>
<input type="hidden" name="proxystylesheet" value="ENTER_COLLECTION_NAME"/>
<input type="hidden" name="output" value="xml_no_dtd"/>
</td>
</tr>
</table>
</form>
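The form in that help text boils down to a plain GET against /search with a handful of query parameters. A minimal Python sketch of building the same URL by hand follows. The parameter names come from the quoted form; the server name, the collection name, and the build_search_url helper are made up for illustration.

```python
from urllib.parse import urlencode


def build_search_url(server, query, collection):
    """Return the GET URL that the quoted search form would submit.

    `server` and `collection` are placeholders, just like
    "enteryourservernamehere" and "ENTER_COLLECTION_NAME" in the form.
    """
    params = {
        "q": query,
        "site": collection,
        "client": collection,
        "proxystylesheet": collection,
        "output": "xml_no_dtd",
    }
    # urlencode escapes the values (a space becomes "+").
    return "http://" + server + "/search?" + urlencode(params)


# Hypothetical server and collection names.
url = build_search_url("search.example.com", "hardware review", "my_collection")
print(url)
```

Because it is an ordinary GET, the resulting URL can be pasted into a browser or linked directly, which is why the bare /search?q=hardware link above returns the help text instead of results: the collection parameters are missing.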

From TFA by Anonymous Coward · 2005-09-06 03:41 · Score: 5, Funny

The screw is threaded - it just can't be undone with a regular screwdriver.

Right.. Only unthreaded screws can be opened by a regular screwdriver.

Re:From TFA by K-Man · 2005-09-06 06:47 · Score: 1

At least they tested the functionality...right?

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger

Where are the pigeons? by TeXMaster · 2005-09-06 03:42 · Score: 2, Funny

I thought Google used pigeons ...

--
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)

Re:Where are the pigeons? by Alias00 · 2005-09-06 04:30 · Score: 2, Funny

"I thought Google used pigeons ..." They do! Why do you think they don't want people taking the covers off the servers? Plus, it does say in the manual that you're supposed to push seeds through the cooling vents every day.
Re:Where are the pigeons? by macshome · 2005-09-06 05:12 · Score: 2, Informative

Why did this get marked troll?

According to Google, they do use pigeons.
Re:Where are the pigeons? by TeXMaster · 2005-09-06 05:34 · Score: 1

While I don't consider my post wildly funny nor something to laugh your guts out (ok, I was hoping to raise a smirk at least), I do see, while metamoderating, that there are some funny posts which do get marked as Troll. Guess it's only fair I get my share of this.
Obvious question is, is it a trend of modern times, some humourless (pardon the British spelling) dork, or a troll?

--
"I'm never quite so stupid as when I'm being smart" (Linus van Pelt)

RTFA more closely by jbellis · 2005-09-06 03:43 · Score: 1

http://www.anandtech.com/IT/showdoc.aspx?i=2523&p=4

The mini allows for 100,000 documents/URLs to be stored in a collection, and AnandTech contains approximately 40,000 articles, news and blog entries.
When we first set up the Mini, we told it to start in each of the website's sections (for example, http://www.anandtech.com/it/) and in the web news area. The Mini considers any unique URL string to be a unique document, which makes sense (but is a bit surprising the first time that you run an index).
After four hours of indexing, the Mini had managed to reach its document limit and we had to improvise... A word to the wise: don't let the Mini crawl your entire site without keeping a close eye on it.

In other words, spidering the entire site led to the Mini wasting space on stuff other than the ~40k articles they really wanted indexed and running into its 100k limit.
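A rough Python sketch of why the URL count runs ahead of the article count: every distinct URL string is a separate document to the crawler, even when several strings lead to the same article. The IGNORED_PARAMS set and the canonical helper are hypothetical, and treating "p" as a page number that can be collapsed is an assumption made only to count articles rather than pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed not to change which article is shown.
IGNORED_PARAMS = {"p", "sort", "sessionid"}


def canonical(url):
    """Collapse a URL to one string per article (an assumed notion of 'same')."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


crawled = [
    "http://www.anandtech.com/IT/showdoc.aspx?i=2523",
    "http://www.anandtech.com/IT/showdoc.aspx?i=2523&p=2",
    "http://www.anandtech.com/IT/showdoc.aspx?i=2523&p=4",
    "http://www.anandtech.com/IT/showdoc.aspx?p=4&i=2523",
]

unique_strings = set(crawled)                     # what the crawler counts
unique_articles = {canonical(u) for u in crawled}  # what the site owner counts

print(len(unique_strings), "URL strings,", len(unique_articles), "article")
```

Four URL strings for one article is the kind of ratio that turns ~40k articles into more than 100k documents.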

For those who're interested... by Homicide · 2005-09-06 03:44 · Score: 5, Informative

I admin a full blown Google Search Appliance, the Mini's big brother.

If you want the specs:
Dual Xeon 2.6GHz
12GB RAM
4 250GB HD's in RAID(something) with a hot-swap spare.

Never tried taking off the cover though, since we want to keep the warranty.

All of the money you pay is a license for the software on the box, the system itself is effectively free, so once the 2 year warranty expires, you've effectively got a nice powerful linux box for free. You can keep running the software, but without any support.

As for performance, this thing works great, we have about 250,000 pages that it can index, both public and private (and it can do searches cleverly checking username/password to see if you should have access to certain results), and we've had nothing but positive responses from our users. The results come up quickly, they're the results people want, and the results that management think should be at the top, are at the top.

Re:For those who're interested... by linzeal · 2005-09-06 03:56 · Score: 1

What does it use for user authentication, AD or some other LDAP implementation?

--
An Education is the Font of All Liberty
Re:For those who're interested... by Homicide · 2005-09-06 03:59 · Score: 4, Informative

It submits a HTTP HEAD request for the URL to the server the page is on, with the username and password supplied, so the server at the other end decides if you should be able to see the search results, thus saving you from having to faff around telling the google box who can get to what pages.
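A minimal Python sketch of that idea, not the appliance's actual code: send a HEAD request with the user's credentials and keep a result only if the origin server answers with success. HTTP Basic authentication is assumed here because the comment does not say which scheme is used; can_access and visible_results are hypothetical names.

```python
import base64
import urllib.error
import urllib.request


def can_access(url, username, password):
    """Return True if a HEAD request with these credentials succeeds.

    Assumes HTTP Basic auth; the origin server makes the actual decision.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="HEAD")
    request.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return 200 <= response.status < 300
    except urllib.error.URLError:
        # Covers HTTPError (e.g. 401, 403) and connection failures.
        return False


def visible_results(urls, username, password, check=can_access):
    """Filter search hits down to the ones this user may see."""
    return [url for url in urls if check(url, username, password)]
```

The appeal of the design is that the search box never needs its own copy of the access rules. The cost is one extra request per candidate result, and the credentials travel with each of those requests.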
Re:For those who're interested... by picklepuss · 2005-09-06 03:59 · Score: 1

Thanks. Your single slashdot thread was more insightful than TFA.
Re:For those who're interested... by linzeal · 2005-09-06 04:23 · Score: 1

So long as the username and password are entered through an encrypted portal there should be no problem. If not may the sniffers not find a juicy admin password that gives your college students control of your entire school's computing structure down to the smallest laptop. Lol.

--
An Education is the Font of All Liberty

After BIOS and before web-interface? by Anakron · 2005-09-06 03:44 · Score: 2, Interesting

What happens after the BIOS screen and before you "log in" to the web interface? Surely it runs some sort of operating system?

--
There are 11 types of people. Those who understand binary, those who don't and those who are sick of this lame joke.

Re:After BIOS and before web-interface? by Homicide · 2005-09-06 03:54 · Score: 3, Interesting

If it's the same as its big brother, then it boots up into RedHat Linux. You can watch all the usual bootup things happening, just not interfere with them, as the keyboard is ignored.

It does end up at a login prompt, but you're not given any usernames or passwords to access it.

Daily Dose of Google News^WAds by Gothmolly · 2005-09-06 03:47 · Score: 1

Boy, here it is almost noon EDT, and nothing about Google yet! I was getting worried. Should we start a pool now, betting on which Internet trend the Slashdot fanboys will pick? Apple is now passe, I think. Google will fade soon. Tivo is WAY passe, at this point.

--
I want to delete my account but Slashdot doesn't allow it.

Re:RTFC more closely by Red+Flayer · 2005-09-06 03:53 · Score: 1

I am aware of what TFA said. My point is this: 100k URLs is not a lot; I was merely pointing out that 40k docs can be > 100k URLs, and this means that capacity becomes an issue very quickly.

I guess TFA being from the you-know-for-the-kids-dept explains it pretty well.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai

product review: the yellow GSA by msblack · 2005-09-06 03:53 · Score: 3, Informative

We evaluated one of those yellow Google search appliances (GSA) and experienced very mixed results. The appliance is very easy to set up, and it is easy to launch an initial scan of your website.

The GSA will blindly search all web servers in your domain. When setting-up the GSA, you give it an initial page from which to start crawling and baseline domains. For example:

Initial page: http://www.slashdot.org/
Domain(s): .slashdot.org,slashdot.org

The leading dot on the first domain entry says to search all hosts in the domain.
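One plausible reading of those two entries, sketched in Python: a pattern with a leading dot matches any host under the domain, while a bare pattern matches only that exact host, which is why both are listed. This is an interpretation of the comment, not documented appliance behaviour, and host_matches is a hypothetical helper.

```python
def host_matches(host, patterns):
    """Return True if `host` is covered by any domain pattern.

    ".example.org" matches every host under example.org;
    "example.org" matches only the bare host itself.
    """
    host = host.lower()
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("."):
            if host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


patterns = [".slashdot.org", "slashdot.org"]

print(host_matches("www.slashdot.org", patterns))    # subdomain
print(host_matches("slashdot.org", patterns))        # bare domain
print(host_matches("notslashdot.org", patterns))     # unrelated host
```

Under this reading, dropping the second entry would leave the bare slashdot.org host out of the crawl.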

Problem: GSA does not provide very good status of where or what it is searching. It only has a dashboard light to say it is crawling. No details.

Problem: We found that the GSA would get caught in an endless loop if it encountered a user website controlled by a database. It would endlessly follow the next and previous links to find every database entry.

Our university library subscribes to a number of electronic databases, such as, EBSCO PsychINFO, etc. The GSA indexed every possible look-up.

Our eval license was limited to 1.5 million pages. Some of these databases contain hundreds of thousands of pages. Solution: Those setting up their own web server must employ proper robots.txt files or risk having their entire server blocked from indexing.
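For anyone wondering what a "proper robots.txt" buys you, here is a small Python sketch using the standard library's robots.txt parser. The rules, the paths, and the "examplebot" agent name are made up for illustration; a real file would list whatever database front ends the server actually hosts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of database look-ups and search pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /databases/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="examplebot"):
    """Return True if a well-behaved crawler may fetch this URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://library.example.edu/about.html"))
print(allowed("http://library.example.edu/databases/lookup?id=1"))
```

This only helps with crawlers that choose to honour the file; it is a request, not an access control.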

--
signature pending slashdot approval

Re:product review: the yellow GSA by jcuervo · 2005-09-06 04:24 · Score: 1

Problem: GSA does not provide very good status of where or what it is searching. It only has a dashboard light to say it is crawling. No details.
*shrug* Hook it up to a Squid proxy.
Problem: We found that the GSA would get caught in an endless loop if it encountered a user website controlled by a database. It would endlessly follow the next and previous links to find every database entry.
I have a bash.org-style quotes page (originally written in Perl, ported to PHP for a Wordpress plugin). One of the sort options is "random"; it embeds a random seed into the link, so you can still use prev/next page links, and so you can click random again and get another page of randomness.

Google (and other search engines) got hung up on this for a while -- not to mention the karma +/- links. I ended up keeping track of who requests robots.txt (by making it a CGI), and just leaving out those links for those hosts.

--
Assume I was drunk when I posted this.
Re:product review: the yellow GSA by Anonymous Coward · 2005-09-06 08:21 · Score: 0

ok i know it is a low shot but at my old high school we had a GSA, well it was a club that stood for Gay Straight Alliance. I always laughed when I moved and my new high school had a Governor's School of Arts program.

Why some places won't buy this by BenEnglishAtHome · 2005-09-06 03:55 · Score: 5, Interesting

The pictures are pretty and I'll assume the thing works. Some folks, however, won't buy it because they don't want their intranets to work like you or I might expect. Let me explain.

I work for a large TLA govt agency. I've begged our people to get something like this. I know, from working with our folks and doing my own digging, that we have a wealth of knowledge tucked away, here and there, on local group shares and out-of-the-way internal web sites. And yet our internal search function is ludicrously bad. It works off "key words" that are simply a manually maintained (I think) list of useless, often off-the-mark descriptions of approved sites of general interest. Special-interest pages are not indexed in this way. The crawler, if you want to call it that, is terrible at doing its job. Enter a string of text and get a hit on a known, universally accessible web page containing that exact string? Not a chance. I test it occasionally and find that it remains as ridiculous as ever, with a level of functionality that would have been technologically uninteresting the better part of a decade ago but is, in this day, infuriating to users.

The reason for all this is that if our intranet were automatically crawled, well indexed, and truly searchable, people would be able to find things. People in Work Area A would be able to see how they might be impacted by something going on in Work Area B. Horrors! That would mean that management would lose much of their ability to keep employees selectively in the dark.

All this came to a head a number of years ago. At that time, our intranet content was maintained by IT. Anybody that wanted a site (literally anybody) could just get their first-line manager to approve the request and they'd get server space and some help setting up a page or two. The exchange of information that started happening was highly disruptive, so a "Communications and Liaison" office was set up that wrenched control of the intranet from IT and required (what seems to be essentially political) approval of the business case for anything that went online. No web sites unless the Communications gods approved.

Nowadays, the employees of one division are only vaguely aware that other divisions exist or have web sites. Each individual fiefdom is protected from the ravages of communications that don't strictly follow the org chart lines. I guess the executives in charge are happy in their insulated little worlds.

If you're going to sell an effective intranet search tool, you're going to have to face the fact that lots of large organization leaders (and you find the same attitudes in both the public and the private sector) would recoil in horror at the thought of having their intranet be effectively searchable. It's too threatening.

Re:Why some places won't buy this by gumbo · 2005-09-06 04:29 · Score: 2, Interesting

Based on my experiences working in government, my guess is it was more that they wanted to have control over what was on their internal web site more than they wanted to restrict information sharing. Of course, it might be that where you work is just a lot more dysfunctional than where I work.

I set up a search for our intranet at my govt agency (one part of a larger cabinet agency) many years ago. For some reason I never understood, the one guy who controls the intranet site decided that the search link should just be one of about 50 fairly random links on the main intranet page. And way at the bottom. Nobody ever uses it, I think because they have no idea it's down there. I think that's his tendency to avoid change whenever possible rather than any interest in stifling information exchange.

I guess we're dysfunctional too, but just in a different way.

Slightly on-topic: you know, I don't know why I never realized it, but whenever I saw Google units in data centers, I always assumed that Google was using that DC for some of their servers. I never thought about them being Google's search appliances. I'm not very bright sometimes.
Re:Why some places won't buy this by barole · 2005-09-06 04:40 · Score: 1

FEMA is not technically a TLA.
Re:Why some places won't buy this by Anonymous Coward · 2005-09-06 05:09 · Score: 0

Gov. of Québec:
Same thing.
Re:Why some places won't buy this by Anonymous Coward · 2005-09-06 09:55 · Score: 0

Sounds like JWICS last time I used it over a year ago.

Don't use GET to modify application state! by Augusto · 2005-09-06 03:56 · Score: 4, Informative

The problem is not google, it's the way your app is designed!

Universal Resource Identifiers -- Axioms of Web Architecture : Identity, State and GET

In HTTP, GET must not have side effects.

In HTTP, anything which does not have side-effects should use GET

If somebody visited your site with a pre-fetching tool like the google web accelerator, you will also find the "delete" button being clicked automatically like this. Change those deletes to use POST instead.
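A minimal Python sketch of the rule, with no web framework involved: a tiny dispatcher in which GET and HEAD can never change state, so a crawler or prefetcher following a "delete" link gets an error instead of deleting anything. The handle function, the /delete/ path layout, and the status codes chosen are illustrative assumptions, not taken from any particular application.

```python
SAFE_METHODS = {"GET", "HEAD"}


def handle(method, path, records):
    """Return (status, message); only POST is allowed to change `records`."""
    if path.startswith("/delete/"):
        key = path[len("/delete/"):]
        if method in SAFE_METHODS:
            # A crawler or prefetcher following this link changes nothing.
            return 405, "Deleting requires a POST from the confirmation form"
        if method == "POST":
            records.pop(key, None)
            return 303, "Deleted " + key
        return 405, "Method not allowed"
    if method in SAFE_METHODS:
        return 200, ", ".join(sorted(records))
    return 405, "Method not allowed"


records = {"report-1": "draft", "report-2": "final"}
print(handle("GET", "/delete/report-1", records))   # nothing is deleted
print(handle("POST", "/delete/report-1", records))  # the record is removed
print(handle("GET", "/", records))
```

Crawlers follow links, and links are GETs, so anything reachable only by submitting a POST form stays out of their way.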

--

- sigs are for wimps.

Curious... by PerspexAvenger · 2005-09-06 04:04 · Score: 2, Interesting

Given the actual content of their review, I'm very surprised they didn't pull the drive and have a stroll around the filesystem. They've pretty much toasted the warranty as it is, anyway.

Re:Curious... by huwnet · 2005-09-06 04:46 · Score: 1

Are there any articles about installing other services on the Google Mini?
Re:Curious... by Anonymous Coward · 2005-09-06 05:01 · Score: 0

Yes... but why screw around opening the box when you can simply PXE boot it. This has the dubious quality of being able to hack around it while "keeping the warranty" (I'm sure Google's lawyers would have a word or two about that).

Both the mini and its BigBrother are your run-of-the-mill Intel machines that can be PXE booted to any other OS (any old Redhat will do). At that point you can mount the hard disk (the bigger machine requires Raid controller drivers, but the latest distros will work).

The search engine's interface is a nice collection of Python scripts along with the crawling/indexing binaries. I haven't cared enough to go diving on those.

Have fun!

Re:I tested it.... by jshaped · 2005-09-06 04:12 · Score: 3, Interesting

offtopic?

At anandtech's website,
to test the ability of their google search server,
I searched for the title of that article.
You would think it would point me to the article;
it did not.

grammar 101 by Anonymous Coward · 2005-09-06 04:16 · Score: 0

"its paces", not "it's paces"

Google Appliance Internals, etc. by Anonymous Coward · 2005-09-06 04:35 · Score: 0

Would be interesting to see more info about the filesystem layout, OS and version, and the code. Apart from Google's engine, some hacker should try to piece together an open source solution ;-)

Google Appliance by Anonymous Coward · 2005-09-06 04:39 · Score: 0

The team I manage has four of the Google appliances that are the big brother to the mini. These devices provide pretty good search results with minimal effort. They will do strange things when hitting a site that contains another search engine or pdf generation. Google refers to this as a "Search Vortex" and results so far are a death match with Google Device 1, Web Server 0. Finding the content that causes this problem and removing it from the search can be painful. Overall the boxes are solid.

Better than Google by TyroneShoe · 2005-09-06 04:43 · Score: 1

A company named Thunderstone based out of Cleveland, OH makes a way better (and cheaper) search appliance than Google's. FYI, they aren't new to the search engine industry either. Up until very recently, they were the search engine for Ebay and a few other significant sites as well. www.thunderstone.com

Re:Better than Google by Anonymous Coward · 2005-09-06 04:59 · Score: 0

LOL. Gee, I wonder where thunderstone, er, TyroneShoe, came up with that "white logo on dark blue" trade dress for their "search appliance."
Re:Better than Google by TyroneShoe · 2005-09-06 05:03 · Score: 1

You sir, are a tool. I guess they need more flaming animated gifs and a nice Flash intro? I have no vested interest in Thunderstone, period. I am familiar with them because I participate in a number of NE Ohio technology conferences and I have seen their product in action. Next time you want to make a knee-jerk reaction to a post, make sure you get a little more fact and a little less "jerk"
Re:Better than Google by SlamMan · 2005-09-06 05:22 · Score: 1

We've actually got one of each in our rack. Thunderstone handles less pages, but the ability to be able to use your own thesaurus is perfect for one of our products.

--
Mod point free since 2001
Re:Better than Google by shovey · 2005-09-07 01:10 · Score: 1

And Google has since removed it from their index: http://www.cmswatch.com/Trends/508-Shame-on-Google
Re:Better than Google by ghqman · 2005-09-08 01:36 · Score: 1

The better question is where did Google come up with that trade dress for their mini, as Thunderstone's search appliance was white on blue long before the Google Mini was launched.

Nice review by zlogic · 2005-09-06 04:47 · Score: 2, Interesting

I like this kind of review. A bit of what packaging looks like (no one writes about that, although it's quite interesting for me personally: how does packaging for a $10000 unit differ from a $300 machine), a bit of a view from the inside, a bit about the software. Nothing too complicated, because that would make the article dull to read. What the article provides is the general feel of the product.
One thing I wonder is that Google can probably use the included modem to download private company data which the server caches (if the company bought the server for internal use).

Re:Nice review by Kadin2048 · 2005-09-06 05:28 · Score: 1

I'll bet you watch network television, too.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

GPL? by tdvaughan · 2005-09-06 04:48 · Score: 1

It's not clear from the article but I know that Google's server farm runs on Linux. Does the same apply for these machines and, if so, do they come with the source code to the GPL-ed parts of the server software?

Re:GPL? by frangipani · 2005-09-06 06:05 · Score: 1

I used a Google Mini for a short time and no source code was available. From Google's Support site for appliance / mini owners:
Question

What are the specifications for the hardware and software used by the Google Mini?

Answer

The Google Mini is a self-contained hardware and software solution supported by Google. It's built using standard server components, including Intel processors. The OS is a hardened version of Linux and the remaining software has been developed by Google. For more details, take a look at the Google web site: www.google.com/enterprise
Re:GPL? by Anonymous Coward · 2005-09-06 06:32 · Score: 0

IANAL, but as long as the box contains GPL software, they have to open it. They can do whatever they want with their own servers, but the moment they distribute they have to open it.

Yet another fine piece of trash from google by tacodealer · 2005-09-06 04:52 · Score: 0

News for nerds? Stuff that matters? This reference to a mediocre article is neither, and should be removed.

Besides, nothing google does is newsworthy unless it's filing for bankruptcy or submitting to Microsoft and yielding to a hostile takeover.

--
I post at -1. Clearly I'm not a poster child for slashbot.

Appliance and GPL by Anonymous Coward · 2005-09-06 05:32 · Score: 0

What would really be interesting is to know what it runs. Most likely it is some gnu/linux system. If it is some kind of custom distribution with modded kernel then, according to GPL, Google must make source code for such modified kernel available. I am really surprised nobody actually got hold of this. I guess the indexing software is off limits since it is a separate application, not derived from anything GPL. But even custom kernel should be interesting.

Also would like to know how do "google does no evel"-fanboys find those "custom" screws that you can't undo with a normal screwdriver?

Google Mini Support / Install. by rickbliss · 2005-09-06 06:10 · Score: 2, Interesting

I am currently in the midst of setting up a Google Mini. I have noticed most articles mention that getting the *initial* crawl set up is quite easy. It is. Even this article mentions "The last thing that we worked on was making the Mini look like it is part of AnandTech.com. There are two ways to go about this in the Mini admin. One is to use their built-in page layout helper, which allows you to wrap the search screens with a custom header and footer. The other way (which we prefer) is to use the XSLT Stylesheet editor and modify the stylesheet to meet your needs." But neither the screenshots nor the article go on to describe this process, about which I have found very little information.

Also, one pitfall is that the Mini offers only 1 collection, meaning that if you want to search multiple sites you will have to filter content by URLs, i.e. /my_site1/:* for one collection and my_site2/:*. And keyword searches are made across the whole collection.

Also, having a Google Mini I have access to the support site and forums. Throughout all the forums I have yet to see a Google associate reply. I have contacted Google four times stating that I needed help getting a correct XSLT sheet working aside from their default. I seem to be getting macro replies from Google stating that they do not provide support on XSLT. I think this is considered ranting. My apologies.

look beyond brand name for better alternatives by BigGerman · 2005-09-06 06:34 · Score: 1

EnterFind appliance (the product I helped develop last year) is cheaper, handles native Windows shares (not just HTTP) as well as databases and has a web-services API.

Not great for large sites by Anonymous Coward · 2005-09-06 06:54 · Score: 1, Informative

Our experience in evaluating the google machine for a large (100+ Million hit/day) site was less than positive. We'd have needed over 40 of their regular boxes to supply the search results, and there is no built-in cluster management. Since there is no access to the filesystem, this means we need to write the tool to interact with their web-based gui, and if they change bits with an automatic software update, too bad for us :-(

Needless to say, we declined. Results and response times were pretty good though.

It looks like the OS is WINDOWS by UltimaGuy · 2005-09-06 06:59 · Score: 0

I really think that the OS is windows ... the web browser it loads is Internet Explorer .. so I guess it should be windows ... the truth is I was expecting it to be Firefox or at least Konqueror :-(

--
"In questions of science the authority of a thousand is not worth the humble reasoning of a single individual."

Re:It looks like the OS is WINDOWS by ARRRLovin · 2005-09-06 07:04 · Score: 1

You can't be serious.
If you are.....I don't know how to respond.

--
-Randy
Re:It looks like the OS is WINDOWS by UltimaGuy · 2005-09-06 07:19 · Score: 1

I am sure ... check this link from Anandtech and say what the browser they are accessing inside it is. For my eyes it looks like Internet Explorer ... it even says so on the top ...

--
"In questions of science the authority of a thousand is not worth the humble reasoning of a single individual."
Re:It looks like the OS is WINDOWS by coconutstudio · 2005-09-06 07:24 · Score: 2, Insightful

OS it is running is RedHat Linux. The IE you are seeing is from the client machine, which happens to be Win. You can't access the server directly, only via the web interface.

--
http://www.up0.com/
Re:It looks like the OS is WINDOWS by Anonymous Coward · 2005-09-06 07:27 · Score: 0

your post makes my brain hurt, please go back to school and take notes this time when you're in networking class.
Re:It looks like the OS is WINDOWS by Anonymous Coward · 2005-09-06 07:47 · Score: 0

I'm pretty sure that you can use any web browser in any operating system to connect to the appliance in order to configure it, but the appliance itself runs something else.
Re:It looks like the OS is WINDOWS by zachmagaw · 2005-09-06 08:40 · Score: 0

it's a linux distro... not sure it's red hat.. it uses RPMs yes.. but from what I hear they are using their own distro...

Nice machines by bart416 · 2005-09-06 07:02 · Score: 0

They look nice. Performance seems good. I wouldn't mind getting my hands on one. But it would be quite useless for me :) I actually wonder what OS those boxes run. Does anybody know?

How long before... by chrysalis · 2005-09-06 07:28 · Score: 0, Redundant

Microsoft also sells boxes like those?

--
{{.sig}}

Re:How long before... by ARRRLovin · 2005-09-06 07:33 · Score: 1

When Longhorn2007VistaEnterpriseEdition receives its 3rd service pack.

--
-Randy

It's all in the G! by javiercr · 2005-09-06 07:29 · Score: 1

Clearly they just chose Gigabyte so it could be a G-appliance.

--
Mac toys and accessories blog

Re:I tested it.... by Manitcor · 2005-09-06 07:31 · Score: 1

From TFA: We created a file to which a link to every article, news post and blog post that have been published on the site would be dumped. That file is cached for a few hours as we update the index 3 times a week.
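For anyone wanting to try the same trick, here is a rough sketch in Python of a link dump that is only regenerated once the cached copy is a few hours old. TFA does not show their code, so everything here is an assumption for illustration: the names `render_link_page` and `refresh_link_file`, the 3-hour cache lifetime, and the idea that the URLs come from some callable you supply.

```python
import html
import os
import time

# Assumed cache lifetime; TFA only says "a few hours".
CACHE_SECONDS = 3 * 60 * 60


def render_link_page(urls):
    """Return a bare HTML page with one link per URL for a crawler to follow."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(html.escape(u, quote=True))
        for u in urls
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>\n"


def refresh_link_file(path, fetch_urls, now=None):
    """Rewrite the file only if it is missing or older than CACHE_SECONDS.

    fetch_urls is a hypothetical callable returning every article, news and
    blog URL on the site. Returns True if the file was rewritten.
    """
    now = time.time() if now is None else now
    if os.path.exists(path) and now - os.path.getmtime(path) < CACHE_SECONDS:
        return False  # cached copy is still fresh
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_link_page(fetch_urls()))
    return True
```

Point the crawler at the generated file and it should pick up every page listed in it, whether or not the site's normal navigation links to it.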

--
"Don't mess with him, he taunts the happy fun ball."

Google on Time4ink.com by RobHeritage · 2005-09-06 07:40 · Score: 2, Insightful

We looked into testing the Google appliance for searching our printer ink site. We found that using our Google AdSense account gave our printer ink customers the ability to search our site and suited our small business needs just fine. You can see that the search box at the top of our site lets the search-happy people search away. If they go somewhere else, we felt being a directory would allow us to keep them coming back due to our printer help sections. Why buy a big Google appliance??? -- Especially with the fees. I know some techies would disagree and want better control over their pages, but so far we have had great results, with clients actually finding the cartridge they are looking for by model number or keyword-specific terms.

Re:Google on Time4ink.com by Manitcor · 2005-09-06 09:21 · Score: 1

good idea and great if it fits your purposes.

In the case of your application I would say it was a good call.

In the case of more content-rich sites that may have varied types of articles, as well as the desire to have a more integrated look and feel, the appliance is more necessary.

There are also many intranets that have tons of content that is not available to the Net at large; however, the people who manage and use these networks would still like to be able to search the content they have on their internal sites, file shares, etc.

Which brings up a question: I see the appliance seems quite adept at crawling web pages, but most search products provide a plugin-type framework so that I can write my own data accessors and crawlers. Does the Google appliance provide this capability? Can I write a crawler for, say, my email system, my ERP data store, my customized 3rd party document management system?

How about metadata support?

--
"Don't mess with him, he taunts the happy fun ball."
Re:Google on Time4ink.com by RGRistroph · 2005-09-06 10:25 · Score: 1

Correct me if I'm wrong, but the Google appliance is not for other people to search your site. It's for YOU to search your non-web, private data, right?

Say you are a lawyer and have 10 years' worth of electronic versions of communications on 50 different computers. You can buy a Google appliance, configure it to index every one of those computers (you have to network share the drives in some way), and the "cache" link also works as a sort of backup. You don't want any jackass on the web searching that stuff. If you did, you would just put it on the web and let googlebot index it for free.

Carpeting by ukleafer · 2005-09-06 07:45 · Score: 3, Funny

Anyone else think the Anandtech server room has some lovely, lovely carpets?

You're both right! by Anonymous Coward · 2005-09-06 09:28 · Score: 0

it's not offtopic, it's flamebait.

[OT] Meme come true? by Neoncow · 2005-09-06 11:50 · Score: 1

Bow to your Google Overlord. ;)

A bit disappointing by Anonymous Coward · 2005-09-06 12:13 · Score: 0

I'm a bit disappointed. I would have absolutely loved to have seen what was actually on the hard drive, to get a better idea of how Google actually thinks and organises.

I loathe the appliance by funkman · 2005-09-06 12:52 · Score: 1

I loathe the Google appliance. I liked it for the price and it was supposed to be like an appliance. Plug it in, turn it on and click a few buttons and off you go.

It locked up for me way too many times even though Google cites this as rare. I wasted way too much time on support for a device which should not need this level of babysitting.

When my contract ends, I'm switching to Nutch.

Os is linux - What about GPL? by Anonymous Coward · 2005-09-06 14:55 · Score: 0

Since the OS seems to be Linux, should they not give access
to the source code etc. to comply with the GPL?

From the reviews so far it seems it is a closed system
and can be used only through the web browser.

Benchmarked: Google Appliance != Performance by LordBlackadder · 2005-09-06 16:18 · Score: 2, Informative

Google has many production quality problems with its distributor. I had to return 2 units before I received a functioning unit the 3rd time. I benchmarked the functioning Google Mini the other day. I haven't published detailed results yet, but I can tell you that the performance was very poor considering the performance expectation from a brand like Google. While I think the appliance is very capable, neither the Google Mini nor the larger yellow appliance is suitable for wide enterprise deployment. I benchmarked the Mini at an average of only 3 transactions per second (TPS), with a max of 7 TPS and a min of 1 TPS. Load balancing with 2 boxes only improved transaction speed by ~30%. My company of 100,000+ users certainly can't use a system with this performance. I don't think even my workgroup of 20+ people will be able to use it productively. We bought the box, but I think it will stay in the closet for limited uses. It has potential for h4xng with processor/mem upgrades - maybe even dd to new hardware. But until Google concentrates on appliance performance, their "Google Enterprise" initiative won't be taken seriously by the target market.
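For readers who want to reproduce numbers like these, here is a minimal Python sketch of how average/max/min TPS can be derived from a list of transaction completion times, plus a quick check of what a ~30% gain from a second box means. This is not the benchmark harness used above (none was published); the function names are made up for illustration, and reading the "~30%" as a throughput gain is an assumption.

```python
from collections import Counter


def tps_stats(completion_times):
    """Bucket completion timestamps (in seconds) into one-second windows and
    return (average, maximum, minimum) transactions per second."""
    if not completion_times:
        raise ValueError("no transactions recorded")
    start = min(completion_times)
    buckets = Counter(int(t - start) for t in completion_times)
    n_seconds = max(buckets) + 1
    # Seconds in which nothing completed count as 0 TPS.
    per_second = [buckets.get(s, 0) for s in range(n_seconds)]
    return sum(per_second) / n_seconds, max(per_second), min(per_second)


def scaling_efficiency(single_box_tps, combined_tps, boxes):
    """Fraction of ideal linear scaling actually achieved (1.0 = perfect)."""
    return combined_tps / (single_box_tps * boxes)


# Assumption: "~30%" means two boxes deliver about 1.3x the throughput of one.
two_box_tps = 3 * 1.3
efficiency = scaling_efficiency(3, two_box_tps, boxes=2)
```

Under that reading, two boxes at roughly 3.9 TPS reach only about 65% of the 6 TPS that linear scaling would give.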

Follow Up:Google Mini Support / Install. by rickbliss · 2005-09-07 02:30 · Score: 1

A Response From Google: Thank you for your message. I apologize that there is currently some ambiguity in our documentation regarding external stylesheet behavior. Google has recently begun shipping Mini appliances with patch "google-mini-patch1.bin" pre-installed; this appears to be the case with your Mini. Documentation for this patch describes the procedure for installing this patch (which will not be necessary in your case), but also describes how to add #TRUSTED_STYLESHEET rules to allow reference to specific external stylesheets:

swish-e as a Google Mini alternative by guacamolefoo · 2005-09-07 02:38 · Score: 2, Informative

I seriously considered getting a Google Mini for my law office. The desktop search stuff wasn't really doing it for us, and we have boatloads of work that we reuse on a regular basis -- pleadings/contracts/settlement agreements, etc. are sort of like code in that respect -- we always want to reuse our knowledge rather than reinventing the wheel. My concern was that the regular Google appliance was too expensive. The mini seemed reasonable, but I still was resisting the idea of paying that much for search.

In any case, I had searched high and low for a decent search function when I happened upon swish-e. I am exceptionally pleased with it. It can be found at swish-e.org.

I am not an uber geek, but I was capable of spending an afternoon monkeying with it to install it, set up regular indexing as a cron job, get it to properly read and index OpenOffice documents, and to launch them from the browser. This involved some frightening security settings, but I have a small enough office (three people) that I'm not too torqued about this. The wide open settings I used were not swish-e's fault, as near as I could tell. Rather, they resulted from my laziness -- "It works well enough now, and the likelihood of malicious use is pretty low, so fuck it".

Obviously, it could be set up a bit more cleanly on my end, but I am really, really happy with it apart from that. Currently, it runs on a used SCSI-RAIDed IBM Netfinity box that I picked up for a little under $500.

The time and money I spent on the hardware plus getting it running has paid immense dividends. I have benefitted in two primary ways:

First: my office minions use the network for storage and do not store anything locally. This means that everything is indexed (and can be found!) and because they like the search so much, they also (unwittingly, perhaps) give me the peace of mind knowing that our data also gets the other benefits of being on the network (everything is backed up automatically/regularly, etc.).

They like being able to find stuff, so the search has really encouraged saving stuff on the network. I could mandate this in other ways, but I'd rather have them drinking my Kool Aid than simply imposing the idea.

Second: My minions and I have saved tons of time using the search feature. Any good search does that. The additional bonus is that I no longer have to worry about the next version of Google Desktop or Copernic or installing it on various machines, blah, blah, blah. It's all centrally saved and configured. Administration is essentially zero since I am getting good search results on all the document types that I need - some old MS Office leftovers, Open Office, and PDF.

I don't see needing to change this in any significant way for at least as long as I keep the hardware. I think that the next time I'll need to touch it will be when the index outgrows the box serving the searches.

The box I'm running has dual 1.something gig pentiums with a gig of RAM. The drives are the weak link, with only 9.1 GB of space available for storage of OS, index, etc. The box also has redundant power supplies, redundant power supplies, redundant ethernet connections (100MB), and redundant ethernet connections (100MB).

The front end to the search is just a standard, "came with it" CGI script (swish.cgi). It works just fine. It gets called up as a webpage locally, and it spits out results.

One more note: we are pretty aggressive in enforcing standardized file naming conventions. The naming conventions typically include the client name, the matter, a date, the type of document, and the subject of the document. Swish-e has document path, title, and title-and-body searches off the interface we use, and you'll usually find exactly what you're looking for if you're reasonably specific.
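To give an idea of how a convention like that can be checked automatically, here is a small Python sketch. The exact layout is an assumption made up for this example (Client_Matter_YYYY-MM-DD_Type_Subject.ext, underscore-separated); only the five fields themselves come from the description above, and `parse_name` is a hypothetical helper, not part of swish-e.

```python
import re
from datetime import date

# Hypothetical layout: Client_Matter_YYYY-MM-DD_Type_Subject.ext
NAME_RE = re.compile(
    r"^(?P<client>[^_]+)_(?P<matter>[^_]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<doctype>[^_]+)_"
    r"(?P<subject>[^.]+)\.(?P<ext>\w+)$"
)


def parse_name(filename):
    """Split a file name into its convention fields.

    Returns a dict, or None if the name does not follow the assumed layout
    or the date is not a real calendar date.
    """
    m = NAME_RE.match(filename)
    if m is None:
        return None
    fields = m.groupdict()
    try:
        fields["date"] = date.fromisoformat(fields["date"])
    except ValueError:
        return None
    return fields
```

Running something like this over the share before each indexing pass would flag files that will be hard to find by document path later.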

On a final note, swish-e has been unsuccessful when I have used the following search terms "nubile blonde woman" and "willing to get with me". In that respect, swish-e has been an outright failure, though it is conceivable that the fault lies with operator error.

GF.

--
Lots of petrified grits

178 comments