Amdahl's law is not like Moore's "law". Amdahl's law is an observation of mathematics. You can't ever get around the fact that if you increase the performance of 90% of the instructions in a program, you still have to deal with the other 10%. Even if you increase the performance of 90% of the instructions by 100x or something large, if the other 10% take a long time (like disk access) it's going to kill your performance.
It makes the implicit assumption that there is a '10%'. The whole point of some pure functional languages is that essentially all of it can be parallelized, 99% or more in some cases. There are programs where the sequential part is 0.001%, if that.
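A quick way to see both sides of this is to plug numbers into Amdahl's law itself. This is a minimal Python sketch, assuming the "90%" is a fraction of the original run time (Amdahl's law is about time, not instruction counts); the function name is my own.

```python
def amdahl_speedup(parallel_fraction: float, factor: float) -> float:
    """Overall speedup when only `parallel_fraction` of the original run time
    is sped up by `factor` (Amdahl's law); the rest runs at the old speed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / factor)


if __name__ == "__main__":
    # The 90% / 100x example: the untouched 10% caps the result below 10x.
    print(amdahl_speedup(0.90, 100))
    # A program whose sequential part is only 0.001%, spread over 1,000 cores.
    print(amdahl_speedup(0.99999, 1000))
```

With the 90% / 100x example the result is roughly 9.2x, because the untouched 10% dominates; with a 0.001% sequential part on 1,000 cores it is roughly 990x.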
However, another comeback I've heard is that the "end goal" of parallelization is not necessarily to do the same thing in less time, but to do more in the same time. That is, you increase the "parallelizable" bit while keeping everything else constant. For example, take a look at computer games, where you'd just do "higher resolution", or "bigger textures", or "more complex shaders". You do more, while the sequential stuff stays much the same.
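The "do more in the same time" argument is usually formalized as Gustafson's law. A minimal sketch with a hypothetical helper name; note that here the serial fraction is measured on the parallel machine, so it is not the same number you'd feed into Amdahl's law.

```python
def gustafson_scaled_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speedup under Gustafson's law.

    serial_fraction: share of the run time *on the parallel machine* spent in
    sequential code. The parallel part is assumed to grow with the processor
    count while the sequential part stays the same size.
    """
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    for n in (1, 8, 64, 1024):
        print(n, gustafson_scaled_speedup(0.10, n))
```

With a 10% sequential share and 64 processors this gives about 57.7x the work per unit of time, whereas a fixed-size job with a 10% sequential part can never get past 10x.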
I can't stand it when people trot out Amdahl's law like it's some sort of reason that parallelization is futile or something. It's missing the point completely!
Get real, for 150 users a WRT54 will do DNS etc.
Want a bit more poke? VIA EPIA + small flash disk.
"buy a server".. jeez, you work for IBM sales dept?
I'm responding to your comment:
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
I recommended at least two boxes, for redundancy. He may need more, depending on load.
For a 150 user organization, that's nothing; most such organisations are running off a dozen servers or more, which is what the original poster in fact said. With virtualization, he'd be reducing his costs.
One per service is insane, which is what you said. If you wanted dedicated boxes for each service AND some redundancy, that's TWO per service!
Backpedaling and pretending that a WRT54 can somehow host all of the services required by a 150 user organization is doubly insane.
Years ago the Microsoft DNS implementation had a very nasty memory leak and used a lot of CPU - you really did need a dedicated DNS machine for small sites and to reboot it once a week. I think that's why people are still thinking about putting it in a virtual box so it can't eat all the resources, even for a pile of trivial services that a SPARCstation 5 could handle at low load.
In practice, everyone just builds two domain controllers, where each one runs Active Directory, DNS, DHCP, WINS, and maybe a few other related minor services like a certificate authority, PXE boot, and the DFS root.
I haven't seen any significant interoperability problems with that setup anywhere for many years.
Still, virtualization has its place, because services like AD have special disaster recovery requirements. It's a huge mistake to put AD on the same OS instance as a file server or a database, because they need to be recovered completely differently. The last thing you want to be doing during a restore is juggling conflicting restore methods and requirements!
Poppycock. You can buy small form factor single-core PCs for under $200, or even a refurbished 3-4 year old server box for close to the same price. Depending on the environmental and space considerations, you can pick the platforms to suit and keep the costs minimal. Shoot, even a $200 netbook would have more CPU power and storage than most 7-year-old computers, generate little or no heat, and demand a fraction of the power. If this guy is smart, he can cut electrical costs and cooling costs substantially without changing a perfectly functional architecture.
What doesn't make sense is grossly overcomplicating things by trying to shove too much into some large scale platform and then further complicating it with a virtualization layer. We gave up mainframes, and thin clients/fat servers didn't work, for a reason.
Sure, it's cool and technically challenging. What's the business reason/driver for going the cool/challenging route again?
If the OP decides to quit 2 months after implementing his super cool setup because the job after that is completely boring, who can come in and grasp what he's set up and maintain/upgrade it? Another finicky tech guru that wants to play with the stuff on the job and gets bored and walks off a couple of months later?
$200 machine = no RAID, no ECC memory, no hardware monitoring, no support for server OSes, not to mention that most netbooks can't run 64-bit, which means the latest Windows Server is Just Not An Option.
Good advice! Let's run ALL of our business critical functions off laptops just to avoid learning about new technology! Let's all run on mixed hardware and have to deal with drivers from fifty vendors!
You really don't understand what virtualization provides, so maybe you should read up on it a little bit before you go spouting off.
It's not hard, it's not "massively complex" unless you go out of your way to do it wrong, and it has nothing to do with mainframes or thin clients.
Virtualization isn't some "super cool" buzzword technology, it's a money saver. It reduces costs massively. It makes hardware maintenance an order of magnitude cheaper and safer. There's a reason everyone is switching to it.
If you can't keep up with technology, you shouldn't be in IT.
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
...
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing.
I totally disagree.
Look at some of the services he listed: DNS and DHCP.
You literally can't buy a server these days with less than 2 cores, and getting less than 4 is a challenge. That kind of computing power is overkill for such basic services, so it makes perfect sense to partition a single high-powered box to better utilize it. There is no need to give up redundancy either: you can buy two boxes, and have every key service duplicated between them. Buying two boxes per service, on the other hand, is insane, especially for services like DHCP, which in an environment like that might have to respond to a packet once an hour.
Even the other listed services probably cause negligible load. Most web servers sit there at 0.1% load most of the time; ditto with FTP, which tends to see only sporadic use.
I think you'll find that the exact opposite of your quote is true: for 99% of corporate environments where virtualization is used, it is appropriate. In fact, it's under-used. Most places could save a lot of money by virtualizing more.
I'm guessing you work for an organization where money grows on trees, and you can 'design' whatever the hell you want, and you get the budget for it, no matter how wasteful, right?
That would only work if the password is kept in a temporary file. Otherwise there is no reason whatsoever the password would be anywhere on disk. And that does not work at all if you use a bootable CD.
But that's not how it happens in the real world. Most people don't run their computers from read-only media with the swap turned off!
First of all, there are lots of bad developers out there. Passwords get saved all over the place, in the registry, configuration files, etc... I've seen web sites that were "https", but then put the plain text password into the URL, which is saved in the unencrypted browser history!
Second, even if you store passwords in memory only, the pagefile might still contain it, if a page containing the password was swapped out. It's even more likely with hibernation files, which swap out everything, including kernel space marked as non-pageable.
In theory, there are features like "protected memory" that developers can use to store passwords securely in memory, but this takes a lot of work. In Win32 there's a set of APIs for it, but many developers don't use it, or haven't even heard of it. It's such a low level "buffer manipulation" style API that lots of high-level languages can't or don't use it. It's only recently that C# got support for it, for example, and I don't think Java has anything comparable. Most garbage-collecting languages are vulnerable, because memory can be relocated (copied) at any time, which may prevent buffers from being properly cleared.
One of the worst culprits is those "I forgot my password" web pages that email your plain-text password to your mailbox, so that your email client can then cheerfully write it all over the place. Even if you encrypt your PC's disk, if you use corporate email, your password is now in plain text, on the server's disk.
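As an illustration of the garbage-collection problem, here's a minimal Python sketch. It is not the Win32 API mentioned above (presumably CryptProtectMemory, which is an assumption on my part); it just shows the basic discipline of keeping the secret in a mutable buffer and overwriting it when done. Python's str and bytes are immutable and can't be wiped at all, and even a bytearray only helps so much: the runtime or the hashing library may already have made copies you can't reach. The function names and the 100,000 PBKDF2 iterations are illustrative choices.

```python
import hashlib
import hmac


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place. This can't reach any copies the
    runtime or a library may already have made."""
    for i in range(len(buf)):
        buf[i] = 0


def check_password(candidate: bytearray, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the candidate with PBKDF2-HMAC-SHA256, compare in constant time,
    then wipe the plain-text buffer whether or not the check succeeded."""
    try:
        digest = hashlib.pbkdf2_hmac("sha256", candidate, salt, 100_000)
        return hmac.compare_digest(digest, stored_hash)
    finally:
        wipe(candidate)
```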
In practice, real security is hard. Very, very hard. As a consultant, I've been to over 100 clients, including major banks and very security sensitive government institutions, and I've only ever seen 2 secure networks: One financial services company, and the internal LAN on the new generation Boeing planes.
However, to break an 8-character alphanumeric password (case and numbers), that becomes seven months
Ah... theory!
In practice, even very long passwords are trivially cracked in little time, using simple methods.
Unfortunately, I lost the source, but while studying cryptography myself, I stumbled upon a quote from some guy involved in government decryption in the US, and (paraphrasing), he said that their technique was basically to pick up the hard disk from the machine with the protected content, and then simply try every consecutive range of bytes as a password.
Unless the disk was encrypted with 'whole disk encryption', it works something like 90% of the time, simply because of stupid software saving plain-text passwords, users reusing passwords for various purposes, things like hibernation and page files, etc... I suspect that on disks from corporate networks, it would work even better, because if any one disk reveals the network admin password, you can unlock everything else from there.
So if you have a 100 GB disk, and you try all byte ranges from 4 to 20 bytes long (to account for various password lengths), and you try every byte range as both an ASCII and UTF-16 string, that's merely 17 x 2 x 100 x 10^9 = 3400 billion passwords to try, or about 3.3 days at your quoted "12 million passwords per second".
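To make the attack concrete, here's a minimal Python sketch of the candidate generator and the arithmetic above. The function names are hypothetical, it reads ASCII as single-byte Latin-1 so decoding never fails, and it assumes the whole buffer fits in memory; a real tool would stream the disk in chunks and feed each candidate to whatever key-derivation function protects the target.

```python
SECONDS_PER_DAY = 86_400


def candidate_passwords(buf: bytes, min_len: int = 4, max_len: int = 20):
    """Yield every byte range in `buf` as a candidate password, once read as
    single-byte (ASCII/Latin-1) text and once as UTF-16LE text."""
    for start in range(len(buf)):
        for n in range(min_len, max_len + 1):
            one_byte = buf[start:start + n]
            if len(one_byte) == n:
                yield one_byte.decode("latin-1")
            # A UTF-16 password of n characters occupies 2n bytes.
            two_byte = buf[start:start + 2 * n]
            if len(two_byte) == 2 * n:
                yield two_byte.decode("utf-16-le", errors="replace")


def estimated_days(disk_bytes: int, rate_per_second: float,
                   min_len: int = 4, max_len: int = 20, encodings: int = 2) -> float:
    """Back-of-the-envelope cost: one guess per byte offset, per length, per encoding."""
    guesses = disk_bytes * (max_len - min_len + 1) * encodings
    return guesses / rate_per_second / SECONDS_PER_DAY
```

estimated_days(100 * 10**9, 12_000_000) comes out at roughly 3.28 days.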
In practice, most disks would crack much faster than that, if you aim the algorithm at the most likely sources first, such as the page and hibernation files, the user registry, and the web browser cache and configuration folders.
The lesson I took away from that is that against an attacker with physical access, it really doesn't make the slightest difference how strong your password is, unless the entire disk is encrypted.
I have looked up roaming profiles; they don't work right, and a large percentage of software won't work with them. A large percentage of (third-party) apps don't allow multiple instances of themselves to be run by different users.
While it's not all specifically MSFT's fault, it is, because MSFT designed the system for single users, and then let third-party developers do whatever they want. Oh, and the corporate deployments store a login and one folder, usually My Documents; not the entire user folder, not all their settings. Some things as simple as the background picture, Outlook settings, IE settings, even taskbar colors are stored only locally, meaning I get a different setup depending on which machine has my profile. Also, each machine with your profile needs to be set up ahead of time to allow you to log in. I can't just get up, walk over to another workstation, log in, and have everything at hand; the admins have to log in and allow you to have your profile on the specific machine. It is far from easy and far from trivial. This is the third company I've worked for that has had these limitations.
That is a roaming profile, not a roaming user.
I think you're confusing things. Roaming profiles are basically indistinguishable from local ones from the perspective of end-user software; in all my years of Windows system administration, I've never heard of a program "incompatible" with roaming profiles.
Second, if your background pictures and other settings aren't following you from machine to machine, then you do NOT have roaming profiles, because that's precisely what they're for.
Last, profiles don't contain all of your information, they need to be combined with folder redirection to get the "full" effect.
This is well known, well documented, and takes like 100 mouse clicks to configure across the entire environment (it's set by group policy).
In most of the environments I've worked in (or set up myself), a user can sit down in front of any computer, and have everything exactly the same. This is not rocket science!
Be careful with the attitude that just because you know how to do something for platform A but not platform B, that it somehow "can't be done for platform B".
I could do that with *nix in 1996, and the setup was several years old at that point. What you describe is nothing special; in fact, it should be standard, but the corporate world had to wait for MSFT's closed, buggy versions, and had to wait for the "desktop" to be replaced once again with mainframes with the horsepower to drive them.
MSFT has been reinventing Unix poorly for 20-plus years. I am still waiting for proper multi-user and multi-monitor support (the hardware and drivers for multiple monitors are good, the apps suck hard, and yes, OS X isn't any better). Why do only some apps store information in the user data folders, while others store it locked in a registry that is machine-specific? And I am talking about MSFT's own apps, not third-party ones.
"PXE was introduced as part of the Wired for Management framework by Intel and is described in the specification (version 2.1) published by Intel and Systemsoft on September 20, 1999." (Wikipedia)
I imagine you're talking about something like booting Unix workstations from an NFS share, right?
I don't know what you mean by "proper" multi user support, Windows desktops allow multiple logged on users (one active at a time), and server editions have had terminal services for years now. The registry is not "locked", and not "machine specific". Look up roaming profiles, the user hive is just a file, and almost all corporate deployments store it on the network.
Just because you don't know how to set it up right, doesn't mean it can't be done.
Come on, you're talking about ancient history. The 70s? That was the Soviet Union!
What about Alexander Litvinenko? He fled Russia in 2000, was granted asylum in the UK in 2001, became a British citizen in October 2006, and by November 2006 he was murdered. The killers used a radioactive isotope that would not have been available to the average crazy on the street -- clearly sending a message.
A rare isotope that's only made in quantity in old Russian RBMK nuclear reactors.
It's genius. They deny any official involvement, while choosing an obscure method of murder that only they could have applied.
"We didn't officially do this thing that only we could do. Be warned that we may not officially do similar things to you too."
People forget that many of the Microsoft tools just work.
Just recently, a coworker and I set up a "green field" environment from scratch in two days flat (starting with one blank server), and had network authentication (AD), enterprise email (Exchange), and a PXE-booting machine imaging system (including Office) deploying images to laptops, including drivers! It was kinda neat that we could boot a blank machine, and within 30 minutes be able to log in with a user account, double-click Outlook, and see an inbox.
In the corporate world, Microsoft still rules, and will continue to rule for a long time.
When you buy the Sparc monster you get all the necessary software in order to partition your machine as needed.
With your 10 Intel machines, please pray tell me, how do you assign two of them to the same virtual server? (including memory, disk and network capacity).
I'm guessing you skipped the second part of my post, where I mentioned that both Intel and IBM make 8-32 socket server systems. The Intel servers would be well under $100K.
Dell ships servers with ESXi hypervisors installed as firmware on the motherboard.
If you want a more powerful virtualization solution, there's Xen, Hyper-V, or the very mature ESX server. The latter lets you treat a cluster of hosts as basically one giant pool of memory and CPU. Virtual machines just float around, migrating from host to host in seconds, with zero outage. You can literally drag & drop a virtual machine across data centers, live. Solaris doesn't even come close to that.
If you don't like Windows, you may have also heard of this operating system called Linux.
And finally, as I mentioned, Intel can run SOLARIS.
"Some hardware engineering but that SPARC stuff really isn't competitive."
Really?
How much do you know about "that SPARC stuff?" It's true that x86 has finally surpassed a lot of the things that Sparc led the way in, but there are still ways that traditional Sparc scales better.
Now moving to the next generation of Sun's gear, we have hardware virtualisation and CoolThreads. Under a hundred grand will buy you a system with four 8-core CPUs, and each core can process eight simultaneous threads. That is OLTP nirvana! Too much power? Chop it up into a handful of smaller servers, each running their own OS. Any one of them can in turn be split into zones--soft OS partitions.
I keep hearing about how Sparc is obsolete, and yet the new generation of Sparc processors and supporting hardware is pushing the state of the art in ways that Intel and AMD aren't even planning yet.
Umm... what?
First of all, for "a hundred grand", I can buy 10 systems that add up to 80 Intel 3GHz cores (160 threads) with 720GB of memory, which is going to shit all over that SUN box with its anemic 1GHz processors. That's retail pricing, in Aussie dollars! Including tax! Delivered to your door in under a week, assembled!
Meanwhile, to get that SUN box, I'd have to "call your nearest SUN dealer". Oh good, I can't wait to have him explain to me how spending $100K is going to "save me money", or something.
I'll grant you that 32 cores in a single box is needed for those rare cases where you need "one big box to rule them all", but SUN has dropped the ball on that too:
Intel is releasing their 2GHz+ 8 core, 16 thread Nehalem-EX processors this year (or very early next year), and it has glue-less scaling to 8 processor sockets (64 cores, 128 threads) and a jaw-dropping 128 DIMM sockets. With the dirt-cheap 4GB DIMMs that most people are buying, you could pack in 512GB into a single box for a mere $24K. Again, that's retail pricing, in Aussie dollars, including tax.
Meanwhile, IBM is about to ship their 4GHz+ 8 Core, 32 thread POWER7 CPU, which scales to 32 sockets. In case you missed that, it's 4x the clock rate and 8x the sockets, or 32x the performance of that SUN server.
Not to mention that both IBM and Intel processors have had virtualization (same thing as "zones" or "partitions") for a long time now, and can run more than one kind of OS side-by-side. The POWER processors can run various IBM operating systems as well as Linux, and Intel is compatible with damned near anything, including Solaris.
Face it, SPARC is dead, the big-boys are making chips with several times the power, for a fraction of the cost. (admittedly, POWER7 isn't going to be cheap)
PS: I'm not surprised SUN is generally losing their market share, even their x86 kit is overpriced. I personally love the concept of the SUN Thumper ZFS-based storage array, and was all excited about it, right up until I saw their pricing model: They only go up to 1TB drives, and it's actually cheaper to buy the model with 250GB drives, then throw the drives out, and go buy 48 replacement 2TB SATA drives from a retail store. That's 2x the storage for 1/2 the cost. Insanity.
I think SUN forgot that some of their potential clients can count.
"The AP1000 design saves money and time with an accelerated construction time period of approximately 36 months, from the pouring of first concrete to the loading of fuel"
That's not bad. I imagine it would take at least a year or two to get funding and approval, but even then it would only take 5 or at most 6 years for a new plant to start producing power.
Once a company has approval and a line of funding, building more would take less time, you can deploy these side-by-side in a cookie cutter fashion with an accelerated time line.
Coal and oil are plentiful, cheap, and easy to use. Compare this to idiotic technologies like wind and solar that are hugely expensive, unreliable, and hurt the eyeline of the cities they are installed in. And people wonder why environmentalists are considered stupid.
Excuse me, but caring about our planet does not make somebody stupid. Caring only about your pocketbook, however, does make you a greedy asshole. And thinking that everyone must have the same order of priorities as you does make you stupid.
Also, most wind turbines aren't built in or even near cities, they're usually off-shore or on hilltops somewhere out in the countryside.
There is one experimental wind turbine in Sydney, which I could see from my University. I used to love staring out the window at it, I found the slow steady movement to be relaxing.
This is what computers were designed to do, but instead of just doing a numerical simulation, physicists insist on waving their hand and dismissing the error term like it's not even there, so they can keep using nice pretty exact solutions that... don't agree with reality.
I think these people may disagree with you.........
You'll find most of those simulations are Newtonian. I just checked some of their latest papers, and they all use Newtonian or modified Newtonian (MOND) codes. The code they run is called "GADGET-3" (they also used earlier versions in the past), and according to this high level description, it's Newtonian. Admittedly, it's an impressive simulator, but it seems to concentrate on scale (many particles) and on including many effects like gas interactions, magnetohydrodynamics, etc... but not a relativistic metric.
If you can find even one of the published papers on that site that even mentions the word 'relativistic', I'd be very interested in reading it. The papers are linked from here, they have links to the full papers on Arxiv.
Using relativistic mechanics, the fit is an exact match to observed galaxies.
I can tell from that quote that you don't have the slightest idea of what you're talking about. Even a sophomore physics student can see that these stars are nowhere near the conditions where relativistic equations even matter. Jesus Christ, man, don't you think that if it was that simple a fix physicists would know about it?
And you can show that mathematically? DID YOU PROVE IT, or just assume, like everyone else?
Umm... I took more than a year of physics. I have a Bachelor of Science. That particular story actually happened in second year (it came up while we were studying Quantum Mechanics), but I've seen similar hand-wavy assumptions made even in very advanced materials later on.
NASA uses Newtonian mechanics for solar system navigation because it is precise enough for that application, and its predictions agree with reality.
Newtonian mechanics is not good enough for dealing with the motion of galaxies, and its predictions do not agree with reality.
So all that stuff I heard about MOND was just in my head? Thanks for grounding me in reality!
Did you read ANY of that? MOND stands for Modified Newtonian Dynamics.
My entire point was that physicists are using simplified Newtonian equations of motion instead of the known-correct General Relativity.
It's basically a type of laziness. Newtonian equations are simple, and solutions for the rotation of galaxies can be done on a blackboard. Even MOND is relatively straightforward. Full General Relativity is hard to solve, and galaxy rotations are particularly complex because of the complex axial distribution of matter. An exact solution for a system as complex as a galaxy is probably impossible, and even if it was possible, it would be way out of the league of any human mathematician.
This is what computers were designed to do, but instead of just doing a numerical simulation, physicists insist on waving their hand and dismissing the error term like it's not even there, so they can keep using nice pretty exact solutions that... don't agree with reality.
O M G ! - W T F ! Low level physics classes use lots of simplifications? That explains why I can't find massless ropes and frictionless pulleys on E-Bay!
Except that I found this general tendency to dismiss higher order error terms to persist through every year I studied physics at University. I didn't drop out in first year, just so you know.
Simplifying equations is not straightforward: you have to be able to show mathematically that the error term is truly insignificant, but this part seems to be glossed over. Students learn a huge array of simplified equations, and are never really exposed to the original thinking and justification behind them, and often don't even realize they're working with techniques that may not work in corner cases. These same students become researchers and write papers about dark matter.
How do you propose that we expand our knowledge without acquiring more evidence? How would you test a hypothesis or a theory without searching for more evidence for or against it?
Why waste time looking for evidence to support a theory based on a known-wrong theory?
Why wouldn't we first re-try the simulations of galactic motion with the known-correct theory, before wasting millions of dollars looking for some mythical invisible matter to match the error term of a simplified equation?
Simulations of stars in galaxies are approximations because: 1) there isn't an equation for an exact solution to any gravitationally bound system containing more than 5 objects.
That's not what I said; numerical precision is not the issue. If you're only accurate to 10 decimal places, that's still good enough. Numerical simulations of, for example, the solar system can be done (and have been done) relativistically to huge precision. Dark matter theories are implying that there's a 10x error. That can't be accounted for by numerical precision!
2) stars in a typical galaxy are not uniform so the simulations must take this into account as a best guess.
Well duh. However, the simulations I was referring to perform a 'fit' of observed data. We know the observed velocities of stars in galaxies, and given that, a good model can predict the mass distribution of stars that can produce that motion. Using Newtonian mechanics, this 'fit' can't be done; an extra "dark matter" term is required, where that dark matter doesn't behave like ordinary stars under gravitation. Using relativistic mechanics, the fit is an exact match to observed galaxies.
3) Newton's equations are indeed incorrect; however, Einstein's equations only dominate to a significant degree under unusual conditions.
You just made the exact same unfounded assumption everyone else has! How do you not see that it's a HUGE MISTAKE when observed reality doesn't match the predictions of your simplified model? How is that not enough of a clue that maybe the model is oversimplified?
Galaxies are massive. Did you not notice that? It takes on the order of 100,000 years for either light or gravity to cross one the size of ours, and they bend light to the point that we've got nice Hubble pictures of them looking like a reflection in a funhouse mirror. Higher order error terms can't be ignored every time with a wave of your hand. Nonlinear effects can be subtle, and will bite you in the ass if you ignore them.
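For reference, the Newtonian "blackboard" version of that fit is a one-liner: for a circular orbit, v^2 = G M(<r) / r, so the enclosed mass is M(<r) = v^2 r / G. The sketch below assumes Newtonian gravity and a spherically symmetric mass distribution (itself a simplification for a disk galaxy), and the 220 km/s speed is just an illustrative value.

```python
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
KPC = 3.0857e19        # one kiloparsec in metres
SOLAR_MASS = 1.989e30  # kg


def newtonian_enclosed_mass(v_m_per_s: float, r_m: float) -> float:
    """Mass needed inside radius r to hold a circular orbit at speed v,
    assuming Newtonian gravity and a spherically symmetric mass distribution."""
    return v_m_per_s ** 2 * r_m / G


if __name__ == "__main__":
    # A flat rotation curve: the same speed measured at every radius (illustrative).
    for r_kpc in (5, 10, 20, 40):
        m = newtonian_enclosed_mass(220e3, r_kpc * KPC)
        print(f"{r_kpc:>3} kpc: {m / SOLAR_MASS:.2e} solar masses")
```

For a flat rotation curve, the implied enclosed mass doubles every time the radius doubles; when the visible mass doesn't keep pace, the shortfall is what gets attributed to dark matter.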
Insofar as dark matter is concerned, you are incorrect. Experiments like the Cryogenic Dark Matter Search are attempting to detect dark matter particles directly, we've got neutrino detectors looking for evidence of annihilation events... Particle accelerator experiments are attempting to actually synthesize dark matter candidates. To claim that there isn't a way to test the dark matter hypothesis would be grossly inaccurate. Disclaimer: Physics isn't my major but I did study quite a bit of it in high school and college.
There are ways to test for dark matter, sure, but no test has ever provided the slightest hint of anything that might be there. Physicists are claiming that 90% of the matter is invisible... so where is it? That would mean that there would be... 10x more stuff around. I think we'd notice.
Lesson learned: give the same system rights to your Windows users as your Linux users have, and they can't get infected even if they wanted to.
The corollary to that rule is that many applications won't run because they're poorly architected and require administrative rights to run. Oh, sure, you can finagle around with permissions and get many of them to run, but is it really worth the time to work around broken software? (running Windows which itself is broken notwithstanding)
Yes, it's worth the time, that's what I did for years as a Citrix server admin. It's worth doing it for desktops too, otherwise you end up spending half your time re-imaging infected machines.
The difference is that dark matter and dark energy can be tested for in various ways; a deity can't be. When physicists can't explain something they may use a placeholder at times, but there's no chance of just giving up like the "god did it" explanation does.
No, they can't test for it, that's the problem.
This is more along the lines of "our equations don't explain the observed motion of galaxies, therefore, there's matter there we can't see or touch."
That's just not a logical conclusion. It leaves out the much more likely answer that our understanding of the equations of motion is wrong.
The really stupid thing is that all of the predictions that disagree with observed reality (and are the cause for the dark matter/energy predictions) are approximations. Most galaxy motion simulations are based on either Newtonian mechanics or "modified Newtonian" mechanics, even though both are known to be wrong. Einstein showed them to be wrong over a hundred years ago! My second sentence actually should have been:
"Our known-wrong, simplified equations don't explain the observed motion of galaxies, therefore, there's some magical invisible stuff there we can't see or touch."
That's some good science, right there.
Disclaimer: I studied physics at University, and both a friend of mine and I noted during our studies that Physics seems to overuse simplified equations even in situations where it leads to substantial errors. The example my friend noticed is the classic "double slit" interference experiment. Take a look at the equation used. Those simple equations are the ones we learned about also. They're wrong. In many practical cases, the error can exceed 30%!
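The link to the equation is lost, so this is an assumption: the usual textbook formula puts the m-th bright fringe at y = m λ L / d, which relies on the small-angle approximation sin θ ≈ tan θ. A minimal Python sketch comparing it to the position without that approximation (still assuming a far-field, two-point-slit model; the function names are hypothetical):

```python
import math


def fringe_positions(m: int, wavelength: float, slit_spacing: float, screen_distance: float):
    """Return (small-angle, exact) positions of the m-th bright fringe on a flat screen."""
    sin_theta = m * wavelength / slit_spacing
    if abs(sin_theta) > 1:
        raise ValueError("no such fringe: m * wavelength exceeds the slit spacing")
    approx = m * wavelength * screen_distance / slit_spacing   # y = m λ L / d
    exact = screen_distance * math.tan(math.asin(sin_theta))   # y = L tan θ
    return approx, exact


def relative_error(approx: float, exact: float) -> float:
    """|approx - exact| / |exact|; only meaningful for m >= 1 (exact != 0)."""
    return abs(approx - exact) / abs(exact)
```

The relative error works out to 1 - cos θ, so it is about 13% at θ = 30° and passes 30% at around 46°, e.g. λ = 750 nm with d = 1 μm.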
Are there any other filesystems with that feature? If not, I'm very strongly considering writing my own.
I was actually thinking the same kind of thing a few years back, but I did some back-of-the-envelope maths and realized that a de-dupe filesystem is actually quite hard to implement.
A naive implementation is simple, but slow. The issue is that the hash codes are basically random, so you have to store all of them in memory, or suffer horrendously expensive random disk lookups, which can't be cached easily.
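A minimal sketch of the naive approach in Python, where a dict keyed by SHA-256 digest stands in for the in-memory hash table. The class name is hypothetical and, unlike a real filesystem, it stores block data in the table rather than disk locations; the point is that every digest has to be reachable by random lookup.

```python
from __future__ import annotations

import hashlib


class NaiveDedupStore:
    """Toy block store: identical blocks are stored once, keyed by SHA-256.
    The whole hash index lives in a dict, i.e. entirely in RAM."""

    def __init__(self, block_size: int = 64 * 1024):
        self.block_size = block_size
        self.blocks: dict[bytes, bytes] = {}  # digest -> block data

    def write(self, data: bytes) -> list[bytes]:
        """Split data into fixed-size blocks; return the list of block digests."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            digests.append(digest)
        return digests

    def read(self, digests: list[bytes]) -> bytes:
        """Reassemble the original data from its list of block digests."""
        return b"".join(self.blocks[d] for d in digests)
```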
Imagine this scenario: If you use SHA-256, then that's 32 bytes per hash code, minimum. If you take a single 2TB SATA disk, and carve it up into (relatively large) 64 KB blocks, then you have about 32M blocks, or 1GB of raw hash code data that you have to keep in RAM, all at once, ignoring overheads, which are substantial. In practice, expect that to be more like 2 or 3GB. Sure, that's only about 0.1% of the original disk capacity, but that's just one disk! A SUN thumper has 48 SATA disks in a single chassis, or about 80 TB usable after overheads, which adds up to at least 40 GB of hash code data, or more like 80-100 GB for a typical naive implementation. That's a lot of data to be keeping in the kernel, and would require 128GB of physical memory in the server if you also wanted some room for file data caches and whatnot.
Real-world de-dupe filers often use several fancy algorithms at once to reduce effective RAM requirements, but it takes a lot of work. For example, some filers use hierarchical hashes, others use Bloom filters, and I've heard of filers that partition the hashtable and use file identification heuristics to load likely partitions on demand.
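The arithmetic above, as a small Python helper. It assumes binary units (1 TB = 2^40 bytes, 64 KB = 2^16 bytes) and counts only raw digests for a completely full volume, with none of the table overhead; the names are my own.

```python
KIB = 2 ** 10
GIB = 2 ** 30
TIB = 2 ** 40


def raw_hash_index_bytes(capacity_bytes: int, block_size: int = 64 * KIB,
                         digest_size: int = 32) -> int:
    """Bytes of raw digests for a fully used volume, before any table overhead.
    digest_size=32 corresponds to SHA-256."""
    blocks = capacity_bytes // block_size
    return blocks * digest_size


if __name__ == "__main__":
    for tb in (2, 80):
        print(f"{tb} TB -> {raw_hash_index_bytes(tb * TIB) / GIB:.0f} GB of raw digests")
```

2 TB comes to 1 GB of raw digests and 80 TB to 40 GB; multiply by roughly 2 to 2.5 for a naive hash table's overhead.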
Backpedaling and pretending that a WRT54 can somehow host all of the services required by a 150 user organization is doubly insane.
Years ago, the Microsoft DNS implementation had a very nasty memory leak and used a lot of CPU, so you really did need a dedicated DNS machine for small sites, and you had to reboot it once a week.
I think that's why people still think about putting it in its own virtual machine so it can't eat all the resources, even for a pile of trivial services that a SPARCstation 5 could handle at low load.
In practice, everyone just builds two domain controllers, where each one runs Active Directory, DNS, DHCP, WINS, and maybe a few other related minor services like a certificate authority, PXE boot, and the DFS root.
I haven't seen any significant interoperability problems with that setup anywhere for many years.
Still, virtualization has its place, because services like AD have special disaster recovery requirements. It's a huge mistake to put AD on the same OS instance as a file server or a database, because they need to be recovered completely differently. The last thing you want to be doing during a restore is juggling conflicting restore methods and requirements!
Poppycock. You can buy small-form-factor single-core PCs for under $200, or even a refurbished 3-4 year old server box for close to the same price. Depending on the environmental and space considerations, you can pick the platforms to suit and keep the costs minimal. Shoot, even a $200 netbook would have more CPU power and storage than most 7-year-old computers, generate little or no heat, and draw a fraction of the power. If this guy is smart, he can cut electrical and cooling costs substantially without changing a perfectly functional architecture.
What doesn't make sense is grossly overcomplicating things by trying to shove too much into some large-scale platform and then complicating it further with a virtualization layer. We gave up mainframes, and thin clients/fat servers didn't work, for a reason.
Sure, it's cool and technically challenging. What's the business reason/driver for going the cool/challenging route again?
If the OP decides to quit 2 months after implementing his super cool setup because the job after that is completely boring, who can come in and grasp what he's set up and maintain/upgrade it? Another finicky tech guru that wants to play with the stuff on the job and gets bored and walks off a couple of months later?
$200 machine = no RAID, no ECC memory, no hardware monitoring, no support for server OSes, not to mention that most netbooks can't run 64-bit, which means the latest Windows Server is Just Not An Option.
Good advice! Let's run ALL of our business-critical functions off laptops just to avoid learning about new technology! Let's all run on mixed hardware and have to deal with drivers from fifty vendors!
You really don't understand what virtualization provides, so maybe you should read up on it a little bit before you go spouting off.
It's not hard, it's not "massively complex" unless you go out of your way to do it wrong, and it has nothing to do with mainframes or thin clients.
Virtualization isn't some "super cool" buzzword technology, it's a money saver. It reduces costs massively. It makes hardware maintenance an order of magnitude cheaper and safer. There's a reason everyone is switching to it.
If you can't keep up with technology, you shouldn't be in IT.
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
...
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing.
I totally disagree.
Look at some of the services he listed: DNS and DHCP.
You literally can't buy a server these days with fewer than 2 cores, and getting fewer than 4 is a challenge. That kind of computing power is overkill for such basic services, so it makes perfect sense to partition a single high-powered box to better utilize it. There is no need to give up redundancy either: you can buy two boxes and have every key service duplicated between them. Buying two boxes per service, on the other hand, is insane, especially for services like DHCP, which in an environment like that might have to respond to a packet once an hour.
Even the other listed services probably cause negligible load. Most web servers sit there at 0.1% load most of the time, ditto with FTP, which tends to see only sporadic use.
I think you'll find that the exact opposite of your quote is true: for 99% of corporate environments where virtualization is used, it is appropriate. In fact, it's under-used. Most places could save a lot of money by virtualizing more.
I'm guessing you work for an organization where money grows on trees, and you can 'design' whatever the hell you want, and you get the budget for it, no matter how wasteful, right?
That would only work if the password is kept in a temporary file. Otherwise there is no reason whatsoever the password would be anywhere on disk. And it does not work at all if you use a bootable CD.
But that's not how it happens in the real world. Most people don't run their computers from read-only media with the swap turned off!
First of all, there are lots of bad developers out there. Passwords get saved all over the place: in the registry, configuration files, etc... I've seen web sites that were "https", but then put the plain-text password into the URL, which is saved in the unencrypted browser history!
Second, even if you store passwords in memory only, the pagefile might still contain them, if a page containing a password was swapped out. It's even more likely with hibernation files, which write out everything, including kernel memory marked as non-pageable.
In theory, there are features like "protected memory" that developers can use to store passwords securely in memory, but this takes a lot of work. In Win32 there's a set of APIs for it, but many developers don't use them, or haven't even heard of them. They're such low-level, "buffer manipulation" style APIs that lots of high-level languages can't or don't use them. It's only recently that C# got support for it, for example, and I don't think Java has anything comparable. Most garbage-collecting languages are vulnerable, because memory can be relocated (copied) at any time, which may prevent buffers from being properly cleared.
One of the worst culprits is those "I forgot my password" web pages that email your plain-text password to your mailbox, so that your email client can then cheerfully write it all over the place. Even if you encrypt your PC's disk, if you use corporate email your password is now in plain text on the server's disk.
In practice, real security is hard. Very, very hard. As a consultant, I've been to over 100 clients, including major banks and very security sensitive government institutions, and I've only ever seen 2 secure networks: One financial services company, and the internal LAN on the new generation Boeing planes.
However, to break an 8-character alphanumeric password (case and numbers), that becomes seven months
Ah... theory!
In practice, even very long passwords are trivially cracked in little time, using simple methods.
Unfortunately, I lost the source, but while studying cryptography myself, I stumbled upon a quote from some guy involved in government decryption in the US, and (paraphrasing), he said that their technique was basically to pick up the hard disk from the machine with the protected content, and then simply try every consecutive range of bytes as a password.
Unless the disk was encrypted with 'whole disk encryption', it works something like 90% of the time, simply because of stupid software saving plain-text passwords, users reusing passwords for various purposes, things like hibernation and page files, etc... I suspect that on disks from corporate networks, it would work even better, because if any one disk reveals the network admin password, you can unlock everything else from there.
So if you have a 100 GB disk, and you try all byte ranges from 4 to 20 bytes long (to account for various password lengths), and you try every byte range as both an ASCII and a UTF-16 string, that's merely 17 x 2 x 100 x 10^9 = 3,400 billion passwords to try, or about 3.3 days at your quoted "12 million passwords per second".
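Here is a rough Python sketch of that byte-window approach, using the comment's own assumptions: every offset, lengths 4 to 20, and ASCII plus UTF-16. The names `candidate_passwords` and `upper_bound` are hypothetical. A real tool would also need a way to test each candidate against the target, which is left out here. The script also reproduces the arithmetic, next to a plain brute force of 8-character alphanumeric passwords for comparison.

```python
from typing import Iterator

def candidate_passwords(data: bytes, min_len: int = 4, max_len: int = 20) -> Iterator[str]:
    """Yield every window of min_len..max_len characters at every byte offset,
    read both as ASCII (1 byte/char) and as UTF-16LE (2 bytes/char)."""
    for start in range(len(data)):
        for length in range(min_len, max_len + 1):
            for width, encoding in ((1, "ascii"), (2, "utf-16-le")):
                end = start + length * width
                if end > len(data):
                    continue                      # window runs off the end of the image
                try:
                    text = data[start:end].decode(encoding)
                except UnicodeDecodeError:
                    continue                      # not valid text in this encoding
                if text.isprintable():            # skip windows containing control bytes
                    yield text

def upper_bound(disk_bytes: int, min_len: int = 4, max_len: int = 20, encodings: int = 2) -> int:
    """The comment's worst-case count: offsets x lengths x encodings."""
    return disk_bytes * (max_len - min_len + 1) * encodings

if __name__ == "__main__":
    rate = 12_000_000                             # guesses per second, as quoted
    windows = upper_bound(100 * 10**9)            # 100 GB disk
    brute = 62 ** 8                               # every 8-char [A-Za-z0-9] password
    print(f"byte windows: {windows:,} candidates, {windows / rate / 86_400:.1f} days")
    print(f"brute force:  {brute:,} candidates, {brute / rate / 86_400:.0f} days")
```

At 12 million guesses per second, the byte-window scan works out to roughly 3.3 days. A full 8-character alphanumeric brute force takes roughly 211 days, which is about the "seven months" quoted above. Filtering out undecodable or non-printable windows only shrinks the real candidate list below the upper bound.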
In practice, most disks would crack much faster than that, if you aim the algorithm at the most likely sources first, such as the page and hibernation files, the user registry, and the web browser cache and configuration folders.
The lesson I took away from that is that against an attacker with physical access, it really doesn't make the slightest difference how strong your password is, unless the entire disk is encrypted.
I have looked up roaming profiles; they don't work right, and a large percentage of software won't work with them. A large percentage of (third-party) apps don't allow multiple instances of themselves to be run by different users.
While it's not entirely MSFT's fault, it partly is, because MSFT designed the system for single users and then let third-party developers do whatever they want. Oh, and the corporate deployments store a login and usually one folder, My Documents, not the entire user folder and not all their settings. Something as simple as the background picture, Outlook settings, IE settings, even taskbar colors are stored only locally, meaning I get a different setup depending on which machine has my profile. Also, each machine needs to be set up ahead of time to allow you to log in. I can't just get up, walk over to another workstation, log in, and have everything at hand; the admins have to log in and allow you to have your profile on that specific machine. It is far from easy and far from trivial. This is the third company I've worked for that has had these limitations.
That is a roaming profile, not a roaming user.
I think you're confusing things. Roaming profiles are basically indistinguishable from local ones from the perspective of end-user software; in all my years of Windows system administration, I've never heard of a program "incompatible" with roaming profiles.
Second, if your background pictures and other settings aren't following you from machine to machine, then you do NOT have roaming profiles, because that's precisely what they're for.
Last, profiles don't contain all of your information, they need to be combined with folder redirection to get the "full" effect.
This is well known, well documented, and takes like 100 mouse clicks to configure across the entire environment (it's set by group policy).
In most of the environments I've worked in (or set up myself), a user can sit down in front of any computer, and have everything exactly the same. This is not rocket science!
Be careful with the attitude that just because you know how to do something for platform A but not platform B, that it somehow "can't be done for platform B".
I could do that with *nix in 1996, and the setup was several years old at that point. What you describe is nothing special; in fact, it should be standard, but the corporate world had to wait for MSFT's closed, buggy versions, and had to wait for the "desktop" to be replaced once again with mainframes with the horsepower to drive them.
MSFT has been reinventing Unix poorly for 20-plus years. I am still waiting for proper multi-user and multi-monitor support (the hardware and drivers for multiple monitors are good, the apps suck hard, and yes, OS X isn't any better). Why do only some apps store information in the user data folders, while others store it locked in a registry that is machine-specific? And I am talking about MSFT's own apps, not third-party ones.
"PXE was introduced as part of the Wired for Management framework by Intel and is described in the specification (version 2.1) published by Intel and Systemsoft on September 20, 1999." (Wikipedia)
I imagine you're talking about something like booting Unix workstations from an NFS share, right?
I don't know what you mean by "proper" multi user support, Windows desktops allow multiple logged on users (one active at a time), and server editions have had terminal services for years now. The registry is not "locked", and not "machine specific". Look up roaming profiles, the user hive is just a file, and almost all corporate deployments store it on the network.
Just because you don't know how to set it up right, doesn't mean it can't be done.
Come on, you're talking about ancient history. The 70s? That was the Soviet Union!
What about Alexander Litvinenko? He fled Russia in 2000, was granted British citizenship in October 2006, and by November 2006 he had been murdered. The killers used a radioactive isotope that would not have been available to the average crazy on the street, clearly sending a message.
A rare isotope that's only made in quantity in old Russian RBMK nuclear reactors.
It's genius. They deny any official involvement, while choosing an obscure method of murder that only they could have applied.
"We didn't officially do this thing that only we could do. Be warned that we may not officially do similar things to you too."
Well said.
People forget that many of the Microsoft tools just work.
Just recently, a coworker and I set up a "green field" environment from scratch in two days flat (starting with one blank server), and had network authentication (AD), enterprise email (Exchange), and a PXE-booting machine imaging system (including Office) deploying images to laptops, including drivers! It was kinda neat that we could boot a blank machine, and within 30 minutes be able to log in with a user account, double-click Outlook, and see an inbox.
In the corporate world, Microsoft still rules, and will continue to rule for a long time.
What are you going to run on those Intel boxes?
Windows Vista?
When you buy the Sparc monster you get all the necessary software in order to partition your machine as needed.
With your 10 Intel machines, please pray tell me, how do you assign two of them to the same virtual server? (including memory, disk and network capacity).
I'm guessing you skipped the second part of my post, where I mentioned that both Intel and IBM make 8-32 socket server systems. The Intel servers would be well under $100K.
Dell ships servers with ESXi hypervisors installed as firmware on the motherboard.
If you want a more powerful virtualization solution, there's Xen, Hyper-V, or the very mature ESX server. The latter lets you treat a cluster of hosts as basically one giant pool of memory and CPU. Virtual machines just float around, migrating from host to host in seconds, with zero outage. You can literally drag & drop a virtual machine across data centers, live. Solaris doesn't even come close to that.
If you don't like Windows, you may have also heard of this operating system called Linux.
And finally, as I mentioned, Intel can run SOLARIS.
"Some hardware engineering but that SPARC stuff really isn't competitive."
Really?
How much do you know about "that SPARC stuff?" It's true that x86 has finally surpassed a lot of the things that Sparc led the way in, but there are still ways that traditional Sparc scales better.
Now moving to the next generation of Sun's gear, we have hardware virtualisation and CoolThreads. Under a hundred grand will buy you a system with four 8-core CPUs, and each core can process eight simultaneous threads. That is OLTP nirvana! Too much power? Chop it up into a handful of smaller servers, each running their own OS. Any one of them can in turn be split into zones--soft OS partitions.
I keep hearing about how Sparc is obsolete, and yet the new generation of Sparc processors and supporting hardware is pushing the state of the art in ways that Intel and AMD aren't even planning yet.
Umm... what?
First of all, for "a hundred grand", I can buy 10 systems that add up to 80 Intel 3GHz cores (160 threads) with 720GB of memory, which is going to shit all over that SUN box with its anemic 1GHz processors. That's retail pricing, in Aussie dollars! Including tax! Delivered to your door in under a week, assembled!
Meanwhile, to get that SUN box, I'd have to "call your nearest SUN dealer". Oh good, I can't wait to have him explain to me how spending $100K is going to "save me money", or something.
I'll grant you that 32 cores in a single box is needed for those rare cases where you need "one big box to rule them all", but SUN has dropped the ball on that too:
Intel is releasing their 2GHz+ 8 core, 16 thread Nehalem-EX processors this year (or very early next year), and it has glue-less scaling to 8 processor sockets (64 cores, 128 threads) and a jaw-dropping 128 DIMM sockets. With the dirt-cheap 4GB DIMMs that most people are buying, you could pack in 512GB into a single box for a mere $24K. Again, that's retail pricing, in Aussie dollars, including tax.
Meanwhile, IBM is about to ship their 4GHz+ 8 Core, 32 thread POWER7 CPU, which scales to 32 sockets. In case you missed that, it's 4x the clock rate and 8x the sockets, or 32x the performance of that SUN server.
Not to mention that both IBM and Intel processors have had virtualization (same thing as "zones" or "partitions") for a long time now, and can run more than one kind of OS side-by-side. The POWER processors can run various IBM operating systems as well as Linux, and Intel is compatible with damned near anything, including Solaris.
Face it, SPARC is dead, the big-boys are making chips with several times the power, for a fraction of the cost. (admittedly, POWER7 isn't going to be cheap)
PS: I'm not surprised SUN is generally losing market share; even their x86 kit is overpriced. I personally love the concept of the SUN Thumper ZFS-based storage array, and was all excited about it, right up until I saw their pricing model: they only go up to 1TB drives, and it's actually cheaper to buy the model with 250GB drives, then throw the drives out and buy 48 replacement 2TB SATA drives from a retail store. That's 2x the storage for 1/2 the cost. Insanity.
I think SUN forgot that some of their potential clients can count.
Ten years for a nuclear plant to go online seems high.
For example, check out the Westinghouse AP1000, they claim:
"The AP1000 design saves money and time with an accelerated construction time period of approximately 36 months, from the pouring of first concrete to the loading of fuel"
That's not bad. I imagine it would take at least a year or two to get funding and approval, but even then it would only take 5 or at most 6 years for a new plant to start producing power.
Once a company has approval and a line of funding, building more would take less time; you can deploy these side-by-side in a cookie-cutter fashion on an accelerated timeline.
Coal and oil are plentiful, cheap, and easy to use. Compare this to idiotic technologies like wind and solar that are hugely expensive, unreliable, and hurt the eyeline of the cities they are installed in. And people wonder why environmentalists are considered stupid.
Excuse me, but caring about our planet does not make somebody stupid.
Caring only about your pocketbook, however, does make you a greedy asshole.
And thinking that everyone must have the same order of priorities as you does make you stupid.
Also, most wind turbines aren't built in or even near cities, they're usually off-shore or on hilltops somewhere out in the countryside.
There is one experimental wind turbine in Sydney, which I could see from my University. I used to love staring out the window at it, I found the slow steady movement to be relaxing.
Not everyone thinks they 'ruin' a view.
Can you expand on this?
I've never seen a 'free' IOS download on Cisco's site, anywhere, ever.
This is what computers were designed to do, but instead of just doing a numerical simulation, physicists insist on waving their hand and dismissing the error term like it's not even there, so they can keep using nice pretty exact solutions that... don't agree with reality.
I think these people may disagree with you.........
You'll find most of those simulations are Newtonian. I just checked some of their latest papers, and they all use Newtonian or modified Newtonian (MOND) codes. The code they run is called "GADGET-3" (they also used earlier versions in the past), and according to this high level description, it's Newtonian. Admittedly, it's an impressive simulator, but it seems to concentrate on scale (many particles) and on including many effects like gas interactions, magnetohydrodynamics, etc... but not a relativistic metric.
If you can find even one of the published papers on that site that even mentions the word 'relativistic', I'd be very interested in reading it. The papers are linked from here, and they have links to the full papers on arXiv.
I can tell from that quote that you don't have the slightest idea of what you're talking about. Even a sophomore physics student can see that these stars are nowhere near the conditions where relativistic equations even matter. Jesus Christ, man, don't you think that if it were that simple a fix, physicists would know about it?
And you can show that mathematically? DID YOU PROVE IT, or just assume, like everyone else?
Umm... I took more than a year of physics. I have a Bachelor of Science. That particular story actually happened in second year (it came up while we were studying Quantum Mechanics), but I've seen similar hand-wavy assumptions made even in very advanced materials later on.
NASA uses Newtonian mechanics for solar system navigation because it is precise enough for that application, and its predictions agree with reality.
Newtonian mechanics is not good enough for dealing with the motion of galaxies, and its predictions do not agree with reality.
So all that stuff I heard about MOND was just in my head? Thanks for grounding me in reality!
Did you read ANY of that? MOND stands for Modified Newtonian Dynamics.
My entire point was that physicists are using simplified Newtonian equations of motion instead of the known-correct General Relativity.
It's basically a type of laziness. Newtonian equations are simple, and solutions for the rotation of galaxies can be done on a blackboard. Even MOND is relatively straightforward. Full General Relativity is hard to solve, and galaxy rotations are particularly complex because of the complex axial distribution of matter. An exact solution for a system as complex as a galaxy is probably impossible, and even if it were possible, it would be way out of the league of any human mathematician.
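For reference, here is the kind of "blackboard" Newtonian rotation result being discussed, in its simplest textbook form. It is a sketch that assumes circular orbits and a spherically symmetric mass distribution, which real disk galaxies only approximate. Here M(r) is the mass enclosed within radius r:

```latex
\frac{v(r)^2}{r} = \frac{G\,M(r)}{r^2}
\qquad\Longrightarrow\qquad
v(r) = \sqrt{\frac{G\,M(r)}{r}}
```

Outside most of the visible mass, M(r) is roughly constant, so this predicts orbital speeds falling off as r^(-1/2). Observed rotation curves stay roughly flat instead, and that mismatch is what dark matter (or MOND) is invoked to explain.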
This is what computers were designed to do, but instead of just doing a numerical simulation, physicists insist on waving their hand and dismissing the error term like it's not even there, so they can keep using nice pretty exact solutions that... don't agree with reality.
O M G ! - W T F ! Low level physics classes use lots of simplifications? That explains why I can't find massless ropes and frictionless pulleys on E-Bay!
Except that I found this general tendency to dismiss higher order error terms to persist through every year I studied physics at University. I didn't drop out in first year, just so you know.
Simplifying equations is not straightforward: you have to be able to show mathematically that the error term is truly insignificant, but this part seems to be glossed over. Students learn a huge array of simplified equations, and are never really exposed to the original thinking and justification behind them, and often don't even realize they're working with techniques that may not work in corner cases. These same students become researchers and write papers about dark matter.
How do you propose that we expand our knowledge without acquiring more evidence? How would you test a hypothesis or a theory without searching for more evidence for or against it?
Why waste time looking for evidence to support a theory based on a known-wrong theory?
Why wouldn't we first re-try the simulations of galactic motion with the known-correct theory, before wasting millions of dollars looking for some mythical invisible matter to match the error term of a simplified equation?
Did you even read what I wrote?
Simulations of stars in galaxies are approximations because:
1) there isn't an equation for an exact solution to any gravitationally bound system containing more than 5 objects.
That's not what I said; numerical precision is not the issue. If you're only accurate to 10 decimal places, that's still good enough. Numerical simulations of, for example, the solar system can be done (and have been done) relativistically to huge precision. Dark matter theories are implying that there's a 10x error. That can't be accounted for by numerical precision!
2) stars in a typical galaxy are not uniform so the simulations must take this into account as a best guess.
Well, duh. However, the simulations I was referring to perform a 'fit' of observed data. We know the observed velocities of stars in galaxies, and given that, a good model can predict the mass distribution of stars that can produce that motion. Using Newtonian mechanics, this 'fit' can't be done: an extra "dark matter" term is required, where that dark matter doesn't behave like ordinary stars under gravitation. Using relativistic mechanics, the fit is an exact match to observed galaxies.
3) newton's equations are indeed incorrect however, Einstein's equations only dominate to a significant degree under unusual conditions.
You just made the exact same unfounded assumption everyone else has! How do you not see that it's a HUGE MISTAKE when observed reality doesn't match the predictions of your simplified model? How is that not enough of a clue that the model may be oversimplified?
Galaxies are massive. Did you not notice that? It takes on the order of 100,000 years for either light or gravity to cross one the size of the Milky Way, and they bend light to the point that we've got nice Hubble pictures of them looking like a reflection in a funhouse mirror. Higher-order error terms can't be ignored every time with a wave of your hand. Nonlinear effects can be subtle, and will bite you in the ass if you ignore them.
In so far as dark matter is concerned, you are incorrect. Experiments like the Cryogenic Dark Matter Search are attempting to detect dark matter particles directly, we've got neutrino detectors looking for evidence of annihilation events... Particle accelerator experiments attempting to actually synthesize dark matter candidates.. To claim that there isn't a way to test the dark matter hypothesis would be grossly inaccurate.
Disclaimer: Physics isn't my major but I did study quite a bit of it in high school and college.
There are ways to test for dark matter, sure, but no test has ever provided the slightest hint of anything that might be there. Physicists are claiming that 90% of the matter is invisible... so where is it? That would mean there would be... 10x more stuff around. I think we'd notice.
The corollary to that rule is that many applications won't run because they're poorly architected and require administrative rights to run. Oh, sure, you can finagle around with permissions and get many of them to run, but is it really worth the time to work around broken software? (running Windows which itself is broken notwithstanding)
Yes, it's worth the time, that's what I did for years as a Citrix server admin. It's worth doing it for desktops too, otherwise you end up spending half your time re-imaging infected machines.
The difference is that dark matter and dark energy can be tested for in various ways; a deity can't be.
When physicists can't explain something they may use a place holder at times but there's no chance of just giving up like the "god did it" explanation does.
No, they can't test for it, that's the problem.
This is more along the lines of "our equations don't explain the observed motion of galaxies, therefore, there's matter there we can't see or touch."
That's just not a logical conclusion. It leaves out the much more likely answer that our understanding of the equations of motion is wrong.
The really stupid thing is that all of the predictions that disagree with observed reality (and are the cause for the dark matter/energy predictions) are approximations. Most galaxy motion simulations are based on either Newtonian mechanics or "modified Newtonian" mechanics, even though both are known to be wrong. Einstein showed them to be wrong over a hundred years ago! My second sentence actually should have been:
"Our known-wrong, simplified equations don't explain the observed motion of galaxies, therefore, there's some magical invisible stuff there we can't see or touch."
That's some good science, right there.
Disclaimer: I studied physics at University, and both a friend of mine and I noticed during our studies that physics seems to overuse simplified equations even in situations where they lead to substantial errors. The example my friend noticed is the classic "double slit" interference experiment. Take a look at the equation used. Those simple equations are the same ones we learned. They're wrong. In many practical cases, the error can exceed 30%!
Are there any other filesystems with that feature? If not, I'm very strongly considering writing my own.
I was actually thinking the same kind of thing a few years back, but I did some back-of-the-envelope maths and realized that a de-dupe filesystem is actually quite hard to implement.
A naive implementation is simple, but slow. The issue is that the hash codes are basically random, so you have to store all of them in memory, or suffer horrendously expensive random disk lookups, which can't be cached easily.
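Here is a minimal Python sketch of the naive approach. The class and method names are hypothetical, and fixed 64 KB blocks are an assumption. Every block's SHA-256 digest goes into one in-memory dict. That dict is exactly the structure that becomes the RAM problem at scale: its keys are effectively random, so once it no longer fits in memory, every lookup turns into a random disk access.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # fixed-size 64 KB blocks (an assumption; real filers vary)

class NaiveDedupeStore:
    """Toy block store: each unique block is kept once, keyed by its SHA-256.
    The whole index lives in a dict, i.e. entirely in RAM."""

    def __init__(self) -> None:
        self.index: dict[bytes, int] = {}   # digest -> slot number in self.blocks
        self.blocks: list[bytes] = []       # stand-in for on-disk block storage

    def write(self, data: bytes) -> list[int]:
        """Store data, returning the list of block slots that reconstruct it."""
        slots = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()   # 32 bytes per block
            slot = self.index.get(digest)
            if slot is None:                          # new content: store it once
                slot = len(self.blocks)
                self.blocks.append(block)
                self.index[digest] = slot
            slots.append(slot)                        # duplicates just reference it
        return slots

    def read(self, slots: list[int]) -> bytes:
        """Reassemble the original data from its block slots."""
        return b"".join(self.blocks[s] for s in slots)
```

For example, writing three identical 64 KB blocks followed by a short tail stores only two blocks and returns the slots `[0, 0, 0, 1]`.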
Imagine this scenario: If you use SHA-256, then that's 32 bytes per hash code, minimum. If you take a single 2TB SATA disk, and carve it up into (relatively large) 64 KB blocks, then you have about 30M blocks, or roughly 1GB of raw hash code data that you have to keep in RAM, all at once, ignoring overheads, which are substantial. In practice, expect that to be more like 2GB or more. Sure, that's only about 0.1% of the original disk capacity, but that's just one disk! A SUN Thumper has 48 SATA disks in a single chassis, or about 80 TB usable after overheads, which adds up to roughly 40 GB of raw hash code data, or more like 80-100 GB for a typical naive implementation. That's a lot of data to be keeping in the kernel, and would require 128GB of physical memory in the server if you also wanted some room for file data caches and whatnot.
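Here is the same back-of-the-envelope arithmetic as a small Python helper. The function name is hypothetical. It counts only raw 32-byte hashes for 64 KB blocks, with no hash-table overhead, and uses decimal terabytes.

```python
def dedupe_index_size(capacity_bytes: int, block_size: int = 64 * 1024,
                      hash_size: int = 32) -> tuple[int, int]:
    """Return (block count, raw index bytes): one hash per block, no overhead."""
    blocks = -(-capacity_bytes // block_size)   # ceiling division
    return blocks, blocks * hash_size

for label, capacity in (("2 TB disk", 2 * 10**12), ("80 TB Thumper", 80 * 10**12)):
    blocks, raw = dedupe_index_size(capacity)
    print(f"{label}: {blocks:,} blocks, {raw / 10**9:.1f} GB of raw hashes")
# 2 TB disk: 30,517,579 blocks, 1.0 GB of raw hashes
# 80 TB Thumper: 1,220,703,125 blocks, 39.1 GB of raw hashes
```

The ratio is fixed at 32 bytes per 64 KB, or 1/2048 of capacity, before overheads. Halving the block size doubles the index.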
Real world de-dupe filers often use several fancy algorithms at once to reduce effective RAM requirements, but it takes a lot of work. For example, some filers use hierarchical hashes, others use Bloom Filters, and I've heard of filers that partition the hashtable and use file identification heuristics to load likely partitions on demand.
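One of the techniques named above, a Bloom filter, can be sketched in a few lines of Python. This is a generic illustration, not how any particular filer implements it. The class name, the double-hashing scheme (deriving k positions from one SHA-256 as h1 + i*h2), and the sizes are all assumptions. A "no" answer means a block's hash has definitely never been stored, so the expensive full-index lookup can be skipped. A "maybe" answer still has to consult the full table.

```python
import hashlib
from typing import Iterator

class BloomFilter:
    """Compact 'definitely new / maybe seen' test for block hashes.
    False positives are possible; false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive k bit positions from one SHA-256 via double hashing (h1 + i*h2).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1   # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

As a rule of thumb, about 10 bits per entry with 7 hash functions gives roughly a 1% false-positive rate. That is around 1.25 bytes of RAM per block instead of the 32 bytes of a full SHA-256 hash, although the full table still has to live somewhere for the "maybe" cases.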