The main thing I wanted to state at the very beginning of all this was something like:
1) Vector computing gets very little attention in the US. (I couldn't find a single vector machine in the top100 which was located in the US)
2) I find it amusing that Japanese vector machines are not in use in the US, because of politics
3) I'm wondering why no US company is building vector machines that are used for *high*end* computing (meaning machines that at least appear on the Top500, which no Cray vector machine does)
I don't want to bash Cray and I'm not on Hitachi's payroll :)
I was just saying that the US is not doing vector computing in any big way anymore. And I think my statement is backed by the fact that none of the 500 fastest supercomputers in the world is an American-made vector computer. The vector machines rank high on the Top500 and are well represented (considering that they are extremely non-commodity hardware), but not one single vector machine on the Top500 is produced by Cray or any other American company.
The fun part is that Cray has been moving in the direction of ``kludging together 4000 4-way SMP Linux boxes with 100BaseT and some duct tape'', as you so elegantly put it. I know, they're not building Beowulf clusters yet, but they are using Alpha processors (instead of Cray vector processors) in all of their machines that are actually worth mentioning (T3E etc). It is a big step in the commodity-hardware direction (more, cheaper CPUs in favour of fewer expensive ones).
Yes, it will be interesting to see what happens next. I do not understand why American companies have been moving away from the vector market. Any idea? At least IBM should have the resources to produce a vector CPU. I think it's Hitachi that's licensing the Power3 core, but they're using it in a heavily modified Power3-like CPU with vector registers. Kind of interesting...
Oh, and about the clusters: The ASCI project is funding the development of hardware and software for very high speed computing. It is an American initiative (so American-only vendors, meaning no vector machines because Cray cannot deliver anything remotely reasonable in the high end) with the goal of producing computing systems powerful enough to simulate nuclear weapon tests (in order to eliminate or reduce the need for actual live testing). Simulating that kind of physics is a very real-world problem (not your average distributed.net/seti@home embarrassingly parallel problem). However, the four fastest supercomputers in the world (ASCI White, ASCI Red, ASCI Blue Pacific and ASCI Blue Mountain) are clusters. Yes, it's not 100BaseT, but the machines are certainly not SMP/ccNUMA either. You will notice that the 4.9TFlops/sec for #1 is much less than the 12.3TFlops/sec which is the theoretical peak for that machine. The Hitachi and the Crays are much closer to reaching their theoretical peak, of course, because as you pointed out their architectures are so different from the clusters. But the clusters are real nonetheless.
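To put a number on that gap between measured and theoretical peak, here is a minimal sketch using only the two figures quoted for #1 (4.9 and 12.3 TFlops/sec). The function name `efficiency` is made up for illustration.

```python
def efficiency(sustained_tflops, peak_tflops):
    """Fraction of the theoretical peak actually reached on the benchmark."""
    return sustained_tflops / peak_tflops

# Figures quoted above for the #1 machine (ASCI White).
asci_white = efficiency(4.9, 12.3)
print(f"ASCI White reaches roughly {asci_white:.0%} of its theoretical peak")
```

That works out to roughly 40% of peak. The same function can be fed the numbers for any other entry on the list if you want to compare architectures.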
Why do you think Fujitsu puts 16 gigabytes of memory INSIDE the processor module?
Top vector machines in the world: (from www.top500.org again)
9) Hitachi - 917GFlops/sec - In Japan
12) Fujitsu - 886GFlops/sec - In the UK
13) Hitachi - 873GFlops/sec - In Japan
18) Hitachi - 691.3GFlops/sec - In Japan
24) Hitachi - 577GFlops/sec - In Japan
33) Fujitsu - 492GFlops/sec - In Japan
35) Fujitsu - 482GFlops/sec - In Japan
37) Hitachi - 449GFlops/sec - In Japan
59) Fujitsu - 319GFlops/sec - In Japan
63) Fujitsu - 296.1GFlops/sec - In Japan
65) Fujitsu - 286GFlops/sec - In France
67) NEC - 280GFlops/sec - In France
76) NEC - 244GFlops/sec - In Japan
77) NEC - 243GFlops/sec - In Australia
78) NEC - 243GFlops/sec - In Canada
79) NEC - 243GFlops/sec - In Japan
...
I found lots of Crays, but all T3E, meaning Alpha-based. In the first 100 entries, I could not find one single vector machine located in the US.
Ok, I was actually not aware that Cray (which was sold to SGI, and then sold to some other company that has now ditched its old name and is using the Cray name for its brand value) was still doing vector machines.
They mainly build Alpha-based machines these days. And if you look at their vector machine, you'll notice that they brag about 1.8GFlops/(sec*cpu). For a vector CPU, that would have been fast five years ago. Look at the other numbers as well... It scales to 1.8TFlops/sec versus 4TFlops/sec for Hitachi, which would get there on far fewer CPUs, which in turn means that you would usually be able to get much closer to the theoretical peak on the Hitachi. 40GBytes/sec memory bandwidth, versus 40TBytes/sec...
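A quick back-of-the-envelope check of those marketing numbers, as a sketch only. It uses nothing but the figures quoted above, and the helper name `cpus_for_peak` is made up for illustration.

```python
def cpus_for_peak(system_peak_gflops, per_cpu_gflops):
    """Number of CPUs implied by a system peak and a per-CPU peak."""
    return round(system_peak_gflops / per_cpu_gflops)

# 1.8 TFlops/sec system peak at 1.8 GFlops/sec per CPU (the Cray figures).
cray_cpus = cpus_for_peak(1800, 1.8)

# 40 TBytes/sec versus 40 GBytes/sec, both expressed in GBytes/sec.
bandwidth_ratio = 40_000 / 40

print(cray_cpus, "CPUs needed;", bandwidth_ratio, "times the memory bandwidth")
```

So reaching the quoted peak takes about a thousand of those CPUs, and the bandwidth figures differ by three orders of magnitude.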
A 1 GHz Athlon can sustain a little more than one GFlops/sec on a matrix multiply - compare that to your 1.8GFlops/sec ``vector'' CPU... Oh, and the Athlons are made in Dresden (Germany) ;)
For real evidence, rather than marketing numbers, see the Top500:
1) ASCI White - 8192 Power3 CPUs (IBM)
2) ASCI Red - 9632 Intel CPUs (Intel)
3) ASCI Blue Pacific - 5808 604e CPUs (IBM)
4) ASCI Blue Mountain - 6144 MIPS CPUs (SGI)
5) ?? 1336 Power3 CPUs (IBM)
6) ?? 1104 Power3 CPUs (IBM)
7) SR8000-F1/112 112 Hitachi CPUs
...
10) T3E1200 1084 Alpha CPUs (Cray)
Notice that the Hitachi, which is somewhat faster than the Cray, has one order of magnitude *fewer* CPUs than the Cray. And the fastest Cray in use today is *NOT* a vector machine.
See www.top500.org for more info.
I'll grant you that vector computing is still alive in the US. I am not going to agree that it is ``well'' too.
Hey, there were no images of the naked vector CPUs presented by Fujitsu, or the tweaked Power CPUs from NEC or Hitachi...
One CPU:
256 registers
each register holds 64 double precision floats
330MHz
32 FLOPS per clock-cycle
One CPU will yield 9.something GFLOPS. It has 16GB of memory in the processor module. And they sell MP machines from a few GFLOPS to 4TFLOPS. Both Fujitsu and Hitachi had processor modules stripped down - they were *sweet*! :)
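The per-CPU peak follows from clock rate times floating point results per cycle. A small sketch of that arithmetic, with made-up function names: taken at face value, the listed 330MHz and 32 FLOPS per cycle give 10.56 GFLOPS, while a 300MHz clock would give 9.6, which matches the 9.something figure. The sketch cannot settle which number is off.

```python
def peak_gflops(clock_mhz, flops_per_cycle):
    """Theoretical peak of one CPU: clock rate times results per cycle."""
    return clock_mhz * flops_per_cycle / 1000.0

def vector_register_bytes(registers, elements_per_register, bytes_per_element=8):
    """Total capacity of the vector registers (a double is 8 bytes)."""
    return registers * elements_per_register * bytes_per_element

print(peak_gflops(330, 32))   # the figures exactly as listed above
print(peak_gflops(300, 32))   # what a 300 MHz clock would give
print(vector_register_bytes(256, 64) // 1024, "KiB of vector registers")
```

The register arithmetic is the more striking part: 256 registers of 64 doubles each is 128 KiB of register space per CPU.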
So, the 760 chipset has a 200MHz FSB? Well, Hitachi has 40 terabytes/second memory bandwidth in the local machines, and some ~8 gigabytes/second between machines.
Friends, vector computing is not dead - unfortunately only Japan produces vector machines, so only Asia and Europe can use them. US national laboratories are not allowed to buy them, not because of export regulations, but because of import regulations... Go figure.
Why use a system if it's hard to use and there is an easier one around? Reasons could be idealism, religion, finances, force etc. Those are not the reasons I prefer a system with decent tools and network connectivity for my work. I have worked with NT, and I've been doing real work there. And I did not like it. Why? Well, if you have two machines and you are building networked software, then you had better have a monitor and a keyboard on each one of them, and move both of them into the room where you sit. Not so with most other systems I've touched. Ok, so maybe not everyone is developing distributed systems, all right. But the tools - man... When you get an ``enterprise edition'' of a development suite and the tools they ship are mostly of the type ``CPU stress tester'' and ``windiff'', you sort of start wondering... Then you stop wondering.
Anyway, I fully agree with anyone who claims that the UN*X-like systems (such as GNU/Linux) are not the easiest systems for everyone to use. Sure, no system is. But don't generalize. There are *a lot* of us out here who don't care that much about following the pack and doing what we're told the others do, but who actually need to do real work that is made possible only by networked time sharing systems. Free or proprietary, gratis or costly, but all interoperable and the easy choice for some of us.
Too bad they think that the Unices can match GNU/Linux in TCO, just because the vendors give away their software for `virtually no cost'.
Why is it so difficult for reporters to understand that the cost of acquiring the software for a GNU/Linux system has nothing to do with the total cost of ownership? That gratis software does not necessarily mean that bugs get fixed within the hour, etc.
No wonder they are having a hard time predicting the growth of the Free systems :) Oh well, maybe this time they are less far off than last time. In a few decades they will maybe even learn.
Ok great, so we may one day see the .NET subscription software available for GNU/Linux?
I'm really happy now.
What exactly will this mean to us, /if/ some day maybe it is really ported, and if it actually works sufficiently well to be usable?
With KOffice (and the GNOME Office Suite, which will be out and in good shape long before .NET for GNU/Linux) the last reason to bother with alternatively licensed (non-GPL) software is gone.
Let's face it - we're in a position where such news is irrelevant :)
Most likely, I suppose AMD has secretaries like most other companies ;)
But do you think that AMD would be spending time and effort on a Linux port right now, if NT for Sledgehammer was just around the corner, with server applications support etc. etc.?
My bet is that AMD tried, and either Microsoft were honest (heh, no, let's be serious) or AMD figured out that Oracle/MsSQL/DB2/SAP/whatever on 64-bit NT is much further away than anyone planning to ship a new CPU this decade would like to even think about.
The GNU/Linux system has the easily adaptable tools, the easily portable user-space, and an easily portable kernel. Microsoft has the marketing people, but the company with an employment policy that says education doesn't matter somehow (I wonder why; nah, I don't) doesn't have any of the others.
Don't expect NT/2K to run in ``real 64-bit mode'' on the Sledgehammer anytime soon. NT never ran 64-bit on the Alpha either; user-space processes were always 32-bit.
Porting NT to a 64-bit CPU may be doable, and it seems MS is doing just that. With Linux that took some effort too, but luckily that was done while Linux was still young.
However, there's a lot more to 64-bit computing than just the kernel. You want your user-space to go 64-bit as well, and this is where it gets interesting... Why didn't NT on the Alpha run 64-bit processes? Well, does Win_32_ ring a bell?
In POSIX-like systems such as GNU/Linux, you use data types such as int and char* for passing integers and pointers to data. In Win32 you use DWORD and LPxxx etc. The Win32 DWORD is _exactly_ 32 bits, and every program written for Win32 will tend to depend on this. Even worse, the LPxxx pointers are _also_ assumed to fit in the same space as a DWORD, namely 32 bits. A program using char* to reference a block of data will be equally valid on 16-, 32- and 64-bit architectures, because the datatype states it's just a pointer. In Win32 your LPxxx types are 32 bits, meaning they make no sense in a 64-bit environment. Tough.
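The truncation argument can be made concrete with a few lines. This is only an illustration of what happens when a pointer is squeezed into a 32-bit field; it says nothing about how the Win32 or Win64 headers actually define their types. Here `ctypes.c_uint32` stands in for a DWORD, and `store_in_dword` and the two addresses are made up for the example.

```python
import ctypes

POINTER_BITS = ctypes.sizeof(ctypes.c_void_p) * 8   # native pointer width here
DWORD_BITS = ctypes.sizeof(ctypes.c_uint32) * 8     # c_uint32 stands in for DWORD

def store_in_dword(address):
    """Keep only the low 32 bits, as a 32-bit field would."""
    return address & 0xFFFFFFFF

low = 0x0040_1000       # an address that fits in 32 bits
high = 0x1_0040_1000    # an address that needs 33 bits

print(POINTER_BITS, DWORD_BITS)
print(hex(store_in_dword(low)), hex(store_in_dword(high)))
```

Both addresses come out as the same 32-bit value, so code that round-trips pointers through a 32-bit integer silently ends up pointing at the wrong place once addresses go above 4 GB. A type that just says "pointer" does not have that problem.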
What I'm getting at (slowly, I know) is that while Microsoft is currently trying to design a Win64 API that everyone must now port their programs to in order to take advantage of the 64-bit environment, the GNU/Linux community will hit the ``make; make install'' combination, and the vast majority of applications will only need minor fixes if they have not already been ported to a 64-bit architecture. But chances are, of course, that the vast majority of GNU/Linux apps have been working just fine on 64-bit architectures for years.
64-bit Windows will not happen in the next few years, even if Microsoft should ship a native 64-bit NT _and_ actually finish up Win64. It's the applications that matter, and that is one thing that Windows just does not have (despite popular (trash-media) belief). Go easy :)
There is one problem though... We need the desktops using this processor if it is to be even remotely affordable, and if we want a decent number of motherboards to choose from.
How cheap is the Xeon currently, even though it has very few benefits over the ``desktop'' processors such as Athlon or PII/III? Not very. Why? Because it's not sold in mind-boggling quantities. Well, also because Intel prices it in the high end, but that's a chicken-and-egg scenario wrt. the desktop market.
Currently, in this era of the lemming mentality, we depend on MS Windows processor support if a processor is to be used very widely. Unfortunately this means the Sledgehammer won't be affordable until MS releases an OS for it. Yes, that sucks, but blame society :)
But yes, GNU/Linux support for the Sledgehammer a year or two ahead of Microsoft is going to give us great press. At least for a month or so. But considering reporters' long-term memory and their interest in any kind of history (even just last year's history), I doubt that we will benefit much from this once MS finally ships their OS for the Sledgehammer.
Don't get me wrong though. I think this is great, and the Sledgehammer may well prove to be an alternative to the high-end and expensive Xeon CPUs from Intel, and it may well be used by those who need it enough to be able to afford it, for things like Oracle, SAP, weather forecasts, nukes, and what have you...
Way to go SuSE! (and AMD!) :)
Ok, here's your first C++ lover's comment :)
You're definitely misguided about the pass-by-reference comment. Sure, pointers have their uses, and sometimes (maybe even often, depending on what you're doing) pointers are the sane choice and references aren't.
But references guarantee you one thing: they always reference something, there's no such thing as a null reference. Functions taking pointer arguments should check for (assert() or properly handle) null pointers. A function taking a reference _knows_ that it's a valid addressable object (object as in struct, simple variable, or whatever).
Besides, why do you mention STL along with ``dangerous interfaces''? STL is type-safe, and it's a proven implementation (usually - but there could be bugs in glibc too, remember). Using std::map<something> is *a lot* safer than re-implementing your balanced tree every time. Sure you can use glib, but STL is type-safe and C-casts/void-pointers aren't.
Did I mention that std::map<foo,bar> will also often be faster than your generic C tree? If the comparison operator for the foo type is simple (e.g. integer compare or similar), it can be (and will be) inlined in the core map implementation, something you cannot do in C without either re-implementing your map every time, or implementing it in a macro. You simply save a function call for every compare - something that is noticeable when the compare is a simple operation.
The real problem with C++ is that people tend to think of it as object-oriented C. Well, it is, but it's also *much*much* more. A C programmer trying to cook up a ``pretty'' API in C++ will often fail - we've all seen that. But good interfaces are definitely possible - take a look at STL. And note that STL is not just object-oriented wrappers, it's a type-safe interface of objects and functions relying heavily on parameterized types, actually allowing you to write non-trivial programs that run as fast as, or faster than, equivalent C code.
The other real problem with C++ that I have to admit to, is compilation time. It is the *only* real drawback that I can point my finger at. But I can accept it. When the compiler builds type-safe trees of lists of strings for me, that run as fast as they do, I can accept the extra hardware cost as ``fair''.
If one were building commercial software for FreeBSD, which version should be supported?
It would be nice to support only one version, or at least build for one version and maybe run on more. Probably the newest version (4.1) isn't a good choice, because too few have migrated to that one yet, but would it be backwards compatible?
What about forward compatibility? Would a product built on 3.5 run on 4.1?
It's hard to find info about this on the web, so I'm asking the question here in the hope that someone in the know could let me know :)
It's really funny to read stuff like this. I use GNU/Linux because I find it the easiest system to use for the work I do. The freedom part is a nice side effect which has become important to me now that I'm used to it, but freedom was never why I chose the system at first. Besides, why am I talking about freedom when we're talking about UNIX? Nevermind.
Read any paper or article where some two-bit reporter mentions UNIX or GNU, and watch him bitching about those complicated commands, awkward syntax, and what not. Now that's a person who never took the half hour it takes a chimpanzee to learn the effect of the ``|''. It's almost not funny.
I'm happy knowing that the system I use is built on the philosophy of making things easy to use. There's just no replacement for ``|'', grep, sed, or their successors. There hasn't been one in 30 years, and I'd be damn surprised if there was a replacement for this in the next 10 years. Maybe later on, but not in just 10 years. Virtually nothing happens in this industry in 10 years (remember, pipes are from the 50's, they got implemented in the 70's. The wavelet transform is about 100 years old, we still don't use it for streaming media compression).
The other really funny part is, of course, that the pace of real development -- evolution -- is as slow in this industry as in any other. The time between real breakthroughs is not measured in seconds as some would like us to believe, it's measured in decades. A nice example: If you powered off one of your memory banks on your Multics machine, only the processes living in that memory would die -- even the Sun Enterprise series can't do that _today_, you'll have to warn the system of the change first. And people were using toilet paper for storage in those days! We're 30 years past that, we're about to colonize Mars, and our operating systems today can't do what they could 30 years ago.
Oh, and don't even get me started on the new economy...
The risk is there of course, if you hack up a poor script, that you may once in a while kill a netscape process that shouldn't have been killed.
If you consider
*) CPU time spent (seconds)
*) Process state (running or sleeping)
*) Last 1-minute CPU time spent (percentage)
and of course the UID (don't kill root's netscape ;) you can come up with a really good solution.
And sure, once in a blue moon the script will fail. But hey, this is *netscape*, after all.
You will spot which processes tend to get stuck over time, and add them to the script. Been there done that, and it works.
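The heuristic described above could look roughly like this. It is a sketch, not the script I actually ran: the `Proc` record, the thresholds and the `WATCHED` set are all made up for illustration, and collecting the numbers (from ps or /proc) and sending the signal are left out.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    uid: int
    state: str           # "R" for running, "S" for sleeping
    cpu_seconds: float   # total CPU time spent so far
    cpu_percent: float   # CPU share over the last minute

# Hypothetical thresholds; tune them for your own machines.
MIN_CPU_SECONDS = 300
MIN_CPU_PERCENT = 90.0
WATCHED = {"netscape"}   # add other names as you spot them getting stuck

def should_kill(p):
    """True if p looks like a runaway process that is safe to kill."""
    if p.uid == 0:                  # never touch root's processes
        return False
    if p.name not in WATCHED:       # only processes known to get stuck
        return False
    return (p.state == "R"
            and p.cpu_seconds >= MIN_CPU_SECONDS
            and p.cpu_percent >= MIN_CPU_PERCENT)
```

Requiring all three conditions at once is what keeps the false positives rare: a netscape that has merely used a lot of CPU over a long session, or one that is briefly busy rendering a page, does not match.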
Please, if you have a better approach tell me about it.
Netscape can be killed, so it's not a big problem that it doesn't always die. It can be managed, as I've stated elsewhere.
Besides, you will be running a multi-CPU server, and one spinning netscape process will only bog down one CPU. The system doesn't slow down _that_ much in such a configuration, and if the process is killed within the next five minutes by the aforementioned cron job, I think it will be an acceptable solution.
I think you're guessing when it comes to the animated gifs. I'm _pretty_ sure that netscape will upload the images to the X server, and tell it to change the picture, not upload the new frame every time.
But sure, X will load the network. Put the applications on the workstations, and Coda/AFS/PVFS/NFS/GFS/whatever will load the network instead.
From my experience, X works extremely well even on a low-bandwidth high-latency line. Of course, if the network is *that* bad, users will notice, but all in all I wouldn't be so damn worried about X and network bandwidth. I don't think it's a problem compared to what you'll see with *file* transfers from running the apps locally on the workstations.
You want people to run netscape on the server for *exactly* the reasons you mention:
It's memory hungry, but quite a lot of memory is spent on the (huge bloated) executable itself, and this will be shared between all users.
Also, not everyone is running netscape all the time. So if you want 30 Megs for one netscape instance on a workstation, you can probably do just fine with 10 Megs for each such instance on the server, and only about half the users will have a netscape active at any time.
Besides, Netscape uses the X server to hold images etc. so the 32 megs on the desktops will be put to good use still. But the _real_ bloat can be nicely kept on the server.
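A rough sizing sketch for the argument above, using the 30 and 10 Meg figures and the "half the users" guess. The function names and the 40-user example are made up for illustration.

```python
def server_memory_mb(users, active_fraction=0.5, mb_per_instance=10):
    """Memory the server needs for the netscape instances that are active."""
    return users * active_fraction * mb_per_instance

def workstation_memory_mb(users, mb_per_instance=30):
    """Total memory needed if every workstation has to fit its own copy."""
    return users * mb_per_instance

users = 40   # hypothetical number of users
print(server_memory_mb(users), "MB on the server versus",
      workstation_memory_mb(users), "MB spread over the workstations")
```

For 40 users that is 200 MB in one place instead of 1200 MB spread across the desks, which is the whole point of keeping the bloat on the server.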
Every five minutes or so, cron should run a small script to check for netscape processes that hog CPU (this would be a heuristic, but it can be done well - I know because I've done it). That way the server can kill dead netscape processes, and your users won't come back complaining about their workstation being slow every time a local netscape process dies.
It is *much* easier to install a new application on a few servers, than it is to install it on a few workstations. In the long run.
When some bug is found in the application and it needs an update, who has the application? You can use a database for keeping track of it of course, but still...
Keeping everything homogeneous when you can actually do so is the clever thing to do. This is actually a special kind of setup, since they only need one architecture, and all the servers can be configured the same. IMO it would be stupid not to take advantage of that.
Also, having the same applications on all servers or on all workstations (whichever approach is chosen) avoids the problem that someone using someone else's workstation is missing applications (and needs to bother the admin with the problem).
Comparing this to my approach (running apps on servers):
Both:
*) Communication is not encrypted, so the network must be physically secure if you want secure communications between workstations and servers. This should be considered.
Yours:
*) Apps run locally, taking advantage of the processing power in each workstation. But resources available to one user are those of one workstation, no more.
*) Each workstation must be powerful enough to run the apps well
*) Requires a somewhat secure way of sharing user information between workstations (NIS is out)
*) A workstation is trusted - it is allowed to mount a filesystem, so NFS is out and Coda is in
*) Printing must be set up for each workstation - maybe not a concern, it depends on printing policies.
*) Upgrading applications should be automated, so that each workstation can be upgraded easily
*) The network is used for file transfers
Mine:
*) Apps run on servers, taking better advantage of shared information (shared libraries and loaded executables). The load is balanced on the servers because of the multiple users, so one user can use more than his share of the resources, as long as they don't all do it at once.
*) A workstation will only need to run the X server. If it can do more, we won't benefit from that.
*) Communication between servers is physically secure, so you can use NIS
*) Workstations are completely un-trusted. You can use the best performing technology to share filesystems between servers (PVFS/NFS/GFS)
*) Printing is set up on the servers only.
*) Upgrading applications is only a concern on the servers.
*) The network is used for X communications
My suggestion: Leave the terminals with 32 megs, as that is plenty for the X server. Your standard terminal should have one single hard drive with a minimal installation of some distribution. You will want to have a standard way of setting up such a machine when a disk dies and gets replaced, but backups aren't needed (as there is no user data on the terminal) and you won't have to care much about updates either (as there should be no daemons running on the terminals).
One big benefit: As the terminals do not hold data, it doesn't matter if they are stolen. Terminals are not trusted.
The X protocol is made for networks, and a 10 MBit/s hose to each terminal would be just great. However, it's not encrypted, so you should at least consider how physically secure your network is, and what the requirements would be.
Then set up one server for each N users. If they are doing web access and text editing, your average ``high end but not that high'' server should be able to run 15-40 users. Maybe more, but I haven't tried this type of workload myself so I can't say. Anyone?
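Turning "one server for each N users" into a server count is a one-liner. A sketch under the 15-40 users per server guess above; the 300-user site is a made-up example, not a figure from this thread.

```python
import math

def servers_needed(users, users_per_server):
    """Smallest number of servers that fits everyone at the given density."""
    return math.ceil(users / users_per_server)

users = 300   # hypothetical site size
pessimistic = servers_needed(users, 15)
optimistic = servers_needed(users, 40)
print(f"between {optimistic} and {pessimistic} servers for {users} users")
```

The spread between the two ends of the estimate is large, so it is worth measuring a single pilot server with real users before buying the rest.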
You will end up with a server farm. Each server should hold a home filesystem locally, and preferably the users with their homes on that local fs should log in on that server. You can choose to let the servers export their home fs'es to the other servers as well and share user accounts with NIS, which would let any user log in anywhere. If a terminal is tied to a user and vice versa, there should be no need for a terminal to be able to choose other servers, but if they're not, then the need will be there.
I've done a few such setups, but at a *much* smaller scale. I can tell you that it is a relief to _only_ have to update software on the server(s).
The Linux kernel has had threads for a very long time now. In fact, it has no other concept available to user-space for an executable task.
Threads are either alone in the VM, in which case we call the whole thing a ``process'', or there can be more threads of execution in one VM area (a so-called multithreaded process or whatever). The difference is _only_ whether there's one or more threads in that VM area.
The difference between Linux and NT is, to the programmer, that on Linux you usually use the pthreads library to create a thread (the library will call clone() which tells the kernel to create a new thread), whereas on NT you use the Win32 library call CreateThread() to tell the NT kernel to create a thread.
pthreads is fairly inefficient, which is why some people believe that threads aren't native to Linux. That is, creating a thread with pthreads usually isn't a lot faster than a fork(), whereas on NT CreateThread() is a lot faster than CreateProcess(). What people tend to forget is that creating a full-blown process using fork() on Linux is still a lot faster than creating just a single thread using CreateThread() on NT, on identical hardware (measured in clock cycles from the start of the call until the first line in the new process/thread is reached; the source is an article in LJ some time ago).
Threads can't work much better in the Linux kernel. The pthreads _library_ could probably be improved to make thread creation faster, or you could just call clone() yourself. But this is not a kernel issue.
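The ``one VM area, one or more threads'' picture can be seen from user space without touching clone() directly. A minimal sketch for a UN*X-like system, using Python's threading and os.fork (which on Linux sit on top of pthreads and fork()). The names `counter`, `bump`, `run_in_thread` and `run_in_forked_child` are made up for the example.

```python
import os
import threading

counter = [0]   # lives in this process's address space

def bump():
    counter[0] += 1

def run_in_thread():
    """Run bump() in a second thread sharing our address space."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter[0]

def run_in_forked_child():
    """Run bump() in a forked child, which works on its own copy of memory."""
    pid = os.fork()
    if pid == 0:
        bump()          # changes only the child's copy
        os._exit(0)     # leave at once, skipping the parent's cleanup code
    os.waitpid(pid, 0)  # wait for the child to finish
    return counter[0]
```

Calling run_in_thread() leaves the change visible to the caller, because the thread shares the VM. Calling run_in_forked_child() afterwards returns the same value as before, because the child incremented its own copy. That is the whole difference between the two as far as the kernel is concerned.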
I get 5.5 MB/s read and 2.0 MB/s write on my four-disk software RAID 5. I usually (with 2.2) get 18 and 12 respectively. It may of course well be the software RAID playing in here.
On an older (SCSI) disk I get 5.2 MB/s read and 3.9 MB/s write. This is pretty close to what I can expect from that disk. I guess this would indicate that it's only the semi-experimental software RAID-5 code that could need a little improvement.
On the positive side, the system doesn't freeze for 5-second periods while I benchmark it :)
I think it's safe to say that the 2.4-test series is on the right track. There can't be many huge problems left, and the ones currently in the kernel don't keep people from testing - which is a very good thing.
I don't know, but they are for sure the ones that ruin performance in a very noticeable way :)
I mentioned that those were the only two areas (they're even somewhat related), because I think just about everything else is in place. Knowing that there are only one or two problem areas left might help give an idea of where we stand today.
Let me know what you think about points 1, 2, and 3 above.
As a footnote on this page:
www.top500.org/lists/2000/11/trends.html
you will see that, of the 500 fastest supercomputers in the world today, all vector-based systems are of Japanese origin.
Even I had not expected that :)
If you can find a vector machine located in the US among the first 100 entries, please point me to it.
I'll grant you that vector computing is still alive in the US. I am not going to agree that it is ``well'' too.
But thanks for the information
Hey, there were no images of the naked vector CPUs presented by Fujitsu, or the tweaked power-cpus from NEC or Hitachi...
:)
One CPU:
256 registers
each register holds 64 double precision floats
330MHz
32 FLOPS per clock-cycle
One CPU will yield 9.something GFLOPS. It has 16GB memory in the processor module. And they sell MP machines from a few GFLOPS to 4TFLOPS. Both Fujitsu and Hitachi had processor modules stripped down - they were *sweet*!
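The per-CPU figure is just clock rate times floating point operations per cycle. A minimal sketch of that arithmetic, using the numbers listed above (the function name is mine, for illustration only):

```python
def peak_gflops(clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak of one CPU in GFLOPS: clock rate times FLOPs per cycle."""
    return clock_mhz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    # 330 MHz and 32 FLOPs per clock cycle, as listed above.
    print(f"{peak_gflops(330, 32):.2f} GFLOPS per CPU")
```

With exactly 330MHz and 32 FLOPS per cycle this comes out at 10.56, a bit above the 9.something quoted, so one of the listed figures is presumably approximate. Either way the order of magnitude stands.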
So, the 760 chipset has a 200MHz FSB ? Well, Hitachi has 40 TeraBytes/second memory bandwidth in the local machines, and some ~8GigaBytes/second between machines.
Friends, vector computing is not dead - unfortunately only Japan produces vector machines, so only Asia and Europe can use them. US national laboratories are not allowed to buy them, not because of export regulations, but because of import regulations... Go figure.
Man I could not disagree more with you.
Why use a system if it's hard to use and there is an easier one around ? Reasons could be idealism, religion, finances, force etc. Those are not the reasons I prefer a system with decent tools and network connectivity for my work. I have worked with NT, and I've been doing real work there. And I did not like it - why ? Well, if you have two machines and you are building networked software - then you had better have a monitor and a keyboard on each one of them, and move both of them into the room where you sit. Not so with most other systems I've touched. Ok, so maybe not everyone is developing distributed systems, all right. But the tools - man... When you get an ``enterprise edition'' of a development suite and the tools they ship are mostly of the type ``CPU stress tester'' and ``windiff'', you sort of start wondering... Then you stop wondering.
Anyway, I fully agree with anyone who claims that the UN*X like systems (such as GNU/Linux) are not the easiest systems for everyone to use. Sure, no system is. But don't generalize. We are *a lot* of people out here who don't care that much about following the pack and doing what we're told the others do, but actually need to do real work that is made possible only because of networked time sharing systems. Free or proprietary, gratis or costly, but all interoperable and the easy choice for some of us.
That was my 0.02 Euro.
Too bad they think that the Unices can match GNU/Linux in TCO, just because the vendors give away their software for `virtually no cost'.
:) Oh well, maybe this time they are less far off than last. In a few decades they will maybe even learn.
Why is it so difficult to understand for reporters that the cost of acquiring the software for a GNU/Linux system has nothing to do with the total cost of ownership ? That gratis software does not necessarily mean that bugs get fixed within the hour, etc.
No wonder they are having a hard time predicting the Free systems' growth
Ok great, so we may one day see the .NET subscription software available for GNU/Linux ?
I'm really happy now.
What is it exactly this will mean to us, /if/ some day maybe it is really ported, and, if it actually works sufficiently well to be usable ?
With Koffice (and GNOME Office Suite which will be out and in good shape long before .NET for GNU/Linux) that last reason to bother with alternatively licensed (non-GPL) software is gone.
Let's face it - we're in a position where such news is irrelevant :)
Most likely, I suppose AMD has secretaries like most other companies ;)
But do you think that AMD would be spending time and effort on a Linux port right now, if NT for Sledgehammer was just around the corner, with server applications support etc. etc. ?
My bet is, that AMD tried, and Microsoft were either honest (heh, no let's be serious) or AMD figured out that Oracle/MsSQL/DB2/SAP/whatever on 64-bit NT is much further away than anyone planning to ship a new CPU this decade would like to even think about.
The GNU/Linux system has the easily adaptable tools, the easily portable user-space, and an easily portable kernel. Microsoft has the marketing people, but the company with an employment policy that says education doesn't matter somehow (I wonder why; nah, I don't) doesn't have any of the others.
Don't expect NT/2K to run in ``real 64-bit mode'' on the Sledgehammer anytime soon. NT never ran 64-bit on the Alpha either, user-space processes were always 32-bit.
:)
Porting NT to a 64-bit CPU may be doable, and it seems MS is doing just that. With Linux that took some effort too, but luckily that was done while Linux was still young.
However, there's a lot more to 64-bit computing than just the kernel. You want your user-space to go 64-bit as well, and this is where it gets interesting... Why didn't NT on the Alpha run 64-bit processes ? Well, does Win_32_ ring a bell ?
In POSIX-like systems such as GNU/Linux, you use data-types such as int and char* for passing integers and pointers to data. In Win32 you use DWORD and LPxxx etc. The Win32 DWORD is _exactly_ 32 bits, and every program written for Win32 will tend to depend on this. Even worse, the LPxxx pointers are _also_ assumed to fit in the same space as a DWORD, namely 32 bits. A program using char* to reference a block of data will be equally valid on 16, 32 and 64 bit architectures, because the datatype states only that it's a pointer. In Win32 your LPxxx types are 32 bits, meaning they make no sense in a 64-bit environment. Tough.
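A rough illustration of why that assumption hurts, sketched in Python with ctypes. The DWORD name below is only a stand-in (a plain 32-bit unsigned integer, not the real Win32 typedef), and both addresses are made-up values chosen for the example:

```python
import ctypes

# Stand-in for the Win32 DWORD: an unsigned integer of exactly 32 bits.
DWORD = ctypes.c_uint32


def store_in_dword(address: int) -> int:
    """Return what is left of an address after squeezing it into 32 bits."""
    return DWORD(address).value


if __name__ == "__main__":
    low_address = 0x0804A010            # hypothetical address that fits in 32 bits
    high_address = 0x00007F3A0804A010   # hypothetical 64-bit user-space address

    print(hex(store_in_dword(low_address)))   # survives intact
    print(hex(store_in_dword(high_address)))  # upper 32 bits are silently lost
    print(ctypes.sizeof(ctypes.c_void_p) * 8, "bit pointers on this machine")
```

Any code that parks a pointer in a 32-bit slot works by luck on a 32-bit system and hands back a different address on a 64-bit one. Code that only ever says "pointer" never runs into this.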
What I'm getting at (slowly I know) is, that while Microsoft is currently trying to design a Win64 API that everyone must now port their programs to in order to take advantage of the 64-bit environment, the GNU/Linux community will hit the ``make; make install'' combination, and the vast majority of applications will only need minor fixes if they have not already been ported to a 64-bit architecture. But chances are, of course, that the vast majority of GNU/Linux apps have been working just fine on 64-bit architectures for years.
64-bit Windows will not happen in the next few years, even if Microsoft should ship a native 64-bit NT _and_ actually finish up Win64. It's the applications that matter, and that is one thing that Windows just does not have (despite popular (trash-media) belief). Go easy
There is one problem though... We need the desktops using this processor if it has to be just remotely affordable, and if we want a decent number of motherboards to choose from
:)
:)
How cheap is the Xeon currently, even though it has very few benefits over the ``desktop'' processors such as Athlon or PII/III ? Not very. Why ? Because it's not sold in mind-boggling quantities. Well, also because Intel prices it in the high end, but that's a chicken-and-egg scenario wrt. the desktop market.
Currently, in this era of the lemming mentality, we depend on MS Windows processor support if a processor is to be used very widely. Unfortunately this means the Sledgehammer won't be affordable until MS releases an OS for it. Yes that sucks, but blame society
But yes, GNU/Linux support for the Sledgehammer some year or two ahead of Microsoft is going to give us great press. At least for a month or so. But counting in the long-term memory and interest in any kind of history (even just last year's history) of reporters, I doubt that we will benefit much from this once MS finally ships their OS for the Sledgehammer.
Don't get me wrong though. I think this is great, and the Sledgehammer may well prove to be an alternative to the high-end and expensive Xeon CPUs from Intel, and it may well be used by those who need it enough to be able to afford it, for things like Oracle, SAP, weather forecasts, nukes, and what have you...
Way to go SuSE ! (and AMD!)
Ok, here's your first C++ lover's comment :)
You're definitely misguided about the pass-by-reference comment. Sure, pointers have their uses, and sometimes (maybe even often, depending on what you're doing) pointers are the sane choice and references aren't.
But references guarantee you one thing: they always reference something, there's no such thing as a null reference. Functions taking pointer arguments should check for (assert() or properly handle) null pointers. A function taking a reference _knows_ that it's a valid addressable object (object as in struct, simple variable, or whatever).
Besides, why do you mention STL along with ``dangerous interfaces'' ? STL is type-safe, and it's a proven implementation (usually - but there could be bugs in glibc too remember). Using std::map<something> is *a lot* safer than re-implementing your balanced tree every time. Sure you can use glib, but STL is type safe and C-casts/void-pointers aren't.
Did I mention that std::map<foo,bar> will also often be faster than your generic C tree ? If the comparison operator for the foo type is simple (eg. integer compare or similar), it can be (and will be) inlined in the core map implementation, something you cannot do in C without either re-implementing your map every time, or implementing it in a macro. You simply save a function call for every compare - something that is noticeable when the compare is a simple operation.
The real problem with C++ is that people tend to think of it as object-oriented C. Well, it is, but it's also *much*much* more. A C programmer trying to bake up a ``pretty'' API in C++ will often fail - we've all seen that. But good interfaces are definitely possible - take a look at STL. And note that STL is not just object-oriented wrappers, it's a type-safe interface of objects and functions relying heavily on parameterized types, actually allowing you to write non-trivial programs that run as fast as, or faster than, equivalent C code.
The other real problem with C++ that I have to admit to, is compilation time. It is the *only* real drawback that I can point my finger at. But I can accept it. When the compiler builds type-safe trees of lists of strings for me, that run as fast as they do, I can accept the extra hardware cost as ``fair''.
If one were building commercial software for FreeBSD, which version should be supported ?
It would be nice to support only one version, or at least build for one version and maybe run on more. Probably the newest version (4.1) isn't the best choice, because too few have migrated to it yet, but would a build on it be backwards compatible ?
What about forward compatibility ? A product built on 3.5, would it run on 4.1 ?
It's hard to find info about this on the web, so I'm asking the question here in the hope that someone in the know could let me know :)
...so they say in the article.
It's really funny to read stuff like this. I use GNU/Linux because I find it the easiest system to use for the work I do. The freedom part is a nice side effect which has become important to me now that I'm used to it, but freedom was never why I chose the system at first. Besides, why am I talking about freedom when we're talking about UNIX ? Nevermind.
Read any paper or article where some two-bit reporter mentions UNIX or GNU, and watch him bitching about those complicated commands, awkward syntax, and what not. Now that's a person who never took the half hour it takes a chimpanzee to learn the effect of the ``|''. It's almost not funny.
I'm happy knowing that the system I use is built on the philosophy of making things easy to use. There's just no replacement for ``|'', grep, sed, or their successors. There hasn't been one in 30 years, and I'd be damn surprised if there was a replacement for this in the next 10 years. Maybe later on, but not in just 10 years. Virtually nothing happens in this industry in 10 years (remember, pipes are from the 50's, they got implemented in the 70's. The wavelet transform is about 100 years old, we still don't use it for streaming media compression)
The other really funny part is, of course, that the pace of real development -- evolution -- is as slow in this industry as in any other. The time between real breakthroughs is not measured in seconds as some would like us to believe, it's measured in decades. A nice example: If you powered off one of your memory banks on your Multics machine, only the processes living in that memory would die -- even Sun Enterprise series can't do that _today_, you'll have to warn the system of the change first. And people were using toilet-paper for storage those days ! We're 30 years past that, we're about to colonize mars, and our operating systems today can't do what they could 30 years ago.
Oh, and don't even get me started on the new economy...
The risk is there of course, if you hack up a poor script, that you may once in a while kill a netscape process that shouldn't have been killed.
If you consider
*) CPU time spent (seconds)
*) Process state (running or sleeping)
*) Last 1-minute CPU time spent (percentage)
and of course the UID (don't kill root's netscape ;) you can come up with a really good solution.
And sure, once in a blue moon the script will fail. But hey, this is *netscape*, after all.
You will spot which processes tend to get stuck over time, and add them to the script. Been there done that, and it works.
Please, if you have a better approach tell me about it.
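For the curious, a heuristic along those lines could be sketched like this in Python. This is not the script I actually ran; the thresholds, the record layout and all the names are assumptions picked for illustration, and collecting the per-process numbers (from ps or /proc) is left out:

```python
from dataclasses import dataclass


@dataclass
class ProcInfo:
    pid: int
    uid: int
    command: str
    cpu_seconds: float      # total CPU time spent so far
    state: str              # "R" for running, "S" for sleeping
    recent_cpu_pct: float   # CPU usage over the last minute, in percent


# Hypothetical thresholds; tune them to what your own users actually do.
MIN_CPU_SECONDS = 600.0
MIN_RECENT_CPU_PCT = 90.0
WATCHED_COMMANDS = {"netscape"}  # add other offenders as you spot them


def should_kill(proc: ProcInfo) -> bool:
    """Decide whether a process looks like a runaway that cron may kill."""
    if proc.uid == 0:
        return False  # never touch root's processes
    if proc.command not in WATCHED_COMMANDS:
        return False
    if proc.state != "R":
        return False  # a sleeping browser is not spinning
    return (proc.cpu_seconds >= MIN_CPU_SECONDS
            and proc.recent_cpu_pct >= MIN_RECENT_CPU_PCT)


if __name__ == "__main__":
    stuck = ProcInfo(4242, 1001, "netscape", 1800.0, "R", 99.0)
    idle = ProcInfo(4243, 1002, "netscape", 1800.0, "S", 0.5)
    print(should_kill(stuck), should_kill(idle))
```

Requiring all of the conditions at once is what keeps the false positives down: a browser that has burned a lot of CPU over a long session but is idle right now is left alone.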
Netscape can be killed, so it's not a big problem that it doesn't always die. It can be managed, as I've stated elsewhere.
Besides, you will be running a multi-cpu server, and one spinning netscape process will only bog down one CPU. The system doesn't slow _that_ much down in such a configuration, and if the process is killed within the next five minutes by the aforementioned cron job, I think it will be an acceptable solution.
I think you're guessing when it comes to the animated gifs. I'm _pretty_ sure that netscape will upload the images to the X server, and tell it to change the picture, not upload the new frame every time.
But sure, X will load the network. Put the applications on the workstations, and Coda/AFS/PVFS/NFS/GFS/whatever will load the network instead.
From my experience, X works extremely well even on a low-bandwidth high-latency line. Of course, if the network is *that* bad, users will notice, but all in all I wouldn't be so damn worried about X and network bandwidth. I don't think it's a problem compared to what you'll see with *file* transfers from running the apps locally on the workstations.
You want people to run netscape on the server for *exactly* the reasons you mention:
It's memory hungry, but quite a lot of memory is spent on the (huge bloated) executable itself, and this will be shared between all users.
Also, not everyone is running netscape all the time. So if you want 30 Megs for one netscape instance, you can probably do just fine with 10 Megs for each such instance on the server, and only half the users have a netscape active.
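A back-of-the-envelope version of that estimate, as a small Python sketch. The 30 and 10 Meg figures and the "half the users" guess are the ones above; the 20 Megs of shared executable is my own assumption (simply the difference between the two), and the function names are made up:

```python
import math


def server_memory_mb(users: int, active_fraction: float = 0.5,
                     private_mb: int = 10, shared_mb: int = 20) -> int:
    """Memory needed on one server when the executable is shared by all users."""
    active = math.ceil(users * active_fraction)
    if active == 0:
        return 0
    return shared_mb + active * private_mb


def workstation_memory_mb(users: int, per_instance_mb: int = 30) -> int:
    """Total memory set aside across workstations when everyone runs it locally."""
    return users * per_instance_mb


if __name__ == "__main__":
    users = 40
    print("on the server:      ", server_memory_mb(users), "MB")
    print("on the workstations:", workstation_memory_mb(users), "MB in total")
```

The exact numbers don't matter much. The point is that the shared part is paid for once on the server, and once per machine on the workstations.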
Besides, Netscape uses the X server to hold images etc. so the 32 megs on the desktops will be put to good use still. But the _real_ bloat can be nicely kept on the server.
Every five minutes or so, cron should run a small script to check for netscape processes that hog CPU (this would be a heuristic, but it can be done well - I know because I've done it). That way the server can kill dead netscape processes, and your users won't come back complaining about their workstation being slow every time a local netscape process dies.
It is *much* easier to install a new application on a few servers, than it is to install it on a few workstations. In the long run.
When some bug is found in the application and it needs an update, who has the application installed ? You can use a database for keeping track of it of course, but still...
Keeping everything homogeneous when you can actually do so is the clever thing to do. This is actually a special kind of setup, since they only need one architecture, and all the servers can be configured the same. IMO it would be stupid not to take advantage of that.
Also, having the same applications on all servers or on all workstations (whichever approach is chosen) avoids the problem that someone using someone else's workstation is missing applications (and needs to bother the admin with the problem).
Comparing this to my approach (running apps on servers):
Both:
*) Communication is not encrypted, so the network must be physically secure if you want secure communications between workstations and servers. This should be considered.
Your:
*) Apps run locally, taking advantage of the processing power in each workstation. But resources available to one user are those of one workstation, no more.
*) Each workstation must be powerful enough to run the apps well
*) Requires a somewhat secure way of sharing user information between workstations (NIS is out)
*) A workstation is trusted - it is allowed to mount a filesystem, so NFS is out and Coda is in
*) Printing must be set up for each workstation - maybe not a concern, it depends on printing policies.
*) Upgrading applications should be automated, so that each workstation can be upgraded easily
*) The network is used for file transfers
Mine:
*) Apps run on servers, taking better advantage of shared information (shared libraries and loaded executables). The load is balanced on the servers because of the multiple users, so one user can use more than his share of the resources, if they don't all do it at once.
*) A workstation will only need to run the X server. If it can do more, we won't benefit from that.
*) Communication between servers is physically secure, so you can use NIS
*) Workstations are completely un-trusted. You can use the best performing technology to share filesystems between servers (PVFS/NFS/GFS)
*) Printing is set up on the servers only.
*) Upgrading applications is only a concern on the servers.
*) The network is used for X communications
Any thoughts Erik ?
My suggestion: Leave the terminals with 32 meg, as that is plenty for the X server. Your standard terminal should have one single harddrive with a minimal installation of some distribution. You will want to have a standard way of setting up such a machine when a disk dies and gets replaced, but backups aren't needed (as there is no user data on the terminal) and you won't have to care much about updates either (as there should be no daemons running on the terminals).
One big benefit: As the terminals do not hold data, it doesn't matter if they are stolen. Terminals are not trusted.
The X protocol is made for networks, and a 10 MBit/s hose to each terminal would be just great. However, it's not encrypted, so you should at least consider how physically secure your network is, and what the requirements would be.
Then set up one server for each N users. If they are doing web access and text editing, your average ``high end but not that high'' server should be able to run 15-40 users. Maybe more, but I haven't tried this type of workload myself so I can't say. Anyone ?
You will end up with a server farm. Each server should hold a home filesystem locally, and preferably the users with their homes on that local fs should log in on that server. You can choose to let the servers export their home fs'es to the other servers as well and share user accounts with NIS, which would let any user log in anywhere. If a terminal is tied to a user and vice versa, there should be no need for a terminal to be able to choose other servers, but if they're not, then the need will be there.
I've done a few such setups, but at a *much* smaller scale. I can tell you that it is a relief to _only_ have to update software on the server(s).
The Linux kernel has had threads for a very long time now. In fact, it has no other concept available to user-space for an executable task.
Threads are either alone in the VM, in which case we call the whole thing a ``process'', or there can be more threads of execution in one VM area (a so-called multithreaded process or whatever). The difference is _only_ whether there's one or more threads in that VM area.
The difference between Linux and NT is, to the programmer, that on Linux you usually use the pthreads library to create a thread (the library will call clone() which tells the kernel to create a new thread), whereas on NT you use the Win32 library call CreateThread() to tell the NT kernel to create a thread.
pthreads is fairly inefficient, which is why some people believe that threads aren't native to Linux. That is, creating a thread with pthreads usually isn't a lot faster than a fork(), whereas on NT CreateThread() is a lot faster than CreateProcess(). What people tend to forget is that creating a full-blown process using fork() on Linux is still a lot faster than creating just a single thread using CreateThread() on NT, on identical hardware (measured in clock cycles from the start of the call until the first line in the new process/thread is reached; the source is an article in LJ some time ago).
Threads can't work much better in the Linux kernel. The pthreads _library_ could probably be improved to make thread creation faster, or you could just call clone() yourself. But this is not a kernel issue.
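If you want a feel for the fork() versus thread creation cost on your own box, here is a crude sketch in Python. It is not the clock-cycle measurement from the article: it times create-plus-reap through the interpreter, so Python's own overhead is included, it only runs on POSIX-like systems, and the function names and round counts are arbitrary choices of mine:

```python
import os
import threading
import time


def time_fork(rounds: int = 100) -> float:
    """Average seconds to fork a child that exits at once, and to reap it."""
    start = time.perf_counter()
    for _ in range(rounds):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave immediately, skipping cleanup handlers
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / rounds


def time_thread(rounds: int = 100) -> float:
    """Average seconds to start a thread that does nothing, and to join it."""
    start = time.perf_counter()
    for _ in range(rounds):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / rounds


if __name__ == "__main__":
    print(f"fork + wait:   {time_fork() * 1e6:.0f} microseconds")
    print(f"thread + join: {time_thread() * 1e6:.0f} microseconds")
```

Treat the output as a ballpark only. For anything serious you would call fork() and clone() directly from C and count cycles, as the article did.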
Ok, I got test2 running and benchmarked it.
I get 5.5 MB/s read and 2.0 MB/s write on my four disk software RAID 5. I usually (with 2.2) get 18 and 12 respectively. It may of course well be the software RAID playing in here.
On an older (SCSI) disk I get 5.2 MB/s read and 3.9 MB/s write. This is pretty close to what I can expect from that disk. I guess this would indicate that it's only the semi-experimental software RAID-5 code that could need a little improvement.
On the positive side, the system doesn't freeze for 5 second periods while I benchmark it :)
I think it's safe to say that the 2.4-test series is on the right track. There can't be many huge problems left, and the ones currently in the kernel don't keep people from testing - which is a very good thing.
You gotta be kiddin' me :)
test2-pre11 (or so) gave me 3 MB/s on my four disk SCSI RAID. That's about the speed of my laptop with 2.2. I'll go benchmark the real test2 tonight.
Anyways, that means we're down to one hard problem right ?
Axboe, you know more about this than I do; if you feel like it, could you write a four-liner about what's going on in the VM currently ?
(beware, hostile .dk takeover on SlashDot!)
I don't know, but they are for sure the ones that ruin performance in a very noticeable way :)
I mentioned that those were the only two areas (they're even somewhat related), because I think just about everything else is in place. Knowing that there are only one/two problem areas left, might help give an idea of where we stand today.