This brochure from the ACM, IEEE Computer Society, and the Association for Information Systems claims, "Estimates for job growth in the United States range from 38% to 56% across the computing spectrum. With more choices and more opportunities, it's a better time than ever to begin a career in computing. In fact, according to CNN/Money Magazine in 2006, software engineering is the number one best job for salary and opportunities!" The document is targeted at high school students. In my opinion all the parents have been taken in by the FUD. I have worked in Software Engineering since 11/95 and have seen the market keep growing. More and more people are relying on technology and software every day. Applications are never "done". There are new features to add, new hardware to support, and new technologies to take advantage of.
As for outsourcing, I was involved in an attempt to outsource some software development. They wanted to find a company in India that could do Windows device drivers. Again and again, we could talk to the PhD higher-ups, but when we pushed to talk to the actual folks who would work on the project we found they had little experience in driver development and almost no experience in development on multiprocessor servers. This happened with several different companies. Also, folks are finding that outsourcing to faraway countries is a massive management headache. It takes all the problems of local contractors and makes them worse.
Outsourcing is like anything in life...in the end you get what you pay for!
I never said that "clusters are parallel computers." I do believe that they are related enough to make a reply about clusters relevant to parallel computers. So did the authors of the referenced article: "We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing." Many HPC systems are computing clusters using API/libraries like MPI to allow code written for single box/many processor parallel systems to also run on clusters. The TOP500 shows this but even more importantly anyone who works in the HPC market sees a huge demand for clusters for domains once dominated by parallel systems. There are clusters with thousands of processors in use today.
Clusters could be made to be even more like parallel computers. If we have faster I/O (both in terms of bandwidth and latency) and a single-image operating system running over all the computers, then we approach an architecture similar to a large-scale SMP system or even a parallel computer.
From http://en.wikipedia.org/wiki/Parallel_computer: "Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination."
From http://en.wikipedia.org/wiki/Computer_cluster: "A computer cluster is a group of loosely coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability."
The idea is to improve the I/O and operating systems of clusters to approach a parallel computer.
I wouldn't get so hung up on word choice. In the end you have CPU(s), memory subsystems, I/O, an operating system, and some user software. As I said, many tasks once reserved for parallel computers have been written to run quite well on clusters.
Oh...and when telling someone that something is NOT something else, it helps to back it up with some information. :-)
I used to work for SilverStorm (recently purchased by QLogic). They make InfiniBand switches and software for use in high performance computing and enterprise database clustering. The quality of the I/O subsystem of a cluster played a large part in determining the performance of a cluster. Latency (down to the microsecond) and bandwidth (over 10 gigabits per second) both mattered.
Also, we found that sometimes what made a deal go through was how well your proposed system could run some preexisting software. For example, vendors would publish how well they could run a standard crash test simulation.
Also, I would like to see more research put into making clustered operating systems like MOSIX good enough that developers can stick to what they have learned on traditional SMP systems and have their code just work on large clusters. I don't think that multicore processors eliminate the need for better cluster software.
I have added hardware like a PC Card or USB device while the system was in hibernation, and it happily detected it when it came back. Windows Plug and Play handles device detection on insertion if the underlying bus supports it. This works for USB, FireWire, PCMCIA, PCI Hot Plug, Fibre Channel, etc. Removing hardware while in hibernation can cause some badly designed drivers to crash. In Windows, hibernation doesn't just happen...drivers are warned about going into hibernation and coming back from it. The same is true for suspend or other lower power states. They get nice Power IRPs (I/O Request Packets) and should make sure their hardware is turned off while entering the lower power state and make sure it is still there when coming back from hibernation. I worked on a driver for a Fibre Channel card, and our drivers handled disks and cards being removed or added while in hibernation. The latest driver model makes handling power IRPs even easier than in the win2k/xp days.
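To make the "warned going down, re-check coming back" idea concrete, here is a toy Python model. It is only a sketch: real Windows drivers handle IRP_MJ_POWER requests in C, and the work is split between the PnP manager, the bus driver, and the function driver, all of which this glosses over. The class and method names (ToyDriver, on_power_down, on_power_up) are made up for illustration.

```python
class ToyDriver:
    """Toy model of a driver that survives hardware changes across hibernation."""

    def __init__(self, bus):
        self.bus = bus            # shared set of device names currently on the bus
        self.known = set(bus)     # snapshot of what the driver believes is attached
        self.powered = True

    def on_power_down(self):
        # Stands in for the warning a driver receives before suspend or hibernation:
        # quiesce the hardware and remember that it is off.
        self.powered = False

    def on_power_up(self):
        # Do not assume the hardware is unchanged; compare the bus with the snapshot.
        self.powered = True
        present = set(self.bus)
        added = present - self.known
        removed = self.known - present
        self.known = present
        return added, removed


if __name__ == "__main__":
    bus = {"disk0", "disk1"}
    driver = ToyDriver(bus)
    driver.on_power_down()
    bus.remove("disk1")     # pulled while the machine was hibernating
    bus.add("usb_key")      # plugged in while the machine was hibernating
    print(driver.on_power_up())
```

The badly designed drivers I mentioned are the ones that skip the comparison step and keep talking to hardware that is no longer there.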
Check out http://www.osronline.com/. They have some similar utilities and are the place to go for Windows device driver questions and debugging. They are the folks that finally fixed much of the DDK documentation. I still have the mugs they gave away for finding doc errors. By the way, I highly recommend their classes. I have taken a bunch of them, and I am pretty sure that these folks know Windows internals better than any other organization...maybe even MS.
So are you saying that the APIs used by drivers have not changed at all between 2.4 and 2.6? I had thought that a bunch of things had changed. I had also heard that in some cases drivers that worked in 2.4 broke in 2.6 due to API changes. Am I totally off base here? -Ack
The difference is who you are trying to help the most. The current Linux system seems to be made to keep life easier for the core kernel developers, not the hardware driver developers who represent the majority of the Linux kernel community. Microsoft goes the other way: they make things easier for hardware driver developers while making life harder for themselves.
By the way, they do not handle the versioning that I speak of through different code bases. The Windows XP kernel is one code base that supports various methods to run drivers that support older versions of APIs while letting them add new features over time. This is done with versioning (NDIS), new APIs that deprecate but do not replace the old ones (storport vs. scsiport), or optional APIs (IRPs that can be ignored for NT4 style behavior).
Merging does not seem to help, as bug reports are coming in for merged drivers that no one wants to update. They may ask themselves, "why should I have to change my driver when it was working?" Some drivers that worked before are now suddenly broken because the world changed around them. There is nothing worse than someone else making changes that break your code and then expecting you to fix it.
Have you opened up the dump in the Windows kernel debugger? Did you send the bug report to Microsoft and track its progress? Did you check the event logs? Also, software that writes over 200 MB to the registry is broken in my opinion. The registry is not a general use database. Software that fills the temp directory on Linux would also probably result in instability. -Ack
"As someone who is resbposible for many of those bug reports I can tell you it's not the fetures that break things. It's things like driver API cleanups that don't get all of the older drivers."
This reminds me of the question of whether Linux should have a stable driver ABI/API. I am talking about the interfaces in the kernel used by other drivers. It would be nice if they had some sense of support for older drivers so that folks had more time to digest the new interfaces and then move things over. Perhaps using versioned interfaces with deprecation through build warnings or something similar.
Windows drivers have an advantage here. Many interfaces are either versioned (NDIS) or optional (PnP IRPs) for some time. Windows supported NT4 style drivers in Win2k, XP. NDIS 4 drivers could be built to work on 9x and 2k, xp. This is the same for SCSI drivers. When they decided that the storage stack needed a change, they added new interfaces and made the old scsi miniport ones deprecated. Folks moved to the new interfaces over time to take advantage of new features and performance changes. They had time to make the switch properly because they still had the old interfaces hanging around for a bit. It makes the core OS more complex but it makes the driver developer's life easier in my opinion.
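Here is a minimal Python sketch of what I mean by versioned interfaces with deprecation warnings and optional requests. It is not NDIS, storport, or any Linux API; the version numbers and the names ToyKernel, register_driver, and dispatch are all hypothetical.

```python
import warnings

# Hypothetical interface versions, not real NDIS or storport version numbers.
SUPPORTED_VERSIONS = {1, 2, 3}
DEPRECATED_VERSIONS = {1}
CURRENT_VERSION = 3


class ToyKernel:
    """Toy model of a kernel that keeps older driver interfaces alive."""

    def __init__(self):
        self.drivers = {}

    def register_driver(self, name, interface_version, handlers):
        # Unsupported versions are rejected outright.
        if interface_version not in SUPPORTED_VERSIONS:
            raise ValueError(f"{name}: interface v{interface_version} is not supported")
        # Deprecated versions still load, but the author is told to move on.
        if interface_version in DEPRECATED_VERSIONS:
            warnings.warn(
                f"{name}: interface v{interface_version} is deprecated; "
                f"move to v{CURRENT_VERSION}",
                DeprecationWarning,
                stacklevel=2,
            )
        self.drivers[name] = (interface_version, handlers)

    def dispatch(self, name, request):
        _version, handlers = self.drivers[name]
        handler = handlers.get(request)
        if handler is None:
            # Optional request the driver never implemented: use default behavior.
            return "default"
        return handler()


if __name__ == "__main__":
    kernel = ToyKernel()
    kernel.register_driver("old_nic", 1, {"send": lambda: "sent"})
    print(kernel.dispatch("old_nic", "send"))
    print(kernel.dispatch("old_nic", "query_power"))
```

The old driver keeps working and gets a warning instead of a breakage, which is the extra time to make the switch that I am talking about.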
Back when I worked for Montgomery Investment Technology we worked on add-ins to the Applix spreadsheet. The main code was written in C and then called from their ELF language. I remember we had some functions with lots of parameters, and it caused all sorts of issues in ELF. At the time, ELF had a limit on the length of the function name and arguments. This caused problems with functions like: http://fintools.com/WebHelp/index.html?bondsanalytics.htm . We had to have arguments like a, b, c, d, etc. Arg. One thing that was neat was that Applix running the same calculations on a Linux box on a Pentium 100 was faster than Excel on Windows NT Server running on a PPro 200. -Ack
Check out http://www.ether2.com/ for a technology that uses Ethernet but avoids using switches at all. They seem to try to deal with collisions by having each station learn when it is OK to send. They claim to get much higher utilization and latency that scales down as bandwidth scales up. While not token ring, it is one of the few new switchless technologies I know of in networking. -Ack
Re:Type of NIC and clustered file systems? on ILM's Datacenter · Score: 1
I agree the cost for enough FC ports to handle the entire cluster would be quite a bit. I am just curious if it would be faster or not. Another option would be to use lots of local storage and use a clustered file system over a second 1 GigE LAN. The cost may balance out in that case. 5000 nodes can provide a lot of shared storage to themselves, and if each node had a reasonable SATA drive they could all keep up with the GigE demands. Of course Spinnaker uses multiple smart GigE links per box and multiple boxes. I wonder how many GigE links those NAS boxes have in total? -Ack
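A rough back-of-the-envelope check on the local storage idea, as a Python sketch. The drive throughput figures below are assumptions I picked for illustration, not measurements, and the link figure is the raw 1 Gbit/s rate with no protocol overhead.

```python
GIGE_BYTES_PER_SEC = 1e9 / 8  # raw 1 Gbit/s link, ignoring protocol overhead


def node_bottleneck(drive_mb_per_sec, link_bytes_per_sec=GIGE_BYTES_PER_SEC):
    """Return which side limits one node, and the limiting rate in bytes/s."""
    drive_bytes_per_sec = drive_mb_per_sec * 1e6
    if drive_bytes_per_sec < link_bytes_per_sec:
        return "drive", drive_bytes_per_sec
    return "network", link_bytes_per_sec


def aggregate_gb_per_sec(nodes, drive_mb_per_sec):
    """Upper bound on the bandwidth all nodes can serve together, in GB/s."""
    _, per_node = node_bottleneck(drive_mb_per_sec)
    return nodes * per_node / 1e9


if __name__ == "__main__":
    # 60 and 200 MB/s are hypothetical sustained drive rates, chosen as examples.
    for drive_rate in (60, 200):
        limit, _ = node_bottleneck(drive_rate)
        total = aggregate_gb_per_sec(5000, drive_rate)
        print(f"{drive_rate} MB/s drive: limited by {limit}, up to {total:.0f} GB/s total")
```

Either way the ceiling is the slower of the drive and the GigE link on each node, so whether a drive "keeps up" depends entirely on its sustained rate against the 125 MB/s raw link rate.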
I put in 4 years working on IB for SilverStorm, so I have been doing IB since before there were any ASICs or even a 1.0 spec. Can you be more specific? What kind of problems do you have with IB software, and which IB software are you talking about? OpenIB? VendorX's proprietary stuff? Which HCA (Mellanox, Pathscale, IBM, etc.)? Which switching (Voltaire, SilverStorm, Cisco, etc.)? What kind of setup were you doing (big HPC cluster or little database)? I am not looking for a flame war; I am genuinely curious about what you didn't like about the software. I promise, no angry words even if you knock the stuff I used to work on!
-Ack
Type of NIC and clustered file systems? on ILM's Datacenter · Score: 1
I wonder if they are using NICs that have offloading capabilities in the compute nodes. InfiniBand, Myrinet, and iWARP NICs are all designed to get rid of TCP/IP processing. One would think with relatively large data sets, TCP could be a big CPU consumer. Also, standard NICs using TCP have horrible latency compared to InfiniBand, Myrinet, etc. That latency really eats up cluster performance when the nodes all wait for something (like new data, results, etc.). In lots of high performance computing applications latency matters more than almost anything else.
On the spinnaker side, I wonder if this type of solution (clustered NAS) competes well with clustered file systems where the compute nodes are the storage nodes as well. It eats up CPU for running the clustered file system but it puts the compute nodes right on the SAN without having to go through a NAS head.
-Ack
Where are RedHat, Novell, IBM, etc. on this? on The SLI Godfather · Score: 1
I don't understand why the folks who have the money don't pay to get better drivers for Linux. It is pretty sad that things like the NDIS wrapper have to exist. I am pretty sure Apple pays to get its drivers. The problem isn't NVIDIA here. I don't have an issue with them not releasing open source drivers. In my opinion, they spent a lot of time and money to get just a few frames ahead of the competition, and I don't have a problem with them not giving it away. I am not anti-open source; I just don't think it should be forced down someone's throat. Driver development is difficult, and Linux driver development requires constant effort since there isn't a stable ABI in the kernel. You have to stay on top of things because they can change at any time. While other operating systems change as well, they make a real attempt to maintain backward compatibility. This constant churn makes it very expensive to support devices in the Linux kernel. If you become an expert kernel person in the 2.4 series, then work on another project, and then come back to 2.6 land, you are in for fun. I have heard, and personally experienced, many horror stories about upgrading from 2.4 to 2.6. On the other hand, I experienced smooth sailing with Windows driver changes because I could choose when I added support for new features. Driver architecture changes in Windows always built on what you knew before. It is amazing, but I have always felt I knew what the Windows kernel was doing better than I knew what the Linux one was doing, even though I had source only for Linux. Over time you get a feel for how things work in an operating system. This can't happen when you have significant architectural changes to the kernel.
So if Linux wants good drivers, the Linux vendors should make it happen. Heck, if they expect me to pay for a distro, I want more than support; I want drivers that work! -Ack
All sorts of RDMA hardware (InfiniBand, iWARP, Myrinet, etc.) allow user space to access the hardware directly for performance reasons. In HPC and high-end databases, copies and context switches increase latency too much. IB has a nice security mechanism that uses local and remote keys to protect memory on both sides of the wire. -Ack
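As a rough illustration of the key idea, here is a toy Python model of memory registration with a local key and a remote key. It is a sketch of the concept only, not the verbs API: ToyAdapter, register_memory, and check_remote_write are hypothetical names, and the keys come from a simple counter so the example is deterministic.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegion:
    start: int          # first byte address covered by the registration
    length: int         # number of bytes covered
    lkey: int           # key the local side uses in its own work requests
    rkey: int           # key handed to the peer for remote access
    remote_write: bool  # whether the peer may write into this region


class ToyAdapter:
    """Toy model of an adapter that validates remote access with keys."""

    def __init__(self):
        self._keys = itertools.count(0x1000)  # hypothetical key generator
        self._by_rkey = {}

    def register_memory(self, start, length, remote_write=False):
        region = MemoryRegion(start, length, next(self._keys), next(self._keys), remote_write)
        self._by_rkey[region.rkey] = region
        return region

    def check_remote_write(self, rkey, addr, size):
        # The peer must present a valid remote key, the region must allow remote
        # writes, and the whole access must fall inside the registered range.
        region = self._by_rkey.get(rkey)
        if region is None or not region.remote_write or size <= 0:
            return False
        return region.start <= addr and addr + size <= region.start + region.length


if __name__ == "__main__":
    hca = ToyAdapter()
    region = hca.register_memory(0x1000, 4096, remote_write=True)
    print(hca.check_remote_write(region.rkey, 0x1000, 4096))
    print(hca.check_remote_write(region.rkey, 0x1000, 4097))
```

The point is that the check happens in the adapter rather than in the kernel, so user space can post work directly without a context switch and still not scribble outside what was registered.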
Lots of other things use PCI-Express including:
Single and Dual Port 4X SDR and DDR InfiniBand over PCI-Express x8
Dual port 2Gb and 4Gb FibreChannel over PCI-Express x4
Ethernet (multiport 1 gigabit and 10 gigabit) over PCI-Express x4
Multi port FireWire 800 over PCI-Express x1
DualChannel UltraSCSI320 over PCI-Express x1
There are more probably... PCI-Express grew out of InfiniBand. They cut out the networking to make it cheaper for just inside a single system. Ironically, they put a lot of the networking back in for Advanced Switching Interconnect.
You can get an InfiniBand switch with 24 10-gigabit ports at full speed from SilverStorm, Cisco, Voltaire, HP, IBM, etc. All are based on a Mellanox ASIC and are probably under $1000/port. I found an HP box for 6500 on Froogle just now. I am sure there are boxes at this level for Ethernet switching and IP routing. Check out this monster from Foundry. From here, it has "3.84 Terabits per second of backplane switching capacity". -Ack
I have been thinking that it would be interesting to have virtualized operating systems running over a distributed system. All the resources of the distributed system could then be shared amongst the hosted OSs. You could move resources from one hosted OS to another as needed. If things are too slow, just add another system in the distributed system and its resources help the hosted systems. -Ack
My first thought about this was that it would make a great HPC box. I think boxes like this might move the I/O folks to x16 PCIe. As far as I know nothing besides graphics has gone past x8. InfiniBand SDR 4X and some 10GE cards could go to x16 and have dual ports. Finally a port that could run that dual port DDR 4X IB card...or that SDR 12X IB card. Bandwidth for I/O is getting better all the time...at some point it will get close enough to memory speeds that a whole new set of distributed applications will become possible. -ack
"I think that it was this last thing, the Federation interconnect, that they were pushing the data over in this test, since it forms the backbone of the machine and links the storage nodes to the login node controllers, which then connect to the login nodes themselves (of which there are apparently over 1,400 of, according to this). I couldn't find much information on Federation, as it seems to only be used in a few systems, of which Purple is the most notable. One reference I found seems to put it at 1.49 GB/sec (11.92 Gbit/s) bandwidth, although it's not clear if that's "dual plane" Federation or not. 4X SDR Infiniband is around 10 Gbit/sec, IIRC, so Federation's a little faster."
It appears to be an IBM thing that is only used on these big ASC platforms. The other parts of the company are using InfiniBand quite a bit though. -Ack
It would be nice to see comparisons to RedHat/Sistina's GFS, Lustre (backed by HP), and others listed here.
Also, how does this compare to clustered storage that is not run on the hosts themselves, like NetApp's new Spinnaker-based clustering? You also have folks like Isilon, Panasas, and Terrascale.
Correct, Linux rejected TOE. But they are accepting IB, iWARP, RDMA, and iSER, which essentially include the same ideas. You could argue that Linux's method for supporting TCP/IP offload is to support the RDMA APIs and then run Sockets Direct Protocol. So while Linux doesn't support TOE, they support iWARP, which includes TOE. -Ack
At 1 gigabit/second, I would agree that popularity is dropping. All the 1Gb TOE folks are gone. At 10 Gbit, I think you still need it, and I think the users are showing signs of agreement. Pushing 10 gigabit/second of TCP/IP over Ethernet (1500 byte packets) will take up a LOT of CPU time even on today's boxes. I think that Intel's comments about TCP/IP "on loading" represent their failure to get a TOE NIC to really work. At one point they pushed a TOE model. I can say that the solution has to be ASIC or FPGA based...putting in an XScale to run the TCP/IP for your P4 doesn't make sense. Most of the 10 Gbit NICs are offering TOE to compete with IB, Myrinet, and, in storage, FibreChannel. Hardware based solutions have shown lower overhead and lower latency than software solutions so far. Perhaps CPU speeds will catch up, but network speeds have been increasing faster than CPU speeds. Of course, if you want to dedicate a main CPU core to networking you get a nice I/O processor of sorts.
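To put a number on "a LOT of CPU time", here is the arithmetic as a small Python sketch. It assumes the link is filled with 1500 byte packets and ignores Ethernet framing overhead (preamble, inter-frame gap, headers), so the real packet rate at line rate is a bit lower.

```python
def packets_per_second(link_bits_per_sec, packet_bytes):
    """Packets per second needed to fill a link, ignoring framing overhead."""
    return link_bits_per_sec / (packet_bytes * 8)


def ns_per_packet(link_bits_per_sec, packet_bytes):
    """Time available to process each packet at full line rate, in nanoseconds."""
    return 1e9 / packets_per_second(link_bits_per_sec, packet_bytes)


if __name__ == "__main__":
    for name, rate in (("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)):
        pps = packets_per_second(rate, 1500)
        budget = ns_per_packet(rate, 1500)
        print(f"{name}: {pps:,.0f} packets/s, {budget:,.0f} ns per packet")
```

At 10 Gbit/s that works out to roughly 833,000 packets per second, or about 1.2 microseconds per packet for everything the host has to do in software.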
Check here: http://www.osronline.com/article.cfm?article=476
Check out http://www.osronline.com/ . They have some similar utilities and are the place to go for windows device driver questions and debug. They are the folks that finally fixed much of the DDK documenation . I still have the mugs they gave away for finding doc errors.
By the way, I highly recommend their classes. I have taken a bunch of them and I am pretty sure that these folks know windows internals better than any other organization...maybe even MS.
So are you saying that the APIs used by drivers have not changed at all between 2.4 and 2.6? I had thought that a bunch of things had changed. I had also heard that in some cases drivers that worked in 2.4 broke in 2.6 due to API changes. Am I totally off base here?
-Ack
The difference is who you are trying to help the most. The current Linux system seems to be made to keep life easier for the core kernel developers, not the hardware driver developers who represent the marjority of the linux kernel community. Microsoft goes the other way: they make things easier for hardware driver developers while making life harder for themselves.
By the way, they do not handle the versioning that I speak of through different code bases. The Windows XP kernel is one code base that supports various methods to run drivers that support older versions of APIs while letting them add new features over time. This is done with versioning (NDIS), new APIs that depricate but not replace the old ones (storport vs. scsiport ), or optional APIs (IRPs that can be ignored for NT4 style behavior).
Merging does not seem to help as bug reports are coming in for merged drivers that no one wants to update. They may ask themselves, "why should I have to change my driver when it was working?" Some drivers that worked before are now suddenly broken because the world changed around them. There is nothing worse then someone else making changes that break your code and then expecting you to fix it.
-Ack
Have you opened up the dump in the Windows kernel debugger? Did you send the bug report into Microsoft and have tracked its progress? Did you check the event logs?
Also, software that writes over 200 MB to the registry is broken in my opinion. The registry is not a general use database. Software that fills the temp directory on linux would also probably result in instability.
-Ack
"As someone who is resbposible for many of those bug reports I can tell you it's not the fetures that break things. It's things like driver API cleanups that don't get all of the older drivers."
This reminds of the question of if linux should have a stable driver ABI/API. I am talking about the interfaces in ther kernel used by other drivers. It would be nice if they had some sense of support for older drivers so that folks had more time to digest the new interfaces and then move things over. Perhaps using versioned interfaces with deprecation through build warnings or something similar.
Windows drivers have an advantage here. Many interfaces are either versioned (NDIS) or optional (PnP IRPs) for some time. Windows supported NT4 style drivers in Win2k, XP. NDIS 4 drivers could be built to work on 9x and 2k, xp. This is the same for SCSI drivers. When they decided that the storage stack needed a change, they added new interfaces and made the old scsi miniport ones deprecated. Folks moved to the new interfaces over time to take advantage of new features and performance changes. They had time to make the switch properly because they still had the old interfaces hanging around for a bit. It makes the core OS more complex but it makes the driver developer's life easier in my opinion.
Back when I worked for Montgomery Investment Technology we worked on addins to the Applix spreadsheet. The main code was written in C and then called from their ELF language. I remember we had some functions with lots of parameters and it caused all sorts of issues in ELF. At the time, ELF had a limit on the length of the function name and arguments. This called problems with functions like:t ics.htm . We had to have arguments like a, b, c, d, etc. Arg. One thing that was neat was that Applix running the same calculations on a linux box on pentium 100 was faster then excel on windows nt server running on a ppro 200.
http://fintools.com/WebHelp/index.html?bondsanaly
-Ack
Check out http://www.ether2.com/ for a technology that uses ethernet but avoids using switches at all. They seem to try to deal with collisions by each station learning and knowing when it is ok to send. They claim to get much higher utilization and latency that scales down as bandwidth scales up. While not token ring; it is one of the few new switchless technologies I know of in networking.
-Ack
I agree the cost for enough FC ports to handle the entire cluster would be quite a bit. I am just curious if it would be faster or not.
Another option would be to use lots of local storage and use a clustered file system over a second 1 GigE lan. The cost may balance out in that case. 5000 nodes can provide a lot of shared storage to themselves and if each node had a reasonable SATA drive they could all keep up with the GigE demands.
Of course spinnaker uses multiple smart gigE links per box and multiple boxes. I wonder how many GigE links that NAS boxes have in total?
-Ack
I put in 4 years working on IB for SilverStorm so I have been doing IB since before there were any ASICs or even a 1.0 spec. Can you be more specific? What kind of problems do you have with IB software and which IB software are you talking about? OpenIB? VendorX's proprietary stuff? Which HCA (Mellanox, Pathscale, IBM, etc)? Which switching (Voltaire, SilverStorm, Cisco, etc.)? What kind of setup were you doing (big HPC cluster or little database)? I not looking for flame war, I am genuinely curious about what you didn't like about the software. I promise, no angry words even if you knock the stuff I used to work on!
-Ack
I wonder if they are using NICs that have offloading capabilities in the compute nodes. InfiniBand, Myrinet, and iWARP NICs are all designed to get rid of TCP/IP processing on the host. One would think that with relatively large data sets, TCP could be a big CPU consumer. Also, standard NICs using TCP have horrible latency compared to InfiniBand, Myrinet, etc. That latency really eats up cluster performance when the nodes all wait for something (like new data, results, etc.). In lots of high performance computing applications, latency matters more than almost anything else.
On the Spinnaker side, I wonder if this type of solution (clustered NAS) competes well with clustered file systems where the compute nodes are the storage nodes as well. It eats up CPU for running the clustered file system, but it puts the compute nodes right on the SAN without having to go through a NAS head.
-Ack
I don't understand why the folks who have the money don't pay to get better drivers for Linux. It is pretty sad that things like ndiswrapper have to exist. I am pretty sure Apple pays to get its drivers.
The problem isn't nvidia here. I don't have an issue with them not releasing open source drivers. In my opinion, they spent a lot of time and money to get just a few frames ahead of the competition, and I don't have a problem with them not giving it away. I am not anti-open source; I just don't think it should be forced down someone's throat.
Driver development is difficult, and Linux driver development requires constant effort since there isn't a stable ABI in the kernel. You have to stay on top of things because they can change at any time. While other operating systems change as well, they make a real attempt to maintain backward compatibility. This constant churn makes it very expensive to support devices on the Linux kernel. If you become an expert kernel person in the 2.4 series, then work on another project, and then come back to 2.6 land, you are in for fun. I have heard and personally experienced so many horror stories about upgrading from 2.4 to 2.6. On the other hand, I experienced smooth sailing with Windows driver changes because I could choose when I added support for new features. Driver architecture changes in Windows always built on what you knew before. It is amazing, but I have always felt I knew what the Windows kernel was doing better than I knew what the Linux one was doing, even though I had source only for Linux. Over time you get a feel for how things work in an operating system. This can't happen when you have significant architectural changes to the kernel.
So if Linux wants good drivers, the Linux vendors should make it happen. Heck, if they expect me to pay for a distro, I want more than support; I want drivers that work!
-Ack
All sorts of RDMA hardware (InfiniBand, iWARP, Myrinet, etc.) allow user space to access the hardware directly for performance reasons. In HPC and high-end databases, copies and context switches increase latency too much. IB has a nice security mechanism that uses local and remote keys to protect memory on both sides of the wire.
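A toy model of the key idea, for anyone who has not seen it. This is not the verbs API or any vendor's implementation; the class and method names are hypothetical, and all it shows is the concept of the adapter checking a remote key, permissions, and bounds before touching registered memory.

```python
import secrets
from dataclasses import dataclass

@dataclass
class MemoryRegion:
    """Toy stand-in for a registered memory region (not a real verbs structure)."""
    buf: bytearray
    lkey: int                       # used by the local side to refer to the region
    rkey: int                       # handed to the remote peer that may access it
    allow_remote_write: bool = False

class ToyAdapter:
    """Hypothetical adapter that validates every remote access in 'hardware'."""

    def __init__(self):
        self._regions = {}          # rkey -> MemoryRegion

    def register(self, size, allow_remote_write=False):
        mr = MemoryRegion(bytearray(size), secrets.randbits(32),
                          secrets.randbits(32), allow_remote_write)
        self._regions[mr.rkey] = mr
        return mr

    def remote_write(self, rkey, offset, data):
        mr = self._regions.get(rkey)
        if mr is None:
            raise PermissionError("unknown rkey")
        if not mr.allow_remote_write:
            raise PermissionError("region not registered for remote write")
        if offset < 0 or offset + len(data) > len(mr.buf):
            raise PermissionError("access outside registered region")
        mr.buf[offset:offset + len(data)] = data

adapter = ToyAdapter()
region = adapter.register(16, allow_remote_write=True)
adapter.remote_write(region.rkey, 0, b"hello")   # succeeds: right key, in bounds
```

A peer that guesses a key, or writes past the end of the region, gets rejected without the host CPU or kernel being involved in the data path, which is the point of doing the check in the adapter.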
-Ack
Lots of other things use PCI-Express including:
Single and Dual Port 4X SDR and DDR InfiniBand over PCI-Express x8
Dual port 2Gb and 4Gb FibreChannel over PCI-Express x4
Ethernet (multiport 1 gigabit and 10 gigabit) over PCI-Express x4
Multi port FireWire 800 over PCI-Express x1
DualChannel UltraSCSI320 over PCI-Express x1
There are probably more... PCI-Express grew out of InfiniBand. They cut out the networking to make it cheaper for use inside a single system. Ironically, they put a lot of the networking back in for Advanced Switching Interconnect.
You can get an InfiniBand switch with 24 10-gigabit ports at full speed from SilverStorm, Cisco, Voltaire, HP, IBM, etc. All are based on a Mellanox ASIC and are probably under $1000/port. I found an HP box for $6500 on Froogle just now. I am sure there are boxes at this level for Ethernet switching and IP routing. Check out this monster from Foundry. From here, it has "3.84 Terabits per second of backplane switching capacity".
-Ack
I have been thinking that it would be interesting to have virtualized operating systems running over a distributed system. All the resources of the distributed system could then be shared amongst the hosted OSs. You could move resources from one hosted OS to another as needed. If things are too slow, just add another system in the distributed system and its resources help the hosted systems.
-Ack
My first thought about this was that it would make a great HPC box. I think boxes like this might move the I/O folks to x16 PCIe. As far as I know, nothing besides graphics has gone past x8. InfiniBand SDR 4X and some 10GE cards could go to x16 and have dual ports. Finally, a slot that could run that dual port DDR 4X IB card...or that SDR 12X IB card. Bandwidth for I/O is getting better all the time...at some point it will get close enough to memory speeds that a whole new set of distributed applications will become possible.
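A quick sketch of why x16 matters for those cards, assuming first-generation PCI-Express (2.5 GT/s per lane) and the standard InfiniBand lane rates, both with 8b/10b encoding. It only compares raw data rates per direction and ignores packet and protocol overhead, so real throughput would be lower on both sides.

```python
def pcie1_data_gbps(lanes):
    """Data rate per direction of a first-generation PCIe link (2.5 GT/s per lane, 8b/10b)."""
    return lanes * 2.5 * 8 / 10

def ib_data_gbps(width, lane_gbps, ports=1):
    """Data rate per direction of an IB card (SDR = 2.5, DDR = 5.0 Gbit/s per lane, 8b/10b)."""
    return ports * width * lane_gbps * 8 / 10

cards = {
    "dual port 4X SDR": ib_data_gbps(4, 2.5, ports=2),
    "dual port 4X DDR": ib_data_gbps(4, 5.0, ports=2),
    "single port 12X SDR": ib_data_gbps(12, 2.5),
}

for slot_lanes in (8, 16):
    slot = pcie1_data_gbps(slot_lanes)
    for name, need in cards.items():
        verdict = "fits" if need <= slot else "is bottlenecked"
        print(f"x{slot_lanes} ({slot:.0f} Gbit/s): {name} needs {need:.0f} Gbit/s, {verdict}")
```

Under these assumptions an x8 slot tops out at 16 Gbit/s of data per direction, which covers dual port 4X SDR but not dual port 4X DDR (32 Gbit/s) or 12X SDR (24 Gbit/s); x16 covers all three.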
-Ack
"I think that it was this last thing, the Federation interconnect, that they were pushing the data over in this test, since it forms the backbone of the machine and links the storage nodes to the login node controllers, which then connect to the login nodes themselves (of which there are apparently over 1,400 of, according to this). I couldn't find much information on Federation, as it seems to only be used in a few systems, of which Purple is the most notable. One reference I found seems to put it at 1.49 GB/sec (11.92 Gbit/s) bandwidth, although it's not clear if that's "dual plane" Federation or not. 4X SDR Infiniband is around 10 Gbit/sec, IIRC, so Federation's a little faster."
Did some research and found the following:
http://www.llnl.gov/asc/platforms/purple/configuration.html
http://www.redbooks.ibm.com/abstracts/sg246978.html - Info on the switch used
It appears to be an IBM thing that is only used on these big ASC platforms. The other parts of the company are using InfiniBand quite a bit though.
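For what it is worth, the numbers in the quoted comment can be sanity-checked with a little arithmetic. This sketch takes the 1.49 GB/sec Federation figure as given (I do not know whether it is a signaling rate or a data rate, so treat the comparison as rough) and uses the standard 4X SDR InfiniBand lane rate with 8b/10b encoding.

```python
def gbytes_to_gbits(gbytes_per_sec):
    """Convert GB/s to Gbit/s."""
    return gbytes_per_sec * 8

federation_gbps = gbytes_to_gbits(1.49)            # figure taken from the quote above

ib_4x_sdr_signal_gbps = 4 * 2.5                    # four lanes at 2.5 Gbit/s each
ib_4x_sdr_data_gbps = ib_4x_sdr_signal_gbps * 8 / 10   # 8b/10b leaves 80% for data

print(f"Federation (as quoted): {federation_gbps:.2f} Gbit/s")
print(f"4X SDR IB signaling:    {ib_4x_sdr_signal_gbps:.0f} Gbit/s")
print(f"4X SDR IB data:         {ib_4x_sdr_data_gbps:.0f} Gbit/s")
```

So the quoted conversion checks out, and the "around 10 Gbit/sec" for 4X SDR is the signaling rate; the usable data rate is 8 Gbit/s.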
-Ack
It would be nice to see comparisons to RedHat/Sistina's GFS, Lustre (backed by HP), and others listed here.
Also, how does this compare to clustered storage that is not run on the hosts themselves, like NetApp's new Spinnaker-based clustering? You also have folks like Isilon, Panasas, and Terrascale.
Anybody have any good data on this?
-Ack
Correct, Linux rejected TOE. But they are accepting IB, iWARP, RDMA, and iSER, which essentially include the same ideas. You could argue that Linux's method for supporting TCP/IP offload is to support the RDMA APIs and then run Sockets Direct Protocol. So while Linux doesn't support TOE, it supports iWARP, which includes TOE.
-Ack
At 1 gigabit/second, I would agree that popularity is dropping. All the 1 Gb TOE folks are gone. At 10 Gbit, I think you still need it, and I think the users are showing signs of agreement. Pushing 10 gigabit/second of TCP/IP over Ethernet (1500 byte packets) will take up a LOT of CPU time even on today's boxes. I think that Intel's comments about TCP/IP "on loading" represent their failure to get a TOE NIC to really work. At one point they pushed a TOE model. I can say that the solution has to be ASIC or FPGA based...putting in an XScale to run the TCP/IP for your P4 doesn't make sense. Most of the 10 Gbit NICs are offering TOE to compete with IB, Myrinet, and, in storage, FibreChannel. Hardware based solutions have shown lower overhead and lower latency than software solutions so far. Perhaps CPU speeds will catch up, but network speeds have been increasing faster than CPU speeds. Of course, if you want to dedicate a main CPU core to networking, you get a nice I/O processor of sorts.
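To put a number on "a LOT of CPU time", here is a rough sketch of the per-packet budget at 10 Gbit/s with 1500 byte packets. It ignores Ethernet framing overhead (preamble, inter-frame gap, headers), and the 3 GHz clock is just an assumed value for illustration.

```python
def packets_per_second(link_bps, packet_bytes):
    """Packets per second needed to fill the link, ignoring framing overhead."""
    return link_bps / (packet_bytes * 8)

def ns_per_packet(link_bps, packet_bytes):
    """Time available to handle each packet at line rate."""
    return 1e9 / packets_per_second(link_bps, packet_bytes)

def cycles_per_packet(cpu_hz, link_bps, packet_bytes):
    """CPU cycles one core can spend per packet and still keep up."""
    return cpu_hz / packets_per_second(link_bps, packet_bytes)

LINK_BPS = 10e9            # 10 gigabit/second
PACKET_BYTES = 1500        # standard Ethernet payload size mentioned above
ASSUMED_CPU_HZ = 3e9       # hypothetical 3 GHz core

print(f"{packets_per_second(LINK_BPS, PACKET_BYTES):,.0f} packets/s")
print(f"{ns_per_packet(LINK_BPS, PACKET_BYTES):,.0f} ns per packet")
print(f"{cycles_per_packet(ASSUMED_CPU_HZ, LINK_BPS, PACKET_BYTES):,.0f} cycles per packet")
```

That works out to roughly 833,000 packets per second in each direction, or about 1.2 microseconds per packet, which has to cover interrupts, TCP/IP processing, and copies before the application does any work of its own.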