Maybe I am not really getting your RX caching argument, but it sort of falls apart.
Say you have a typical NIC that DMAs frames into main memory. So there is this frame in DRAM. Depending on the cache, DMA coherence, and snooping abilities of your hardware, that frame is probably not in the processor's cache yet. The kernel now copies it to the userspace buffer. That is where it gets into the CPU cache, but with fancy hardware you might also have just cached the original DMAed copy.
If there were no copy, it would have been cached the moment the userspace process accessed it. There really is no win there, and with fancy hardware you are evicting extra cache lines. So you have gained nothing, possibly evicted something you will need soon, and added a copy on top of it.
I can tell you from personal experience that I have had great improvements from zero-copy on FreeBSD and Solaris. The CPU really does burn a lot of time copying buffers around. I also have the displeasure of dealing with VxWorks, where network performance has historically been abysmal. A lot of that comes down to the netTask, but there is certainly a lot of overhead in copying. Using zBufs was not much of an improvement; I think that was due to the overhead of the zBuf calls themselves. I made some progress using so_socket itself but never really finished that up. It seems Wind River has now moved to some proprietary stack, so who knows what the situation is currently.
XTI gave you more control at the application level. For example, you could do TCP/IP, IPX, and NetBIOS with very similar code. With sockets you need kernel and/or libsocket support for that kind of thing, the way you have SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW. XTI also let you tweak things more easily since the kernel was not so involved. Say you did not want Nagle with your TCP, or you wanted to build a protocol like SCTP or T/TCP.
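On the sockets side, turning Nagle off is at least a single per-socket option. A minimal sketch, assuming a POSIX system that exposes `TCP_NODELAY` in `<netinet/tcp.h>`; the helper names `disable_nagle` and `nagle_disabled` are made up for illustration.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: disable Nagle's algorithm on a TCP socket so small
 * writes go out immediately instead of being coalesced.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Read the option back: 1 if Nagle is off, 0 if it is on, -1 on error. */
static int nagle_disabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

Call it right after `socket()` or `accept()`. With XTI the same kind of knob went through the option-management call `t_optmgmt()` rather than a socket option.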
Tell him about the outside interest. There could also be a good paper in this for various conferences, with a note that it was related to research supported by said company. That always looks good.
Yes, and /dev/poll on Solaris and kqueue on FreeBSD, or just let libevent do the best thing on whatever system you are on:
http://www.monkey.org/~provos/libevent/
poll() is sadly O(N), but there are some optimizations that can make it faster.
First, poll does not need to be a simple syscall that copies the entire array into kernel memory. Almost every time poll is called, the array is identical to what it was last time and at the same address. libc can first compare the array, base address, and size in userspace, and when all are the same it can call a faster poll syscall or pass an argument telling the kernel to take the fast path and repeat what it did last time.
Second, the kernel does not need to copy a separate results buffer out to userspace. It can simply twiddle that memory itself while still in kernel context.
Finally, there can be optimizations for the simple, typical case of only a few descriptors, say 8 or fewer.
Many systems do some or all of those; from performance measurements it seems to me that Solaris does particularly well.
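The first optimization can be sketched in userspace. This is a minimal sketch, not any real libc's code: `struct poll_cache` and `cached_poll` are hypothetical names, and because standard `poll()` has no "same as last time" mode, the sketch only detects and counts when such a kernel fast path could have been used, then calls plain `poll()` either way.

```c
#include <poll.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-caller cache of the last pollfd array handed to poll(). */
struct poll_cache {
    const struct pollfd *base;  /* array address used last time */
    nfds_t nfds;                /* array length used last time */
    struct pollfd *copy;        /* snapshot of the fd/events fields */
    unsigned long fast_hits;    /* calls that were eligible for a fast path */
};

/* True if the array is at the same address, has the same length, and asks
 * for the same fds and events as last time. revents is ignored because it
 * is an output field that changes on every call. */
static bool same_as_last(const struct poll_cache *c,
                         const struct pollfd *fds, nfds_t nfds)
{
    if (c->copy == NULL || c->base != fds || c->nfds != nfds)
        return false;
    for (nfds_t i = 0; i < nfds; i++)
        if (c->copy[i].fd != fds[i].fd || c->copy[i].events != fds[i].events)
            return false;
    return true;
}

static int cached_poll(struct poll_cache *c, struct pollfd *fds,
                       nfds_t nfds, int timeout_ms)
{
    if (same_as_last(c, fds, nfds)) {
        /* Fast path: a real libc would tell the kernel "same as last time"
         * here (assumed interface). This sketch only counts the chance. */
        c->fast_hits++;
    } else {
        size_t n = nfds ? nfds : 1;  /* avoid realloc(p, 0) */
        struct pollfd *snap = realloc(c->copy, n * sizeof *snap);
        if (snap == NULL)
            return -1;
        memcpy(snap, fds, nfds * sizeof *snap);
        c->copy = snap;
        c->base = fds;
        c->nfds = nfds;
    }
    return poll(fds, nfds, timeout_ms);
}
```

Keep one zero-initialized `struct poll_cache` per event loop and `free()` its `copy` when done. The check itself is O(N), but it is a cheap userspace compare rather than a copy into the kernel.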
Sure you can. Check out zero-copy on FreeBSD and Solaris. Also, mmap and sendfile have done the zero-copy TCP send half on Linux since 2.4, and there are some more hackish zero-copy read-half things. You can also already limit syscalls with TCP_CORK (and the similar TCP_NOPUSH on the BSDs).
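Putting the send half and TCP_CORK together might look like the following. A minimal sketch, assuming Linux (`TCP_CORK` and this `sendfile()` signature are Linux-specific); `send_header_and_file` and `set_cork` are made-up names, and the header is written with an ordinary `write()` while the file body goes through `sendfile()`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_CORK (Linux) */
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>  /* Linux sendfile(2) */
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

static int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Hypothetical helper: send a short header and then a whole file over a
 * connected TCP socket. Returns 0 on success, -1 on error. */
static int send_header_and_file(int sock, const char *hdr, int file_fd)
{
    struct stat st;
    size_t left = strlen(hdr);
    off_t offset = 0;
    int rc = -1;

    /* Cork: hold back partial segments so the header and the start of the
     * body can share packets instead of the header going out alone. */
    if (fstat(file_fd, &st) != 0 || set_cork(sock, 1) != 0)
        return -1;

    while (left > 0) {
        ssize_t n = write(sock, hdr, left);
        if (n < 0)
            goto out;
        hdr += n;
        left -= (size_t)n;
    }

    /* sendfile() feeds the socket from the page cache, so the file data is
     * never copied through a userspace buffer. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0)
            goto out;
    }
    rc = 0;

out:
    /* Uncorking flushes whatever is still queued. */
    if (set_cork(sock, 0) != 0)
        rc = -1;
    return rc;
}
```

On the BSDs the cork step would use `TCP_NOPUSH` instead, and their `sendfile()` has a different signature, so this exact code does not port as-is.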
TCP suffers from head-of-line blocking; see the URG hack for more about that. HTTP, being on top of TCP, suffers from it as well. What this means is: say one segment gets lost. It can take a long time, potentially many seconds once retransmission backoff kicks in, for that segment to be resent. In the meantime, all the other data that has arrived behind it cannot be passed along to your program.
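To make the head-of-line effect concrete, here is a toy model (not real TCP code) of an in-order receive buffer: segments may arrive in any order, but the application only gets a segment once every earlier one is present. `struct reorder_buf`, `on_arrival`, and the fixed `MAX_SEGS` window are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS 64  /* toy window size */

struct reorder_buf {
    bool have[MAX_SEGS];  /* which segments have arrived */
    size_t next;          /* first segment not yet given to the application */
};

/* Record an arriving segment and return how many segments just became
 * deliverable. Everything after a missing segment waits, however long
 * ago it arrived: that is head-of-line blocking. */
static size_t on_arrival(struct reorder_buf *rb, size_t seq)
{
    size_t delivered = 0;

    if (seq >= MAX_SEGS || seq < rb->next)
        return 0;  /* out of the toy window, or already delivered */
    rb->have[seq] = true;
    while (rb->next < MAX_SEGS && rb->have[rb->next]) {
        rb->next++;
        delivered++;
    }
    return delivered;
}
```

If segment 1 is lost, segments 2 through 4 sit in the buffer until the retransmitted 1 shows up, and then all four are released at once. A message-oriented protocol that does not promise ordering could hand 2 through 4 to the application immediately.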
I am very impressed with what you described. Are you in a position to share the code?
There is always UDP (you can use the same socket for many 'connections'). It really is not too hard to come up with a little UDP protocol with a message ID and a timeout with retry that gets you what you really need. You can even leverage multicast then. Also, if you are on a cluster where you can get everyone to use jumbo frames, there is zero-copy UDP on both FreeBSD and Solaris. And at least many of the Solaris NIC drivers (this is a hit-or-miss feature on other OSs) can turn the interrupt rate way down under heavy load by polling every now and then instead of interrupting on every frame.
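A minimal sketch of the client side of such a protocol, assuming a made-up wire format (a 4-byte message ID in network byte order, then the payload) and a made-up function name `udp_request`. Timeouts use `poll()`, and replies whose ID does not match are dropped as stale.

```c
#include <arpa/inet.h>   /* htonl */
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_DGRAM 1472   /* assumed: fits a 1500-byte Ethernet MTU over IPv4 */

/* Sends `req` on a connected datagram socket and waits up to `timeout_ms`
 * for a reply carrying the same message ID, resending up to `retries` more
 * times. Returns the reply payload length (truncated to `reply_cap`), or
 * -1 on timeout or error. */
static ssize_t udp_request(int sock, uint32_t id,
                           const void *req, size_t req_len,
                           void *reply, size_t reply_cap,
                           int timeout_ms, int retries)
{
    unsigned char out[MAX_DGRAM], in[MAX_DGRAM];
    const uint32_t wire_id = htonl(id);

    if (req_len > sizeof out - sizeof wire_id)
        return -1;
    memcpy(out, &wire_id, sizeof wire_id);
    memcpy(out + sizeof wire_id, req, req_len);

    for (int attempt = 0; attempt <= retries; attempt++) {
        if (send(sock, out, sizeof wire_id + req_len, 0) < 0)
            return -1;

        struct pollfd pfd = { .fd = sock, .events = POLLIN };
        /* Simplification: each stale reply restarts the timeout. */
        while (poll(&pfd, 1, timeout_ms) > 0) {
            if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
                return -1;  /* e.g. ICMP port unreachable on connected UDP */
            ssize_t n = recv(sock, in, sizeof in, 0);
            if (n < 0)
                return -1;
            uint32_t got;
            if ((size_t)n < sizeof got)
                continue;   /* runt datagram: ignore */
            memcpy(&got, in, sizeof got);
            if (got != wire_id)
                continue;   /* late reply to an earlier request: ignore */
            size_t len = (size_t)n - sizeof got;
            if (len > reply_cap)
                len = reply_cap;
            memcpy(reply, in + sizeof got, len);
            return (ssize_t)len;
        }
    }
    return -1;
}
```

For plain UDP, create the socket with `socket(AF_INET, SOCK_DGRAM, 0)` and `connect()` it to the server so `send()`/`recv()` work. A multicast or many-peer setup would switch to `sendto()`/`recvfrom()` and also match on the sender address.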
There are also raw sockets and SCTP, but I have found UDP+multicast works great.
Sure, it was cool how you could push and pop modules (say you wanted a different line discipline), but please tell me how it prevented any copies? The AT&T implementation also had two extra context switches.
This is what was bad about STREAMS:
Early implementations had no notion of multithreading, and that caused trouble later. There was a time when the STREAMS drivers and demultiplexers assumed a single thread, so the kernel had to hand everything STREAMS-related off to a single worker thread. So you had some big-iron box of the era with, say, four processors, and I/O performance was just terrible until the STREAMS drivers were rewritten. But then you still had that worker thread around, so even after the work was broken out there were two extra thread switches. Then came changes variously called something like Fast STREAMS, where the fast paths would not switch. So all this optimization work went into making STREAMS fast, and it was still slow. It turned out that the reason was the complexity of the STREAMS subsystem and all the layering, which caused so many extra function calls per driver. STREAMS has largely been relegated to legacy and conformance at this point.
Somebody mod this insightful.
There are also incompatibilities with Mac OS X. It feels a lot like Java circa '98, where different versions of wxWidgets and different platforms really affect how you have to write the code so that it works right everywhere. Also, long ago, when it was still wxWindows, they dropped support for wxX11 and had you use wxGTK on Unix-alikes. The wxX11 code stayed so close to working, needing only little tweaks every now and then; it is a shame that branch went dead around two years ago. It matters because you now need a version of GTK+ on any box running a wxWidgets program. In recent years I have never been able to compile a static library of the parts of GTK+ I needed so I could simply link it into the app. You run into real problems where wxWidgets does not work right because of how GTK+ was compiled or which version is on a particular system, and I have no good way around that except gross LD_* overrides in shell scripts. Those libraries are HUGE, too.
To be fair, QSharedPointer showed up in 4.5, which is why I had never heard of it until now. QPointer was forced upon me because everything was a QObject, but then there was the other, non-Qt half of the code that used Boost, and it was not pretty.
GTK+ is not horrible. The reasons people most often cite for it being horrible fall into two categories:
The code is so convoluted.
That is because it is C and not C++, so it cannot take advantage of C++ language features. There are times when C is nice (say you want to use it from a C program and do not want to write a bunch of glue code with C linkage), and it also has fewer dependencies and a smaller overhead.
It does not do all the fancy stuff like SQL that Qt does:
Check again; things have changed. People have written code to do more now, which may not be part of GTK+ itself but is freely available. Also, again, Qt is a huge thing that then defines how the rest of your program has to be written to interface with it (containers, pointers, signals, and slots, for example). It is also resource-intensive because it does so much.
shared_ptr and QPointer don't play together so nicely, though. QPointer also tends to be slower than shared_ptr when I only need what shared_ptr does. To get them to work together I end up copying things between the two, or just not using shared_ptr at all.
A little bit like SotC; I liked that too.
Psion made even smaller things before that. I was always enamored of the HP 95LX:
http://en.wikipedia.org/wiki/HP_95LX
They also cost $400 when the later HP 200 series came out, so a netbook-like price but much smaller.
DN3D came out a bit before Quake. I could not play Quake comfortably on my machine of the time: a 486 running at 33 MHz with a 1 MB ISA SVGA card and an 8-bit stereo SB Pro. Duke ran fine, though, and I remember playing it over the LAN in our dorms. It was very fun and funny. Years later I got a new K6 motherboard and borrowed a PCI graphics card. I was very excited to finally get to play Quake. I borrowed the card on Friday and returned it on Sunday. Quake just felt so lonely and had absolutely no humor; it was not a fun game. Also, online play was buggy and laggy, and people were just plain jerks. This was when I was on a 10 Mb/s subnet.
But Duke was not just about superior gameplay; it actually did have neat tech that was first or nearly first at the time. It could run above 320x200. It had levels where things could be above and below you. It had levels where you could damage walls to open new paths.
One thing I do not see mentioned very often about the game design is the whole fourth-wall thing. There were all sorts of little nuggets related to it sprinkled around. I'd find them and think that was neat.
Sorry I did not see your reply earlier, but the other people did a great job. There are very few reasons to write any asm on a PC these days. Even in kernel/driver work it does not really come up in practice, because others before you have already put in the special code that makes sure all the right barriers are in the right places. C/C++ compilers do a very good job, and it is not worth being non-portable for a few percent of speed.
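As one illustration of barriers now being something the language provides, here is a minimal sketch of publishing data from one thread to another with C11 `<stdatomic.h>` release/acquire ordering instead of hand-written fence instructions. The names `payload`, `ready`, `producer`, and `consume` are made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                /* plain data handed between threads */
static atomic_bool ready = false;  /* publication flag */

/* Writer thread: store the data, then publish it. The release store keeps
 * the payload write from being reordered after the flag write. */
static void *producer(void *arg)
{
    payload = *(int *)arg;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Reader: spin until published. The acquire load pairs with the release
 * store, so the read of payload afterwards sees the producer's write. */
static int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}
```

The compiler emits whatever fence each target needs (often none on x86 for this pattern), which is exactly the kind of per-architecture detail people used to write by hand.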
So unless you are writing or maintaining a compiler, linker, debugger, thread library, C library, or kernel, you won't really write any asm. If, on the other hand, you are working on something that only needs to support a few targets, and those targets are specced with the fewest MIPS possible to hold down cost, then yes, you do see asm in embedded projects with big deployments.
Maybe you could look into some embedded ARM or Dragon (is that still around?) board. With ARM you profile and then find that a slow/narrow bus leads to a few routines you can hand-code in Thumb for very good performance improvements. I don't do any of this (I do embedded at my job), but maybe you would be interested in homebrew on the Nintendo DS. That would probably have the feel you want to experience, and it should not be very expensive to get into: just a DS and a flash cart in addition to what you probably already have.
As to books, coding is not done this way anymore, so there have not been books like this in a long time. The (online) manuals from your chip's vendor will have all the info you need, but not in tutorial/textbook form. I honestly don't know a good starting point today other than to say maybe a more theoretical approach would be okay: first read a book or two by Hennessy and Patterson to get the theory of a machine, and then you realize that almost all machine languages are very similar, and once you know one pretty well, others are easy to pick up quickly. I would also say that the TASM assembler rotted my brain the same way line numbers in BASIC did for programming. Stay away from that big-macro-assembler mentality and go for an AT&T-style assembler used in concert with a preprocessor (like cpp or m4), since that carries over more generically to other architectures.
From your reply I just realized that using real mode would be incredibly stupid. A real-mode emulator like you describe makes much more sense. The reason is that, regardless of what an OS did to make things unreachable from real mode, if you allowed a process to enter real mode it could put a far pointer to some code at 40:67 to take over the machine and then just triple fault. Mea culpa.
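For reference, a real-mode segment:offset pair maps to a linear address as segment × 16 + offset, so 40:67 lands at 0x467 in the BIOS data area, which PC documentation commonly describes as the warm-reset vector the BIOS can jump through after a reset (typically together with the CMOS shutdown status byte). A minimal sketch of the arithmetic; `real_mode_linear` is a made-up name.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
 * The result can exceed 1 MB (up to 0x10FFEF), which only reaches real
 * memory when the A20 line is enabled. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So `real_mode_linear(0x40, 0x67)` gives 0x467.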
My thumb drive has a write-protect switch and I use it. Well, technically it is an SD card in a USB reader.
I'm sorry; my frustration was directed at the fellow you were answering, and I forgot who I was responding to as my fingers went into a fury answering him as well.
Oh, I forgot one thing. The tech in question was magnet technology developed at Fermilab, to which the DOE held the rights. The DOE granted rights to it very cheaply to GE for use in MRI machines. GE and all the other companies in that industry have made a fortune because of it. This directly created an entire new industry, a very big deal in macroeconomic terms, and also a boon for medical science and biology as well as for humanity's quality of life.
Ever heard of Loma Linda University Medical Center? Its accelerator is very much based on linac technology, and you know why? Because it was essentially designed and built in cooperation with the linac people from Fermilab.
If you are thinking of accelerators that produce nuclear materials for medicine by bombarding non-radioactive materials, sure, but accelerators that produce beams to treat people, such as those with tumors, are either cyclotron or linac style. All descended from HEP tech, BTW.
Also, linacs are quite common in HEP. They are a great way to do the initial acceleration, where the particles start nowhere near c and exit close enough to c to have enough momentum that the bending magnets will not defocus the beam into the beam pipe.
Finally, the neutron therapy facility at Fermilab, essentially a wall, a chair, and a bend magnet on the regular linac, has saved countless lives over the last 25 years.
Also, why no response from you to my comment where I list numerous advancements outside of HEP that came from HEP research? Are you still looking for some small issue in what I wrote to latch onto and attack?
Here is another one. Where do you think Eurocard was first used extensively? Hint: it starts with CE and ends with RN. Eurocard became VME, which was an industry standard (in fact the dominant one in industry for some time) and was also used by the US Navy. Fermilab was one of the principal members of VITA (the organization that runs VME) as well. Lots of companies initially made their fortunes on VME; probably the two largest were Motorola and Sun Microsystems. These two companies in particular really left their stamp on technology during the first 20 years of the 25-year window your question asks about, all of it completely unrelated to HEP. Also lots and lots of money, directly and indirectly.
Drat, I forgot a key unnecessary part:
MS bought a company that made a product that did virtualization via binary translation: VPC. I don't know why they do not enable that for cases where VT hardware support is lacking.
Another unnecessary thing is all the current and recent Intel parts that have come out with VT disabled solely so that a higher-priced part, identical in every other way, can exist.
Yet another unnecessary thing is how OEMs build systems with VT-capable chipsets and processors yet disable VT for product differentiation, which only adds confusion (though in some cases it is disabled because of buggy SMM BIOS code, for supportability). If Intel were not doing the same thing, it could use its leverage on the OEMs to stop that practice.
I think it was more the tax benefits for small business owners buying a vehicle over a certain GVWR that led to the explosion of cheap-to-build, easy-to-mark-up 'luxury' SUVs, whose sales tanked when oil speculators drove the price of crude to unprecedented levels last summer.
And why are GM and Chrysler in trouble because of this? Because at the same time, the whole mortgage and HELOC scam perpetrated by the likes of Countrywide and AIG made the credit market dry up for GM and Chrysler.
Why is Ford okay? They mortgaged everything right before the crisis, so they have cash on hand, plus they have sold off a brand or two for more quick cash. They are also in a decent position to sell their stake in Volvo if they need to.
What is wrong with your argument is that GM, Ford, and Chrysler did sell small cars (actually Chrysler screwed that up when it stopped selling the Neon at the worst possible time), but they never wanted to cannibalize sales of their larger, higher-margin cars, so they always made their small cars feel as barely passable as everyone dreads a small car can be. Cars like the Civic and Golf/Jetta were in a different class. Oh, and a lot of the small cars Detroit sold here, other than the Neon (which was killed so a higher-margin mini-SUV could be sold instead, the sort of example you cite), were made with cheap non-union Mexican or South Korean labor.
Here is the hard part:
What if your chipset does not support it?
What if your BIOS or EFI disables it?
What if your SMM code is buggy and borks it?
It is HARD to figure out whether it will work on your particular computer unless you find a forum post where someone with the same computer tried it and reported back.
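The first of those questions is the only one a program can even partly answer on its own. A minimal sketch, assuming GCC or Clang on x86 (`__get_cpuid()` comes from their `<cpuid.h>` header; the function name `cpu_advertises_vmx` is made up): it reports whether the CPU advertises VT-x through CPUID leaf 1, ECX bit 5. It cannot tell whether the BIOS/EFI has locked VT-x off, because that setting lives in the IA32_FEATURE_CONTROL MSR, which ordinary user programs cannot read.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang CPUID helpers */
#endif

/* Returns 1 if the CPU advertises Intel VT-x (CPUID.1:ECX bit 5, "VMX"),
 * 0 if it does not, and -1 on non-x86 builds or if CPUID leaf 1 is missing.
 * A 1 says nothing about whether firmware has disabled VT-x, and a
 * hypervisor may hide the bit from a guest. */
static int cpu_advertises_vmx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (ecx & (1u << 5)) ? 1 : 0;
#else
    return -1;
#endif
}
```

A "yes" here still leaves the BIOS/EFI and SMM questions open, which is why the forum post ends up being the only practical answer.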
Here is the unnecessary part:
The Virtual XP feature is only present in some editions of Win7.
How are you supposed to know whether you will need that feature before you have bought a version of Win7 and checked whether all your old programs work without it?