Yes, I know what Stacker, DoubleSpace and DriveSpace were as a technical implementation. My point is mainly that modern Windows still offers a mechanism to compress files on a live filesystem. It just lets you select at folder granularity rather than whole-disk granularity. The fact that they're not implemented at the same layer of the stack I didn't think was relevant here.
Stacker et al worked at the sector level so that you didn't need to modify DOS or the umpteen programs that made use of sector-level access to the filesystem, and insisted on that level of access in order to function. Copy protection schemes and disk editors both relied on it. (Defraggers too, although defragging a compressed volume is... crazy.) Databases may have as well; I'm not certain. Windows NT forces programs through a narrower, more controlled window of APIs to access the file system.
I actually had a Stacker'd hard drive back in the day and had read up on all the tech, so it's not like I'm unfamiliar with it.
Ok, I just checked in my WinXP box: You can right click on a folder, go to "Properties". Click "Advanced", and there's an option to "Compress to save disk space." I'm too lazy to go get my Win7 laptop to see if that's still there.
So, some version of TroubleSpace...err...DoubleSpace...err...DriveSpace survived beyond Win98.
If you know something about the drive's sector migration policies, in theory you could construct a worst-case amplification attack against a given drive. Leverage that against the drive's wear leveling policies. But, that seems rather unlikely.
Flash pages retain their data until they're erased. You can write at the byte level, but you must erase at the full page level. You can't rewrite a byte until you erase the page that contains it. That's the heart of the attack: Rewriting sectors with new data. You can't rewrite a sector in-place. You mark the old location as "dirty but free", and write the new data to a new location. The SSD can't reclaim the dirty-but-free sectors for writing until they're erased.
Thus, the basic idea goes something like this: Fill the disk to 99.9% full. Then, selectively rewrite individual sectors, forcing the sector to migrate to a new flash page. Wash, rinse, repeat until the drive fails.
If the drive only performs dynamic wear leveling, all subsequent rewrites will erase and reuse only among the free space. (Note: This free space includes all of the space the drive reserves to itself for dynamic wear leveling purposes.) Now all you need to do is reach the erase/rewrite limit among the available dynamic wear leveling pool, which is significantly smaller than the full drive capacity. You can achieve this by rewriting a small subset of sectors until the disk falls over.
Modern drives perform a blend of dynamic and static wear leveling. Dynamic wear leveling only erases/rewrites among the "free" space. Static wear leveling gets otherwise untouched sectors into the fray by wear leveling over all sectors. This blended approach defers static wear leveling until it becomes absolutely necessary. The flash translation layer (FTL) detects when the wear difference between sectors gets too imbalanced, and migrates static sectors into the worn regions and wear-levels over the previously "static" sectors.
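To make the "too imbalanced" check concrete, here is a minimal sketch of how an FTL might decide when to pull static sectors into the rotation. The function names, the per-block erase counters, and the `max_gap` threshold are all hypothetical; real firmware will differ and likely uses something more elaborate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical trigger for static wear leveling: true once the gap between the
// most-erased and least-erased block exceeds max_gap (an assumed tuning knob).
bool needs_static_leveling(const std::vector<std::uint32_t>& erase_counts,
                           std::uint32_t max_gap) {
    if (erase_counts.empty()) return false;
    const auto range = std::minmax_element(erase_counts.begin(), erase_counts.end());
    return *range.second - *range.first > max_gap;
}

// Index of the least-erased block. It most likely holds static data, so it is
// the candidate whose contents get moved to let the block join the rotation.
std::size_t least_worn_block(const std::vector<std::uint32_t>& erase_counts) {
    assert(!erase_counts.empty());
    return static_cast<std::size_t>(
        std::min_element(erase_counts.begin(), erase_counts.end()) - erase_counts.begin());
}
```

With counts of {10, 100, 11} and a gap limit of 5, the check fires and block 0 is the one to recycle.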
A successful attack would take this into account and attempt to keep track of which sectors would be marked "static" vs. "dynamic". It would also predict how the static sectors were grouped together into pages, so it could cherry-pick and inflict the maximum damage: All it needs to do is write to a single sector in each static flash page (creating a bunch of unallocated "dirty-but-free" holes), continuing until the SSD was forced into a garbage collection cycle. That GC cycle then would have to touch all the static pages (or at least a significant fraction) to compact the holes away and make space available for future writes.
If you can keep that up, you can magnify your writes by the ratio between the page size and the sector size. If you have 512-byte sectors and 512K-byte pages, the amplification factor is 1024.
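The arithmetic, spelled out as a sketch. The function name is made up for illustration, and it assumes the idealized case where every small host write ends up forcing one whole erase unit to be compacted.

```cpp
#include <cstdint>

// Worst-case write amplification under the idealized assumption that each
// host write of one sector forces a full erase unit to be rewritten.
// Both sizes are in bytes.
constexpr std::uint64_t worst_case_amplification(std::uint64_t erase_unit_bytes,
                                                 std::uint64_t sector_bytes) {
    return erase_unit_bytes / sector_bytes;
}

static_assert(worst_case_amplification(512 * 1024, 512) == 1024,
              "512-byte sectors in a 512K-byte erase unit");
static_assert(worst_case_amplification(512 * 1024, 4096) == 128,
              "4K-byte sectors in a 512K-byte erase unit");
```

With 4K sectors the same erase unit gives a factor of 128 rather than 1024.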
But, as I suggested above, to achieve this directly, you need to have some idea of how the SSD marks things static vs. dynamic. Without such knowledge, you have to approximate.
I imagine if you really wanted to kill an SSD without any knowledge of its algorithms, you could do something simple like rewrite every allocated sector in an arbitrary order, shuffling the order each time. SSD algorithms assume a distribution of "hotness" (i.e., some sectors are "hot" and will be rewritten regularly, and most are "cold" and will be rewritten rarely if ever), and so rewriting all sectors in a random order will cause rather persistent fragmentation, recurring GC cycles, and pretty noticeable amplification.
You wouldn't get the drive's life down to the 40 day mark, but if you started with a mostly full SSD, you might get it down to a few months.
That's my back-of-the-napkin, "I wrote an FTL once and had to reason through all this" estimate.
Yeah, the fact that it takes a UnaryPredicate does pretty much mean you need to use a lambda or a functor. I also meant my comment half ironically. It is a step forward, but you still have shit on your shoe.
I actually use C++ for embedded programming, because when used with care, it can actually do a better job than C for a number of things. I use template meta-programming to compute various things at compile time, such as, say, register initialization values and what not. Sure, I can do the same with #define and a boatload of macros, but that has its own issues. Not only are macros messy in their own way, they don't provide a good way to sanity check your settings. With templates and types done right, I can actually get the compiler to sanity check my settings at compile time. I don't know how many times I've chased down a bug due to swapped macro parameters that could have been caught at compile time with some type checking / trait checking.
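A small sketch of the kind of thing I mean. The register layout, the names `UartCtrl`, `Parity`, and `kUartInit`, and the field positions are invented for illustration and do not describe any real device or my actual library.

```cpp
#include <cstdint>

// Hypothetical UART control register (not a real device):
//   bits 0..15  clock divisor, bits 16..17 parity mode, bit 31 enable.
enum class Parity : std::uint32_t { None = 0, Even = 1, Odd = 2 };

template <std::uint32_t DivisorValue, Parity P>
struct UartCtrl {
    // Range check happens at compile time, not in simulation.
    static_assert(DivisorValue <= 0xFFFFu, "divisor does not fit in 16 bits");
    static constexpr std::uint32_t value =
        0x80000000u | (static_cast<std::uint32_t>(P) << 16) | DivisorValue;
};

// The whole initialization value is a compile-time constant.
constexpr std::uint32_t kUartInit = UartCtrl<26, Parity::Even>::value;
static_assert(kUartInit == 0x8001001Au, "enable | even parity | divisor 26");

// UartCtrl<Parity::Even, 26> does not compile: a scoped enum will not convert
// to an integer and 26 will not convert to Parity, so swapped arguments are
// rejected instead of silently producing a bad register value.
```

The equivalent macro would happily accept the arguments in either order.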
I've written an entire C++ based support library just for this purpose. One of its goals is extreme compactness and cycle efficiency, since the code often needs to run in RTL simulation. Software RTL simulation of a large SoC runs in the 10s to 1000s of cycles per second, so cycle efficiency is at an extreme premium.
What my library largely replaces is other C and assembly code that (often hamfistedly) computes everything at run time, and so my code can handily beat that.
I haven't quite hit the nirvana of generating an entire MMU page tree from a compact memory map description using templates (I have a perl script for that), but it sure beats 100,000s of cycles or more computing it at run time when that translates to hours of sim time. (Fun fact: Some rather popular modern processors run really slow until you turn the MMU on, because they can't cache any data until you do.)
I have however written dynamic code generators that use templates and function overloading to resolve as much of the opcode encoding as possible at compile time, so that the run-time portion usually is just a "store constant" or maybe a quick field insert into a constant followed by a store. Those can pump opcodes to memory as fast as an opcode per cycle (and in some special cases, faster), which is pretty darn good. Again, all typechecked as much as possible at compile time, to minimize or eliminate the possibility I generate invalid instructions.
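Roughly, the shape of such a generator looks like the sketch below. The instruction encoding, the operand types `Reg` and `Imm8`, and the `emit` helper are hypothetical, chosen only to show overloads resolving the encoding at compile time. This is not my actual library or any real ISA.

```cpp
#include <cstdint>

struct Reg  { unsigned n; };        // register operand, assumed 0..31
struct Imm8 { std::uint8_t v; };    // 8-bit immediate operand

// Hypothetical 32-bit encoding: [31:24] opcode, [23:19] dst, [18:14] src1,
// then either [13:9] src2 (register form) or [7:0] immediate.
constexpr std::uint32_t add(Reg d, Reg a, Reg b) {
    return 0x01000000u | (d.n << 19) | (a.n << 14) | (b.n << 9);
}
constexpr std::uint32_t add(Reg d, Reg a, Imm8 i) {
    return 0x02000000u | (d.n << 19) | (a.n << 14) | i.v;
}

// Run-time part: store one already-encoded opcode and advance the pointer.
inline std::uint32_t* emit(std::uint32_t* out, std::uint32_t opcode) {
    *out = opcode;
    return out + 1;
}

// Overload resolution picks the encoding; with constant operands the whole
// opcode folds to a constant, so emit() is just a store.
static_assert(add(Reg{1}, Reg{2}, Reg{3}) == 0x01088600u, "register form");
static_assert(add(Reg{1}, Reg{2}, Imm8{5}) == 0x02088005u, "immediate form");
```

Passing an operand type that has no matching overload fails to compile, which is where the "can't generate invalid instructions" property comes from.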
Suppose you want to determine if a collection c contains an element e. In any other language, you'd write something like c.contains(e).
I have good news for you! Sure, you still need to provide begin() and end() to specify a range, but it's a step forward. And, with the new non-member begin() and end() you can even use it on plain arrays.
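For the record, here is one way to wrap the pieces up once and be done with it. The `contains` helper is my own illustration, not something the standard library provided at the time; it leans on std::find plus the non-member std::begin and std::end.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// True if container or plain array `c` holds an element equal to `e`.
template <typename Container, typename Element>
bool contains(const Container& c, const Element& e) {
    using std::begin;   // non-member begin/end also cover plain arrays
    using std::end;
    return std::find(begin(c), end(c), e) != end(c);
}

// Usage on a plain array and on a standard container.
inline void contains_demo() {
    const int primes[] = {2, 3, 5, 7};
    const std::vector<int> evens{2, 4, 6};
    assert(contains(primes, 5));
    assert(!contains(evens, 5));
}
```

It is a linear search, so for map or set you would still want the member find().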
Yeah, you still have to put all the pieces together yourself, but the pieces are a bit more uniform now and there's usually fewer to worry about. (Especially now with auto.)
Before auto it seemed like C++'s error messages were downright passive aggressive: "If you don't know what to put here, I'm not going to tell you." At least, it's not going to tell me the concise thing to put there. It will tell me the completely flattened type, which can be quite huge if you're trying to, say, get an iterator to a nested STL container holding a template class composed against some other classes (that themselves might be templated) à la the Policy pattern.
I just wish I didn't have to code for the lowest common denominator compiler at work, so I can be sure I can use auto with impunity. :-)
I unfortunately claim ignorance on the license for the runtime. I know some of my employer's products use Dinkumware for the C++ library, but I'm not sure what this processor uses. (TMS320C6600 family, if you're curious.) I'm usually at the other end of the pipeline, using the pre-alpha tools before the silicon exists, so I'm pretty far removed from the customer toolchain distribution end of things. Sorry I can't be more helpful on that detail. I can tell you all about VLIW instruction scheduling and cache memory system pipeline behavior though!
You do have to be sure to compile with full optimization enabled, though, for STL to have a minimal hit. I use STL quite happily to do things I only too eagerly rolled my own implementation of years ago, and then clung to, even if it wasn't a perfect fit. For example, for eons I carried around this AVL tree implementation I wrote for a data structures class, and used it to implement associative containers, just so I wouldn't have to do it again. These days, it's simply map< yadda, yadda > and I'm on my way. I'm willing to bet map<> beats that creaky old AVL tree any day.
Without optimization, the STL containers can slow down quite a bit. I've heard the effect is especially large on some versions of MSVC++, since they have special debugging versions of the iterators that incur their own cost penalties in return for other checks. I wouldn't know; I do all my development under Linux or for embedded processors on bare metal.
With optimization on, I rarely if ever notice a performance issue due to STL. I do run into the occasional limitation, such as needing an actual resizeable 2-D array-like structure. (A vector< vector<... > > doesn't cut it, because resizing the inner dimension doesn't resize all rows.) But, that's more exception than rule.
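What I usually end up wanting is something like the sketch below: one flat vector with a resize that changes every row at once. The `Grid2D` name and interface are made up for illustration; it assumes T is default-constructible and copyable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2-D array over one flat vector.
template <typename T>
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    T& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Resize both dimensions together; elements in the overlap keep their
    // (row, column) position and new cells are value-initialized.
    void resize(std::size_t rows, std::size_t cols) {
        std::vector<T> next(rows * cols);
        const std::size_t keep_rows = std::min(rows, rows_);
        const std::size_t keep_cols = std::min(cols, cols_);
        for (std::size_t r = 0; r < keep_rows; ++r)
            for (std::size_t c = 0; c < keep_cols; ++c)
                next[r * cols + c] = data_[r * cols_ + c];
        data_.swap(next);
        rows_ = rows;
        cols_ = cols;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

The resize copies everything, so it is not cheap, but every row always has the same length.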
My biggest complaint about C++11 is that I won't realistically be able to use it for another few years. Grrr.
Just for fun, I tried the same experiment on one of our DSPs, and it pulled in just over 64K. I think our library is generally leaner in the locale department. In fact, I didn't see any locale data linked in. Most of what it pulled in looks to be actual ios/istream/ostream stuff, basic_string<char> and basic_string<wchar_t>.
How about when parents go phone shopping for their kids? It'll at least have a chance of affecting teen drivers, who already are among the worst drivers and heaviest texters.
Yes, imagine a world where the laws of thermodynamics don't apply.
The peak theoretical efficiency of an internal combustion engine is bounded by the efficiency of an equivalent ideal Carnot cycle, which, if I remember my ME301 Thermo class, is a bit below 40%. Wikipedia backs me up on this, quoting a limit of 37% for a steel engine block. That jibes with what I remember learning in Thermo.
To get 80% efficiency out of gasoline would require a different method of releasing its energy than an internal combustion engine.
You are correct. I got my wires crossed. I actually have a 7805 replacement here in my "lab" that is an actual switching regulator. And for some reason I had mentally bucketed it with LDOs, which as you noted, are just low-dropout varieties of linear regulators. And yes, switching regulators like these are a little pricier, although I believe with the RECOM R-78xx series you're just paying for the convenience of swapping out a 7805 space heater without touching the rest of your circuit. :-P
500W dissipated in the volume of a soda can is quite a lot. My heat-shrink gun that claims to go up to 500C is rated at 1200W, and that's the air that's leaving it, not the coils inside. Its barrel is about the size of two soda cans.
Because, when it comes to car commercials, ad agencies are bound by so many rules and regulations regarding depictions of reckless driving and such things that it becomes almost impossible to create a cool car commercial without running the risk of going to court over it (both the ad agency AND car manufacturer).
Ok, the statement I made in my third sentence above is imprecise to the point of being inaccurate. The exact property, as described by Wikipedia:
The original proof shows that for overlapping reads and writes to the same storage cell only the write must be correct. The read operation can return an arbitrary number. Therefore this algorithm can be used to implement mutual exclusion on memory that lacks synchronisation primitives
So the part about not needing "properly arbitrated memory access" is mostly true—a read that collides with a write to the same location can return garbage. Writes still must update memory properly, and presumably must be sequentially consistent.
My first encounter with Leslie's work was Lamport's Bakery. It's a serialization primitive with some surprising properties. For example, it doesn't require properly arbitrated access to memory as the initial value read from memory on entrance to the "bakery" actually doesn't matter!
Dr. Lamport was actually kind enough to reply to an email of mine regarding said primitive. I was optimizing a version of it for a multiprocessor device we were making where I work, and I had come upon what I thought was a clever optimization. (I actually vectorized a portion of the algorithm by way of the "unroll and jam" transformation, so I could test the state of multiple processors in parallel, rather than in serial order as described in the algorithm.) He actually took the time to respond to my email, and was quite gracious. His reply:
In the Bakery Algorithm, process i must wait until a certain condition holds for each other process. The order in which it checks for the different other processes does not matter. So, the algorithm can be parallelized in the manner you suggest.
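For anyone who hasn't seen it, here is a minimal sketch of the textbook Bakery lock, written in the plain serial-check form. It is not the vectorized variant from my email. The `BakeryLock` class and its member names are my own illustration. Note that it uses std::atomic with the default sequentially consistent ordering, which is an assumption on my part and stronger than the algorithm strictly requires, and that the ticket numbers grow without bound.

```cpp
#include <atomic>
#include <cassert>

// Lamport's Bakery lock for N participants, each with a fixed index 0..N-1.
template <int N>
class BakeryLock {
public:
    BakeryLock() {
        for (int i = 0; i < N; ++i) {
            choosing_[i].store(false);
            number_[i].store(0);
        }
    }

    void lock(int i) {
        assert(0 <= i && i < N);
        // Take a ticket one higher than any ticket currently visible.
        choosing_[i].store(true);
        unsigned long highest = 0;
        for (int j = 0; j < N; ++j) {
            const unsigned long n = number_[j].load();
            if (n > highest) highest = n;
        }
        const unsigned long mine = highest + 1;
        number_[i].store(mine);
        choosing_[i].store(false);

        // Wait on every other participant. The order of this loop does not
        // affect correctness, which is what makes checking them in parallel legal.
        for (int j = 0; j < N; ++j) {
            if (j == i) continue;
            while (choosing_[j].load()) { /* j is still picking a ticket */ }
            for (;;) {
                const unsigned long theirs = number_[j].load();
                const bool they_go_first =
                    theirs != 0 && (theirs < mine || (theirs == mine && j < i));
                if (!they_go_first) break;
            }
        }
    }

    void unlock(int i) { number_[i].store(0); }

    unsigned long ticket(int i) const { return number_[i].load(); }

private:
    std::atomic<bool> choosing_[N];
    std::atomic<unsigned long> number_[N];
};
```

Each thread calls lock() and unlock() with its own index; ties on ticket number are broken by the lower index.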
The only time I was more thrilled on a topic like this was when Dr. Knuth replied to mail I sent him regarding a particular algorithm in Volume 4 of TAOCP. I actually received a handwritten reply. Well, he hand wrote notes on a printed copy of the email I had sent to his TAOCP feedback address. Dr. Knuth also encouraged me to let all my friends know how much I like TAOCP. So, consider yourself informed: I think Knuth's The Art of Computer Programming series is worth its weight in gold, and if you consider yourself a computer scientist or computer engineer, you should consider getting yourself a copy, and investing the time to at least skim it. (Let's face it, to truly understand everything in there would require as much time as Don put into writing it.)
A 4TB drive would bother with compression why exactly? It won't improve any benchmarks. Someone like me trying to break the media would just write uncompressible garbage anyway.
A large overprovisioning pool does help stay in the dynamic wear leveling paradigm longer. If the drive performs any amount of static wear leveling, though, where it can get the rest of the sectors into the fun and there's a limited aging ratio / aging difference between the most-erased sector and least-erased sector, then the size of the overprovisioning space doesn't matter too much for this attack. Rather, the total media size matters.
The point of this attack is to limit the ability of the SSD to separate disk sectors into hot and cold effectively, so you can force the maximum number of erasures through garbage collection cycles. As long as the overprovisioned space is less than the ratio of exposed sector size (512 or 4K bytes) to erase sector size (likely something huge like 512K bytes), you can force a lot of erasures just for GC compaction. The drive will still spread these erasures out as much as possible. With the blended dynamic/static wear leveling model, you'll end up aging all of the sectors roughly evenly.
Once you get one sector to fail and get taken out of service, you're close to making as many more fail as you like, since the SSD did everything it could to age sectors evenly. The additional delta work you need to get it to fail completely isn't very big.
In case my GC point wasn't clear: Consider a simple SSD with a capacity of 32 "blocks", of which it advertises a capacity of 24 to the user, leaving 8 for overprovisioning. Suppose that filesystem blocks get grouped into erase blocks with a 4:1 ratio. Now suppose I've filled that disk up to capacity with a linear series of writes. You might represent it like so, with letters representing FS blocks with data, dashes representing empty clean FS blocks that the FTL can write to, and the groups of 4 separated by dots representing the erase blocks:
abcd.efgh.ijkl.mnop.qrst.uvwx.----.----
Now suppose I write to A, E, I, M. I need to migrate these FS blocks to new locations to absorb the writes. Their old locations are freed, but remain dirty (cannot be rewritten). After these 4 writes, my flash might look like this: (The '#' indicate free-but-dirty FS blocks.)
#bcd.#fgh.#jkl.#nop.qrst.uvwx.aeim.----
Before this SSD can execute my next rewrite, the flash needs to perform a GC cycle to ensure it always has a place to migrate data for GC. So, before processing another rewrite, it performs a GC cycle, migrating blocks until it has at least two clean erase sectors. For the sake of argument, let's assume it employs a simple round robin scheme to spread the GC over the media evenly:
##cd.#fgh.#jkl.#nop.qrst.uvwx.aeim.b--- Migrate 'b'.
###d.#fgh.#jkl.#nop.qrst.uvwx.aeim.bc-- Migrate 'c'.
####.#fgh.#jkl.#nop.qrst.uvwx.aeim.bcd- Migrate 'd'.
----.#fgh.#jkl.#nop.qrst.uvwx.aeim.bcd- Erase first sector.
gh--.####.#jkl.#nop.qrst.uvwx.aeim.bcdf Migrate 'f', 'g', and 'h'.
gh--.----.#jkl.#nop.qrst.uvwx.aeim.bcdf Erase second sector.
ghjk.l---.####.#nop.qrst.uvwx.aeim.bcdf Migrate 'j', 'k', 'l'.
ghjk.l---.----.#nop.qrst.uvwx.aeim.bcdf Erase third sector.
ghjk.lnop.----.####.qrst.uvwx.aeim.bcdf Migrate 'n', 'o', 'p'.
ghjk.lnop.----.----.qrst.uvwx.aeim.bcdf Erase fourth sector.
Now I continue with my dickish attack, and rewrite Q, U, A, and B:
ghjk.lnop.quab.----.#rst.#vwx.#eim.#cdf
Oh, hey, that'll trigger another GC cycle to free up a sector:
ghjk.lnop.quab.rst-.####.#vwx.#eim.#cdf Migrate 'r', 's', 't'.
ghjk.lnop.quab.rst-.----.#vwx.#eim.#cdf Erase fifth sector.
ghjk.lnop.quab.rstv.wx--.####.#eim.#cdf Migrate 'v', 'w', 'x'.
ghjk.lnop.quab.rstv.wx--.----.#eim.#cdf Erase sixth sector.
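If you'd rather poke at this than trace it by hand, the toy drive above can be mimicked with a small simulation. This is a sketch under stated assumptions, not a model of any real FTL: the `ToyFtl` type, the first-clean-slot placement, and the greedy "most dirty slots" victim choice (instead of the round robin in my trace) are all simplifications I picked for brevity. It keeps one erase block's worth of clean slots in reserve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyFtl {
    enum : int { kClean = -1, kDirty = -2 };  // other values: logical block id

    int per_block;
    std::vector<int> slot;         // physical slots, per_block per erase block
    std::vector<int> where;        // logical block -> slot index, -1 if unwritten
    std::vector<int> erase_count;  // erasures per erase block
    long host_writes = 0;
    long migrations = 0;

    ToyFtl(int blocks, int per_block_, int logical)
        : per_block(per_block_),
          slot(static_cast<std::size_t>(blocks * per_block_), static_cast<int>(kClean)),
          where(static_cast<std::size_t>(logical), -1),
          erase_count(static_cast<std::size_t>(blocks), 0) {
        assert(logical <= (blocks - 1) * per_block_);  // need overprovisioning
    }

    int blocks() const { return static_cast<int>(erase_count.size()); }

    int count(int b, int state) const {  // slots in block b in the given state
        int n = 0;
        for (int i = 0; i < per_block; ++i) n += (slot[b * per_block + i] == state);
        return n;
    }
    int clean_total() const {
        int n = 0;
        for (int b = 0; b < blocks(); ++b) n += count(b, kClean);
        return n;
    }
    // Store logical block id in the first clean slot outside erase block skip.
    void place(int id, int skip) {
        for (int s = 0; s < static_cast<int>(slot.size()); ++s) {
            if (s / per_block != skip && slot[s] == kClean) {
                slot[s] = id;
                where[id] = s;
                return;
            }
        }
        assert(false && "no clean slot available");
    }
    // Host write: the old copy becomes dirty, the new copy lands elsewhere.
    void write(int id) {
        if (where[id] >= 0) slot[where[id]] = kDirty;
        place(id, -1);
        ++host_writes;
        collect();
    }
    // Greedy GC: compact and erase until per_block clean slots are in reserve.
    void collect() {
        while (clean_total() < per_block) {
            int victim = 0;
            for (int b = 1; b < blocks(); ++b)
                if (count(b, kDirty) > count(victim, kDirty)) victim = b;
            for (int i = 0; i < per_block; ++i) {
                const int s = victim * per_block + i;
                if (slot[s] >= 0) { place(slot[s], victim); ++migrations; }
                slot[s] = kClean;
            }
            ++erase_count[victim];
        }
    }
};

// Fill the 8x4 toy drive with 24 logical blocks, then rewrite with a stride.
ToyFtl run_pattern(int stride, int rewrites) {
    ToyFtl ftl(8, 4, 24);
    for (int id = 0; id < 24; ++id) ftl.write(id);
    for (int i = 0; i < rewrites; ++i) ftl.write((i * stride) % 24);
    return ftl;
}
```

Comparing `migrations` and the per-block `erase_count` for different strides gives a feel for how much the write pattern alone changes the amount of GC work.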
The point of write amplification is solely to get more sectors into the erasure party. A single small write forces the SSD to migrate full sector A to empty sector B.
If you could send direct physical sector erasure and physical write commands to the media, you could just tell it to erase each sector and rewrite its first byte repeatedly until that sector failed, and then march to the next sector.
But, you don't have the opportunity to do that. Instead, you must interact at the filesystem level, and there's an FTL between the file system and the media. So, if your goal is to ruin the media quickly with those two layers between you, you want to minimize the FTL's ability to filter writes out and reduce the required number of erasures.
I'm aware that erase times for sectors are huge, and will slow your I/O rate accordingly. You're right that write amplification doesn't necessarily shorten the calendar days to failure, as a different write pattern may have triggered the same number of erasures in the time frame with a larger number of writes. But, writes aren't free either, so there's at least some, err, benefit to minimizing the number of writes required to ruin your SSD, if your goal is to ruin your SSD as quickly as possible.
If you want to write once and forget it, you can fill the thing right to the top of the advertised capacity, and you don't have to worry about failure due to wear. Instead, you have to worry about failure due to electrons migrating off the bits. So, you need to refresh all the bits every so often, much like DRAM, only with a much slower refresh interval. Even if you refresh all the bits on the drive once a day, if you do so in a nice, orderly manner, I'd imagine you won't reach the rewrite limit for the drive in your lifetime.
Still, I'm not sure I'd choose an SSD for that.
Yes, I know what Stacker, DoubleSpace and DriveSpace were as a technical implementation. My point is mainly that modern Windows still offers a mechanism to compress files on a live filesystem. It just lets you select at folder granularity rather than whole-disk granularity. The fact that they're not implemented at the same layer of the stack I didn't think was relevant here.
Stacker et al worked at the sector level so that you didn't need to modify DOS or the umpteen programs that made use of sector-level access to the filesystem, and insisted on that level of access in order to function. Copy protection schemes and disk editors both relied on it. (Defraggers too, although defragging a compressed volume is... crazy.) Databases may have as well; I'm not certain. Windows NT forces programs through a narrower, more controlled window of APIs to access the file system.
I actually had a Stacker'd hard drive back in the day and had read up on all the tech, so it's not like I'm unfamiliar with it.
Ok, I just checked in my WinXP box: You can right click on a folder, go to "Properties". Click "Advanced", and there's an option to "Compress to save disk space." I'm too lazy to go get my Win7 laptop to see if that's still there.
So, some version of TroubleSpace...err...DoubleSpace...err...DriveSpace survived beyond Win98.
If you know something about the drive's sector migration policies, in theory you could construct a worst-case amplification attack against a given drive. Leverage that against the drive's wear leveling policies. But, that seems rather unlikely.
Flash pages retain their data until they're erased. You can write at the byte level, but you must erase at the full page level. You can't rewrite a byte until you erase the page that contains it. That's the heart of the attack: Rewriting sectors with new data. You can't rewrite a sector in-place. You mark the old location as "dirty but free", and write the new data to a new location. The SSD can't reclaim the dirty-but-free sectors for writing until they're erased.
Thus, the basic idea goes something like this: Fill the disk to 99.9% full. Then, selectively rewrite individual sectors, forcing the sector to migrate to a new flash page. Wash, rinse, repeat until the drive fails.
If the drive only performs dynamic wear leveling, all subsequent rewrites will erase and reuse only among the free space. (Note: This free space includes all of the space the drive reserves to itself for dynamic wear leveling purposes.) Now all you need to do is reach the erase/rewrite limit among the available dynamic wear leveling pool, which is significantly smaller than the full drive capacity. You can achieve this by rewriting a small subset of sectors until the disk falls over.
Modern drives perform a blend of dynamic and static wear leveling. Dynamic wear leveling only erases/rewrites among the "free" space. Static wear leveling gets otherwise untouched sectors into the fray by wear leveling over all sectors. This blended approach defers static wear leveling until it becomes absolutely necessary. The flash translation layer (FTL) detects when the wear difference between sectors gets too imbalanced, and migrates static sectors into the worn regions and wear-levels over the previously "static" sectors.
A successful attack would take this into account and attempt to keep track of which sectors would be marked "static" vs. "dynamic". It would also predict how the static sectors were grouped together into pages, so it could cherry-pick and inflict the maximum damage: All it needs to do is write to a single sector in each static flash page (creating a bunch of unallocated "dirty-but-free" holes), continuing until the SSD was forced into a garbage collection cycle. That GC cycle then would have to touch all the static pages (or at least a significant fraction) to compact the holes away and make space available for future writes.
If you can keep that up, you can magnify your writes by the ratio between the page size and the sector size. If you have 512 byte sectors and 512K bytes pages, the amplification factor is 1024.
But, as I suggested above, to achieve this directly, you need to have some idea of how the SSD marks things static vs. dynamic. Without such knowledge, you have to approximate.
I imagine if you really wanted to kill an SSD without any knowledge of its algorithms, you could do something simple like rewrite every allocated sector in an arbitrary order, shuffling the order each time. SSD algorithms assume a distribution of "hotness" (ie. some sectors are "hot" and will be rewritten regularly, and most are "cold" and will be rewritten rarely if ever), and so rewriting all sectors in a random order will cause rather persistent fragmentation, recurring GC cycles, and pretty noticeable amplification.
You wouldn't get to the 40 day mark, but if you started with a mostly full SSD, you might get to a few months.
That's my back-of-the-napkin, "I wrote an FTL once and had to reason through all this" estimate.
Yeah, the fact that it takes a UnaryPredicate does pretty much mean you need to use a lambda or a functor. I also meant my comment half ironically. It is a step forward, but you still have shit on your shoe.
I actually use C++ for embedded programming, because when used with care, it can actually do a better job than C for a number of things. I use template meta-programming to compute various things at compile time, such as, say, register initialization values and what not. Sure, I can do the same with #define and a boat load of macros, but that has its own issues. Not only are macros messy in their own way, they don't provide a good way to sanity check your settings. With templates and types done right, I can actually get the compiler to sanity check my settings at compile time. I don't know how many times I've chased down a bug due to swapped macro parameters that could have been caught compile-time with some type checking / trait checking.
I've written an entire C++ based support library just for this purpose. One of its goals is extreme compactness and cycle efficiency, since the code often needs to run in RTL simulation. Software RTL simulation of a large SoC runs in the 10s to 1000s of cycles per second, so cycle efficiency is at an extreme premium.
What my library largely replaces is other C and assembly code that (often hamfistedly) computes everything at run time, and so my code can handily beat that.
I haven't quite hit the nirvana of generating an entire MMU page tree from a compact memory map description using templates (I have a perl script for that), but it sure beats 100,000s cycles or more computing it at run time when that translates to hours of sim time. (Fun fact: Some rather popular modern processors run really slow until you turn the MMU on, because they can't cache any data until you do.)
I have, however, written dynamic code generators that use templates and function overloading to resolve as much of the opcode encoding as possible at compile time, so that the run-time portion usually is just a "store constant" or maybe a quick field insert into a constant followed by a store. Those can pump opcodes to memory as fast as an opcode per cycle (and in some special cases, faster), which is pretty darn good. Again, it's all type-checked as much as possible at compile time, to minimize or eliminate the possibility that I generate invalid instructions.
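A rough sketch of that idea follows, for a made-up toy ISA rather than any real instruction set or the generators described above. The encoding (opcode in bits 31:24, then dst, src1, and src2 or an 8-bit immediate) and the names Reg, Imm8 and Emitter are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Register operands are types, so the register number is a compile-time value
// and an out-of-range register is a compile error.
template <unsigned N>
struct Reg {
    static_assert(N < 16, "toy ISA has only 16 registers");
};

// Immediates are only known at run time.
struct Imm8 {
    std::uint8_t value;
};

class Emitter {
public:
    explicit Emitter(std::uint32_t* out) : out_(out) {}

    // Register-register add: every field is known at compile time, so the
    // run-time work is a single store of a constant.
    template <unsigned D, unsigned A, unsigned B>
    void add(Reg<D>, Reg<A>, Reg<B>) {
        constexpr std::uint32_t word =
            (0x01u << 24) | (D << 16) | (A << 8) | B;
        *out_++ = word;
    }

    // Register-immediate add: overload resolution picks this form, and the
    // run-time work is one field insert into a constant, then a store.
    template <unsigned D, unsigned A>
    void add(Reg<D>, Reg<A>, Imm8 imm) {
        constexpr std::uint32_t base = (0x02u << 24) | (D << 16) | (A << 8);
        *out_++ = base | imm.value;
    }

    std::uint32_t* cursor() const { return out_; }

private:
    std::uint32_t* out_;  // next location to write an opcode to
};
```

Usage looks like `e.add(Reg<1>{}, Reg<2>{}, Reg<3>{});` versus `e.add(Reg<4>{}, Reg<5>{}, Imm8{0x7F});`. The compiler chooses the encoding from the argument types, and there is simply no overload for a nonsensical operand combination.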
I have good news for you! Sure, you still need to provide begin() and end() to specify a range, but it's a step forward. And, with the new non-member begin() and end() you can even use it on plain arrays.
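For example, assuming the algorithm in question is something like std::any_of or std::find_if (anything that takes an iterator pair plus a UnaryPredicate), here is a small C++11 sketch using the non-member begin() and end() on a plain array. The helper name any_negative is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Works on a plain C array: std::begin/std::end deduce the bounds from the
// array type, so there is no separate length argument to get wrong.
template <typename T, std::size_t N>
bool any_negative(const T (&values)[N]) {
    return std::any_of(std::begin(values), std::end(values),
                       [](const T& v) { return v < T(); });
}
```

The same call shape works unchanged if the array is later swapped for a std::vector or std::array, which is most of the appeal.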
Yeah, you still have to put all the pieces together yourself, but the pieces are a bit more uniform now and there are usually fewer to worry about. (Especially now with auto.)
Before auto it seemed like C++'s error messages were downright passive-aggressive: "If you don't know what to put here, I'm not going to tell you." At least, it's not going to tell me the concise thing to put there. It will tell me the completely flattened type, which can be quite huge if you're trying to, say, get an iterator to a nested STL container holding a template class composed against some other classes (that themselves might be templated) à la the Policy pattern.
I just wish I didn't have to code for the lowest common denominator compiler at work, so I can be sure I can use auto with impunity. :-)
I unfortunately claim ignorance on the license for the runtime. I know some of my employer's products use Dinkumware for the C++ library, but I'm not sure what this processor uses. (TMS320C6600 family, if you're curious.) I'm usually at the other end of the pipeline, using the pre-alpha tools before the silicon exists, so I'm pretty far removed from the customer toolchain distribution end of things. Sorry I can't be more helpful on that detail. I can tell you all about VLIW instruction scheduling and cache memory system pipeline behavior though!
You do have to be sure to compile with full optimization enabled, though, for STL to have a minimal hit. I use STL quite happily to do things I only too eagerly rolled my own implementation of years ago, and then clung to, even if it wasn't a perfect fit. For example, for eons I carried around this AVL tree implementation I wrote for a data structures class, and used it to implement associative containers, just so I wouldn't have to do it again. These days, it's simply map< yadda, yadda > and I'm on my way. I'm willing to bet map<> beats that creaky old AVL tree any day.
Without optimization, the STL containers can slow down quite a bit. I've heard the effect is especially large on some versions of MSVC++, since they have special debugging versions of the iterators that incur their own cost penalties in return for other checks. I wouldn't know; I do all my development under Linux or for embedded processors on bare metal.
With optimization on, I rarely if ever notice a performance issue due to STL. I do run into the occasional limitation, such as needing an actual resizable 2-D array-like structure. (A vector< vector< ... > > doesn't cut it, because resizing the inner dimension doesn't resize all rows.) But, that's more exception than rule.
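One common workaround for the 2-D case, sketched here as an assumption about what such a structure could look like rather than anything from a real library, is a thin wrapper around a single flat vector that resizes every row at once. The name Grid2D and its interface are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Row-major 2-D array backed by one std::vector, so all rows always share
// the same width.
template <typename T>
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols, const T& fill = T())
        : rows_(rows), cols_(cols), cells_(rows * cols, fill) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    T& at(std::size_t r, std::size_t c) {
        check(r, c);
        return cells_[r * cols_ + c];
    }
    const T& at(std::size_t r, std::size_t c) const {
        check(r, c);
        return cells_[r * cols_ + c];
    }

    // Resizes both dimensions. Cells that exist in both the old and the new
    // shape keep their values; new cells are set to fill.
    void resize(std::size_t rows, std::size_t cols, const T& fill = T()) {
        std::vector<T> next(rows * cols, fill);
        const std::size_t keep_rows = std::min(rows, rows_);
        const std::size_t keep_cols = std::min(cols, cols_);
        for (std::size_t r = 0; r < keep_rows; ++r)
            for (std::size_t c = 0; c < keep_cols; ++c)
                next[r * cols + c] = cells_[r * cols_ + c];
        cells_.swap(next);
        rows_ = rows;
        cols_ = cols;
    }

private:
    void check(std::size_t r, std::size_t c) const {
        if (r >= rows_ || c >= cols_)
            throw std::out_of_range("Grid2D index out of range");
    }

    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> cells_;
};
```

The trade-off is that a resize copies the surviving cells, which is fine when resizing is rare compared to indexing.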
My biggest complaint about C++11 is that I won't realistically be able to use it for another few years. Grrr.
It seems like writing a terminal driver is a rite of passage. :-) I did one for our DSPs, and most recently, the Intellivision.
Just for fun, I tried the same experiment on one of our DSPs, and it pulled in just over 64K. I think our library is generally leaner in the locale department. In fact, I didn't see any locale data linked in. Most of what it pulled in looks to be actual ios/istream/ostream stuff, basic_string<char> and basic_string<wchar_t>.
How about when parents go phone shopping for their kids? It'll at least have a chance of affecting teen drivers, who are already among the worst drivers and heaviest texters.
Yes, imagine a world where the laws of thermodynamics don't apply.
The peak theoretical efficiency of an internal combustion engine is bounded by the efficiency of an equivalent ideal Carnot cycle, which if I remember my ME301 Thermo class, is a bit below 40%. Wikipedia backs me up on this, quoting a limit of 37% for a steel engine block. That jibes with what I remember learning in Thermo.
To get 80% efficiency out of gasoline would require a different method of releasing its energy than an internal combustion engine.
Interesting. It'd be funnier if it weren't so sad.
Does Gene Ray have a teenaged / twenty-something relative somewhere that's "into computers"? If so, I think he could be your troll. :-P
You are correct. I got my wires crossed. I actually have a 7805 replacement here in my "lab" that is an actual switching regulator. And for some reason I had mentally bucketed it with LDOs, which as you noted, are just low-dropout varieties of linear regulators. And yes, switching regulators like these are a little pricier, although I believe with the RECOM R-78xx series you're just paying for the convenience of swapping out a 7805 space heater without touching the rest of your circuit. :-P
LDOs aren't that expensive, and certainly wouldn't dissipate that much power.
500W dissipated in the volume of a soda can is quite a lot. My heat-shrink gun that claims to go up to 500°C is rated at 1200W, and that's the air that's leaving it, not the coils inside. Its barrel is about the size of two soda cans.
I see three options: Blue waterbear with spaghetti, Grey Tron waterbear, and Frumpy waterbear.
And yet, this commercial had zero driving at all!
Ok, the statement I made in my third sentence above is imprecise to the point of being inaccurate. The exact property, as described by Wikipedia:
So the part about not needing "properly arbitrated memory access" is mostly true—a read that collides with a write to the same location can return garbage. Writes still must update memory properly, and presumably must be sequentially consistent.
My first encounter with Leslie's work was Lamport's Bakery. It's a serialization primitive with some surprising properties. For example, it doesn't require properly arbitrated access to memory as the initial value read from memory on entrance to the "bakery" actually doesn't matter!
Dr. Lamport was kind enough to reply to an email of mine regarding said primitive. I was optimizing a version of it for a multiprocessor device we were making where I work, and I had come upon what I thought was a clever optimization. (I vectorized a portion of the algorithm by way of the "unroll and jam" transformation, so I could test the state of multiple processors in parallel, rather than in serial order as described in the algorithm.) He took the time to respond, and was quite gracious. His reply:
In the Bakery Algorithm, process i must wait until a certain condition holds for each other process. The order in which it checks for the different other processes does not matter. So, the algorithm can be parallelized in the manner you suggest.
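For readers who haven't seen it, here is a minimal sketch of the textbook Bakery lock. It is neither the optimized, vectorized version mentioned above nor Dr. Lamport's own code. It uses std::atomic throughout because plain racy reads are undefined behavior in C++, even though the algorithm itself tolerates garbage from a read that overlaps a write. The class name BakeryLock and the ticket() accessor are made up for illustration, and ticket overflow is ignored.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

template <std::size_t N>
class BakeryLock {
public:
    BakeryLock() {
        for (std::size_t i = 0; i < N; ++i) {
            choosing_[i].store(false);
            number_[i].store(0);
        }
    }

    // Process i takes a ticket one larger than any it can see, then waits
    // for every other process that is ahead of it.
    void lock(std::size_t i) {
        choosing_[i].store(true);
        unsigned highest = 0;
        for (std::size_t j = 0; j < N; ++j) {
            const unsigned n = number_[j].load();
            if (n > highest) highest = n;
        }
        const unsigned mine = highest + 1;
        number_[i].store(mine);
        choosing_[i].store(false);

        // Each j is checked independently of the others, so the order of
        // this loop does not matter. That is what allows the checks to be
        // done in parallel.
        for (std::size_t j = 0; j < N; ++j) {
            if (j == i) continue;
            while (choosing_[j].load()) {
                // j is still picking its ticket; wait for it to finish.
            }
            for (;;) {
                const unsigned theirs = number_[j].load();
                // Proceed once j has no ticket, or j's (ticket, id) pair
                // comes after ours.
                if (theirs == 0 || theirs > mine ||
                    (theirs == mine && j > i)) {
                    break;
                }
            }
        }
    }

    void unlock(std::size_t i) { number_[i].store(0); }

    // Current ticket of process i; 0 means "not in the bakery".
    unsigned ticket(std::size_t i) const { return number_[i].load(); }

private:
    std::array<std::atomic<bool>, N> choosing_;
    std::array<std::atomic<unsigned>, N> number_;
};
```

Each participant calls lock(i) and unlock(i) with its own fixed index i. Ties between equal tickets are broken by index, which is why the comparison is on the (ticket, id) pair rather than the ticket alone.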
The only time I was more thrilled on a topic like this was when Dr. Knuth replied to mail I sent him regarding a particular algorithm in Volume 4 of TAOCP. I actually received a handwritten reply. Well, he handwrote notes on a printed copy of the email I had sent to his TAOCP feedback address. Dr. Knuth also encouraged me to let all my friends know how much I like TAOCP. So, consider yourself informed: I think Knuth's The Art of Computer Programming series is worth its weight in gold, and if you consider yourself a computer scientist or computer engineer, you should consider getting yourself a copy, and investing the time to at least skim it. (Let's face it, to truly understand everything in there would require as much time as Don put into writing it.)