I just looked it up and 3-SAT was proved to be NPC in 1974 by Cook and Levin.
The first problem proved NPC was general CNF SATISFIABILITY, by Cook in 1971. SATISFIABILITY can be shown to polynomially reduce to (CNF) 3-SAT (proving that 3-SAT is NPC), and 3-SAT reduces to lots of other problems, which themselves reduce to even more problems (this is the typical way to prove a new problem NPC).
I somewhat doubt that 3-SAT wasn't proven NPC until 1974, because SAT->3-SAT is a pretty easy reduction. (You just introduce a bunch of dummy variables, e.g. (a or b or c or d) becomes ((a or b or z) and (~z or c or d)) and so on; the reduction is linear time in the size of the original expression.)
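For anyone who wants to see the dummy-variable trick spelled out, here is a rough sketch in Python. The representation (literals as strings, "~" for negation) and the names `split_clause`, `to_3sat` and `z1, z2, ...` are just my own choices for illustration. It only splits clauses longer than 3 literals and leaves shorter ones alone, so it is not a complete textbook reduction.

```python
from itertools import count

def split_clause(clause, fresh):
    """Split one CNF clause (a list of literals) into clauses of at most 3 literals.

    Literals are strings, with "~" marking negation. `fresh` yields unused
    dummy variable names. The output clauses are satisfiable together
    exactly when the original clause is.
    """
    clause = list(clause)
    out = []
    while len(clause) > 3:
        z = next(fresh)
        # Peel off two literals and chain the rest through the dummy variable.
        out.append([clause[0], clause[1], z])
        clause = ["~" + z] + clause[2:]
    out.append(clause)
    return out

def to_3sat(clauses):
    """Apply split_clause to every clause, sharing one supply of dummy names."""
    fresh = (f"z{i}" for i in count(1))
    result = []
    for c in clauses:
        result.extend(split_clause(c, fresh))
    return result

print(to_3sat([["a", "b", "c", "d"]]))
# [['a', 'b', 'z1'], ['~z1', 'c', 'd']]
```

Each pass through the loop removes one literal from the long clause and adds one short clause, so the total work is proportional to the number of literals in the input.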
Between corporate and private purchases, I'd bet 16EB worth of digital storage has been manufactured and sold in the past 24 months.
I quite agree. That's why I said that 16 EB is probably within a first approximation (i.e. a factor of 10) of the total number of bits. Then I said it was probably too small by a factor of maybe 3 or so.
I would not be surprised if the total amount of digital storage manufactured in the last 2 years was a third of the total manufactured ever. In fact, if you go by the estimate that the amount of bits manufactured per year increases at a rate of 60% a year, then the amount made in the last 2 years is well over half of the total manufactured ever!
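The "well over half" figure falls out of a geometric series. A small sketch, assuming production has always grown at the same rate (the 60% is the estimate quoted above; the function name `recent_share` is mine):

```python
def recent_share(growth, years):
    """Fraction of all-time output produced in the most recent `years` years,
    assuming output has always grown by `growth` per year (0.6 means 60%).
    """
    q = 1.0 / (1.0 + growth)   # each earlier year is q times the year after it
    # Geometric series: the last `years` terms divided by the infinite sum
    # simplifies to 1 - q**years.
    return 1.0 - q ** years

print(round(recent_share(0.60, 2), 3))  # 0.609
```

So under that assumption roughly 61% of everything ever made was made in the last two years.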
Re:Two transition periods? on If I Had a Hammer · Score: 5, Informative
64 bits should be enough for anyone.
No really, I mean it.
Clever, Ed. For those who don't get it, he's quite right: 64 bits *will* be enough for anyone.
For those still stuck in mid-90's video game wars, "bit-edness" in the real world refers (technically) to the size of your general purpose integer registers, which, for most intents and purposes, refers to how many memory addresses you can easily and quickly address. 32 bit addressing tops out at 4GB, a value which is often too small for e.g. large databases, which thus tend to live on 64-bit big iron machines. (MS has a hack to give x86 processes access to 36 bits of space, but it requires OS intervention.)
64 bits, on the other hand, works out to 16 billion GB. (That's 16 exabytes IIRC.) For reference, that's roughly 40 times as much memory capacity as all the DRAM (of all types, for all markets) currently produced worldwide in a year, at this January's rate.
I don't have the figures on hand for hard drive production, but I would guess as a first approximation that 16 billion GB is not quite equal to the total number of bits of digital storage of all kinds manufactured throughout computing history up until today. (I'd guess it's too small by a factor of 3 or so.)
In other words, it's quite a lot. Presumably computing will have run into some very different paradigm (wherein the bit-edness of the "CPU" is no longer an applicable term) before any computer has a use for >64 bit addressing.
(FWIW, today's 64-bit processors don't offer all 64 bits of data addressing yet, because no one has a need for more than 40-something, so that's what they offer.)
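The address-space numbers in this post are just powers of two. A quick sketch, assuming byte-addressable memory and binary prefixes (the helper name is mine; "16 billion GB" above is the binary reading, i.e. 16 x 2^30 GiB, which is about 17.2 x 10^9 in decimal):

```python
GIB = 2 ** 30  # bytes in one GiB

def address_space_bytes(bits):
    """Bytes reachable with `bits` address bits, one byte per address."""
    return 2 ** bits

for bits in (32, 36, 40, 64):
    print(bits, address_space_bytes(bits) // GIB, "GiB")
# 32 4 GiB
# 36 64 GiB
# 40 1024 GiB
# 64 17179869184 GiB
```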
Note, first of all, that it is in fact quite true that Intel is (planning on) dropping all new RDRAM-based designs from their x86 chipset lineup soon. The last "new" RDRAM chipset Intel releases will be the upcoming 850E, which is just the 850 but speed bumped to work with the upcoming 533MHz FSB P4s (due in April), and to match them with PC1066 RDRAM. Of course, this is very old news, known from Intel roadmaps obtained by everyone and their mom back in October or November.
But that's not what the article says. It's talking only about chipsets for servers and workstations, where, indeed, the 860 is being replaced by the just introduced dual-channel DDR E-7500 (Plumas) and the upcoming dual-channel DDR Placer (as well as a just-introduced chipset from Broadcom), and where the 850 will be replaced by the dual-channel DDR Granite Bay chipset, due in Q3 or so.
Thing is, dual-channel DDR for the *desktop* won't arrive from Intel until sometime in 2003, with the Springdale chipset. (Dual-channel DDRII, in fact.) VIA and SiS are both trying to get their dual-channel DDR chipsets out in time for the 533 FSB P4s (doubtful, but they should be in full swing by Q3), but, again, if you want the very highest-performing P4 desktop, and you want an Intel chipset, you'll either need to ridiculously overpay for a Granite Bay (workstation oriented) motherboard, or you'll have to use the 850E with PC1066 RDRAM, or you'll have to wait until Springdale in 2003.
So, to reiterate:
1) Yes, RDRAM is gone from all future Intel chipset introductions save the 850E, which is just a speed bump, not a new chipset.
2) But that's not what this article is talking about; it's only talking about servers and workstations.
3) RDRAM won't be completely gone until there is a dual-channel DDR chipset to replace it on the desktop; soon from VIA and SiS, not until 2003 from Intel.
When you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions. Double precision float are very far from being a priority. What kind of data range from 10E-1024 to 10E1023 ? The forces applied on the different areas of a space shuttle entering the atmosphere ? A simulation of a nuclear explosion ? The 10E-128 to 10E127 of the single precision is more than enough for most of the situations. The G4 floating point units support double precision but not altivec.
Yes, most consumer applications use single-precision floats. However, most HPC code uses doubles. The original poster was positing that the G4 would be a good replacement for all the 64-bit CPUs that are getting pushed out by Itanium because of its fp number crunching abilities (i.e. for HPC workloads). This is wrong in almost every way possible:
1) the G4 doesn't have the SISD execution resources necessary, because the G4's fp units are underpowered;
2) the G4 doesn't have the SIMD execution resources necessary (even if all HPC code would magically be vectorized), because Altivec doesn't do doubles;
3) the G4 (as it appears in Macs) doesn't have the DRAM bandwidth necessary.
Indeed, the current top of the line, a dual 1GHz PowerMac G4, would be about the worst possible choice for replacing big iron HPC machines, even if a strong FORTRAN compiler existed.
My points were all in reference to this proposed use for the G4. But yes, you're quite right that the G4 isn't nearly as inferior when it comes to desktop workloads.
Of course, your assertion that "when you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions" is both extremely true and extremely ironic, because the G4 is much more widely used as a signal processor in various embedded systems than as a general-performance desktop CPU. This is why it has comparatively meager OoO abilities and why it allowed itself to be saddled with an overpowered vector unit which gates clock rampability, while leaving its SISD execution units relatively underpowered. OTOH, I'm fairly sure the G4 has been equipped with DDR in its embedded incarnations; certainly that particular fault can't be blamed on the chip's intended design.
On any system, the memory is a bottle-neck but the problem is DRAM chips, not bus.
Huh?? Um...try sticking some faster DRAM in a PowerMac (e.g. PC2100, PC2700, RDRAM, etc) and tell me how that helps!
As long as the bus support what the dram chips can spit.
Which the current G4 bus cannot. The difference, so far as I can tell, is semantic. (It's also wrong; DRAM chips can be made to run at fantastic speeds for reasonable prices: witness current high-end 3d cards, with DRAM bandwidth of 7, 8, and in the case of the newest GF4s, >10 GB/s! The problem is coming up with a bus which can handle all that throughput in the much noisier and more complex environment of a motherboard with socketed DRAM, as opposed to a small card with soldered DRAM.)
3D games send lists of polygons to the memory card. Even if we imagine a real crappy game using absolutely no acceleration, 30fps*1024*768*3 = 67 MB/S even if we double it for memory reads and add a lot of disk and network use, we're still far from 1GB/S.
This is quite incorrect. First of all, what you're imagining is not a "crappy game", and not even a non-interactive rendering, but a non-interactive video playing in 24-bit color. If we want to turn the calculation into how much polygon data gets sent to the video card, we'll replace the number of pixels (i.e. 1024*768) by the number of polygons in the scene (say, 100,000), and the number of bytes per pixel (3) by the number of bytes per polygon (3*32-bit spatial coordinates for each of 3 vertices plus for the normal = 48 bytes; note that this doesn't include other information which needs to be sent along with the polygon, e.g. pointers to its textures, etc.) So instead we've got 30fps*100,000*48bytes = 144MB/s. Of course, 30fps looks pretty shitty, so for decent immersion we're actually looking at 60fps, or 288MB/s. And, again, this leaves out all the other polygonal data besides the vertices and normal.
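Here is the same back-of-the-envelope arithmetic as a few lines of Python, so the assumptions are easy to change. The constants are the ones from the paragraph above (100,000 polygons, 32-bit coordinates, 3 vertices plus a normal); the names are mine, and 1 MB is taken as 10^6 bytes.

```python
BYTES_PER_COORD = 4      # one 32-bit value
COORDS_PER_VECTOR = 3    # x, y, z
VECTORS_PER_POLY = 4     # 3 vertices + 1 normal

def polygon_bytes():
    """Bytes of vertex/normal data per polygon (no texture pointers etc.)."""
    return BYTES_PER_COORD * COORDS_PER_VECTOR * VECTORS_PER_POLY

def traffic_mb_per_s(fps, polygons):
    """Polygon data sent per second, in MB (1 MB = 1e6 bytes)."""
    return fps * polygons * polygon_bytes() / 1e6

print(polygon_bytes())                 # 48
print(traffic_mb_per_s(30, 100_000))   # 144.0
print(traffic_mb_per_s(60, 100_000))   # 288.0
```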
And, of course, we're just talking about data that comes from the CPU and is sent over the AGP bus to the graphics card; in a real game situation the DRAM most certainly does not store the predetermined locations of every polygon in the game world (how could it know?)! Um, so this has very little in fact to do with what we're talking about.
I can't provide solid approximations of how much DRAM traffic a real 3d engine actually generates, except by pointing out that in non-graphics-card-limited situations, a high-end K7 can easily gain 20-30% in fps by replacing a good PC133 chipset with a good PC2100 chipset. For example, see the last benchmark on this page. The KT266A beats the KT133A by 27%, and the nforce increases that to 32%, albeit with an ill-utilized-but-still-there dual-channel configuration. This is direct real-world proof that moving from PC133 to PC2100 can gain you ~30% in very common consumer desktop applications. (In fairness, the Serious Sam engine is known to rely particularly heavily on DRAM throughput where, for example, the Q3 engine seems to care more about throughput and latency on the cache level. Still, 20-30% is a pretty fair estimate overall.)
A large register set allow you to work with your data without having to read and write to memory all the time which is the biggest time waster on any modern system.
Having 32 GPRs (PPC) instead of 8 (x86) means you need to use an astonishing 24x4=96 bytes of cache to make up the difference. Most modern processors have L1 caches of slightly more than 96 bytes in size. Having more GPRs is good, but it has nothing to do with saving memory traffic on any higher levels of the memory hierarchy than L1-to-register and back.
Like you said, most of the accesses on a intel will be handled by L1 but you still have to calculate the physical address from the logical address in the instruction which means page table lookups ( ie other memory accesses ) and the arithmetic operations associated.
Wrong. Try looking up how a cache works, how memory pages work, how virtual memory works, etc. (Short version: believe it or not, but they've solved that problem. Recent logical->physical translations are cached in the TLB, so page table lookups only come into play on a TLB miss or a page fault. OTOH, there is still work associated with calculating the address, but this is why all modern CPUs have separate execution units to allow for the computing of memory addresses without clogging up the ALUs.)
OTOH, the effect I pointed out--that RISC code is more bloated than CISC code--*does* have an impact on the level of DRAM-to-CPU throughput, albeit, like I said, usually only in particularly ugly integer code.
The algorithms for most heavy fp number-crunching, in contrast, are usually pretty good, and the code pretty tight. But the datasets are often too large to fit in even a 2 MB cache, and the nature of the calculation is often such that it is gated by memory throughput. There's just no way to run these sorts of calculations on a computer with a 1 GB/s bus and have it be anything but slow. And this is without even considering the folly of having a 1 GB/s bus that's supposed to keep 2 CPUs fed with data from DRAM *and* carry all the messages passed from one CPU to the other. (e.g. anytime CPU 1 wants data that's in that nice big 2 MB L3 cache of CPU 2...)
Altivec is a vector unit that operates on integer and IEEE-754 floating-point. Vectors are 128 bits arrays of 16 * 8bit integer or 8*16 bit integer or 4 * 32 bits integer or 4 * 32 bits IEEE-754 floating point.
Thanks; did not know that. But I was thinking of IEEE double-precision floats, because the original topic was whether the G4 could be a replacement for the big iron chips discussed in this article. Those are not included in Altivec, as I suspected. (They are in SSE2, though.) But I should have been more clear.
And for the bus part, DDR is only a small improvement on the pc133 bus. If you think it's twice or more the speed of pc133, you're completely wrong. And what kind of code do really saturate the memory bus ? G4 like intel or AMD are aimed at desktop computing uses.
For many pieces of useful desktop code--some 3d game engines, streaming media encoding/decoding, etc.--the performance increase could easily be 30% or more on a single processor machine. (As for a dual-G4, the performance increase depends too much on how scalable the program is, etc. But using two processors sharing a single 1 GB/s bus for floating point work is ridiculous, as I said.) For many fp-intensive HPC-style programs, the performance increase could be 80% or higher.
And yes, I know that no one would use a G4 to run that sort of stuff in the first place (although P4s would be a great choice if the application could be parallelized to a cluster of PCs); but the thread was discussing using the G4 as a replacement for big iron machines, so yell at previous posters for getting on the topic first, not me.
Most memory accesses are filled by the caches
Yes but most of the *time* spent waiting for memory accesses is spent waiting for memory access from DRAM.
RISC CPU ( g4,mips,sparc,... ) work mostly on register data reducing the memory access speed dependency.
A larger register set means less spilling to L1 cache; it has no effect on memory usage from L2 on out. If your application has a 10 MB dataset, then it has a 10 MB dataset, period.
Incidentally, RISC code is bloatier than CISC code because it needs more fixed-length instructions to do the same amount of work that fewer, often shorter, CISC instructions can do. This of course makes the CPU core simpler to design and therefore faster for a given level of design and implementation quality. However it does lead to significantly larger binaries, and thus a larger bandwidth burden on all levels of the memory hierarchy. This very often leads to RISC processors placing a higher strain on DRAM bandwidth. However, these issues generally only show up in integer code, not in fp code which was the original subject. Still, it is generally the case that a RISC processor needs more DRAM bandwidth than a CISC processor to achieve the same level of memory performance. (Although, as you point out, L1 data latency is more critical for an ISA with a small register set like x86 than a large one like all standard RISC ISAs.)
Just to be clear: I'm not arguing that RISC is not clearly superior to CISC as an abstract design philosophy, because it absolutely is. Just that RISC does have some negatives, including code density. And that even saddled with their inferior ISAs as they are, the P4 and K7 are among the fastest performing single CPUs in the world, and blow everything else away in terms of price/performance. And that the G4's fp performance sucks, which it does.
I know that. My post referred to the US because all the legal issues surrounding Napster and its shutdown played out in the American courts and in the American Congress. (Of course the same laws are now on their way to the rest of the world courtesy of the WTO.) My point was that the current American interpretation of American copyright law is clearly at odds with both the economic impact on the record labels and the clear will of the American public. The point doesn't apply so much to the rest of the world because their laws aren't as backwards (yet) and because their laws were not relevant to the case at hand.
And I used the # of teenagers in America as a brief shorthand proof that most Napster users were not teenagers. A more explicit version would have gone like this: "Americans made up very roughly half of all Napster users, meaning there were ~2.5x as many American Napster users as there are American teenagers. Furthermore, we shouldn't expect the ratio of teenager to adult Napster users in the US was much different than anywhere else. Etc."
So, no slight or offense meant to the rest of the world (in fact, I'm moving to the r.o.w. after I graduate this year, and looking very much forward to it). Just that talking about Americans focused a couple of my points a bit better.
Most prior Napster users were teenagers who simply didn't want to shell out a few bucks to get the latest Britney album. Most aren't adults, and most certainly DO NOT share your view of expanding one's musical horizon.
There were upwards of 60 million Napster users when it got shut down. "Most" of them were not anything. ("Most" of them were certainly not teenagers; there are only ~12 million teenagers in the US total!) And, as there were something like 40 million unique mp3s floating around Napster, I guess "most" of them actually were listening to more than just the latest Britney album. (Otherwise that's a pretty long album...)
If you want a "most", here it is: most Americans with Internet access in early 2001 used Napster. The overwhelming majority of those who could feasibly use it (i.e. those with broadband connections) used Napster. And while I can't speak for all or even "most" of them, I know I have never, not once, felt guilty for downloading music from the Internet, nor has anyone I've spoken to about the subject.
And, like the original poster, I most certainly increased my CD buying as a direct result of Napster. I can't say whether such behavior reflected the majority or minority of Napster users, but considering the almost precise correspondence between growing (then suddenly falling in spring 2001) online music trading and growing (then suddenly falling in spring 2001) record sales, the statistics strongly support the former.
Laws are supposed to arise from the consent of the governed. When most of the governed are engaging in an activity with a clear conscience, it probably shouldn't be illegal, unless it carries some hidden negative consequences unseen by the uneducated majority. In the case of Napster, though, there were two hugely positive consequences: free access to the largest cultural repository the world had ever seen, and increasing CD sales to boot.
The argument that we should suddenly rewrite and reinterpret the past 200 years of copyright law (in which noncommercial infringement was generally held to be inactionable) just to kowtow to what the misguided oligopoly trying to retain their control over mass expression and culture mistakenly feels is their own self-interest is utterly absurd. The fact that you feel "guilty" about it (and project that guilt onto 60 million others) is just pathetic.
Completely wrong. The G4 has some of the slowest IEEE floating point this side of a StrongARM.
Presumably you're confusing AltiVec with "floating point". AltiVec is a vector unit, not a floating point unit.
True, many AltiVec operations operate on vectors of floating point values (don't think they're IEEE, though), but that is most certainly not the same thing as normal floating point performance. Only a small subset of all floating point calculations can be effectively vectorized, and doing so requires extensive reprogramming (not just recompilation).
In any case most real-world floating-point applications have heavy bandwidth requirements and large datasets, which brings us to another glaring weakness of current Macs, namely their paltry PC-133 memory bus. Sticking two 1 GHz G4s on the same shared 64-bit PC-133 bus is almost comedy. (Tragicomedy if you want to run serious HPC workloads.) Sure the L3 helps if your dataset is 2MB and well-behaved, but if you think PC-133 is still ok for a modern desktop PC, perhaps you ought to think about *why* Apple and Moto had to start adding L3 cache and a backside bus to each processor when every other desktop CPU gets by with much cheaper on-chip L2 and a modern memory bus.
(Yes, I fully expect that DDR G4s will be available real soon now, but until then the situation on the high-end of Apple's desktop is just embarrassing. As for floating point performance, ditch the 64-bit PC-133 shared bus for a 128-bit PC2100 memory system and P2P interconnect and then that dual-GHz G4 might start looking credible...if only the G4 had decent fp number-crunching power.)
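To put numbers on the bus complaint, here is a rough sketch of theoretical peak bandwidth. It assumes nominal 133 MHz clocks for both PC-133 and PC2100 and counts 1 GB as 10^9 bytes; the function name is mine, and real sustained throughput is of course lower than peak.

```python
def peak_gb_per_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return (bus_bits / 8) * clock_mhz * 1e6 * transfers_per_clock / 1e9

pc133 = peak_gb_per_s(64, 133)            # 64-bit single data rate bus
pc2100_128 = peak_gb_per_s(128, 133, 2)   # 128-bit DDR memory system

print(round(pc133, 2))        # 1.06  (the whole bus)
print(round(pc133 / 2, 2))    # 0.53  (per CPU when two G4s share it)
print(round(pc2100_128, 2))   # 4.26
```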
Can any media truly be 'copy protected'? If all else fails I can use a program like Ghost2002 or other forensic-certified disk duplication software to do a bit by bit copy. Basically make an exact duplicate of a disc.
How would this be unplayable?
CDS works by purposely introducing errors into the audio data on the disc. Audio CD players are supposed to interpolate across the errors such that there is supposed to be no difference in sound quality. But CD-ROMs--being designed to read data CDs where every bit has to be correct--don't do this interpolation, and thus they see the disc as having lots of errors and crap out. You can't make an exact copy of the disc if your CD-R can't read it.
At least that's what's supposed to happen. It has since come out that 1) many DVD-ROMs read the discs just fine; and 2) *certain* combinations of CD-Rs and ripping software can manage alright.
You should have said "Approximations of this is the sort of thing..."
Yeah, you're right. When I wrote that I was thinking of one of Carmack's .plans in which he mentioned a paper proving that, in the limit, successive passes through shaders can approximate the output of true ray-tracing arbitrarily closely. So, in the limit, shaders can get you exactly the same output. But of course this is a very theoretical result with no direct impact on the real world, and it gets there (I think) only by successive approximation anyways.
My larger point was that as graphics get better and better, pixel shaders will be used more and more to perform functions which, first, require input from other parts of the image and, second, induce alpha channel effects--both of which fit poorly in the tile-based rendering paradigm--and this is true whether the functions are equivalent to ray-tracing or just approximations of it. But you are quite correct to differentiate between the two.
More shaders, More pixel pipelines, More memory bandwidth... whoopee...
When the hell are they going to ditch the antiquated scanline rendering method and go work on some tile based rendering methods?
Probably never, and for very good reason. Tile-based rendering is a very efficient architecture whose time has already come and gone.
For those who don't know, tile-based rendering divides an image up into a number of smaller squares ("tiles") and renders them independently, as opposed to the traditional method ("immediate-mode rendering") of rendering an image one polygon at a time. The major benefits claimed for tile-based renderers are that the process is more parallelizable (no risk of two chips rendering to the same area if they are working on different tiles) and that it is an easy modification to check each polygon's z-value (its distance from the camera) as you add it to the poly-list for its tile, and then to only texturize those polygons which are not occluded (i.e. actually visible). This is in contrast to the traditional immediate-mode rendering algorithm, where polygons are textured more or less in random order, leading to situations where a polygon will go through the entire process of being textured and rendered, only to later be completely covered up by a later poly--a situation which wastes a lot of (especially) memory bandwidth, fetching all those useless textures and such.
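A toy model of the difference, for the curious. This is only a sketch under heavy assumptions: "polygons" are axis-aligned rectangles `(x0, y0, x1, y1, z)` with a single depth each, one texture fetch is counted per textured pixel, and the function names and 4-pixel tile size are made up. No real chip works this simply.

```python
def immediate_mode(polys, width, height):
    """Count texture fetches when polygons are textured in submission order."""
    depth = [[float("inf")] * width for _ in range(height)]
    fetches = 0
    for x0, y0, x1, y1, z in polys:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < depth[y][x]:   # passes the z-test, so it gets textured
                    depth[y][x] = z
                    fetches += 1
    return fetches

def tile_based(polys, width, height, tile=4):
    """Bin polygons per tile, resolve visibility, then texture once per pixel."""
    fetches = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Poly-list for this tile: every rectangle overlapping it.
            binned = [p for p in polys
                      if p[0] < tx + tile and p[2] > tx
                      and p[1] < ty + tile and p[3] > ty]
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    covering = [p for p in binned
                                if p[0] <= x < p[2] and p[1] <= y < p[3]]
                    if covering:
                        # Only the nearest polygon would actually be textured.
                        _nearest = min(covering, key=lambda p: p[4])
                        fetches += 1
    return fetches

# Three full-screen layers on an 8x8 screen, submitted back to front.
scene = [(0, 0, 8, 8, 3.0), (0, 0, 8, 8, 2.0), (0, 0, 8, 8, 1.0)]
print(immediate_mode(scene, 8, 8), tile_based(scene, 8, 8))  # 192 64
```

Note that if the same scene is submitted front to back, the immediate-mode count also drops to 64, which is why submission order and early culling matter so much for immediate-mode renderers.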
Cool! Sounds great! Let's hear it for tile-based rendering! Too bad ATI and NVIDIA have clearly never ever heard of this miracle technique! After all, it's not like they would ever make (gasp!) an informed choice not to use it!
Well...not so fast. Basically what we've seen is that tile-based rendering offers two potential benefits: it eliminates *some* of the complexity of enabling multi-GPU implementations, and it uses quite a bit less memory bandwidth in the base case. The problem is that both of these supposed benefits really buy you very little when designing a consumer-level graphics card today.
First, the problem of "dividing up the work" isn't really what's preventing multi-chip graphics cards these days. Indeed, it's really a rather easy problem. Here's a clue: have alternate chips render alternate frames. Gee...that wasn't so tough, now was it? Well, no. But the other problems of implementing a multi-chip card for the consumer market sure are. For example, we have our choice of implementing an (expensive, performance gating) point-to-point bus to handle memory traffic (and have memory bandwidth/chip cut in half anyways), or of completely mirroring the memory, using twice as much for the same capacity (expensive). Then there's the cost of a second chip (expensive), the cost of packaging the second chip and connecting it to memory (expensive), and the cost of the extra power and cooling, the cost of trying to squeeze it all onto one card (results in a bigger, more expensive card; may gate clockability). And this is without mentioning the extra development and debugging time that goes into getting a multi-chip solution to work correctly. (In general this is one of the most difficult issues design engineers face.) Golly, it's almost enough to make you remember how when 3dfx tried to make a multi-chip product it was 6 months late, the single-chip card was far too slow, the double-chip (and cancelled quad-chip) card too expensive, and, due to the release delay, no longer competitive. (OTOH John C has hinted that a scalable multi-chip architecture might be on the way from one of the major players. Tie that in with the fact that Anand reports the GF4 will be the last to use the GF name, and that NVIDIA owns the remnants of 3dfx, and I start scratching my head...)
Second, the problem of memory bandwidth. Or rather, the former problem of memory bandwidth. Yes, the traditional rendering pipeline is very inefficient with memory bandwidth. Thing is, the prices on high-speed DDR have been coming down so fast that it hardly matters. You can find a Radeon 7500 with 64MB of 128-bit-wide DDR running at 2x230 MHz (i.e. 7.4GB/s bandwidth) for as low as $85 on pricewatch.com. (Actually there's one for $79 but it may be mislabeled.) The memory is probably less than $30 of the cost. Or maybe even less--the 64MB and 32MB GF2Pros (6.4GB/s bandwidth) only differ by $6. And the new GF4 MX460 hits the street with 64MB of 2x275 MHz DDR (8.8GB/s) for $179, list, on a brand new card.
As for the price premium of using relatively high-speed DDR instead of the same amount of SDRAM, it's pretty negligible. Even for the highest speed DDR it's not such a big deal. Sure NVIDIA charges an extra $100 for another 25MHz on the GPU and an extra 1.6GB/s from the memory (GF4 Ti4600 vs. Ti4400), but that doesn't mean it costs them anywhere near that much (depending on GPU yields). It just means they like to bilk those in the $400-for-a-video-card crowd for the full $400. So how much does the stuff cost? Well...Hynix recently announced samples and volume production of 2x375 MHz x32 DDR selling at $10 for 128Mbit chips. That means $40 for 64MB of 128-bit-wide DDR with 12GB/s bandwidth. Not too shabby.
Ok, ok...so maybe the benefits of tile-based rendering don't really mean all that much in today's consumer GPU market. But better is better: why wouldn't ATI and NVIDIA use tile-based architectures for the benefits they do provide? After all, it's not like there might be some (gasp!) downsides to tile-based rendering!
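The Hynix arithmetic, written out as a sketch. The inputs are the figures quoted above (x32 chips, 128 Mbit each, 375 MHz DDR, $10 apiece); the function name and the choice of four chips to fill a 128-bit bus are mine.

```python
def ddr_config(chips, chip_width_bits, chip_mbit, clock_mhz, price_per_chip):
    """Return (bus width in bits, capacity in MB, peak GB/s, memory cost in $)."""
    bus_bits = chips * chip_width_bits
    capacity_mb = chips * chip_mbit / 8          # 8 Mbit per MB
    # DDR moves data twice per clock; MB/s divided by 1000 gives GB/s.
    bandwidth_gb_s = bus_bits / 8 * clock_mhz * 2 / 1000
    cost = chips * price_per_chip
    return bus_bits, capacity_mb, bandwidth_gb_s, cost

print(ddr_config(4, 32, 128, 375, 10))  # (128, 64.0, 12.0, 40)
```

The same formula with a 128-bit bus at 2x230 MHz gives the 7.4GB/s (7.36 before rounding) quoted for the Radeon 7500, and 2x275 MHz gives the MX460's 8.8GB/s.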
Well, actually, there are. For one thing, it's more difficult to design a tile-based GPU and get it running at high speeds. For another both NVIDIA and ATI have years and years of research and experience with implementation techniques and algorithms for immediate-mode renderers, much of which wouldn't apply to tile-based designs.
For another, neither ATI nor NVIDIA really uses traditional immediate-mode rendering anymore. Instead they use modified immediate-mode rendering, with lots of algorithmic tricks and tweaks to lessen the memory bandwidth inefficiencies of traditional immediate-mode rendering. Things like lossless z-buffer compression and various early polygon-culling algorithms. No they aren't quite as effective in reducing overdraw as tile-based rendering, but they provide quite a significant benefit. Indeed, the GF4 Ti4600 has more or less caught up with the (tile-based) KyroII in Kyro's own villagemark benchmark, which is contrived entirely to test massive overdraw of the sort which is never encountered in a game. The KyroII is only 8 months old. Sure it's much much cheaper than a Ti4600, but if Kyro can barely keep the lead in the one benchmark specially designed to make the case for tile-based rendering then something is wrong here.
Meanwhile there are very serious issues with the ability of tile-based rendering to scale to meet future challenges. In particular, the tile-based rendering algorithm works very naturally so long as there are no polygons which find themselves spread into more than one tile, and so long as you don't use transparent or translucent textures. Of course it's not that tile-based chips can't handle these situations--the KyroII is here and works just fine, after all--but just that they require complicated workarounds which are more inefficient than for immediate-mode rendering, which handles these cases naturally.
The problem is that both cases are going to be more and more likely as graphics continue to improve. As tile-based rendering tries to scale with increasing scene polygon counts and resolutions, you get more tiles per scene and many more polygons crossing tile boundaries. And as graphical effects get more realistic, the alpha channel (i.e. transparency) starts coming into play more and more. Indeed much of the recent research in non-real-time computer graphics has focused on adding translucent "subsurface" reflections to the ray-tracing algorithm. This (and approximations of it) is the sort of thing that future pixel shaders are going to be called on to do, and tile-based rendering is a bad match for it.
Indeed, most of the recent advances in graphics are pointing towards a world in which the assumptions which tile-based rendering is based on no longer hold. How, for example, does tile-based rendering handle cubic environment mapping across tile boundaries, or cast dynamic shadows across tile boundaries? What happens if a dot3 bump map extends a texture from one tile into another? I'm sure clever solutions can be found to these and all the other dozens and dozens of issues that will arise when you try to mix DX8-style effects and tile boundaries, but the main point is that tile-based rendering was an algorithm developed under two assumptions which increasingly do not hold:
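A small sketch of the resolution half of that argument: with a fixed tile size in pixels, the same on-screen polygon spans more tiles as the resolution goes up. The 32-pixel tile, the bounding-box simplification and the function names are all assumptions of mine, not a description of any actual chip.

```python
TILE = 32  # tile edge in pixels; an assumed size

def tiles_touched(x0, y0, x1, y1, tile=TILE):
    """Number of tiles overlapped by the pixel box [x0, x1) x [y0, y1)."""
    cols = (x1 - 1) // tile - x0 // tile + 1
    rows = (y1 - 1) // tile - y0 // tile + 1
    return cols * rows

def scaled(box, factor):
    """The same on-screen box after multiplying the resolution by `factor`."""
    return tuple(v * factor for v in box)

box = (40, 40, 60, 60)                  # a polygon's bounding box at some resolution
print(tiles_touched(*box))              # 1
print(tiles_touched(*scaled(box, 2)))   # 4
```

A polygon that sat inside one tile now has to be entered in four poly-lists, and any effect that reads neighboring pixels has to reach across those boundaries.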
1) If one polygon occludes another, the other's texture will never be visible to the camera;
2) Objects in one section in the screen can be rendered without reference to any other parts of the screen.
Of course, we may never know the difficulties of trying to make a DX8-compliant tile-based renderer; after all, the KyroII hasn't even made it to DX7, since it is still missing integrated T&L. I have no idea whether this is because of any difficulties integrating T&L with a tile-based rendering pipeline (can't think of why it would be a problem, but it may be), or just because the Kyro doesn't have the money or manpower behind it to keep up with 3 year old technology, but this lack is already preventing the KyroII from competing effectively with the cheaper GF2MX on modern high-poly games. I am pretty sure that integrating a programmable pixel shader into a tile-based architecture would be pretty tough, if not pretty impossible.
Which brings me to the main point: you started out writing "More shaders, More pixel pipelines, More memory bandwidth... whoopee..." and in a sense, this is the right attitude. To which we should very quickly add "tile-based screen division...deferred rendering algorithm...whoopee..." All these technical details only mean something insofar as they give us the capability for more realistic graphics--this means high FPS, high color depth, higher resolutions, lack of aliasing problems, high-quality mip-mapping/anisotropic filtering, realistic--or even dynamic--lighting and shadows, realistic and/or impressive pixel effects, high polygon counts, useful and realistic vertex effects, etc.--for a reasonable price. It is pretty damn hard to argue that the last few years, under NVIDIA's leadership (and ATI's pursuit) have not resulted in huge improvements on these measures. Again, the new GF4 Ti4600 may be ridiculously expensive and may not change your experience with today's games very much (besides enabling 1600x1200x32 with 4xAA at playable framerates), but when the new Doom game comes out, a card with similar specs and selling for ~$100 will bring you decent performance on an engine which offers a totally new level of graphical realism. Same thing when Unreal Warfare, Unreal 2, Deus Ex 2, and all the other Unreal 2-engine games start coming out. Believe me, a GF4 caliber card will improve the experience of playing those and later games significantly over a GF3 and especially a non-DX8 compliant card like a GF2 (and, sadly, a GF4MX). And, believe me, those games are going to provide significantly more realistic graphical experiences than those of today.
Immediate-mode rendering is doing just fine, and the GF4 marks an evolutionary but very significant improvement to the state-of-the-art. A switch to tile-based would require significant retreading to reach the same level, and might form a poorer basis for future improvements. But, if I'm wrong, then ATI and NVIDIA will make the switch. Believe me, they know all about tile-based rendering, and NVIDIA even owns Gigapixel (via 3dfx) and their tile-based rendering engine. I think they'll stick to modifications of immediate-mode rendering, but no matter what they do it will be whatever they think offers the best graphics performance at the lowest cost to them.
And now to correct some minor misconceptions in your post:
Hell, the reason why the Geforce line has to keep doubling its fill rates every generation is because its architecture is so god damn inefficient. Look at the memory bandwidth requirements for the cards!
The reason the GeForce line increases its texel fill rates continually is because consumers want to run new games which have higher multi-texturing requirements (Carmack has said Doom3 will have something like ~8 textures/pixel), and to run existing games in higher resolutions and at higher FPS.
The memory bandwidth "requirements" for the cards don't matter, only the prices. If a recent card with 7.4GB/s only costs $85 (Radeon 7500) and a brand new card with 8.8GB/s lists for $179, then the costs of increasing memory bandwidth are obviously not so terrible. Today's $400 card is next year's $80 card. Similarly, immediate-mode rendering's inefficiencies need to be measured according to their dollar costs, not their bandwidth costs.
Instead of using the relatively limited bandwidth of AGP for streaming textures from main memory (where it should god damn be) to the texture cache, the card is busy wasting bandwidth on the damn Z-buffer (which would be eliminated if they implemented hidden surface removal like the PowerVR chipsets).
???
First off, textures most certainly should not "god damn be" in main memory! The AGP bus is there to stream vertex data from the CPU (pre- or post-transformation, it's the same amount of data). That's all it's there to do, and good thing, too, because today's high-poly games can already generate enough vertex data to make AGP 2x a bottleneck, and those of a couple years will do the same to AGP 4x. (Which is why AGP 8x is on the horizon.) Increasing the bandwidth of a bus from the northbridge across the motherboard through a slot to an add-on card is a whole lot harder than increasing the bandwidth from soldered DDR to a soldered GPU a few centimeters away. AGP should only carry the data which it absolutely is forced to--namely initial vertex data from the game's engine running on the CPU.
Z-buffer lookups only waste bandwidth between the GPU and the on-card memory. Technically, you don't eliminate z-buffer lookups with a tile-based architecture; you eliminate texture lookups (and texture application) on occluded polygons. However, by dealing with a small tile at a time, you can read all the z-buffer data for the tile in from memory all at once, and store it in an on-chip cache until you're done with that tile. (This is essentially why higher poly-count games mean smaller and smaller tiles.)
And last, they do implement hidden surface removal techniques, like I pointed out before, even though they are less effective than with a tile-based architecture.
According to the Motorola PowerPC roadmap, the G5 will be available in both 32 and 64 bit versions. How much it resembles Power4 isn't clear, but it's supposed to debut at up to 2 GHz. Are you still so confident it won't have world-class performance?
No. According to the Motorola PowerPC "roadmap" (I'm sure they have more informative roadmaps internally and released to their partners, but god is that thing vague!) the G5 will debut at 800MHz and up, and eventually scale over its lifetime to 2GHz or maybe even higher. Am I positive that it won't have world-class performance? No, but I would guess that it won't based on the fact that it's being designed at a 2nd-class design firm which has fallen significantly behind of late, and that it's primarily targeted at embedded systems, not desktop PCs. Of course, the K7 was designed at a previously-thought-to-be-2nd-class design firm which had fallen significantly behind on their previous core, and was arguably rushed out to replace a rapidly obsolete K6. And yet the K7 turned out to be extremely successful, and grabbed the x86 performance crown from just after its introduction til very recently.
But there are important reasons why we shouldn't expect a K7 out of Motorola. Among them, while AMD had amassed a semi-dream team to design the K7, Moto is apparently so hurting for talent that they are soliciting EEs on the basis of comp.arch posts. Plus their semi division has posted very large losses the past several quarters and is speculated to be a candidate for being shut down or spun off. (AMD had losses before the K7, but as CPUs were their main business, they weren't about to drop them.)
Certainly the G5 will be faster than the G4, and Jobs will surely be able to make it seem faster than a room full of P4s. Since Macs have never been about performance, I would bet the G5 will be enough to keep them happy. But world-beating, I doubt it. We'll see...
(BTW: you are right that the G5 apparently has a 64-bit version. There is no good reason for Apple to use it, however; 64 bits is worthless for all the markets Macs sell to. The only reason it's at all worthwhile for Hammer is that Hammer is trying to steal some of Xeon's market in e.g. databases. Of course, Apple isn't past using a useless feature for marketing purposes, so perhaps they'd use a 64-bit version anyways.)
The integrated I/O might or might not be worthwhile, but Apple's current pro machines use L3 cache.
*Some* integrated I/O and *some* L3 cache TLBs might be of use in a desktop chip. But the integrated I/O to network a 2-way system across a motherboard is nothing at all like the integrated I/O to network a 4-way system across an MCM. They'll use completely different protocols. Similarly, the TLBs for maybe 2 or 4 MB of L3 aren't going to share much in design or layout with the TLBs for 128MB L3. Indeed, the whole address space will have to be completely different. And so on.
The new dual-processor 1 GHz G4 is claimed to have 15+ GFlops of computing power, using Altivec I presume.
Snort. This figure is what you get when you multiply the peak execution rates of all the Altivec and floating point units on both chips together and multiply by 1 billion. This assumes all peak-rate operations (so, most likely, 100% fp adds/packed Altivec fp adds...although, come to think of it, they might be counting fp loads as "operations"), no loading of operands, no data hazards, etc. The precise technical term for this is "bullshit." Side note: how do you plan on getting the operands for 15 billion floating point operations every second across a 1 GB/s memory bus??
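A back-of-the-envelope sketch of that side note. The assumptions are mine, not Apple's or Motorola's: 4-byte single-precision operands, two operands fetched from memory per operation, and no reuse from registers or cache (the pure streaming worst case). The function names are hypothetical.

```python
def required_bandwidth_gb_s(gflops, operands_per_op=2, bytes_per_operand=4):
    """Memory bandwidth (GB/s) needed to feed `gflops` billion ops per second
    when every operand comes from main memory."""
    return gflops * operands_per_op * bytes_per_operand


def sustainable_gflops(bus_gb_s, operands_per_op=2, bytes_per_operand=4):
    """Billions of ops per second a bus of `bus_gb_s` GB/s can keep fed
    under the same no-reuse assumption."""
    return bus_gb_s / (operands_per_op * bytes_per_operand)


print(required_bandwidth_gb_s(15))  # bandwidth needed for the claimed peak
print(sustainable_gflops(1))        # what a 1 GB/s bus can stream
```

Real code gets a lot of reuse out of registers and cache, so the true gap is smaller than this worst case, but it shows why a peak number says little about bandwidth-bound workloads.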
If the G4 is a supercomputer on a chip, how come there aren't any G4-based machines on the Top 500 list? More to the point, how come any old x86 chip will destroy a G4 on LINPACK or LAPACK? A supercomputer with PC133?? (If you think the derision is too harsh, it's because you don't realize the degree to which "supercomputer" workloads are dominated by memory bandwidth considerations.)
If that single Power4 CPU was really "optimized to work in an 8-way MCM", it truly did a stellar job as a uni-processor.
Again, 128MB L3 didn't hurt. :)
On the compiler front, I did find a seemingly decent FORTRAN compiler for MacOS X, so that issue is addressed at least. ;-)
Good to hear. A SPEC license isn't all *that* expensive ($100 for a student IIRC), so hopefully someone will get cracking and produce some real independent benchmarks comparing the G4 to other processors. (Again, *not* holding my breath for Apple to do the same.) Of course a lot of effort needs to go into figuring out the optimal compiler flags, etc. And I somewhat doubt Absoft's compiler can vectorize Altivec out of code without any changes or hints stuck in. (SPEC doesn't allow any changes to the source, and very few compilers can do truly autonomous vectorization.)
But I'd be *very* interested to see those results!
Ok, you have several misconceptions about the relationship of IBM's PowerX line to IBM and Moto's Gx line. Simply put, they have extremely little in common besides the fact that they both use their own (incompatible) supersets of the PowerPC ISA.
That $100,000 cost is fairly meaningless, since there is an extreme markup on server hardware, and the chip isn't in mass production...I'd venture to say that it can be mass-produced cheaper than P4, as I'll bet it has a lower gate count.
Yes and no. Sure the HPC market where the Power4 currently plays has huge markups and very low production volumes...but that also means designs which could not possibly be cost effective in the desktop market. A single Power4 multi-chip module contains 4 2-way CMP dies, 256-bit interconnect between each pair of dies, and, oh yeah, a measly 128MB of eDRAM.
Each one of the 4 dies takes up 400mm^2 on a .18um process. (Compare to 217mm^2 for the P4 on .18um, 145mm^2 on .13um. "Lower gate count" my ass.) The process is copper and SOI, which are quite a bit more expensive and lower-yielding in the case of SOI than the P4's bulk aluminum process on .18um. The ceramic substrate the thing sits in probably costs IBM considerably more than the cost of a new iMac.
G5 will essentially be this architecture.
The G5 is an upcoming 32-bit embedded chip made by Motorola (like the G4 and G4+), and does not resemble the (64-bit) Power4's internal architecture in the slightest. Whether this chip will be the basis of the next generation of Macs is of course not yet known.
The 1 GHz G4+ that powers the current generation of Macs would probably score about the same as the R14k on SPEC, or a bit lower
Please cite some reference to support this (wild in my opinion) claim.
Because Apple does not have the integrity (nor, according to the oft-repeated excuse, the FORTRAN compiler) to submit SPEC runs for a G4-based computer, there are no official SPEC scores for the G4. However, we do have Motorola's *estimated* *SPEC95* scores for the 7450 (a.k.a. G4+) at 733MHz. (Here, second page, on the left.)
They are 32.1/23.9, SPEC95 int/fp. By comparison, a 400MHz R12k (best I could find for SPEC95; it is an old benchmark after all) scores 24.2/43.5 SPEC95 int/fp; 25% worse on int, and 82% better on fp.
That same 400MHz R12k scores 347/343 on SPEC2k int/fp. (Sorry, but no more links; the scores are all available at www.spec.org) Assuming equivalent SPEC95-to-SPEC2k ratios (a faulty assumption, but then again we're using estimated scores in the first place), we get our 733MHz G4+ scoring 460/188(!!) on SPEC2k int/fp.
For a scaling factor we'll use the Coppermine PIII, since it has SPEC2k scores available for both 733MHz and 1GHz configs. 1GHz is 22%/16% faster than 733MHz at SPEC2k int/fp. (If you repeat my calcs, be sure to use the 1 GHz PIII scores using the same compiler version as the 733MHz scores.) So applying that to our "estimated" SPEC2k scores for 733MHz G4+, we get even-more-estimated SPEC2k scores of 563/219 for a 1GHz G4+.
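For anyone who wants to repeat the arithmetic, a small sketch using only the scores quoted above. The helper name is mine, and the whole method is the same shaky estimate described in the text, not an official SPEC procedure. Because the clock-scaling percentages are rounded, the 1GHz results land within a couple of points of the figures quoted above rather than matching them exactly.

```python
# Scores quoted in the text, as (int, fp) pairs.
R12K_SPEC95 = (24.2, 43.5)   # 400MHz R12k, SPEC95
R12K_SPEC2K = (347, 343)     # same 400MHz R12k, SPEC2k
G4_SPEC95 = (32.1, 23.9)     # Motorola's estimated 733MHz G4+ SPEC95
PIII_SCALE = (1.22, 1.16)    # 1GHz vs 733MHz Coppermine PIII speedup


def estimate_spec2k(g4_95, ref_95, ref_2k, clock_scale=1.0):
    """Carry a SPEC95 score over to SPEC2k using a reference chip's ratio,
    then optionally apply a clock-speed scaling factor."""
    return g4_95 / ref_95 * ref_2k * clock_scale


int_733 = estimate_spec2k(G4_SPEC95[0], R12K_SPEC95[0], R12K_SPEC2K[0])
fp_733 = estimate_spec2k(G4_SPEC95[1], R12K_SPEC95[1], R12K_SPEC2K[1])
int_1g = estimate_spec2k(G4_SPEC95[0], R12K_SPEC95[0], R12K_SPEC2K[0], PIII_SCALE[0])
fp_1g = estimate_spec2k(G4_SPEC95[1], R12K_SPEC95[1], R12K_SPEC2K[1], PIII_SCALE[1])

print(round(int_733), round(fp_733))
print(round(int_1g), round(fp_1g))
```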
So, a decent spot (32%) better than the 500MHz R14k at int, and a significant bit (53%) worse at fp. Plus the CPU in the new SGI Graphics Fuel can be up to 600MHz and uses DDR and not SDRAM like the one I got the scores from.
So...hope that helped.
Re: the Power4 SPEC scores
(Also this was a single-CPU system, so I don't think it was a multi-CPU module.)
SPEC2k is single-threaded. The score was obtained using a 4-way Power4 "Turbo" module with 3 of the cores "turned off". The rather sneaky thing is this gave the remaining core access to all 128MB L3, which means the SPEC score probably overstates single-threaded performance a bit.
What makes you think that Power4 technology won't make its way into desktop chips? IBM manufactures desktop PowerPC chips as well, and certainly shows no sign of giving up on PowerPC in general. There have recently been rumors of Apple switching from Motorola to IBM for its chips...we'll see what happens.
Power4 is simply not a desktop chip design. Even using one of the 4 dies in the MCM as the basis for a desktop CPU is a shaky proposition, since they're too big (again, 400mm^2 on .18um), and include a bunch of integrated I/O stuff and the L3 TLBs, all stuff which would be worthless in a desktop machine. The actual datapaths are quite simple, and indeed are optimized to work in an 8-way MCM, not as the sole CPU of a desktop machine.
Of course, it may be quite likely that Apple turns to IBM instead of Motorola for the next generation of Mac CPUs (especially as it looks somewhat likely that Moto will exit the semi business in the coming year). But it will not look anything like a Power4.
Looks like SGI should consider joining Apple in the PowerPC world...that Power4 looks pretty awesome!
That Power4 also costs like $100,000 for each (4-way CMP) processor module alone, so, gee, it'd better be pretty awesome. The 1 GHz G4+ that powers the current generation of Macs would probably score about the same as the R14k on SPEC, or a bit lower...but we don't know because Apple is too cowardly to submit themselves to legitimate benchmarks when they have a bunch of fools running around believing that a G4 is faster than a P4 or Athlon, and Motorola doesn't bother because they know the G4+ is actually designed for the embedded signal processing market, where SPEC scores are not too relevant. Just because the G4 and Power4 are both "in the PowerPC world" doesn't mean they have similar performance characteristics.
In any case, where the R1x000 really shines is in scalability to very high processor count NUMA configurations (not at issue in this case of course). It'd still be a world-class processor line if SGI hadn't given up 5 years ago by essentially stopping R1x000 development and committing to Itanium instead. They've finally realized their mistake and apparently have some extra tweaks on the way (R16k and R18k), but it's probably too little too late.
Were I SGI at the moment, I'd drop IRIX for Linux, port everything that made IRIX special, and run it all on proprietary P4 or Xeon boards with all the special SGI graphics goodies. Although that was the idea behind their NT line and that didn't do so well, did it...
SGI had some amazing tech back in the day, but having more or less rolled over and died the past few years it might be difficult for them to stay ahead of the commodity hardware crowd. (Re: 48-bit color, if johnc has his way--and he usually does--commodity graphics cards will have 48 or 64-bit internal color soon enough.) But they appear to be finally waking up and making a go at it, so best of luck to them.
You have to understand benchmarking people. When they say kernels they mean benchmarking kernels. Small contained programs that extract key loops or algorithms from larger programs.
Exactly correct. 14 replies and this is the only one that even understands the terminology used. Someone mod parent up.
Some of these points people have remarked upon, but others they haven't.
1) Whether they used export-grade or real encryption made absolutely no difference in this case in terms of preventing terrorism, saving lives, etc. All that prevented that plane from blowing up is that this guy had bad luck lighting his detonator cord and somebody noticed him. Even if there were no encryption of any sort in the world it would have made no difference in this case. It was all a matter of dumb luck, bad shoe-bomb design, and an attentive person. The only use the file has now is as evidence, and of course there are valid concerns as to its legitimacy.
Conclusion: perhaps we should be concentrating on keeping bombs off of planes (which we are finally starting to do, albeit in a half-assed ass-covering sort of way) instead of on crypto exports.
2) This file was kept on a communal Al-Qaeda PC. It happened to be encrypted using Windows EFS, but most of the other contents of the machine--many of them just as valuable as intelligence or evidence--were not.
3) Again, this file was encrypted on a desktop machine in Kabul. The only possible way Americans could get a look at it would be on the unlikely chance that we took over the entire country of Afghanistan. Otherwise the CIA/NSA/etc. never gets a look at this file, encrypted or no. Presumably the reason the file was encrypted was to prevent other members of Al-Qaeda who had access to the machine from looking at it, not to foil Americans. For these purposes 40-bit Windows EFS is probably just fine.
4) A corollary: presumably when Al-Qaeda wants to encrypt something that the CIA/NSA/etc. actually might have a chance to intercept, they use real encryption. i.e. they presumably use PGP for their email. (Although reports have them into steganography instead, presumably because with intercepted encrypted email at least you know who sent it, when, and to whom.)
In other words: there's nothing to see here. If this is the best the anti-cryptos can come up with then export-crypto would be quite safe in a reasonable world. (Of course no one said Washington after Sept. 11 was anywhere near reasonable.)
So, this is probably how Intel demo'ed their 3.5GHz P4 last year. Shows how pointless the whole thing is, to be honest.
No: the 3.5GHz P4 Intel demoed at IDF last fall was air-cooled. On the other hand, it was certainly hand-picked from a special run of chips on a boutique process tuned to produce a few very high clocking chips at the expense of overall yield. Which, yes, shows how pointless the whole thing is, to be honest.
On the other hand, the fact that they are showing it off is an indication of where they're going. Intel showed off an (air-cooled) 2 GHz P4 at IDF fall '00, and launched the same part, not coincidentally, exactly at IDF fall '01. They showed a 3.5 GHz P4 at IDF fall '01, which means...?
No, they probably won't get one out quite so early (3.0 is more like it), but it'll be here around the end of the year. Incidentally, the top speed of an air-cooled hand-picked chip on a special process is probably more relevant to future clock scaling than that of a Liquid Nitrogen cooled off-the-shelf part, for the simple reason that the process will be tweaked to be more aggressive as time goes on, but the temperature is never going to magically drop to -196 deg C. (And yes, the difference matters, as lower temperatures attack different limiting factors for clock rates than tweaked processes do.)
IANAL, but all of your methods involve accessing the unencrypted versions of the songs. Therefore, DMCA doesn't apply because you are not defeating the copy prevention on the encrypted MP3 files.
Right, but the whole CDS technology is intended to prevent one from accessing the unencrypted versions of the songs with a computer. The methods I posted circumvent this intent, and thus might fall under the DMCA.
Although I would argue that the methods are so general as to not violate the DMCA, because if they work then the access control was not legally "effective". (CSS was found to be "effective" because it took a new program and some reverse engineering to crack it, but if old programs, drives, or methods work, then I don't think that qualifies.) But IANAL either.
BTW, the encrypted MP3 files are presumably not copy protected at all, but rather can only be decrypted by the signed player included on the disc. (Note to self...but how do they prevent the MP3s *and* the player from being copied...hmm...just because they use Blowfish instead of rolling their own "encryption" ala CSS does not mean they necessarily know what they are doing...although presumably there's more to it than this...)
This seemed like a good idea to me, too, until I started to think about the idea that in the end, they just up the price of CD's, and we end up paying for it.
No, because at the moment this is only Universal Vivendi--only one of the big 5 record labels. Thus all the returns will only hurt Universal. This leaves three possibilities:
1) Universal does not raise prices to cover the cost of returns; Universal loses lots of money
2) Universal does raise prices to cover the cost of returns; now they are charging $2 more than the competition for people to buy defective "CDs"! Universal loses even more sales
3) Universal raises prices to cover costs and the other labels raise prices to match; the other labels make larger profits (assuming consumers don't stop buying) while Universal just breaks even; other labels steal away all of Universal's artists.
We still have a choice in this. Universal has specifically said that they will be looking at the return rates to decide whether they move all their music onto this new format. Yes, the music industry has been too dumb to realize that the reason music sales are down is because they shut off Napster. But they are not too dumb to realize that when people return their new format as defective that it isn't smart to move their entire line over to that format.
Just to clear up a bit of mis-information, SACDs are not backwards compatible with the CD standard by default. The physical media used for SACDs is high density like a DVD and the audio bitstream is not LPCM, but the specification allows for a hybrid disc with two layers where one of the layers is compliant with the traditional CD spec and made such that it will play in most CD players. Note that this is an optional portion of the specification.
Thanks for the correction; I'd assumed the hybrid disc was standard.
Is Philips still planning on not letting Universal use the standard audio CD logos on their CDs because of the Red Book compliance issues? To me that's a very strong statement.
Do we really need to wait for Philips to decide this issue for us?
The thing is, the circular platters they are selling are NOT CDs. They are a new format, designed to be partially backwards compatible with certain CD players and not compatible with certain other CD players.
Just because they store information on a thin 12 cm circular platter does not make them CDs. VideoCDs, SuperAudioCDs and DVDs also store information on 12 cm circular platters, but they are not CDs. Only Philips can sue Universal for trademark infringement on the term "CD", but we can all sue them for misleading labeling.
Or, more properly, we should pressure the retailers. After all, Universal is doing something by putting a warning label on these platters; it's the retailers who are inviting confusion by (presumably) marketing and displaying these platters in the same way that they do actual CDs.
We should be pressing the record stores to create new categories if they want to sell these platters, e.g. a "Not-A-CD" section for all Universal discs, just as they have separate sections for DVDs and, if they sell them, SACDs or VCDs. (Or perhaps "IncompatibleCD"; "ICD" for short.) Hell, they have separate sections for SACDs, and those *are* completely backwards-compatible with the CD standard!
If you invent a new and incompatible standard, you don't get to market it by inviting confusion with the dominant standard. That is illegal, even if the trademark holders of the dominant standard don't bother suing you for it.
The thing is, if you read the EULA carefully it's clear that it only applies to the software portion of this so-called "CD":
"When you use the compact disc in a CD ROM drive, the technology launches an audio player (the "Player"), and plays compressed audio files (the "Content")."
In other words, "the Content" means the encrypted MP3 files on the platter, not the fux0red uncompressed audio with the messed up error correction that plays when you stick it in a normal CD player.
Of course you are presumably bound from trying to mess with the latter due to the anti-circumvention clause of the DMCA. Although, for that to kick in, the access-protection mechanism needs to actually be "effective" in the eyes of the law; a valid case can be made that this mechanism is *not* effective, because according to various reports there are the following workarounds:
1) Certain if not all DVD-ROM drives (and perhaps consumer DVD players as well) can access tracks 2 and beyond *automatically*, with no extra user effort or loss in quality.
2) Widespread pre-existing utilities such as Exact Audio Copy are reported to be able to rip the disc (as one single .wav file) just fine, with no extra user effort or loss in quality.
3) Extracting the audio from a consumer CD player with digital-out into a sound card with digital-in should result in a perfect copy, with no extra user effort or loss in quality.
Presumably nobody accessing the audio on the disc using the above three methods could be charged with using a "circumvention device", because they were just using commonly available tools and methods which were in place before this supposed access-control mechanism was even invented. Thus in my NAL opinion, the DMCA would not apply here.
Once the content is accessed, of course you are perfectly within your rights to rip to MP3 or make a backup copy for personal use, or, under the AHRA, to make copies for your friends (as long as they are distributed non-commercially). Whether you are allowed to distribute MP3s online (e.g. through a P2P network) is still an open legal question, but distributing these MP3s is certainly no more or less illegal than distributing any MP3 from a CD you don't have the copyright on.
Since when did consumers lose all of their rights as a result of buying a product?
Since the product was software. The EULA attached to their buggy player and the encrypted MP3s is unfortunate, but as we all know, not terribly unusual for the world of software--where it clearly resides. Luckily none of its provisions--especially those regarding indemnity or reverse engineering--are likely to stand up in court.
Common Lisp contains the most advanced iteration constructs I've ever seen in any language, including C, Perl, Python, and others. It's called extended loop, and it doesn't need lots of parens. It's not used by Graham or Norvig, since Graham despises loop and OO and Norvig uses applicative style since that fits most AI problems extremely well.
Interesting point. I must confess to being scared off of loop by Graham (and by my professors) as being incredibly intuitive to read but incredibly difficult to pin down exactly what it's going to do. (Graham claims that certain implementation details which determine the order functions are executed in a loop are left unspecified in the ANSI CL specs, and thus differ from implementation to implementation.)
I can certainly say that I appreciated the times Norvig used loop in his code, because it sure does cut through the clutter of a complicated do expression.
I just looked it up and 3-SAT was proved to be NPC in 1974 by Cook and Levin.
The first problem proved NPC was general CNF SATISFIABILITY, by Cook in 1971. SATISFIABILITY can be shown to polynomially reduce to (CNF) 3-SAT (proving that 3-SAT is NPC), and 3-SAT reduces to lots of other problems, which themselves reduce to even more problems (this is the typical way to prove a new problem NPC).
I somewhat doubt that 3-SAT wasn't proven NPC until 1974, because SAT->3-SAT is a pretty easy reduction. (You just introduce a bunch of dummy variables, e.g. (a or b or c or d) becomes ((a or b or z) and (~z or c or d)) and so on; the reduction is linear time in the total number of literals in the original expression.)
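A minimal sketch of that clause-splitting step, for the curious. The representation is my own choice (DIMACS-style integers, where 3 means variable 3 and -3 means its negation), and `split_clause` is a hypothetical helper name. Clauses with three or fewer literals are passed through untouched here; a strict 3-SAT encoding would also pad those out to exactly three literals.

```python
def split_clause(clause, next_var):
    """Split one CNF clause into clauses of at most 3 literals.

    clause   -- list of nonzero ints (negative means negated variable)
    next_var -- first variable number that is free to use as a dummy
    Returns (list_of_clauses, next_unused_var).
    """
    clause = list(clause)
    if len(clause) <= 3:
        return [clause], next_var

    out = []
    # First new clause: two original literals plus a fresh dummy variable.
    z = next_var
    next_var += 1
    out.append([clause[0], clause[1], z])

    rest = clause[2:]
    # Middle clauses: chain the previous dummy to the next one.
    while len(rest) > 2:
        z_new = next_var
        next_var += 1
        out.append([-z, rest[0], z_new])
        z = z_new
        rest = rest[1:]

    # Last clause: negated final dummy plus the two remaining literals.
    out.append([-z, rest[0], rest[1]])
    return out, next_var


# (a or b or c or d) with a..d numbered 1..4 and dummies starting at 5.
print(split_clause([1, 2, 3, 4], 5))
```

A clause with k > 3 literals turns into k - 2 clauses using k - 3 dummy variables, which is where the linear bound comes from.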
Between corporate and private purchases, I'd bet 16EB worth of digital storage has been manufactured and sold in the past 24 months.
I quite agree. That's why I said that 16 EB is probably within a first approximation (i.e. a factor of 10) of the total number of bits. Then I said it was probably too small by a factor of maybe 3 or so.
I would not be surprised if the total amount of digital storage manufactured in the last 2 years was a third of the total manufactured ever. In fact, if you go by the estimate that the amount of bits manufactured per year increases at a rate of 60% a year, then the amount made in the last 2 years is well over half of the total manufactured ever!
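A quick sketch to check that claim. It assumes production grows by a fixed percentage every year starting from an arbitrary first year; the function name and the 30-year history length are my own choices, and the units cancel out so only the growth rate matters.

```python
def recent_share(growth, recent_years, total_years):
    """Fraction of all production ever that falls in the last `recent_years`,
    assuming output grows by `growth` (0.6 = 60%) every year."""
    production = [(1 + growth) ** year for year in range(total_years)]
    return sum(production[-recent_years:]) / sum(production)


# 60% annual growth over a hypothetical 30-year history.
print(recent_share(0.6, 2, 30))
```

For long histories the share settles at 1 - 1/1.6^2, a bit under 61%, no matter how many years you go back.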
64 bits should be enough for anyone.
No really, I mean it.
Clever, Ed. For those who don't get it, he's quite right: 64 bits *will* be enough for anyone.
For those still stuck in mid-90's video game wars, "bit-edness" in the real world refers (technically) to the size of your general purpose integer registers, which, for most intents and purposes, refers to how many memory addresses you can easily and quickly address. 32 bit addressing tops out at 4GB, a value which is often too small for e.g. large databases, which thus tend to live on 64-bit big iron machines. (MS has a hack to give x86 processes access to 36 bits of space, but it requires OS intervention.)
64 bits, on the other hand, works out to 16 billion GB. (That's 16 exabytes IIRC.) For reference, that's roughly 40 times as much memory capacity as there currently is DRAM produced (of all types, for all markets) worldwide in a year, at this January's rate.
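The address-space arithmetic, as a small sketch. It assumes byte-addressable memory and binary units (1 GB taken as 2^30 bytes, 1 EB as 2^60 bytes), which is why the 64-bit figure comes out slightly above a round 16 billion; the names are mine.

```python
GIB = 2 ** 30  # bytes in a binary gigabyte
EIB = 2 ** 60  # bytes in a binary exabyte


def addressable_bytes(address_bits):
    """Bytes reachable with `address_bits` bits of byte addressing."""
    return 2 ** address_bits


for bits in (32, 36, 64):
    total = addressable_bytes(bits)
    print(bits, total // GIB, "GB")

print(addressable_bytes(64) // EIB, "EB")
```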
I don't have the figures on hand for hard drive production, but I would guess as a first approximation that 16 billion GB is not quite equal to the total number of bits of digital storage of all kinds manufactured throughout computing history up until today. (I'd guess it's too small by a factor of 3 or so.)
In other words, it's quite a lot. Presumably computing will have run into some very different paradigm (wherein the bit-edness of the "CPU" is no longer an applicable term) before any computer has a use for >64 bit addressing.
(FWIW, today's 64-bit processors don't offer all 64 bits of data addressing yet, because no one has a need for more than 40-something, so that's what they offer.)
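If you want to check the address-space numbers yourself, here's a throwaway sketch. It assumes byte-addressable memory and uses binary units (1 GiB = 2^30 bytes); the helper name is just for illustration.

```python
GIB = 2 ** 30  # bytes in a binary gigabyte
EIB = 2 ** 60  # bytes in a binary exabyte

def addressable_bytes(bits):
    """Bytes reachable with `bits` address bits, one byte per address."""
    return 2 ** bits

for bits in (32, 36, 64):
    print(bits, "bits:", addressable_bytes(bits) // GIB, "GiB")
```

That gives 4 GiB for 32 bits, 64 GiB for the 36-bit hack, and 17,179,869,184 GiB (i.e. 16 EiB, the "16 billion GB" above) for 64 bits.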
Note, first of all, that it is in fact quite true that Intel is (planning on) dropping all new RDRAM-based designs from their x86 chipset lineup soon. The last "new" RDRAM chipset Intel releases will be the upcoming 850E, which is just the 850 but speed bumped to work with the upcoming 533MHz FSB P4s (due in April), and to match them with PC1066 RDRAM. Of course, this is very old news, known from Intel roadmaps obtained by everyone and their mom back in October or November.
But that's not what the article says. It's talking only about chipsets for servers and workstations, where, indeed, the 860 is being replaced by the just introduced dual-channel DDR E-7500 (Plumas) and the upcoming dual-channel DDR Placer (as well as a just-introduced chipset from Broadcom), and where the 850 will be replaced by the dual-channel DDR Granite Bay chipset, due in Q3 or so.
Thing is, dual-channel DDR for the *desktop* won't arrive from Intel until sometime in 2003, with the Springdale chipset. (Dual-channel DDRII, in fact.) VIA and SiS are both trying to get their dual-channel DDR chipsets out in time for the 533 FSB P4s (doubtful, but they should be in full swing by Q3), but, again, if you want the very highest-performing P4 desktop, and you want an Intel chipset, you'll either need to ridiculously overpay for a Granite Bay (workstation oriented) motherboard, or you'll have to use the 850E with PC1066 RDRAM, or you'll have to wait until Springdale in 2003.
So, to reiterate:
1) Yes, RDRAM is gone from all future Intel chipset introductions save the 850E, which is just a speed bump, not a new chipset.
2) But that's not what this article is talking about; it's only talking about servers and workstations.
3) RDRAM won't be completely gone until there is a dual-channel DDR chipset to replace it on the desktop; soon from VIA and SiS, not until 2003 from Intel.
When you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions. Double precision float are very far from being a priority. What kind of data range from 10E-1024 to 10E1023 ? The forces applied on the different areas of a space shuttle entering the atmosphere ? A simulation of a nuclear explosion ? The 10E-128 to 10E127 of the single precision is more than enough for most of the situations. The G4 floating point units support double precision but not altivec.
Yes, most consumer applications use single-precision floats. However, most HPC code uses doubles. The original poster was positing that the G4 would be a good replacement to all the 64-bit CPUs that are getting pushed out by Itanium because of its fp number crunching abilities (i.e. for HPC workloads). This is wrong in almost every way possible:
1) the G4 doesn't have the SISD execution resources necessary, because the G4's fp units are underpowered
2) the G4 doesn't have the SIMD execution resources necessary (even if all HPC code would magically be vectorized), because Altivec doesn't do doubles
3) the G4 (as it appears in Macs) doesn't have the DRAM bandwidth necessary
Indeed, the current top of the line, a dual 1GHz PowerMac G4, would be about the worst possible choice for replacing big iron HPC machines, even if a strong FORTRAN compiler existed.
My points were all in reference to this proposed use for the G4. But yes, you're quite right that the G4 isn't nearly as inferior when it comes to desktop workloads.
Of course, your assertion that "when you design a processor, you look at what kind of job it'll be doing and optimize it for the most frequently used instructions" is both extremely true and extremely ironic, because the G4 is much more widely used as a signal processor in various embedded systems than as a general-performance desktop CPU. This is why it has comparatively meager OoO abilities and why it allowed itself to be saddled with an overpowered vector unit which gates clock rampability, while leaving its SISD execution units relatively underpowered. OTOH, I'm fairly sure the G4 has been equipped with DDR in its embedded incarnations; certainly that particular fault can't be blamed on the chip's intended design.
On any system, the memory is a bottle-neck but the problem is DRAM chips, not bus.
Huh?? Um...try sticking some faster DRAM in a PowerMac (e.g. PC2100, PC2700, RDRAM, etc) and tell me how that helps!
As long as the bus support what the dram chips can spit.
Which the current G4 bus cannot. The difference, so far as I can tell, is semantic. (It's also wrong; DRAM chips can be made to run at fantastic speeds for reasonable prices: witness current high-end 3d cards, with DRAM bandwidth of 7, 8, and in the case of the newest GF4s, >10 GB/s! The problem is coming up with a bus which can handle all that throughput in the much noisier and more complex environment of a motherboard with socketed DRAM, as opposed to a small card with soldered DRAM.)
3D games send lists of polygons to the memory card. Even if we imagine a real crappy game using absolutely no acceleration, 30fps*1024*768*3 = 67 MB/S even if we double it for memory reads and add a lot of disk and network use, we're still far from 1GB/S.
This is quite incorrect. First of all, what you're imagining is not a "crappy game", and not even a non-interactive rendering, but a non-interactive video playing in 24-bit color. If we want to turn the calculation into how much polygon data gets sent to the video card, we'll replace the number of pixels (i.e. 1024*768) by the number of polygons in the scene (say, 100,000), and the number of bytes per pixel (3) by the number of bytes per polygon (3*32-bit spatial coordinates for each of 3 vertices plus for the normal = 48 bytes; note that this doesn't include other information which needs to be sent along with the polygon, e.g. pointers to its textures, etc.) So instead we've got 30fps*100,000*48bytes = 144MB/s. Of course, 30fps looks pretty shitty, so for decent immersion we're actually looking at 60fps, or 288MB/s. And, again, this leaves out all the other polygonal data besides the vertices and normal.
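Here's the arithmetic from both estimates in one place, as a quick sketch. The function names are mine; the 100,000 polygons/scene figure is the guess from above, and the 48 bytes assumes 32-bit floats with nothing shared between polygons.

```python
BYTES_PER_COORD = 4  # one 32-bit float

def polygon_bytes(vertices=3, normals=1, coords=3):
    """Bytes for one triangle: 3 vertices plus 1 normal, 3 coords each."""
    return (vertices + normals) * coords * BYTES_PER_COORD

def stream_rate_mb_s(fps, items_per_frame, bytes_per_item):
    """Throughput in decimal MB/s (10**6 bytes)."""
    return fps * items_per_frame * bytes_per_item / 1e6

print(polygon_bytes())                                   # 48
print(stream_rate_mb_s(30, 100_000, polygon_bytes()))    # 144.0
print(stream_rate_mb_s(60, 100_000, polygon_bytes()))    # 288.0
print(stream_rate_mb_s(30, 1024 * 768, 3))               # 70.77888
```

The last line is the parent's framebuffer number: 70.8 decimal MB/s, which is the same thing as the quoted 67 if you count in units of 2^20 bytes (67.5 exactly).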
And, of course, we're just talking about data that comes from the CPU and is sent over the AGP bus to the graphics card; in a real game situation the DRAM most certainly does not store the predetermined locations of every polygon in the game world (how could it know?)! Um, so this has very little in fact to do with what we're talking about.
I can't provide solid approximations of how much DRAM traffic a real 3d engine actually provides, except by pointing out that in non-graphics-card-limited situations, a high-end K7 can easily gain 20-30% in fps by replacing a good PC133 chipset with a good PC2100 chipset. For example, see the last benchmark on this page. The KT266A beats the KT133A by 27%, and the nforce increases that to 32%, albeit with an ill-utilized-but-still-there dual-channel configuration. This is direct real-world proof that moving from PC133 to PC2100 can gain you ~30% in very common consumer desktop applications. (In fairness, the Serious Sam engine is known to rely particularly heavily on DRAM throughput where, for example, the Q3 engine seems to care more about throughput and latency on the cache level. Still, 20-30% is a pretty fair estimate overall.)
A large register set allow you to work with your data without having to read and write to memory all the time which is the biggest time waster on any modern system.
Having 32 GPRs (PPC) instead of 8 (x86) means you need to use an astonishing 24x4=96 bytes of cache to make up the difference. Most modern processors have L1 caches of slightly more than 96 bytes in size. Having more GPRs is good, but it has nothing to do with saving memory traffic on any higher levels of the memory hierarchy than L1-to-register and back.
Like you said, most of the accesses on a intel will be handled by L1 but you still have to calculate the physical address from the logical address in the instruction which means page table lookups ( ie other memory accesses ) and the arithmetic operations associated.
Wrong. Try looking up how a cache works, how memory pages work, how virtual memory works, etc. (Short version: believe it or not, but they've solved that problem. Logical->physical translations are cached in the TLB, so the page tables only come into play when you miss in the TLB. OTOH, there is still work associated with calculating the address, but this is why all modern CPUs have separate execution units to allow for the computing of memory addresses without clogging up the ALUs.)
OTOH, the effect I pointed out--that RISC code is more bloated than CISC code--*does* have an impact on the level of DRAM-to-CPU throughput, albeit, like I said, usually only in particularly ugly integer code.
The algorithms for most heavy fp number-crunching, in contrast, are usually pretty good, and the code pretty tight. But the datasets are often too large to fit in even a 2 MB cache, and the nature of the calculation is often such that it is gated by memory throughput. There's just no way to run these sorts of calculations on a computer with a 1 GB/s bus and have it be anything but slow. And this is without even considering the folly of having a 1 GB/s bus that's supposed to keep 2 CPUs fed with data from DRAM *and* carry all the messages passed from one CPU to the other. (e.g. anytime CPU 1 wants data that's in that nice big 2 MB L3 cache of CPU 2...)
Altivec is a vector unit that operate on integer and IEEE-754 floating-point. Vectors are 128 bits arrays of 16 * 8bit integer or 8*16 bit integer or 4 * 32 bits integer or 4 * 32 bits IEEE-754 floating point.
Thanks; did not know that. But I was thinking of IEEE double-precision floats, because the original topic was whether the G4 could be a replacement for the big iron chips discussed in this article. Those are not included in Altivec, as I suspected. (They are in SSE2, though.) But I should have been more clear.
And for the bus part, DDR is only a small improvement on the pc133 bus. If you think it's twice or more the speed of pc133, you're completely wrong. And what kind of code do really saturate the memory bus ? G4 like intel or AMD are aimed at desktop computing uses.
For many pieces of useful desktop code--some 3d game engines, streaming media encoding/decoding, etc.--the performance increase could easily be 30% or more on a single processor machine. (As for a dual-G4, the performance increase depends too much on how scalable the program is, etc. But using two processors sharing a single 1 GB/s bus for floating point work is ridiculous, as I said.) For many fp-intensive HPC-style programs, the performance increase could be 80% or higher.
And yes, I know that no one would use a G4 to run that sort of stuff in the first place (although P4s would be a great choice if the application could be parallelized to a cluster of PCs); but the thread was discussing using the G4 as a replacement for big iron machines, so yell at previous posters for getting on the topic first, not me.
Most memory accesses are filled by the caches
Yes but most of the *time* spent waiting for memory accesses is spent waiting for memory access from DRAM.
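To see how both statements can be true at once, here's a toy calculation. The hit fractions and latencies below are made-up round numbers chosen purely for illustration, not measurements of a G4 or any other real CPU, and the model ignores overlap between accesses.

```python
def time_shares(levels):
    """levels: list of (name, fraction_of_accesses, latency_cycles).
    Returns each level's share of total memory-access time."""
    cost = {name: frac * latency for name, frac, latency in levels}
    total = sum(cost.values())
    return {name: c / total for name, c in cost.items()}

# Hypothetical numbers: 97% of accesses hit in cache, 3% go to DRAM.
EXAMPLE = [("L1", 0.90, 2), ("L2", 0.07, 12), ("DRAM", 0.03, 200)]

for name, share in time_shares(EXAMPLE).items():
    print(name, round(share * 100), "% of access time")
```

With these assumed numbers only 3% of accesses reach DRAM, yet they eat roughly two thirds of the total access time. Change the inputs and the split changes, but the DRAM latency gap is what drives it.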
RISC CPU ( g4,mips,sparc,... ) work mostly on register data reducing the memory access speed dependency.
A larger register set means less spilling to L1 cache; it has no effect on memory usage from L2 on out. If your application has a 10 MB dataset, then it has a 10 MB dataset, period.
Incidentally, RISC code is bloatier than CISC code because it uses more instructions of fixed instruction length to do the same amount of work than fewer, often shorter CISC instructions can do. This of course makes the CPU core simpler to design and therefore faster with a given set of design and implementation quality. However it does lead to significantly larger binaries, and thus a larger bandwidth burden on all levels of the memory hierarchy. This very often leads to RISC processors placing a higher strain on DRAM bandwidth. However, these issues generally only show up in integer code, not in fp code which was the original subject. Still, it is generally the case that a RISC processor needs more DRAM bandwidth than a CISC processor to achieve the same level of memory performance. (Although, as you point out, L1 data latency is more critical for an ISA with a small register set like x86 than a large one like all standard RISC ISAs.)
Just to be clear: I'm not arguing that RISC is not clearly superior to CISC as an abstract design philosophy, because it absolutely is. Just that RISC does have some negatives, including code density. And that even saddled with their inferior ISAs as they are, the P4 and K7 are among the fastest performing single CPUs in the world, and blow everything else away in terms of price/performance. And that the G4's fp performance sucks, which it does.
For christ's sake, the inderned != the usa.
I know that. My post referred to the US because all the legal issues surrounding Napster and its shutdown played out in the American courts and in the American Congress. (Of course the same laws are now on their way to the rest of the world courtesy of the WTO.) My point was that the current American interpretation of American copyright law is clearly at odds with both the economic impact on the record labels and the clear will of the American public. The point doesn't apply so much to the rest of the world because their laws aren't as backwards (yet) and because their laws were not relevant to the case at hand.
And I used the # of teenagers in America as a brief shorthand proof that most Napster users were not teenagers. A more explicit version would have gone like this: "Americans made up very roughly half of all Napster users, meaning there were ~2.5x as many American Napster users as there are American teenagers. Furthermore, we shouldn't expect the ratio of teenager to adult Napster users in the US was much different than anywhere else. Etc."
So, no slight or offense meant to the rest of the world (in fact, I'm moving to the r.o.w. after I graduate this year, and looking very much forward to it). Just that talking about Americans focused a couple of my points a bit better.
Most prior Napster users were teenagers who simply didn't want to shell out a few bucks to get the latest Britney album. Most aren't adults, and most certainly DO NOT share your view of expanding one's musical horizon.
There were upwards of 60 million Napster users when it got shut down. "Most" of them were not anything. ("Most" of them were certainly not teenagers; there are only ~12 million teenagers in the US total!) And, as there were something like 40 million unique mp3s floating around Napster, I guess "most" of them actually were listening to more than just the latest Britney album. (Otherwise that's a pretty long album...)
If you want a "most", here it is: most Americans with Internet access in early 2001 used Napster. The overwhelming majority of those who could feasibly use it (i.e. those with broadband connections) used Napster. And while I can't speak for all or even "most" of them, I know I have never, not once, felt guilty for downloading music from the Internet, nor has anyone I've spoken to about the subject.
And, like the original poster, I most certainly increased my CD buying as a direct result of Napster. I can't say whether such behavior reflected the majority or minority of Napster users, but considering the almost precise correspondence between growing (then suddenly falling in spring 2001) online music trading and growing (then suddenly falling in spring 2001) record sales, the statistics strongly support the former.
Laws are supposed to arise from the consent of the governed. When most of the governed are engaging in an activity with a clear conscience, it probably shouldn't be illegal, unless it carries some hidden negative consequences unseen by the uneducated majority. In the case of Napster, though, there were two hugely positive consequences: free access to the largest cultural repository the world had ever seen, and increasing CD sales to boot.
The argument that we should suddenly rewrite and reinterpret the past 200 years of copyright law (in which noncommercial infringement was generally held to be inactionable) just to kowtow to what the misguided oligopoly trying to retain their control over mass expression and culture mistakenly feels is their own self-interest is utterly absurd. The fact that you feel "guilty" about it (and project that guilt onto 60 million others) is just pathetic.
Also featuring stinking fast floating point.
Completely wrong. The G4 has some of the slowest IEEE floating point this side of a StrongARM.
Presumably you're confusing AltiVec with "floating point". AltiVec is a vector unit, not a floating point unit.
True, many AltiVec operations operate on vectors of floating point values (don't think they're IEEE, though), but that is most certainly not the same thing as normal floating point performance. Only a small subset of all floating point calculations can be effectively vectorized, and doing so requires extensive reprogramming (not just recompilation).
In any case most real-world floating-point applications have heavy bandwidth requirements and large datasets, which brings us to another glaring weakness of current Macs, namely their paltry PC-133 memory bus. Sticking two 1 GHz G4s on the same shared 64-bit PC-133 bus is almost comedy. (Tragicomedy if you want to run serious HPC workloads.) Sure the L3 helps if your dataset is 2MB and well-behaved, but if you think PC-133 is still ok for a modern desktop PC, perhaps you ought to think about *why* Apple and Moto had to start adding L3 cache and a backside bus to each processor when every other desktop CPU gets by with much cheaper on-chip L2 and a modern memory bus.
(Yes, I fully expect that DDR G4s will be available real soon now, but until then the situation on the high-end of Apple's desktop is just embarrassing. As for floating point performance, ditch the 64-bit PC-133 shared bus for a 128-bit PC2100 memory system and P2P interconnect and then that dual-GHz G4 might start looking credible...if only the G4 had decent fp number-crunching power.)
Can any media truly be 'copy protected'? If all else fails I can use a program like Ghost2002 or other forensic-certified disk duplication software to do a bit by bit copy. Basically make an exact duplicate of a disc.
How would this be unplayable?
CDS works by purposely introducing errors into the audio data on the disc. Audio CD players are supposed to interpolate across the errors such that there is supposed to be no difference in sound quality. But CD-ROMs--being designed to read data CDs where every bit has to be correct--don't do this interpolation, and thus they see the disc as having lots of errors and crap out. You can't make an exact copy of the disc if your CD-R can't read it.
At least that's what's supposed to happen. It has since come out that 1) many DVD-ROMs read the discs just fine; and 2) *certain* combinations of CD-Rs and ripping software can manage alright.
You should have said "Approximations of this is the sort of thing..."
Yeah, you're right. When I wrote that I was thinking of one of Carmack's .plans when he mentioned a paper proving that, in the limit, successive passes through shaders can approximate the output of true ray-tracing arbitrarily closely. So, in the limit, shaders can get you exactly the same output. But of course this is a very theoretical result with no direct impact on the real world, and it gets there (I think) only by successive approximation anyways.
My larger point was that as graphics get better and better, pixel shaders will be used more and more to perform functions which, first, require input from other parts of the image and, second, induce alpha channel effects--both of which fit poorly in the tile-based rendering paradigm--and this is true whether the functions are equivalent to ray-tracing or just approximations of it. But you are quite correct to differentiate between the two.
More shaders, More pixel pipelines, More memory bandwidth... whoopee...
When the hell are they going to ditch the antiquated scanline rendering method and go work on some tile based rendering methods?
Probably never, and for very good reason. Tile-based rendering is a very efficient architecture whose time has already come and gone.
For those who don't know, tile-based rendering divides an image up into a number of smaller squares ("tiles") and renders them independently, as opposed to the traditional method ("immediate-mode rendering") of rendering an image one polygon at a time. The major benefits claimed for tile-based renderers are that the process is more parallelizable (no risk of two chips rendering to the same area if they are working on different tiles) and that it is an easy modification to check each polygon's z-buffer (its distance from the camera) as you add it to the poly-list for its tile, and then to only texturize those polygons which are not occluded (i.e. actually visible). This is in contrast to the traditional immediate-mode rendering algorithm, where polygons are textured more or less in random order, leading to situations where a polygon will go through the entire process of being textured and rendered, only to later be completely covered up by a later poly--a situation which wastes a lot of (especially) memory bandwidth, fetching all those useless textures and such.
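To make the overdraw difference concrete, here's a toy sketch that just counts how many pixels get textured under each scheme. Big assumptions: opaque axis-aligned rectangles (x0, y0, x1, y1, z) stand in for polygons, smaller z is closer, and the function names and 8-pixel tile size are made up. It's nothing like how a real chip is organized.

```python
def immediate_mode(rects, width, height):
    """Texture every fragment that passes the depth test when it is drawn."""
    depth = [[float("inf")] * width for _ in range(height)]
    textured = 0
    for x0, y0, x1, y1, z in rects:  # polygons in submission order
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < depth[y][x]:
                    depth[y][x] = z
                    textured += 1  # wasted if something nearer lands later
    return textured

def tile_deferred(rects, width, height, tile=8):
    """Bin polygons per tile, resolve visibility, then texture once per pixel."""
    textured = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Poly-list for this tile: everything overlapping it
            binned = [r for r in rects
                      if r[0] < tx + tile and r[2] > tx
                      and r[1] < ty + tile and r[3] > ty]
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    hits = [r for r in binned
                            if r[0] <= x < r[2] and r[1] <= y < r[3]]
                    if hits:
                        nearest = min(hits, key=lambda r: r[4])
                        textured += 1  # only `nearest` gets textured
    return textured

# Three full-screen layers on a 16x16 screen, submitted back to front
back_to_front = [(0, 0, 16, 16, 3.0), (0, 0, 16, 16, 2.0), (0, 0, 16, 16, 1.0)]
print(immediate_mode(back_to_front, 16, 16))  # 768
print(tile_deferred(back_to_front, 16, 16))   # 256
```

Note that if the same three layers are submitted front to back, the immediate-mode count drops to 256 as well, which is why submission order and early culling matter so much for immediate-mode cards.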
Cool! Sounds great! Let's hear it for tile-based rendering! Too bad ATI and NVIDIA have clearly never ever heard of this miracle technique! After all, it's not like they would ever make (gasp!) an informed choice not to use it!
Well...not so fast. Basically what we've seen is that tile-based rendering offers two potential benefits: it eliminates *some* of the complexity of enabling multi-GPU implementations, and it uses quite a bit less memory bandwidth in the base case. The problem is that both of these supposed benefits really buy you very little when designing a consumer-level graphics card today.
First, the problem of "dividing up the work" isn't really what's preventing multi-chip graphics cards these days. Indeed, it's really a rather easy problem. Here's a clue: have alternate chips render alternate frames. Gee...that wasn't so tough, now was it? Well, no. But the other problems of implementing a multi-chip card for the consumer market sure are. For example, we have our choice of implementing an (expensive, performance gating) point-to-point bus to handle memory traffic (and have memory bandwidth/chip cut in half anyways), or of completely mirroring the memory, using twice as much for the same capacity (expensive). Then there's the cost of a second chip (expensive), the cost of packaging the second chip and connecting it to memory (expensive), and the cost of the extra power and cooling, the cost of trying to squeeze it all onto one card (results in a bigger, more expensive card; may gate clockability). And this is without mentioning the extra development and debugging time that goes into getting a multi-chip solution to work correctly. (In general this is one of the most difficult issues design engineers face.) Golly, it's almost enough to make you remember how when 3dfx tried to make a multi-chip product it was 6 months late, the single-chip card was far too slow, the double-chip (and cancelled quad-chip) card too expensive, and, due to the release delay, no longer competitive. (OTOH John C has hinted that a scalable multi-chip architecture might be on the way from one of the major players. Tie that in with the fact that Anand reports the GF4 will be the last to use the GF name, and that NVIDIA owns the remnants of 3dfx, and I start scratching my head...)
Second, the problem of memory bandwidth. Or rather, the former problem of memory bandwidth. Yes, the traditional rendering pipeline is very inefficient with memory bandwidth. Thing is, the prices on high-speed DDR have been coming down so fast that it hardly matters. You can find a Radeon 7500 with 64MB of 128-bit-wide DDR running at 2x230 MHz (i.e. 7.4GB/s bandwidth) for as low as $85 on pricewatch.com. (Actually there's one for $79 but it may be mislabeled.) The memory is probably less than $30 of the cost. Or maybe even less--the 64MB and 32MB GF2Pros (6.4GB/s bandwidth) only differ by $6. And the new GF4 MX460 hits the street with 64MB of 2x275 MHz DDR (8.8GB/s) for $179, list, on a brand new card.
As for the price premium of using relatively high-speed DDR instead of the same amount of SDRAM, it's pretty negligible. Even for the highest speed DDR it's not such a big deal. Sure NVIDIA charges an extra $100 for another 25MHz on the GPU and an extra 1.6GB/s from the memory (GF4 Ti4600 vs. Ti4400), but that doesn't mean it costs them anywhere near that much (depending on GPU yields). It just means they like to bilk those in the $400-for-a-video-card crowd for the full $400. So how much does the stuff cost? Well...Hynix recently announced samples and volume production of 2x375 MHz x32 DDR selling at $10 for 128Mbit chips. That means $40 for 64MB of 128-bit-wide DDR with 12GB/s bandwidth. Not too shabby.
Ok, ok...so maybe the benefits of tile-based rendering don't really mean all that much in today's consumer GPU market. But better is better: why wouldn't ATI and NVIDIA use tile-based architectures for the benefits they do provide? After all, it's not like there might be some (gasp!) downsides to tile-based rendering!
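For anyone checking the bandwidth figures in this post and the previous one: peak bandwidth is just bus width times clock times 2 transfers per clock for DDR. A quick sketch (function names are mine; these are theoretical peaks in decimal GB/s, not sustained throughput):

```python
def ddr_bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=2):
    """Peak bandwidth in GB/s (10**9 bytes) for a memory bus."""
    return bus_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9

def chips_needed(bus_bits, chip_width_bits):
    """How many chips of a given data width fill the bus."""
    return bus_bits // chip_width_bits

print(ddr_bandwidth_gb_s(128, 230))  # Radeon 7500 figures: about 7.4
print(ddr_bandwidth_gb_s(128, 275))  # GF4 MX460 figures: 8.8
print(ddr_bandwidth_gb_s(128, 375))  # the Hynix parts: 12.0

n = chips_needed(128, 32)            # 4 chips of x32 fill a 128-bit bus
print(n * 128 // 8, "MB for $", n * 10)  # 4 x 128 Mbit = 64 MB for $40
```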
Well, actually, there are. For one thing, it's more difficult to design a tile-based GPU and get it running at high speeds. For another both NVIDIA and ATI have years and years of research and experience with implementation techniques and algorithms for immediate-mode renderers, much of which wouldn't apply to tile-based designs.
For another, neither ATI nor NVIDIA really uses traditional immediate-mode rendering anymore. Instead they use modified immediate-mode rendering, with lots of algorithmic tricks and tweaks to lessen the memory bandwidth inefficiencies of traditional immediate-mode rendering. Things like lossless z-buffer compression and various early polygon-culling algorithms. No they aren't quite as effective in reducing overdraw as tile-based rendering, but they provide quite a significant benefit. Indeed, the GF4 Ti4600 has more or less caught up with the (tile-based) KyroII in Kyro's own villagemark benchmark, which is contrived entirely to test massive overdraw of the sort which is never encountered in a game. The KyroII is only 8 months old. Sure it's much much cheaper than a Ti4600, but if Kyro can barely keep the lead in the one benchmark specially designed to make the case for tile-based rendering then something is wrong here.
Meanwhile there are very serious issues with the ability of tile-based rendering to scale to meet future challenges. In particular, the tile-based rendering algorithm works very naturally so long as there are no polygons which find themselves spread into more than one tile, and so long as you don't use transparent or translucent textures. Of course it's not that tile-based chips can't handle these situations--the KyroII is here and works just fine, after all--but just that they require complicated workarounds which are more inefficient than for immediate-mode rendering, which handles these cases naturally.
The problem is that both cases are going to be more and more likely as graphics continue to improve. As tile-based rendering tries to scale with increasing scene polygon counts and resolutions, you get more tiles per scene and many more polygons crossing tile boundaries. And as graphical effects get more realistic, the alpha channel (i.e. transparency) starts coming into play more and more. Indeed much of the recent research in non-real-time computer graphics has focused on adding translucent "subsurface" reflections to the ray-tracing algorithm. This (and approximations of it) is the sort of thing that future pixel shaders are going to be called on to do, and tile-based rendering is a bad match for it.
Indeed, most of the recent advances in graphics are pointing towards a world in which the assumptions which tile-based rendering is based on no longer hold. How, for example, does tile-based rendering handle cubic environment mapping across tile boundaries, or cast dynamic shadows across tile boundaries? What happens if a dot3 bump map extends a texture from one tile into another? I'm sure clever solutions can be found to these and all the other dozens and dozens of issues that will arise when you try to mix DX8-style effects and tile boundaries, but the main point is that tile-based rendering was an algorithm developed under two assumptions which increasingly do not hold:
1) If one polygon occludes another, the other's texture will never be visible to the camera;
2) Objects in one section in the screen can be rendered without reference to any other parts of the screen.
Of course, we may never know the difficulties of trying to make a DX8-compliant tile-based renderer; after all, the KyroII hasn't even made it to DX7, since it is still missing integrated T&L. I have no idea whether this is because of any difficulties integrating T&L with a tile-based rendering pipeline (can't think of why it would be a problem, but it may be), or just because the Kyro doesn't have the money or manpower behind it to keep up with 3 year old technology, but this lack is already preventing the KyroII from competing effectively with the cheaper GF2MX on modern high-poly games. I am pretty sure that integrating a programmable pixel shader into a tile-based architecture would be pretty tough, if not pretty impossible.
Which brings me to the main point: you started out writing "More shaders, More pixel pipelines, More memory bandwidth... whoopee..." and in a sense, this is the right attitude. To which we should very quickly add "tile-based screen division...deferred rendering algorithm...whoopee..." All these technical details only mean something insofar as they give us the capability for more realistic graphics--this means high FPS, high color depth, higher resolutions, lack of aliasing problems, high-quality mip-mapping/anisotropic filtering, realistic--or even dynamic--lighting and shadows, realistic and/or impressive pixel effects, high polygon counts, useful and realistic vertex effects, etc.--for a reasonable price. It is pretty damn hard to argue that the last few years, under NVIDIA's leadership (and ATI's pursuit) have not resulted in huge improvements on these measures. Again, the new GF4 Ti4600 may be ridiculously expensive and may not change your experience with today's games very much (besides enabling 1600x1200x32 with 4xAA at playable framerates), but when the new Doom game comes out, a card with similar specs and selling for ~$100 will bring you decent performance on an engine which offers a totally new level of graphical realism. Same thing when Unreal Warfare, Unreal 2, Deus Ex 2, and all the other Unreal 2-engine games start coming out. Believe me, a GF4 caliber card will improve the experience of playing those and later games significantly over a GF3 and especially a non-DX8 compliant card like a GF2 (and, sadly, a GF4MX). And, believe me, those games are going to provide significantly more realistic graphical experiences than those of today.
Immediate-mode rendering is doing just fine, and the GF4 marks an evolutionary but very significant improvement to the state-of-the-art. A switch to tile-based would require significant retreading to reach the same level, and might form a poorer basis for future improvements. But, if I'm wrong, then ATI and NVIDIA will make the switch. Believe me, they know all about tile-based rendering, and NVIDIA even owns Gigapixel (via 3dfx) and their tile-based rendering engine. I think they'll stick to modifications of immediate-mode rendering, but no matter what they do it will be whatever they think offers the best graphics performance at the lowest cost to them.
And now to correct some minor misconceptions in your post:
Hell, the reason why the Geforce line has to keep doubling its fill rates every generation is because its architecture is so god damn inefficient. Look at the memory bandwidth requirements for the cards!
The reason the GeForce line increases its texel fill rates continually is because consumers want to run new games which have higher multi-texturing requirements (Carmack has said Doom3 will have something like 8 textures/pixel), and to run existing games in higher resolutions and at higher FPS.
The memory bandwidth "requirements" for the cards don't matter, only the prices. If a recent card with 7.4GB/s only costs $85 (Radeon 7500) and a brand new card with 8.8GB/s lists for $179, then the costs of increasing memory bandwidth are obviously not so terrible. Today's $400 card is next year's $80 card. Similarly, immediate-mode rendering's inefficiencies need to be measured according to their dollar costs, not their bandwidth costs.
Instead of using the relatively limited bandwidth of AGP for streaming textures from main memory (where it should god damn be) to the texture cache, the card is busy wasting bandwidth on the damn Z-buffer (which would be eliminated if they implemented hidden surface removal like the PowerVR chipsets).
???
First off, textures most certainly should not "god damn be" in main memory! The AGP bus is there to stream vertex data from the CPU (pre- or post-transformation, it's the same amount of data). That's all it's there to do, and good thing, too, because today's high-poly games can already generate enough vertex data to make AGP 2x a bottleneck, and those of a couple years from now will do the same to AGP 4x. (Which is why AGP 8x is on the horizon.) Increasing the bandwidth of a bus from the northbridge across the motherboard through a slot to an add-on card is a whole lot harder than increasing the bandwidth from soldered DDR to a soldered GPU a few centimeters away. AGP should only carry the data which it absolutely is forced to--namely initial vertex data from the game's engine running on the CPU.
Z-buffer lookups only waste bandwidth between the GPU and the on-card memory. Technically, you don't eliminate z-buffer lookups with a tile-based architecture; you eliminate texture lookups (and texture application) on occluded polygons. However, by dealing with a small tile at a time, you can read all the z-buffer data for the tile in from memory all at once, and store it in an on-chip cache until you're done with that tile. (This is essentially why higher poly-count games mean smaller and smaller tiles.)
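To make the texture-lookup point concrete, here is a toy sketch in Python. It is not how any real GPU works: the function names, the (x, y, depth) fragment format, and the 4x4 "tile" are all made up for illustration. It only counts texture fetches when fragments are textured as they arrive versus when visibility for the whole tile is resolved first.

```python
def immediate_mode_fetches(fragments, width, height):
    """Texture each fragment as it arrives, if it passes the z-test."""
    zbuf = [[float("inf")] * width for _ in range(height)]
    fetches = 0
    for x, y, depth in fragments:
        if depth < zbuf[y][x]:       # z-buffer read
            zbuf[y][x] = depth       # z-buffer write
            fetches += 1             # textured now, even if overdrawn later
    return fetches


def deferred_fetches(fragments):
    """Find the nearest fragment per pixel first, then texture only those."""
    nearest = {}
    for x, y, depth in fragments:
        if depth < nearest.get((x, y), float("inf")):
            nearest[(x, y)] = depth
    return len(nearest)              # one texture fetch per covered pixel


def full_tile_layer(depth, width, height):
    """Hypothetical helper: one polygon covering every pixel of the tile."""
    return [(x, y, depth) for y in range(height) for x in range(width)]


W = H = 4
# Three overlapping layers submitted back to front (the worst case for overdraw).
back_to_front = (full_tile_layer(3.0, W, H)
                 + full_tile_layer(2.0, W, H)
                 + full_tile_layer(1.0, W, H))
front_to_back = list(reversed(back_to_front))

print(immediate_mode_fetches(back_to_front, W, H))  # 48
print(deferred_fetches(back_to_front))              # 16
print(immediate_mode_fetches(front_to_back, W, H))  # 16
```

Note the last line: submit the same geometry front to back and the immediate-mode count drops to match, which is why depth sorting and early z-rejection narrow the gap without going tile-based.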
And last, they do implement hidden surface removal techniques, like I pointed out before, even though they are less effective than with a tile-based architecture.
The ISAs are the same and have been since at least the Power3.
So how 'bout those IBM Altivec chips? And those 64-bit G4s??
Both use their own incompatible supersets of PowerPC. In case it wasn't clear, it's the supersets that are incompatible, not the core PPC ISA.
According to the Motorola PowerPC roadmap, the G5 will be available in both 32 and 64 bit versions. How much it resembles Power4 isn't clear, but it's supposed to debut at up to 2 GHz. Are you still so confident it won't have world-class performance?
:)
;-)
No. According to the Motorola PowerPC "roadmap" (I'm sure they have more informative roadmaps internally and released to their partners, but god is that thing vague!) the G5 will debut at 800MHz and up, and eventually scale over its lifetime to 2GHz or maybe even higher. Am I positive that it won't have world-class performance? No, but I would guess that it won't based on the fact that it's being designed at a 2nd-class design firm which has fallen significantly behind of late, and that it's primarily targeted at embedded systems, not desktop PCs. Of course, the K7 was designed at a previously-thought-to-be-2nd-class design firm which had fallen significantly behind on their previous core, and was arguably rushed out to replace a rapidly obsolete K6. And yet the K7 turned out to be extremely successful, and grabbed the x86 performance crown from just after its introduction til very recently.
But there are important reasons why we shouldn't expect a K7 out of Motorola. Among them, while AMD had amassed a semi-dream team to design the K7, Moto is apparently so hurting for talent that they are soliciting EEs on the basis of comp.arch posts. Plus their semi division has posted very large losses the past several quarters and is speculated to be a candidate for being shut down or spun off. (AMD had losses before the K7, but as CPUs were their main business, they weren't about to drop them.)
Certainly the G5 will be faster than the G4, and Jobs will surely be able to make it seem faster than a room full of P4s. Since Macs have never been about performance, I would bet the G5 will be enough to keep them happy. But world-beating, I doubt it. We'll see...
(BTW: you are right that the G5 apparently has a 64-bit version. There is no good reason for Apple to use it, however; 64 bits is worthless for all the markets Macs sell to. The only reason it's at all worthwhile for Hammer is that Hammer is trying to steal some of Xeon's market in e.g. databases. Of course, Apple isn't past using a useless feature for marketing purposes, so perhaps they'd use a 64-bit version anyways.)
The integrated I/O might or might not be worthwhile, but Apple's current pro machines use L3 cache.
*Some* integrated I/O and *some* L3 cache TLBs might be of use in a desktop chip. But the integrated I/O to network a 2-way system across a motherboard is nothing at all like the integrated I/O to network a 4-way system across an MCM. They'll use completely different protocols. Similarly, the TLBs for maybe 2 or 4 MB of L3 aren't going to share much in design or layout with the TLBs for 128MB L3. Indeed, the whole address space will have to be completely different. And so on.
The new dual-processor 1 GHz G4 is claimed to have 15+ GFlops of computing power, using Altivec I presume.
Snort. This figure is what you get when you add up the peak per-cycle execution rates of all the Altivec and floating point units on both chips and multiply by 1 billion cycles per second. This assumes all peak-rate operations (so, most likely, 100% fp adds/packed Altivec fp adds...although, come to think of it, they might be counting fp loads as "operations"), no loading of operands, no data hazards, etc. The precise technical term for this is "bullshit." Side note: how do you plan on getting the operands for 15 billion floating point operations every second across a 1 GB/s memory bus??
If the G4 is a supercomputer on a chip, how come there aren't any G4-based machines on the Top 500 list? More to the point, how come any old x86 chip will destroy a G4 on LINPACK or LAPACK? A supercomputer with PC133?? (If you think the derision is too harsh, it's because you don't realize the degree to which "supercomputer" workloads are dominated by memory bandwidth considerations.)
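For anyone who wants the back-of-envelope version of the memory bus question, here is a rough sketch. The 15 GFlops and 1 GB/s figures are the ones quoted above; the operand size and the "two fresh operands per operation" worst case are my assumptions, and the function names are just for illustration.

```python
PEAK_FLOPS = 15e9          # claimed peak, operations per second
BUS_BYTES_PER_SEC = 1e9    # roughly 1 GB/s memory bus
BYTES_PER_OPERAND = 4      # assumption: single-precision floats
OPERANDS_PER_OP = 2        # assumption: two inputs from memory, no reuse


def required_bandwidth(flops, operands_per_op, bytes_per_operand):
    """Bytes per second needed if every operand comes from main memory."""
    return flops * operands_per_op * bytes_per_operand


def reuse_needed(flops, bus_bytes_per_sec, bytes_per_operand):
    """Operations that must be done per operand actually fetched over the bus."""
    operands_per_sec = bus_bytes_per_sec / bytes_per_operand
    return flops / operands_per_sec


print(required_bandwidth(PEAK_FLOPS, OPERANDS_PER_OP, BYTES_PER_OPERAND) / 1e9)  # 120.0 (GB/s)
print(reuse_needed(PEAK_FLOPS, BUS_BYTES_PER_SEC, BYTES_PER_OPERAND))            # 60.0
```

So under these assumptions you'd want about 120 GB/s, or else every float that crosses the bus has to be used in about 60 operations out of cache and registers before it's replaced. Some kernels can manage that; most real workloads can't.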
If that single Power4 CPU was really "optimized to work in an 8-way MCM", it truly did a stellar job as a uni-processor.
Again, 128MB L3 didn't hurt.
On the compiler front, I did find a seemingly decent FORTRAN compiler for MacOS X, so that issue is addressed at least.
Good to hear. A SPEC license isn't all *that* expensive ($100 for a student IIRC), so hopefully someone will get cracking and produce some real independent benchmarks comparing the G4 to other processors. (Again, *not* holding my breath for Apple to do the same.) Of course a lot of effort needs to go into figuring out the optimal compiler flags, etc. And I somewhat doubt Absoft's compiler can vectorize Altivec out of code without any changes or hints stuck in. (SPEC doesn't allow any changes to the source, and very few compilers can do truly autonomous vectorization.)
But I'd be *very* interested to see those results!
That $100,000 cost is fairly meaningless, since there is an extreme markup on server hardware, and the chip isn't in mass production...I'd venture to say that it can be mass-produced cheaper than P4, as I'll bet it has a lower gate count.
Yes and no. Sure the HPC market where the Power4 currently plays has huge markups and very low production volumes...but that also means designs which could not possibly be cost effective in the desktop market. A single Power4 multi-chip module contains 4 2-way CMP dies, 256-bit interconnect between each pair of dies, and, oh yeah, a measly 128MB of eDRAM.
Each one of the 4 dies takes up 400mm^2 on a
G5 will essentially be this architecture.
The G5 is an upcoming 32-bit embedded chip made by Motorola (like the G4 and G4+), and does not resemble the (64-bit) Power4's internal architecture in the slightest. Whether this chip will be the basis of the next generation of Macs is of course not yet known.
Please cite some reference to support this (wild in my opinion) claim.
Because Apple does not have the integrity (nor, according to the oft-repeated excuse, the FORTRAN compiler) to submit SPEC runs for a G4-based computer, there are no official SPEC scores for the G4. However, we do have Motorola's *estimated* *SPEC95* scores for the 7450 (a.k.a. G4+) at 733MHz. (Here, second page, on the left.)
They are 32.1/23.9, SPEC95 int/fp. By comparison, a 400MHz R12k (best I could find for SPEC95; it is an old benchmark after all) scores 24.2/43.5 SPEC95 int/fp; 25% worse on int, and 82% better on fp.
That same 400MHz R12k scores 347/343 on SPEC2k int/fp. (Sorry, but no more links; the scores are all available at www.spec.org) Assuming equivalent SPEC95-to-SPEC2k ratios (a faulty assumption, but then again we're using estimated scores in the first place), we get our 733MHz G4+ scoring 460/188(!!) on SPEC2k int/fp.
For a scaling factor we'll use the Coppermine PIII, since it has SPEC2k scores available for both 733MHz and 1GHz configs. 1GHz is 22%/16% faster than 733MHz at SPEC2k int/fp. (If you repeat my calcs, be sure to use the 1 GHz PIII scores using the same compiler version as the 733MHz scores.) So applying that to our "estimated" SPEC2k scores for 733MHz G4+, we get even-more-estimated SPEC2k scores of 563/219 for a 1GHz G4+.
So, a decent spot (32%) better than the 500MHz R14k at int, and a significant bit (53%) worse at fp. Plus the CPU in the new SGI Graphics Fuel can be up to 600MHz and uses DDR and not SDRAM like the one I got the scores from.
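If you want to repeat the estimate, the arithmetic above boils down to the sketch below. The inputs are the scores quoted in this post; the variable and function names are mine, and the whole thing inherits the same faulty-but-convenient assumption that SPEC95-to-SPEC2k ratios carry over between chips.

```python
# (int, fp) pairs, as quoted above
G4_733_SPEC95 = (32.1, 23.9)     # Motorola's *estimated* scores for the 7450
R12K_400_SPEC95 = (24.2, 43.5)
R12K_400_SPEC2K = (347, 343)
PIII_733_TO_1000 = (1.22, 1.16)  # Coppermine speedup, 733MHz -> 1GHz (rounded)


def estimate_spec2k(target_95, ref_95, ref_2k):
    """Scale the reference chip's SPEC2k score by the SPEC95 ratio."""
    return tuple(r2k * t95 / r95
                 for t95, r95, r2k in zip(target_95, ref_95, ref_2k))


def scale_clock(scores, factors):
    """Apply a per-benchmark clock scaling factor."""
    return tuple(s * f for s, f in zip(scores, factors))


g4_733 = estimate_spec2k(G4_733_SPEC95, R12K_400_SPEC95, R12K_400_SPEC2K)
g4_1000 = scale_clock(g4_733, PIII_733_TO_1000)

print([round(s) for s in g4_733])   # [460, 188]
print([round(s) for s in g4_1000])  # [562, 219]
```

The 1GHz int figure comes out a point lower here than the 563 above only because the 22% is itself rounded; given how estimated everything else is, that's noise.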
So...hope that helped.
Re: the Power4 SPEC scores
(Also this was a single-CPU system, so I don't think it was a multi-CPU module.)
SPEC2k is single-threaded. The score was obtained using a 4-way Power4 "Turbo" module with 3 of the cores "turned off". The rather sneaky thing is this gave the remaining core access to all 128MB L3, which means the SPEC score probably overstates single-threaded performance a bit.
What makes you think that Power4 technology won't make its way into desktop chips? IBM manufactures desktop PowerPC chips as well, and certainly shows no sign of giving up on PowerPC in general. There have recently been rumors of Apple switching from Motorola to IBM for its chips...we'll see what happens.
Power4 is simply not a desktop chip design. Even using one of the 4 dies in the MCM as the basis for a desktop CPU is a shaky proposition, since they're too big (again, 400mm^2 on
Of course, it may be quite likely that Apple turns to IBM instead of Motorola for the next generation of Mac CPUs (especially as it looks somewhat likely that Moto will exit the semi business in the coming year). But it will not look anything like a Power4.
Looks like SGI should consider joining Apple in the PowerPC world...that Power4 looks pretty awesome!
That Power4 also costs like $100,000 for each (4-way CMP) processor module alone, so, gee, it'd better be pretty awesome. The 1 GHz G4+ that powers the current generation of Macs would probably score about the same as the R14k on SPEC, or a bit lower...but we don't know because Apple is too cowardly to submit themselves to legitimate benchmarks when they have a bunch of fools running around believing that a G4 is faster than a P4 or Athlon, and Motorola doesn't bother because they know the G4+ is actually designed for the embedded signal processing market, where SPEC scores are not too relevant. Just because the G4 and Power4 are both "in the PowerPC world" doesn't mean they have similar performance characteristics.
In any case, where the R1x000 really shines is in scalability to very high processor count NUMA configurations (not at issue in this case of course). It'd still be a world-class processor line if SGI hadn't given up 5 years ago by essentially stopping R1x000 development and committing to Itanium instead. They've finally realized their mistake and apparently have some extra tweaks on the way (R16k and R18k), but it's probably too little too late.
Were I SGI at the moment, I'd drop IRIX for Linux, port everything that made IRIX special, and run it all on proprietary P4 or Xeon boards with all the special SGI graphics goodies. Although that was the idea behind their NT line and that didn't do so well, did it...
SGI had some amazing tech back in the day, but having more or less rolled over and died the past few years it might be difficult for them to stay ahead of the commodity hardware crowd. (Re: 48-bit color, if johnc has his way--and he usually does--commodity graphics cards will have 48 or 64-bit internal color soon enough.) But they appear to be finally waking up and making a go at it, so best of luck to them.
You have to understand benchmarking people. When they say kernels they mean benchmarking kernels. Small contained programs that extract key loops or algorithms from larger programs.
Exactly correct. 14 replies and this is the only one that even understands the terminology used. Someone mod parent up.
Some of these points people have remarked upon, but others they haven't.
1) Whether they used export-grade or real encryption made absolutely no difference in this case in terms of preventing terrorism, saving lives, etc. All that prevented that plane from blowing up is that this guy had bad luck lighting his detonator cord and somebody noticed him. Even if there were no encryption of any sort in the world it would have made no difference in this case. It was all a matter of dumb luck, bad shoe-bomb design, and an attentive person. The only use the file has now is as evidence, and of course there are valid concerns as to its legitimacy.
Conclusion: perhaps we should be concentrating on keeping bombs off of planes (which we are finally starting to do, albeit in a half-assed ass-covering sort of way) instead of on crypto exports.
2) This file was kept on a communal Al-Qaeda PC. It happened to be encrypted using Windows EFS, but most of the other contents of the machine--many of them just as valuable as intelligence or evidence--were not.
3) Again, this file was encrypted on a desktop machine in Kabul. The only possible way Americans could get a look at it would be on the unlikely chance that we took over the entire country of Afghanistan. Otherwise the CIA/NSA/etc. never gets a look at this file, encrypted or no. Presumably the reason the file was encrypted was to prevent other members of Al-Qaeda who had access to the machine from looking at it, not to foil Americans. For these purposes 40-bit Windows EFS is probably just fine.
4) A corollary: presumably when Al-Qaeda wants to encrypt something that the CIA/NSA/etc. actually might have a chance to intercept, they use real encryption. i.e. they presumably use PGP for their email. (Although reports have them into steganography instead, presumably because with intercepted encrypted email at least you know who sent it, when, and to whom.)
In other words: there's nothing to see here. If this is the best the anti-cryptos can come up with then export-crypto would be quite safe in a reasonable world. (Of course no one said Washington after Sept. 11 was anywhere near reasonable.)
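For a sense of scale on "export-grade" versus "real" key lengths, here is a quick sketch of worst-case exhaustive key search. The trial rate is a made-up number purely for illustration (I have no figure for what anyone's actual hardware does), and the function name is mine.

```python
SECONDS_PER_DAY = 86400
RATE = 1_000_000  # hypothetical: key trials per second


def worst_case_days(key_bits, keys_per_second):
    """Days needed to try every key of the given length at the given rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_DAY


print(round(worst_case_days(40, RATE), 1))  # 12.7
print(2 ** 128 // 2 ** 40 == 2 ** 88)       # True
```

At that assumed rate a 40-bit keyspace falls in under two weeks on one machine, while a 128-bit keyspace is 2^88 times larger. Good enough to keep out a nosy co-worker, not much more.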
So, this is probably how Intel demo'ed their 3.5GHz P4 last year. Shows how pointless the whole thing is, to be honest.
No: the 3.5GHz P4 Intel demoed at IDF last fall was air-cooled. On the other hand, it was certainly hand-picked from a special run of chips on a boutique process tuned to produce a few very high clocking chips at the expense of overall yield. Which, yes, shows how pointless the whole thing is, to be honest.
On the other hand, the fact that they are showing it off is an indication of where they're going. Intel showed off an (air-cooled) 2 GHz P4 at IDF fall '00, and launched the same part, not coincidentally, exactly at IDF fall '01. They showed a 3.5 GHz P4 at IDF fall '01, which means...?
No, they probably won't get one out quite so early (3.0 is more like it), but it'll be here around the end of the year. Incidentally, the top speed of an air-cooled hand-picked chip on a special process is probably more relevant to future clock scaling than that of a Liquid Nitrogen cooled off-the-shelf part, for the simple reason that the process will be tweaked to be more aggressive as time goes on, but the temperature is never going to magically drop to -196 deg C. (And yes, the difference matters, as lower temperatures attack different limiting factors for clock rates than tweaked processes do.)
IANAL, but all of your methods involve accessing the unencrypted versions of the songs. Therefore, DMCA doesn't apply because you are not defeating the copy prevention on the encrypted MP3 files.
Right, but the whole CDS technology is intended to prevent one from accessing the unencrypted versions of the songs with a computer. The methods I posted circumvent this intent, and thus might fall under the DMCA.
Although I would argue that the methods are so general as to not violate the DMCA, because if they work then the access control was not legally "effective". (CSS was found to be "effective" because it took a new program and some reverse engineering to crack it, but if old programs, drives, or methods work, then I don't think that qualifies.) But IANAL either.
BTW, the encrypted MP3 files are presumably not copy protected at all, but rather can only be decrypted by the signed player included on the disc. (Note to self...but how do they prevent the MP3s *and* the player from being copied...hmm...just because they use Blowfish instead of rolling their own "encryption" ala CSS does not mean they necessarily know what they are doing...although presumably there's more to it than this...)
This seemed like a good idea to me, too, until I started to think about the idea that in the end, they just up the price of CD's, and we end up paying for it.
No, because at the moment this is only Universal Vivendi--only one of the big 5 record labels. Thus all the returns will only hurt Universal. This leaves three possibilities:
1) Universal does not raise prices to cover the cost of returns; Universal loses lots of money
2) Universal does raise prices to cover the cost of returns; now they are charging $2 more than the competition for people to buy defective "CDs"! Universal loses even more sales
3) Universal raises prices to cover costs and the other labels raise prices to match; the other labels make larger profits (assuming consumers don't stop buying) while Universal just breaks even; other labels steal away all of Universal's artists.
We still have a choice in this. Universal has specifically said that they will be looking at the return rates to decide whether they move all their music onto this new format. Yes, the music industry has been too dumb to realize that the reason music sales are down is because they shut off Napster. But they are not too dumb to realize that when people return their new format as defective that it isn't smart to move their entire line over to that format.
Just to clear up a bit of mis-information, SACDs are not backwards compatible with the CD standard by default. The physical media used for SACDs is high density like a DVD and the audio bitstream is not LPCM, but the specification allows for a hybrid disc with two layers where one of the layers is compliant with the traditional CD spec and made such that it will play in most CD players. Note that this is an optional portion of the specification.
Thanks for the correction; I'd assumed the hybrid disc was standard.
Is Philips still planning on not letting Universal use the standard audio CD logos on their CDs because of the Red Book compliance issues? To me that's a very strong statement.
Do we really need to wait for Philips to decide this issue for us?
The thing is, the circular platters they are selling are NOT CDs. They are a new format, designed to be partially backwards compatible with certain CD players and not compatible with certain other CD players.
Just because they store information on a thin 12 cm circular platter does not make them CDs. VideoCDs, SuperAudioCDs and DVDs also store information on 12 cm circular platters, but they are not CDs. Only Philips can sue Universal for trademark infringement on the term "CD", but we can all sue them for misleading labeling.
Or, more properly, we should pressure the retailers. After all, Universal is doing something by putting a warning label on these platters; it's the retailers who are inviting confusion by (presumably) marketing and displaying these platters in the same way that they do actual CDs.
We should be pressing the record stores to create new categories if they want to sell these platters, e.g. a "Not-A-CD" section for all Universal disks, just as they have separate sections for DVDs and, if they sell them, SACDs or VCDs. (Or perhaps "IncompatibleCD"; "ICD" for short.) Hell, they have separate sections for SACDs, and those *are* completely backwards-compatible with the CD standard!
If you invent a new and incompatible standard, you don't get to market it by inviting confusion with the dominant standard. That is illegal, even if the trademark holders of the dominant standard don't bother suing you for it.
The thing is, if you read the EULA carefully it's clear that it only applies to the software portion of this so-called "CD":
"When you use the compact disc in a CD ROM drive, the technology launches an audio player (the "Player"), and plays compressed audio files (the "Content")."
In other words, "the Content" means the encrypted MP3 files on the platter, not the fux0red uncompressed audio with the messed up error correction that plays when you stick it in a normal CD player.
Of course you are presumably bound from trying to mess with the latter due to the anti-circumvention clause of the DMCA. Although, for that to kick in, the access-protection mechanism needs to actually be "effective" in the eyes of the law; a valid case can be made that this mechanism is *not* effective, because according to various reports there are the following workarounds:
1) Certain if not all DVD-ROM drives (and perhaps consumer DVD players as well) can access tracks 2 and beyond *automatically*, with no extra user effort or loss in quality.
2) Widespread pre-existing utilities such as Exact Audio Copy are reported to be able to rip the disc (as one single .wav file) just fine, with no extra user effort or loss in quality.
3) Extracting the audio from a consumer CD player with digital-out into a sound card with digital-in should result in a perfect copy, with no extra user effort or loss in quality.
Presumably nobody accessing the audio on the disc using the above three methods could be charged with using a "circumvention device", because they were just using commonly available tools and methods which were in place before this supposed access-control mechanism was even invented. Thus in my NAL opinion, the DMCA would not apply here.
Once the content is accessed, of course you are perfectly within your rights to rip to MP3 or make a backup copy for personal use, or, under the AHRA, to make copies for your friends (as long as they are distributed non-commercially). Whether you are allowed to distribute MP3s online (e.g. through a P2P network) is still an open legal question, but distributing these MP3s is certainly no more or less illegal than distributing any MP3 from a CD you don't have the copyright on.
Since when did consumers lose all of their rights as a result of buying a product?
Since the product was software. The EULA attached to their buggy player and the encrypted MP3s is unfortunate, but as we all know, not terribly unusual for the world of software--where it clearly resides. Luckily none of its provisions--especially those regarding indemnity or reverse engineering--are likely to stand up in court.
Common Lisp contains the most advanced iteration constructs I've ever seen in any language, including C, Perl, Python, and others. It's called extended loop, and it doesn't need lots of parens. It's not used by Graham or Norvig, since Graham despises loop and OO and Norvig uses applicative style since that fits most AI problems extremely well.
Interesting point. I must confess to being scared off of loop by Graham (and by my professors) as being incredibly intuitive to read but incredibly difficult to pin down exactly what it's going to do. (Graham claims that certain implementation details which determine the order functions are executed in a loop are left unspecified in the ANSI CL specs, and thus differ from implementation to implementation.)
I can certainly say that I appreciated the times Norvig used loop in his code, because it sure does cut through the clutter of a complicated do expression.