It is *MUCH* more important that everyone run code which is *EXACTLY* equivalent, and has been tested to be equivalent, than that people be a few cycles faster.
I don't believe that Olli is that stupid: FFT code is not that hard to get right, and there seems no point at all in wasting horrendous numbers of processor cycles running hideously sub-optimal code.
I certainly don't believe that the people at Seti@home are capable of doing such superlatively rigorous testing that we should believe that their software is By Definition Correct, nor that the people building faster FFT modules for Seti@home are so stupid that they'll get the software wrong. FFTs don't have data-dependent control flow, and you've got umpteen blocks on which to test your SIMD routine against the bog-standard version that the main client's using... it's not a hard thing to verify, and it seems ridiculous that Seti@home hasn't picked up Olli's client and said 'ooh, we've got a newer, better client - let's use this'.
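To make the 'not a hard thing to verify' point concrete, here's a rough Python sketch of the sort of check I mean. Both routines are stand-ins of my own and not anything from the Seti@home client: reference_dft plays the slow trusted version, fast_fft plays the optimised routine under test, and the block count, block length and seed are arbitrary assumptions.

```python
import cmath
import random

def reference_dft(block):
    """Slow O(n^2) transform, standing in for the trusted reference routine."""
    n = len(block)
    return [sum(block[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fast_fft(block):
    """Recursive radix-2 FFT (length must be a power of two), standing in
    for the optimised routine under test."""
    n = len(block)
    if n == 1:
        return list(block)
    even = fast_fft(block[0::2])
    odd = fast_fft(block[1::2])
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out

def max_disagreement(n_blocks=20, block_len=64, seed=1):
    """Run both routines over random blocks and return the worst absolute difference."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_blocks):
        block = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
                 for _ in range(block_len)]
        ref = reference_dft(block)
        fast = fast_fft(block)
        worst = max(worst, max(abs(a - b) for a, b in zip(ref, fast)))
    return worst
```

Because the flow of control doesn't depend on the data, agreement to rounding error on a decent pile of blocks is fairly strong evidence; a real check would of course use recorded work units rather than random numbers.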
If it's enough faster, the Right Thing To Do is to restart the whole project using the new client! If you've got twenty years of processing to do and have already done one using the old client, throw that away, start at square zero, and you'll have redone all the old data within a month (to check the new fast client is good) and finish the twenty years of computation in two.
I'd like to address some of the rather negative posts that people have made regarding PG because frankly they're short-sighted:
PG's etexts should be made more web-friendly:
Most of the people who make this comment also suggest a lot of nonsense about adding HTML formatting to texts. This would be a huge mistake. The beauty of having the texts in as simple a format as possible is that it is always possible to add the formatting later.
The problem I've found is that the texts are at present in a format which LEAVES INFORMATION OUT: when I converted the Gutenberg Les Miserables and the Father Brown stories, I had to go through marking up the chapters as chapters (leaving Perl to mark up the paragraphs as paragraphs), then read through the entire text to find the paragraphs which actually contained tabular information and had been marked up wrongly.
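For the curious, this is roughly what that conversion pass looks like, sketched in Python rather than the Perl I actually used. The rules are assumptions for illustration only: blocks are separated by blank lines, a one-line block starting with CHAPTER or BOOK is a heading, and a multi-line block where every line has a gap of three or more spaces is flagged as possibly tabular. The names mark_up, CHAPTER_RE and COLUMN_GAP_RE are made up for this sketch.

```python
import html
import re

# Hypothetical heuristics; real etexts vary a lot in how headings are written.
CHAPTER_RE = re.compile(r"^(CHAPTER|BOOK)\b", re.IGNORECASE)
COLUMN_GAP_RE = re.compile(r"\S {3,}\S")

def mark_up(text):
    """Return (marked-up blocks, blocks that look tabular and need a human eye)."""
    blocks = [b.strip("\n") for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    suspects = []
    for block in blocks:
        lines = block.split("\n")
        if len(lines) == 1 and CHAPTER_RE.match(lines[0].strip()):
            out.append("<h2>%s</h2>" % html.escape(lines[0].strip()))
        elif len(lines) > 1 and all(COLUMN_GAP_RE.search(l) for l in lines):
            # Keep the layout and flag the block instead of guessing.
            suspects.append(block)
            out.append("<pre>%s</pre>" % html.escape(block))
        else:
            out.append("<p>%s</p>" % html.escape(" ".join(l.strip() for l in lines)))
    return out, suspects
```

The point stands, though: a heuristic like this only guesses at information the plain text has thrown away, so somebody still has to read the suspects list.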
The i820 (board used in these benchmarks, codenamed Camino) can use either PC133 or RDRAM. Intel's intention was to use RDRAM all the way; despite something like $1 billion investment by Intel in RDRAM manufacturers, it's not yet available in sufficient quantities, so they're permitting PC133 as well.
K6/3 had 256k on-chip L2; the K7 looks a larger chip (because of the 128k L1), and AMD have been having enough problems fabricating the K6/3 that I'm not surprised they're rolling out the K7 with FSB L2.
Or is it something more sinister such as a "from each according to his ability, to each according to his needs" philosophy?
But that's exactly what Free Software is, because software is about the only field where unrestricted duplication means that everyone's needs can be met without depriving other people, and where the problem is small enough that single humans can solve chunks of it in their free time.
From RMS and Linus according to their (considerable) abilities; to the rest of us according to our desires. And you don't even have to be a party member to go up and put another brick on the edifice.
Of course, you get some of the problems of communism as well - feedback's not straightforward, because there are so few producers that, if you can't influence any of them, you're stuck. Individual projects tend to end up under dictator-class control, with arbiters of correctness ensuring the project's kept going along the lines they consider right.
But it doesn't matter, in the way that oppression in the Real World matters, because software is a small chunk of the world, and your non-existent right to get your patches into Linux can be abrogated without hurting you.
There's a camera watching you. But there's a camera watching the man watching the camera, there's a camera in the police station's interrogation room, there's a camera watching your neighbour.
You get to look at the access logs to your camera... so, whilst you may be being watched, you're not being watched anonymously, and if you get fed up you can watch right back.
There are 21264 benchmarks for systems between 500 and 575MHz floating around; a dual 21264/500 board costs about $9000 at the moment, with the targeted market being embedded controllers which need that sort of performance.
Digital are very good at doing extremely aggressive speed-binning; they have 616, 625, 633 and even the occasional 666MHz 21164 running in very high-end systems, and they can produce 600MHz chips in large enough quantities that SGI can build 1024-way multi-processing T3E/1200 boxes.
The 21264 is neither a small nor a simple chip; 64k data, 64k instruction caches already take up an awful lot of transistors. It has 15.2 million transistors, and is huge (300mm^2) in the 0.35u process. It bears about the same relation to the 21164 as the PPro does to the Pentium - much cleverer scheduling, much cleverer renaming, out-of-order execution, and all the other techniques required to get great performance out of an architecture.
Since it has a reasonable number of registers, and a sane FP design, it's about 50% faster on integer work and 2x as fast on FP work as a P3 of the same clock speed.
And it's been wildly delayed; the announcement I have with details in it was produced on 25/10/1996, and said 'samples first quarter 1997, volume second half 1997'; Digital representatives were promising volume production of 21264 systems by October 1997. If it had come out then, Intel would be in considerable trouble.
I'm not sure hardware speed has been soaked up by bloated software. I think it's more a matter of hardware speed having gone beyond the capacity of software to bloat.
Sometimes you can get advantages out of hardware speed - but emacs can do real-time spellcheck and autocorrect on arbitrary small Linux boxes, and even Word can do them, with no noticeable delay, on a P133.
As far as we know, you can't use all of the performance of a P2/350 to wordprocess or to browse the Web - while I'm using this PC, and everything is running fast enough to keep up with my typing and update the screen smoothly, 85% of my cycles are going to Prime95.
If a P3/700 is three times faster than this box, all I'll notice is that 95% of my cycles are going to Prime95. It'll still keep up with my typing; it'll still update the screen smoothly.
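The arithmetic behind that 95%, as a tiny Python sketch. It assumes the interactive work takes a fixed number of cycles per second, so its share of the machine shrinks in proportion to the speedup; idle_fraction is just a name I've made up here.

```python
def idle_fraction(busy_fraction_now, speedup):
    """Share of cycles left over when the same fixed workload runs on a
    machine 'speedup' times faster."""
    return 1.0 - busy_fraction_now / speedup

# 15% of a P2/350 is busy with typing and screen updates; on a box three
# times faster the same work needs 5%, leaving 95% for Prime95.
spare = idle_fraction(0.15, 3)
```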
!GIVE ME AN EXAMPLE OF AN APPLICATION (not a game, that's too easy - though even games aren't that bad now we've got ubiquitous 3d acceleration) THAT REQUIRES A P2/350!
Can anyone think of an application which is accelerated substantially by MMX or KNI, but which could not practically be accelerated by an add-on board?
The defects of add-on boards are that you can't put much RAM on them, and that the link to the CPU is slow. For graphics, the slow CPU link means you throw vertex data across, and get your textures by DMA, and use nifty things like DX6 texture compression to get around the lack of memory on the video card.
For sound, you don't need much RAM and you don't need tens of megabytes per second of bandwidth, so add-on boards are not a problem.
The obvious scientific applications really need more than 16 bits of precision - so you can't use MMX. Possibly KNI would be useful for data analysis - I think you could write an incredibly fast single-precision FFT using it.
For working with large numbers (RSA crypto, lots of mathematics), you really need double-precision FP - so KNI is useless - for the FFT approach to bignum arithmetic, and you want add-with-carry of packed 32-bit or 64-bit numbers - which KNI just might have - for the schoolboy approach. And, unless your numbers are incredibly large, they'll fit in the RAM of an add-on card; unless you're doing fairly unpleasantly complicated tasks, you can do all the work in the add-on card's RAM and not worry about transfer bandwidth at all.
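In case 'the schoolboy approach' is unclear, here is addition done that way in Python, on little-endian lists of 32-bit limbs. It's only a sketch of the technique, with the limb width and all the function names chosen by me; the inner step is the add-with-carry that you'd want the hardware to do on packed words.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(n):
    """Split a non-negative integer into little-endian 32-bit limbs."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Rebuild the integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_with_carry(a, b):
    """Schoolboy addition: add limb by limb, carrying into the next limb."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        total = x + y + carry
        result.append(total & LIMB_MASK)   # low 32 bits stay in this limb
        carry = total >> LIMB_BITS         # 0 or 1 goes to the next limb
    if carry:
        result.append(carry)
    return result
```

Note the carry chain makes each limb depend on the one before it, which is exactly why a packed add without carry support doesn't help much here.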
OK. So what applications are MMX or KNI actually useful for? The only one I can think of, possibly, is clever video decompression where you work from tens of megabytes of stored history - but video decompression runs into the bandwidth-to-graphics-card bottleneck. So that's no good either.
This makes far more sense to me and in this particular case (though that's not to say this will be the only case), is no more intrusive than a cookie.
But, unfortunately, it's no more secure than a (properly encrypted, stored in a non-exportable database) cookie, either. Both are 'something you have'; fiddling the kernel to send someone else's CPUID is not technically harder than copying someone else's cookie.
When I'm FORCED to toggle it BACK ON just to dial into my ISP to "make sure no one is stealing my account" then I will blame INTEL.
No, when you're forced to turn it back on to dial into your ISP, you'll change ISP.
When I'm forced to turn it BACK ON just to download the latest Windows '98 patches (you can damn well bet that Microsoft LOVES this idea!), I will blame INTEL.
No, when you're forced to turn it back on to load Win98 patches, you'll complain in the same way and to the same channels as you did this time, and Microsoft will give way in the same way that Intel did.
Or are you suggesting that these people should destroy the thick end of a billion dollars worth of perfectly-working chips, and reveal for public inspection several billion dollars worth of intellectual property (because I doubt you'd trust Intel's word that the feature was disabled), just so that reinforced paranoids can be certain that, whilst they're being tracked by their IP address and statistically sampled by their browsing habits, they're not also being tracked by which computer they use?
Remember, if you're trying to work out markets, you don't need perfect data and you don't need user names. With the present HTTP protocol, you can't avoid leaving an audit trail of the pages you've visited; if a webmaster knows that 37% of her users visit page A, 29% proceed to page B, ... she knows enough to optimise advert placement, work out where to put announcements...
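Here's a sketch of how little you need for figures like those. It assumes the access log has already been boiled down to (visitor key, page) pairs in time order, where the key is just an IP address or similar and no name is involved; page_funnel and its arguments are made up for the illustration.

```python
from collections import defaultdict

def page_funnel(hits, first_page, next_page):
    """Return (share of visitors who saw first_page,
               share who later went on to next_page)."""
    trails = defaultdict(list)
    for visitor, page in hits:
        trails[visitor].append(page)
    total = len(trails)
    if total == 0:
        return 0.0, 0.0
    saw_first = 0
    proceeded = 0
    for pages in trails.values():
        if first_page in pages:
            saw_first += 1
            # Only count next_page if it comes after the first visit to first_page.
            if next_page in pages[pages.index(first_page) + 1:]:
                proceeded += 1
    return saw_first / total, proceeded / total
```

For example, four visitors of whom two hit /a and one of those carries on to /b gives (0.5, 0.25), and nobody's identity came into it.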
But the 'other books bought' stuff is reasonably useful information for you. I don't consider the list of books I've purchased to be private information.
In fact, there's not much I consider to be private information. My current checking account balance is £712.88; how does that help you? Why is it a bit of information that you'd rather random people don't know?
I haven't been able to find any info online about a multithreaded version of gzip and so I'd like to take this up as a personal project.
I have a feeling that the gzip algorithm is rather thoroughly serial and so would work very badly if multi-threaded at a fine-grain level; I suspect you won't get much advantage over just
tar cf file.tar [files]
Split file.tar into N pieces, where N is twice the number of CPUs you have
gzip all N pieces in parallel
tar up the gzipped bits
Of course, this isn't going to work very well on streams; you'll have to construct the full tar file first. If you want to work with streams, you could do something hideous like sending the Mth block to file M%N before doing the gzip-in-parallel - a sort of N-way version of tee - but this'll disrupt locality horribly.
Neither method will produce files compatible with normal gzip, which is another teeny little problem.
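A rough Python sketch of the split-and-compress steps above, working on bytes already in memory. One deliberate departure from the recipe, so treat it as a variant rather than the same method: instead of tarring up the gzipped bits it simply concatenates them, which relies on the gzip format allowing several members back to back in one file. The function names and the use of a thread pool are my own choices.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

def split_into_pieces(data: bytes, n_pieces: int) -> List[bytes]:
    """Cut data into at most n_pieces roughly equal consecutive chunks."""
    if not data:
        return [b""]
    size = max(1, -(-len(data) // n_pieces))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_gzip(data: bytes, n_cpus: Optional[int] = None) -> bytes:
    """Compress each piece separately, then join the gzip members in order."""
    n_cpus = n_cpus or os.cpu_count() or 1
    pieces = split_into_pieces(data, 2 * n_cpus)  # N = twice the CPU count
    with ThreadPoolExecutor(max_workers=n_cpus) as pool:
        members = list(pool.map(gzip.compress, pieces))
    return b"".join(members)
```

gzip.decompress(parallel_gzip(data)) should give back data. The cost is as suspected: each piece starts with an empty dictionary, so the result is a little larger than one serial pass, and you still need the whole input before you start.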
Tom
I don't think Ada is anything like as hideous as it's made out to be; Ada95 feels to me rather like a more friendly C++ (at last! Interface and Implementation files with dependencies handled by the compiler, rather than having to do explicit #include commands), though (at least in the version I've played with - the GNAT Win32 release at www.gnat.org) the libraries available don't seem as good as STL.
It's possible that the Jargon File is referring to Ada83, which was apparently a good deal more hideous.
I think the next big breakthrough will be a compressor that can take a file with not much repetition of data (therefore hard to compress using current algorithms) and create a file with much more repetition in it (and perhaps larger) and then compress that down.
Have a look at the paper describing the bzip algorithm; this is pretty much exactly what it does. The idea is that, with a bit of care and twiddling, you can partially sort the file so that similar bits go together, but in such a way that the sorting can be undone to get the original file back.
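The reversible sort in question is the Burrows-Wheeler transform. Here's a deliberately naive Python sketch of it and its inverse, just to show that the sorting really can be undone; it assumes the input never contains the NUL character used as an end marker, and it is far slower than anything a real compressor would use.

```python
END = "\0"  # assumed not to occur in the input; sorts before every other character

def bwt(text):
    """Sort all rotations of text+END and return the last column."""
    s = text + END
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(last_column):
    """Rebuild the sorted rotation table one column at a time."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(END))
    return original[:-1]
```

bwt("banana") comes out as "annb\0aa": the n's and a's have been herded together, which is what makes the following compression stage's job easier.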
Basically, the world has changed slightly, and the x86 people now have the best process technology.
Intel and AMD have 0.18u; Alpha and HP and MIPS are still on 0.25u, and Sun will get to 0.25u with the Ultrasparc 3 this summer.
Alphas are still a lot better at scalar FP per clock (with two FP units and without that dumb register stack), but if you're doing single-precision work a P3 or K7 with SIMD instructions will be as capable as a $5000 21264 machine.
HP's PA-8600 chip is amazing; 1536k of L1 cache running at 550MHz (as fast as the L2 on the fastest Xeons around), with a brainiac design even more sophisticated than the 21264, delivers ridiculous speed for that clock rate. But at the price, I'd still rather have a decent second-hand car...
The field of compression has been thronged with patents for a long time - but patents at least reveal the algorithm.
What do you think of the expansion of trade-secret algorithms (MP3 quantisation tables, Sorenson, RealAudio and RealVideo, Microsoft Streaming Media) where the format of the data stream is not documented anywhere?
Tom
The Data Compression Book was an excellent reference when it came out, but there are some hot topics in compression that it doesn't cover - frequency-domain lossy audio techniques (MP3), video techniques (MPEG2 and especially MPEG4), wavelets (Sorenson video uses these, I believe, and JPEG2000 will), and the Burrows-Wheeler transform from bzip.
Do you have any plans for a new edition of the book, or good Web references for these techniques? BZip is covered well by a Digital research note, but documentation for MPEG2 seems only to exist as source code and I can't find anything concrete about using wavelets for compression. The data is all there on the comp.compression FAQ, but the excellent exposition of the book is sorely lacking.
The Athlon 800 uses a 2/5 ratio cache, the Athlon 1000 uses a 1/3 [so the /800 has 320MHz L2 cache and the /1000 has 333MHz]. That's on the Anandtech site.
Comparing P3 and Athlon benchmarks is a bit fiddly, because the best P3 results are obtained using Rambus memory, an i840 motherboard, and Intel's v4.5 compiler which can vectorise FP operations and do prefetch.
For Athlon systems, you're tied to PC133 memory on a KX133 motherboard, and it's not possible to tell ICL v4.5 to generate prefetch operations but not SSE, so you don't get the prefetch benefit.
So you get a SpecFP95 figure of 29.4 for the K7/1000 from AMD; if you go to specbench and search for the P3/800, you get:
24.5 - BX motherboard, PC100
28.9 - i820 motherboard, Rambus, Intel
32.4 - i840 motherboard, Rambus, Dell
If you compile on the P3 with options such that the code also runs on the Athlon, you get a score of about 19.5 and the P3 appears incredibly slow - but this is what you expect when running code which is essentially unoptimised.
Running hyper-optimised cache-blocked code, like Prime95, an Athlon/500 is 25% faster than a P3/500E and 45% faster than a normal P3/500.
Of course, the K7 system is enormously cheaper than a P3-and-Rambus system here in the UK.
What you want is a Psion 3mx.
Hand-sized, nice contrasty 480x160 screen - I use mine quite a lot to read ebooks and the like.
£169 in the UK.
No, #14 at man.ac.uk is the fastest academic one ...
I've got the game running really quite nicely on a Riva TNT card here.
Of course, I had to port it to Win98 first; it's not a complicated procedure, though (1.5 hours, and that included setting up the IDE nicely; only about 10 lines of code had to be changed).
email me for details
Umm, and how are you supposed to make a feature rich desktop that works great on both a low-end and a high-end machine?
Look at /proc/cpuinfo and the size of /proc/kmem at startup and decide which set of code to use and how large your caches should be?
Have a nice GUI-based program where you can pull out bars to set the resource allocations?
Just two ideas ...
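A sketch of the first idea in Python. It swaps in the MemTotal line of /proc/meminfo for the memory size, which is my substitution for the /proc/kmem suggestion, and it reads the 'cpu MHz' field that x86 Linux puts in /proc/cpuinfo. The thresholds, the profile names and choose_profile itself are all invented for the example.

```python
import re

def parse_cpu_mhz(cpuinfo_text):
    """Pull the first 'cpu MHz' value out of /proc/cpuinfo-style text."""
    m = re.search(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    return float(m.group(1)) if m else None

def parse_mem_total_kb(meminfo_text):
    """Pull MemTotal (in kB) out of /proc/meminfo-style text."""
    m = re.search(r"^MemTotal:\s*(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def choose_profile(cpu_mhz, mem_kb):
    """Pick a feature set and cache size; thresholds are arbitrary assumptions."""
    if cpu_mhz is None or mem_kb is None:
        # Couldn't tell what we're running on, so be conservative.
        return {"effects": "minimal", "cache_kb": 1024}
    low_end = cpu_mhz < 500 or mem_kb < 128 * 1024
    return {
        "effects": "minimal" if low_end else "full",
        "cache_kb": min(mem_kb // 16, 64 * 1024),
    }
```

At startup you'd read the two files once, pass their contents through the parsers and hand the result to choose_profile; the GUI with the bars then only has to override the numbers this picks.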
I think you've missed the point.
I think the magic word is 'internet', at least as much as 'boycott'.
Boycotts are no use unless you advertise loudly, widely and frequently what you're boycotting and why.
Intel is too fucking cheap
Um, Intel is notoriously expensive.