Slashdot Mirror


User: acidblood

acidblood's activity in the archive.

Stories
0
Comments
162

Comments · 162

  1. Re:Since this is a dupe on Inside Intel's Next Generation Microarchitecture · · Score: 1
    Also when the traveling of information across a die takes more than 10 cycles you need to have smaller structures, it will increase latencies of instructions.

    Not sure what you mean here, but if you're talking about my estimate of the costs of exchanging information between cores, remember that this is due to the lack of bypass structures between cores, the need for explicit synchronization code, and the rather inefficient method of sharing data through the cache. Once hardware is dedicated to it, even in large die processors, this latency is dramatically reduced.

    Also, the argument that cores will increase in size, along with the distances between execution units in the core and so on, is flawed. For a processor to make sense economically, it can't go beyond a certain die size. Cost grows with area, but not only that: yields naturally decrease at a fast rate for larger dies, increasing the price even further. Also, with the ever-increasing costs of plants, materials, etc., the balance is being tilted towards ever smaller cores. I will concede that, with increasing clock speeds, it's not enough for distances to stand still; they actually have to be reduced. But distances are hardly the bottleneck for clock speed, and even if a couple of critical paths are hampered by distance, just do what the P4 did, which is to include extra pipeline stages for data propagation.
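
    To put rough numbers on the die-size argument, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D). The defect density, wafer cost and wafer diameter are made-up illustrative values, not figures for any real process, and the dies-per-wafer estimate ignores edge losses.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of good dies under the simple Poisson model Y = exp(-A * D)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(area_cm2, defects_per_cm2=0.5,
                      wafer_cost=5000.0, wafer_diameter_cm=30.0):
    """Wafer cost divided by the expected number of good dies (all inputs hypothetical)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    dies_per_wafer = wafer_area / area_cm2  # crude: ignores partial dies at the edge
    good_dies = dies_per_wafer * poisson_yield(area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


for area in (1.0, 2.0, 4.0):
    print(area, round(poisson_yield(area, 0.5), 3), round(cost_per_good_die(area), 2))
```

    Under these assumptions, doubling the die area more than doubles the cost per good die, because fewer dies fit on the wafer and a smaller fraction of them work.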

    Here's one example: the bypass path needs to connect the load port and every integer unit to every integer unit. So there are (n*n) connections between units, and the number of stages it needs to go through in selecting input hampers the clock speed eventually.

    That's assuming a crossbar configuration. Hardly any kind of interconnect (switches, etc.) produced today uses a true crossbar scheme. One could try other, more cost efficient topologies, or perhaps something like a multi-ported queue with n input connections and n output connections. If it has enough capacity for 99% of real-world code, it's certainly good enough.

    Or you could have pairs/triples/n-tuples of execution units with interconnects only between themselves, and try to dispatch code with dependencies to interconnected execution units. Again, there might be a contrived piece of code which would require all execution units to be interconnected, but if most code doesn't, it's good enough.
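
    A back-of-the-envelope sketch of the clustering idea, counting point-to-point bypass links only. It assumes n units split into equal clusters with full forwarding inside each cluster and none between them; the function names are just for illustration.

```python
def full_bypass_links(n_units):
    """Every unit forwards to every unit (itself included): n * n links."""
    return n_units * n_units


def clustered_bypass_links(n_units, cluster_size):
    """Full forwarding inside each cluster only, nothing between clusters."""
    if n_units % cluster_size != 0:
        raise ValueError("n_units must be a multiple of cluster_size")
    clusters = n_units // cluster_size
    return clusters * cluster_size * cluster_size


for n in (4, 8, 16):
    print(n, full_bypass_links(n), clustered_bypass_links(n, 2), clustered_bypass_links(n, 4))
```

    With clusters of fixed size k the link count is n * k, so it grows linearly with the number of units instead of quadratically. The price is that dependent instructions have to be steered to the same cluster.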

    The thing is that not enough research has been done on SMT and SMT-friendly structures. If there's a large benefit to be had with SMT over multicore, SMT-friendly structures will inevitably begin to appear. If you were to ask a RISC advocate 15 years ago whether something like the P6 core (used in the Pentium Pro/II/III) could be done for a CISC processor (and x86 is CISC, of course), you'd probably be laughed at. Yet there's money to be made in x86, so engineers and researchers eventually overcame the barriers. Given enough interest and money, the same will be true of SMT.
  2. Re:Since this is a dupe on Inside Intel's Next Generation Microarchitecture · · Score: 5, Informative
    Be careful when you speak of parallelism.

    Some software simply doesn't parallelize well. Processors like Cell and Niagara will take a very ugly beating from Core architecture based processors in that case.

    Then there's coarse-grained parallelism, tasks operating independently with modest requirements to communicate between themselves. For these workloads, cache sharing probably guarantees scalability. Going even further, there are embarrassingly parallel tasks which need almost no communication between different processes -- such is the case of many server workloads, where each incoming user spawns a new process, which is assigned to a different core each time, keeping all the cores full. This type of parallelism ensures that multicore (even when taken to the extreme, as in Sun's Niagara) will succeed in the server space. The desktop equivalent is multitasking, which can't justify the move to multicore alone.

    Now for fine-grained parallelism. Say the evaluation of an expression a = b + c + d + e. You could evaluate b + c and d + e in parallel, then add those together. The architecture best suited for this type of parallelism is the superscalar processor (with out-of-order execution to help extract extra parallelism). Multicore is powerless to exploit this sort of parallelism because of the overhead. Let's see:
    • There needs to be some sort of synchronization (a way for a core to signal the other that the computation is done);
    • The fastest way cores can communicate is through cache sharing -- L1 cache is fairly fast, say a couple of cycles to read and write, but I believe no shipping design implements shared L1 cache, only shared L2 cache;
    • An instruction has to go through the entire pipeline, from decode to write-back, before the result shows up in cache, whereas in a superscalar processor there exist bypass mechanisms which make available the result of a computation in the next cycle, regardless of pipeline length.

    Essentially, putting synchronization aside for the moment (which is really the most expensive part of this), it takes a few dozen cycles to compute a result in one core and forward it to another. Also, if this were done on a large scale, the communication channel between cores would become clogged with synchronization data. Hence it is completely impractical to exploit any sort of fine-grained parallelism in a multicore setting. Contrast this with superscalar processors, which have execution units and data buses especially tailored to exploit this sort of fine-grained parallelism.
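
    A toy cycle count for the a = b + c + d + e example. The 1-cycle add with bypass and the 30-cycle core-to-core forwarding cost are assumptions standing in for "next cycle" and "a few dozen cycles"; synchronization cost is left out entirely, which flatters the multicore case.

```python
def chain_depth(n_operands):
    """Dependent adds when summing left to right: ((b + c) + d) + e."""
    return n_operands - 1


def tree_depth(n_operands):
    """Dependent adds when summing as a balanced tree: (b + c) + (d + e)."""
    return (n_operands - 1).bit_length()  # equals ceil(log2(n)) for n >= 1


def superscalar_cycles(n_operands, add_latency=1):
    """One core with enough adders; bypass makes each result usable next cycle."""
    return tree_depth(n_operands) * add_latency


def two_core_cycles(n_operands, add_latency=1, forward_latency=30):
    """Each core sums half the operands, one partial sum is forwarded, then a final add."""
    half = (n_operands + 1) // 2
    return tree_depth(half) * add_latency + forward_latency + add_latency


print(chain_depth(4), tree_depth(4))
print(superscalar_cycles(4), two_core_cycles(4))
```

    For four operands the superscalar core needs 2 cycles in this model, while splitting the work across two cores costs 32, almost all of it forwarding.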

    Unfortunately, this sort of fine-grained parallelism is the easiest to exploit in software, and mature compiler technology exists to take advantage of it. To fully exploit the power of multicore processors, the cooperation of programmers will be required, and for the most part they don't seem interested (can you picture a VB codemonkey writing correct multithreaded code?). I hope this changes as new generations of programmers are brought up on multicore processors and multithreaded programming environments, but the transition is going to be turbulent.

    Straying a bit off-topic... Personally, I don't think multicore is the way to go. It creates an artificial separation of resources: e.g. I can have 2 arithmetic units per core, so 4 arithmetic units on a die, but if the thread running on core 1 could issue 4 parallel arithmetic instructions while the thread running on core 2 could issue none, both of core 1's arithmetic units would be busy on that cycle, leaving 2 instructions for the next cycle, while core 2's units would sit idle, despite the availability of instructions from core 1 just a few millimeters away. The same reasoning is valid for caches and we see most multicore designs moving to shared caches, because it's the most efficient solution, even if it takes more work. It is only natural to extend this idea to the sharing of all resources on the chip. This is accomplished by putting them all in one big core and adding multicore functional
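
    The arithmetic-unit example above, as a tiny model. It assumes every ready instruction is independent and takes one cycle, and that the only difference between the two setups is whether a thread may use arithmetic units outside its own core.

```python
def cycles_partitioned(ready_per_thread, alus_per_core):
    """Each thread is stuck with its own core's arithmetic units."""
    return max(-(-ready // alus_per_core) for ready in ready_per_thread)


def cycles_shared(ready_per_thread, alus_per_core):
    """All arithmetic units sit in one pool that any thread can issue to."""
    total_alus = alus_per_core * len(ready_per_thread)
    return -(-sum(ready_per_thread) // total_alus)


ready = [4, 0]  # thread 1 has 4 ready instructions, thread 2 has none
print(cycles_partitioned(ready, 2), cycles_shared(ready, 2))
```

    With two units per core the partitioned setup needs two cycles while the pooled one needs a single cycle, which is the idle-units situation described in the paragraph.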

  3. Full-res video on Another Sony Format Bites the Dust · · Score: 2, Insightful

    Hope they enable full-res H.264 playback from memory stick now. I guess they were holding it back in a futile attempt to make UMD videos more attractive.

  4. No burning to DVD? on Movie Downloads to Coincide with DVD release · · Score: 1
    From the article:
    To keep from competing directly with large retailers like Wal-Mart, both sites for now are only allowing the movies people buy through downloads to be stored on PCs or on devices like the game player Xbox outfitted with certain Microsoft (MSFT) software. Movies can't be "burned" or copied onto disks that can be played on other devices, such as DVD players. The movies, however, can be copied to play on as many as two other PCs, says Ramo.

    So they mean unless I have an Xbox, I'll have to watch it on the tiny 19" monitor in my bedroom instead of the 42" plasma TV and the badass sound system in my living room? Yeah, I predict they'll succeed big time.</sarcasm>
  5. Re:It slipped out on Google Slips Talk of Online Storage Service · · Score: 1
    What's fishy about SHA-1?

    Maybe the fact that collisions can be found with less effort than 2^(160/2) -- the attacks of Wang et al? I won't bother to find a reference, but it won't be hard if you try.

    And AFAIK no one's found a way to craft a file with a certain MD5 unless they have control over both files that they want to collide.

    Doesn't prevent someone from crafting two colliding files, uploading both to Google, and if one is lost claim that it was extremely important data and sue Google over it.
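
    For reference, the 2^(160/2) figure is the generic birthday bound: for an n-bit hash, a brute-force collision search needs on the order of 2^(n/2) tries. A minimal sketch that shows this on SHA-1 truncated to 24 bits, so it finishes in a few thousand tries. This is plain brute force, not the Wang et al. attack, and the helper names are made up for the example.

```python
import hashlib


def truncated_sha1(message, n_bytes):
    """First n_bytes of the SHA-1 digest, used as a deliberately weak hash."""
    return hashlib.sha1(message).digest()[:n_bytes]


def find_collision(n_bytes=3):
    """Hash successive counter strings until two of them share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_sha1(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


first, second, tries = find_collision(3)
print(first, second, tries)
```

    Note that whoever runs this controls both colliding messages, which is exactly the situation described in the reply above.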
  6. Re:Wait and see on Intel's Conroe Previewed and Benchmarked · · Score: 0, Troll
    Can you follow up with more evidence on your "carbon copy" claim?

    Sure, just grab the optimization guides for both processors and compare the microarchitectural diagrams. AMD tried to disguise this a bit by changing some terminology, but after adjusting the terminologies to match, the similarities between both processors are clear.

    Although I have to admit I've been fooled by this as well. I recall I came to this realization by reading H&P's classic `Computer Architecture: A Quantitative Approach' and not the optimization guides.
  7. Re:Wait and see on Intel's Conroe Previewed and Benchmarked · · Score: 2, Informative
    And Intel's new chips are based on the Pentium-M, which is still heavily based on the Pentium-PRO that dates from the early 90s...

    Never mind that the AMD K7 was a carbon copy of the P6 microarchitecture, with incremental tweaks most probably applied to account for P6 shortcomings found in the field. That's a euphemism for `AMD stole Intel's field experience.' The K8 core is only an incremental tweak of K7, the major feature being the on-die memory controller.

    So really, AMD can't blame Intel for using P6-derived cores since they're doing the same (not to mention the ethics of stealing a competitor's design). Also, their incremental tweaks aren't really that significant -- process technology changes account for the larger share of performance increase.

    Intel tried to raise the bar with the P4 designs, applying some risky design features like hyperpipelined design, and unfortunately the strategy didn't work out all that well, in no small part due to power issues. Moreover they had to endure fanboy cries of `designed by marketing!', but that's the price one pays for exploring new ground in computer architecture. Meanwhile AMD will be content to follow in Intel's successful footsteps as they've always done.

    I'm sorry if that's not a fashionable opinion in Slashdot groupthink, but there you go.
  8. Feynman's account on Challenger Tragedy - In Depth, and Deeply Felt · · Score: 3, Informative

    An excellent account (and really, one should expect no less from Richard Feynman) of the Challenger disaster was given in the book `What do you care what other people think?' It highlights the political and managerial problems at NASA. If you enjoy this book, I highly recommend grabbing the rest of Feynman's books as well, such as `Surely you're joking, Mr. Feynman' and of course the Feynman Lectures on Physics.

    Feynman was by far one of the greatest minds of our time. Too bad he died fairly young (at 69); he still had a good 10 or 20 years in which to contribute to human knowledge.

  9. Re:Security on Buy Vista or Else · · Score: 2, Informative

    The .WMF vulnerability is, as I understand it, the result of poor design, not an implementation problem like a buffer overflow. Given the same API, the Wine project wrote an independent implementation which was also vulnerable. So if Vista has the same vulnerability, that says nothing about whether they used the same code from XP.

  10. Re:Ethanol seems best on Is Ethanol the Answer to the Energy Dilemma? · · Score: 5, Informative
    What we really need for Ethanol to take off is a proper hybrid vehicle capable of burning both gasoline, ethanol, and various blends.

    These are all over the place here in Brazil. Last I heard, something like 80% or 90% of small cars were sold with hybrid ethanol-gasoline engines (nicknamed Flex around here). Many shops (even small ones) already have the technology to convert an ordinary gasoline engine to a hybrid, and it isn't that expensive either.

    I should remark that Brazil was a pioneer in the use of ethanol as car fuel, but over the last decade or so it had been going out of fashion. With the advent of hybrid engines we're seeing a revival of sorts, particularly given the lower price (which unfortunately has been rising, though).

    For my part, I believe the future is biodiesel, not ethanol, though.
  11. Re:Cryptographically secure voting on Diebold's Election Data Off-limits · · Score: 1
    No, you're the one who doesn't understand it. The AC's response was perfectly correct.

    How do you make sure that your vote was recorded correctly WHILE no one else can know what you voted for? There is only one method for that:

    Checking that a vote was recorded doesn't imply revealing who you voted for. You check, while still in the polling place, that your vote is correct (i.e. I voted for Kerry, not Bush). Once you approve the vote, it can't be changed; there are cryptographic mechanisms to prevent any tampering with the contents of the vote.

    Now there's the possibility that your vote is not counted during tallying -- that's what the receipt is for, to check that your vote was counted. Note that it doesn't reveal who you voted for, and in fact there's no need to reveal that information, since you already established the validity of the vote back in the polling place, and once that is done there is no method to tamper with your vote.

    You may be wondering how that is possible. Look up e.g. Chaum's voting protocol for the gory details.
  12. Re:80% pay cut? on Working from Home on a Tropical Island Paradise? · · Score: 4, Informative

    Since I'm Brazilian I'm going to chime in. I live in a medium-sized town (500k inhabitants, give or take) in the north of Paraná state. Right now I'm using 600/300 kbps ADSL which costs about US$ 50/month all told (including the phone line). Grab a free-for-all VoIP plan like BroadVoice's and you get free calling to Brazil, the US and other countries for US$ 28/month. This is absolutely imperative if you plan on using the phone a lot, as Brazilian rates are outrageously expensive. Cell phones are pretty expensive as well -- you'll hardly find people with 100+ minutes monthly plans (that's roughly 3 minutes a day). Also, be prepared to pay ridiculous markups on your hardware: the US$ 500 Mac mini costs upwards of US$ 1000 here, and a VoIP ATA/router I was looking at which sells for US$ 90 in the US costs US$ 200+ here.

    In my town the best connection you can get from ADSL is 1 Mbit/512 kbps, paying I believe something from US$ 80 to 90, all told. No cable connection either. I don't think the situation in the big towns is much better -- I've never heard of anything faster than 1 Mbit/512 kbps. If you really need more than that, you're either going to have to get multiple phone lines with multiple ADSLs, or get a pipe directly from the local tier 1 providers like Embratel (which is going to cost a fortune even by US standards, probably not worth it).

    Something you have to pay attention to is the capped plans. The main ADSL provider in São Paulo has monthly caps in place, of (I believe) 10 to 40 GB/month, depending on which plan you get. You'll have to look around for uncapped plans or stay away from São Paulo (which is, as far as I know, the only place where caps are implemented -- plus it's not a good town to live in anyway).

    Out of curiosity, where were you staying in Brazil?

  13. Re:Praise for Cell on IBM Full-System Simulator Team Speaks Out · · Score: 2, Informative
    Could you speak more to performance issues when dealing with code/data that exceeds the 256K SPU local store?

    I'll try, but take my opinion with a grain of salt as I didn't do anything beyond coding an RC5-72 core, which doesn't involve external memory accesses.

    It looks to me like fetches from RAM are a real bottleneck, so if you want performance you need to keep code/data within each SPU. If you can chain a series of algorithms and move data down the chain this is a win. But if you need to manipulate a huge data block you're SOL.

    Sure it'd be impossible to keep this thing completely fed, but I hear the RAM specs are pretty impressive, using some new-fangled XDR DRAM technology from Rambus. Still, the computational power of the SPEs is huge and it's sure to be RAM-starved unless the programmers take a lot of care.

    Do realize though that this thing has a monster 100 GB/s interconnect. I would gather sending reasonable amounts of data back and forth between the SPUs is feasible, so perhaps operating on 8*256 KB = 2 MB datasets might be possible.

    Beyond this, I think programmers would look at the Cell like they do at a NUMA box or clusters -- assume fetching remote data is costly and program to that paradigm. Not as costly as it is for clusters, even those with fancy interconnects; more like NUMA boxes. Hence, lots of blocking algorithms and stuff like 4-step FFTs. IBM is suggesting techniques using double-buffering which seem to be working well.
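
    A minimal sketch of the double-buffering idea in ordinary Python: while block i is being processed, block i + 1 is already being fetched in the background. The thread pool stands in for an asynchronous transfer into local store, and `fetch_block` / `compute` are hypothetical callbacks; this is not IBM's API, just the general shape of the technique.

```python
from concurrent.futures import ThreadPoolExecutor


def process_double_buffered(fetch_block, n_blocks, compute):
    """Overlap the fetch of block i + 1 with the computation on block i."""
    if n_blocks == 0:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, 0)  # start filling the first buffer
        for index in range(n_blocks):
            current = pending.result()  # wait until this buffer is full
            if index + 1 < n_blocks:
                pending = pool.submit(fetch_block, index + 1)  # prefetch the next one
            results.append(compute(current))  # work while the prefetch runs
    return results


data = list(range(16))
BLOCK = 4


def fetch(i):
    """Pretend to pull one block from slow remote memory."""
    return data[i * BLOCK:(i + 1) * BLOCK]


print(process_double_buffered(fetch, len(data) // BLOCK, sum))
```

    Only two blocks are ever in flight, so the working set stays at twice the block size no matter how large the whole dataset is.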

    I can see the Cell being a huge win for say a series of Monte Carlo sims running in each SPU, but it looks like a lose once you exceed local store.

    That depends on your workloads, in particular your access patterns. Sequential and blocking access patterns should do just fine.

    What makes me pretty hopeful about the potential performance of Cell is that we're currently getting by pretty well with our CPUs with fast L2 cache of similar size (256 KB was pretty common 3 or 4 years ago) and slow memory accesses. The situation is pretty similar with Cell, save that the local store is directly addressable as opposed to transparent like caches are, and I see that as a big win actually -- being able to manage the local store and only make explicit memory accesses should help spot and fix bottlenecks, without the need to worry whether the target CPU will have 512 KB or 1 MB or 2 MB of cache. Of course, having 8 high-clocked SPEs processing 128-bit vectors will impose a much higher burden on memory than your run-of-the-mill Pentium 4 currently does, but I'm hoping that XDR DRAM will be up to the challenge.

    But you seem to be saying that idle fetch cycles aren't so bad.

    You may be mixing things up. What I said was that local store accesses had a fixed latency of 6 cycles.

    you seem to be one of the few with real world experience posting here

    I don't think a couple of afternoons writing code qualifies as real-world experience, but there you go.
  14. Re:Praise for Cell on IBM Full-System Simulator Team Speaks Out · · Score: 1
    Well, RC5 IS pointless

    Not as much as it seems at first glance. But that's a discussion for another day.

    (I still remember when distributed.net was running RC-56 or something for 8 months on 100k machines, and some people just made some asics that had 100+ parallel key-pipelines and built a machine that could exhaust the keyspace in 3 days or so...)

    The chips were built by the EFF, and actually they cracked not RC5-56 but DES, which is also a 56-bit-key cipher but far more widespread. Also, by the time of the last DES contest, the processing power of DES Cracker and the distributed.net network was in the same ballpark, so that actually they teamed up and ended up breaking DES in less than 24 hours.

    So i wouldnt be too optimistic because of that little performance point....

    It is a pretty important point in my neck of the woods. I wouldn't be surprised if an AES implementation (running in counter mode, of course, it'd be pointless otherwise) clocked at terabits/s with this thing. And considering it has an interconnect to match (100 GB/s, which is 0.8 Tb/s), this is certainly something to be impressed about.
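
    Why counter mode matters here: each keystream block depends only on the key, the nonce and the block's counter, so blocks can be handed to different SPEs in any order. A minimal sketch of that structure follows. Since the Python standard library has no AES, a truncated SHA-256 stands in for the block cipher; that substitution is purely an assumption to keep the example self-contained and is not how a real AES-CTR implementation works internally.

```python
import hashlib

BLOCK_BYTES = 16


def keystream_block(key, nonce, counter):
    """Stand-in for AES_key(nonce || counter); NOT real AES."""
    material = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(material).digest()[:BLOCK_BYTES]


def ctr_xor_block(key, nonce, counter, block):
    """XOR one block with its keystream; needs nothing from any other block."""
    keystream = keystream_block(key, nonce, counter)
    return bytes(a ^ b for a, b in zip(block, keystream))


def ctr_xor(key, nonce, data, order=None):
    """Encrypt or decrypt; 'order' must visit every block index once, in any order."""
    blocks = [data[i:i + BLOCK_BYTES] for i in range(0, len(data), BLOCK_BYTES)]
    indices = list(order) if order is not None else list(range(len(blocks)))
    out = [b""] * len(blocks)
    for i in indices:
        out[i] = ctr_xor_block(key, nonce, i, blocks[i])
    return b"".join(out)


key, nonce = b"k" * 16, b"n" * 8
message = b"counter mode blocks are independent of each other" * 3
ciphertext = ctr_xor(key, nonce, message)
print(ctr_xor(key, nonce, ciphertext) == message)
```

    A chained mode like CBC encryption cannot be split up this way, because each block needs the previous ciphertext block first.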

    Moreover, if you want more `useful' results, go have a look at the figures for matrix multiplication and FFT code written by IBM. They're as impressive within their domain as this RC5 figure is.
  15. Re:Praise for Cell on IBM Full-System Simulator Team Speaks Out · · Score: 1

    May I suggest that you attend a computer architecture class? Most of these things should be discussed even in undergraduate-level computer architecture classes.

  16. Praise for Cell on IBM Full-System Simulator Team Speaks Out · · Score: 5, Informative

    I've been running the simulator here, and managed to port the distributed.net client to it. The performance of current cores in the PPE is so-so (worse than the G4 in my Mac Mini), although I'm sure it would improve by proper optimization. The SPE is a completely different matter though. I wrote an RC5-72 core for it that should achieve ~190 Mkeys/s on 8 SPEs at 3.2 GHz, which is by itself almost ten times faster than the current fastest processor (G5 at 2.7 GHz, which clocks at 20 Mkeys/s, IIRC). For embarrassingly parallel applications like key cracking, this thing is a dream.
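
    The arithmetic behind those figures, spelled out. The inputs are the numbers quoted above (the G5 rate being the IIRC figure); the derived values are simple division and say nothing about how the core is actually scheduled.

```python
SPE_COUNT = 8
CLOCK_HZ = 3.2e9
CELL_KEYS_PER_S = 190e6  # estimate for 8 SPEs
G5_KEYS_PER_S = 20e6     # quoted from memory in the comment

keys_per_s_per_spe = CELL_KEYS_PER_S / SPE_COUNT
cycles_per_key_per_spe = CLOCK_HZ / keys_per_s_per_spe
speedup_over_g5 = CELL_KEYS_PER_S / G5_KEYS_PER_S

print(round(keys_per_s_per_spe / 1e6, 2), "Mkeys/s per SPE")
print(round(cycles_per_key_per_spe, 1), "cycles per key on each SPE")
print(round(speedup_over_g5, 1), "x the G5 figure")
```

    That works out to roughly 135 cycles per key on each SPE and a 9.5x ratio to the G5 number.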

    Some technical details: the SPE's instruction set could be thought of as `Altivec plus'. It has most of the functionality of Altivec (so far I've only missed a byte addition instruction), but quite a few improvements, like immediate operands for many instructions, immediate loads with much better range than Altivec's splat instruction, the addition of double precision floating point operations, etc. I'm sure there are more improvements, but these are the ones I noticed from my limited experience with Altivec. Instruction scheduling for this processor is remarkably similar to that of the first Pentium: it's dual issue with static scheduling, there are some conditions on pairable instructions and their ordering to ensure dual issue, and so on. The high latencies for instructions (2 for most integer arithmetic, 4 for shifts and rotates) are problematic, but the huge register file of 128 entries is very helpful to implement techniques like software pipelining which help mask these latencies. The local store is a mixed bag -- dealing with arrays larger than the local store should be challenging, but if you don't have to worry about it, it's great to have a fixed latency of 6 cycles for loads and stores, no need to worry about cache effects and so on. Actually, the local store behaves a lot like a programmer-addressable cache, which has some benefits compared to traditional cache: specifically, less control overhead per memory cell (so more logic can be packed in the same space) and, as a consequence, the potential for higher speeds and/or smaller latencies.

    Overall, I'm very impressed with Cell, but for now I've only programmed toy examples and I'm sure to hit some limits of the architecture once I start looking at real-world code.

  17. So let me get this straight... on DVD Jon's Code In Sony Rootkit? · · Score: 3, Funny

    When some cheapskate downloads copyrighted MP3s from a P2P network, it's `copyright infringement', but when Sony uses GPL'd code it's `stealing', right?

  18. WTF is the General Number Field Sieve... on RSA-640 Factored · · Score: 5, Informative

    ...many are asking. It's hard to find introductory materials on the NFS, because the number of people who actually understand the algorithm is probably in the hundreds, if not fewer, and most are worried about research, not teaching. For those interested in a high-level view, plus some low-level details, of the (special and general) NFS, you can have a look at the slides for a talk that I gave on exactly this topic at a crypto workshop a couple of months ago. I won't even try to summarize the NFS here, because anything other than a very high level, handwaving, bird's eye view of NFS would take the better part of a page to explain. However, in this thread I can answer specific questions that anyone might have about the talk above.

    Now for those with the mathematical maturity to delve into the algorithm, I suggest the book Prime Numbers: A Computational Perspective by R. Crandall and C. Pomerance (link to Amazon.com lifted from Google, no referrals), which is certainly one of the best introductions to the algorithm that I have read.

    By the way, if anyone wants to help perform huge factorizations in a distributed computing network, check out the NFSNET, although they mostly apply SNFS on values from the Cunningham tables, no cryptographic targets.

  19. Re:Factor? on RSA-640 Factored · · Score: 1

    Although many point to quantum crypto as the ultimate solution to security problems, it should be noted that, in addition to all logistic problems inherent to quantum crypto, no purely quantum-mechanical scheme exists -- all of them rely on an authenticated/trusted/whatever classical channel (i.e. one that transmits bits not qubits), and this classical channel has to be secured using conventional cryptography.

    I'm too lazy to dig up references for this, some karma whore can search eprint/arXiv and add the references to the thread.

  20. 1000th post! on FBI Agents Put New Focus on Deviant Porn · · Score: 1

    NT

  21. Re:Distributed.net on Brute Force · · Score: 1

    Mr. Beberg,

    In the interest of full disclosure, you might have pointed out that you left distributed.net on not so friendly terms with the rest of the team. Don't sweat, so did I (I joined distributed.net as a core coder after you left, so we never met.) But whenever I mention something that might be construed as negative about distributed.net in public, I try to disclaim my potential biases. Hence I'm doing this favor to you and the rest of Slashdot readers.

    It wouldn't hurt to mention that distributed.net is no longer about key cracking only -- even if you have a grudge against RC5-72 (I also think it's fairly pointless at this moment in time), OGR is a completely valid project.

  22. Re:Spoiled brats on Valve's Gabe Newell Speaks on Console Development · · Score: 1

    I've searched Intel's manuals a bit and there's no mention whether store buffers or renamed registers are used to perform STLF, so I'll assume you're right. I never stopped to think at which point data leaves renamed registers to store buffers -- isn't it at the end of the pipeline? Ideally a processor would have STLF from both renamed registers and store buffers, particularly in the case of hyperpipelined processors like the P4.

    Also, there aren't many restrictions on STLF, mostly size (a load of 128 bits can't be forwarded from a store of 32 bits, although the converse is possible) and alignment. In common situations and if the compiler generates proper code, STLF restrictions shouldn't be very restricting at all.
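
    A simplified model of those two restrictions, just to make the size rule concrete. It assumes forwarding works whenever the load's bytes lie entirely inside the bytes of a single earlier store and the load is naturally aligned; the real rules in Intel's optimization manual are more detailed, so treat the function as a hypothetical illustration only.

```python
def can_forward(store_addr, store_size, load_addr, load_size):
    """True if the load could be served from the pending store (simplified model).

    Addresses and sizes are in bytes. Containment and natural alignment of the
    load are the only conditions modelled here.
    """
    contained = (store_addr <= load_addr
                 and load_addr + load_size <= store_addr + store_size)
    aligned = load_addr % load_size == 0
    return contained and aligned


print(can_forward(0, 4, 0, 16))   # 128-bit load from a 32-bit store
print(can_forward(0, 16, 4, 4))   # 32-bit load from a 128-bit store
```

    The first call is the case that cannot be forwarded (the store simply doesn't hold all the bytes the load wants); the second is the converse, which can.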

  23. Re:Spoiled brats on Valve's Gabe Newell Speaks on Console Development · · Score: 2, Interesting

    It's correct that the programmer doesn't see the renamed registers and must still spill to memory. However, given enough load/store execution units (the P4 can execute one load and one store per clock, for instance), and store-to-load forwarding circuitry, you can mostly `code around' the lack of registers. A store you can fire and forget, of course; and a load, if you just recently stored the value you're loading (which is the expected situation if register pressure is high, you're swapping values all the time), then STLF will forward the result with latency perhaps as low as zero. Of course, the forwarded data must come from somewhere, and that's the renamed registers. Ultimately what matters is having a lot of registers in the CPU (even if not exposed to the programmer) and actually using them to exploit as much parallelism as possible.

    I agree it's not as good as actually having a higher number of registers -- if the register pressure is high, there may be a lack of load/store ports, and code size is increased -- but ultimately most parallelism can be exploited and that's what matters. The fact that recent x86 processors perform as well as their RISC counterparts is a testament to that.

  24. Spoiled brats on Valve's Gabe Newell Speaks on Console Development · · Score: 4, Insightful

    Maybe I'm just too impressed with Cell's architecture to see things clearly, but here's my opinion...

    Generation after generation, developers have been given ever more powerful processors with a corresponding extra cost in hardware. Some of this is really needed to overcome architectural limitations (register renaming to make up for the scarcity of registers in x86 comes to mind) -- indeed I think x86 is too crippled to perform well without lots of hardware assistance.

    But the fact is that we've hit a wall of performance. Power increases due to ever more complex chips, plus certain effects like leakage currents (that were disregarded in previous manufacturing processes) are becoming ever more problematic. So the free performance lunch is over, and CPU designers are having to trim the fat of their designs. The result is nice power-efficient architectures like the Pentium M, but there's only so much that power-conscious design can do if you still must have the complexity of out-of-order execution and other modern CPU features.

    So there's really no way around it. If you need a power-efficient processor, you're going to have to resort to completely new architectural ideas, like extensive use of SIMD and multi-core as Cell does. Programmers are going to pay a price in terms of complexity and cost of software development, yes; but there's no other way, the growth of CPUs we're used to is flattening out, unfortunately, and can only grow again through adoption of these alternative programming models.

    Which is why I say these people are spoiled brats. If CPU designers are guilty of anything, it's feeding off this illusion that infinite growth without laying any burdens on programmers was possible. But complaining is no good now; either they're going to adapt or die. It's clear that no ordinary out-of-order design, using the same transistor budget, can reach the peak power of Cell if correctly programmed. So if these guys really want the extra power to make better games, they'll have to learn these new programming models and bear the burden of extra complexity.

  25. I code Lexmark replacement chips on Refilling Ink Cartridges Now a Crime? · · Score: 5, Informative

    and here's the lowdown.

    First, nowhere is it stated clearly, but I'm fairly sure they're not talking about inkjet cartridges but laser toners. These are the ones I code replacements for.

    The chip in question is the Dallas-Maxim DS2432. It's an EEPROM with a twist: it uses some cryptography to perform authentication.

    The idea is that the master (in this case the printer) and the memory can negotiate a shared key, which is done in the factory or during testing -- the chip doesn't use public key encryption, so it requires a key exchange `in the open' which must obviously be done before the chip reaches the customer. (Lexmark has made some ugly implementation mistakes in some printers but nothing THAT bad.)

    So this key allows authentication of both the printer and the memory. After an authenticated read, the memory must compute a hash of some data (including a nonce and the last page read) and send it to the printer. If the hash matches what the printer was expecting, the printer is sure that the memory knows the shared key. (Unless stupid implementation mistakes are made that open the way for replay attacks.)

    Conversely, when the printer asks the memory to commit a write, the memory requests a hash as well, to authenticate the printer. You may ask, `what's the point?' This memory holds data on how many copies were made, serial number and so on. If the memory just blindly wrote what it was told, remanufacturers could keep resetting the contents and reselling the cartridge.
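
    The two checks, sketched in Python rather than microcontroller assembly. This shows only the shape of a shared-secret challenge-response: the field layout, the plain SHA-1 over concatenated fields, and all class and function names are assumptions for illustration, and none of it reproduces the DS2432's actual message format.

```python
import hashlib
import hmac
import os


def tag_for(secret, *fields):
    """Hash the shared secret together with the given fields (hypothetical layout)."""
    return hashlib.sha1(secret + b"".join(fields)).digest()


class Memory:
    def __init__(self, secret, page):
        self._secret = secret
        self.page = page

    def authenticated_read(self, nonce):
        """Return the page plus a hash proving knowledge of the secret."""
        return self.page, tag_for(self._secret, self.page, nonce)

    def authenticated_write(self, new_page, tag):
        """Commit the write only if the caller proves it knows the secret."""
        expected = tag_for(self._secret, self.page, new_page)
        if not hmac.compare_digest(tag, expected):
            return False
        self.page = new_page
        return True


class Printer:
    def __init__(self, secret):
        self._secret = secret

    def memory_is_genuine(self, memory):
        nonce = os.urandom(8)  # a fresh nonce is what defeats replayed answers
        page, tag = memory.authenticated_read(nonce)
        return hmac.compare_digest(tag, tag_for(self._secret, page, nonce))

    def update_page(self, memory, new_page):
        tag = tag_for(self._secret, memory.page, new_page)
        return memory.authenticated_write(new_page, tag)


shared = b"factory-secret"
genuine = Memory(shared, b"pages=0000")
clone = Memory(b"wrong-guess", b"pages=0000")
printer = Printer(shared)
print(printer.memory_is_genuine(genuine), printer.memory_is_genuine(clone))
```

    If the printer reused the same nonce every time, a recorded (page, hash) pair could simply be played back, which is the kind of replay mistake mentioned above.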

    So how do you build a replacement chip? Easy, get the key somehow and implement the protocols used by this memory on a microcontroller. Using an off-the-shelf DS2432 is impossible because these things have serial numbers with a fixed byte (the `family code') which is different from the same byte in Lexmark's DS2432s -- they probably buy so many of them that they were in a position to ask Dallas-Maxim to make batches of chips with modified family codes. A little bit of security by obscurity, but that wasn't a barrier to us -- it took less than a week to reimplement (in assembly) the DS2432 protocols on my favorite microcontroller architecture, the Texas MSP430.

    Now, I don't like to get into the politics of this thing. Myself, I believe what I'm doing is perfectly fine and in fact the right of the consumer, EULAs and contracts and patents be damned. I wouldn't do it otherwise. Some people complain that Lexmark sold a discounted toner (called Prebate), on the basis that you would return it to them, and you didn't, and that's unfair. What they don't take into account is that your printer comes loaded with a Prebate cartridge, and with a small amount of toner to boot. Many if not most people just use this one cartridge that came with their printer, and keep remanufacturing it. The customer didn't have a choice in this -- if Lexmark offered a regular toner, or no toner at all, when the customer bought the printer, the situation might be less clearcut. As it stands, I see this as Lexmark forcing everyone to pay for a crippled toner, giving them no choice in the matter, and so they're perfectly justified to remanufacture it. (This might not be considered ethical by some, and is most probably illegal, but I don't care.)

    Moreover, the prices they charge are completely absurd. I know this is standard practice in the industry, but I consider this highly immoral. Very few companies possess the technology to make a printer, but many possess the technology to remanufacture toners and cartridges. By imposing legal and technical hurdles on remanufacturing, printer makers are effectively enforcing a monopoly, and the worst thing is, some courts are sanctioning this monopoly. The traditional analogy with auto parts holds very well, and many other frightening scenarios haven't been explored -- what if the printer makers agree on a policy of no longer manufacturing toners and cartridges for printers older than 1 year so as to force everyone to upgrade and m