I have always (i.e. since the late 1980s) assumed that everything I write online can be used against me.
This is of course obvious in places like Usenet or Slashdot, but I try to behave the same way in email or IM.
Since I've also managed to keep the same email address since 1994, and always signed Usenet posts with my full name & email, it is trivially easy to Google me. Prospective clients/employers who do so should hopefully get a good impression.
What counts here isn't absolute but relative size:
I.e. population per sq km and population distribution (urban vs rural).
On both those measures Norway is a lot closer to Canada than the US.
Besides which we have the mountains and fjords, i.e. a very fractal landscape which makes all forms of infrastructure much more expensive than in Denmark (or even New Mexico).
Here in Norway I cannot get fiber to my home in Oslo, but when we bought a new cabin up in the central mountains, the local power company by default pulled fiber along with the 3x65 Amp 400 V power cable. (Actually, what they do is pull fiber to the local distribution box, then they place a 1/2" PVC tube along with the underground power cable to the building site. After the cabin was finished, they came back and spent 10 minutes blowing a fiber through the PVC tube.)
The cost is the same as for ADSL in downtown Oslo.
BTW, Norway has a very sparse population, and this goes double for the mountain areas.
I just turned 50 this summer, and I've never felt more appreciated as an engineer than the last couple of years.
As other people here have commented, the real secret is to simply be _very_ good at what you do: Keep up your old skills, and make sure you learn (i.e. teach yourself) something brand new every year or two.
Over the last 5+ years I've been the "IT Fire Brigade Chief" in the Fortune 500 company I work for, i.e. I get all the really interesting problems, all the cases that none of the others can figure out, and all the bleeding edge stuff that doesn't fit nicely into one of the existing departments.
I also get to spend discretionary time writing and optimizing system code, so I really don't see any reason to complain. (I've worked on one of the AES contenders http://www.adastral.ucl.ac.uk/~helger/research/aes/, the Windows port of NTP http://ntp.org/, HD-DVD decoding, Ogg Vorbis optimization as well as lots of other kinds of code. I am also the Scandinavian coordinator of the Confluence Project http://confluence.org/.)
My role model within the company retired a few years ago, 67 years old, and he's still enthusiastic about brand new technology.
OTOH, living in Norway I also know that it would be effectively impossible to fire me, unless I completely stopped coming into work, and started doing drugs instead.
This is in fact very easy to prove:
If the maximum jail time for not divulging encryption keys is significantly less than the time for actually being convicted of terrorism, then it should be obvious that real terrorists would never divulge such encryption keys.
No, this law, and others like it in other jurisdictions, are simply there to give the police one more reason to force regular citizens to hand over their keys.
If you actually do have something to hide from the authorities, the best idea is probably to look into http://truecrypt.org/ and the capability of having hidden encrypted volumes.
When forced, either by legal threats or by rubber hose interrogation, you can then divulge the primary key. On the primary volume you should store potentially embarrassing, but not really critical information. This should be sufficient to show that you had reason to hide said info, but not enough to put you in jail for a long time.
If you happen to be located in a place like Myanmar/Burma, then you should also use TrueCrypt, for exactly the same kind of reason.
Terje "almost all programming can be viewed as an exercise in caching"
We have 500+ servers worldwide, many of which contain the same program install images, which by definition should be identical:
One master, all the others are copies.
Starting maybe 15 years ago, when these directory structures were in the single-digit GB range, we started noticing strange errors, and after running full block-by-block compares between the master and several slave servers we determined that we had end-to-end error rates of about 1 in 10 GB.
Initially we solved this by doubling the network load, i.e. always doing a full verify after every copy, but later on we found that keeping the same hw, but using sw packet checksums, was sufficient to stop this particular error mechanism.
One of the errors we saw was a data block where a single byte was repeated, overwriting the real data byte that should have followed it. This is almost certainly caused by a timing glitch which over-/under-runs a hardware FIFO. Having 32-bit CRCs on all Ethernet packets as well as 16-bit TCP checksums doesn't help if the path across the PCI bus is unprotected and the TCP checksum has been verified on the network card itself.
Since then our largest volume sizes have increased into the 100 TB range, and I do expect that we now have other silent failure mechanisms: Basically, any time/location where data isn't explicitly covered by end-to-end verification is a silent failure waiting to happen. On disk volumes we try to protect against this by using file systems which can protect against lost writes as well as misplaced writes (i.e. the disk reports writing block 1000, but in reality it wrote to block 1064 on the next cylinder).
NetApp's WAFL is good, but I expect Sun's ZFS to do an equally good job at a significantly lower cost.
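A minimal sketch of the "full verify after every copy" idea described above, assuming plain files and using a SHA-256 digest as the end-to-end check. The helper names `file_digest` and `copy_and_verify` are made up for illustration and are not the tooling actually used; note also that re-reading the destination right after writing may be served from the OS cache rather than from the disk itself.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both ends and compare digests."""
    shutil.copyfile(src, dst)
    if file_digest(src) != file_digest(dst):
        raise IOError(f"end-to-end verify failed for {dst}")
    return True
```

The point of the design is that the check covers the whole path (source disk, bus, network, destination disk), not just the individual hops that already have their own CRCs.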
The trouble with your Jimmy scenario is this:
Although your thumb drive can pack 5 TB, it doesn't have write rates much better than about 10 x today's 1-10 MB/s, let's call it 100 MB/sec, OK?
This means that each GB takes 10 seconds, and a TB requires 10K seconds or nearly 3 hours, if everything is working perfectly.
For 5TB we're talking 13-15 hours, which means that you must have spent the entire day over at Jimmy's house.
All this is assuming both Jimmy's (solid state?) disk array and your thumb drive can actually sustain a Gbit/s for a file copy operation, including all the file system overhead.
Over the last 25 years both IO rates and storage densities have increased exponentially, but the storage exponent has been significantly larger than the IO exponent, which means that the time to totally fill/empty a new, state-of-the-art drive has increased every year.
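The arithmetic above is easy to check. A small sketch, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a perfectly sustained 100 MB/s; the function name `transfer_hours` is just for illustration:

```python
def transfer_hours(capacity_bytes, rate_bytes_per_s):
    """Time to fill (or empty) a drive at a constant sustained rate, in hours."""
    return capacity_bytes / rate_bytes_per_s / 3600.0


TB = 10**12
MB = 10**6

print(f"{transfer_hours(1 * TB, 100 * MB):.1f} hours")  # 2.8 hours
print(f"{transfer_hours(5 * TB, 100 * MB):.1f} hours")  # 13.9 hours
```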
When decoding "full HD" H.264, i.e. 40 Mbit/s BluRay or 30 Mbit/s HD-DVD, with 1080p resolution, current cpus start to thrash the L2 cache:
Each 1080p frame consists of approximately 2 M pixels, which means that the luminance info alone will need 2 MB, right?
Since the normal way to encode most of the frames is to have two source frames and one target, motion compensation (which can access any 4x4, 8x8 or 16x16 sub-block from either or both of the source frames), will need to have up to 2+2+2=6MB as the working set.
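For reference, the working-set estimate spelled out, assuming one 8-bit luminance sample per pixel and ignoring the chroma planes (which would add to the total):

```python
WIDTH, HEIGHT = 1920, 1080

luma_bytes = WIDTH * HEIGHT      # one byte of luminance per pixel
frames = 3                       # two source frames plus one target frame
working_set = frames * luma_bytes

print(luma_bytes)    # 2073600
print(working_set)   # 6220800
```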
Yes, they did look at those persons who were a lot younger than their older siblings, and found that after 8 years, the difference in IQ wasn't detectable.
I.e. since my youngest brother is 7 years younger than his sister who is the next-youngest, the expected "cost" of having older siblings should be very close to zero.
We have a large (geographically replicated) Hitachi disk array (as well as many NetApp boxes), mostly it works very well indeed.
However 2-3 years ago we stumbled (very painfully!) across a firmware bug which took the primary Hitachi array down:
As we (i.e. the Hitachi service reps) were upgrading the mirrored cache, an error hit the active half, and it turned out that the firmware would always check the mirror (a very good idea, right?) before falling back on re-reading the disk(s). However, the firmware error handler which could have handled an error on the mirror copy as well (as long as the data wasn't dirty, of course), did not know how to handle a _missing_ copy, instead it blew away the entire array while crashing.
It took us three days to get everything back up, even though most of the critical systems were running off of the WAN backup copy after 2-3 hours.
Terje
PS. That particular firmware bug has of course been extinguished, but there's bound to be some more lurking around. Getting totally non-stop operation is a _hard_ problem!
23 years ago I wrote a custom DB to maintain the status of millions of "universal" gift cards; it ran 3-5 orders of magnitude faster (on a 6 MHz IBM AT) than a commercial database running on a big IBM mainframe.
I reduced the key operations (what is the value of this gift card, when was it sold, has it been redeemed previously? etc) to just one operation:
Check and clear a single bit in a bitmap.
My program used 1 second to update 10K semi-randomly-ordered (i.e. in the order we got them back from the shops that had accepted them) records in a database of approximately 10 M records.
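A rough sketch of what "check and clear a single bit in a bitmap" can look like, assuming one bit per card number where a set bit means "sold and not yet redeemed". The class name `CardBitmap` and that bit meaning are assumptions for illustration; the original program's actual layout is not described here.

```python
class CardBitmap:
    """One bit per gift card number: 1 = outstanding, 0 = unsold or redeemed."""

    def __init__(self, n_cards):
        self.n_cards = n_cards
        self.bits = bytearray((n_cards + 7) // 8)

    def issue(self, card_no):
        """Mark a card as sold."""
        self.bits[card_no >> 3] |= 1 << (card_no & 7)

    def redeem(self, card_no):
        """Check and clear: True if the card was outstanding, else False."""
        byte, mask = card_no >> 3, 1 << (card_no & 7)
        was_set = bool(self.bits[byte] & mask)
        self.bits[byte] &= ~mask & 0xFF
        return was_set


cards = CardBitmap(10_000_000)      # 10 M cards fit in 1.25 MB
cards.issue(1234567)
print(cards.redeem(1234567))        # True
print(cards.redeem(1234567))        # False, already redeemed
```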
20 years later I wrote a totally new version of the same application, but this time the gift cards are electronic debit cards. This time I used Linux-Apache-MySQL-Perl to make a browser-based version, and I stored everything in the DB. Today that is plenty fast enough, and it allows us to make any kind of queries against the DB, like "How many transactions of less than 100 kr were accepted in December, broken down by business area/chain/shop/etc?"
This theoretical attack is based on using (previously covered on /.) clock skew to identify systems.
The correct defense is the same as the last time:
a) Make sure that there is no system clock skew, by running Network Time Protocol (NTP) on all servers.
b) Make sure that all externally visible timestamps are based on the system clock.
Part (b) is the only difficult step, since many current IP stacks use a private counter/clock instead of the system clock, presumably to reduce the overhead of providing timestamps. I know that Linus T has discussed using user-level library code to provide microsecond resolution (or better) timestamps, with very low overhead:
The library code can just query the cpu/system timer, multiply by the current scale factor (which depends on things like dynamically variable cpu clock frequency), and add the base time which was stored by the OS on the last HW clock interrupt: Total runtime, including call/return overhead, can be below 100 clock cycles, which is fast enough to use it everywhere timestamps are needed.
BTW, I wrote asm code to do exactly this inside Novell's NetWare OS a little over 10 years ago. In NetWare these timestamps were used by the Packet Burst algorithms which optimized packet transmission rates.
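The "counter times scale factor plus base time" scheme can be modelled in a few lines. This is only a sketch of the arithmetic, assuming a 32.32 fixed-point scale factor and a counter value passed in by the caller; the names `ClockState` and `timestamp_ns` are hypothetical and do not correspond to any real OS interface.

```python
from dataclasses import dataclass


@dataclass
class ClockState:
    base_time_ns: int       # system time stored at the last HW clock interrupt
    base_counter: int       # cpu cycle counter value at that same moment
    ns_per_tick_q32: int    # scale factor: ns per counter tick, 32.32 fixed point


def timestamp_ns(state, counter_now):
    """Base time plus the scaled number of ticks since the last interrupt."""
    elapsed_ticks = counter_now - state.base_counter
    return state.base_time_ns + ((elapsed_ticks * state.ns_per_tick_q32) >> 32)


# Example: a 2 GHz counter means 0.5 ns per tick, i.e. 2**31 in 32.32 fixed point.
state = ClockState(base_time_ns=1_000_000, base_counter=5000, ns_per_tick_q32=1 << 31)
print(timestamp_ns(state, 7000))   # 1001000
```

When the cpu frequency changes, only the scale factor (and a fresh base time/counter pair) has to be updated; the read path stays a subtract, a multiply, a shift and an add.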
I haven't written pure asm programs for the last 10+ years, but I'm willing to bet the sequence in your .sig is:
B8 - MOV to AX
00 4C Immediate 16-bit constant, in LE order
I.e. MOV AX,4C00h
CD 21 is of course INT 21h, which is the DOS OS interface.
Since 4Ch in AH is the 'Exit program' DOS call, and AL = 0 is the return value, the code above will stop the current program, with an errorlevel of zero, i.e. no error.
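The decoding above can be double-checked mechanically. A small sketch that handles only this exact five-byte pattern (opcode B8 followed by a little-endian 16-bit immediate, then CD followed by an 8-bit interrupt number); the function name `decode_exit_stub` is made up for the example:

```python
import struct


def decode_exit_stub(code):
    """Decode 'MOV AX,imm16 / INT imm8' and return (ax, interrupt_number)."""
    assert code[0] == 0xB8                      # MOV AX, imm16
    (ax,) = struct.unpack_from("<H", code, 1)   # immediate stored in LE order
    assert code[3] == 0xCD                      # INT imm8
    return ax, code[4]


ax, int_no = decode_exit_stub(bytes([0xB8, 0x00, 0x4C, 0xCD, 0x21]))
ah, al = ax >> 8, ax & 0xFF
print(f"MOV AX,{ax:04X}h / INT {int_no:02X}h (AH={ah:02X}h, AL={al:02X}h)")
# MOV AX,4C00h / INT 21h (AH=4Ch, AL=00h)
```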
Not initially, but I believe the current version does so.
The boot sequence is to load (from a reserved area) the FDE sw which first tries to verify that it is running in plain unprotected DOS mode, then it takes over the keyboard hw so that it can read keystrokes without risking a trojan/keylogger attack.
After getting the password/passphrase it uses this to decrypt the user entry which contains the master disk key: If this doesn't succeed it goes into a sw timeout loop, taking progressively longer each time, before letting you retry.
When Windows loads, it must run in BIOS mode until the protected mode crypto driver can be loaded.
Before taking a one-year sabbatical (91-92) which I spent in the US, writing networking code, I had a company that sold terminal emulation/file transfer software. I sold enough licenses to make it one of the top 5 bestselling Norwegian programs. During the last year the Norwegian IRS grabbed 83% of every Krone I invoiced my customers.
At that point I realized that I'd much rather work less and spend more time with my wife & kids, so I closed the company.
I still write/optimize code, but always because I enjoy it, not to make money. (Sometimes I do get paid as well (in addition to my regular salary), but that's not the important part.)
Re. "know this (crypto) technology": I want to know a lot more than just crypto, and the job I have, which is a sort of IT Fire Brigade Chief, means that I get to work on all sorts of interesting technology, including everything that's new, as well as everything that doesn't perform as well as it has to. The Full Disk Encryption requirements I mentioned in my first post were obvious to me at the time, but not to most of the vendors unfortunately.
I spend my leisure time on orienteering http://orienteering.org/, which is the perfect thinking person's sport.
I work for a multinational corporation with more than 10 K laptops, we decided to use full disk encryption more than 5 years ago.
At that time we found just 5 vendors who were qualified to deliver (after an initial pre-qualification round), and we invited them all to a specially set up testing lab: Of these 5 vendors, 3 were selling pure snake oil (encrypt the partition table and/or root directory only); it took less than 5 minutes to break into each of these.
Nr 4 seemed a lot better, but after 20 minutes of work I found the crucial 'compare password, JE decrypt' sequence in the driver, and we were in. :-(
Only the final entry (from a German company) had understood how you design a product like this:
First you encrypt, using your preferred symmetric key algorithm (AES-256 these days?), all sectors on the disk. You use some form of hash of the logical sector number as a salt when encrypting, this makes each block unique, even those that contain the same 'FDFDFDFD' freshly formatted pattern. The key you use for this is the master disk key, it is a random number generated during installation.
Next you make a small table, with room for at least two entries: User and admin.
The user entry can be modified as often as you like (we default to slightly less than once/month), while the admin key/password is constant, but unique to this particular PC.
Each password (user/admin) is used as the key when encrypting the master key, which means that there is no way, even for the crypto architect, to recover the master key without knowing at least one of these passwords. (The passwords are never stored anywhere on the disk of course!)
The admin key/password is saved both as a printout and on disk on a secure system (without any form of network connection), so that you can use it each time a user manages to forget his/her user disk password.
There are lots of nice-to-have features as well; one of the more important is the ability to use a challenge/response setup to safely regenerate a user password remotely, without ever having to transmit the relevant admin key. This does require some kind of side channel to verify the identity of the user who owns the particular laptop: We use a combination of RSA's SecurID cards and the user's cell phone for this (each user has such a card to be able to use the corporate VPN connection which requires strong authentication).
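To make the key hierarchy described above concrete, here is a toy sketch of the "random master disk key, wrapped once per password" table plus a per-sector salt derived from the logical sector number. Everything in it is an assumption for illustration: the function names are invented, PBKDF2 plus XOR stands in for a real key-wrap cipher (the standard library has no AES), and an HMAC check value is used to detect a wrong password. It is not how the German product was implemented.

```python
import hashlib
import hmac
import os


def sector_iv(master_key, sector_no):
    """Per-sector salt: keyed hash of the logical sector number (16 bytes)."""
    msg = sector_no.to_bytes(8, "little")
    return hmac.new(master_key, msg, hashlib.sha256).digest()[:16]


def _pad(password, salt):
    # Toy stand-in for a key-wrap cipher: a 32-byte pad derived from the password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=32)


def make_slot(password, master_key):
    """Wrap the master key under one password; the password itself is not stored."""
    salt = os.urandom(16)
    wrapped = bytes(a ^ b for a, b in zip(master_key, _pad(password, salt)))
    check = hmac.new(master_key, b"slot-check", hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "check": check}


def open_slot(slot, password):
    """Return the master key if the password is right, otherwise None."""
    candidate = bytes(a ^ b for a, b in zip(slot["wrapped"], _pad(password, slot["salt"])))
    expected = hmac.new(candidate, b"slot-check", hashlib.sha256).digest()
    return candidate if hmac.compare_digest(expected, slot["check"]) else None


master = os.urandom(32)                      # generated once, at installation
table = {"user": make_slot("user passphrase", master),
         "admin": make_slot("per-PC admin passphrase", master)}

# Changing the user password only rewrites one table entry, never the disk sectors.
recovered = open_slot(table["admin"], "per-PC admin passphrase")
table["user"] = make_slot("new user passphrase", recovered)
```

The design choice this illustrates is the indirection: the disk is encrypted once under the master key, and each password only protects a small wrapped copy of that key.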
In most of the rest of the world, the required 'patent step' is significantly higher than in the US, where it seems to have been reduced to 'anything that at least some first-year students might not have thought about immediately'. :-(
About 10 years ago I was asked to do patent reviews on a group of 10 patents which company A would like to use to sue company B:
Of those valid US patents, 4 were really, really obvious, i.e. more or less the only reasonable way to solve a particular problem. AFAIK this means that the patent is automatically invalid, right?
The next group of 4 all consisted of taking a standard textbook algorithm, without _any_ additional tweaks, and implementing it as a VLSI chip.
The final 2 patents actually covered somewhat neat ideas.
R0 = 1; R1 = M
for i from 0 to n-1 do        (d[0] is the most significant bit)
    if d[i] = 0 then
        R1 = R0 * R1 mod N
        R0 = R0 * R0 mod N
    else
        R0 = R0 * R1 mod N
        R1 = R1 * R1 mod N
return R0
The key-dependent if statement is the key here: If we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:
R0 = 1; R1 = M;
for (i = 0; i < n; i++) {
    mask = d[i] - 1;    // Either -1 (bit is 0) or 0 (bit is 1)
    nmask = mask ^ -1;  // 0 or -1
    T0 = R0 & mask;     // Either R0 or 0
    T0 += R1 & nmask;   // At this point T0 holds the value to be squared, R0 or R1!
    T1 = R0 * R1 mod N;
    T0 = T0 * T0 mod N;
    // Now we move the correct values back into R0 & R1
    R1 = T1 & mask;
    R0 = T0 & mask;
    R0 += T1 & nmask;
    R1 += T0 & nmask;
}
return R0;
There are at least three interesting issues here:
a) Most modern cpus have hw support for conditional operations, on x86 this is in the form of CMOVcc which is a (constant-time!) conditional move into a register, but as shown above, it really isn't needed here.
b) The performance cost of the above branch removal can be negative, i.e. the branch-free code can be faster! On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.
c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values: By allocating them at the same address modulo the cpu page size, i.e. at 4 KB offset, the cache lines hit will be the same for both.
When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.
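The mask trick can be sanity-checked against an ordinary modular exponentiation. The sketch below assumes the usual Montgomery ladder convention (bits processed most significant first; a 0 bit squares R0, a 1 bit squares R1) and uses Python integers, where `x & -1` is `x` and `x & 0` is 0. It only verifies the arithmetic: Python big-integer code is not constant-time, so this says nothing about timing behaviour. The function names are invented for the example.

```python
def bits_msb_first(d):
    """Exponent bits as a list of 0/1 ints, most significant bit first."""
    return [int(c) for c in bin(d)[2:]]


def ladder_branchy(m, d_bits, n):
    """Reference Montgomery ladder with a key-dependent branch."""
    r0, r1 = 1, m % n
    for bit in d_bits:
        if bit == 0:
            r1 = r0 * r1 % n
            r0 = r0 * r0 % n
        else:
            r0 = r0 * r1 % n
            r1 = r1 * r1 % n
    return r0


def ladder_masked(m, d_bits, n):
    """Same ladder with the branch replaced by AND masks."""
    r0, r1 = 1, m % n
    for bit in d_bits:
        mask = bit - 1                     # -1 (all ones) if bit == 0, else 0
        nmask = ~mask                      # 0 if bit == 0, else -1
        t0 = (r0 & mask) + (r1 & nmask)    # the value to be squared
        t1 = r0 * r1 % n
        t0 = t0 * t0 % n
        r1 = (t1 & mask) + (t0 & nmask)    # move the results back
        r0 = (t0 & mask) + (t1 & nmask)
    return r0


d = 0b1011001110001
assert ladder_masked(7, bits_msb_first(d), 1000003) == pow(7, d, 1000003)
```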
Svalbard, including Spitsbergen which is the largest island, is recognized by the UN as a Norwegian territory.
It does exist in a sort of legal limbo though, in that any country which signs the Svalbard treaty can go in and look for natural resources. Russia and its Soviet precursor have had a fairly large city (Barentsburg) there for decades, supporting a coal mine which is now running out.
The chief authority on Svalbard is the office of 'Sysselmannen', which is located in the main Norwegian settlement, Longyearbyen.
A few hours south (by snowmobile) of Longyearbyen is the site of the Svea mine, which is sitting on a very rich coal seam, it is currently one of the most productive (per employee) mines in the world.
Svalbard also contains the big international research station at Ny Ålesund, which is operated by the Kings Bay Company. http://www.kingsbay.no/
Visiting Svalbard in March a couple of years ago was one of my most memorable trips ever:
What the original article didn't mention, and none of the replies seemed to go into, is the fact that with current CPUs, effectively all RAM is 'virtual':
Only on-chip memory, i.e. cache, is "real" these days, and all accesses to DRAM will be handled in paging units of 64/128 bytes or so. If this sounds familiar, it should! CPUs with 1 to 4 MB of real memory and lots of virtual memory is what the mainframes and minicomputers had about 20-30 years ago.
What this means is that now, just like then, all performance-critical code needs to be written to keep the working set within the amount of "real" memory you have available. When you passed this limit, you needed to make sure that you handled paging in suitably large blocks, to overcome the initial seek time overhead.
Today this corresponds to the difference between random access to DRAM and burst-mode (block transfer) which can be nearly an order of magnitude faster.
In the old days, when you passed the limits of your drum/disk swap device, you had to go to tape, which was a purely sequential device. Today, when you pass the limits of DRAM, you have to go to disk, which also needs to be treated as a bulk transfer/sequential device.
I.e. all the programming algorithms that were developed to handle resource limitations on old mainframes should now be resurrected!
"those who forget their history, are condemned to repeat it"
The problem isn't having a small or a large country, but how many (potential) customers you have per square km, right?
I.e. in a small country with a small and distributed population, the average cost per customer will be much higher than in the US.
Here in Norway it is Friday afternoon and I'm about to drive up to our small mountain cabin for the weekend. At this cabin the local power company (Rauland Kraft) _by default_ pulls along an optic fibre (or at least a pvc tube where they can subsequently blow in the fiber) on every new installation.
The result is that I have IPTV over a 300 Mbit/s connection, but as of now I can only use up to 10/10 (up/down) Mbit for regular Internet traffic. :-(
If you want to check your maps or GoogleEarth, you'll notice that Rauland is located in the Vinje community on the central mountain plateau of southern Norway: This is one of the least densely populated areas in the entire country, but we still get fiber to every home & cabin. :-)
I have been using SuSE's encrypted partitions for more than 3 years now, they have always been completely integrated into the graphical installer.
Yes, they do require someone to enter the (very long!) passphrase during the OS startup process, but that's a small price for the measure of peace of mind that it provides.
Terje
10,000 hours, not 10 years (on "The Expert Mind", Score: 2, Insightful)
The version I have seen of this theory states that it takes about 10K hours of training/study to become a real expert. At this point you've become as good as you're ever going to be.
There are still differences between such people though, and that has to come down to 'innate ability'/genetics/IQ/whatever.
I.e. for every intelligent person who immersed herself in programming from an early age, there's still only going to be a very few real gurus.
An example:
A guy like Mike Abrash is pretty well recognized as one of the best PC graphics programmers ever, and he even managed to speed up John Carmack's original Quake C rendering code by a factor of 3 when writing the asm version.
According to Mike, John Carmack had the ability to grok many (5+ ?) different subjects at this level, at the same time!
Just after the PC introduction (at NCC fall 1981) I told my father-in-law that we should re-implement the software used for OCR processing in his downtown office. We should select something PC-compatible since this new open architecture was bound to generate compatibles, thereby ensuring a pretty long lifetime.
After looking around the market, we bought two Columbia PCs, one desktop (with an immense, never to be filled, 10 MB hard drive) and one luggable, for the same price as a single IBM PC.
The Columbia machine came with a BIOS/HW manual that documented all the various low-level interfaces, including the port addresses for things like the serial port and the interrupt controller, which allowed me to write a hw interrupt driver for the incoming 9600 baud OCR data stream.
Columbia was both earlier than Compaq and more compatible, but that didn't matter, they still went under a couple of years later. The PCs lived for many years however. :-)
I've probably written more assembly than most Slashdot readers, and most of what you say is true:
It used to be the case that I could always increase the speed of some random C/Fortran/Pascal code by rewriting it in asm; part of that speedup came from realizing better ways to map the current problem to the actual cpu hardware available.
However, I also discovered that much of the time it was possible to take the experience gained from the asm code, and use that to rewrite the original C code in such a way as to help the compiler generate near-optimal code. I.e. if I can get within 10-25% of 'speed_of_light' using portable C, I'll do so nearly every time.
There are some important situations where asm still wins, and that is when you have cpu hardware/opcodes available that the compiler cannot easily take advantage of. E.g. back in the days of the PentiumMMX 300 MHz cpu it became possible to do full MPEG2/DVD decoding in sw, but only by writing an awful lot of hand-optimized MMX code. Zoran SoftDVD was the first on the market; I was asked to help with some optimizations, but Mike Schmid (spelling?) had really done 99+% of the job.
Another important application for fast code is in crypto: If you want to transparently encrypt anything stored on your hard drive and/or going over a network wire, then you want the encryption/decryption process to be fast enough that you really don't notice any slowdown. This was one of the reasons for specifying a 200 MHz PentiumPro as the target machine for the Advanced Encryption Standard: If you could handle 100 Mbit Ethernet full duplex (i.e. roughly 10 MB/s in both directions) on a 1996 model cpu, then you could easily do the same on any modern system.
When we (I and 3 other guys) rewrote one of the AES contenders (DFC, not the winner!) in pure asm, we managed to speed it up by a factor of 3, which moved it from being one of the 3-4 slowest to one of the fastest algorithms among the 15 alternatives.
Today, with fp SIMD instructions and a reasonably orthogonal/complete instruction set (i.e. SSE3 on x86), it is relatively easy to write code in such a way that an autovectorizer can do a good job, but for more complicated code things quickly become much harder.
Terje Mathisen
What counts here isn't absolute but relative size:
I.e. population per sq/km and population distribution (urban vs rural).
On both those measures Norway is a lot closer to Canada than the US.
Besides which we have the mountains and fjords, i.e. a very fractal landscape which makes all forms of infrastructure much more expensive than in Denmark (or even New Mexico).
Terje
Here in Norway I cannot get fiber to my home in Oslo, but when we bought a new cabin up in the central mountains, the local power company by default pulled fiber along with the 3x65 Amp 400 V power cable. (Actually, what they do is to pull fiber to the local distribution box, then they place a 1/2" PVC tube along with the underground power cable to the building site. After the cabin was finished, they came back and spent 10 minutes blowing a fiber through the PVC tube.
The cost is the same as for ADSL in downtown Oslo.
BTW, Norway has a very sparse population, and this goes double for the mountain areas.
Terje
I just turned 50 this summer, and I've never felt more appreciated as an engineer than the last couple of years.
As other people here have commented, the real secret is to simply be _very_ good at what you do: Keep up your old skills, and make sure you learn (i.e. teach yourself) something brand new every year or two.
Over the last 5+ years I've been the "IT Fire Brigade Chief" in the Fortune 500 company I work for, i.e. I get all the really interesting problems, all the cases that none of the others can figure out, and all the bleeding edge stuff that doesn't fit nicely into one of the existing departments.
I also get to spend discretionary time writing and optimizing system code, so I really don't see any reason to complain. (I've worked on one of AES contenders http://www.adastral.ucl.ac.uk/~helger/research/aes/, the windows port of NTP http://ntp.org/, HD-DVD decoding, Ogg Vorbis optimization as well as lots of other kinds of code. I am also the Scandinavian coordinator of the Confluence Project http://confluence.org/.)
My role model within the company retired a few years ago, 67 years old, and he's still enthusiastic about brand new technology.
OTOH, living in Norway I also know that it would be effectively impossible to fire me, unless I completely stopped coming into work, and started doing drugs instead.
Terje
This is in fact very easy to prove:
If te maximum jail time for not divulging encryption keys is significantly less than the time for actually being convicted of terrorism, then it should be obvious that real terrorists would never divulge such encryption keys.
No, this law, and others like it in other jurisdictions, are simply there to give the police one more reason to force regular citizens to hand over their keys.
If you actually do have something to hide from the authorities, the best idea is probably to look into http://truecrypt.org/ and the capability of having hidden encrypted volumes.
When forced, either by legal threats or by rubber hose interrogation, you can then divulge the primary key. On the primary volume you should store potentially embarrassing, but not really critical information. This should be sufficient to show that you had reason to hide said info, but not enough to put you in jail for a long time.
If you happen to be located in a place like Myanmar/Burma, then you should also use TrueCrypt, for exactly the same kind of reason.
Terje
"almost all programming can be viewed as an exercise in caching"
We have 500+ servers worldwide, many of them contains the same program install images which by definition should be identical:
One master, all the others are copies.
Starting maybe 15 years ago, when these directory structures were in the single-digit GB range, we started noticing strange errors, and after running full block-by-block compares between the master and several slave servers we determined that we had end-to-end error rates of about 1 in 10 GB.
Initially we solved this by doubling the network load, i.e. always doing a full verify after every copy, but later on we found that keeping the same hw, but using sw packet checksums, was sufficient to stop this particular error mechanism.
One of the errors we saw was a data block where a single byte was repeated, overwriting the real data byte that should have followed it. This is almost certainly caused by a timing glitch which over-/under-runs a hardware FIFO. Having 32-bit CRCs on all Ethernet packets as well as 16-bit TCP checksums doesn't help if the path across the PCI bus is unprotected and the TCP checksum has been verified on the network card itself.
Since then our largest volume sizes have increased into the 100 TB range, and I do expect that we now have other silent failure mechanisms: Basically, any time/location when data isn't explicitly covered by end-to-end verification is a silent failure waiting to happen. On disk volumes we try to protect against this by using file systems which can protect against lost writes as well as miss-placed writes (i.e. the disk reports writing block 1000, but in reality it wrote to block 1064 on the next cylinder).
NetApp's WAFL is good, but I expect Sun's ZFS to an equally good job a significantly lower cost.
Terje
The trouble with your Jimmy scenario is this:
Although your thumb drive can pack 5 TB, it doesn't have write rates much better than about 10 x today's 1-10 MB/s, let's call it 100 MB/sec, OK?
This means that each GB 10 seconds, and a TB require 10K seconds or nearly 3 hours, if everything is working perfectly.
For 5TB we're talking 13-15 hours, which means that you must have spent the entire day over at Jimmy's house.
All this is assuming both Jimmy's (solid state?) disk array and your thumb drive can actually sustain a Gbit/s for a file copy operation, including all the file system overhead.
Over the last 25 years both IO rates and storage densities have increased exponentially, but the storage exponent has been significantly larger than the IO exponent, which means that the time to totally fill/empty a new, state-of-the art
drive has increased every year.
Terje
When decoding "full HD" h264, i.e. 40 Mbit/s BluRay or 30 Mbit/s HD-DVD, with 1080p resolution, current cpus start to thrash the L2 cache:
Each 1080p frame consists of approximately 2 M pixels, which means that the luminance info will need 2 MB, right?
Since the normal way to encode most of the frames is to have two source frames and one target, motion compensation (which can access any 4x4, 8x8 or 16x16 sub-block from either or both of the source frames) will need to have up to 2+2+2=6 MB as the working set.
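For anyone who wants the exact numbers behind the "2 MB per frame" estimate, a tiny sketch (assuming 8-bit samples and counting luminance only, as above; the helper name is just for this example):

```python
def luma_bytes(width, height, bytes_per_sample=1):
    """Size of one luminance plane in bytes."""
    return width * height * bytes_per_sample


frame = luma_bytes(1920, 1080)    # 2,073,600 bytes, i.e. roughly 2 MB
working_set = 3 * frame           # two source frames + one target: 6,220,800 bytes
```

The chroma planes are not counted here, so the real working set is somewhat larger still.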
Terje
Yes, they did look at those persons who were a lot younger than their older siblings, and found that after 8 years, the difference in IQ wasn't detectable.
I.e. since my youngest brother is 7 years younger than his sister who is the next-youngest, the expected "cost" of having older siblings should be very close to zero.
Terje
We have a large (geographically replicated) Hitachi disk array (as well as many NetApp boxes), mostly it works very well indeed.
However 2-3 years ago we stumbled (very painfully!) across a firmware bug which took the primary Hitachi array down:
As we (i.e. the Hitachi service reps) were upgrading the mirrored cache, an error hit the active half, and it turned out that the firmware would always check the mirror (a very good idea, right?) before falling back on re-reading the disk(s). However, the firmware error handler which could have handled an error on the mirror copy as well (as long as the data wasn't dirty, of course), did not know how to handle a _missing_ copy, instead it blew away the entire array while crashing.
It took us three days to get everything back up, even though most of the critical systems were running off of the WAN backup copy after 2-3 hours.
Terje
PS. That particular firmware bug has of course been extinguished, but there's bound to be some more lurking around. Getting totally non-stop operation is a _hard_ problem!
23 years ago I wrote a custom DB to maintain the status of millions of "universal" gift cards, it ran 3-5 orders of magnitude faster (on a 6 MHz IBM AT) than a commercial database running on a big IBM mainframe.
I reduced the key operations (what is the value of this gift card, when was it sold, has it been redeemed previously? etc) to just one operation:
Check and clear a single bit in a bitmap.
My program used 1 second to update 10K semi-randomly-ordered (i.e. in the order we got them back from the shops that had accepted them) records in a database of approximately 10 M records.
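The original was of course asm/DOS code, not Python, but the core idea fits in a few lines. This is only a sketch: the class name, the method names and the meaning of a set bit (sold and not yet redeemed) are assumptions made for the example.

```python
class CardBitmap:
    """One bit per card number: 1 = sold and not yet redeemed (assumed meaning)."""

    def __init__(self, n_cards):
        self.n_cards = n_cards
        self.bits = bytearray((n_cards + 7) // 8)   # 10 M cards -> 1.25 MB

    def mark_sold(self, card):
        """Set the card's bit when it is sold."""
        self.bits[card >> 3] |= 1 << (card & 7)

    def redeem(self, card):
        """Check and clear the card's bit; True only if it was still set."""
        byte, mask = card >> 3, 1 << (card & 7)
        was_set = bool(self.bits[byte] & mask)
        self.bits[byte] &= ~mask & 0xFF
        return was_set


cards = CardBitmap(10_000_000)
cards.mark_sold(1234567)
first = cards.redeem(1234567)    # valid card, now cleared
second = cards.redeem(1234567)   # same card presented again
```

Since each update touches a single byte, the order in which the cards come back from the shops doesn't matter much.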
20 years later I wrote a totally new version of the same application, but this time the gift cards are electronic debit cards. This time I used Linux-Apache-MySQL-Perl to make a browser-based version, and I stored everything in the DB. Today that is plenty fast enough, and it allows us to make any kind of queries against the DB, like "How many transactions of less than 100 kr were accepted in December, broken down by business area/chain/shop/etc?"
Terje
This theoretical attack is based on using (previously covered on /.) clock skew to identify systems.
The correct defense is the same as the last time:
a) Make sure that there is no system clock skew, by running Network Time Protocol (NTP) on all servers.
b) Make sure that all externally visible timestamps are based on the system clock.
Part (b) is the only difficult step, since many current IP stacks use a private counter/clock instead of the system clock, presumably to reduce the overhead of providing timestamps. I know that Linus T has discussed using user-level library code to provide microsecond resolution (or better) timestamps, with very low overhead:
The library code can just query the cpu/system timer, multiply by the current scale factor (which depends on things like dynamically variable cpu clock frequency), and add the base time which was stored by the OS on the last HW clock interrupt: Total runtime, including call/return overhead can be below 100 clock cycles, which is fast enough to use it everywhere timestamps are needed:
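The calculation described above is just "base + (counter - base_counter) * scale". Here is a rough sketch of that logic; the class and method names are hypothetical, and in real library code the tick count would come from the cpu cycle counter (RDTSC on x86) rather than being passed in as an argument.

```python
class FastClock:
    """Timestamps from a free-running counter, re-based by the OS on each clock interrupt."""

    def __init__(self):
        self.base_time_ns = 0     # wall-clock time stored at the last interrupt
        self.base_ticks = 0       # counter value stored at the same moment
        self.ns_per_tick = 1.0    # current scale factor (depends on cpu frequency)

    def on_clock_interrupt(self, wall_time_ns, ticks, ns_per_tick):
        """What the OS would do on each HW clock interrupt: store a new base."""
        self.base_time_ns = wall_time_ns
        self.base_ticks = ticks
        self.ns_per_tick = ns_per_tick

    def now_ns(self, ticks):
        """Library call: one subtract, one multiply, one add."""
        return self.base_time_ns + int((ticks - self.base_ticks) * self.ns_per_tick)


clock = FastClock()
clock.on_clock_interrupt(wall_time_ns=1_000_000_000, ticks=5000, ns_per_tick=0.5)
stamp = clock.now_ns(5200)   # 200 ticks at 0.5 ns/tick after the base time
```

A real implementation also has to make sure the reader sees a consistent (base, scale) pair while the OS is updating it, which this sketch ignores.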
BTW, I wrote asm code to do exactly this inside Novell's NetWare OS a little over 10 years ago. In NetWare these timestamps were used by the Packet Burst algorithms which optimized packet transmission rates.
Terje
I haven't written pure asm programs for the last 10+ years, but I'm willing to bet the sequence in your .sig is:
:-)
B8 - MOV to AX
00 4C Immediate 16-bit constant, in LE order
I.e. MOV AX,4C00h
CD 21 is of course INT 21h which is the DOS OS interface.
Since 4Ch in AH is the 'Exit program' DOS call, and AL = 0 is the return value, the code above will stop the current program, with an errorlevel of zero, i.e. no error.
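If you'd rather check the byte order by machine than by hand, this little sketch unpacks the five bytes B8 00 4C CD 21 as discussed above. The function name is invented for the example, and it only understands this one MOV AX,imm16 / INT n pattern.

```python
import struct


def decode_exit_stub(code):
    """Decode 'MOV AX,imm16 / INT n' from exactly five bytes."""
    # "<" = little-endian, no padding: opcode, 16-bit immediate, opcode, interrupt number
    mov_opcode, imm16, int_opcode, int_no = struct.unpack("<BHBB", code)
    if mov_opcode != 0xB8 or int_opcode != 0xCD:
        raise ValueError("not a MOV AX,imm16 / INT n sequence")
    return {"ax": imm16, "ah": imm16 >> 8, "al": imm16 & 0xFF, "int": int_no}


regs = decode_exit_stub(bytes.fromhex("B8004CCD21"))
```

The 00 byte comes first in memory but ends up in AL, which is the little-endian (LE) order mentioned above.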
OK?
Terje
Re: Multiple user entries:
Not initially, but I believe the current version does so.
The boot sequence is to load (from a reserved area) the FDE sw which first tries to verify that it is running in plain unprotected DOS mode, then it takes over the keyboard hw so that it can read keystrokes without risking a trojan/keylogger attack.
After getting the password/passphrase it uses this to decrypt the user entry which contains the master disk key: If this doesn't succeed it goes into a sw timeout loop, taking progressively longer each time, before letting you retry.
When Windows loads, it must run in bios mode, until the protected mode crypto driver can be loaded.
Terje
Been there, Done that.
Before taking a one-year sabbatical (91-92) which I spent in the US, writing networking code, I had a company that sold terminal emulation/file transfer software. I sold enough licenses to make it one of the top 5 bestselling Norwegian programs. During the last year the Norwegian IRS grabbed 83% of every Krone I invoiced my customers.
At that point I realized that I'd much rather work less and spend more time with my wife & kids, so I closed the company.
I still write/optimize code, but always because I enjoy it, not to make money. (Sometimes I do get paid as well (in addition to my regular salary), but that's not the important part.)
Re. "know this (crypto) technology": I want to know a lot more than just crypto, and the job I have, which is a sort of IT Fire Brigade Chief, means that I get to work on all sorts of interesting technology, including everything that's new, as well as everything that doesn't perform as well as it has to. The Full Disk Encryption requirements I mentioned in my first post were obvious to me at the time, but not to most of the vendors unfortunately.
I spend my leisure time on orienteering http://orienteering.org/, which is the perfect thinking person's sport.
I'm also the Scandinavian coordinator of the Confluence project http://confluence.org/
Check Google for my other interests!
Terje
I work for a multinational corporation with more than 10 K laptops, we decided to use full disk encryption more than 5 years ago.
:-(
At that time we found just 5 vendors who were qualified to deliver (after an initial pre-qualification round), and we invited them all to a specially set up testing lab: Of these 5 vendors, 3 were selling pure snake oil (encrypt the partition table and/or root directory only), and it took less than 5 minutes to break into each of these.
Nr 4 seemed a lot better, but after 20 minutes work I found the crucial 'compare password, JE decrypt' sequence in the driver, and we were in.
Only the final entry (from a German company) had understood how you design a product like this:
First you encrypt, using your preferred symmetric key algorithm (AES-256 these days?), all sectors on the disk. You use some form of hash of the logical sector number as a salt when encrypting, this makes each block unique, even those that contain the same 'FDFDFDFD' freshly formatted pattern. The key you use for this is the master disk key, it is a random number generated during installation.
Next you make a small table, with room for at least two entries: User and admin.
The user entry can be modified as often as you like (we default to slightly less than once/month), while the admin key/password is constant, but unique to this particular PC.
Each password (user/admin) is used as the key when encrypting the master key, which means that there is no way, even for the crypto architect, to recover the master key without knowing at least one of these passwords. (The passwords are never stored anywhere on the disk of course!)
The admin key/password is saved both as a printout and on disk on a secure system (without any form of network connection), so that you can use it each time a user manages to forget his/her user disk password.
There are lots of nice-to-have features as well, one of the more important is the ability to use a challenge/response setup to safely regenerate a user password remotely, without ever having to transmit the relevant admin key. This does require some kind of side channel to verify the identity of the user who owns the particular laptop: We use a combination of RSA's SecurID cards and the user's cell phone for this (each user has such a card to be able to use the corporate VPN connection which requires strong authentication).
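The two-entry key table described above can be sketched as follows. This is emphatically not the vendor's code: all names are invented, PBKDF2 is my choice of password hash, and since the Python standard library has no AES, XOR with the password-derived key plus an HMAC tag stands in for a proper key-wrap cipher. What it does show is the structure: the master key is random, each password unlocks its own wrapped copy, and no password is stored anywhere.

```python
import hashlib
import hmac
import os

KEY_LEN = 32
ITERATIONS = 100_000   # assumed work factor, tune to taste


def _derive(password, salt):
    """Stretch a password into a wrapping pad and a MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=2 * KEY_LEN
    )
    return material[:KEY_LEN], material[KEY_LEN:]


def wrap_master_key(master_key, password):
    """Create one table entry: the master key encrypted under a password."""
    salt = os.urandom(16)
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(master_key, pad))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_master_key(entry, password):
    """Return the master key, or None if the password is wrong."""
    pad, mac_key = _derive(password, entry["salt"])
    expected = hmac.new(mac_key, entry["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, entry["tag"]):
        return None
    return bytes(a ^ b for a, b in zip(entry["wrapped"], pad))


def reset_user_entry(table, admin_password, new_user_password):
    """Forgotten user password: recover the master key via the admin entry."""
    master = unwrap_master_key(table["admin"], admin_password)
    if master is None:
        return False
    table["user"] = wrap_master_key(master, new_user_password)
    return True
```

Changing the user password only rewrites the small user entry; the disk itself never has to be re-encrypted, because the master key stays the same.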
Terje
In most of the rest of the world, the required 'patent step' is significantly higher than in the US, where it seems to have been reduced to 'anything that at least some first-year students might not have thought about immediately'. :-(
About 10 years ago I was asked to do patent reviews on a group of 10 patents which company A would like to use to sue company B:
Of those valid US patents, 4 were really, really obvious, i.e. more or less the only reasonable way to solve a particular problem. AFAIK this means that the patent is automatically invalid, right?
The next group of 4 all consisted of taking a standard textbook algorithm, without _any_ additional tweaks, and implementing it as a VLSI chip.
The final 2 patents actually covered somewhat neat ideas.
Terje
From the linked article:
R0 = 1; R1 = M
for i from 0 to n-1 do
    if d[i] then
        R1 = R0 * R1 mod N
        R0 = R0 * R0 mod N
    else
        R0 = R0 * R1 mod N
        R1 = R1 * R1 mod N
return R0
The key-dependent if statement is the key here: if we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:
R0 = 1; R1 = M;
for (i = 0; i < n; i++) {
    mask = 0 - d[i];        // Either 0 or -1
    nmask = mask ^ -1;      // -1 or 0
    T0 = R0 & mask;         // Either 0 or R0
    T0 += R1 & nmask;       // At this point T0 holds the value to be squared, R0 or R1!
    T1 = R0 * R1 mod N;
    T0 = T0 * T0 mod N;
    // Now we move the correct values back into R0 & R1
    R1 = T1 & mask;
    R0 = T0 & mask;
    R0 += T1 & nmask;
    R1 += T0 & nmask;
}
return R0;
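Here is a runnable sketch of the same masking trick, so that the result can be checked against an ordinary modular exponentiation. Two things are my own assumptions rather than anything taken from the article: the bits are processed from the most significant end, and the mask sense is chosen so that a set exponent bit selects R1 for squaring (the opposite of the d[i] sense in the snippet above), since that is the convention which makes the result equal M^d mod N. Python's big integers are not constant-time, so this shows the data flow only, not a hardened implementation.

```python
def ladder_pow(m, d, n, nbits):
    """Branch-free Montgomery ladder: returns m**d % n for 0 <= d < 2**nbits."""
    r0, r1 = 1 % n, m % n
    for i in reversed(range(nbits)):
        bit = (d >> i) & 1
        mask = -bit                        # -1 (all ones) if the bit is set, else 0
        nmask = ~mask                      # the complement: 0 or -1
        t0 = (r1 & mask) | (r0 & nmask)    # value to be squared: r1 if bit set, else r0
        t1 = (r0 * r1) % n                 # the cross product is always computed
        t0 = (t0 * t0) % n
        r0 = (t1 & mask) | (t0 & nmask)    # move the results back without branching
        r1 = (t0 & mask) | (t1 & nmask)
    return r0
```

Both multiplications are done on every iteration regardless of the key bit; only the masks decide which result ends up in which variable.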
There are at least three interesting issues here:
a) Most modern cpus have hw support for conditional operations, on x86 this is in the form of CMOVcc which is a (constant-time!) conditional move into a register, but as shown above, it really isn't needed here.
b) The performance cost of the above branch removal can be negative, i.e. the branch-free code can actually be faster!
On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.
c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values:
By allocating them at the same address modulo the cpu page size, i.e. at 4 KB offset, the cache lines hit will be the same for both.
When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.
Terje
Svalbard, including Spitsbergen which is the largest island, is recognized by the UN as a Norwegian territory.
It does exist in a sort of legal limbo though, in that any country which signs the Svalbard treaty can go in and look for natural resources. Russia and its Soviet precursor have had a fairly large city (Barentsburg) there for decades, supporting a coal mine which is now running out.
The chief authority on Svalbard is the office of 'Sysselmannen', which is located in the main Norwegian settlement, Longyearbyen.
A few hours south (by snowmobile) of Longyearbyen is the site of the Svea mine, which is sitting on a very rich coal seam, it is currently one of the most productive (per employee) mines in the world.
Svalbard also contains the big international research station at Ny Ålesund, which is operated by the Kings Bay Company.
http://www.kingsbay.no/
Visiting Svalbard in March a couple of years ago was one of my most memorable trips ever:
http://confluence.org/confluence.php?visitid=8138
Terje
What the original article didn't mention, and none of the replies seemed to go into, is the fact that with current CPUs, effectively all RAM is 'virtual':
Only on-chip memory, i.e. cache, is "real" these days, and all accesses to DRAM will be handled in paging units of 64/128 bytes or so. If this sounds familiar, it should! CPUs with 1 to 4 MB of real memory and lots of virtual memory is what the mainframes and minicomputers had about 20-30 years ago.
What this means is that now, just like then, all performance-critical code needs to be written to keep the working set within the amount of "real" memory you have available. When you passed this limit, you needed to make sure that you handled paging in suitably large blocks, to overcome the initial seek time overhead.
Today this corresponds to the difference between random access to DRAM and burst-mode (block transfer) which can be nearly an order of magnitude faster.
In the old days, when you passed the limits of your drum/disk swap device, you had to go to tape, which was a purely sequential device. Today, when you pass the limits of DRAM, you have to go to disk, which also needs to be treated as a bulk transfer/sequential device.
I.e. all the programming algorithms that were developed to handle resource limitations on old mainframes should now be resurrected!
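A classic example of such an algorithm is the external merge sort from the tape days: sort what fits in "real" memory, write it out as a sequential run, then merge all the runs using nothing but sequential reads. A minimal sketch, with function names and the one-integer-per-line run format invented for this example:

```python
import heapq
import tempfile


def _write_run(sorted_values):
    """Write one sorted run to a temporary file, one integer per line."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{v}\n" for v in sorted_values)
    f.seek(0)
    return f


def _read_run(f):
    """Stream a run back strictly sequentially."""
    for line in f:
        yield int(line)


def external_sort(values, run_size):
    """Yield the ints in sorted order, holding at most run_size of them while building a run."""
    runs, buf = [], []
    for v in values:
        buf.append(v)
        if len(buf) == run_size:
            runs.append(_write_run(sorted(buf)))
            buf = []
    if buf:
        runs.append(_write_run(sorted(buf)))
    try:
        # The merge keeps just one pending value per run in memory.
        yield from heapq.merge(*(_read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()
```

Replace "file" with "DRAM" and "memory" with "cache" and the same pattern applies one level up the hierarchy.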
"those who forget their history, are condemned to repeat it"
Terje
The problem isn't having a small or a large country, but how many (potential) customers you have per square km, right?
:-(
:-)
I.e. in a small country with a small and distributed population, the average cost per customer will be much higher than in the US.
Here in Norway it is Friday afternoon and I'm about to drive up to our small mountain cabin for the weekend. At this cabin the local power company (Rauland Kraft) _by default_ pulls along an optic fibre (or at least a pvc tube where they can subsequently blow in the fiber) on every new installation.
The result is that I have IPTV over a 300 Mbit/s connection, but as of now I can only use up to 10/10 Mbit/s (up/down) for regular Internet traffic.
If you want to check your maps or GoogleEarth, you'll notice that Rauland is located in the Vinje community on the central mountain plateau of southern Norway: This is one of the least densely populated areas in the entire country, but we still get fiber to every home & cabin.
http://maps.google.com/?ie=UTF8&z=11&ll=59.698935,8.073578&spn=0.282005,0.553436&om=1
Terje
I have been using SuSE's encrypted partitions for more than 3 years now, they have always been completely integrated into the graphical installer.
Yes, they do require someone to enter the (very long!) passphrase during the OS startup process, but that's a small price for the measure of peace of mind that it provides.
Terje
The version I have seen of this theory states that it takes about 10K hours of training/study to become a real expert. At this point you've become as good as you're ever going to be.
There are still differences between such people though, and that has to come down to 'innate ability'/genetics/IQ/whatever.
I.e. for every intelligent person who immersed herself in programming from an early age, there's still only going to be a very few real gurus.
An example:
A guy like Mike Abrash is pretty well recognized as one of the best PC graphics programmers ever, and he even managed to speed up John Carmack's original Quake C rendering code by a factor of 3 when writing the asm version.
According to Mike, John Carmack had the ability to grok many (5+ ?) different subjects at this level, at the same time!
Terje
Just after the PC introduction (at NCC fall 1981) I told my father-in-law that we should re-implement the software used for OCR processing in his downtown office. We should select something PC-compatible since this new open architecture was bound to generate compatibles, thereby ensuring a pretty long lifetime.
:-)
After looking around the market, we bought two Columbia PCs, one desktop (with an immense, never to be filled, 10 MB hard drive) and one luggable, for the same price as a single IBM PC.
The Columbia machine came with a BIOS/HW manual that documented all the various low-level interfaces, including the port addresses for things like the serial port and the interrupt controller, which allowed me to write a hw interrupt driver for the incoming 9600 baud OCR data stream.
Columbia was both earlier than Compaq and more compatible, but that didn't matter, they still went under a couple of years later. The PCs lived for many years however.
Terje
I've probably written more assembly than most Slashdot readers, and most of what you say is true:
It used to be the case that I could always increase the speed of some random C/Fortran/Pascal code by rewriting it in asm, parts of that speedup came from realizing better ways to map the current problem to the actual cpu hardware available.
However, I also discovered that much of the time it was possible to take the experience gained from the asm code, and use that to rewrite the original C code in such a way as to help the compiler generate near-optimal code. I.e. if I can get within 10-25% of 'speed_of_light' using portable C, I'll do so nearly every time.
There are some important situations where asm still wins, and that is when you have cpu hardware/opcodes available that the compiler cannot easily take advantage of. E.g. back in the days of the PentiumMMX 300 MHz cpu it became possible to do full MPEG2/DVD decoding in sw, but only by writing an awful lot of hand-optimized MMX code. Zoran SoftDVD was the first on the market, I was asked to help with some optimizations, but Mike Schmid (spelling?) had really done 99+% of the job.
Another important application for fast code is in crypto: If you want to transparently encrypt anything stored on your hard drive and/or going over a network wire, then you want the encryption/decryption process to be fast enough that you really don't notice any slowdown. This was one of the reasons for specifying a 200 MHz PentiumPro as the target machine for the Advanced Encryption Standard: If you could handle 100 Mbit Ethernet full duplex (i.e. 10 MB/s in both directions) on a 1996 model cpu, then you could easily do the same on any modern system.
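To put a number on that target: the cycle budget per byte follows directly from the figures above. A back-of-the-envelope sketch (helper name invented for the example, and assuming the cpu does nothing but crypto, which is of course optimistic):

```python
def cycles_per_byte(cpu_hz, bytes_per_s_each_way, directions=2):
    """Upper bound on the cpu cycles available per byte of traffic."""
    return cpu_hz / (bytes_per_s_each_way * directions)


budget = cycles_per_byte(200e6, 10e6)   # 200 MHz cpu, 10 MB/s in each direction
```

That works out to 10 cycles per byte as an absolute ceiling, before the network stack and everything else on the machine get their share.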
When we (I and 3 other guys) rewrote one of the AES contenders (DFC, not the winner!) in pure asm, we managed to speed it up by a factor of 3, which moved it from being one of the 3-4 slowest to one of the fastest algorithms among the 15 alternatives.
Today, with fp SIMD instructions and a reasonably orthogonal/complete instruction set (i.e. SSE3 on x86), it is relatively easy to write code in such a way that an autovectorizer can do a good job, but for more complicated code things quickly become much harder.
Terje