It depends on whether the two formats represent audio information in a similar enough form to allow transcode without introducing glitches. You also still have the question of how to decode the MP3 sufficiently for transcode w/out violating Fraunhofer's intellectual property, not to mention the fuzzy area that comes from the fact that an MP3 encoder produced the audio data to begin with.
They never said anything about compression. Their technology is all about eliminating the throttling effect of TCP acknowledgements on a long haul high-bandwidth link. You can only grow TCP windows so large, and with TCP slow-start, only so fast.
I once saw an article on using TCP for interplanetary work, and they showed that RTT was the bandwidth limiter (bigtime!) due to how the protocol is constructed.
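To put rough numbers on the RTT point: a sender can have at most one window of data in flight per round trip, so throughput is capped at window/RTT. Here's a quick sketch of that arithmetic. The function name and the three RTT figures are just illustrative assumptions on my part, and the 65535-byte window is the classic limit without window scaling.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput: at most one full
    window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds

# Illustrative round-trip times (assumed values, not measurements).
links = [
    ("LAN, 1 ms", 0.001),
    ("cross-country, 70 ms", 0.070),
    ("Earth-Moon, ~2.6 s", 2.6),
]

for label, rtt in links:
    mbps = max_tcp_throughput_bps(65535, rtt) / 1e6
    print(f"{label:>22}: {mbps:10.3f} Mbit/s")
```

The link speed never appears in the formula, which is the whole problem on a long-haul high-bandwidth pipe.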
These "Fountain" guys are not about compression. They're about sending XOR blocks to fill in gaps, doing essentially blind-retransmits until the other end says "Ok, I got it all now!"
Ick. The XORing bit just apparently helps reduce the number of needed "proactive retransmits."
Read the article. They're just using XOR.
It's like using RAID 4 checksum blocks, except they're doing it on a file transfer instead of on disk blocks.
(I'm oversimplifying a bit, but really their approach doesn't sound all that special.)
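For anyone who hasn't played with parity blocks, here's a minimal sketch of the RAID 4 style trick I'm comparing it to. This is only the single-parity analogy, not the actual scheme from the article; the block contents and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)          # the extra block the sender transmits

# Pretend block 1 was lost in transit. XORing everything that did arrive
# (including the parity block) reproduces the missing block.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

One parity block covers any single loss within its group, which is why sending a few of these can stand in for a retransmit request.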
Could make a difference when the server is a load-balanced cluster. Also, if an ACK gets dropped for one stream, the others can absorb the available bandwidth while the retransmit timer times out, which can be useful even if there's only one computer at each end...
As for MOV EAX, 0 vs. XOR EAX, EAX: while both accomplish the same goal at the same speed, the former is encoded into 5 bytes, while the latter is encoded into just 2 bytes. If you're optimizing tight loops, this might really make a difference.
(Disclaimer: I'm not speaking specifically about x86 here, but in the general sense of instruction scheduling.)
If they both encoded into the same size, I'd go with the MOV, though. The XOR looks like it depends on the previous value of EAX, even though in this special case it doesn't. Unless the dynamic scheduler has a special test for this (and on x86, because of the coding size issue, I'm pretty sure it does), it'll not allow the XOR to parallelize up with anything that modifies EAX. OTOH, the MOV can not only parallelize, but also get a different rename register and be reordered very aggressively with respect to other places where EAX is used. (That's what rename registers are for, after all.)
I do have some knowledge on this area, and I can assure you, 64-bit will only slow down RC5 implementations, when the word size parameters are specified at 32 bits. Which, BTW, is the value set by RSA on all their Secret Key Challenges, this being the stuff distributed.net is cracking.
This isn't true if you have sufficient registers and you use a "bitslice" approach. With bitslicing, you spread the 32 bits of a number across 32 different registers, one bit per register, and you do this for several keys in parallel. For instance, you store the data for key 0 in bit 0 across the 32 registers, the data for key 1 in bit 1, and so on. Then you can process N keys in parallel, where N is the bit-width of your machine. If you have a 64-bit machine, this is twice as efficient as a 32-bit machine, for the same number of registers. XORs don't change, ROTLs by 3 just become moves (if anything), and ADDs become a ripple-carry chain of a few XORs, ANDs, and ORs per bit. The only hard part with RC5 is when you have to ROTL by a varying amount -- that requires some fancy footwork, but it's not too bad (about 5 sets of tricky ANDs and ORs). :-)
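Here's a rough sketch of the bitslice layout, using Python integers as stand-ins for machine registers. All the names (to_slices, bs_add, and so on) and the LANES value are my own for illustration; this isn't the distributed.net code, and it leaves out the data-dependent rotate entirely.

```python
WORD = 32    # RC5 word size used in the challenges
LANES = 64   # pretend register width: one candidate key per bit lane

def to_slices(values):
    """Transpose up to LANES 32-bit values into WORD 'registers'.
    Bit k of slices[i] is bit i of values[k]."""
    return [sum(((v >> i) & 1) << k for k, v in enumerate(values))
            for i in range(WORD)]

def from_slices(slices, count):
    """Undo to_slices for the first `count` lanes."""
    return [sum(((slices[i] >> k) & 1) << i for i in range(WORD))
            for k in range(count)]

def bs_xor(a, b):
    """XOR is unchanged: one XOR per slice handles every lane."""
    return [x ^ y for x, y in zip(a, b)]

def bs_rotl_const(a, n):
    """Rotate left by a fixed amount: just renumber the slices."""
    return [a[(i - n) % WORD] for i in range(WORD)]

def bs_add(a, b):
    """Ripple-carry add modulo 2**32 in every lane at once."""
    out, carry = [], 0
    for x, y in zip(a, b):          # least significant slice first
        t = x ^ y
        out.append(t ^ carry)
        carry = (x & y) | (t & carry)
    return out

keys_a = [0xFFFFFFFF, 0x12345678, 1]
keys_b = [1, 0x9ABCDEF0, 2]
total = from_slices(bs_add(to_slices(keys_a), to_slices(keys_b)), 3)
print([hex(v) for v in total])
```

The point to notice is that nothing in bs_add or bs_xor cares how wide the registers are, so doubling the register width doubles the number of keys per pass for free.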
The main point you missed, though, is that Linus is right at the macro level -- there is no overall design process for Linux as a system or an overall direction for Linux.
At the kernel subsystem level, there's plenty of design, and plenty of goals, and plenty of localized direction. In the filesystem space, there was a lot of buzz around journalling filesystems. In the MM department, we had something more akin to controlled chaos... :-) And yes, the SCSI layer could use some actual careful design work.
There was no overarching goal "We must optimize for market X" that drove any of this.
Sure, some people want to run Linux on huge machines, and so they want journalling.
Other people want to shove Linux into wristwatches and PDAs, and so they instead want to focus on memory footprint. And still others care about interrupt latency over throughput. So, each little care-about niche has its own little projects that pull Linux in lots of different directions at the macro level. Each individual project is very directed, and some have significant design work. But none of it is directed from On High as part of the Grand Plan for The System.
just look at biological species to see that a process of evolution rarely results in the optimal design, and is unable to take U-turns or back out of dead ends...
In effect, evolution can make a U-turn by branching further back on the tree. Notice that bats and birds both have wings, but they each evolved them by completely different paths, because each started from a different branch point on the tree...
Where you're failing to see the point is that evolution works by being a massively parallel, highly branching gigantic tree. If a branch of the tree goes down a dead-end, no worries -- another branch will make it around the dead-end.
As for no optimal design? Who's to say that our appendix won't mutate into something useful again down the road when climatic or other changes tilt the survivability rules once more? The "extra baggage" we carry around and can afford to support biologically is what allows us to have non-advantageous mutations that eventually morph into something that is advantageous.
and on the other hand, a horse cannot evolve wheels, because the intermediate steps between a legged horse and a wheeled horse would not be able to move. pity because a wheeled horse could be faster...
Cecil Adams actually covered animals evolving wheels. You might find it interesting. He doesn't tackle the intermediate state problem, though.
At any rate, there's the purely biological problem of "How would you keep those wheels 'fed'?" Even if they were made of calcium, you'd need to deposit the calcium to begin with, and then replenish it as it wears. (Wheels will go through a lot more wear than teeth.)
BTW, speaking of .sigs, Slashdot's added a new feature that makes them less annoying: You can now enable "sig dashes" in your user profile so that all sigs get separated from the body text with a --<BR> or the like. Makes many sigs much less annoying. (Since you're an A.C., I don't suspect it benefits you much, but I thought I'd mention it nonetheless.)
The 53 minute comment is B.S. However, there is a pretty stark relationship between response time and productivity. Check out Hennessy and Patterson sometime, and look up "Response time." As response time drops (gets faster), so-called "think time" (time between user inputs) drops off much, much faster, and productivity shoots WAY up, at least until you hit the user's saturation point. If the system response slows down beyond a magic threshold, productivity falls way off.
So, it's reasonable to say a 10% increase in response time (10% slowdown) will cause a greater-than 10% increase in the time to perform certain tasks. Now whether that translates into a loss in overall productivity, or just less time kibitzing at the watercooler is hard to say.
I think you missed the point. Take the car example, for instance: When telling someone how
to drive, you could say "ok, move this stick here,
when you get to that corner turn right, etc." and
give them very specific directions on how to drive
their car between very specific locations at specific times of day. Or,
you could tell them the basics of operating the
car, teach them the rules of the road, and teach them how to read maps. In neither case do you need to open the hood of the car.
This corresponds to the two sets of computer users I see -- the "I know my exact routine" users that panic when you move their desktop
icons around (these are the 'sheep'), and the
ones that actually know how to use
the computer at some level above rote memorization.
My fiancee was telling me about the one time she
played Solitaire on the secretary's computer at her grandfather's business. The secretary panicked and got upset at her, because she had
minimized the Program Manager (this was Win 3.x),
and the secretary didn't know how to "get it back." The secretary literally knew the exact
sequence of clicks to perform her limited set of
tasks involving the PC, and was essentially serving as a biological "macro".
Too many "training programs" are really just "rote memorization of specific sequences" rather
than actual "learning the general principles of using the machine." The principles don't need to be very low level to be useful -- just ideas such as what various clicks do, where various things are found, etc. The secretary I mentioned above didn't even know what "maximizing" and "minimizing" windows was about!
Whether driving a car, cooking food, using a computer, building something, or whatever, it's far more effective to teach the general principles and build from there than have the student memorize a few specific details. "Give a fish" vs. "teach to fish."
Interesting points about how all the registers are used... I've never actually been brave enough to get into x86 assembly. I have a Motorola background, so I'm used to things like a flat 4G
address space, "data" registers and "address" registers, and memory-mapped IO. My brain just balked at the x86 world of "memory segments", "al/ah/ax/eax", etc.
It's really not that hard. AH and AL map to AX on x86 the same way the A and B accumulators on the 6800 map to the D accumulator. (Ok, you really meant you prefer 68k, not all things Motorola.) Ignoring the 8-bit sub-registers, you have 7 16-bit general purpose registers to work with -- AX, BX, CX, DX, SI, DI, BP. (Ok, someone will scream "BP isn't a general purpose register." Just turn on -fomit-frame-pointer when calling GCC and get on with life. Or go play in traffic.) These registers can hold data OR addresses -- no partitioning between the two uses. (I just heard someone squeal about addressing modes, and maybe something about string instructions... Most of the time, it's not an issue. Hey, aren't you the same guy who was squeaking about BP?)
So, anyway, how is that harder than having to deal with separate data and address registers? (Why partition registers by functionality?) And as long as all your data fits in 64K, you're all set -- you never have to think about segments. ;-) (Hey you, with the GBs of porn... put that thing back in your pants. And stop listening to those MBs of pirated MP3s.)
Now the 32-bit extensions are even easier. If you ignore the 16-bit ways of referring to registers (most of the time, it's better that way anyway), you just put an 'E' in front of the name and they're all 32 bits. Not too hard to think about. And at least under Linux (I claim no knowledge of Windows), you get a nice 32-bit address space.
So, see, it's not so bad. (Hey you, snickering in the back! Cut it out.) The x86-64 extensions just continue this tradition -- as long as you ignore the old crap, the new crap isn't all that horrible.
All that said, I agree that x86 is a pretty grotty CPU architecture. :-)
Exercise for the reader: Identify uses of sarcasm and ironic humor in the above post.
The changelog does not constitute security testing, though. Writing and/or using a program which tests for a hole and merely says "You're vulnerable, install the patch" (or, if it's part of the patch routine, just installs the patch) qualifies as security testing. Describing the vulnerability, though, such that anyone could potentially write a program to circumvent the access control is not security testing.
Besides, if I'm understanding correctly, this clause says specifically that you can still run afoul of the clause I quoted.
And if you read the thread, you'll see that Alan Cox's assertion is that UNIX-style permissions can be used for digital rights management purposes. That is, they can be used as an access control to protect copyrighted works that are covered under the DMCA. Therefore, disclosing a security vulnerability which can subvert UNIX-style permissions is equivalent to describing how to circumvent an access-control device as described under the DMCA.
I would guess that the specific DMCA clause that Alan's affected
by is this one:
(2) No person shall manufacture, import, offer to the public, provide, or otherwise traffic in any technology, product, service, device, component, or part thereof, that--
(A) is primarily designed or produced for the purpose of circumventing a technological measure that effectively controls access to a work protected under this title;
(B) has only limited commercially significant purpose or use other than to circumvent a technological measure that effectively controls access to a work protected under this title; or
(C) is marketed by that person or another acting in concert with that person with that person's knowledge for use in circumventing a technological measure that effectively controls access to a work protected under this title.
It would seem Alan's conjecture is that describing a specific vulnerability in the Linux kernel that allows subverting some aspect of Linux's permission structure (which can be used as an access control device to a protected work) constitutes "traffic[king] in any technology [...] or part thereof" that would allow someone to circumvent the access control. Under the current interpretation of the law (re: Sklyarov), detailing a security weakness in a product seems to (a) constitute such trafficking, and (b) fit one of the three clauses 2(A), 2(B), or 2(C) above. (Notice they're connected by an 'or', so it's necessary to fit only one of the three to be in violation of the DMCA. I'm guessing the kernel information would fit 2(A).)
I'm so proud to be an American,
where at least I know I'm free[*]. :-P
Reminds me of an anecdote I heard about a telegraph operator that suspected the message he was sending was in 'code'. It read "Mr. So-and-so is dead." He sent it as "Mr. So-and-so is deceased." A confused reply came: "Is Mr. So-and-so dead or deceased?"
seek time is a factor in the size of the platter and the head mechanism.. not the RPM rating of the drive, as indicated in the article. RPM will affect the average time it takes for a given sector to
re-appear under the head... but that's not as important.
Rotational latency IS a very important aspect of average access time for a given sector. Here's a quick rundown of how long one full platter rotation takes at various common drive RPMs:
3600 RPM: 16.7ms (Ancient MFM baby!)
4500 RPM: 13.3ms
5400 RPM: 11.1ms (You can still find these... shudder)
7200 RPM: 8.3ms
10000 RPM: 6.0ms
15000 RPM: 4.0ms
Seeking from one sector to another requires both moving the head and acquiring the sector once you arrive at the track. Hopefully, the drive is laid out so that most common operations (linear reads that hop track-to-track) don't have to pay the rotational latency. Also, if you do a large linear read request to the drive (something I seem to recall SCSI supports better than IDE), the drive can be smart and read the whole track starting wherever the head lands -- thus hiding the rotational latency in certain cases. But for random seeks reading single blocks, there's not much you can do.
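If you want to reproduce the rotation times in the rundown above, it's just 60 seconds divided by the RPM. The sketch below also prints half a rotation, on the assumption that a randomly chosen sector is on average half a turn away from the head. The function name is mine.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm

for rpm in (3600, 4500, 5400, 7200, 10000, 15000):
    full = rotation_ms(rpm)
    # Assumption: a random sector is, on average, half a turn away.
    print(f"{rpm:>6} RPM: {full:5.1f} ms per rotation, "
          f"{full / 2:4.1f} ms average rotational latency")
```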
From what I remember, RAID 5 is intended for
reliability, not performance. Mirroring a RAID
5 should improve the performance of reads due to seek-time reduction, but
it will make the already incredibly expensive writes (on RAID 5, at least) even more expensive.
In general, a mirroring RAID makes reads
faster and writes slower.
Why does a mirroring RAID make writes more expensive? Because all drives must seek to the
location of the write in order to perform the
write. Since the heads on all the mirrors may
be scattered around the disk (a good thing for
reads since it reduces average seek time on reads), the seek for any given write is as
expensive as the most expensive seek across all
the disks in the array. This makes the average
for the whole array suck worse than it would for
one disk.
As a separate issue, RAID 5 (which doesn't do mirroring in its basic form) still makes writes very expensive.
Why are RAID 5 writes expensive? Because you have to update two blocks for every isolated 1-block update -- the block in question and its corresponding checksum block. Worst of all, if any one of the other blocks that share the checksum block isn't in cache, the checksum update is a read-modify-write cycle. OUCH! If you have lots of randomly scattered single-block updates, performance starts to suck pretty badly pretty quickly. On the plus side, if more than one block in a given update shares a given checksum block, then the number of checksum updates goes down since the updates can be combined. This means that the impact of RAID 5 is much less on large linear writes. In the best possible case, all blocks that share a given checksum block are being updated together in parallel with the checksum, so you can blast out the N blocks with N+1 parallel writes to the N+1 disks without any read-modify-write cycles.
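To make the two cases concrete, here's a small sketch that treats the checksum block as plain XOR parity. The function names and the sample bytes are made up for illustration, and a real RAID layer obviously works on whole disk blocks rather than two-byte strings.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Isolated one-block update: the old data and old parity must be
    read (unless cached) before the two writes can go out."""
    return xor(xor(old_parity, old_data), new_data)

def full_stripe_parity(blocks) -> bytes:
    """All N data blocks rewritten together: parity comes straight from
    the new data, so no reads are needed."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor(parity, block)
    return parity

stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = full_stripe_parity(stripe)

# Update only the middle block via read-modify-write.
new_block = b"\xff\x00"
updated_parity = small_write_parity(stripe[1], new_block, parity)
print(updated_parity == full_stripe_parity([stripe[0], new_block, stripe[2]]))
```

Both paths land on the same parity; the difference is only in how many reads you had to do to get there.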
(Note that it would seem that you can theoretically alleviate the r-m-w suckage
I mention above by increasing your filesystem block size to at least N*physical_block_size, so that any
update coming from the OS forces all physical
blocks that share the same checksum block to be
updated together. The suckage I mention above
applies only if your filesystem's block size is
smaller than N*physical_block_size. As I recall,
physical_block_size is typically 512 bytes. For ext2, this means making your fs with a 4K
block size instead of the default 1K block size
if you're on a RAID 5 with more than 3 disks...)
Two ways: It's pre-populated (prior to mounting,
you copy the entire physical disk to RAM), and
it's dedicated to that specific disk. This
could be important in an environment where access
to a particular disk has very tight timing requirements. (Real time audio track streaming
on an audio-mixing box, for instance.)
In most OSes, the disk cache is shared amongst
all disks, and in more modern OSes, the disk
cache grows and shrinks with respect to virtual
memory pressure. This means that no particular
disk gets any special treatment, and the data
you're interested in may not be in cache when
you ask for it, even if there was (and is) room.
Since mesterha had also sent me an email, I previously replied in email, but it's prolly worthwhile to post here to clear the
air around my vague statement above. I've also
added a couple more things that weren't in the
email I sent...
I said:
The main performance benefit of a RAID is in reducing the impact of
seek time on overall throughput.
Ok, not quite true. It is one of the
main performance benefits, though. I later said:
In other words, you do seeks 1/N as often...
... to which mesterha rightly asked:
What are you comparing against, because I just don't see why this would be true.
I did oversimplify a little, and as a result, my statement isn't
fully accurate. I compressed ideas from both RAID 0 and RAID 1
into one statement. Turns out that my statement is true for RAID
0 and RAID 1, but in different cases.
I would figure the seek time would be about the same while the bandwidth would multiply by about N, the number of drives.
I was thinking primarily of a mirrored drive configuration (RAID 1)
or striped+mirrored configuration (RAID 0+1).
Therefore, the seek time would have a greater effect on performance.
For a striped RAID (RAID 0), the main case I was thinking of was
a large linear access. Suppose, for example, each track holds 128
sectors (for sake of argument), and you're reading 512 sectors.
On a single disk, this requires up to 4 track-to-track seeks. On
two disks, each logical track holds 256 sectors, so you require
2 track-to-track seeks. On four, you're down to 1 seek.
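The arithmetic in that example, as a tiny sketch. It uses the same simplification I made above (N drives act like one logical track N times as long), and the function name is just something I picked.

```python
import math

def track_seeks(sectors_to_read: int, sectors_per_track: int, drives: int) -> int:
    """Upper bound on track-to-track seeks for a linear read on a stripe
    set, treating the drives as one logical track `drives` times as long."""
    logical_track = sectors_per_track * drives
    return math.ceil(sectors_to_read / logical_track)

for drives in (1, 2, 4):
    print(drives, "drive(s):", track_seeks(512, 128, drives), "seek(s)")
```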
You're probably thinking to yourself, "Yeah, you do fewer logical
seeks, but you still have to send the physical seek commands to all the drives, resulting in the same number of physical seeks."
Yes, this is very true. But, you only have to pay the full seek
penalty for one drive. (At least, this is true on SCSI where a
drive can detach from the bus while it's busy. I don't know how
IDE would behave unless all the drives are on separate channels.) Here's what I'm thinking of in that scenario:
Send drive 0 a seek command.
Send drive 1 a seek command.
Send drive 2 a seek command.
Send drive 3 a seek command.
Read from drive 0. This also causes the PC to wait for the seek
for drive 0 to finish. The seeks for drives 1, 2 and 3 complete
roughly in parallel.
Read from drive 1. Should be no noticeable seek penalty.
Read from drive 2. Should be no noticeable seek penalty.
Read from drive 3. Should be no noticeable seek penalty.
Now, this does assume the drives are fairly well matched, or at least that drive 0 is equal or slower than drives 1 through 3. You're correct, though -- if you are doing arbitrary seeking on a
striped RAID, the cost of a single seek remains about the same.
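Here's a toy timing model of the command sequence above, compared against doing seek-then-read one drive at a time. The 9 ms seek and 2 ms transfer figures are invented for illustration, and the model ignores command overhead and bus contention entirely.

```python
def overlapped_read_ms(seek_ms, transfer_ms):
    """Issue every seek at t=0, then read the drives in order. A read
    only waits if that drive's seek hasn't finished yet."""
    t = 0.0
    for seek in seek_ms:
        t = max(t, seek) + transfer_ms
    return t

def serialized_read_ms(seek_ms, transfer_ms):
    """Seek, read, then move on to the next drive."""
    return sum(seek_ms) + transfer_ms * len(seek_ms)

seeks = [9.0, 9.0, 9.0, 9.0]   # hypothetical, well-matched drives
print(overlapped_read_ms(seeks, 2.0))
print(serialized_read_ms(seeks, 2.0))
```

If one of the later drives is much slower than drive 0, the max() kicks in and you eat part of its seek after all, which is the "well matched" caveat.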
In the case where you have a mirrored RAID (RAID 1), you do have
the potential (for reads, at least) of cutting the average random
seek down by a factor of N. This is because only 1 drive needs to
actually perform the read, and so the elevator can choose the drive
whose head is nearest to the data. In the best case, for fully
random seeks on reads, the N drives would divide the seek range
into N zones amongst themselves, with each zone being 1/N the size
of a full-stroke seek. (This assumes a model where the cost of a
seek is proportional to the distance of the seek, which is not
entirely accurate, but is a reasonable model.) This is the same reason multiple elevators in
a large building speed up access to all floors.
So, RAID 1 potentially cuts the cost of a random seek (on a read)
to approx 1/N in the average case. In more realistic cases, you
might get one disk's head hovering over the metadata, and another
hovering over the data itself.
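A quick sanity check on the 1/N figure, under the same idealized model: seek cost proportional to distance, requests spread uniformly across the stroke, and each of the N heads parked in the middle of its own zone. Real heads move around, so treat this as the best case only. The function name and the sampling approach are mine.

```python
def mean_seek_distance(n_drives: int, samples: int = 10_000) -> float:
    """Mean distance from a uniformly placed request to the nearest head,
    with heads parked at the centers of n equal zones on a stroke of
    length 1. Uses evenly spaced sample points rather than random ones."""
    heads = [(k + 0.5) / n_drives for k in range(n_drives)]
    total = 0.0
    for s in range(samples):
        target = (s + 0.5) / samples
        total += min(abs(target - h) for h in heads)
    return total / samples

base = mean_seek_distance(1)
for n in (1, 2, 4):
    print(n, "mirror(s):", round(mean_seek_distance(n) / base, 3), "of single-drive seek")
```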
What's interesting is that a RAID 0 system improves your linear bandwidth with only a minor impact on seeking, and that a RAID 1 system improves your seeking with little impact on
your (read) bandwidth.
If you implement a RAID 0+1 system, you can potentially get both
types of seek-time reduction, as well as
bandwidth improvement, as long as your stripes and mirrors
are all on different disks. This means 2N disks for a potential
factor of 1/N seek-time benefit and factor of N bandwidth improvement...
I concede the need for RAS and CAS--I am aware of these and their corresponding cost. They are immaterial in this context, however. Given that one disk block is potentially large enough to cover more than one row, you'll need to re-RAS no matter what. Also, the cost of RAS and CAS is on the order of cycles at 133MHz, whereas the cost of a disk seek is on the order of milliseconds.
If you were transferring a byte here, a byte there, then your argument makes sense. If you're transferring 512 byte or 2048 byte sectors (remember, this is a block device now!), then your argument doesn't pass muster. You're going to spend many times the CAS penalty in cycles just receiving the command.
To me, it appears the key (no pun intended) to thwarting this lies in that the logger is only active while the modem is active, meaning you have to be online in order to have your keys logged.
Other way around. The logger is disabled when the modem is active, so that they don't capture any "electronic communications". Given that, the solution is to arrange for a 24/7 Internet link on your PC. :-)
The main performance benefit of a RAID is in reducing the impact of seek time on overall throughput. You pay a little extra in transaction overhead to send commands to multiple drives (instead of a single drive) to gain the dual benefits of cutting the average cost of a seek down, and increasing your linear access bandwidth.
(In other words, you do seeks 1/N as often, and your bandwidth for a linear read within a track is N times what it would be for 1 drive, for a RAID with N drives. At least, this is true for striping.)
With a RAM disk, the cost of seeking is zero. Also, the bandwidth of the RAM already exceeds the available bandwidth of the drive cable. So, if you were to RAID your RAM drives, you'd still have the performance penalty of the additional overhead, but no gain due to hiding seeks or striping your bandwidth. The result would be a net loss in performance.
Now, what might be interesting is a mirrored RAID,
where one side of the mirror was a physical HD,
and the other was RAM. Modify the RAID software to send all reads to the RAM drive by default.
Ta-da! Instant hardware-backed RAM drive! Performance would be lower than a pure RAM drive, but you wouldn't need to do anything unusual to make the RAM's contents persistent. A power loss
looks like a drive failure -- just replicate the
other drive back to the RAM.
That's right, it clicks the speaker.
One of my favorite hacks was to query the
cassette input port ($C060, as I recall),
and toggle the speaker to match the current
input level on the cassette port.
You get essentially a distortion box effect,
changing the analog cassette input into a
squarewave output.
A later hack took that a step further, capturing
from the cassette port into RAM with one program,
and playing back from RAM to the speaker with
another. From what I recall, the sampling rate
was approximately 30kHz with 1 bit per sample.
That means you could store about 6 seconds of
audio in the three hires pages ($2000 - $7FFF).
(Yes, I realize only two of the hires pages were
actual hires pages. The third one was a software
convention only. $2000-$3FFF is the HGR page,
and $4000-$5FFF is the HGR2 page.) Ah yes, such a fun little machine.
Was great for hacking around, wasn't it?
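Checking my "about 6 seconds" figure: the range $2000-$7FFF at 1 bit per sample and roughly 30kHz works out like this. The function name is made up, and the 30kHz rate is only my recollection, so the result is approximate.

```python
def one_bit_audio_seconds(start: int, end: int, sample_rate_hz: int) -> float:
    """Seconds of 1-bit audio that fit in the inclusive address range."""
    total_bits = (end - start + 1) * 8
    return total_bits / sample_rate_hz

seconds = one_bit_audio_seconds(0x2000, 0x7FFF, 30_000)
print(f"{seconds:.2f} seconds")
```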
As for the joystick -- that was another lovely
hack, wasn't it? The ol' RC time-delay w/ a
CPU timing loop. That wouldn't fly real far
these days, now would it?;-)
Nor would the floppy disk drive that requires a CPU-speed-dependent timing loop to read correctly.
--Joe
Wow. Small is beautiful. :-) (on "MenuetOS Debuts")
Ok, so maybe cp wasn't the best
example. 350 lines of assembler is pretty
impressively small for such a program. Of course, you ignore the 800 or so lines of assembly that it %includes for doing system calls and the like. :-) (If this were a standard system header, I'd ignore it, but it's part of the application's source code.)
Still, it's hard to have any bugs
in the original line cp file1 file2,
unless you plain and simply get the filenames
wrong. If you get those wrong, and the filenames
are fixed filenames (no variable substitution),
then you have other problems. :-)
My point is that you can say a lot more in 1
shell script statement than you can in 1 assembler statement.
--Joe
Re:Yes, you are not 100% correct. (on "MenuetOS Debuts")
I have to agree. Assembly failure modes can be VERY much more subtle than high-level language failure modes. I have to deal with this regularly, as I write and debug assembly code on some rather complex processors. (VLIW CPUs with exposed delay slots occasionally lead to some really hard-to-spot bugs that have absolutely no analogs in the C world. These bugs can be some of the most annoying Heisenbugs you've ever seen, too...) Basically, writing in assembly code gives you raw speed, but you pay the penalty in not having some sanity checks on your code.
On a related note: One of my professors was fond
of pointing out a study that found that programs tend to have about the same defect density per
line of code regardless of the language they're
written in. That is, if you tend to have one
bug per 50 lines of code (let's say), it doesn't
matter if the code is ASM, C, Java, SQL, or Bourne
Shell Script. Imagine how many lines of ASM it
would take to be equivalent to cp file1 file2?
Now only slightly off topic...
Fear: When you see B8 00 4C CD 21 and know what it means
Uhm, I'm a little fuzzy on this because it's been awhile, but doesn't that terminate a DOS program via
INT 21h with an errorlevel of 0?
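For the curious, here's how those five bytes pick apart, assuming 16-bit real-mode code (B8 followed by a 16-bit immediate is MOV AX, imm16, and CD followed by a byte is INT imm8). The variable names are just for illustration.

```python
code = bytes.fromhex("B8 00 4C CD 21")

# B8 iw = MOV AX, imm16 (immediate stored little-endian); CD ib = INT imm8.
assert code[0] == 0xB8 and code[3] == 0xCD

ax = int.from_bytes(code[1:3], "little")
ah, al = ax >> 8, ax & 0xFF

print(f"MOV AX, {ax:04X}h   ; AH={ah:02X}h, AL={al:02X}h")
print(f"INT {code[4]:02X}h")
```

AH=4Ch is the DOS "terminate with return code" function, and AL carries the return code, so AL=00h is an errorlevel of 0.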
Then for you Apple ][ folk out there... what does 2C 30 C0 do? :-)
It depends on whether the two formats represent audio information in a similar enough form to allow transcode without introducing glitches. You also still have the question of how to decode the MP3 sufficiently for transcode w/out violating Fraunhofer's intellectual property, not to mention the fuzzy area that comes from the fact that an MP3 encoder produced the audio data to begin with.
--JoeThey never said anything about compression. Their technology is all about eliminating the throttling effect of TCP acknowledgements on a long haul high-bandwidth link. You can only grow TCP windows so large, and with TCP slow-start, only so fast.
I once saw an article on using TCP for interplanetary work, and they showed that RTT was the bandwidth limiter (bigtime!) due to how the protocol is constructed.
These "Fountain" guys are not about compression. They're about sending XOR blocks to fill in gaps, doing essentially blind-retransmits until the other end says "Ok, I got it all now!" Ick. The XORing bit just apparently helps reduce the number of needed "proactive retransmits."
--JoeRead the article. They're just using XOR. It's like using RAID 4 checksum blocks, except they're doing it on a file transfer instead of on disk blocks.
(I'm oversimplifying a bit, but really their approach doesn't sound all that special.)
--JoeCould make a difference when the server is a load-balanced cluster. Also, if an ACK gets dropped for one stream, the others can absorb the available bandwidth while the retransmit timer times out, which can be useful even there's only one computer at both ends...
--Joe(Disclaimer: I'm not speaking specifically about x86 here, but in the general sense of instruction scheduling.)
If they both encoded into the same size, I'd go with the MOV, though. The latter looks like it depends on the previous value of EAX, even though in this special case it doesn't. Unless the dynamic scheduler has a special test for this (and on x86, because of the coding size issue, I'm pretty sure it does), it'll not allow the XOR to parallelize up with anything that modifies EAX. OTOH, the MOV can not only parallelize, but also get a different rename register and be reordered very aggressively with respect to other places where EAX is used. (That's what rename registers are for, after all.)
--JoeThis isn't true if you have sufficient registers and you use a "bitslice approach." With a bitslicing approach, you store each of the 32 bits of a number across 32 different registers, and you do this for several keys in parallel. For instance, you store the data for key 0 in bit 0 across the 32 registers, the data for key 1 in bit 1, and so on. Then you can process N keys in parallel, where N is the bit-width of your machine. If you have a 64-bit machine, this is twice as efficient as a 32-bit machine, for the same number of registers. XORs don't change, ROTLs by 3 just become moves (if anything), and ADDs are two XORs and an AND. The only hard part with RC5 is when you have to ROTL by a varying amount -- that requires some fancy footwork, but it's not too bad (about 5 sets of tricky ANDs and ORs). :-)
--JoeThe main point you missed, though, is that Linus is right at the macro level -- there is no overall design process for Linux as a system or an overall direction for Linux.
At the kernel subsystem level, there's plenty of design, and plenty of goals, and plenty of localized direction. In the filesystem space, there was a lot of buzz around journalling filesystems. In the MM department, we had something more akin to controlled chaos... :-) And yes, the SCSI layer could use some actual careful design work.
There was no overarching goal "We must optimize for market X" that drove any of this. Sure, some people want to run Linux on huge machines, and so they want journalling. Other people want to shove Linux into wristwatches and PDAs, and so they instead want to focus on memory footprint. And still others care about interrupt latency over throughput. So, each little care-about niche has its own little projects that pull Linux in lots of different directions at the macro level. Each individual project is very directed, and some have significant design work. But none of it is directed from On High as part of the Grand Plan for The System.
--Joe
In effect, evolution can make a U-turn by branching further back on the tree. Notice that bats and birds both have wings, but they each evolved them by completely different paths, because each started from a different branch point on the tree...
Where you're failing to see the point is that evolution works by being a massively parallel, highly branching gigantic tree. If a branch of the tree goes down a dead-end, no worries -- another branch will make it around the dead-end.
As for no optimal design? Who's to say that our appendix won't mutate into something useful again down the road when climatic or other changes tilt the survivability rules once more? The "extra baggage" we carry around and can afford to support biologically is what allows us to have non-advantageous mutations that eventually morph into something that is advantageous.
Cecil Adams actually covered animals evolving wheels. You might find it interesting. He doesn't tackle the intermediate state problem, though. At any rate, there's the purely biological problem of "How would you keep those wheels 'fed'?" Even if they were made of calcium, you'd need to deposit the calcium to begin with, and then replenish it as it wears. (Wheels will go through a lot more wear than teeth.)
--Joe
Here's the link: http://www.ecrix.com/
BTW, speaking of .sigs, Slashdot's added a new feature that makes them less annoying: You can now enable "sig dashes" in your user profile so that all sigs get separated from the body text with a --<BR> or the like. Makes many sigs much less annoying. (Since you're an A.C., I don't suspect it benefits you much, but I thought I'd mention it nonetheless.)
--Joe
The 53 minute comment is B.S. However, there is a pretty stark relationship between response time and productivity. Check out Hennessy and Patterson sometime, and look up "Response time." As response time drops (gets faster), so-called "think time" (time between user inputs) drops off much, much faster, and productivity shoots WAY up, at least until you hit the user's saturation point. If the system response slows down beyond a magic threshold, productivity falls way off.
So, it's reasonable to say a 10% increase in response time (10% slowdown) will cause a greater-than 10% increase in the time to perform certain tasks. Now whether that translates into a loss in overall productivity, or just less time kibitzing at the watercooler is hard to say.
--Joe
I think you missed the point. Take the car example, for instance: When telling someone how to drive, you could say "ok, move this stick here, when you get to that corner turn right, etc." and give them very specific directions on how to drive their car between very specific locations at specific times of day. Or, you could tell them the basics of operating the car, teach them the rules of the road, and teach them how to read maps. In neither case do you need to open the hood of the car.
This corresponds to the two sets of computer users I see -- the "I know my exact routine" users that panic when you move their desktop icons around (these are the 'sheep'), and the ones that actually know how to use the computer at some level above rote memorization.
My fiancee was telling me about the one time she played Solitaire on the secretary's computer at her grandfather's business. The secretary panicked and got upset at her, because she had minimized the Program Manager (this was Win 3.x), and the secretary didn't know how to "get it back." The secretary literally knew the exact sequence of clicks to perform her limited set of tasks involving the PC, and was essentially serving as a biological "macro".
Too many "training programs" are really just "rote memorization of specific sequences" rather than actual "learning the general principles of using the machine." The principles don't need to be very low level to be useful -- just ideas such as what various clicks do, where various things are found, etc. The secretary I mentioned above didn't even know what "maximizing" and "minimizing" windows was about!
Whether driving a car, cooking food, using a computer, building something, or whatever, it's far more effective to teach the general principles and build from there than have the student memorize a few specific details. "Give a fish" vs. "teach to fish."
--Joe
It's really not that hard. AH and AL map to AX on x86 the same way the A and B accumulators on the 6800 map to the D accumulator. (Ok, you really meant you prefer 68k, not all things Motorola.) Ignoring the 8-bit sub-registers, you have 7 16-bit general purpose registers to work with -- AX, BX, CX, DX, SI, DI, BP. (Ok, someone will scream "BP isn't a general purpose register." Just turn on -fomit-frame-pointer when calling GCC and get on with life. Or go play in traffic.) These registers can hold data OR addresses -- no partitioning between the two uses. (I just heard someone squeal about addressing modes, and maybe something about string instructions... Most of the time, it's not an issue. Hey, aren't you the same guy who was squeaking about BP?)
So, anyway, how is that harder than having to deal with separate data and address registers? (Why partition registers by functionality?) And as long as all your data fits in 64K, you're all set--you never have to think about segments. ;-) (Hey you, with the GBs of porn... put that thing back in your pants. And stop listening to those MBs of pirated MP3s.)
Now the 32-bit extensions are even easier, if you ignore the 16-bit ways of referring to registers (most of the time, it's better that way anyway), you just put an 'E' in front of the name and they're all 32 bits. Not too hard to think about. And at least under Linux (I claim no knowledge of Windows), you get a nice 32-bit address space.
So, see, it's not so bad. (Hey you, snickering in the back! Cut it out.) The x86-64 extensions just continue this tradition -- as long as you ignore the old crap, the new crap isn't all that horrible.
All that said, I agree that x86 is a pretty grotty CPU architecture. :-)
Exercise for the reader: Identify uses of sarcasm and ironic humor in the above post.
--Joe
The changelog does not constitute security testing, though. Writing and/or using a program which tests for a hole and merely says "You're vulnerable, install the patch" (or, if it's part of the patch routine, just installs the patch) qualifies as security testing. Describing the vulnerability, though, such that anyone could potentially write a program to circumvent the access control is not security testing.
Besides, if I'm understanding correctly, this clause says specifically that you can still run afoul of the clause I quoted.
--Joe
And if you read the thread, you'll see that Alan Cox's assertion is that UNIX-style permissions can be used for digital rights management purposes. That is, they can be used as an access control to protect copyrighted works that are covered under the DMCA. Therefore, disclosing a security vulnerability which can subvert UNIX-style permissions is equivalent to describing how to circumvent an access-control device as described under the DMCA.
I would guess that the specific DMCA clause that Alan's affected by is this one:
(2) No person shall manufacture, import, offer to the public, provide, or otherwise traffic in any technology, product, service, device, component, or part thereof, that--
(A) is primarily designed or produced for the purpose of circumventing a technological measure that effectively controls access to a work protected under this title;
(B) has only limited commercially significant purpose or use other than to circumvent a technological measure that effectively controls access to a work protected under this title; or
(C) is marketed by that person or another acting in concert with that person with that person's knowledge for use in circumventing a technological measure that effectively controls access to a work protected under this title.
It would seem Alan's conjecture is that describing a specific vulnerability in the Linux kernel that allows subverting some aspect of Linux's permission structure (which can be used as an access control device to a protected work) constitutes "traffic[king] in any technology [...] or part thereof" that would allow someone to circumvent the access control. Under the current interpretation of the law (re: Sklyarov), detailing a security weakness in a product seems to (a) constitute such trafficking, and (b) fit one of the three clauses 2(A), 2(B), or 2(C) above. (Notice they're connected by an 'or', so it's necessary to fit only one of the three to be in violation of DMCA. I'm guessing the kernel information would fit 2(A).)
I'm so proud to be an American, where at least I know I'm free[*]. :-P
--Joe
[*] For a suitably narrow definition of free.
Reminds me of an anecdote I heard about a telegraph operator that suspected the message he was sending was in 'code'. It read "Mr. So-and-so is dead." He sent it as "Mr. So-and-so is deceased." A confused reply came: "Is Mr. So-and-so dead or deceased?"
Suspicion confirmed.
--Joe
Rotational latency IS a very important aspect of average access time for a given sector. Here's a quick rundown of how long one full platter rotation takes at various common drive RPMs:
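The numbers are easy to recompute: one rotation takes 60,000 / RPM milliseconds, and a random single-sector read waits half a rotation on average. A small Python sketch follows; the list of RPM values is an assumed set of common spindle speeds, and the function names are just for illustration.

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm

def avg_rotational_latency_ms(rpm):
    """Average wait for a random sector: half a rotation."""
    return rotation_ms(rpm) / 2.0

# Assumed examples of common drive speeds.
for rpm in (3600, 5400, 7200, 10000, 15000):
    print(f"{rpm:>6} RPM: {rotation_ms(rpm):6.2f} ms/rotation, "
          f"{avg_rotational_latency_ms(rpm):5.2f} ms average latency")
```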
Seeking from one sector to another requires both moving the head and acquiring the sector once you arrive at the track. Hopefully, the drive is laid out so that most common operations (linear reads that hop track-to-track) don't have to pay the rotational latency. Also, if you do a large linear read request to the drive (something I seem to recall SCSI supports better than IDE), the drive can be smart and read the whole track starting wherever the head lands -- thus hiding the rotational latency in certain cases. But for random seeks reading single blocks, there's not much you can do.
--Joe
From what I remember, RAID 5 is intended for reliability, not performance. Mirroring a RAID 5 should improve the performance of reads due to seek-time reduction, but it will make the already incredibly expensive writes (on RAID 5, at least) even more expensive.
In general, a mirroring RAID makes reads faster and writes slower. Why does a mirroring RAID make writes more expensive? Because all drives must seek to the location of the write in order to perform the write. Since the heads on all the mirrors may be scattered around the disk (a good thing for reads since it reduces average seek time on reads), the seek for any given write is as expensive as the most expensive seek across all the disks in the array. This makes the average for the whole array suck worse than it would for one disk.
As a separate issue, RAID 5 (which doesn't do mirroring in its basic form) still makes writes very expensive. Why are RAID 5 writes expensive? Because you have to update two blocks for every isolated 1 block update -- the block in question and its corresponding checksum block. Worst of all, if any one of the other blocks that share the checksum block isn't in cache, the checksum update is a read-modify-write cycle. OUCH! If you have lots of randomly scattered single-block updates, performance starts to suck pretty badly pretty quickly. On the plus side, if more than one block in a given update shares a given checksum block, then the number of checksum updates goes down since the updates can be combined. This means that the impact of RAID 5 is much less on large linear writes. In the best possible case, all blocks that share a given checksum block are being updated together in parallel with the checksum, so you can blast out the N blocks with N+1 parallel writes to the N+1 disks without any read-modify-write cycles.
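Here's a rough Python sketch of the two update paths, assuming the checksum block is plain XOR parity over the data blocks in a stripe (the function names are hypothetical). The read-modify-write path needs the old data and old parity, which is where the extra reads come from; the full-stripe path computes parity from the new data alone.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """Full-stripe parity: XOR of all data blocks, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def rmw_parity(old_parity, old_data, new_data):
    """Single-block update: fold the old data out of the parity and the new data in."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

stripe = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
p = parity(stripe)

# Isolated update of block 1: read old data + old parity, write new data + new parity.
new_block = b"\xaa\x55\xaa"
p = rmw_parity(p, stripe[1], new_block)
stripe[1] = new_block

# Same answer as recomputing from the whole stripe, and a lost block can be rebuilt.
print(p == parity(stripe))
print(parity([stripe[0], stripe[2], p]) == stripe[1])
```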
(Note that it would seem that you can theoretically alleviate the r-m-w suckage I mention above by increasing your filesystem block size to at least N*physical_block_size, so that any update coming from the OS forces all physical blocks that share the same checksum block to be updated together. The suckage I mention above applies only if your filesystem's block size is smaller than N*physical_block_size. As I recall, physical_block_size is typically 512 bytes. For ext2, this means making your fs with a 4K block size instead of the default 1K block size if you're on a RAID 5 with more than 3 disks...)
--Joe
Two ways: It's pre-populated (prior to mounting, you copy the entire physical disk to RAM), and it's dedicated to that specific disk. This could be important in an environment where access to a particular disk has very tight timing requirements. (Real time audio track streaming on an audio-mixing box, for instance.)
In most OSes, the disk cache is shared amongst all disks, and in more modern OSes, the disk cache grows and shrinks with respect to virtual memory pressure. This means that no particular disk gets any special treatment, and the data you're interested in may not be in cache when you ask for it, even if there was (and is) room.
--Joe
Since mesterha had also sent me an email, I previously replied in email, but it's prolly worthwhile to post here to clear the air around my vague statement above. I've also added a couple more things that weren't in the email I sent...
I said:
Ok, not quite true. It is one of the main performance benefits, though.
I later said:
I did oversimplify a little, and as a result, my statement isn't fully accurate. I compressed ideas from both RAID 0 and RAID 1 into one statement. Turns out that my statement is true for RAID 0 and RAID 1, but in different cases.
I was thinking primarily of a mirrored drive configuration (RAID 1) or striped+mirrored configuration (RAID 0+1).
For a striped RAID (RAID 0), the main case I was thinking of was a large linear access. Suppose, for example, each track holds 128 sectors (for sake of argument), and you're reading 512 sectors. On a single disk, this requires up to 4 track-to-track seeks. On two disks, each logical track holds 256 sectors, so you require 2 track-to-track seeks. On four, you're down to 1 seek.
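The arithmetic in that example is just a ceiling division: striping across N disks makes each logical track N times longer. A quick Python sketch (the function name is made up, and it counts logical tracks touched, i.e. the "up to" upper bound used above):

```python
def track_seeks(total_sectors, sectors_per_track, n_disks):
    """Upper bound on track-to-track seeks for a linear read on an N-disk stripe."""
    logical_track = sectors_per_track * n_disks
    return -(-total_sectors // logical_track)   # ceiling division

for n in (1, 2, 4):
    print(n, "disk(s):", track_seeks(512, 128, n), "seek(s)")
```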
You're probably thinking to yourself, "Yeah, you do fewer logical seeks, but you still have to send the physical seek commands to all the drives, resulting in the same number of physical seeks."
Yes, this is very true. But, you only have to pay the full seek penalty for one drive. (At least, this is true on SCSI where a drive can detach from the bus while it's busy. I don't know how IDE would behave unless all the drives are on separate channels.) Here's what I'm thinking of in that scenario:
Now, this does assume the drives are fairly well matched, or at least that drive 0 is equal to or slower than drives 1 through 3. You're correct, though -- if you are doing arbitrary seeking on a striped RAID, the cost of a single seek remains about the same.
In the case where you have a mirrored RAID (RAID 1), you do have the potential (for reads, at least) of cutting the average random seek down by a factor of N. This is because only 1 drive needs to actually perform the read, and so the elevator can choose the drive whose head is nearest to the data. In the best case, for fully random seeks on reads, the N drives would divide the seek range into N zones amongst themselves, with each zone being 1/N the size of a full-stroke seek. (This assumes a model where the cost of a seek is proportional to the distance of the seek, which is not entirely accurate, but is a reasonable model.) This is the same reason multiple elevators in a large building speed up access to all floors.
So, RAID 1 potentially cuts the cost of a random seek (on a read) to approx 1/N in the average case. In more realistic cases, you might get one disk's head hovering over the metadata, and another hovering over the data itself.
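A toy check of the 1/N figure, as a Python sketch under the same simplifying model: seek cost is proportional to distance, targets are uniformly spread over the stroke, and (my assumption for the best case) each of the N heads is parked at the center of its own zone. Real drives won't behave this neatly.

```python
def mean_seek(n_drives, samples=10_000):
    """Average seek distance (as a fraction of full stroke) for uniform targets,
    with each mirror's head parked at the center of its 1/N zone."""
    heads = [(2 * i + 1) / (2 * n_drives) for i in range(n_drives)]
    total = 0.0
    for s in range(samples):
        target = (s + 0.5) / samples                  # evenly spaced targets
        total += min(abs(target - h) for h in heads)  # nearest head services the read
    return total / samples

base = mean_seek(1)
for n in (1, 2, 4):
    print(n, "drive(s): relative seek cost", round(mean_seek(n) / base, 3))
```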
What's interesting is that a RAID 0 system improves your linear bandwidth with only a minor impact on seeking, and that a RAID 1 system improves your seeking with little impact on your (read) bandwidth.
If you implement a RAID 0+1 system, you can potentially get both types of seek-time reduction, as well as bandwidth improvement, as long as your stripes and mirrors are all on different disks. This means 2N disks for a potential factor of 1/N seek-time benefit and factor of N bandwidth improvement...
--Joe
I concede the need for RAS and CAS--I am aware of these and their corresponding cost. They are immaterial in this context, however. Given that one disk block is potentially large enough to cover more than one row, you'll need to re-RAS no matter what. Also, the cost of RAS and CAS are on the order of cycles at 133MHz, whereas the cost of a disk seek is on the order of milliseconds. If you were transferring a byte here, a byte there, then your argument makes sense. If you're transferring 512 byte or 2048 byte sectors (remember, this is a block device now!), then your argument doesn't pass muster. You're going to spend many times the CAS penalty in cycles just receiving the command.
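To put rough numbers on the gap, here's a back-of-the-envelope Python sketch. The 10 ms seek and the 10-cycle RAS+CAS penalty are assumed round figures for illustration, not measurements.

```python
BUS_HZ = 133e6                  # 133 MHz memory clock, as above
ASSUMED_SEEK_S = 10e-3          # assumption: a 10 ms disk seek
ASSUMED_RAS_CAS_CYCLES = 10     # assumption: RAS+CAS penalty of ~10 cycles

def cycles_in(seconds, hz=BUS_HZ):
    """Number of clock cycles that elapse in the given time."""
    return seconds * hz

seek_cycles = cycles_in(ASSUMED_SEEK_S)
print(f"one cycle = {1e9 / BUS_HZ:.2f} ns")
print(f"one seek  = {seek_cycles:,.0f} cycles, "
      f"about {seek_cycles / ASSUMED_RAS_CAS_CYCLES:,.0f}x the assumed RAS+CAS cost")
```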
--Joe
Other way around. The logger is disabled when the modem is active, so that they don't capture any "electronic communications". Given that, the solution is to arrange for a 24/7 Internet link on your PC. :-)
--Joe
The main performance benefit of a RAID is in reducing the impact of seek time on overall throughput. You pay a little extra in transaction overhead to send commands to multiple drives (instead of a single drive) to gain the dual benefits of cutting the average cost of a seek down, and increasing your linear access bandwidth. (In other words, you do seeks 1/N as often, and your bandwidth for a linear read within a track is N times what it would be for 1 drive, for a RAID with N drives. At least, this is true for striping.)
With a RAM disk, the cost of seeking is zero. Also, the bandwidth of the RAM already exceeds the available bandwidth of the drive cable. So, if you were to RAID your RAM drives, you'd still have the performance penalty of the additional overhead, but no gain due to hiding seeks or striping your bandwidth. The result would be a net loss in performance.
Now, what might be interesting is a mirrored RAID, where one side of the mirror was a physical HD, and the other was RAM. Modify the RAID software to send all reads to the RAM drive by default. Ta-da! Instant hardware-backed RAM drive! Performance would be lower than a pure RAM drive, but you wouldn't need to do anything unusual to make the RAM's contents persistent. A power loss looks like a drive failure -- just replicate the other drive back to the RAM.
--Joe
*ding* *ding* *ding* We have a winner!
That's right, it clicks the speaker. One of my favorite hacks was to query the cassette input port ($C020, as I recall), and toggle the speaker to match the current input level on the cassette port. You get essentially a distortion box effect, changing the analog cassette input into a squarewave output.
A later hack took that a step further, capturing from the cassette port into RAM with one program, and playing back from RAM to the speaker with another. From what I recall, the sampling rate was approximately 30kHz with 1 bit per sample. That means you could store about 6 seconds of audio in the three hires pages ($2000 - $7FFF). (Yes, I realize only two of the hires pages were actual hires pages. The third one was a software convention only. $2000-$3FFF is the HGR page, and $4000-$5FFF is the HGR2 page.) Ah yes, such a fun little machine. Was great for hacking around, wasn't it?
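The "about 6 seconds" figure checks out. A quick Python sketch of the arithmetic, using the address range and the approximate 30kHz, 1-bit sampling rate recalled above (the function name is made up):

```python
def capture_seconds(start=0x2000, end=0x7FFF, rate_hz=30_000, bits_per_sample=1):
    """Seconds of audio that fit in the inclusive address range [start, end]."""
    n_bytes = end - start + 1            # $2000-$7FFF is 24576 bytes
    return n_bytes * 8 / (rate_hz * bits_per_sample)

print(f"{capture_seconds():.2f} seconds")
```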
As for the joystick -- that was another lovely hack, wasn't it? The ol' RC time-delay w/ a CPU timing loop. That wouldn't fly real far these days, now would it? ;-)
Nor would the floppy disk drive that requires a CPU-speed-dependent timing loop to read correctly.
--Joe
Ok, so maybe cp wasn't the best example. 350 lines of assembler is pretty impressively small for such a program. Of course, you ignore the 800 or so lines of assembly that it %includes for doing system calls and the like. :-) (If this were a standard system header, I'd ignore it, but it's part of the application's source code.)
Still, it's hard to have any bugs in the original line cp file1 file2, unless you plain and simply get the filenames wrong. If you get those wrong, and the filenames are fixed filenames (no variable substitution), then you have other problems. :-)
My point is that you can say a lot more in 1 shell script statement than you can in 1 assembler statement.
--Joe
I have to agree. Assembly failure modes can be VERY much more subtle than high-level language failure modes. I have to deal with this regularly, as I write debug assembly code on some rather complex processors. (VLIW CPUs with exposed delay slots lead to some really hard to spot bugs occasionally that have absolutely no analogs in the C world. These bugs can be some of the most annoying Heisenbugs you've ever seen, too...) Basically, writing in assembly code gives you raw speed, but you pay the penalty in not having some sanity checks on your code.
On a related note: One of my professors was fond of pointing out a study that found that programs tend to have about the same defect density per line of code regardless of the language they're written in. That is, if you tend to have one bug per 50 lines of code (let's say), it doesn't matter if the code is ASM, C, Java, SQL, or Bourne Shell Script. Imagine how many lines of ASM it would take to be equivalent to cp file1 file2?
Now only slightly off topic...
Uhm, I'm a little fuzzy on this because it's been awhile, but doesn't that terminate a DOS program via INT 21h with an errorlevel of 0?
Then for you Apple ][ folk out there... what does 2C 30 C0 do? :-)
--Joe