Sorry, Peter, but I just can't bear
the thought of anyone coding in PICO!
PICO is good for its original task -- a user-friendly editor to sit inside PINE.
(PICO stands for PIne COmposer.)
My main complaint about PICO is that it tries
to provide the friendliness of a word processor, with none of the real power of a word processor.
In doing so, it loses the power of a text editor
as well. I feel like I've sat down in front of
Microsoft Word, except someone took away all the
tool bars and hid the mouse. That, and its
insistence on word-wrapping lines in contexts
where it's not appropriate (line-oriented
languages, anyone?) make it unsuitable for hard-core work.
I like VI since I can do rather crazy edits
(like the other day when I was fixing someone
else's code and rewrote all expressions of the
form "restrict type_qualifier type variable_name[]" as "type_qualifier type *restrict variable_name") with a single VI
command. I've even written a maze-solver in VI.
Do that in PICO.
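For anyone wondering why that rewrite was worth doing: in C99, restrict may only qualify a pointer type. In "restrict const int v[]" the qualifier lands on the element type int, which a conforming compiler must reject, whereas "const int *restrict v" (or the equivalent array-parameter spelling "const int v[restrict]") says what was meant. A minimal sketch; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* restrict qualifies the pointer, so it goes after the '*'.
 * The caller promises dst and src never overlap. */
void scale_copy(int *restrict dst, const int *restrict src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* C99 also allows the qualifier inside the brackets of an array
 * parameter; v has the same type as "const int *restrict v". */
int sum(size_t n, const int v[restrict])
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The original form would not compile: restrict would apply to the
 * element type int, which is not a pointer.
 *   int bad_sum(size_t n, restrict const int v[]);
 */
```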
Leave PICO where it belongs -- for jotting short
messages to friends via email -- and use a real
editor for real editing tasks.
This isn't flamebait, and I feel it's on-topic,
as the Slashdot article above directly mentioned
PICO. I must say it: Friends don't let friends
use PICO!
No, you don't. You rely on at least two very highly trained and experienced human pilots, who control the plane and make the go/no-go decisions.
Sure, but they operate on a much higher level than
the rest of the plane. There's a layer or two of
software between the pilot's control inputs and
the actual physical mechanisms being actuated.
Besides, ever hear of autopilot?
Also, isn't landing largely computer controlled?
And how about the radar units in the control tower? Believe me, despite the fact that the
pilot and copilot are issuing commands to the
plane, your safe arrival at the airport relies
on a lot of software.
A long time ago I watched Frank McCourt on TV, cracking a joke about how he was confused as a child after the Pope dismissed the existence of limbo, and was perplexed as to where all the unbaptised souls would go.
George Carlin did a bit on that as well, in
his "I used to be an Irish Catholic" routine.
Perhaps that's what you were thinking of?
Remember not to offer any solutions (engineer's instinct ;-) to whatever problems they might have;
Definitely! I've run into the same problem
myself. I'm always trying to help, but sometimes
the right way to help is just to listen.
It's hard listening and not offering suggestions
or solutions,
though, as it's all too tempting to either
offer one, or to stop listening to avoid the
temptation. Neither one's what she's looking
for...
As I recall, there is a small transient current during
switching (basically, it's like charging a
capacitor), but practically no continuous current draw. That's why LCDs are low in power.
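To put rough numbers on the capacitor analogy, a sketch assuming each of N pixels behaves like a capacitance C swung through a voltage V, driven through f full charge/discharge cycles per second. Each cycle dissipates about CV^2 (half while charging, the stored half on discharge), so average power tracks the switching rate rather than any steady current:

```latex
E_{\mathrm{cycle}} \approx C V^2, \qquad
P_{\mathrm{avg}} \approx N \, C V^2 f, \qquad
P_{\mathrm{DC}} \approx 0 \ (\text{leakage aside})
```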
Urgl... the link got bogarted. Again, that's
a deeply pipelined CPU. To give you an idea of how deeply pipelined it is, a "Load Word" instruction takes five cycles, meaning I issue the "Load Word", wait four cycles, and then
I get to use the result. :-)
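A minimal C sketch of the usual workaround for that kind of load-use delay, assuming an in-order pipeline where a loaded value can't be used for several cycles: issue the next load before consuming the current one, so the wait overlaps useful work. On a real deeply pipelined part the compiler's scheduler would typically run several loads ahead to cover all four cycles; one is enough to show the shape. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: each value is used right after it is loaded,
 * so an in-order pipeline stalls for the load latency every time
 * (unless the compiler reschedules it). */
long sum_naive(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Software-pipelined version: the load for element i is issued one
 * iteration before its value is added, overlapping load latency with
 * the add for the previous element. */
long sum_pipelined(const int *a, size_t n)
{
    if (n == 0)
        return 0;
    long total = 0;
    int next = a[0];              /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        int cur = next;
        next = a[i];              /* issue the next load early */
        total += cur;             /* use the previously loaded value */
    }
    total += next;                /* epilogue: last element */
    return total;
}
```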
Latency is important to the extent
that it limits bandwidth. As other posters point
out, modern CPUs have many mechanisms for dealing
with latency to a certain extent. (Some, such
as Alpha, go to Herculean extremes with a
gigantic reorder buffer and a cache which
allows four or five outstanding misses to pend
while still allowing hits in the cache.)
In the end, the raw amount of work performed is
measured in terms of bandwidth. To process N
items, you need to touch N items, and how quickly
you touch those N items is expressed as bandwidth.
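One way to make the latency/bandwidth link concrete is Little's law: bytes in flight equal bandwidth times latency, so a cap on outstanding misses caps the sustainable bandwidth. A sketch with illustrative inputs (the 64-byte line and 100 ns latency used below are assumptions, not figures for Alpha or any other part):

```c
#include <assert.h>

/* Little's law for a memory system: in-flight data = bandwidth * latency,
 * so with at most outstanding_misses requests pending,
 *   bandwidth <= outstanding_misses * line_bytes / latency.
 * Returns bytes per nanosecond (numerically equal to GB/s). */
double max_bandwidth_bytes_per_ns(unsigned outstanding_misses,
                                  unsigned line_bytes,
                                  double latency_ns)
{
    assert(latency_ns > 0.0);
    return (double)outstanding_misses * line_bytes / latency_ns;
}
```

With four outstanding 64-byte misses and 100 ns latency this gives 2.56 bytes/ns, about 2.56 GB/s; letting more misses pend raises that ceiling without making any single miss faster.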
As a person who programs a deeply pipelined CPU, I can
attest that latency can affect some algorithms (especially general purpose control algorithms)
more than others, since it limits how quickly you
can process a given non-bandwidth-limited task.
However, for raw calculation (e.g., all those
graphics tasks and huge matrix crunching tasks
the numbers folks like to run), those tasks are
fairly latency tolerant and just need bandwidth.
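To illustrate the distinction, a hedged C sketch of the two access patterns: walking a linked list is latency-bound because each node's address comes out of the previous load, while summing an array exposes every address up front, so the memory system can overlap many requests and the loop is limited mainly by bandwidth. Both functions compute the same sum; only the dependence structure differs.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Latency-bound: the load of p->next must complete before the next
 * address is even known, so memory latency is paid serially. */
long sum_list(const struct node *p)
{
    long total = 0;
    for (; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Bandwidth-bound: all addresses are computable in advance, so the
 * hardware (or a prefetcher) can keep many loads in flight. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```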
This is why number-crunching jobs might work
well with, say, RAMBUS, but desktop jobs might
work better with DDR SDRAM, even if the two are
at the same theoretical peak bandwidth.
Yeah, but that's outgoing traffic,
and I'm guessing it means "going outside the
University network", not just other machines
on campus.
Why would you need to serve RedHat to the world
at large from your dorm room?
Uhm, no. As the person you responded to stated,
if there are more people next year than there
are this year, and if everyone bought one computer, then there will be more computers
sold next year than this year. Thus, the market
has grown.
Sure it is! At least, it is on Windows... The little flying piece of paper icon takes more memory than a CGA framebuffer had. Updating
it would cost about the same as full-screen
animation did way back when... ;-)
What, relying on miracles at the process level
wasn't more significant? :-)
As I recall, they were doing all sorts of out-there things, such as a metal layer on the
die that was "in the air" -- the rest of the surface was etched away around it to reduce sidewall capacitance or somesuch. Some aspects
of their design were far too researchy to be
viable for production, IMHO.
But, then, I'm just a partially informed spectator, so what do I know?
Yes, you can measure time in meters (which,
incidentally, makes velocity a unitless
quantity as it's merely a slope at that point),
but in reality, nobody seems to do so as
a practical matter. Part of the problem is that
there's a sqrt(-1) in there that's rather annoying.
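For the curious, a short sketch of what that amounts to, assuming the older imaginary-time convention (presumably the sqrt(-1) in question): measuring time as x^0 = ct turns velocity into a pure ratio of lengths, and writing the time coordinate as x^4 = ict makes the spacetime interval look like an ordinary sum of squares.

```latex
x^0 = ct \;\Rightarrow\; \beta = \frac{dx}{dx^0} = \frac{v}{c} \quad (\text{dimensionless})

x^4 = ict \;\Rightarrow\; s^2 = x^2 + y^2 + z^2 + (x^4)^2 = x^2 + y^2 + z^2 - c^2 t^2
```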
And yes, I took a semester of quantum mechanics
myself.
Ack, you people don't get it! Here's the short,
simple explanation:
When the program is run the first time, you
see "Code Morphing Time + Program Running Time."
To the user, this manifests itself as "Total
Running Time."
The second time you run it you mostly only
see "Program Running Time" (and some
"Code Morphing Time", but not nearly as much),
and so "Total Running Time" looks somewhat
smaller. The reality is that "Program Running
Time" didn't really change much, if at all.
In a real world scenario (not a benchmark),
"Program Running Time" is the important figure,
as you typically end up using the program for
quite a long period of time and so the "Code
Morphing Time" ends up being in the noise,
rather than being one of the dominant terms
as it is
in some of these benchmarks.
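The arithmetic above can be sketched as a simple cost model, under the simplifying assumption (mine, not Transmeta's) that translation is a one-time cost and every later run pays only the steady-state running time:

```c
#include <assert.h>

/* Hypothetical cost model: translating the program costs morph_ms once;
 * each run of the already-translated code costs run_ms.
 * Returns the average wall-clock time per run over `runs` runs. */
double avg_time_per_run(double morph_ms, double run_ms, unsigned runs)
{
    assert(runs > 0);
    return (morph_ms + run_ms * runs) / runs;
}

/* Fraction of total time spent translating; it shrinks toward zero
 * the longer the program is used. */
double morph_fraction(double morph_ms, double run_ms, unsigned runs)
{
    return morph_ms / (morph_ms + run_ms * runs);
}
```

With, say, 100 ms of translation and 50 ms per run, the first run takes 150 ms, but averaged over 100 runs each run costs 51 ms, which is the "in the noise" effect described above.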
DISCLAIMER: I am not a kernel hacker, so I
might have some factual errors in the text below.
Kernel hackers: Feel free to correct me.
It does bad things if the clock rate
varies, as this affects micro-delay loops that
are used when talking to certain (broken)
peripherals. The execution speed of the instructions
varies even on true Intel parts. The kernel
has two mechanisms to cope with this, and the
important one should work fine on Transmeta.
(Reference: arch/i386/lib/delay.c
in the kernel source.)
The older mechanism is the BogoMIPS busy loop.
This mechanism relies on a tight loop that
fits in cache and should run with fixed behavior
on a given device. This mechanism probably
doesn't work real well on a Transmeta part, though
I suspect Code Morphing would hit steady state
real soon and so the BogoMIPS loop wouldn't
be hurt too badly. Still, it's suboptimal. That leads me to the second mechanism.
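A minimal C sketch of a BogoMIPS-style calibrated delay, assuming the kernel's scheme of counting at boot how many loop iterations fit in one timer tick (loops_per_jiffy). The names mirror the kernel's, but this is an illustration, not the code in arch/i386/lib/delay.c. The weakness for a variable-clock part shows in delay_loops(): the calibration is a fixed number, so if the CPU was calibrated at a slow clock and later speeds up, every delay becomes shorter than requested.

```c
#include <assert.h>

/* Iterations needed to wait usecs microseconds, given loops_per_jiffy
 * measured once at boot (loops that fit in 1/hz seconds). */
unsigned long long delay_loops(unsigned long long usecs,
                               unsigned long long loops_per_jiffy,
                               unsigned long long hz)
{
    /* loops per second = loops_per_jiffy * hz */
    return usecs * loops_per_jiffy * hz / 1000000ULL;
}

/* Integer part of the BogoMIPS figure: loops_per_jiffy / (500000 / hz). */
unsigned long long bogomips(unsigned long long loops_per_jiffy,
                            unsigned long long hz)
{
    assert(hz > 0 && hz <= 500000ULL);
    return loops_per_jiffy / (500000ULL / hz);
}

/* The delay itself; volatile stops the compiler from deleting the loop.
 * If the clock rate changes after calibration, the same iteration count
 * no longer corresponds to the same wall-clock time. */
void busy_delay(unsigned long long loops)
{
    for (volatile unsigned long long i = 0; i < loops; i++)
        ;
}
```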
The newer mechanism, which is available on most
modern CPUs, is the Time Stamp Counter (TSC), which
returns a count of elapsed CPU clock cycles. As long as you know the MHz
rate of the CPU, you can measure time very accurately. Presumably, despite the Code
Morphing layer, the Transmeta CPU will return
a meaningful, coherent clock count for this
instruction.
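Converting a TSC reading into time is then a division by the clock rate. A sketch, assuming the counter ticks at a constant, known MHz value. The helper name is hypothetical; on x86, GCC and Clang expose the raw counter as __rdtsc() in <x86intrin.h>, which is left out here so the example runs on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed nanoseconds between two TSC readings, assuming the counter
 * advances at a constant cpu_mhz (cycles per microsecond).
 * Multiplying before dividing keeps sub-microsecond precision;
 * extremely long intervals could overflow 64 bits. */
uint64_t tsc_delta_to_ns(uint64_t start, uint64_t end, uint64_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return (end - start) * 1000u / cpu_mhz;
}
```

For example, 600 cycles at 600 MHz is 1 microsecond, or 1000 ns.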
The problem with varying clock rates is that
the time-base for the BogoMIPS loop or the TSC changes,
and the kernel isn't notified. In theory,
the Transmeta could actually just use a fixed-rate
counter for the TSC whose time-base didn't vary as the CPU's clock-rate varied, thus fixing the problem entirely. But then, that'd make too much
sense. ;-)
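A toy model of that failure and of the fixed-rate fix, with made-up numbers: the kernel calibrated at 600 MHz, then the core drops to 300 MHz halfway through a 200-microsecond interval. A counter that follows the core clock under-reports the elapsed time; one that ticks at a fixed rate does not. Everything here is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks accumulated over an interval made of two phases, each with its
 * own counter rate (MHz = ticks per microsecond) and duration in us. */
uint64_t counted_ticks(uint64_t mhz1, uint64_t us1,
                       uint64_t mhz2, uint64_t us2)
{
    return mhz1 * us1 + mhz2 * us2;
}

/* Elapsed time as the kernel sees it, converting with the MHz value
 * it calibrated at boot. */
uint64_t apparent_us(uint64_t ticks, uint64_t calibrated_mhz)
{
    assert(calibrated_mhz > 0);
    return ticks / calibrated_mhz;
}
```

A counter tied to the core clock makes the 200-microsecond interval look like 150 microseconds, so a delay waiting on it would overshoot; a part that calibrated slow and then sped up would err the other way and cut delays short.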
As for HLT, I thought Linux did that
already? That's how come my CPU stays nice and
ice cold when I'm not running my Distributed Net
client. A quick look at arch/i386/kernel/process.c shows the uniprocessor idle loop calling __asm__("hlt") as long as the CPU supports it.
The reason that a benchmark runs faster the
second time is that the Code Morphing software
doesn't need to retranslate, and with these
short-lived benchmarks, the translation time
is a significant amount of the timing.
Rebooting won't necessarily
result in a faster translation, as the Code Morphing software supposedly re-morphs sections of code more aggressively over time anyway if they get called often. Basically, if
you rebooted your kernel, you might reboot more
quickly, but the steady-state performance of
the system would be identical after a minute
or so.
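The "doesn't need to retranslate" point is essentially memoization. A toy translation cache in C, direct-mapped and keyed by block address; it only illustrates caching translated code and makes no claim about how the Code Morphing software actually stores its translations. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 64

/* Toy translation cache: one slot per (address mod CACHE_SLOTS). */
struct tcache {
    unsigned long addr[CACHE_SLOTS];
    int valid[CACHE_SLOTS];
    unsigned long translations;   /* count of expensive translations */
};

/* Return nonzero on a hit; on a miss, "translate" and fill the slot. */
static int lookup_or_translate(struct tcache *tc, unsigned long addr)
{
    size_t slot = addr % CACHE_SLOTS;
    if (tc->valid[slot] && tc->addr[slot] == addr)
        return 1;                  /* hit: reuse the translation */
    tc->addr[slot] = addr;         /* miss: pay for translation */
    tc->valid[slot] = 1;
    tc->translations++;
    return 0;
}

/* "Run" a program that executes the given block addresses in order. */
void run_program(struct tcache *tc, const unsigned long *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        lookup_or_translate(tc, blocks[i]);
}
```

Running the same block sequence twice performs three translations in total: all of them happen during the first run, and the second run hits the cache every time.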
This is the reason standard benchmarking is unreliable on a Transmeta part. Basically, the
benchmark runs end to end touching many features
of an application, but not really reusing many
of them, so you get charged
the startup and initial Code Morphing overhead
on a large body of code and you don't get to see the actual steady-state
performance of the device. In contrast, if a
user's sitting there using Word for an hour,
they'll spend 99% of their time at steady
state using just a few features the bulk of the
time.
So, no, you don't need to reboot to get a faster kernel on a Transmeta device, unless you just want to watch it boot 2 seconds faster the second time.
ROFL!!!!
Someone mod that (+1, Funny). It certainly isn't off-topic if you consider what we're discussing (e.g., watt vs. watt-hour, and parsec being a unit of distance, not time).
Don't count on it. Most of the power in a notebook is consumed by the backlight and RAM. The CPU accounts for a relatively small percentage. They couldn't even get that kind of battery life if
the CPU used *no* power.
True. Of course, the TM5400 also swallows part of the PCI bridge, and eliminates some of the other legacy devices that are normally out in the chipset by emulating them in software at the Code Morphing level. Both of these steps also help save power (fewer bus transactions, fewer off-chip peripherals) that wasn't being counted towards the CPU to begin with. (Incidentally, having part of the PCI bridge onchip may account for those impressive memcpy/memset scores.)
Still, this isn't quite the same as integrating the backlight onchip. ;-)
I believe that Intel "owns" SMP in some intellectual property legal sense. Have they erected any obstacles to AMD's chips being more compatible with Intel SMP?
Sorta. As I recall, Intel owns patents which cover the APIC (Advanced Programmable Interrupt Controller), and these patents are related to
APIC programming in an SMP environment. The K5 and K6 used the OpenPIC standard to avoid this, but there were no OpenPIC boards and so effectively no SMP with the K5 and K6.
The EV6-style
SMP that Athlon uses avoids both of these issues
by using an SMP model which has existing boards (the SMP Alpha boards) and which isn't covered by Intel's patents. (Of course, the Alpha boards
can't be used directly for some reason, but
at least they're closer than the nonexistent
boards for the K5s and K6s). I imagine the EV6-style SMP requires different OS support, though. That is, Linux, BeOS and WinNT would all need different SMP drivers for the SMP Athlon boards to replace the APIC code. (Basically,
they avoid the Intel patent by not designing
to Intel's MP spec, but that would imply that
OSes need to have different MP drivers to
support it.)
--Joe--
Program Intellivision!
Wanna program the Intellivision? Get an Intellicart!
Not particularly. If a sig weren't interesting enough to occasionally garner a comment, then it's not worth having. :-)
And yes, I'm an engineer too...
Slightly OT, but still funny. I loved the following quote from the linked page, about three paragraphs down (emphasis mine):
It was 'tits'. The next line in Carlin's routine was "...and 'tits' doesn't even belong on the list!" He then expounds on the many fine and varied possible uses of the word. "Sounds like a snack ... New Nabisco Tits!"
Anyway...
And Tyts doesn't even belong on the lyst!
--Joe, catching the Gyorge Carlyn ryferynce.--
What about Fair U*ugh*... oops, forgot, we don't have that anymore.
First: Nice troll account. (s/Shooboy/Shoeboy/, eh?).
Second, there is a lot of research out there on cache technologies, including such out-there thinking. One that's on perhaps the "lip" of the box (not quite in the box, but not really out of the box) is the stride-predicting cache, which tries to prefetch data based on CPU access patterns. One that I think is really out-of-the-box is value prediction, that is, guessing what a value read from memory might be based on previous executions of the same instruction. (You'd be surprised how effective "guess zero" is!)
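A toy version of the stride-predicting idea, assuming a single load instruction's address stream (real designs keep a small table of such entries indexed by the load's PC). Once the same stride shows up twice in a row, it predicts the next address, which a prefetcher would then fetch early. Names are hypothetical, and 0 stands for "no prediction" purely for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* State for one load instruction's address stream. */
struct stride_entry {
    uint64_t last_addr;
    int64_t stride;
    int confident;   /* set when the latest stride repeated the previous one */
    int seen;        /* addresses observed so far, capped at 2 */
};

/* Record an access; return the predicted next address, or 0 if none. */
uint64_t stride_observe(struct stride_entry *e, uint64_t addr)
{
    if (e->seen > 0) {
        int64_t s = (int64_t)(addr - e->last_addr);
        e->confident = (e->seen > 1 && s == e->stride);
        e->stride = s;
    }
    e->last_addr = addr;
    if (e->seen < 2)
        e->seen++;
    return e->confident ? addr + (uint64_t)e->stride : 0;
}
```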
The thing is, you're not going to see many of these new features on mainstream processors for a number of years, because many of these ideas take time to really reach maturity. Remember, the Tomasulo algorithm (the out-of-order scheduling scheme built on register renaming) was developed in the 1960s, but didn't show up on the desktop until x86's sixth generation (the Pentium Pro).
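For readers who haven't seen renaming spelled out, a simplified C sketch: each write to an architectural register gets a fresh physical register, so back-to-back writes to the same register stop conflicting. This map-table-plus-free-list form is a textbook simplification, not Tomasulo's original reservation-station tagging or the Pentium Pro's exact design, and it never frees registers. All names and sizes are hypothetical.

```c
#include <assert.h>

#define ARCH_REGS 8
#define PHYS_REGS 32

struct renamer {
    int map[ARCH_REGS];   /* architectural -> current physical register */
    int next_free;        /* trivial free list: hand out in order */
};

void renamer_init(struct renamer *r)
{
    for (int i = 0; i < ARCH_REGS; i++)
        r->map[i] = i;    /* identity mapping at reset */
    r->next_free = ARCH_REGS;
}

/* Rename "dst = src1 op src2": read sources through the current map
 * first, then point dst at a new physical register, which removes
 * WAR/WAW conflicts on dst. Returns dst's new physical register. */
int rename_op(struct renamer *r, int dst, int src1, int src2,
              int *psrc1, int *psrc2)
{
    assert(r->next_free < PHYS_REGS);   /* no reclamation in this toy */
    *psrc1 = r->map[src1];
    *psrc2 = r->map[src2];
    r->map[dst] = r->next_free++;
    return r->map[dst];
}
```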
You're in the same boat as I am. I've pretty much resigned myself to waiting until about mid-March next year, just based on gut feel.