David+Greene · Slashdot Mirror

Re:Then "Moore's Law" has fallen off a truck on ASUS P4 Motherboard Bests Intel, Says Sharky · 2000-12-13 23:41 · Score: 1

*stare*

Yes! Are you telling me that systems today aren't any faster than systems five years ago?

No, it's not an exponential increase in speed, but it is very significant. My 6-month-old laptop is the fastest machine in our research lab, and that includes PII and SPARC systems.

--

Re:But by then you're pretty much running Linux. on Ask Kevin Lawton About Plex86 · 2000-12-12 08:54 · Score: 1

I don't follow you. Yes, in some way you'd be running a stripped-down Linux kernel, but in exactly the same way you'd be running a stripped-down Windows kernel.

The goal isn't to run a stripped-down Linux in a VMM. The goal would be to run the VMM on the bare metal. It becomes the OS, and thus needs the device support. Traditional OS's are just applications in this model, like the Guest OS in VMware/plex86.

--

Re:Issues with Debian on Ask Kevin Lawton About Plex86 · 2000-12-12 05:54 · Score: 2

It's not an issue of "Free at all costs." It's an issue of Free. plex86 is under the GPL, so obviously the community thinks it should be Free, probably even DFSG-Free. But it's not DFSG-free. Perhaps the community doesn't mean for plex86 to be DFSG-Free, but I find that hard to believe. Hence the question.

If Kevin does not want a DFSG-Free plex86, then that is his choice and I can accept that. I'd even cheer that decision because no developer should be forced to license their code in any particular way.

However, beyond the Free Software rhetoric, there is a practicality issue here. I could always package plex86 up outside of Debian-proper and provide the apt sources to allow people to fetch it. However, this means plex86 won't be distributed on some Debian CD images and it won't be mirrored on some Debian mirrors. This is true even if it is filed under contrib in the official archive. This, I think, would be very unfortunate, as Debian is known for the wide range of packages available and its open development model, both of which drew me to Debian in addition to the packaging system.

--

Re:Suspended state on Plex on Ask Kevin Lawton About Plex86 · 2000-12-12 03:32 · Score: 1

Remember that virtualization is not emulation. Programs running on a virtualized system execute natively. There is no performance penalty unless privileged operations are requested. With plex86 there is a little more overhead due to some checking required to trap non-virtualizable code, but the cost should be amortized over time.

A suspend option is very useful for a virtualizer, though. I use it in VMware all the time.

Interestingly enough, "instant restore" (and shutdown) for desktops/servers is available for FreeBSD now via the Rio project and probably could easily be ported to Linux.

--

Re:"Host-less" VM software? on Ask Kevin Lawton About Plex86 · 2000-12-12 03:08 · Score: 1

That's an interesting question. I don't think anything about the PC would prevent this. After all, IBM's VMM is just an OS, a class of programs which the x86 supports quite well.

That said, there are definite advantages to hosting the VMM on another operating system. The largest is probably device availability. By piggybacking off another OS, plex86 can take advantage of all the device drivers written for that OS. Device models can easily be implemented using native drivers.

To support the S/390-style virtualization, plex86 would need to include from-scratch drivers or models for a reasonable amount of the hardware a guest OS might use. Of course, they could leverage the Linux source for some of this.

--

Re:Dynamic Code Generation for more speed on Ask Kevin Lawton About Plex86 · 2000-12-12 02:58 · Score: 1

My apologies. plex86.org just came back on-line and I got a chance to look at their docs.

From a very cursory reading of Kevin's virtualization ideas, it sounds like they're using breakpoints to mark potentially unsafe code. So it's true that the first time around, the code is scanned and unsafe instructions are marked. But after that, the code should be able to run natively and the breakpoints will act as the missing hardware traps.

There are issues with modifying code (by inserting breakpoints) which may be read/modified by the application, but the docs describe how some of that can be handled as well. In any event, it doesn't sound like plex86 has the sort of "main loop" described above. It's more like FX!32 where unprocessed code traps into an emulator/scanner.
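To make the scan-then-run idea concrete, here is a toy sketch in Python. It is only an illustration of the general technique as I read it, not plex86's code: the instruction tuples, `scan`, and `run` are all made up, and `popf` and `sgdt` are used merely as examples of x86 instructions that are sensitive but do not trap on their own. It also ignores the self-reading/self-modifying code problem mentioned above.

```python
BREAKPOINT = ("bkpt", None)
# Illustrative examples only; a real scanner would need the full list.
UNSAFE = {"popf", "sgdt"}


def scan(code):
    """One-time pass: swap unsafe instructions for breakpoints, keeping the originals."""
    patched, originals = [], {}
    for addr, instr in enumerate(code):
        if instr[0] in UNSAFE:
            originals[addr] = instr
            patched.append(BREAKPOINT)
        else:
            patched.append(instr)
    return patched, originals


def run(code):
    """Run the patched code; breakpoints stand in for the missing hardware traps."""
    patched, originals = scan(code)
    emulated = []   # instructions the monitor had to handle itself
    native = 0      # instructions that ran untouched
    for addr, instr in enumerate(patched):
        if instr == BREAKPOINT:
            emulated.append(originals[addr])
        else:
            native += 1
    return native, emulated
```

The side table of originals is what lets the breakpoint handler know which instruction it is supposed to emulate when the trap fires.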

Perhaps someone on the plex86 team can clarify and correct the mistakes I've almost certainly made.

--

Re:running my favorite app on Ask Kevin Lawton About Plex86 · 2000-12-12 02:17 · Score: 1

It depends on how you look at it. For me, fundamentally the goal is to be able to run some Windows apps under Linux. Virtualization is not needed for this, but it is convenient.

Other users want to run multiple OS's for various reasons (testing, providing different environments, etc.). In this case, virtualization is necessary.

But when people talk about virtualization, they are talking about the ability of a small host program (called the Virtual Machine Monitor) to provide a model of the hardware to guest applications. In traditional virtualized environments, the VMM is really a micro-OS whose sole purpose is to model the hardware and manage guest applications. Those guest applications are "traditional" operating systems.

So strictly speaking, the point of virtualization is to run OS's as applications, but that is usually just a means to some other end (reliability, functionality, etc.).

As for whether we could have a "transparent other-os-app-loader," we already do to some degree under Linux (binfmt). It can be used to run Java and Windows (using Wine) apps transparently. Doing this with plex86 is more complicated, but I can imagine an event communication path from the kernel to plex86 telling it when and what to load.

--

Re:Dynamic Code Generation for more speed on Ask Kevin Lawton About Plex86 · 2000-12-12 02:01 · Score: 1

That's not how virtualization works. The code is run natively by default. Any accesses to privileged state are trapped in hardware and emulated by the VMM. "Emulated" means "one instruction is emulated." Now it may be that multiple sequential instructions trap to the VMM, but it's not a case of the VMM examining a chunk of code and deciding whether to emulate or execute it. The processor just executes code as usual until it hits an instruction that causes a trap. There is no "main loop."

Now, there are virtualization issues with the x86 architecture and it's not clear to me what plex86 does about those.
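A toy sketch of the trap-and-emulate model, in Python, might look like the following. Everything here is hypothetical: the three opcodes (`add`, `load_cr`, `store_cr`), the `Cpu` and `Vmm` classes, and the single virtual control register are invented for illustration and have nothing to do with plex86 or VMware internals. The loop in `run` stands in for the processor fetching and executing instructions; the VMM only gets involved when a trap fires, and then only for that one instruction.

```python
class PrivilegeTrap(Exception):
    """Stands in for the hardware trap raised by a privileged instruction."""
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


class Cpu:
    """Hypothetical guest CPU running in unprivileged mode."""
    PRIVILEGED = {"load_cr", "store_cr"}

    def __init__(self):
        self.acc = 0   # guest-visible accumulator
        self.pc = 0    # program counter

    def step(self, program):
        op, arg = program[self.pc]
        if op in self.PRIVILEGED:
            raise PrivilegeTrap((op, arg))   # real hardware would trap here
        if op == "add":
            self.acc += arg
        self.pc += 1


class Vmm:
    """Hypothetical monitor holding the guest's virtual privileged state."""
    def __init__(self):
        self.virtual_cr = 0
        self.traps = 0

    def emulate(self, cpu, instr):
        op, _ = instr
        self.traps += 1
        if op == "store_cr":
            self.virtual_cr = cpu.acc
        elif op == "load_cr":
            cpu.acc = self.virtual_cr
        cpu.pc += 1   # resume after the single emulated instruction


def run(program):
    cpu, vmm = Cpu(), Vmm()
    # This loop models the processor, not a VMM "main loop".
    while cpu.pc < len(program):
        try:
            cpu.step(program)
        except PrivilegeTrap as trap:
            vmm.emulate(cpu, trap.instr)
    return cpu, vmm
```

Note that the VMM never inspects the instruction stream ahead of time in this model; it reacts to traps and nothing else.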

--

Re:Native parition support on Ask Kevin Lawton About Plex86 · 2000-12-12 01:50 · Score: 1

Great point! I've wanted to do this as well as mount floppy images, partition images, etc.

Kevin, is support for any of this planned?

--

Re:fundamental feature or hack? on Ask Kevin Lawton About Plex86 · 2000-12-12 01:28 · Score: 1

It is something we should fundamentally have, at least for multi-user systems. A true Virtual Machine Monitor a la IBM's work in the 60's has many advantages.

It can run multiple environments at once so users can work where they are most comfortable.
It allows test environments for OS upgrades, etc. Once the upgrade has been fully tested, users can be switched to the new version transparently.
It allows sharing of devices between environments, reducing infrastructure costs.
It allows low-cost, -power, etc. solutions where missing hardware functionality is emulated in software.

These advantages also apply in single-user environments to various degrees.

--

Re:Native parition support on Ask Kevin Lawton About Plex86 · 2000-12-12 01:07 · Score: 1

Ok, I wasn't aware of this. Last time I checked the web site, no information on this existed and at the moment it's /.'ed.

But even so, if it isn't useful, IMHO it doesn't exist (yet).

--

Re:Are you endangering commercial software on Linu on Ask Kevin Lawton About Plex86 · 2000-12-12 01:05 · Score: 1

Here's how I would respond to questions like these:

I purchased a copy of VMware, and have been mostly happy with the results. However, it seriously lacks support for external devices. Only the cdrom, mouse, parport and a couple of other minor devices are supported in VMware.

I plan to do much work in MIDI in the near future and would like to have this work in VMware. Unfortunately, I have no option to add such support.

With plex86, I have this option. plex86 fills a need. The commercial VMware is not supporting its users in ways they desire. An itch exists and Kevin is scratching it.

--

Issues with Debian on Ask Kevin Lawton About Plex86 · 2000-12-12 00:58 · Score: 5

Recently there has been much discussion on debian-devel about where plex86 (or parts of it) can be placed in the archive. The main issue is the licensing of the VGA BIOS from Elpin, which states that the BIOS is free for use in plex86, but may not be modified or used for any other purpose. Because of the dependency of plex86 on this BIOS, it may be forced into contrib instead of main. The BIOS itself would have to go into non-free, which raises questions about whether this would be a legal distribution of the BIOS, since it would not be packaged with plex86. Moreover, with the rumblings of eliminating non-free, it may not be distributed through Debian at all, regardless of legal issues.

Are there any plans in the near future to replace the Elpin BIOS with a Free implementation?

--

Native parition support on Ask Kevin Lawton About Plex86 · 2000-12-12 00:46 · Score: 5

What are the plans for native (raw) partition support in plex86? Bochs has had this ability for some time and I find it to be essential in VMware.

The need for large partitions in a Windows environment coupled with the file size limits in Linux and the more restrictive Windows licensing means this sort of support is critical to anyone wanting to run Windows on Linux with plex86.

--

Re:not th on Intel Says 10GHz By 2005 · 2000-12-10 22:14 · Score: 1

More density implies smaller transistors (since the die can't be arbitrarily increased in size)

Yes, I'm on crack. The die has nothing to do with it. It's cold and snowing here and we're going to get buried under 10 inches of it. Give me a break. :)

--

Re:not th on Intel Says 10GHz By 2005 · 2000-12-10 22:08 · Score: 1

Well, really, Moore's Law considers the density of transistors, not the absolute number. It just happens that it's hard to increase the absolute number solely by increasing the die size.

The density factor also contributes to the myth of Moore's Law as a speed indicator. More density implies smaller transistors (since the die can't be arbitrarily increased in size), which in turn allows an increased clock rate, given the same microarchitecture.

--

Re:Two points on IBM Itanium Based Systems and Linux · 2000-12-06 05:21 · Score: 1

I don't think it really matters if we have to do little things like rework our compilers...

Getting a good IA64 compiler is a lot more than a "little thing."

Intel's last true architectural change was with the introduction of the 386SX processor.

Pardon? Pentium? PPro? P4? MMX? SSE? Intel has really been a leader in pushing processor performance. The fact that they got such a clunky ISA to run fast is absolutely amazing.

That said, if Intel can make a smooth transition to a new ISA while keeping IA32 compatibility, that will be a very good thing for them. It's debatable whether Itanium will provide enough incentive for users to switch, however. I'm waiting for McKinley.

--

Re:"New" Architecture on Intel's Itanium Processor Explained · 2000-12-04 02:37 · Score: 1

I don't know if that's what was suggested, but if so, it would be correct.

The compiler cannot do as well as the hardware because the hardware has runtime context to guide its decisions. Unless you're cheating and running the compiler at run-time. :)

--

Re:IA64 vs x86-64 on Intel's Itanium Processor Explained · 2000-12-04 02:32 · Score: 1

You have to think about what you want out of a 64 bit architecture. To me they are 64 bit addressing, and 64 bit data.

You'll get no argument from me, though I (and I suspect most people) would say that the addressing is by far the most crucial part.

The Intel one on the other hands supplies *heaps* of registers. While this would put the intel chip way in front, the downside is loading & storing registers when you change stack frames (calling a function). The rotating register stack helps, but eventually nesting of procedures can result in register spill. In some ways, the IA64 resembles a stack machine, but that's a dirty word these days - perhaps the terminology was avoided for those reasons.

Isn't every modern general-purpose machine a stack machine, then? :) Every machine (even windowed ones) at some point has to save its local register set to the runtime stack.

In my opinion, 16 general purpose registers is probably about as many that a good optimizing compiler would need for the typical C functions.

This really depends on the compiler. See my post above for some studies in this area. To summarize, hundreds of registers can be used effectively if you pull the stops out of the compiler. Any single typical C function will probably use around 64 or so.

It is rather curious to see the trend from highly CISC machines to progressively more RISC machines, with the burden being placed more heavily on good compiler design.

The machines are only more compiler-oriented in their ISA's. I think most people simplify the CISC/RISC argument a bit too much. In some ways, machines are becoming "simpler" for the compiler's sake (fewer instructions to choose from, more registers, pipeline interlocks, etc.). However, the underlying implementations are actually becoming much more complex to take the burden away from the compiler.

With RISC (and unfortunately ignoring the pioneering work IBM and CDC did decades before anyone else), we went from lacking pipeline interlocks and requiring branch holes (both making the compiler's job harder) to adding hardware interlocks and branch prediction to full-fledged register renaming and out-of-order execution. The underlying hardware is not "reduced" in any sense of the word! The compiler's job actually got easier in the sense that (for example) scheduling is not as critical on an out-of-order machine as it is on an in-order machine. Of course it is still important, but the hardware takes some of the burden away.

With IA64, we're going to see this trend again. The first release is going to be compiler-critical (like the early RISC machines) but later generations (McKinley, etc.) are going to add in prediction, renaming, out-of-order execution and all the baggage that comes with it.

When you get down to it, the interface to the machine (the ISA) and the "bare metal" are completely decoupled. This is taken to the extreme in the Crusoe.

--

Re:Pipeline flush question. on Intel's Itanium Processor Explained · 2000-12-04 02:09 · Score: 1

As far as I can tell, all that would actually happen is the speculated instructions being invalidated in-flight, with other instructions proceeding as normal.

True, but on most machines, this is rather late in the pipeline, so it is effectively a flush.

You still get a delay - it's the equivalent of a stall of as many cycles as it took to figure out which way the branch really went - but certainly not a full flush.

It may actually be worse than a flush if the cost of restoring the non-speculative state and redirecting the fetch is very high, not to mention the cache pollution (or prefetching depending on your luck) caused by wrong-path execution.

--

Re:Rotating Registers... on Intel's Itanium Processor Explained · 2000-12-04 02:04 · Score: 1

Thanks a lot, I never realised that adding instruction between stalled cycles could sped up the process (I always thought that there were slots 'for free', but not that it could accelerate the result). Make sense in reality, because by using the b1/b2/b3 we are giving more temporary memory for the execution...

It is a bit weird at first, isn't it? The way I always think about stuff like this is by going back to the fundamentals. In combinational logic, a SOP or POS form has only two levels of gates and is often faster than a more minimal implementation which may have more levels of logic. Likewise, a fast algorithm is usually longer than the shortest possible to get the job done, because you usually have to sort some container or other to get the speedup.

Proving once again that bloat is not necessarily bad. :)

--

Re:Some highlights... on Intel's Itanium Processor Explained · 2000-12-04 01:56 · Score: 1

- Predication. You read this part right? This means no more pipeline flushes for missed branch prediction. None. This is a big saver. Although transmetas CPU's do this (to a limited extent) with their VLIW and OS, it is still wrong on occasion (i.e., not perfect branch prediction, which itanium will effectively provide)

Um...no. :)

Itanium most definitely does NOT provide perfect branch prediction. Predication and prediction are related, but very different, beasts.

Predication tries to get around the added penalty of a branch mispredict over and above the "obvious" penalty of executing the wrong instructions. After a branch is predicted it takes some time for it to trickle down the pipeline and compute the correct answer. If at that time (or often a bit later) the machine compares the answer and finds its prediction to be incorrect, it has already fetched, decoded and executed many instructions from the wrong path of execution. But in addition there is a penalty associated with restoring proper machine state, re-directing the fetch engine and generally getting the pipeline filled back up. This is the penalty predication eliminates.

In fact, I would say that predication performs "perfectly imperfect" branch prediction, in that the machine never executes only from the right path. Predication trades off the wasted time executing useless instructions to remove the restore/redirect/fill penalty of a misprediction and allow additional scheduling freedom. The scheduling freedom is important for a VLIW-style machine to keep the function units busy and reduce code bloat. However, if used unwisely, a predicated chunk of code can actually execute more useless instructions than a dynamically-predicting machine would, therefore offsetting the advantages of predication. This is why predication is usually reserved for hard-to-predict branches that cover short control sequences.
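A back-of-the-envelope model of that tradeoff, sketched in Python. All of it is an assumption for illustration: one instruction per cycle, a single lumped `mispredict_penalty` covering wrong-path work plus restore/redirect/refill, and made-up function names. A wide VLIW machine could overlap the two predicated arms, which this deliberately ignores.

```python
def predicted_branch_cycles(then_len, else_len, p_taken, accuracy, mispredict_penalty):
    """Expected cycles for an if/else handled by dynamic branch prediction.

    Only the correct arm does useful work; each misprediction adds a
    lumped penalty (an assumed figure, not a measured one).
    """
    useful = p_taken * then_len + (1 - p_taken) * else_len
    return useful + (1 - accuracy) * mispredict_penalty


def predicated_cycles(then_len, else_len):
    """Cycles when both arms are issued under predicates.

    There is never a misprediction, but the arm whose predicate is
    false still occupies issue slots.
    """
    return then_len + else_len
```

With hypothetical numbers, a two-instruction if/else behind a branch predicted correctly only 60% of the time with a 10-cycle penalty favors predication, while 20-instruction arms behind a 95%-accurate branch favor prediction by a wide margin.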

- Rotating registers. Why are these great? Usually you only have a few registers with CISC architectures. RISC has quite a bit more, but they are much smaller and you end up using them as much as the less populous CISC registers.

This just doesn't make any sense to me. What do you mean by "they are much smaller?"

Having 256 registers with the ability to cycle them means you will be hitting the L1 cache even less. While the L1 is fast, it is still at least twice as slow as hitting a register directly. This is another big bonus

As you corrected below, the number of registers is orthogonal to rotating them. The big advantage of rotating registers is their use in software pipelining, as explained in the wonderful discussion above. Note that software pipelining is especially critical on a VLIW machine for the same reason predication is -- scheduling. Is anyone noticing a trend here? :)
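Here is a toy Python model of why rotation helps a software-pipelined loop. It is a sketch under my own assumptions, not IA-64's actual renaming mechanics: `RotatingRegisters` and `pipelined_double` are hypothetical names, the file has four registers, and the "loop" is a two-stage pipeline where a value loaded in one trip is consumed in the next.

```python
class RotatingRegisters:
    """Toy rotating register file: logical names shift by one on each rotate."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotating, the value written as r0 is visible as r1, and so on.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(xs):
    """Two-stage software pipeline: stage 1 loads, stage 2 doubles last trip's load."""
    regs = RotatingRegisters(4)
    out = []
    # One extra trip drains the pipeline (the epilogue).
    for i in range(len(xs) + 1):
        if i >= 1:
            out.append(regs.read(1) * 2)   # stage 2: consume the previous load
        if i < len(xs):
            regs.write(0, xs[i])           # stage 1: load for this trip
        regs.rotate()                      # renames r0 to r1 with no copy
    return out
```

Without the `rotate` step, the loop body would need an explicit register-to-register copy each trip to keep the in-flight value from being overwritten by the next load.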

As far as the number of registers go, yes, it is very nice to have lots of them, but it's important to be able to use them as well. Most compilers today cannot make much use of more than about 40 general-purpose registers unless they start doing "unsafe" things like putting global values into registers or using "non-traditional" architectures like register windows. Now I'm ignoring floating-point and scientific benchmarks where software pipelining can chew up registers like nobody's business. The point is that (for example) your kernel compile will not benefit from more than about 40 registers, at least with today's technology.

Some register usage studies we've done are available here and here. In particular, I suggest looking at our workshop paper on ILP, large register file tech. report and especially at our MICRO-33 paper (to be presented next week). These papers highlight how current compilers and/or architectures are artificially crippled to shoehorn programs into 32 registers. Many more can be used if some more tricks are pulled.

It sounds like Intel wont have a top notch compiler for another few years at best, and who knows when the GNU compiler will support even a fraction of the features.

What I can't figure out is why HP isn't developing (or announcing) a compiler. They have some top-notch people there who invented most of this stuff!

One very important thing to remember about IA64 is that all these nifty features are intimately tied together. It's a bit like a house of cards in that if one fails, the others will have a hard time making up the slack. VLIW implies that good scheduling is needed. Predication allows more scheduling freedom. Software pipelining allows more scheduling freedom at the cost of more temporary registers and copying. Rotating registers gets rid of much of the copying. The ALAT allows more scheduling freedom and possibly more loop optimizations. See how everything works together to keep the machine busy?

--

Re:Another way to do emulation on IBM's OSS Code Morphing Code/or OSS vs. Transmeta · 2000-11-29 01:52 · Score: 1

There are several problems with this:

Distinguishing code and data. How do you know some bits in the .data section aren't instructions that are going to be executed at run-time? We won't even get into the trick of overlapping data values and instructions in the same memory locations.
Addressing. It is difficult to know where basic blocks begin and end. Because the instruction sets won't match one-to-one, you have to go patch branch addresses. This becomes very difficult when you deal with computed gotos and function pointers.
Register Allocation. Even if you could translate all the instructions correctly, you'd still want to do a good job. Unfortunately, with an architecture like x86, many variables are not enregistered, even if it is valid to do so. The translator can't know statically that it is safe to put a memory location into a register. Transmeta gets around this with some hardware to detect invalid register allocations at run-time.

FX!32 does something like what you're talking about, except it uses the initial, emulated run of the program to find out what parts are actual code. On the next run, if untranslated code is touched, an exception handler emulates it and marks it for translation after program execution.
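The emulate-first, translate-later cycle can be sketched in a few lines of Python. This is only my own illustration of the general scheme described above, not FX!32's implementation: `HybridRunner`, its method names, and the idea of feeding it a list of executed addresses are all hypothetical.

```python
class HybridRunner:
    """Toy model: emulate unknown code, translate it after the program exits."""

    def __init__(self):
        self.translated = set()   # addresses that have native translations
        self.pending = set()      # addresses proven to be code, not yet translated

    def run(self, trace):
        """Execute the addresses in `trace`; return how many had to be emulated."""
        emulated = 0
        for addr in trace:
            if addr in self.translated:
                continue              # translated code runs natively
            emulated += 1             # fall back to the emulator
            self.pending.add(addr)    # it executed, so it really is code
        return emulated

    def translate_offline(self):
        """After the run, translate everything that was observed executing."""
        self.translated |= self.pending
        self.pending.clear()
```

The point of the design is that execution itself answers the code-versus-data question: nothing is translated until it has actually been run at least once.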

--

Re:Lawyers on Florida Election Votes Certified · 2000-11-27 04:03 · Score: 1

Well, clearly Harris was in the wrong on recounts if she opposed them. But did she oppose recounts on principle, or the way in which they were conducted?

The fact that counties couldn't get their counts in calls for a change in law. But not for this election. The Florida supreme court already extended the deadline. That seems generous to me.

So we enter the contesting phase, where Gore certainly has the right to challenge the results. I don't think it's wise politically, but he has the right.

Gore's hypocritical "every vote must count" line is disgusting. If every vote must count, then count the discarded military absentee ballots! If we cannot count them because of Florida law, then I don't think we can count dimpled, undervote or overvote ballots either.

--

Re:Don't forget the electoral college! on Florida Election Votes Certified · 2000-11-27 03:32 · Score: 1

These electors aren't all high-ranking rank-and-file politicians.

Actually, that's exactly what they are! Being selected as an elector by one's party is an honor reserved for those members who have shown consistent party loyalty over long periods of time, have worked actively in campaigns and have pledged themselves to vote for the party's candidate.

The only time electors have changed their votes is when the outcome was a foregone conclusion, and then they changed to the more "extreme" side of the party line (a Republican voting for Buchanan or a Democrat for Nader, for example). Electors have never crossed party ideologies.

--

Slashdot Mirror

User: David+Greene

Comments · 1,049