Christopher+Thomas · Slashdot Mirror

For raw speed, ditch gcc. on Linux on an Intel PIII vs. G4? · 2001-03-24 01:51 · Score: 4

Your main problem if you're looking for a speed boost for applications won't be the processor - it'll be the algorithms you use and the compiler.

For the algorithm:

One word. Cache.

Main memory is up to an order of magnitude slower than the cache. Make your algorithms cache-friendly. This means optimizing row vs. column accesses and doing checkerboarding for things like matrices, and other optimizations for vectors. For things like linked lists and trees, try to keep nodes contiguous with other nodes in memory where possible (or even just the key and linkage pointers, since that's all you'll be accessing most of the time when doing a search).
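
A rough sketch of the "checkerboarding" idea (more often called blocking or tiling), using a matrix transpose as the example. It's in Python only to show the loop structure; Python itself won't show the cache benefit, which comes from running the same loop nest in C on arrays much larger than the cache. The function names and the block size of 4 are made up for this illustration.

```python
def transpose_naive(src, n):
    """Transpose an n x n matrix stored row-major in a flat list.

    Reads walk along rows (contiguous), but writes stride through
    memory n elements at a time, which is what hurts the cache.
    """
    dst = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            dst[c * n + r] = src[r * n + c]
    return dst


def transpose_blocked(src, n, block=4):
    """Same result, but visiting the matrix one block x block tile at a time.

    Within a tile, both the reads and the writes stay inside a small
    region of memory, so a tile that fits in cache is reused before
    it gets evicted.
    """
    dst = [0] * (n * n)
    for row0 in range(0, n, block):        # top edge of the current tile
        for col0 in range(0, n, block):    # left edge of the current tile
            for r in range(row0, min(row0 + block, n)):
                for c in range(col0, min(col0 + block, n)):
                    dst[c * n + r] = src[r * n + c]
    return dst
```

The block size would normally be tuned so that one source tile plus one destination tile fit comfortably in the cache level you care about.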

It takes a while to fully zen into this, but it will pay off in spades.

For the compiler:

The following applies to the gcc C/C++ compiler. I'm assuming that you'll get similar performance results for the g77 Fortran compiler. You're on your own for hand-optimizing Fortran (I don't know the language).

Gcc is a nice tool; it's free, and it works well. Unfortunately, even with -O3 -funroll-loops, it can't optimize for beans. I had to study this in detail as a project for one of my grad courses, and I was appalled when I found out just how many potential optimizations it wouldn't catch.

If you're at the point where you're ready to optimize core algorithm code without worrying about it staying simple, then either replace it with inline assembly or (for better portability) write "pseudo-assembly" C code, with temp variables with the "register" keyword instead of registers, and statements only performing operations that can be easily mapped to machine code. Hand-unrolling and hand-software-pipelining worked wonders. Gcc will do the unrolling for you, but not the pipelining (I think) and it won't move even obvious candidate variables to registers.
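
To make the two hand transformations concrete, here is a structural sketch of hand-unrolling and hand-software-pipelining on a dot product. It's in Python, so there is no `register` keyword and no speedup to be seen; the local temporaries just stand in for the register variables you'd declare in "pseudo-assembly" C. The function names are made up for this example.

```python
def dot_unrolled4(a, b):
    """Dot product, unrolled by four with independent accumulators.

    Each accumulator forms its own dependency chain, so the four
    multiply-adds in the loop body don't have to wait on each other.
    """
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0.0
    limit = n - (n % 4)
    i = 0
    while i < limit:
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    while i < n:                      # leftover elements
        acc0 += a[i] * b[i]
        i += 1
    return (acc0 + acc1) + (acc2 + acc3)


def dot_pipelined(a, b):
    """Dot product with the loads moved one iteration ahead of the math.

    Prologue: load the first pair. Steady state: load pair i while
    multiplying pair i-1. Epilogue: finish the last pair.
    """
    n = len(a)
    if n == 0:
        return 0.0
    total = 0.0
    x, y = a[0], b[0]                 # prologue
    for i in range(1, n):
        next_x, next_y = a[i], b[i]   # load for the next step
        total += x * y                # compute on values loaded earlier
        x, y = next_x, next_y
    total += x * y                    # epilogue
    return total
```

In C the two would normally be combined, with the unroll factor chosen to match the number of registers you can spare.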

Using a chip with a large register set (like the PPC) makes this a bit more scalable, but it still works well on x86 chips (to a point). I tested on x86 and Sparc architectures.

Lastly, bear in mind that you might, if you're lucky, get a factor of 10 out of all of this. Make sure that your algorithm is of a well-behaved order, and consider using a cluster of PCs for anything really power-hungry (though that involves optimizing communications, too).

Sailcraft. on White Dwarfs Could be Dark Matter · 2001-03-24 01:04 · Score: 3

And it's not nearly as easy as you might think; your sailship would take a century or more, and would require very powerful lasers if they were to have noticeable effects at even interplanetary distances.

Actually, the main engineering challenge with the laser isn't its strength - it's its diameter. In order to have most of the beam hitting the sail even when the probe is a few light-years away (as is needed if you want to use a "forward"-style system to slow down), the sail and the effective laser aperture have to both be about 1000 km in diameter. This means you have a space-based array of lasers which use a lot of optics trickiness to stay in phase with each other.
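
A back-of-envelope check of why the diameter matters, assuming the beam spread is diffraction-limited and using the standard Airy-disk estimate (spot diameter of roughly 2.44 × wavelength × distance / aperture). The 1 micron wavelength and the 4.3 light-year distance are assumptions for illustration, not figures from the discussion above.

```python
import math

LIGHT_YEAR_M = 9.4607e15  # metres per light-year


def spot_diameter_m(wavelength_m, aperture_m, distance_m):
    """Diameter of the central diffraction spot at the target (Airy estimate)."""
    return 2.44 * wavelength_m * distance_m / aperture_m


def matched_diameter_m(wavelength_m, distance_m):
    """Aperture size at which the spot is exactly as wide as the aperture.

    If the sail is built the same size as the laser array, this is the
    size where the central spot just fills the sail at that distance.
    """
    return math.sqrt(2.44 * wavelength_m * distance_m)


if __name__ == "__main__":
    wavelength = 1.0e-6                # assumed: near-infrared laser
    distance = 4.3 * LIGHT_YEAR_M      # assumed: roughly the nearest stars
    size = matched_diameter_m(wavelength, distance)
    print(f"matched sail/aperture diameter: {size / 1000:.0f} km")
```

Under these assumptions the matched size comes out at a few hundred kilometres, which is the same order of magnitude as the figure above; longer wavelengths or longer distances push it up.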

My father and I were kicking around numbers for this thing a few weeks ago (we're both into physics). It turns out it's buildable (though quite expensive).

A century is a reasonable length of time for the trip to take. How short a trip you can manage depends on how bright you can make the laser and how light you can make the sail. Laser brightness scales linearly with cost (just make it wider). My numbers suggest that in theory you can make the sail thin enough to get the ship moving at close to C for most of the trip, but more realistic estimates give about order-10% C as the peak velocity for an order-10-LY trip.

Still silly, if you can see the entire image. on Zooming in on the GeForce 3 · 2001-03-23 13:03 · Score: 2

128 is not silly at all. The problem is only partially accuracy - it's also dynamic range. 32-bit integers *still* don't carry enough dynamic range to properly differentiate between a moonlit and a sunlit scene. [...] I forget the dynamic range of the human eye, but it's vast due to the ability of the iris to allow more or less light onto the retina [the dynamic range of the retina is much smaller].

However, with all current display technologies, you can see all of the screen at one time. Thus, varying the iris size just brightens or darkens the scene by a constant factor. Within the scene as a whole, the dynamic range you can perceive is just the dynamic range of the retina - which is quite low, as you point out.

Thus, being able to accurately represent sunlit and moonlit scenes on the same monitor would be useless; your iris would respond to the average brightness of the screen, which would cause the sunlit scene to look washed-out and the moonlit scene to look black.

If you're looking at the images one at a time, you might as well just normalize both to the same 256-level brightness range.
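
A minimal sketch of that normalization, assuming each image is just a list of linear luminance values in arbitrary units. The function name is made up, and a real pipeline would also apply gamma correction, which is ignored here.

```python
def normalize_to_8bit(luminances):
    """Rescale a scene so its darkest value maps to 0 and its brightest to 255."""
    lo = min(luminances)
    hi = max(luminances)
    if hi == lo:
        return [0 for _ in luminances]   # flat image: nothing to stretch
    return [round((v - lo) * 255 / (hi - lo)) for v in luminances]


if __name__ == "__main__":
    sunlit = [1000.0, 1800.0, 3000.0]    # hypothetical luminances
    moonlit = [0.001, 0.0018, 0.003]     # same scene, a million times dimmer
    print(normalize_to_8bit(sunlit))
    print(normalize_to_8bit(moonlit))
```

Both scenes end up spanning the same 0-255 range, which is the point: viewed one at a time, the absolute brightness difference between them is thrown away anyway.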

Re:Colour depth. on Zooming in on the GeForce 3 · 2001-03-23 10:32 · Score: 2

8 bits per component suffices for display on CRTs in typical office viewing environments, but change the display technology and/or the viewing conditions and you'll need more.

Doubtful, IMO. Human colour vision isn't infinitely acute. Black-and-white vision is a bit better than colour, requiring somewhere in the range of 10-12 bits before we can't see colour bands, but that's about it.

I'm skeptical of conditions changing this very much. How would you modify the environment to give us the ability to see finer variations in colour?

Food for thought. on White Dwarfs Could be Dark Matter · 2001-03-23 10:28 · Score: 3

What strikes me as interesting is that there are so many of these white dwarf stars relatively nearby. If there are any within a dozen or so light-years of earth, they would make interesting targets for sail-based probes.

White dwarf stars are mostly made up of degenerate matter (they have a shallow "skin" of normal matter on top). The theoretical properties of degenerate matter are well-known, but having a probe over a white dwarf would give us a vast amount of new information about it (mainly via seismic data; yes, there are "starquakes"). There would also be a lot of (relatively) cool, dense matter on the surface and in the atmosphere. This would give us a lot of information about how materials behave under extreme conditions. We'd also learn about how matter interacts with the white dwarf's strong magnetic field (white dwarf stars, like neutron stars, keep much of the parent star's magnetic field, compacting it into a much smaller space).

This environment is very different from anything found within our solar system, so it would be quite interesting to study.

(Yes, we can build probes that can reach there within a reasonable amount of time. A ship that carries its own fuel would take generations or centuries, but a sailcraft launched with stationary lasers could get there much more quickly. If you make the craft and the laser system more complicated and expensive, you can even slow it down at its destination.)

Re:Dark matter on White Dwarfs Could be Dark Matter · 2001-03-23 10:16 · Score: 3

Obviously, no one has seen dark matter, but only attributed to it unexplainable effects. Being as dense as it supposedly is with such great gravitational attraction, how does it keep from folding in on itself like a black hole?

"Dense" is a relative term. If there's 10 times as much dark matter in the galaxy as luminous matter, it's still pretty tenuous. If most of the dark matter is in intergalactic space, then it's even more tenuous. (In practice, you have dark matter both inside and between galaxies; how much is present in each location is open to debate).

So, dark matter may avoid forming black holes the same way luminous matter does - by there not being enough in one place to collapse.

Secondly, many candidates for dark matter are particles that travel at or near the speed of light. This is much less readily confined than slower matter, which would explain both why it hasn't formed black holes and why there's so much of it (relative to luminous matter) in intergalactic space.

In the case of white dwarf cinders, they don't collapse because they're in more-or-less stable orbits within the galaxy, just like most of the other stars.

Colour depth. on Zooming in on the GeForce 3 · 2001-03-23 08:20 · Score: 2

128 bit color? 24 bit color most people can no longer distinguish between individual changes in color, 32 bit color is quite enough.

While this is true, most cards add a few extra bits per colour component internally to keep roundoff errors in blending from causing visible artifacts. 30 bits was standard for that, if I recall correctly (more if you count the alpha channel).
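
A toy illustration of why the extra internal bits help, assuming integer arithmetic that truncates on every operation. It isn't a model of any particular card; the function and the "dim then brighten" operation are made up to show roundoff accumulating across passes.

```python
def dim_then_brighten(value, passes, guard_bits):
    """Halve an 8-bit value `passes` times, then double it `passes` times.

    `guard_bits` is the number of extra low-order bits carried through
    the intermediate steps and dropped again at the end.
    """
    x = value << guard_bits          # widen: append guard bits
    for _ in range(passes):
        x //= 2                      # truncating halve
    for _ in range(passes):
        x *= 2
    return x >> guard_bits           # narrow back to 8 bits


if __name__ == "__main__":
    print(dim_then_brighten(201, passes=2, guard_bits=0))  # 200
    print(dim_then_brighten(201, passes=2, guard_bits=2))  # 201
```

With no guard bits the value comes back one level off after two passes; with two extra bits it survives intact. More passes need more guard bits, which is why the internal precision is wider than the displayed one.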

128 would just be silly, of course...

I still question your conclusions. on AMD Challenges P4 With 1.33Ghz · 2001-03-22 05:32 · Score: 4

I wasn't talking about chip arch. I was speaking of motherboard arch. AMD is releasing a high speed main bus which is API compatible with PCI but blows Intel's 1-bit serial bus (Infiniband or something like that) out of the market.

You are still ignoring several very important considerations.

Infiniband motherboards do not presently exist. Claiming that AMD or DEC/Compaq's motherboard architectures blow them out of the water is very premature.

You are blithely ignoring the serious problems involved in widening busses.

It turns out that this is *extremely* difficult, especially at high speeds, and especially for synchronous busses. Your path lengths for all of the traces have to be the same, or very nearly the same. This is next to impossible to achieve for extremely wide busses. Thus, your claims of it being cheaply extensible should be taken with a large grain of salt. (You can make the bus wider by using more expensive motherboard construction, but this is - guess what - considerably more expensive).

You provide no support for your claim that Infiniband's performance will be poor.

It's an asynchronous serial bus. That removes two major design constraints (controlling the lengths of multiple lines, and keeping handshaking synchronous). I can believe that you could run something like this fast enough to make it competitive with existing busses (though I'd still want multiple channels in parallel in a real system). The fact that Intel is planning to use this at all suggests that bandwidth will be comparable to or better than what they're currently using.

You're assuming a single channel. Any sane design that I can think of would use multiple channels in parallel to boost bandwidth (receiver logic has to be more complicated, to combine packets from two asynchronous data streams, but this isn't that difficult).

Can you provide more support for your claims, so that I can see where your arguments are coming from?

Re:AMD is releasing a highly adaptable bus on AMD Challenges P4 With 1.33Ghz · 2001-03-22 04:36 · Score: 3

Guess what? AMD is beating Intel there too. Intel is trying to bait users with more performance but added vendorlock. AMD convinced API, one of the leading Alpha system producers to use their bus. Why? Intel uses a 1-bit high frequency bus, AMD uses a slightly lower frequency variable width bus which gets you 8-bit,16-bit,32-bit, and I believe 64bit and 128-bit are possible with some tweaking.

Um, you seem to be missing a few points:

RamBus is 16-bit, not 1-bit.
Intel chips are perfectly capable of using SDRAM. It's the motherboard chipset that decides which is used, not the CPU.
The limiting factor for total CPU-to-memory and -to-system bandwidth for both Intel and AMD chips is the front-side bus - its bandwidth, and protocol.

The main thing that affects a system's I/O and memory performance is the motherboard architecture and memory architecture. Chip architecture is secondary.

The main impact of chip architecture is, as mentioned, the communication point between the chip and the motherboard chipset. This has no relation to the RAM type.

In summary, about two thirds of your post was based on incorrect information.

Deja vu all over again. on Microcoolers Could Change Processor Design · 2001-03-21 03:43 · Score: 1

This was posted to the Science section last Friday:

link .

Re:GBA will be DOOM'd on Game Boy Advance Arrives · 2001-03-20 12:32 · Score: 2

Activision issued this press release today, detailing plans for a GBA DOOM game!

Question - Given that this is for a Nintendo platform, will this be butchered as badly as I'm told Castle Wolfenstein was?

I'm told that Nintendo has some pretty draconian content restrictions.

Script advice. on Tombstones That Last? · 2001-03-17 23:24 · Score: 2

I have a Perl-related query that only you can solve (you'll understand once I explain, it's about a script you authored). In order to save this thread from something off-topic, I request your e-mail address so that I may further detail the extent of my plans.

Disposable account du jour: cthomas one two three four five at hotmail dot com

Materials and corrosion. on Tombstones That Last? · 2001-03-17 13:42 · Score: 2

The problem isn't that common materials don't last; it's that you're looking at marble.

Marble is metamorphosed limestone, and so will corrode like crazy in an acidic environment. Any reasonably sturdy oxide (quartz, granite, glass, ceramic, etc.) should last long enough for your purposes.

Steps needed for fans to save this. on Dreamland Chronicles - Can Someone Save This Game? · 2001-03-16 06:05 · Score: 3

If you want the fans themselves to save this game, you're going to have to do something along the lines of the following:

Find about 100,000 people willing to buy the game (as in, committing to buying it).
Get them all to pay $30 *now* for it to be delivered a year from now.
Take this big whack of money, and walk to the gaming company's door with it. Have a lawyer-produced contract in your other hand.
Contract them to finish the game, within fixed time and budget limits (de-scoping if necessary to meet the limits, as they *must* deliver something).

The first two steps are the hard part. It might be do-able, but won't be easy. Good luck.

Don't forget to start a shell company and jump through all of the legal hoops for this as you proceed. This will make the contract-signing part of the deal much easier.

Make sure that, no matter what, your contracts (both with the company and with people paying you) say that *you* aren't on the hook if the game fails to materialize. You can't afford to pay back all of the gamers if you've already spent their money paying the development house for a year.

The way these revivals work in practice is that a venture capitalist, bank, or parent company will give the gaming house money if they think that there are enough customers to repay the investment. Their judgement seems to be "no" here.

Quick and dirty solutions. on Ordering the Chaos of Bookmarks? · 2001-03-16 01:36 · Score: 2

A couple of quick and dirty solutions:

For browsers that use the same bookmark scheme, upload your bookmarks to a CVS server whenever you update them, and update them from the CVS server whenever firing up the browser on a given machine.
For all browsers - instead of using bookmarks, keep a bookmark page in your web directory. Update this whenever you want to remember a site. You can use a trivial CGI script to do this from anywhere, or do it by hand and use the cvs method to update it (with the machine with the master copy of the page doing a cvs update at regular intervals via cron).

A friend of mine uses a simpler version of the bookmark page method. It works reasonably well.

Re:Speed on Two Telescopes Linked To Find Planets · 2001-03-16 01:29 · Score: 3

This site has some relevant information. One question that is answered is "How much propellant mass would it take to get an object the mass of a space shuttle/bus past Alpha Centauri within 900 years?"

Sounds easy? Think again!

1. chemical propellants - 10^137 kg
2. fission - 10^17 kg
3. fusion (inc orion craft) - 10^11 kg
4. ion/antimatter rocket - 10^5 kg

First of all, as you point out, a solar sail and a stationary laser would work quite well for sending probes out to other stars (though slowing down at the end would be quite a trick).

Secondly, I question some of these numbers. They're looking at only one range of the problem space - setting a time, and deriving a fuel/cargo ratio from there. The problem with doing this is that the fuel/cargo ratio will start to blow up once the amount of fuel becomes greater than your amount of cargo (it starts taking exponentially more fuel to reach a higher speed, because you're mostly hauling fuel).
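
The blow-up can be seen directly from the classical (non-relativistic) rocket equation, where the ratio of starting mass to final mass is exp(delta-V / exhaust velocity). The sketch below treats "cargo" as everything that isn't fuel, which is a simplifying assumption; the function name is made up.

```python
import math


def fuel_to_cargo_ratio(delta_v, exhaust_velocity):
    """Fuel mass needed per unit of non-fuel mass (Tsiolkovsky rocket equation).

    Both arguments must use the same units. Non-relativistic, so only
    meaningful when delta_v is a small fraction of the speed of light.
    """
    return math.exp(delta_v / exhaust_velocity) - 1.0


if __name__ == "__main__":
    # delta-V expressed as a multiple of the exhaust velocity
    for multiple in (0.5, 1.0, 2.0, 5.0, 10.0):
        ratio = fuel_to_cargo_ratio(multiple, 1.0)
        print(f"delta-V = {multiple:4.1f} x exhaust velocity -> "
              f"fuel/cargo = {ratio:10.1f}")
```

Below about one exhaust velocity the ratio grows roughly linearly; past that, each further increment of speed multiplies the fuel needed, which is how fixing the trip time first leads to absurd fuel masses for low-exhaust-velocity drives.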

A better question is, "given a certain fuel/mass ratio and a certain delta-V, how long would it take to reach Alpha Centauri?". My answers are as follows:

[Velocities and travel times are for a flyby; use half the velocity and double the time if you want to stop at the destination.]

Good Antimatter Drive

Most of the energy of a matter/antimatter annihilation comes out in the photons (even if you're doing a proton/antiproton annihilation, the mesons get only a small fraction of the energy). Assume a 10% efficient conversion of mass to useful thrust, and thrust/energy efficiency of a photon drive (horrible - 1/C N/J).

This could be built as a big block of lead/concrete/rock with the ship and fuel tanks on one side, and antimatter explosions happening just over the other side.

Velocity at 50% fuel: 0.05 C (100 years)
Velocity at 10% fuel: 0.01 C (500 years)

Ideal fusion drive

This assumes 1% conversion of mass to kinetic energy within the plasma (still inefficient at this energy density, but much better than photon drive efficiency).

Building a drive like this would be very difficult. You'd have to use one of the fusion reactions that doesn't produce gamma rays or neutrons, you'd need a great magnetic bottle, and you'd have to have your exhaust leave the rocket before it could radiate its heat as light. Good luck.

Velocity at 50% fuel: 0.07 C (70 years)
Velocity at 10% fuel: 0.014 C (350 years)

Good fusion drive

This assumes 0.3% conversion of mass to useful energy, and photon drive efficiency.

This could be built as a big block of lead/concrete/rock with the ship and fuel tanks on one side, and fusion bombs being set off in space on the other side (counting on the plasma radiating most of its energy as light before dispersing).

Velocity at 50% fuel: 0.15% C (3,300 years)
Velocity at 10% fuel: 0.03% C (16,700 years)

Good Fission

As with Good Fusion, but with 0.03% mass to useful energy conversion. Ship design is similar to Good Fusion.

Velocity at 50% fuel: 0.015% C (33,000 years)
Velocity at 10% fuel: 0.003% C (167,000 years)

Good Chemical

This assumes 15 MJ of exhaust kinetic energy per kg of fuel. This is attainable with a good chemical rocket.

Velocity at 50% fuel: 0.0009% C (550,000 years)

A really good antimatter drive could bring humans to the next star within their lifetimes, if it was mostly fuel. An excellent fusion ship could do it within a few generations, though fusion ships would more likely take centuries (even if mostly fuel).
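
For anyone wanting to redo the travel times with their own velocities: the listed times are consistent with simply dividing a distance of about 5 light-years by the cruise velocity. That 5 LY figure is inferred from the numbers above rather than stated anywhere, so treat it as an assumption, and note that this ignores the time spent accelerating.

```python
ASSUMED_DISTANCE_LY = 5.0  # inferred from the listed times, not a stated figure


def flyby_years(velocity_fraction_of_c, distance_ly=ASSUMED_DISTANCE_LY):
    """Trip time for a flyby at constant cruise velocity (fraction of C)."""
    return distance_ly / velocity_fraction_of_c


def stopping_years(velocity_fraction_of_c, distance_ly=ASSUMED_DISTANCE_LY):
    """Trip time if half the delta-V is saved for braking at the destination."""
    return distance_ly / (velocity_fraction_of_c / 2.0)


if __name__ == "__main__":
    for label, v in [("antimatter, 50% fuel", 0.05),
                     ("ideal fusion, 50% fuel", 0.07),
                     ("good fusion, 50% fuel", 0.0015)]:
        print(f"{label}: flyby {flyby_years(v):,.0f} years, "
              f"stop {stopping_years(v):,.0f} years")
```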

Re:"Bollocks" ? on Emergence of SMT · 2001-03-15 01:03 · Score: 2

My arguments are coming from the "memory wall" perspective of system performance. CPU cycles are no longer the problem: the problem is getting enough data to the CPU core.

And this depends entirely on your workload. Many tasks are memory-bound - and many are not. Generally, anything that can fit in the on-die caches will be CPU bound (for most cases). This still covers a wide range of useful problems.

The gap between memory speed and CPU speed is caused by DRAM latency and system bus speed, neither of which is an issue for on-die caches. If clock speed increases and die sizes stay the same, propagation latency will become an issue, but SMT is great for alleviating _that_, too; as long as throughput scales with clock speed, you can tolerate higher latency by interleaving requests from different threads.
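
A very simple model of that latency-hiding argument, assuming each thread alternates between a fixed number of cycles of useful work and a fixed stall, and that switching between threads is free. Real cores are messier (cache contention, limited issue width), so this is a sketch of the idea only; the function name and parameters are made up.

```python
def core_utilization(threads, work_cycles, stall_cycles):
    """Fraction of cycles the core spends doing useful work.

    One thread is busy for work_cycles out of every
    (work_cycles + stall_cycles). Interleaving more threads fills the
    stall time with other threads' work, until the core is saturated.
    """
    per_thread = work_cycles / (work_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


if __name__ == "__main__":
    # Hypothetical numbers: 10 cycles of work per 30-cycle stall.
    for n in (1, 2, 4, 8):
        print(n, "thread(s):", core_utilization(n, 10, 30))
```

In this model, doubling the latency just doubles the number of threads needed to keep the core busy, provided the memory system can sustain the extra requests in flight.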

In summary, I think that memory bottleneck problems aren't as severe as you make them out to be. Yes, they're very relevant for programs that work with large data sets, but that by no means covers all tasks we want computers to perform.

Check with the local government. on How Would You Start A Business? · 2001-03-15 00:43 · Score: 3

Check the government section of the phone book for small business support services. There will usually be a department or institution that will give you an information package on the legal and administrative hoops involved.

Re:"Bollocks" ? on Emergence of SMT · 2001-03-14 13:26 · Score: 5

It's not true that doubling L1$ and adding a selection bit costs you nothing. In fact, the size of L1$ is rather limited, and cutting size in half substantially increases the miss rate. It is also fairly expensive to add selection bits.

Um, no.

Most of your die is taken up by the _L2_ cache. You have plenty of space to add more L1 cache. The reason you usually don't is that a larger L1 cache served by the same set of address lines has longer latency. Two independent duplicates of an L1 cache will behave identically to the original L1 cache.

Performing the selection adds latency, but this can be masked because you know the value of the selection bit long before you know the value of the address to fetch.

In fact, you'd almost certainly _reduce_ the cache load compared to a single-threaded processor capable of issuing the same number of loads per clock, because they'd be hitting different caches, and you wouldn't have to multiport.

SMT also doesn't save you from cache miss latency. Out-of-order instruction issue saves you from that.

SMT, in any sane design, is used on an OOO core. An OOO core won't save you if your next set of instructions has a true dependence on the value being fetched from memory. SMT gives you a second thread with no data dependence on the stalled load, and hence plenty of instructions in the window that you can execute while waiting.

I'm having trouble seeing where your arguments are coming from. As far as most of the core's concerned, there's still only one (interleaved) instruction stream, just with less data dependence in it. This is scheduled and dispatched as usual.

SMT != ILP != multiple pipes. on Emergence of SMT · 2001-03-14 13:16 · Score: 5

I thought that the major processor companies had been working with multiple execution pipelines for years now. Doesn't that fall under the category of ILP?

You might want to double-check the terms you're using:

"ILP" is "instruction-level parallelism". It's not a physical part of the chip - it's a quality of the instruction stream. ILP is the number of instructions (usually average) that could theoretically be executed at one time, without violating data relationships within the program. Modern processors _can_ execute multiple instructions per clock because the ILP of most programs is greater than one (i.e. there are usually multiple instructions that can be executed without violating data or control dependencies).
"Multiple pipes" is part of the hardware that allows processors to issue multiple instructions per clock. As the name implies, this represents multiple hardware units that are capable of performing operations independently of each other.
"SMT" is "Simultaneous Multithreading". Remember how back under ILP, I said that the number of instructions that can be issued per clock depends on the parallelism of the program being run? SMT boosts the parallelism by running two threads at the same time and interleaving their instructions (more or less). As the instructions from different threads usually don't care what the other threads are doing, this gives you many more instructions that can be executed at the same time (assuming you have enough hardware to execute them).

Multiple pipes are a relatively old idea. Ditto instruction-level parallelism, which is one of the analytical quantities used to judge how well multiple pipes will work in a given situation. SMT is a relatively new idea that lets you easily boost the instruction-level parallelism, which in turn makes scheduling and issuing instructions *much* easier.
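
A small sketch of how ILP can be measured on a toy instruction stream, and how merging two independent threads raises it. It assumes every instruction takes one cycle, unlimited execution units, and an acyclic dependency graph; the instruction names and helper functions are invented for the example.

```python
def critical_path_length(deps):
    """Length of the longest dependency chain.

    `deps` maps each instruction name to the list of instructions
    whose results it needs.
    """
    depth = {}

    def visit(instr):
        if instr not in depth:
            depth[instr] = 1 + max((visit(p) for p in deps[instr]), default=0)
        return depth[instr]

    return max((visit(i) for i in deps), default=0)


def average_ilp(deps):
    """Average instructions per cycle if only data dependencies limit issue."""
    path = critical_path_length(deps)
    return len(deps) / path if path else 0.0


def chain(prefix, length):
    """A fully serial thread: each instruction needs the previous one."""
    return {f"{prefix}{i}": ([f"{prefix}{i - 1}"] if i else [])
            for i in range(length)}


if __name__ == "__main__":
    thread_a = chain("a", 4)
    thread_b = chain("b", 4)
    print(average_ilp(thread_a))                   # 1.0
    print(average_ilp({**thread_a, **thread_b}))   # 2.0
```

A completely serial thread has an ILP of 1 no matter how many pipes the chip has; interleaving it with a second, unrelated thread gives the scheduler two independent instructions every cycle.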

"Bollocks" ? on Emergence of SMT · 2001-03-14 13:08 · Score: 2

IMHO, SMT is a load. Modern microprocessors are mostly cache-starved. SMT puts two processors on the wrong side of the L1$, aggravating the cache bandwidth problem. Worse, the two processors in SMT degrade referential locality, further degrading the performance of the cache.

You overlook a couple of very important factors.

First of all, it would cost you almost no extra silicon or latency to have duplicate L1 caches, and to add a selection bit to the addresses sent out on memory operations.

Secondly, technologies like SMT help _save_ you when you have a cache miss, because you still have an instruction stream that can execute while one thread's waiting for data.

Fuel cells and ultracapacitors. on Electric Car Bests Ferrari F550 In 0-60mph · 2001-03-12 12:49 · Score: 2

A fuel cell is a great solution, and the progress is going quickly. Of course, you still have to deal with carrying fuel, which either means storing hydrogen (another whole problem) or carrying a converter to strip hydrogen off of methanol/gasoline/whatever.

Actually, if I understand correctly, several varieties of fuel cell can reprocess simple hydrocarbons like methanol internally (just need the right catalytic electrode material and the right operating environment).

I'd also argue that even carrying a converter would be less hassle than trying to use hydrogen :). Hydrogen has miserable energy density per unit _volume_ compared to gasoline or methanol at practical storage densities, and is a bugger to work with (you need a containment vessel that can take hundreds of atmospheres, and hydrogen gas will do fun things like diffusing through the walls of your pipes and fuel cell if they're made of the wrong materials).

Methanol is a well-behaved liquid (a bit corrosive over the long term, but less so than water).

If you want a purely-electric solution, keep your eye on ultracapacitors. They're still pretty expensive, but they're already starting to beat the energy density of batteries.

Unfortunately, this isn't saying much. The energy density of batteries is orders of magnitude lower than the energy density of most fuel-burning schemes.

Working principles. on Magnetic Propulsion Pellet Gun Achieves 20km/s · 2001-03-09 01:03 · Score: 3

(I've been forwarded a copy of the research paper, so here's how this thing really works.)

Ok. The Journal of Applied Physics article (Feb. 2001) does not describe this as a "gun" at all.

The device uses the Z-machine current source to send a large amount of current through two concentric pipes that are connected at one end. Current goes up the outer pipe and down the inner one.

This sets up a very strong magnetic field between the pipes, which pushes the two pipes apart (crushing the inner one and pushing outwards on the outer one). This is the same kind of effect that you get in a loop of wire that carries current (motor principle).
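
For a feel for the magnitudes, here is a sketch using two textbook results: the field in the gap between coaxial conductors at distance r from the axis is mu_0 I / (2 pi r), and the pressure that field exerts on the conductor surface is B^2 / (2 mu_0). The 20 MA current and 1 cm radius are placeholder values picked for illustration, not numbers taken from the article.

```python
import math

MU_0 = 4.0e-7 * math.pi  # vacuum permeability, T*m/A


def field_between_pipes_T(current_A, radius_m):
    """Magnetic field in the gap, at radius_m from the axis (Ampere's law)."""
    return MU_0 * current_A / (2.0 * math.pi * radius_m)


def magnetic_pressure_Pa(field_T):
    """Pressure exerted by the field on the conductor wall."""
    return field_T ** 2 / (2.0 * MU_0)


if __name__ == "__main__":
    current = 20.0e6   # assumed: 20 MA pulse
    radius = 0.01      # assumed: 1 cm from the axis
    b = field_between_pipes_T(current, radius)
    p = magnetic_pressure_Pa(b)
    print(f"field: {b:.0f} T, pressure: {p / 1e9:.0f} GPa")
```

Because the pressure goes as the square of the current and the inverse square of the radius, a fast multi-megaamp pulse through small pipes reaches pressures far beyond what static presses can manage.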

Samples on plates are stuck to the sides of the outer pipe. Magnetic forces accelerate these plates outwards rapidly, and the samples deform. This deformation is measured, giving a lot of useful materials information (the purpose of the experiment).

It's unclear from the article whether the plates, the samples, or neither go flying. The velocity quote is probably just the maximum velocity achieved while the pipe is expanding outwards under pressure. Letting the plates fly would give a somewhat better experiment, but would cause practical problems (they'd destroy whatever part of the machine they finally smacked into).

Physics-wise, this works on exactly the same principles as a railgun (motor principle with DC current). It's optimized for pressure experiments, not for firing projectiles; you could probably build a railgun that was more efficient at the second task.

Inductive effects do occur (this is a short current pulse), but are considered a source of error in the experiment, and so presumably aren't the dominant effect.

Re:Working principles? on Magnetic Propulsion Pellet Gun Achieves 20km/s · 2001-03-08 23:50 · Score: 2

I can email you the PDF if you like. You can find my email address in my user info.

Sent a day ago, and still waiting. Was it eaten by the 'net daemons?

If necessary, you can find my email address in the Linux Media Labs web-board (http://www.linuxmedialabs.com and hunt around).

Re:Off the moon? To where? on Magnetic Propulsion Pellet Gun Achieves 20km/s · 2001-03-07 06:01 · Score: 1

...this is one of the best ways known to get material off the moon. Required energy (and hence barrel length) is much lower, and there's no atmosphere to cause problems.

I'm assuming you mean the Z-pinch machine, not magnetic launchers in general.

I mean magnetic launchers in general (actually, gun-like launchers in general; magnetic just happens to be one of the more convenient forms).

User: Christopher+Thomas

Comments · 2,147