-- but they failed to point out that Solaris also has a /tmp, and that, by default /tmp is actually partially backed by RAM, which is extremely convenient and useful from time to time, when you want a little piece of lightning-fast filesystem space, or want to eliminate disk as a variable in some sort of timing test.
In any new Linux distribution, /dev/shm is also backed by RAM, so you can do:
$ dd if=/dev/zero of=/var/tmp/foo bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.12253 s, 478 MB/s
$ dd if=/dev/zero of=/dev/shm/foo bs=1024k count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.754747 s, 711 MB/s
Obviously, I had to copy four times the data to reach the slowness of Solaris :-)
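If you want to repeat this kind of comparison from a script, here is a rough sketch in Python. The function name write_throughput and its default sizes are my own choices for illustration, not part of dd. Like dd without any sync option, it probably measures the page cache as much as the disk, so treat the numbers as a ballpark only.

```python
import os
import tempfile
import time


def write_throughput(directory, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024):
    """Write zero-filled blocks to a scratch file in `directory`; return MB/s."""
    block = bytes(block_size)  # a block of zeros, like reading /dev/zero
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as scratch:
            while written < total_bytes:
                scratch.write(block)
                written += block_size
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # never leave the scratch file behind
    # Guard against a zero reading from the timer on very small writes.
    return written / max(elapsed, 1e-9) / 1e6


if __name__ == "__main__":
    for directory in ("/var/tmp", "/dev/shm"):
        if os.path.isdir(directory):
            print(directory, round(write_throughput(directory)), "MB/s")
```

Run it a few times; the first pass over a cold filesystem is usually not representative.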
MPEG-4 uses discrete-cosine-transform and motion-estimation technologies. By contrast, Nancy uses only the four fundamental processes of arithmetic (addition, subtraction, multiplication and division), along with comparison and bit-shift operation. This keeps its operation light, said Koichi Kato, chief technology officer at Office Noa.
This is nonsense... The DCT also needs only additions and multiplications (no divisions, so it should be faster...)
Also:
The codec will run "even if CPU power is not high," said Kato. "A 50-Mips CPU can compress and decompress video at 30 frames per second with QCIF [176 x 144-pixel] resolution [using Nancy]. There is no other video codec in a software form that can encode and decode." The program for real-time video compression and decompression takes 30 to 40 kbytes of memory, "and consumes about one-tenth of the power compared with MPEG-4 operation," he added
He should take a look at ffmpeg's libavcodec. In 240 kbytes you have coder and decoder for: Video MPEG1/2/4, MSMPEG4, MJPEG, H263, RealVideo, AC3, Audio MPEG-Layer3... And with assembler routines for x86 and ARM CPUs. Getting 30 fps of QCIF at 50 MIPS isn't that difficult...
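To make the "only additions and multiplications" point concrete, here is a minimal sketch of a naive, unnormalised 8-point DCT-II in Python. It is not libavcodec's code, and the names dct8 and COS_TABLE are mine. The cosine table is computed once ahead of time, so the transform loop itself only multiplies and adds. Real codecs use fast factorisations that need far fewer operations than this direct form.

```python
import math

N = 8

# Cosine table built once, up front; the transform below never calls cos().
COS_TABLE = [
    [math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
    for k in range(N)
]


def dct8(samples):
    """Naive unnormalised 8-point DCT-II: one multiply and one add per term."""
    coefficients = []
    for k in range(N):
        acc = 0.0
        for n in range(N):
            acc += samples[n] * COS_TABLE[k][n]
        coefficients.append(acc)
    return coefficients
```

A flat input such as dct8([1.0] * 8) puts all the energy in the first (DC) coefficient, which is what makes the transform useful for compression.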
Is there anything better than Fractint now? I too played with it for ages on a clunky old IBM PC with clicky keyboard and Windows 2 (although Fractint ran in DOS though, I think, and necessitated misc tweaking with graphics drivers to make it work, you kids don't know how lucky you are...)
There is the open-source "XaoS" (http://xaos.sf.net/) for fast interactive fractal exploration, and "Fraqtive" (http://fraqtive.mimec.org/) for generating beautiful views. There are also new versions of Fractint, but the UI is really outdated. Wikipedia has a list with a few more at http://en.wikipedia.org/wiki/Fractal-generating_software
Seriously, couldn't the marketing droids come up with a better name?
Sadly, this technology was called "Intel Dynamic Acceleration" (IDA) in Core 2 CPUs, but nobody noticed it. So Intel tried "Dual Dynamic Acceleration" (DDA), but again, nobody noticed. At last, they renamed it to "Turbo Boost" and now everybody thinks it's something new.
So, after three attempts, it seems that the current name is the best.
Yes, a lot cheaper.
Don't fool yourself, those MIT robots were suspended by a cable, so it isn't autonomous running.
Autonomous running is _very_ difficult.
Well, here we say "Estadounidenses", which is like "Unitedstatesians" in Spanish. And many people simply say "gringos". We use "Americanos" to mean people from all the Americas: North, Central and South America.
And it seems that you also miss that America isn't a country! (We in South America are very far from the USA, and even in North America there is more than one country.)
At least here in Chile, GSM has much more market share than CDMA. Also, the biggest operator, which uses TDMA, is switching to GSM.
And the GSM network is much more reliable!
I think that in Brazil and Argentina there are also operators switching to GSM.
Go developers! Boycott M$... Just make your programs allocate large amounts of memory!
On second thought, perhaps that's what M$ programs are already doing...
Some of those can be done in GDB:
1) The ability to set conditional breakpoints. See 'help condition'
2) The ability to see not only a variable's current value, but a stack of all of its previous values.
This is very difficult with compiled code, but it can be emulated in GDB using "watch" breakpoints that run a user-defined function.
3) The ability to select a variable's previous value and jump to the line of code that set it to that.
See (2)
4) The ability to change the value of a variable at any point.
Trivial in GDB.
5) The ability to add/change code on the fly.
Also difficult with compiled code, but you can assemble in GDB if you want.
6) The ability to jump into the debugger at any point in the program, even when I hadn't planned to before running it.
Well, just press "Control+Z" and then attach GDB to the PID.
7) Auto-logging of method calls and (optionally) variable values, to be started and stopped as I see fit either while stepping through code or running it.
Also possible, but difficult.
It seems that you didn't read the article, because you are plain WRONG!
He is comparing two CROSS-COMPILERS to MIPS code, so all that you say doesn't apply. Cross-compiling is the same task in each case.
Please, go and read the article.
No, gcc isn't a good x86 compiler. GCC is a multi-target compiler, designed for RISC processors, like the PowerPC. On x86 hardware, gcc produces about 10-15% slower code than Intel's own compiler in integer code, and about 30% slower code than Intel's. If he had used gcc-3.0 targeted to pentiumpro (which he didn't, he used the 386 target) the x86 would be better still.
The only thing that is fast in PowerPC is the "altivec" instructions (faster than MMX and SSE), but nobody really uses them (except in hand-optimized code).
Nowadays GCC has very good general optimizations and a lot of x86-specific ones (386, 486, pentium, pentiumpro, amdk6). And in GCC 2.96 and upwards the x86 backend was rewritten for better optimizations, support for AMD Athlon, etc. ....
To enable these, you need something like:
gcc -O2 -mcpu=pentiumpro -march=pentiumpro
(this example is for the Pentium Pro family, i.e., PPro, PII and PIII).
I have compared the generated code with that of Borland C and MSVC and it's generally better (at least with C input).
But still, it's not perfect. Look at http://gcc.gnu.org/proj-optimize.html , which describes some problems as of February 2000 (some of these are better now...).
One thing to consider is that Tom is not only benchmarking an overclocked chipset, he is also benchmarking an overclocked graphics card, hard drive, etc. Remember that the old Intel BX chipset has synchronous clocks!!!
Many of the tests are dependent on the video card, so they are really pushed by the 33% overclocked Intel BX. And you need a really new video card to support this kind of overclocking.
So I think that the Intel BX has to be taken only as a reference...
I have written programs to test compression using the FWT, and I have some disagreements with other comments:
I think it should be noted that computing the Fast Wavelet Transform (FWT) is really a lot slower than computing the Discrete Cosine Transform (DCT) on current hardware.
The reason for this is that current compression standards (such as JPEG, MJPEG (not really a standard), H261/H263, MPEG1/2) compute the DCT in blocks of 8x8 pixels, needing very few multiplications and additions (on the order of 150 per block, 2.5 per pixel), and also exploiting the locality of data (64 pixels fit in the L1 cache of almost all current CPUs).
On the other hand, to be really effective in image compression the FWT has to use larger block sizes (I think that 32x32 should be the minimum), which gives less locality of data, and it needs about 4 multiplications per pixel using a 5-coefficient filter. (The number of coefficients in the filter affects the "blockiness" of the transform: with 2 coefficients you get a very blocky image, and in my experience you need at least 6 to compress "smooth" images effectively.)
The only advantage in the computation of the FWT is that the code is very regular, so it is easy to write it using MMX, 3DNow or SSE. But still, it isn't as fast as an 8x8 DCT.
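For anyone who wants to see what the 2-coefficient case looks like, here is a small sketch of one level of the Haar transform (the simplest 2-tap wavelet filter) in Python. This is not the code from my tests, just an illustration; the names haar_forward and haar_inverse are made up for this post, and I use the plain averaging form without the usual sqrt(2) scaling.

```python
def haar_forward(signal):
    """One level of the 2-coefficient (Haar) transform on an even-length list."""
    averages = [(signal[i] + signal[i + 1]) * 0.5 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) * 0.5 for i in range(0, len(signal), 2)]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the signal from one level of averages and details."""
    signal = []
    for average, detail in zip(averages, details):
        signal.extend((average + detail, average - detail))
    return signal
```

If you throw away the details before inverting, for example haar_inverse(averages, [0.0] * len(averages)), every pair of samples collapses to the same value. That step pattern is the blockiness you get from such a short filter.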
For still image compression this isn't a problem, but if you want realtime compression and decompression, you are in trouble.
Also, in video compression you have to exploit the motion redundancy, so it is fundamental that a "motion compensation" stage be applied before transform coding. This stage is usually the slowest in video compression.
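To give an idea of why that stage is so expensive, here is a sketch of exhaustive block matching with a sum of absolute differences (SAD) in Python. It is one common way to estimate motion, not the method of any particular codec, and the function names, the 4x4 block size and the search range of 2 pixels are all assumptions of mine. Frames are plain lists of rows of pixel values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(prev_frame, cur_frame, top, left, size=4, search=2):
    """Try every offset within +/- search pixels; return (cost, dy, dx) of the best match."""
    target = get_block(cur_frame, top, left, size)
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would fall outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(prev_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Each block costs up to (2 * search + 1)^2 candidates times size^2 pixel comparisons, and that is before any transform has been computed.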
On another topic, I did test the FWT in audio compression, but it was before I learned about psychoacoustic models... so it sounded very bad at the same rate as MPEG audio layer 2.
The MPEG audio compressors use sub-band filtering for band separation, and an FFT (fast Fourier transform) only to determine, based on the psychoacoustic model, the best quantization factors for each sub-band. In MPEG layer 3 a DCT is used to separate bands (giving better resolution to the quantization).
In theory the FWT can be used instead of the sub-band filters, as they are very similar, but I think that there won't be any advantage.
Perhaps you should try Mesa 3.1 beta; it has many accelerations for 3DNow!