Domain: fftw.org
Stories and comments across the archive that link to fftw.org.
Comments · 41
-
Re:shipping java scientific software for 15 years
Not just can be, it usually is faster. At least, once it's been JITed. We just ran some XML serialization/deserialization tests, and the java implementation was much faster than the C++ one...eventually.
What about pure numerical stuff? I'm wondering how say, a Java implementation of FFT would fair against the FFTW library (coded in C).
Of course you could make a native call to the FFTW library (MATLAB does this) through Java, but thats missing the point.
-
Re:Environment, conditions and parameters
> I would be quite surprised if any C++ can execute a FFT as fast as my Leahy FORTRAN95.
Have you looked at the Fastest Fourier Transform in the West (FFTW)? It's written in C - but the funny thing is, it's written in C by an OCaml program.
I think this is the way forward for truly performance intensive code. Not doing it in C++, but writing dedicated compilers for specific subroutines, churning out C, assembly, or compiler specific intermediate language code. Functional languages should excel at this, they have been ruling the program transformation/analysis space for a long time.
-
Re:Cool processor - No, they can't
Actually that's not exactly correct, each iteration is also parallelizable. Most of the work in an iteration is a FFT, which is parallelizable.
http://www.fftw.org/parallel/parallel-fftw.html
It's less efficient to do this than using each core for one independent number, so it's only used if quick checking of a number is desired (for example, when double-checking a previously found prime number).
-
Re:"functional programming languages can beat C"
C really isn't that great.
See here for some info on C perf from a googler, and FFTW for the fastest code around for doing FFTs.
In both cases, Objective CAML (a high-level and functional language) smokes C. Both are in areas (in particular FFTs) where people have been optimizing implementations for decades. -
Re:To Elaborate on the SubmissionI understand your frustration at there not being a "standard" package to solve EM (or scalar wave) problems -- I have ranted about this quietly on my own for a while. One would think that with the equations of Maxwell nearly 150 years old there should be some pretty standard solver techniques out there that would have been packaged up by now covering practically everything. The problem is - while it's easy to write down the equations and (naieve) methods of solving them the nitty-gritty of it all is both important and far more tricky than meets the eye! Each problem domain has its own issues and idiosyncrasy's. Likewise if you are interested in some quantaties more than others (e.g. far field / near field) that can drastically change your approach. Ultimately to have any chance of success you must approximate and the art of the approximation you choose is what matters. As the saying goes "If you want to go there, I wouldn't start from here".
If you are trying to carry out some sort of electrically large scattering problem through inhomogeneous anisotropic materials - you are in for a tough ride. Unless you can approximate things away furiously you will soon find the problem computationally intractable.
It sounds to me as though you really need to get a feel for the basics before embarking on anything too heavy. Time spent in reconnaissance is rarely wasted. Once you have an intuitive idea of how things work you will probably better understand the problem - hence be able to pick an appropriate solver.
A good general starting point in my opinion (particularly in the scalar case) is the use of pseudospectral methods. These will allow you to describe the field propagating through materials in a reasonably tractable manner - they are not too much effort to understand, reasonably quick thanks to the magic of FFTW and surprisingly robust.
I suspect your problem breaks down into three distinct domains:
- Getting the excitation field to the interaction region
- Modelling the (potentially complicated) interaction of the field with the surface
- Getting the field back from the interaction region to the detector.
Since the excitation is presumably beam-like, a pseudospectral technique (particularly one with coordinate scaling) will probably help with 1) and 3). With finite difference techniques you must model the field step-by-step through space. With FFT methods you can jump from one plane to the other - this can be orders of magnitude faster than finite difference.
How you manage 2 is the tricky part! The detail of this will depend strongly on what the material interaction is (e.g. will a scalar approximation suffice). I highly recommend you read Weng Cho Chew, Waves and Fields in Inhomogeneous Media for some pointers. Other things to look into:
- Green's function techniques (see, e.g. Martin et. al. for an accessible start point).
- Transfer matrix methods (see, e.g. Barns and Pendry)
- Discrete dipole scattering (see, e.g. Bruce Draine's DDSCAT)
- Multiple multipole methods (see, e.g. C. Hafner
- Finite Difference Time Domain (e.g. see the excellent MEEP from MIT) (see my warning below)
- Basis expansions and stratified media (similar to transfer matrix) see. Chew for details)
A
-
I see the Intel fanboys have shit on your post
Go here:
http://www.fftw.org/
Download and compile it (and get your free compiler here, because Sun Studio 12 is 5% or more faster than GCC even on Linux...). Do so on both AMD and Intel chips.
Now, benchmark both architectures.
FWIW, my 2.4GHz Opteron runs FFTW about 30 to 40% faster than my 3.2GHz Xeons. Even when the Opteron is running the 32-bit binaries compiled for the Xeon.
PS - when you compile FFTW, make damn sure you disable ALL inlining... -
Your error analysis is totally wrongThe Cooley-Tukey FFT algorithm and its variants, which is what they are using, has much better error characteristics than you think.
In floating-point arithmetic, the algorithm was proved in 1966 to have an upper bound for the error that grows only as O(log N), and the mean (rms) error grows only as O(log N). (See this page for more info.) (Errors in fixed-point arithmetic are worse, going as N.)
Even in single precision, the errors for their FFT sizes are probably quite reasonable, assuming they haven't done something silly like use an unstable trigonometric recurrence.
-
Not even numerical analysis.Except for certain computer-based classes like Numerical Analysis, undergrad-level math hasn't changed in the past 100 years
Even that has not changed much in 30 years. Fortran 77 is still used and the techniques are the same as they were when Newton and then Coates thought them up. The thing that has changed is the maturation of GNU tools and the availability of great numerical packages like the Fastest Fourier Transforms in the West. A text on the subject should contain a chapter of practical free computing, but this has little to do with the principles involved.
-
Re:More emphasis on functional languages.
Firstly, Haskell isn't strict, it's lazy. Secondly, this isn't funny.
Code written in languages with strong algebraic properties like referential transparency is ideal for doing automatic high-level transformations of code in order to increase parallelism.
As architectures get more complicated with multiple processors and multiple pipelines, we will more and more want to rely on automatic tools to search for good ways to structure code, from breaking up major processes right down to instruction level scheduling. Using high-level information about the task at hand and the components which make it up, which isn't present in lower level languages will be important in this task.
As a precursor to this, have a look at FFTW, the Fastest Fourier Transform in the West. On the surface, it appears to be written in C, but that C code is not entirely written by hand, it is machine generated by an O'Caml program with some very specific high-level knowledge of the problem (applying some mathematics to do a directed search for an implementation of an FFT of any size which is fast on the given platform). Basically, it's using high-level properties of the problem in order to obtain very fast code implementing a solution. The more information available to a compiler, the better.
Haskell itself already provides higher level information about the overall structure of a computation, leaving more of the details to the compiler than say, your average C++ or Java program. The implementations aren't totally killer yet, but there's a *lot* of untapped potential there.
(Even now, GHC is placing first and second on the computer language shootout with default settings.)
Haskell itself isn't quite ideal for high-level machine transformation of code, but I'd contend that it's certainly a practical starting point, and it's certainly my favourite programming language to actually get things done in.
- Cale -
Performance is irrelevant here
I doubt that Apple's move to Intel had a great deal to do with performance, and I dislike this fact being used as a key selling point for the iMac. If you refer to the "definitive" G5 vs. everyone else benchmarks at http://www.anandtech.com/mac/showdoc.aspx?i=2436 it is apparent that the G5 is largely comparable to offerings from AMD and Intel (admitedly the new Intel Core Duo is not benchmarked) and although the G5 is, in many cases, not the fastest chip, it is similar. The increases of 2-3x in performance between the G5 and MacIntel iMac are a consequence of having a dual core chip (and being a generation ahead of the G5) besides, Apple could have feasibly used the dual-core G5 chips that they've had at their disposal for a while now. Any Mac zealot will argue that their PowerPC Mac is "just" as fast as an intel based system, but performance is NOT the issue. This is why the iMac was updated first, it is a consumer product, supporting Apple's fledgling attempts to enter the living room (consider front row ) - it desperately needs Intel's brand name associated with its hardware.
The significance of this new product is long term and cannot be underestimated.
Apple finanlly has penetrated the consumer electronics market with the iPod, and their brand recognition and image could not be better. Apple has shoehorned its way into the psyche of the common man. It now has to bring its key product, the mac, to the masses. Consumers will be attracted from a design perspective and because it shares the same logo as their iPod, the OS is a little different to windows, but now at least you have the reasurrance of dual booting into windows (I'd like proof of this concept, but I'm sure it will come) and the processor gives the security of a well recognised brand name (consider brand strengh of Intel vs. AMD).
In the future, I doubt that IBM's die shrunk Power chips will share the low power consumption that I expect Intel will bring, and many concepts for great products will never be realised. I'll be interested to see if the new Intel chips can match up to the PowerPC altivec-ised vDSP FFT's , but in a way I don't care. It is an exciting time to be a Mac user, as more people join the fantastic experience that we have had for so long, and new software and hardware comes our way. Either way, they're finally here and it will be interesting to see what the future holds. -
Performance is irrelevant here
I doubt that Apple's move to Intel had a great deal to do with performance, and I dislike this fact being used as a key selling point for the iMac. If you refer to the "definitive" G5 vs. everyone else benchmarks at http://www.anandtech.com/mac/showdoc.aspx?i=2436 it is apparent that the G5 is largely comparable to offerings from AMD and Intel (admitedly the new Intel Core Duo is not benchmarked) and although the G5 is, in many cases, not the fastest chip, it is similar. The increases of 2-3x in performance between the G5 and MacIntel iMac are a consequence of having a dual core chip (and being a generation ahead of the G5) besides, Apple could have feasibly used the dual-core G5 chips that they've had at their disposal for a while now. Any Mac zealot will argue that their PowerPC Mac is "just" as fast as an intel based system, but performance is NOT the issue. This is why the iMac was updated first, it is a consumer product, supporting Apple's fledgling attempts to enter the living room (consider front row ) - it desperately needs Intel's brand name associated with its hardware.
The significance of this new product is long term and cannot be underestimated.
Apple finanlly has penetrated the consumer electronics market with the iPod, and their brand recognition and image could not be better. Apple has shoehorned its way into the psyche of the common man. It now has to bring its key product, the mac, to the masses. Consumers will be attracted from a design perspective and because it shares the same logo as their iPod, the OS is a little different to windows, but now at least you have the reasurrance of dual booting into windows (I'd like proof of this concept, but I'm sure it will come) and the processor gives the security of a well recognised brand name (consider brand strengh of Intel vs. AMD). In the future, I doubt that IBM's die shrunk Power chips will share the low power consumption that I expect Intel will bring, and many concepts for great products will never be realised. I'll be interested to see if the new Intel chips can match up to the PowerPC altivec-ised vDSP FFT's , but in a way I don't care. It is an exciting time to be a Mac user, as more people join the fantastic experience that we have had for so long, and new software and hardware comes our way. Either way, they're finally here and it will be interesting to see what the future holds. -
cost and performance are irrelevant here
I doubt that Apple's move to Intel had a great deal to do with performance, and I dislike this fact being used as a key selling point for the iMac. If you refer to the "definitive" G5 vs. everyone else benchmarks at http://www.anandtech.com/mac/showdoc.aspx?i=2436i
t is apparent that the G5 is largely comparable to offerings from AMD and Intel (admitedly the new Intel Core Duo is not benchmarked) and although the G5 is, in many cases, not the fastest chip, it is similar. The increases of 2-3x in performance between the G5 and MacIntel iMac are a consequence of having a dual core chip (and being a generation ahead of the G5) besides, Apple could have feasibly used the dual-core G5 chips that they've had at their disposal for a while now. Any Mac zealot will argue that their PowerPC Mac is "just" as fast as an intel based system, but performance is NOT the issue. This is why the iMac was updated first, it is a consumer product, supporting Apple's fledgling attempts to enter the living room (consider front row ) - it desperately needs Intel's brand name associated with its hardware.
The significance of this new product is long term and cannot be underestimated.
Apple finanlly has penetrated the consumer electronics market with the iPod, and their brand recognition and image could not be better. Apple has shoehorned its way into the psyche of the common man. It now has to bring its key product, the mac, to the masses. Consumers will be attracted from a design perspective and because it shares the same logo as their iPod, the OS is a little different to windows, but now at least you have the reasurrance of dual booting into windows (I'd like proof of this concept, but I'm sure it will come) and the processor gives the security of a well recognised brand name (consider brand strengh of Intel vs. AMD).
In the future, I doubt that IBM's die shrunk Power chips will share the low power consumption that I expect Intel will bring, and many concepts for great products will never be realised. I'll be interested to see if the new Intel chips can match up to the PowerPC altivec-ised vDSP FFT's , but in a way I don't care. It is an exciting time to be a Mac user, as more people join the fantastic experience that we have had for so long, and new software and hardware comes our way. Either way, they're finally here and it will be interesting to see what the future holds. -
iMacs performance is irrelevant
I doubt that Apple's move to Intel had a great deal to do with performance, and I dislike this fact being used as a key selling point. If you refer to the "definitive" G5 vs. everyone else benchmarks at http://www.anandtech.com/mac/showdoc.aspx?i=2436 it is apparent that the G5 is largely comparable to offerings from AMD and Intel (admitedly the new Intel Core Duo is not benchmarked) and although the G5 is, in many cases, not the fastest chip, it is similar. The increases of 2-3x in performance between the G5 and MacIntel iMac are a consequence of having a dual core chip (and being a generation ahead of the G5) besides, Apple could have feasibly used the dual-core G5 chips that they've had at their disposal for a while now. Any Mac zealot will argue that their PowerPC Mac is "just" as fast as an intel based system, but performance is NOT the issue. This is why the iMac was updated first, it is a consumer product, supporting Apple's fledgling attempts to enter the living room (consider front row) - it desperately needs Intel's brand name associated with its hardware.
The significance of this new product is long term and cannot be underestimated.
Apple finanlly has penetrated the consumer electronics market with the iPod, and their brand recognition and image could not be better. Apple has shoehorned its way into the psyche of the common man. It now has to bring its key product, the mac, to the masses. Consumers will be attracted from a design perspective and because it shares the same logo as their iPod, the OS is a little different to windows, but now at least you have the reasurrance of dual booting into windows (I'd like proof of this concept, but I'm sure it will come) and the processor gives the security of a well recognised brand name (consider brand strengh of Intel vs. AMD).
In the future, I doubt that IBM's die shrunk Power chips will share the low power consumption that I expect Intel will bring, and many concepts for great products will never be realised. I'll be interested to see if the new Intel chips can match up to the PowerPC altivec-ised vDSP FFT's, but in a way I don't care. It is an exciting time to be a Mac user, as more people join the fantastic experience that we have had for so long, and new software and hardware comes our way. Either way, they're finally here and it will be interesting to see what the future holds. -
Scientific Programming is HARD HARD HARD.
MacOS is not going to magically turn into Windows or Linux just because there's Intel Inside. Mac development will be unchanged, with some marginal exceptions.I dunno, maybe this falls into the "marginal" category, but "scientific" [or "mathematical"] programming is really REALLY REALLY difficult.
For instance, take a gander at the list of FFTs catalogued at BenchFFT:
http://www.fftw.org/benchfft/ffts.html
Then look at their relative performances for speed and accuracy:http://www.fftw.org/speed/
In many instances, the software and hardware engineers at the companies that build the chips [Intel and AMD] can't write FFTs that are as efficient as third-party vendors [or hobbyists], which should be a pretty good indication that something as ostensibly straightforward as writing an FFT routine is just fantastically complicated in practice.So if you're a company with a lot of low-level proprietary software that's tuned for the x86/x86-64 instructions sets, or for classical PCI bus communications [apparently PCIe is very backwards compatible], or for the nVidia/ATI/Oxygen instruction sets, then your porting job just got a heckuva lot easier if you don't have to deal with PPC RISC, Altivec, etc etc etc.
-
Scientific Programming is HARD HARD HARD.
MacOS is not going to magically turn into Windows or Linux just because there's Intel Inside. Mac development will be unchanged, with some marginal exceptions.I dunno, maybe this falls into the "marginal" category, but "scientific" [or "mathematical"] programming is really REALLY REALLY difficult.
For instance, take a gander at the list of FFTs catalogued at BenchFFT:
http://www.fftw.org/benchfft/ffts.html
Then look at their relative performances for speed and accuracy:http://www.fftw.org/speed/
In many instances, the software and hardware engineers at the companies that build the chips [Intel and AMD] can't write FFTs that are as efficient as third-party vendors [or hobbyists], which should be a pretty good indication that something as ostensibly straightforward as writing an FFT routine is just fantastically complicated in practice.So if you're a company with a lot of low-level proprietary software that's tuned for the x86/x86-64 instructions sets, or for classical PCI bus communications [apparently PCIe is very backwards compatible], or for the nVidia/ATI/Oxygen instruction sets, then your porting job just got a heckuva lot easier if you don't have to deal with PPC RISC, Altivec, etc etc etc.
-
Scientific Programming is HARD HARD HARD.
MacOS is not going to magically turn into Windows or Linux just because there's Intel Inside. Mac development will be unchanged, with some marginal exceptions.I dunno, maybe this falls into the "marginal" category, but "scientific" [or "mathematical"] programming is really REALLY REALLY difficult.
For instance, take a gander at the list of FFTs catalogued at BenchFFT:
http://www.fftw.org/benchfft/ffts.html
Then look at their relative performances for speed and accuracy:http://www.fftw.org/speed/
In many instances, the software and hardware engineers at the companies that build the chips [Intel and AMD] can't write FFTs that are as efficient as third-party vendors [or hobbyists], which should be a pretty good indication that something as ostensibly straightforward as writing an FFT routine is just fantastically complicated in practice.So if you're a company with a lot of low-level proprietary software that's tuned for the x86/x86-64 instructions sets, or for classical PCI bus communications [apparently PCIe is very backwards compatible], or for the nVidia/ATI/Oxygen instruction sets, then your porting job just got a heckuva lot easier if you don't have to deal with PPC RISC, Altivec, etc etc etc.
-
Re:Software...
"PowerPC has certain advantages for certain computational processes, which is THE REASON WHY THEY USED MAC IN THE FIRST PLACE."
True,
Compare the 2GHZ G5 mac to the 2.8GHZ Xeon,
In FFT tests, very few processors can beat Apple's veclib libaries - these algorithms form the backbone of a great deal of scientific analysis (the diverse spectrum of fields converning signal analysis) and clearly demonstrate the great advantage altivec can offer, this combined with lower power consumption and good FP performance makes the G5 and excellent choice for clusters.
(lets ignore the G5's memory latency...) -
Re:Software...
"PowerPC has certain advantages for certain computational processes, which is THE REASON WHY THEY USED MAC IN THE FIRST PLACE."
True,
Compare the 2GHZ G5 mac to the 2.8GHZ Xeon,
In FFT tests, very few processors can beat Apple's veclib libaries - these algorithms form the backbone of a great deal of scientific analysis (the diverse spectrum of fields converning signal analysis) and clearly demonstrate the great advantage altivec can offer, this combined with lower power consumption and good FP performance makes the G5 and excellent choice for clusters.
(lets ignore the G5's memory latency...) -
Why? Altivec-optimized libraries supplied by Apple
You really don't need macstl unless you have a strong desire to use valarray in C++...for example, the ATLAS project http://math-atlas.sourceforge.net/ already uses Altivec (and SSE/SSE2, etc) wherever it results in a speedup. So, if your code does linear algebra, use ATLAS and you'll see an automatic speedup in many cases. Other projects such as fftw http://fftw.org/ include Altivec/SSE/SSE2 optimizations as well. ATLAS includes lots of other optimizations such as cache-blocking, loop-unrolling, etc. I don't know of macstl includes such optimizations, but I do know that ATLAS performance approaches the theoretical peak performance on G4/G5 for things like matrix-matrix multiplication.
Not only that, but Apple's vecLib http://developer.apple.com/ReleaseNotes/MacOSX/vec Lib.html includes ATLAS so you don't even have to download or install anything - it comes with OS X. -
Re:great language, but not quite general purpose
Have you taken a look at fftw? http://www.fftw.org/
FFTW runtime is in C, but the C code is generated by a program written in Ocaml.
The point being not that one should write all code in ocaml, but in some cases its useful to know some of these techniques such as meta-programming, symbolic computation etc.
I learnt ocaml for one main reason, it helps to learn a new style/paradigm for programming. That has helped me create better code (for certain types of problems) even in imperative languages like java. A toolkit approach to languages is certainly valuable, especially for learning about constructs like closures (ocaml), continuations (scheme), backtracking (mercury), lazy evaluation (haskell, concurrent clean), massive concurrency (erlang), coroutines (ruby, sather etc). -
Further clarificationFFTW is written in C, so I was confused what it had to do with Ocaml. According to the faq:
FFTW is written in ANSI C. Most of the code, however, was automatically generated by a program called genfft, written in the Objective Caml dialect of ML. You do not need to know ML or to have an Objective Caml compiler in order to use FFTW. genfft is provided with the FFTW sources, which means that you can play with the code generator if you want. In this case, you need a working Objective Caml system. Objective Caml is available from the Caml web page.
-
Re:FFTW
For those who haven't heard of it, FFTW stands for Fastest Fourier Transform in the West, and it's a library for computing some discrete Fourier transforms really quickly.
-
FFTW?
I misread the post, I thought the complaints were about the speed of its FFTW, the Fastest Fourier Transform in the West! I thought the author wanted to criticize the quality of its fourier transform functions.
-
other offenders
I heard this closed source program uses the FFTW (Fastest Fourier Transform in the West). I feel it is my duty as an agent of the communist state to inform you.
-
Not FUD, but not correct, either.mcc wrote:
Code Generation is for people who don't understand or are too lazy for abstraction
Baloney. ...Code generation is a practical, efficient tool for solving many problems where OO-style abstraction need not enter the picture. One such class of problems is building interfaces and glue code from external specifications.
A few years ago, I wrote a simple code generator that reads the SQL DDL for a large database and generates an object-based interface to the database. Client coders could then use the object-based interface to access the database. The advantages of this approach proved to be numerous:
- Single, authoritative reference specification. The object interface was always in sync with the reference, which for this project was the database schema.
- Richer compile-time error detection. The projection of the schema into the object interface was fully available to the type system so that many kinds of client errors could be caught at compile time, not run time.
- Reduced opportunity for errors between subsystem boundaries. Because the object-based interface was generated by machine from the actual database -- and not derived from some programmer's understanding of the database -- there were fewer opportunities for impedance mismatch across the boundaries of the application code and the database. (Studies of errors in complex projects have shown that errors are more common between subsystem boundaries, and so this benefit is important.)
But I cannot think of any case in an object-oriented language where it would be both less work and more maintainable to write a code generator than to just abstract away the parts that would be auto-generated.
If you can't think of any such cases, it's because you're thinking too small. Look at the bigger picture. For starters:- When the number of variables affecting the desired code characteristics is large enough to make hand-coding (at any level of abstraction) impractical. E.g., FFTW: "FFTW uses a code generator to produce highly-optimized routines for computing small transforms."
- When your code must conform to an external reference specification that changes rapidly enough to make hand coding (at any level of abstraction) impractical. (See my example above.)
- When the requirement for correctness is so stringent as to make hand-coding methods impractical, mandating code generation from a formal specification.
- When you must target an output language whose native abstraction capabilities are too crude to capture directly the degree of abstraction that is merited. Believe it or not, most popular OO languages fall into this category for many commonly occuring problems. Hence the popularity of design patterns. (Compare, e.g., with the abstraction capabilities of modern functional programming languages like Haskell and O'Caml.)
-
Re:Turning the FFT into an integer monster.
You can definitely do the FFT in integer arithmetic. I'm not sure of the details. Fixed point arithmetic is something I don't think about very much. It would be more work than floating point and it doesn't sound very fun to me. But then I'm very lazy, as my presence goofing off here proves. Still, I wish you good luck. A good integer FFT is a noble goal.
Concerning your attempt to get rid of multiplications, the Winograd FFT reduces the number of multiplications to O(n) but the number of additions increases (to O(n^2) if I remember correctly.) For large n this is obviously not useful unless multiplication is much slower than addition (which is not the case on todays computers.) It is however sometimes used for short length modules in mixed radix FFT algorithms.
If you are interested in efficient FFT, a web search turned up the following, which seems to have a ton of references (including the Winograd paper): burrus-notes There is also a good general reference book by Charles Van Loan published by the Society for Industrial and Applied Mathematics: SIAM It's amazing how much work is done in this area. The FFT is very important.
-
FFTW 3.0 beta has been released
Well, FFTW 3.0 beta has been released, and it includes Altivec support. I would be very surprised if this new FFTW wasn't extremely competitive with all the benchmarks presented on this page, if not better.
Anyone want to add it to the benchmark? -
Old story?
I wonder when this benchmark was published, because the FFTW homepage already offers a 3.0beta for download, including SIMD support (SSE/SSE2/3DNow!/Altivec) support.
For applications where raw speed is not very important I recommend everyone to use the fftw library, it is already installed on a lot of systems and easy in use. Very fast are Intel's SDK version and DJ Bernstein's (only 256 points). -
Look at FFTW, for instanceFFTW, "the fastest Fast Fourier Transform in the West", is an implementation of the DFT (Discrete Fourier Transform) that was developed as part of a research project at MIT. FFTW is released under the GPL, and Section 1.4 in their FAQ one can read that they are using the system you describe:
We could instead have released FFTW under the LGPL, or even disallowed non-Free usage. Suffice it to say, however, that MIT owns the copyright to FFTW and they only let us GPL it because we convinced them that it would neither affect their licensing revenue nor irritate existing licensees.
-
Look at FFTW, for instanceFFTW, "the fastest Fast Fourier Transform in the West", is an implementation of the DFT (Discrete Fourier Transform) that was developed as part of a research project at MIT. FFTW is released under the GPL, and Section 1.4 in their FAQ one can read that they are using the system you describe:
We could instead have released FFTW under the LGPL, or even disallowed non-Free usage. Suffice it to say, however, that MIT owns the copyright to FFTW and they only let us GPL it because we convinced them that it would neither affect their licensing revenue nor irritate existing licensees.
-
R (aka GNU S)For my thesis in astrophysics, I have almost exclusively used the R-system. I find it brilliant. It was developed for statistic, but IMHO, it can be used for any numerical computational task, though in some areas, it may need more development (for example, it lacks 2D FFT, but that should be easy to fix.
R comes with Woody (Next Debian release).
-
Re:What's wrong with Haskell?
I happen to think that Haskell [haskell.org] is one of the semantically cleanest languages out there
I was going to say in my earlier post that Haskell had the cleanest syntax and semantics, but I figured that regarding the latter some smart-aleck (which I suppose is now going to be me) would respond with a comment like this: Oh, really? Care to provide the clean semantics for Haskell's run-time space-consumption characteristics? At present the semantics are often best described by, "run it, and see what happens," although after a while one does develop a feel for it (which is incorrect more than one would like).For example, consider the expression,
foldl1 (*) [1
which literally means .. 1000](...((((1 * 2) * 3) * 4) * 5)
and, by the way, is equivalent to your (fact 1000). Now, how much space is going to be consumed by its evaluation? ... 1000)Humans can easily see that the expression can be evaluated left to right, treating the operator (*) as strict, and, with the list defined by [1..1000] being built lazily, it is possible to evaluate the expression in constant space:
= foldl1 (*) [1
But is this what Haskell guarantees? Nope. An implementation might also do it like this: .. 1000]
= foldl (*) 1 [2 .. 1000]
= foldl (*) 2 [3 .. 1000]
= foldl (*) 6 [4 .. 1000]
= foldl (*) 24 [4 .. 1000]
= ...= foldl1 (*) [1
In this case the accumulator in foldl builds up a linear chain of thunks, which is evaluated only after the entire list is consumed. .. 1000]
= foldl (*) 1 [2 .. 1000]
= foldl (*) (1*2) [3 .. 1000]
= foldl (*) ((1*2)*3) [4 .. 1000]
= foldl (*) (((1*2)*3)*4) [5 .. 1000]
= ...You and I can see that the second method is wasteful, but the language definition provides no guidance as to whether a Haskell implementation will choose the second or the more-efficient first. (In fact, GHC would until recently choose the second for this expression.)
Don't get me wrong. Haskell is, without a doubt, my favorite language. It's what I used for my entry in this year's ICFP Programming Contest, and I consider it the best hope for a truly great mainstream functional programming language. I love Haskell. But its lack of intuitive space-consumption semantics is a serious weakness.
And, regarding your fact example, it doesn't really show off Haskell's semantics as much as its syntax. Why not show the classic Fibonacci Series implementation, which highlights Haskell's non-strict evaluation semantics?
fibs@(_:fibs') = 1 : 1 : zipWith (+) fibs fibs'
Now, if that isn't a beautiful line of code, I've never seen one.The fastest FFT library out there is in C, but the C code itself was generated by Haskell code
Actually, the FFTW code was generated by code written in O'Caml not Haskell.Cheers,
Tom -
Re:Piece of cakeAs far as an FFT engine goes, I've found that the FFTW library (it's GNU!) is very easy to use and not particularly processor intensive. There's no reason to rewrite code when people have made the "Fastest Fourier Transform in the West!"
:)I used it on a real-time guitar sound effects processing program (never released) that I wrote a year ago and never had any problems using it.
-
The Future of Programming Languages for me . . .. . . is lisp.
Ok, well I don't really know, that's just what I wanted to try out next. I thought I'd share my reasons and see what other people had to say.
Basically, I've observed the following characteristics about the best code I've worked with:
- Code generates code.
- A good example of this is the
- Fastest Fourier
Transform in the West. You can re-run the code generator to generate code
optimized for the size of transforms you will be using. The other common use of
this is in interfaces. Most programming shops have a set of code in house that
generates reader code for ascii parameter files ( like
.rc files or .ini files). The one I worked with used lex and yacc; it would parse a C struct definition and generate an example file and the reader code which would create the struct. The macros that any complecated cross-platform program seems to end up needing are another example. - Small chunks of "interpreted" instructions are generated at runtime. I think the "plan" that fftw makes you generate for a given transform and size, and then pass in with the data, is like this but I'm not sure. Some of the stuff I wrote was "generators" for certain classes, that essentially stored the arguments to a constructor, to be called (perhaps repeatedly) at another time. I think a lot of CORBA based interfaces basically pass along the code to interpret the data along with the data.
My experience with lisp is limited; scheme was my language of choice for any projects I did as an undergrad, and I haven't really dabbled in it much since. The reasons why I think lisp (well, common lisp in particular) might address these issues well is the relative ease of generating code on the fly, and the good reputation of the macro system.
From a little bit of reading newsgroups, and from talking to older lisp hacker types, it seems that one of the problems that lisp suffers from is that the commercial lisp implementations are too expensive and not compatable with each other, so people get locked into using one lisp because the effort to port to another is too expensive, and once these companies had you by the balls they will essentially calculate your profit margin and charge that. So I plan to only use one of the GPL'd versions, probably CMU. Is the CMU version so bad that I'm dooming myself from the start by doing this ?
-
Sure, that'll work
If you release the algorithm under the GPL, commercial software interests won't touch it with a ten-foot pole. It makes a rather effective, and quite free-software-friendly way of protecting your IP, while keeping your options open to alternate licensing schemes (that are more agreeable to said software interests).
Look at FFTW for a case study of this approach. It's a library for performing Fourier transforms, distinguished for being faster than anything out there. They have it available as a free, GPL'ed download, and sell commercial licenses for a pretty penny. -
A counterexampleBTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?
There's a wonderful counterexample to your arguments, which is FFTW (the fastest fourier transform in the west). This is a code that dynamically generates a highly optimized algorithm for doing FFTs. It is written in 100% C, and outperforms (not all the time, but often) hand coded highly optimized FFT routines.
It does this by actually running timing tests, and using the results to optimize the algorithm for your computer, for the size of array you want transformed. It is quite amazing. Here are the benchmarking results from a RS6000 of some sort, comparing (among others) FFTW with ESSL, the IBM hand optimized FFT library.
This is sort of a strange example, in between JIT compilers and normal compilers, and is a case where the compiler actually knows a heck of a lot more than the programmer does. Anyhow, figured I'd point out a cool code that is able to optimize itself.
-
Re:Profoundly counterintuitive?
Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?
I haven't done that myself, but some other people had. The FFTW people have created a DFT library written in ANSI C that outperforms nearly all other DFT libraries, many of which are provided by the hardware vendors who hire the best programmers and use assembly code extensively. Most of the code in the library are generated by a program written in OCaml.
And judging from the number of patents and papers in IEEE journals, I believe that creating a fast DFT library is indeed a complex and creative problem. -
Faster FTs = better "standard" compression?
I'm no expert, but I've been working with the FFTW - "the Fastest Fourier Transform in the West" - and it's really kick-ass fast. I've been wondering what kind of FT algorithm these image-compression guys tend to use. I figure, a faster FT means less time required to compress and decompress data, which in turn means that more detailed data that once was unfeasibly large and slow to process will not cease to be large, but may cease to be slow. May this yield a feasible alternative? FTs are "the workhorse of DSP" - a very well-known and familiar technique. So why not? (I'm sure there's a damn good reason why not...)
-
Re:What's the big deal?
"Design goals? Closest they mention is that it's "fast" and written in assembler. So is an FFT, but it's got a designed purpose and scope."
The Fastest Fourier Transform in the West is completely written in C. -
Re:I've seen the code. Lots _could_ be done:proof
I agree. The fastest FFT algorithm that I know of is called the FFTW (Fastest Fourier Transform in the West) and it can be downloaded for free at http://www.fftw.org. I don't know if this is what the SETI team is using or not.
-
I know a few ...
Sinectonalysis has a few libraries for linux. I have no experience with them though
...For FFT there is fftw and djbfft