Finance, Scientific Users Get ActivePython Updates
jcasman sends along this clip from PCWorld: "ActiveState has added three open source mathematics libraries to its ActivePython Python distribution that might interest financial and scientific computing markets, the company announced Thursday. The packages are being added, in part, to anticipate the demand that may arise from new proposed rules for the US financial community brought about by the US Securities and Exchange Commission. ... In April, the government agency posted a set of proposed rules for handling asset-backed securities that called for financial firms to disclose, along with their prospectus filings, the source code of the programs that generated the filings, as rendered in Python. The government agency will be accepting input about the proposed rule until August 2. The three libraries that are being added to the ActivePython package are NumPy, SciPy, and matplotlib."
IOW fudging the numbers.. only faster and easier
For justice, we must go to Don Corleone
These are great and free tools for making publication-quality plots as well as the analysis of the data.
Okay so I'm the lead Microsoft certified developer at a Fortune 500 company. I have to get this crappy software integrated with our project that is running as a Win32 application by the end of the month and so I downloaded the packages and dragged and dropped them into Microsoft Visual Studio. Then I created a VB file that basically calls SciPy.getSECReport(somedata) and nothing happens. I get some "Error Method or data member not found" even though the stuff I downloaded was unzipped and dragged right into the rest of my libraries in my WIN32 directory.
... as well as dragging and dropping the excel files onto the packages (I'm not an idiot, I've tried everything). I even went to SourceForge to find documentation on this crap and there is nothing. Nothing! Can someone help me figure this crap out? It's impossible to use, the open source nuts have made it so it isn't streamlined and integrated with Windows. I mean, I'm a pretty talented hacker (MSCD and everything) and this just goes to show how crappy open source can be.
Oh and I also tried double clicking the packages and nothing happens
Will someone please rewrite this in Visual Basic so I can do my job?
That link is unrelated to this post. The referenced scientific/numeric libraries for python are implemented in C as native modules. They are not just as fast as the equivalent code written in C, they *ARE* the equivalent code written in C, merely interfaced to with python. You might lose a meaningless tiny fraction of time preparing your vectors for processing in python, but you'll save several orders of magnitude more time not worrying so much about malloc corruption.
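A minimal sketch of that point, assuming NumPy is installed. The helper `dot_pure_python` and the sample data are made up for illustration. Both versions compute the same dot product, but `numpy.dot` hands the loop to compiled code:

```python
import numpy as np

def dot_pure_python(xs, ys):
    """Dot product with an explicit interpreter-level loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# Hypothetical sample data: 100,000 evenly spaced values.
a = np.linspace(0.0, 1.0, 100_000)
b = np.linspace(1.0, 2.0, 100_000)

slow = dot_pure_python(a.tolist(), b.tolist())  # loop runs in the interpreter
fast = float(np.dot(a, b))                      # loop runs inside compiled code

# Same answer up to floating-point rounding.
agree = abs(slow - fast) <= 1e-9 * abs(fast)
```

Timing the two calls (for example with the standard `timeit` module) is the easy way to see the gap on your own machine.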
I read the script, and I think it would help my character's motivation if he was on fire. -Bender
Which might be great if you need to do a one-off calculation. If it takes you 5 minutes instead of 10 mins and runs 10 seconds instead of 1 second then you're saving 4 minutes 51 seconds :)
For about an average of half the lines of code they might use in C, scientists ...
Scientists?! You're probably joking but I've been over to accounting and they're using Excel and *shudder* Access for all their heavy data lifting. Sometimes they need help and if they're kind enough I don't lie about how much I know about those ancient products. Most importantly they're not scientists. They're accountants and business people ... they don't care if they have to wait five minutes for Excel to open a worksheet containing the entire set of order histories of our company.
...
I don't think these packages are intended for NASA space mission flight certified calculations. Just something to really help you out if you want to comply with the SEC. Side bonus, I'll bet that when you submit this code, you're going to achieve compliance a lot faster when the SEC only has to check half the lines of code and not analyze your memory management
Software development is about trade-offs. Why aren't you complaining that it's not in some targeted processor specific machine language?
My work here is dung.
It seems fairly likely that the implementation of the regulation would require the models to be useful.
But maybe not.
Nerd rage is the funniest rage.
Oops - that's embarrassing. I just realized that the Python programs must be the ones used to produce the actual filings. So the programs' fitness for purpose must have been already established. Presumably they don't loop forever, or at least only do so after producing the filings.
I know bonafide scientists who use Excel for analysis. Scary.
I also know quite a few who use numpy, scipy, matplotlib, and python for real science. In fact, I think the astro community specifically was an early supporter of some of these packages. I myself have been using them in science for 5 or 6 years.
Actually, NASA JPL does most of their mission design work using Python (combining compiled C modules of course, just like NumPy and SciPy).
Those packages are fantastic; really, 90% of what I use in Python is in those packages. I have been using the Enthought Python distribution rather than ActiveState's (many reasons), and this tips the scales a bit more toward recommending ActiveState to others.
FYI: Matplotlib makes 2D and 3D presentation-quality plots of data (even an absurd quantity of data). NumPy and SciPy provide scientific and matrix functions that pretty much cut Matlab off at the knees unless you are a Simulink user. Matlab is many thousands of dollars, Python is free, and they are remarkably similar, except Matlab chokes on large data sets where Python doesn't.
Sheldon
Probably because Python is a more widespread skill than R. Python code is also extremely easy to read and understand to most average coders, even if they have little or no experience actually coding Python.
It is only irrelevant if all the computations you are performing are done by the libraries. I tried using Python for data processing tasks, and it was unbearably slow despite the use of SciPy. I think it was due in large part to poor I/O and bit-twiddling performance while reading/writing data files, but I'm not sure. Anyway, with the amount of time I spent optimizing code, I could have just written the damn thing in C to begin with. I just don't understand Python's poor performance. All its high-level language features were implemented in LISP and ML decades ago with good performance, but Python just can't seem to get it right.
Slash should get paid for this. These packages have always been available. Where is the news?
Now people will be getting paid to obfuscate Python code.
Python is actually more dynamic than most Lisp object systems...
Did you use Python 3.0? The IO performance there was a big drop from 2.x and was largely fixed in 3.1.
I don't think you understand. These things are used as any other python code. The magic is that C is the backend that actually crunches the numbers. You don't even have to know or care that C is used. All you know is you call the foo method in the bar module.
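To make that concrete, here is a small sketch using only the standard library. `math.hypot` stands in for "the foo method in the bar module", and `py_hypot` is a made-up pure-Python function for comparison. The call sites look identical; only introspection reveals which one is backed by C (on CPython, at least):

```python
import inspect
import math

def py_hypot(x, y):
    """Pure-Python stand-in, defined here only for comparison."""
    return (x * x + y * y) ** 0.5

# Calling the C-backed function looks like calling any other Python function.
c_result = math.hypot(3.0, 4.0)
py_result = py_hypot(3.0, 4.0)

# inspect.isbuiltin() is True for built-in (compiled) functions
# and False for functions written in Python.
hypot_is_builtin = inspect.isbuiltin(math.hypot)
py_hypot_is_builtin = inspect.isbuiltin(py_hypot)
```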
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Speaking as someone who uses both R and Python all the time, I'd say that while R is very very good at a lot of what it does, it's just not as good as a general-purpose language as Python. I find myself doing as much preprocessing in Python as possible, then saving the results in DB tables and having R finish the analysis. And yes, I know about RPy, but the programming overhead of representing data structures in both languages, and making sure that they're talking to each other correctly, can be considerable; so is the runtime overhead of passing really big data sets back and forth. (Note that it's been a while since I used RPy for anything big, and a lot has been improved in that time, so it may be time for me to give it another shot.) Python code is just cleaner and easier to write for most tasks. I like both languages for their strengths, but overall, if you can do a particular analysis in Python then that's usually the easier choice.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
TFA and TFS fail to mention that SciPy, Numpy and Matplotlib have been added only to the Business, Enterprise, and OEM Editions of ActivePython. The Community Edition (the only one that's free) doesn't contain these libraries.
http://www.activestate.com/activepython
Or the loop was part of the design.
I leave my machine on overnight and lots of things are looping "endlessly" - and that's not a problem.
Does this mean one less reason (in the scientific field) to use commercial Matlab, and one more to prefer free Sage/ActivePython/NumPy/SciPy/matplotlib?
Just asking, but since Sage can offer so much functionality, I wonder if now the community gets one more extra boost.
"Sum Ergo Cogito"
Because in the time that I save writing the program in Python rather than Java, I can run my program a million times. And even if I run it a million more times, computer time is nearly free, so it does not matter. And when it really does matter, in less time than it takes to write in Java, I can rewrite the single time consuming routine in C, and exceed Java's performance.
What do you mean by "list"? You mean echo to the screen? Print it out? Something else? Are you trying to say that you can't see the indentation? Really? If you can't follow 4-space indents (the recommended amount) visually then you are unlikely to be able to read the text accurately either.
It sounds like you prefer "line noise" languages that suffer from readability problems when abused in such fashion. More power to you. But you only sound confused and backwards when you can't even express your thoughts clearly. I recommend you try Python. Its rigorous requirements may help you order your thoughts as well as your programs into precise clarity.
thoromyr
you can't unambiguously list it, because on paper you can't tell the difference between a space and a tab.
You do realize that you can use spaces instead of tabs? You just have to be consistent in the way you indent. Your comment is invalid.
For many applications, especially quick one-time tasks, coding takes a lot longer than execution. If you can cut the coding time in half from 60 minutes to 30 minutes, does it really matter if the execution time goes up from 5s to 100s?
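The arithmetic behind that trade-off can be written out. This is only a sketch using the numbers from the comment above; the function name `break_even_runs` is invented for illustration:

```python
def break_even_runs(coding_saved_s, extra_runtime_per_run_s):
    """Number of runs after which the slower program has cost more total time."""
    return coding_saved_s / extra_runtime_per_run_s

coding_saved = (60 - 30) * 60   # 30 minutes of coding saved, in seconds
extra_per_run = 100 - 5         # each run is 95 seconds slower

runs = break_even_runs(coding_saved, extra_per_run)
print(f"break-even after about {runs:.1f} runs")
```

With these figures the quick-to-write version stays ahead for the first 18 runs, which covers most one-time tasks.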
But I don't have much experience with R. How does it perform when you aren't using (and combining) the builtin methods, but are writing some number crunching routines of your own?
It's not bad. If you try to write naive C-style code, you'll run into all the overhead of an interpreted language and it will run very slowly. But if you (a) try to do everything you possibly can as matrix operations and (b) make intelligent use of the functional programming features, it compares well to NumPy/SciPy. And as with Python, it's easy enough to write your own packages using C or Fortran and then call them from the main language (which is how NumPy/SciPy works, of course.)
While we're on the topic, here's something that confused me about the SEC / Python idea.
Python is Turing complete, which means some Python programs may never terminate*. Has the SEC taken this into consideration in its plans to use them?
Or is the SEC planning to impose limitations such as, "These Python programs must complete within 1 hour when run on an Intel Pentium IV 2.8 GHz with 4 GB RAM and Windows XP SP 3"?
(* Of course, real computers have finite memories, so it's actually theoretically possible to detect looping on such a computer. But at this point we get back to specifying a particular memory size I think, which kind of goes to my question about the SEC specifying the particular hardware on which the program must run.)
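A time limit of that kind is easy to enforce in practice, even though it does not decide termination in general. The following is a rough sketch using the standard `subprocess` module; the helper name `finishes_within`, the sample programs, and the limits are all assumptions for illustration, and nothing here comes from the SEC proposal:

```python
import subprocess
import sys

def finishes_within(source, limit_s):
    """Run Python source in a child interpreter; report whether it exits in time."""
    try:
        subprocess.run([sys.executable, "-c", source], timeout=limit_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return False
    return True

looping = "while True: pass"               # never terminates on its own
terminating = "total = sum(range(1000))"   # finishes almost immediately

looping_ok = finishes_within(looping, limit_s=1.0)
terminating_ok = finishes_within(terminating, limit_s=30.0)
```

Whether a given program beats the limit still depends on the machine it runs on, which is exactly the hardware question raised above.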
Gag, I hate Python; it has to be the most annoying language to code in since COBOL. White-space blocked languages needed to die with the Hollerith card.
If someone is passing you on the right, you are an asshole for driving in the wrong lane.
Mod parent up!
I also use both, quite heavily. Rpy2 is a real improvement over RPy, but I share your preference for allocating tasks to the two languages according to their strengths, and the rpy/rpy2 data structure representations are much trickier than I like.
I find myself gravitating to R when the builtins are useful (robust estimation, smoothing kernels), where I need to split and analyze data subsets (using by() and its cousins) or where I want to plot things. Python is the ticket for interfacing with C, talking to the outside world, parsing, and expression of simple mathematical models.
Use Macports. Manual package management is something I try to avoid - 'sudo port install matplotlib py26-matplotlib' and all the dependencies and compiling are taken care of, not to mention the ability to cleanly uninstall if you wish. Macports recently upped to v1.9.1, which now tracks which ports you requested, so it's easy to prune away orphan libraries you no longer need.
And matplotlib is a gem. It's got a ridiculous number of plot styles so it's remarkably flexible - if you are into GIS, look at matplotlib-basemap, which adds many map projections and the ability to plot geo data on those.
I do wonder why people would pay for the 'special' (i.e. non-free, non-community) version of ActivePython. AFAIK, ActiveState develops neither the base libraries (matplotlib, SciPy and NumPy) nor Python itself. What do they add other than a bundling service?
If your script or program is not time-sensitive, then it's definitely better to optimize development and maintenance time. However, if you're in finance and are working with numbers that translate to real cash, then execution time is critically paramount.
I think you meant to say, "Can someone at Microsoft please re-write VBasic to work with these modules?" Yeah, it can never be Microsoft's fault.
Damping absorbs vibrations. Dampening is caused by moisture.
You know, it is still possible to write bad code in Python... I use these packages on a daily basis, and while I wouldn't use them for a production-level numerical weather prediction system, I do use them for data processing and such. There have been times when I wrote the code poorly and the code ran slowly, but as I learned Python and embraced the language (weaning myself away from C), I found myself writing faster programs. Which high-level language features do you think are lacking? I particularly like list comprehensions and the function argument syntax (positional and keyword-based).
but while I was programming an important app, I accidentally hit the space bar just before tabbing. Since this error wasn't visible on printouts or screen views, [...]
What, the glaring "IndentationError" exception that gets thrown as soon as you import the file didn't tip you off?
The situation you describe can never happen silently. I call bull$!7. Theoretically, it is possible to construct a situation where you would get a silent inconsistency, but this isn't it. I've programmed extensively in Python since '97 and never experienced problems due to indentation. In real life, this just isn't a problem.
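For anyone who wants to check, here is a small sketch assuming Python 3 (Python 2 was more permissive about mixing tabs and spaces). The two source strings are made-up examples of the "stray space before the indent" mistake, and `indentation_problem` is a hypothetical helper:

```python
# The second indented line starts with a space followed by a tab.
space_then_tab = "if True:\n\tx = 1\n \ty = 2\n"
# The second indented line has one extra leading space.
extra_space = "if True:\n    x = 1\n     y = 2\n"

def indentation_problem(source):
    """Return the name of the indentation error raised when compiling, else None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:   # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None

results = [indentation_problem(space_then_tab), indentation_problem(extra_space)]
```

Both snippets are rejected at compile time, before any of the code runs.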
But you can't blame them, I mean what sort of idiot language has whitespace signify blocks of code?
Well.... all of them do. How else would you find the blocks in a program? I know that many languages also use special tokens like { and } or "begin" and "end", but programmers still use the indentation to identify blocks even when it contradicts the tokens. The canonical example is something like:
if (some_test())
....i = foo();
....bar(i);
baz();
Perfectly valid code, so no compiler errors etc., but most programmers will read it as an if-statement with a true-block containing two statements... (I had to use dots to get indentation since pre-tags do not seem to be understood...?!)
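For comparison, a sketch of the same shape in Python, where the parser uses the indentation the reader sees. The function `run` and the recorded call names are stand-ins invented for illustration. In the C-style snippet above, bar(i) runs even when the test is false; here it does not:

```python
def run(some_test):
    """Python rendering of the snippet: the indentation is the block structure."""
    called = []
    if some_test:
        called.append("foo")   # stands in for i = foo()
        called.append("bar")   # stands in for bar(i)
    called.append("baz")       # stands in for baz()
    return called

when_false = run(False)   # only "baz" is recorded
when_true = run(True)     # "foo", "bar", then "baz"
```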
If you think Python's syntax has anything to do with COBOL or Hollerith cards, I doubt you've ever even looked at any Python code.
I hate to essentially troll, and I hate to burst your bubbles, but these math packages aren't really doing anything all that wonderanomous. The guy I learned numerical analysis from in college used to use Excel to do a lot of his numerical techniques - and used to do a lot of them on a TI-80. Numerical analysis is all about knowing a lot to write an efficient algorithm to get the answer.
I've done Q/R decomposition in VB6 (for a real honest to god client! for money!)
I'm glad these tools are around for people to use but don't think they're all that new or revolutionary. Easy is easy, and there have been many generations of Easy for N years.
Flavor of the month.
Just use Cython.
All the niceness of Python and the speed of C where you need it.
This is not some sort of high-frequency trading or other section of finance where speed matters. This is producing SEC filings...As long as execution time is not in the month+ range, you are going to be fine.
Bottles.
Except that NumPy will use LAPACK and BLAS for its linear algebra, making it far more efficient. Try a QR decomposition on a matrix of any significant size in VB, then do the same decomposition using LAPACK and you'll see a huge difference. As for numerical analysis being about writing efficient algorithms, sure, that's true, but why would you want to rewrite those algorithms when highly optimized versions come by default?
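A short sketch of what that looks like, assuming NumPy is installed. The matrix size and the seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
A = rng.standard_normal((200, 200))     # hypothetical test matrix

Q, R = np.linalg.qr(A)                  # NumPy delegates this to LAPACK routines

reconstructs = np.allclose(Q @ R, A)               # Q times R gives back A
orthonormal = np.allclose(Q.T @ Q, np.eye(200))    # columns of Q are orthonormal
upper_triangular = np.allclose(R, np.triu(R))      # R is zero below the diagonal
```

The three checks confirm the factorization is a valid QR decomposition without writing any of the numerical code by hand.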
Disclaimer: Yes, I'm sure you could get VB to use LAPACK and BLAS but python will do it by default.
You're comparing apples and cadillacs. Excel doesn't use LAPACK either, but Prof used to dig it because of its profound recursive capabilities. I didn't write the VB6 code from scratch - it was originally written in Fortran using LAPACK there, and for some reason the guy wanted it in VB6 (I guess so he could enter the parameters from a database using a form). It was a simulation of some complexity and ran well enough for his purposes. The same calculations were being done, on ordinary PCs.
It's a distinction without a difference. No cool points for using lapack just to use it.
Hmm.
ActiveState has left a bad taste in my mouth in the past. My quick research just now may have dug up some reasons to re-evaluate them.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
You should check out Sage (http://www.sagemath.org); it's a Python-based computational platform that includes R. Besides including RPy with a slightly improved interface, you can also run a Sage worksheet in native R mode. Not many Sage users are using this interface, so it would be great to have more feedback on it.
I do use Sage for some work, but I didn't know about the RPy interface or the ability to run worksheets in R. That sounds well worth checking out. Can you point me to the documentation?
You don't even have to know or care that C is used.
Well, unless you're not using CPython.
have you read the Moderation Guidelines Addendum?
I accidentally hit the space bar just before tabbing. Since this error wasn't visible on printouts or screen views,
Get a proper editor?