Domain: ipython.org
Stories and comments across the archive that link to ipython.org.
Comments · 17
-
IPython Notebook
I don't have a clear picture of what your organization will be doing, but your comment about "managing that data (=measurements + reports)" made me wonder if you will want to use the IPython notebook.
http://ipython.org/notebook.html
If people analyze measurements (make plots, make decisions, and so on) and then write new code step by step in an IPython Notebook, and other scientists then peer-review the notebooks, this might be even more useful to you than version control. It would give you a history of how the analysis was done and why the reports were made the way they were.
In my job, I do some analysis and work in databases, and I seriously want to start using IPython Notebook as my SQL client, and save my notebooks for later review. It would document the queries I ran and the results I got, so later I could find the queries again to re-run them, and see how they worked out before re-running them.
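As a rough sketch of that workflow (not a recommendation of any particular driver), here is what a notebook cell might look like with Python's built-in sqlite3 module. The table, its rows, and the `run` helper are all made up for illustration; a real setup would connect to your own database.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)

query_log = []  # (sql, rows) pairs, kept in the order the queries were run


def run(sql, params=()):
    """Run a query, remember it together with its result, and return the rows."""
    rows = conn.execute(sql, params).fetchall()
    query_log.append((sql, rows))
    return rows


averages = run(
    "SELECT sensor, AVG(value) FROM measurements GROUP BY sensor ORDER BY sensor"
)
print(averages)
```

In a notebook the saved cells already serve as the record; the `query_log` list just shows the same idea in a plain script, so a query and the result it produced stay together for later review.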
-
What it is, and what it isn't
Yes, the WSJ article is hyped.
On the other hand, this comment by the poster is not accurate, either: "At best, Google is programming (not teaching) a computer to mimic the conversation of humans under highly constrained circumstances. And the methods used have nothing to do with true cognition."
Google didn't program the AI. Rather, they took one meta-level step back and used a very simple training algorithm that did the "programming" for them, using training data (the program is encoded as an LSTM neural net that processes word-vector encodings of tokens). Based on direct tests, it looks as though the model learned (or, use scare-quotes, if you must -- "learned") things like:
- The "rules" of discourse;
- How to leverage context;
- How to do some amount of commonsense reasoning.
All these things are extremely hard to program into a computer using rules-based methods; but, as the authors show, a purely data-driven approach, instead, works fairly well.
And just to be clear, what they applied is not datamining; it is machine learning. Basically, machine learning is where you feed in a bunch of training data, and from that, an algorithm builds a program -- see, for instance, this lecture by John Platt (former Microsoft machine learning scientist, now at Google) on the difference between AI, machine learning, and datamining:
Using machine learning, it is possible to get a computer to "learn" a subset of the Python programming language, for example, such that you can feed into the model a little program + input, and it will produce for you the corresponding output. See:
What the authors of the conversation-generation paper wondered was whether they could get the computer to "learn" a whole dialog system (or "chatbot") from just conversation logs; and based on experiments, it looks like they succeeded (it's better than Cleverbot on the conversations they tested with). They note in the paper:
We find it encouraging that the model can remember facts, understand contexts, perform common sense reasoning without the complexity in traditional pipelines. What surprises us is that the model does so without any explicit knowledge representation component except for the parameters in the word vectors. Perhaps most practically significant is the fact that the model can generalize to new questions. In other words, it does not simply look up for an answer by matching the question with the existing database. In fact, most of the questions presented above, except for the first conversation, do not appear in the training set.
This is not simply doing phrase-substitution, or some simple statistical tricks; it is more complicated than that... but, yes, it's not "true AI". In addition to that article on "Learning to Execute", see this blog posting by Yoav Goldberg, and skip down to where it says "So why am I impressed with RNNs after all?":
The unreasonable effectiveness of Character-level Language Models
-
Thinkpad T440s
If she's spending her own money, it's hard to beat the value of a Thinkpad T440s. It's an "Ultrabook", so it has a similar form factor to a MacBook Pro. Great screen, good battery life, good processor, and Linux works out of the box.
She will need to get a mini-DisplayPort to HDMI adapter, for giving presentations where there is an HDMI connection to use. The T440s has both mini-DisplayPort and VGA connectors built-in.
I have one running Linux Mint 17.1 64-bit MATE. I got the top-of-the-line one with the 1920x1080 display, which I recommend. I got mine from B&H Photo in New York; it was significantly cheaper than other web sites I checked.
I have mine set up on a docking station, which came with its own power supply. So its power supply stays in my laptop travel bag, ready to go. Just undock and you are good to go. This is one way in which this is actually better than a Mac.
The Mac will cost $700 extra, and come with a higher-resolution display, a quad-core processor, and more RAM. That may be a better deal for her if she plans to do a whole lot of work directly on the laptop, rather than using the laptop to access remote computers.
P.S. I recommend that she take a look at the IPython Notebook, if she hasn't already. Running SciPy under IPython will be great for her.
http://nbviewer.ipython.org/gist/rpmuller/5920182
My favorite: XKCD-style plots in SciPy
http://nbviewer.ipython.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb
-
Or, go Python Pandas
You get the awesomeness of dataframes in an awesome language with an awesome community (that's 3 awesomes!). Use the Python Notebook tutorials and go do awesome things . . .
-
Re:R?
Mod parent up please. The Scientific Python stack (numpy, scipy, pandas, etc.) with its iPython Notebook interface (in the style of Mathematica) is rapidly taking the world by storm, in the sciences as well as in Big Data Analytics and "Data Science".
If you like software toys, or ever use a calculator, go get yourself the free Anaconda scientific python distribution (Win/Mac/Linux) from Continuum and try out the iPython Notebook. Seriously this is an out-of-the-box computing tool that is AMAZING and can do practically anything. Anaconda is built on the Conda package manager which makes installing any and all bits and pieces you need for any of the popular Python packages completely effortless.
The existence of these tools also makes Python absolutely the best "programming" language to learn, even if you only use it for scripting/invoking the existing libraries. Also, Python is available as a scripting language built into many software packages (Blender etc.), which makes it a tool/skill that just keeps on giving.
R is fine, and currently very popular, but it's also a one-trick-pony when compared to the thundering herd of functionality available on top of Python. You can even invoke R from within an iPython Notebook and pass DataFrames back and forth between R and pandas for example.
I used to love calculators (back when calculator was spelled H-P rather than T-I) but apart from the standardized testing requirement, and the fun of hacking on hand-held devices, it's just silly to use one any more.
G.
References:
https://store.continuum.io/csh... (free, open source)
http://nbviewer.ipython.org/ (great way to share Notebooks)
http://computableapp.com/ (the SciPy stack for iPads)
http://omz-software.com/python... (Great iOS Python environment)
http://numfocus.org/projects/i... (Foundation supporting the core SciPy stack components)
http://pythontutor.com/ (This is just too cool)
-
Re:R...
R is definitely still ahead for data modeling, but Python has some advantages too. With a bigger set of modules (libraries) to choose from and high popularity in the financial sector, there are big improvements all the time. For the purposes of this discussion, the most important Python modules are:
IPython: powerful interactive shell
numpy and scipy: numerical, matrix, and scientific functions (matlab-ish)
pandas: R-like data structures and data analysis tools (analysis mostly limited to regression)
statsmodels: statistical analysis, complements pandas
sk-learn: machine learning
So can Python do everything that R can? No. Or, at least, not as easily. But it is improving in that direction quite quickly, and if Python's data analysis capability meets your needs, then you can likely do everything in one language instead of calling R routines from another.
-
Reminds me of iPython notebook
This reminds me of iPython notebook. It lets you run/re-run python commands and display either text or graphics. You can also insert "formatted comments", save a session, and share it. It's now reaching a good maturity, and is becoming a kind of "python" killer app for scientists.
As a side note, in addition to Python, it accepts shell commands when you precede them with a !, so it could even replace a normal shell.
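For readers who have not tried it: in an IPython session, a line such as `files = !ls` runs the command and captures its output as a list of lines. That syntax only works inside IPython, so the sketch below is a rough plain-Python analogue (assuming Python 3.7 or later) of what a cell containing `!echo hello` does. It is not how IPython implements the feature.

```python
import subprocess

# Hand the command line to the system shell and capture what it prints,
# which is roughly what the `!echo hello` form does in a notebook cell.
result = subprocess.run(
    "echo hello", shell=True, capture_output=True, text=True, check=True
)
lines = result.stdout.splitlines()  # one list entry per line of output
print(lines)
```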
-
Re:So what's the alternative?
The problem here (Piketty, as well as Reinhart and Rogoff) isn't simple, data-intensive apps (that would be a business app developer's problem, perhaps you are one). It's demonstrating an innovative, scientific analysis in an easy to review format. These economist papers aren't that data intensive... they usually have much less data than a typical business app.
https://gist.github.com/vincen...
(169K as uncompressed text)
It's the analysis that is the value here. The rather short and computationally non-intensive analysis in the Reinhart and Rogoff paper triggered financial effects to the tune of probably trillions of dollars across Europe, some would argue prematurely.
The solution for this problem is a statistical package with a notebook presentation. The ideal case would probably be R with knitr. It allows one to combine snippets of code, with data, output and documentation to discuss the analysis & results in easy to understand chunks.
IPython notebook is also an excellent alternative.
Here is a demonstration of how the Reinhart-Rogoff paper should have submitted its data.
http://nbviewer.ipython.org/gi...
I am sure someone will do a Piketty one soon as well.
-
spoiler: r.e. presidents vs losers regexp
spoiler alert: if you were to read TFA you'd find a link to the actual blog 'norvig.com' that is pretty interesting. In short, they handle the "ambiguity" of people who are both winners and losers by ignoring any winner's losses:
From Norvig's blog:
To avoid a contradiction and achieve Randall's intent, eliminate all winners from the set of losers:
In [293]: losers = losers - winners
The code on Norvig's blog is pretty interesting.
This one was worth my coffee break time today.
I might be missing something here, but the list of winners and the list of losers in US presidential elections both contain Richard Nixon. How can a regexp match ALL the winners and NONE of the losers in that case?
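The set subtraction quoted from Norvig's post can be tried in a few lines of plain Python. This is only a sketch: the name sets are small illustrative subsets rather than his full lists, and the `verify` helper and the pattern are hypothetical, written by hand for this toy data.

```python
import re

# Small illustrative subsets, not the full lists from Norvig's post.
winners = {"washington", "lincoln", "nixon", "obama"}
all_losers = {"nixon", "romney", "kerry", "dukakis"}

# Nixon both won and lost, so no pattern can match every winner and no loser.
# Removing the winners from the losers resolves the contradiction.
losers = all_losers - winners


def verify(pattern, winners, losers):
    """Return True if pattern matches every winner and none of the losers."""
    matches_all_winners = all(re.search(pattern, name) for name in winners)
    matches_no_loser = not any(re.search(pattern, name) for name in losers)
    return matches_all_winners and matches_no_loser


pattern = "n$|ob"  # hand-written for this toy data only

print(verify(pattern, winners, all_losers))  # Nixon is still among the losers
print(verify(pattern, winners, losers))      # after the subtraction
```

With Nixon left in the loser set the check has to fail, because the same name is required both to match and not to match. After the subtraction a pattern that separates the two sets can exist.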
-
IPython uses markdown
For those doing scientific programming, the IPython notebook is a joyful place for interactive exploration and can be appropriate for document creation. Notebook cells can have code, images, or text, and text can mix Markdown and LaTeX (rendered in the cell via MathJax). Notebooks can be converted to HTML or PDF (via LaTeX), using the nbconvert utility (which depends on pandoc). For serious document production, this is not even remotely a replacement for LaTeX, but it can be a great place for interactive work.
-
Re:FORTRAN
Seriously consider FORTRAN
On the other hand, the basic underpinnings of SciPy are FORTRAN. The BLAS and LAPACK libraries, and other fast and well-understood FORTRAN libraries, are "wrapped" by SciPy.
http://www.scipy.org/scipylib/faq.html#id12
Using the IPython notebook, you can work with data sets in an interactive way that FORTRAN won't do. But the number crunching is being done for you at FORTRAN speed because it is compiled FORTRAN code that is doing the work.
-
Python, numpy, Pyvot
Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot to access the data, and then numpy/scipy/pandas to do whatever processing you need. This should give you a significant perf boost without the need to significantly rearchitect everything or change your workflow much.
In addition, using Python this way gives you the ability to use IPython to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.
If you want an IDE that can connect all these together, try Python Tools for Visual Studio. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.
(Full disclosure: I am a developer on the PTVS team)
-
IPython Notebook + Python Data Analysis Library
Install these 2 and you'll be good to go
http://ipython.org/notebook.html
http://pandas.pydata.org/
-
Pandas + IPython Notebook
It's not exactly a spreadsheet, but Pandas is totally awesome and is useful for many tasks for which you might think of using a spreadsheet.
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
http://pandas.pydata.org/index.html
IPython Notebook is sort of like a combination of the normal ipython shell and an IDE. You interact via your browser but it connects to a normal python process on your local (or remote?) system.
http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html
I've used these tools together for many tasks for which I might otherwise have used a spreadsheet, particularly for "pivot tables" and time series analysis. Again, even combined they do not a spreadsheet make, but they are in many ways superior. They can handle very large data sets, and best of all you are doing it all in Python.
-
Re:R; apt-get install r-base
Matlab is exceptional, but of course has some cost.
Everything you can do in Matlab, you can do better in python. I work in a mathematics department at the moment, so I do know the subject somewhat. If you need someone's package that exists only for Matlab, then porting it to python is not that hard, as the syntax can be easily translated to numpy/scipy. For high school students: ipython notebook: http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html
For pure mathematics, preference goes to Sage in many circles. However, having been at a conference and seen the new features in ipython notebook, and knowing they just received a 1.15 million grant from the Sloan Foundation, I think it has a bright future.
Nevertheless, the relativity department here uses maple for their theoretical work. Most engineers do too, but those always try to solve problems first by throwing money at them :-)
From a 'get your work done fast' and an engineering point of view, maple/mathematica/matlab are great, of course. From a 'control your own work', 'know what you are doing', and 'build for the future' point of view, choosing a python solution (ipython, matplotlib, sage, ...) is a good bet.
-
Re:Why do people ask questions like these?
If you're looking to learn something new and general purpose, Python has a combination of decent docs (you can start with http://www.python.org/doc/ , http://pleac.sourceforge.net/pleac_python/ , and http://www.lightbird.net/py-by-example/ ), good libraries (see http://pypi.python.org/pypi and https://github.com/languages/Python/most_watched ) and all-around flexibility (all the regular system stuff, lots of microframeworks for web, scientific computing tools, 2d+3d graphics).
You may want to take a look at IPython ( http://ipython.org/ ), Reinteract ( http://fishsoup.net/software/reinteract/ ), and DreamPie ( http://dreampie.sourceforge.net/ ) for some interactive shells/interpreters to play around with. I use vim for programming, but there are a number of IDEs. Of the ones I've tried, I thought IEP offered the most interesting tools: http://code.google.com/p/iep/
Probably the fastest/easiest way to learn (and learn if you like) Python is to go through Zed Shaw's book/exercises: http://learnpythonthehardway.org/
There's a lot of other stuff on the Python wiki: http://wiki.python.org/moin/BeginnersGuide/Programmers
Slashdot definitely isn't what it used to be. For programming questions you may want to look at Stack Overflow or Quora. For general nerdly news, I find Hacker News, Techmeme, and The Verge tend to cover my bases better these days.