Slashdot Mirror


Why Jupyter is Data Scientists' Computational Notebook of Choice (nature.com)

Jeffrey M. Perkel, writing for Nature: Perched atop the Cerro Pachon ridge in the Chilean Andes is a building site that will eventually become the Large Synoptic Survey Telescope (LSST). When it comes online in 2022, the telescope will generate terabytes of data each night as it surveys the southern skies automatically. And to crunch those data, astronomers will use a familiar and increasingly popular tool: the Jupyter notebook. Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document. Computational notebooks have been around for decades, but Jupyter in particular has exploded in popularity over the past couple of years. This rapid uptake has been aided by an enthusiastic community of user-developers and a redesigned architecture that allows the notebook to speak dozens of programming languages -- a fact reflected in its name, which was inspired, according to co-founder Fernando Perez, by the programming languages Julia (Ju), Python (Py) and R.

[...] For data scientists, Jupyter has emerged as a de facto standard, says Lorena Barba, a mechanical and aeronautical engineer at George Washington University in Washington DC. Mario Juric, an astronomer at the University of Washington in Seattle who coordinates the LSST's data-management team, says: "I've never seen any migration this fast. It's just amazing." Computational notebooks are essentially laboratory notebooks for scientific computing. Instead of pasting, say, DNA gels alongside lab protocols, researchers embed code, data and text to document their computational methods. The result, says Jupyter co-creator Brian Granger at California Polytechnic State University in San Luis Obispo, is a "computational narrative" -- a document that allows researchers to supplement their code and data with analysis, hypotheses and conjecture. For data scientists, that format can drive exploration.

7 of 58 comments

  1. Wasn't this recently discussed? by Anonymous Coward · · Score: 5, Interesting
  2. Huge Notebook fan. by 0100010001010011 · · Score: 4, Interesting

    Anecdotal, but I do 90% of my Python 'development' in Jupyter Notebooks.

    For work I can make a nice notebook and have it generate a PDF for archiving. It'll output to LaTeX, html, .py and a number of other formats.

    You can now include multiple languages in the same notebook, including R and Matlab, both popular in their own niches.
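
    As a rough illustration of what the .py export mentioned above amounts to, here is a minimal sketch that pulls the code cells out of a notebook file. It relies only on the fact that an .ipynb file is JSON with a top-level "cells" list; the notebook content and the notebook_to_script helper are hypothetical, and this is not how Jupyter's own nbconvert tool is implemented.

```python
import json

# A tiny notebook in the nbformat v4 JSON layout, built in memory.
# The cell contents are made up for illustration.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load data\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 2\n", "y = x + 3"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "outputs": [], "source": "print(y)"},
    ],
})


def notebook_to_script(raw):
    """Hypothetical helper: return a notebook's cells as one Python source string."""
    nb = json.loads(raw)
    chunks = []
    for cell in nb["cells"]:
        source = cell["source"]
        # "source" may be a single string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            chunks.append(text)
        elif cell["cell_type"] == "markdown":
            # Keep the prose as comments so the narrative survives the export.
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


script = notebook_to_script(notebook_json)
print(script)
```

    Because the result is plain text, it also diffs cleanly, unlike the raw JSON of the notebook itself.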

    1. Re:Huge Notebook fan. by Lab Rat Jason · · Score: 5, Interesting

      I've been using Jupyter for the past year, and although I like it, some features really bother me and worry me when it comes time to create reproducible science.

      1) Jupyter doesn't integrate automatically with any kind of source control software, and in the circles I run in, source control is largely ignored. Data scientists act like they've never heard of it, and what makes it worse, my local university is pumping out student after student who are introduced to data science with Jupyter but never hear about coding standards and recoverability.

      2) Jupyter allows you to execute cells out of order. While this definitely helps speed up development (when you make a mistake, you can just fix the relevant line and continue rather than reloading your entire data set), it presents a unique risk: someone thinks they've discovered something amazing, only to be unable to reproduce it after a restart or when sharing the notebook with someone else. This can happen when race conditions exist, or when code makes changes to the database and your out-of-order execution causes spooky behavior.

      3) Jupyter doesn't encourage enterprise deployment. Too often I see experimental data science done well, but due to the nature of rapid development, nothing is modular and nothing is object-oriented. If the solution was a one-off answer, everything is great, but if it is to be made into proper enterprise-ready code, the entire notebook must be transcribed into truly disciplined code. (As an aside, this process is massively difficult because data scientists often don't understand the principles of object-oriented programming, and the programmer doesn't understand the principles specific to the data science objective the code was written to solve.)
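
      The out-of-order hazard in point 2 can be shown with a small sketch. It is not Jupyter itself: the three "cells" are hypothetical code strings executed against one shared namespace, which is an assumption standing in for a notebook kernel. Re-running a cell that mutates shared state gives a result that a clean top-to-bottom run cannot reproduce.

```python
# Hypothetical notebook with three "cells", modelled as code strings that
# share one namespace, the way notebook cells share one kernel.
cells = {
    1: "values = [1, 2, 3, 4]",
    2: "values = [v * 10 for v in values]",  # rewrites shared state
    3: "total = sum(values)",
}


def run(order):
    """Execute the cells in the given order against a fresh namespace."""
    namespace = {}
    for cell_id in order:
        exec(cells[cell_id], namespace)
    return namespace["total"]


top_to_bottom = run([1, 2, 3])   # what a restart followed by "run all" gives
interactive = run([1, 2, 2, 3])  # cell 2 re-run by hand during exploration

print(top_to_bottom, interactive)  # prints: 100 1000
```

      The saved notebook looks identical in both cases, which is why the difference only shows up after a restart or on someone else's machine.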

      I expect to use Jupyter a lot more frequently in the coming years, but I fear it will feel like a huge step back in terms of the things that computer scientists have solved, that data scientists are ignoring.

      --
      Which has more power: the hammer, or the anvil?
  3. Because Jupyter aligned with Mars? by jfdavis668 · · Score: 3, Funny

    Then we will analyze what guides the planets and understand what steers the stars.

  4. A rare sort of development in the software world.. by Junta · · Score: 3, Interesting

    Jupyter is something that is relatively unique, useful in its field, and *not* crammed down the throats of people for whom it isn't really relevant.

    I applaud the way that project is executed, adopted, and evangelized as being on point and solidly executed...

    --
    XML is like violence. If it doesn't solve the problem, use more.
  5. Literate programming by jma05 · · Score: 3, Interesting

    For decades, we talked about Knuth's literate programming. Jupyter is finally an open source tool that made it usable for everyone.
    There is no better way to explain the use of a library than making a Jupyter notebook available.

    Most of my Python use lately is for one-off analytics with heavy libraries. Jupyter suits this workflow very well.
    IPython already has decent hooks for IDEs (PyCharm, Spyder), but I hope this gets even better.

  6. It's Matlab in python clothing by goombah99 · · Score: 3, Interesting

    I love Jupyter notebooks. But it's Matlab. Well, a broken, inferior Matlab. I do like Python syntax better than Matlab's, but that's just sugar.

    The upcoming JupyterLab is a slavish copy of the Matlab IDE.

    It reminds me of how Linux desktop managers were always copying the last generation of Windows.

    I'm not complaining! I use Mint and it owes a lot to Windows too.

    Mint, however, is actually superior to Windows now.

    But look at something like StarOffice/LibreOffice. Ow... the pain. It's like a bad MS Office 5, except you can only use it if you have thumbtacks in your shoes. They copied everything that was bad just so it was the same.

    Jupyter is really nice and I use it in preference to Matlab because it's so portable and I can use other Python packages. But unless you've used Matlab you may not realize it's just a fast follower of ideas already tested out by Matlab.

    --
    Some drink at the fountain of knowledge. Others just gargle.