In [*]: F(mind)

Doctoral Researcher / Teacher Assistant @ uni.lu

Freelance / Data Scientist Mentor @ OpenClassrooms

I work on Android Security, Big Data and Machine Learning

Bridging the gap between Python and Jupyter

Jupyter is an amazing interactive environment that every developer should know about!

It should not be a surprise that Jupyter received an ACM Software System Award in 2018.

Or that Nature explained Why Jupyter is data scientists’ computational notebook of choice.

However, some users found that Jupyter does not encourage best coding practices.

How can we write good code with Jupyter and integrate it with the rest of the Python ecosystem?

In this article, I present a simple solution to generate and reload Python code from Jupyter notebooks.

I also explain the pros and cons of using this technique compared to other development environments.

Requirements

  • nbconvert: a utility that converts Jupyter notebooks (.ipynb) to other formats (.py, .pdf, .html ...)
  • jupyterlab: an editing environment for Jupyter notebooks with additional features (tabs, settings, ...)

You can install these requirements with the following command:

$ pip3 install --user nbconvert jupyterlab

I also provide a layer that integrates this configuration into an existing Python project.

$ cookiecutter https://git.fmind.me/fmind/cookiecutter-python-lab

Find out more about organizing Python projects with a layered approach in my previous article.

Configuration

The goal of this configuration is to generate a Python file from Jupyter whenever a notebook file is saved.
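To make concrete what "generating a Python file from a notebook" involves, here is a simplified, standard-library-only sketch. It is not how nbconvert is implemented: the function name `notebook_to_source` and the example notebook are hypothetical, and the sketch only assumes the nbformat 4 layout (a `cells` list whose items carry `cell_type` and `source`).

```python
def notebook_to_source(notebook: dict) -> str:
    """Build Python source from a parsed notebook (simplified illustration)."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Code cells are copied as-is.
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            # Prose is kept as comments so the result stays valid Python.
            commented = [f"# {line}".rstrip() for line in source.splitlines()]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook with hypothetical content.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Helpers\n", "Small utilities."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["def double(x):\n", "    return 2 * x"],
        },
    ],
}

generated = notebook_to_source(example)
print(generated)
```

The real conversion is left to nbconvert, which also deals with details this sketch ignores, such as IPython magics inside code cells.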

For that purpose, we create a post-save hook that calls nbconvert on the notebook (if its extension is .py.ipynb).

The hook automatically replaces the extension .py.ipynb with .py and makes the Python file executable with chmod.

# In: jupyterlab.py

import os
import subprocess

from traitlets.config import get_config

c = get_config()


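def post_save_hook(model, os_path, contents_manager):
    """Convert notebooks named *.py.ipynb to an executable Python file on save."""
    cwd, name = os.path.split(os_path)

    # Only handle notebooks whose file name ends with the .py.ipynb extension.
    if model["type"] == "notebook" and name.endswith(".py.ipynb"):
        # Drop the trailing .ipynb and lowercase: "Module.py.ipynb" -> "module.py"
        output = name[: -len(".ipynb")].lower()

        subprocess.check_call(
            ["jupyter", "nbconvert", "--to", "python", "--output", output, name],
            cwd=cwd,
        )
        # Make the generated file executable for the current user.
        subprocess.check_call(["chmod", "u+x", output], cwd=cwd)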


c.FileContentsManager.post_save_hook = post_save_hook
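The naming rule used by the hook can be checked in isolation. The helper below is a small sketch that follows the rule as described in the prose (only names ending in .py.ipynb are converted, and the output name is lowercased); `generated_name` and the example file names are hypothetical and not part of the configuration itself.

```python
from typing import Optional

NOTEBOOK_SUFFIX = ".py.ipynb"


def generated_name(notebook_name: str) -> Optional[str]:
    """Return the Python file name for a notebook, or None if it is ignored."""
    if not notebook_name.endswith(NOTEBOOK_SUFFIX):
        return None
    # Strip only the trailing ".ipynb" and lowercase the result.
    return notebook_name[: -len(".ipynb")].lower()


print(generated_name("Analysis.py.ipynb"))
print(generated_name("notes.ipynb"))
```

Regular notebooks (plain .ipynb) are left untouched, so you can mix exploratory notebooks and "module" notebooks in the same project.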

To load this configuration with Jupyterlab, use the following command:

$ jupyter lab --config=jupyterlab.py

Autoreload

Now that we can generate Python files from notebooks, we can import code from other notebooks.
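As a rough illustration, the sketch below imitates this situation without Jupyter: a temporary directory stands in for the project folder, and `lab_helpers.py` stands in for a file generated from a hypothetical `lab_helpers.py.ipynb` notebook. All names are assumptions made for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the file that the hook would generate next to the notebook.
workdir = Path(tempfile.mkdtemp())
(workdir / "lab_helpers.py").write_text(
    "def greet(name):\n    return f'Hello, {name}!'\n"
)

# Make the folder importable, as the notebook's own directory would be.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import lab_helpers  # a regular import, exactly as for a hand-written module

message = lab_helpers.greet("Jupyter")
print(message)
```

From the importing notebook's point of view, nothing distinguishes the generated file from any other Python module.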

However, code imported this way is not automatically reloaded when its content changes.

To illustrate the problem, here is a small demonstration using IPython on a simple Python module:

With autoreload=0
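The same behavior can be reproduced in plain Python. In this sketch, the module name `stale_demo` and its content are hypothetical, and rewriting the file on disk stands in for saving the notebook again. Bytecode writing is disabled only to keep the demonstration independent of cache timestamps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo free of cached bytecode

workdir = Path(tempfile.mkdtemp())
module_path = workdir / "stale_demo.py"
module_path.write_text("def answer():\n    return 1\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import stale_demo

before = stale_demo.answer()

# Simulate a new save of the notebook: the generated file changes on disk.
module_path.write_text("def answer():\n    return 2000\n")

import stale_demo  # no effect: the module is already cached in sys.modules

stale = stale_demo.answer()  # still the old definition

importlib.reload(stale_demo)  # explicitly re-execute the source file
fresh = stale_demo.answer()  # the new definition

print(before, stale, fresh)
```

Without autoreload, every change requires such an explicit `importlib.reload` call (or a kernel restart), which quickly becomes tedious in an interactive session.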

To automatically reload Python modules, you must create a configuration file that enables one of the IPython extensions:

# In: ~/.ipython/profile_default/ipython_config.py

c.InteractiveShellApp.extensions = ["autoreload"]
c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
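If you prefer not to change the global profile, the extension can also be enabled for a single session. The sketch below assumes IPython is installed; `enable_autoreload` is a hypothetical helper name, and the function simply does nothing when it is not running inside an IPython or Jupyter session.

```python
def enable_autoreload(mode: str = "2") -> bool:
    """Enable the autoreload extension in the current IPython session, if any."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed

    shell = get_ipython()
    if shell is None:
        return False  # plain Python interpreter, no interactive shell

    # Equivalent to typing "%load_ext autoreload" then "%autoreload 2" in a cell.
    shell.run_line_magic("load_ext", "autoreload")
    shell.run_line_magic("autoreload", mode)
    return True


enabled = enable_autoreload()
print(enabled)
```

In a notebook, the two magics mentioned in the comment can of course be typed directly in the first cell.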

We can observe that the module is now reloaded as expected in IPython (and by extension in Jupyterlab):

With autoreload=2

Pros/Cons

Pros:

  1. You benefit from the main features of the Jupyter environment: interactivity and simplicity.
  2. The generated Python files can be tested, linted, formatted and packaged with regular Python tools.
  3. This style encourages literate programming to explain code with rich formatting.

Cons:

  1. Jupyterlab (still) lacks some modern features compared to other development environments.
  2. You cannot edit the generated Python files directly, as they will be overwritten when you save a notebook.
  3. Developing in notebooks is not common and may reduce engagement with your project.

Conclusion

If you love the Jupyter development workflow and wish it were more pythonic, then I hope this approach will help.

Interestingly, the Jupytext project chose an alternative approach where notebooks are edited as plain text files.

You can find an implementation of this technique in gampy, an exploratory project about pipeline composition.