
Gold Standards for Jupyter Notebooks

Best practices for clean, concise notebooks

Rebecca Vickery
5 min read · Sep 22, 2024
Photo by Alexander Grey on Unsplash

Jupyter notebooks are a popular tool for data scientists. The ability to display results and visualisations in line with the code you are writing and add rich annotation to your work makes notebooks ideal for the exploratory and research phases of data science projects.

However, the informal nature of Jupyter notebooks can lead to bad practices. This may be fine if you are a lone data scientist working in isolation (or maybe not, if you hope to revisit your code in 6 months!). But if you work in a team of data scientists and hope to collaborate on projects, or go on holiday sometime, it is a good idea to develop a set of best practices.

Best practices should not add unnecessary work for a team, but they should introduce standards that make it easy for team members to collaborate on projects. In my opinion, having a clear set of guidelines to follow is vital; however, I also believe these should be kept to the minimal set of rules you can get away with. A team is much more likely to follow a short, easy-to-remember set of standards than a lengthy book of rules.



Written by Rebecca Vickery

Data Scientist | Writer | Speaker

Responses (5)


Notebooks were designed for sharing as well. By removing and reimporting all of the important code and queries, you're making it harder to share and you're introducing more points where your process can fail. Why not just create a helper functions…

Thank you for good advice.
The only odd thing I found was your calling "magic strings" what I'd refer to as "hardcoded strings".
The word "magic" already has a usual meaning in the context of notebooks: it's used to describe those metacommands that are prefixed with a % sign, like for instance
%matplotlib inline

In the following article, I’ll cover the minimal set of standards I’ve developed over many years of writing code for data science in notebooks

I truly believe the best practice with Jupyter notebooks is using them for basic EDA or demonstrative purposes. Other than that, they become a mess and a source of problems.