8

When is it useful to solve an optimization problem inside a Jupyter Notebook? When is it not?

What are the advantages/disadvantages?

Geoffrey De Smet
  • 4,851
  • 10
  • 34

3 Answers

12

TL;DR: Jupyter Notebooks are a nice tool for proofs of concept (the early stages of a project), for playing around, and for teaching purposes (tutorials). They should not be used for industrial production. Another way of saying this is that they are good for calling code (I like the term "interactive whitepaper" used by @Reinderien), not for writing it.

Jupyter Notebooks let you keep your inputs (data), your model, your optimization algorithm, and your outputs in one and the same environment. This makes the whole modeling + solving process relatively smooth (as long as none of these elements are too big). It is nice to be able to see your raw data before optimization (e.g. a network), followed by the optimized solution (e.g. optimized vehicle routes over the network), without having to change tools/environments. With all the existing libraries, it is easy to plot the structured data with your favorite (or the most relevant) chart, with very little code. This is useful for comparing what-if scenarios, changing assumptions, playing around with your model's parameters, and visualizing the impact. Jupyter Notebooks are thus a good tool to play around with, to tell a story, or for a proof of concept. Last but not least, notebooks are a nice tool for tutorials.
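To make that data → model → solve → inspect flow concrete, here is a minimal sketch of the kind of cell-by-cell workflow described above. It is a toy production-planning problem with a hypothetical brute-force solver; in a real notebook you would call an actual optimization library (PuLP, OR-Tools, Timefold, ...) and plot the result in the next cell:

```python
# "Input data" cell: profit per unit for two products, resource limits.
profit = (3, 5)
limits = {"plant1": 4, "plant2": 12, "plant3": 18}

def feasible(x, y):
    """Check the three toy resource constraints."""
    return (x <= limits["plant1"]
            and 2 * y <= limits["plant2"]
            and 3 * x + 2 * y <= limits["plant3"])

# "Solve" cell: exhaustive search over a small integer grid
# (a stand-in for calling a real solver).
best = max(
    ((x, y) for x in range(5) for y in range(7) if feasible(x, y)),
    key=lambda p: profit[0] * p[0] + profit[1] * p[1],
)

# "Output" cell: in a notebook you would now plot or tabulate the result.
print(best)  # (2, 6): the optimal plan for this toy instance, profit 36
```

Each comment marks what would be a separate cell, which is exactly the "inputs, model, solver, outputs in one environment" point.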

Jupyter Notebooks are not suitable for production. Maintenance, testing, etc. should be done in some other environment. If the code is too dense, the notebook becomes impractical, hard to read, and perhaps slow to run. Although object-oriented programming can be done in a notebook, it is not the environment for it. The notebook is more of an interactive place from which you call code that lives somewhere else (e.g. a library, or the core of your code).
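A minimal sketch of that "notebook calls the core" split, with hypothetical names (in a real project the function would live in an installable module, e.g. `mycompany/solver.py`, with its own tests and version control):

```python
# Core code -- in a real project this lives in a module, not the notebook.
def solve_routes(stops: list[str]) -> list[str]:
    """Stand-in for a real routing algorithm: just orders the stops."""
    return sorted(stops)

# The notebook cell then stays a thin, interactive call:
route = solve_routes(["depot", "b", "a"])
print(route)  # ['a', 'b', 'depot']
```

The notebook keeps the interactivity and visualization; the logic stays testable and maintainable elsewhere.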

Kuifje
  • 13,324
  • 1
  • 23
  • 56
  • Thanks, these are good insights. For a POC, do you always use a Jupyter Notebook? Or occasionally? When (and how often) does it make sense to start a project immediately in a non-notebook environment? – Geoffrey De Smet Jan 03 '24 at 12:15
  • 1
    I like to use notebooks if some kind of visualization is involved, so yes, almost every time. I also like the fact that in the end you have a standalone project which you can save on GitHub or somewhere else. – Kuifje Jan 03 '24 at 12:28
  • 1
    Your second question is hard to answer. It might depend on whether you are working alone or for a company, the time frame, the budget, the milestones, etc. – Kuifje Jan 03 '24 at 12:29
  • Thanks, so the big win is the fast-to-visualize gain. – Geoffrey De Smet Jan 03 '24 at 15:34
  • 1
    Yes, visualization is a good tool for high level consistency checks, which are important for POCs. – Kuifje Jan 03 '24 at 16:14
  • 1
    Notebooks can also be useful in a research environment, when you want to be able to "play around" but also want to keep written records of everything you've played with, including keeping written records of things that did not work. – Stef Jan 03 '24 at 23:36
  • 1
    @Stef I agree ! The fact that you can write in markdown (and thus also LaTeX) is great. – Kuifje Jan 04 '24 at 07:58
  • 2
    Yes, and the combination of markdown, latex and python figures makes it suited for online teaching too: What tool to use for the online analogue of "writing lecture notes on a blackboard"? – Stef Jan 04 '24 at 08:03
7

The only use case where I find notebooks to be suitable is as an "interactive whitepaper" where explanatory formatted prose can be nicely interspersed with graphical content, equations, etc., and, if you're lucky, easily rendered to HTML. If you intend to demonstrate a concept in academia, or as part of presenting a commercial design investigation, etc., then notebooks can be suitable.

In all other cases - even (dare I say especially) for prototyping - stay far, far away. Typical in notebooks - though not strictly necessary - is a vast swamp of global state with no functions, no modules, and no scope. Surprise side effects, symbol conflicts, scope pollution, and poor memory management abound. Languages like Python offer functions and classes for a reason. Flattening out a script makes design and debugging a nightmare, even at the very beginning of a project. If I had to paint with a stereotypical brush the worst academic code I've ever seen, it's a notebook-style single script with no scope, no functions, no tests, and a list of global variables as long as my arm.
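A hypothetical illustration of that "global soup" failure mode, condensed into one script, contrasted with the scoped alternative:

```python
# Notebook-style flat script: every name is global and easy to clobber.
rate = 0.25
total = 100 * (1 + rate)   # "cell 1"
rate = 0.75                # "cell 7", run much later, silently changes meaning
# re-running "cell 1" now gives a different answer, with no visible diff

# Scoped alternative: inputs are explicit, results are reproducible.
def grow(principal: float, rate: float) -> float:
    """Return principal after one period of growth at the given rate."""
    return principal * (1 + rate)

print(grow(100, 0.25))  # 125.0, regardless of what ran before
```

The function version is trivially testable and immune to whatever state the rest of the session has accumulated.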

Reinderien
  • 450
  • 1
  • 7
  • Is the low quality code an effect of the tool (the notebook) or the way people use it? Does it often contain dirty code because it allows dirty code or because it stimulates dirty code? – Geoffrey De Smet Jan 04 '24 at 10:21
  • 1
    I can see the benefits of POCing in Jupyter Notebooks, like writing the rendering code directly in Python, raw speed of development, etc. – Geoffrey De Smet Jan 04 '24 at 10:24
  • 1
    I can also see the downsides of POCing in Jupyter Notebooks, like the lack of unit testing and versioning. It works on my machine, today. What about your machine? What about a year from now? But do POCs care about that? – Geoffrey De Smet Jan 04 '24 at 10:27
  • @GeoffreyDeSmet writing the rendering code directly in Python doesn't need a notebook. Just write a function that makes the appropriate matplotlib calls. As for raw speed of development, speed needs to be considered in aggregate - what's faster - writing a bunch of bad code really fast and then needing to re-write it all, or writing maintainable code off the hop? – Reinderien Jan 04 '24 at 14:10
  • 1
    As for Is the low quality code an effect of the tool (the notebook) or the way people use it?, it's somewhat chicken-or-egg, but certainly both are true: notebooks encourage bad code; bad code is part of the "usage culture" of Jupyter. – Reinderien Jan 04 '24 at 14:12
  • I just discovered Marimo ( https://github.com/marimo-team/marimo ), which claims to solve some of the issues you mentioned. I haven't played with it myself yet. – Geoffrey De Smet Jan 14 '24 at 19:06
4

I want to expand on the point made by @Kuifje, based on your comments on their answer. I know your library (OptaPlanner, now Timefold) from back when I was working with jsprit a lot.

Notebooks may essentially be forced on individuals in environments like Databricks. In that case, I have no choice but to work with some kind of notebook, which is essentially Jupyter. However, I hate them. As to when I would start a project in Python "proper" (making it an installable package): immediately, if possible.

They work for POCs to some extent, but the underlying IPython kernel keeps previous values in RAM, which can lead to really confusing bugs if variables get renamed or transformed between cells. The continuity of the code is terrible to follow during development, and just laborious.
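A hypothetical illustration of that stale-state bug, simulated in one script (in a live kernel, the two assignments would be successive versions of the same edited cell):

```python
# "Cell 1", first version: defines a name the rest of the notebook uses.
customers = ["a", "b"]

# Later, "cell 1" is edited to use a new name and re-run. In a live
# kernel the OLD name is never deleted -- it lingers in RAM:
clients = ["a", "b", "c"]

# "Cell 2" still references the old name and appears to work -- against
# stale data. Only after "Restart Kernel & Run All" would it fail with
# a NameError and expose the bug.
print(customers)  # ['a', 'b'] -- silently out of date
```

This is why "it ran in my notebook" is not evidence that the notebook runs top to bottom.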

When I wanted to draw on jsprit, I wrapped the whole thing in Spring Boot and made my own self-hosted RESTful API with it.

Such things are not necessarily possible in corporate setups that use tools like Databricks, but I'd take the API any day if I could, and just run the systems myself.

Geoffrey De Smet
  • 4,851
  • 10
  • 34
roganjosh
  • 151
  • 3
  • 1
    I do not know whether that is what you are talking about, but if you encounter a bug due to cells having been executed in the wrong order, then Jupyter Notebook has a "restart kernel" button which should solve the issue. – Stef Jan 03 '24 at 23:34
  • 1
    I understand that I can restart the kernel, but when you're trying to write something for production, it's not possible to get reliable behaviour that is testable or integrated, so it really only goes as far as EDA. – roganjosh Jan 04 '24 at 07:31
  • 1
    OptaPlanner is now Timefold, so I adjusted that in the answer. I hope that's fine. – Geoffrey De Smet Jan 04 '24 at 10:15
  • I agree; the "restart kernel" button (not just "rerun the whole notebook") has often been useful. – Geoffrey De Smet Jan 04 '24 at 10:19