When is it useful to solve an optimization problem inside a Jupyter Notebook? When is it not?
What are the advantages/disadvantages?
TL;DR: Jupyter Notebooks are a nice tool for proofs of concept (early stages of a project), playing around, and teaching purposes (tutorials). They should not be used for industrial production. Another way of saying this is that they are good for calling code (I like the term "interactive whitepaper" used by @Reinderien), not for writing it.
Jupyter Notebooks allow having your inputs (data), your model, your optimization algorithm, and your outputs in the same ecosystem. This makes the whole modeling + solving process relatively smooth (if none of these elements is too big). It is nice to be able to see your raw data before optimization (e.g. a network), followed by the optimized solution (e.g. optimized vehicle routes over the network), without having to change tools or environments. With all the existing libraries, it is easy to plot the structured data with your favorite (or the most relevant) chart, with very little code. This is useful for comparing what-if scenarios, changing assumptions, playing around with your model's parameters, and visualizing the impact. Jupyter Notebooks are thus a good tool to play around with, to tell a story, or for a proof of concept. Last but not least, notebooks are a nice tool for tutorials.
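To sketch that inline data → model → solve → inspect workflow, here is a toy routing example you might type across a couple of cells (the distance matrix is hypothetical data, and brute force stands in for a real solver):

```python
from itertools import permutations

# Toy symmetric distance matrix for four stops (hypothetical data,
# standing in for something loaded and inspected in an earlier cell).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def tour_length(tour):
    """Total length of a closed tour starting and ending at stop 0."""
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(dist[a][b] for a, b in legs)

# Brute-force search over the three non-depot stops -- perfectly fine
# in a notebook cell for a proof of concept, hopeless at real scale.
best = min((list((0,) + p) for p in permutations([1, 2, 3])),
           key=tour_length)
print(best, tour_length(best))  # best tour and its length
```

In a notebook you would typically follow this cell with a plot of the tour, which is exactly the "see the raw network, then the optimized routes" loop described above.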
Jupyter Notebooks are not suitable for production. Maintenance, testing, etc. should be done in some other environment. If the code is too dense, the notebook becomes impractical, hard to read, and perhaps slow to run. Although programming in an object-oriented fashion can be done in a notebook, it is not the environment for it. The notebook is more of an interactive place where you call other code that lives somewhere else (e.g. a library, or the core of your codebase).
The only use case where I find notebooks suitable is as an "interactive whitepaper", where explanatory formatted prose can be nicely interspersed with graphical content, equations, etc., and, if you're lucky, easily rendered to HTML. If you intend to demonstrate a concept in academia, or as part of presenting a commercial design investigation, then notebooks can be suitable.
In all other cases, even (dare I say especially) for prototyping, stay far, far away. Typical in notebooks, though not strictly necessary, is a vast swamp of global state with no functions, no modules, and no scope. Surprise side effects, symbol conflicts, scope pollution, and poor memory management abound. Languages like Python offer functions and classes for a reason. Flattening out a script makes design and debugging a nightmare, even at the very beginning of a project. If I had to paint the worst academic code I've ever seen with a stereotypical brush, it's a notebook-style single script with no scope, no functions, no tests, and a list of global variables as long as my arm.
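The contrast is easy to show with a trivial (made-up) calculation. The commented-out version is the flat notebook style; the function is the same logic behind a scope boundary:

```python
# Flat notebook style: every intermediate name (rate, total, v) leaks
# into the global namespace and can be silently reused by a later cell.
#   rate = 0.05
#   total = 0
#   for v in values:
#       total += v * (1 + rate)
#
# The same logic behind a function boundary: inputs are explicit,
# nothing escapes into global state, and it is trivially unit-testable.
def grow(values, rate=0.05):
    """Apply a growth rate to each value and return the total."""
    return sum(v * (1 + rate) for v in values)

print(grow([100, 200], rate=0.25))  # 375.0
```

Nothing about a notebook prevents writing the second form, but the cell-by-cell workflow strongly encourages the first.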
matplotlib calls. As for raw speed of development, speed needs to be considered in aggregate: which is faster, writing a bunch of bad code really fast and then needing to rewrite it all, or writing maintainable code off the hop?
– Reinderien
Jan 04 '24 at 14:10
I want to expand on the point by @kuifje, based on your comments on their answer. I know your library (OptaPlanner, now Timefold) from back when I was working with jsprit a lot.
Notebooks might be basically forced on individuals in environments like Databricks. In that case, I don't have a choice but to work with some kind of notebook, which is essentially Jupyter. However, I hate them. As to when I would start a project in Python "proper" (making it an installable package): immediately, if possible.
They work for POCs to some extent, but there is an underlying IPython kernel that keeps previous values in RAM, which can lead to really confusing bugs if things get renamed or transformed between cells. The continuity of the code is terrible to follow during development, and just laborious.
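The renamed-between-cells failure mode can be simulated in plain Python. Here a shared dict stands in for the kernel's long-lived namespace (an illustration only, not the real IPython machinery; the cell contents are hypothetical):

```python
# A notebook kernel keeps one long-lived namespace across all cells.
# Simulated here with exec() into a shared dict.
ns = {}

exec("df_raw = [3, 1, 2]", ns)       # "cell 1": load some data
exec("df = sorted(df_raw)", ns)      # "cell 2": transform it

# Later you rename df_raw -> data in cell 1 and re-run only cell 1:
exec("data = [3, 1, 2]", ns)

# The stale name df_raw is still alive in the kernel, so cell 2 keeps
# "working" even though nothing in the visible code defines it anymore.
exec("df = sorted(df_raw)", ns)      # no NameError!
print("df_raw" in ns, ns["df"])
```

A fresh "Restart & Run All" would expose the bug immediately, which is exactly why the code only breaks for the next person who opens the notebook.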
When I wanted to draw on jsprit, I wrapped the whole thing in Spring Boot and made my own RESTful API with it self-hosted.
Such things are not necessarily possible in corporate setups that use things like Databricks, but I'd take the API any day if I could and just run the systems myself.