That's one interesting project. As someone who relies heavily on collaboration with people using Jupyter Notebook, the two most annoying obstacles to reproducing their work are the environment and the hidden state of the notebooks.
This does directly address the second problem, but it does so by sacrificing flexibility. I might need to change a cell just to test something new (without affecting the other cells), but that's an acceptable trade-off if the focus is reproducibility.
I know that requirements.txt is the standard solution to the first problem, but generating and using it is annoying. The command pip freeze lists all installed packages in a bloated way (there are better approaches), and I have always hoped to find a notebook system that would integrate this information natively and embed it into the notebook in a form I can share with other people. Unfortunately, I don't see support for anything like that in any of the available solutions (at least to my knowledge).
People always complain about pip and Python packaging, but it's never been an issue for me. I create a requirements.base.txt that pins the versions of the things I want installed. I then:
There are several problems with this approach. Notably, you don't capture platform-specific details, and you don't record how these packages were installed (conda, mamba, etc.).
It also doesn't account for dependency version conflicts, which make life very hard.
I don’t understand the platform thing, is that something to do with running on Windows? Why wouldn’t you just pip install? Why bring conda etc into the mix?
If you have conflicts then you have to reconcile those at the point of initial install - pip deals with that for you. I've never had a situation in 15 years of Python packaging where there wasn't a working combination of versions.
These are genuine questions btw. I see these common complaints and wonder how I’ve not ever had issues with it.
I will try to summarize the complaints (mine, at least) in a few simple points:
1- pip freeze will miss packages not installed by pip (e.g., ones installed by conda).
2- It includes all installed packages, even ones not used in the project.
3- It just dumps all packages, their dependencies, and their sub-dependencies. Even without conflicts, if you change a package it is very hard to keep track of which dependencies and sub-dependencies need to be removed. At some point, your file will be a hot mess.
4- If you install a platform-specific package version, that information is not tracked.
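Point 3 can be made concrete with a short sketch (standard library only; this is an assumption-laden heuristic, not a real tool): pip freeze output gives no way to tell direct dependencies from sub-dependencies, but the installed metadata does record who requires whom, so you can at least flag the packages that nothing else depends on.

```python
# Sketch: flag installed distributions that no other distribution requires --
# a rough proxy for "direct" dependencies, which `pip freeze` alone cannot
# distinguish. Standard library only (Python 3.8+).
import re
from importlib.metadata import distributions

def direct_candidates():
    installed, required = set(), set()
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        installed.add(name)
        for req in dist.requires or []:
            # Strip version specifiers/extras from strings like
            # "urllib3 (<3,>=1.21.1)" or "idna>=2.5; extra == 'socks'".
            required.add(re.split(r"[\s\[\](<>=!~;]", req, maxsplit=1)[0].lower())
    return sorted(installed - required)

if __name__ == "__main__":
    print(direct_candidates())
```

This is only a proxy: a package can be both a direct dependency and required by something else, and it would not show up here.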
1/4- Ordinary `pip install` works for binary/platform-specific wheels (e.g., numpy) too, and even for non-Python utilities (e.g., shellcheck-py).
2/3- You need to track only the direct dependencies _manually_, but for reproducible deployments you need fixed versions for all dependencies. The latter is easy to generate _automatically_ (`pip freeze`, pip-tools, pipenv/poetry, etc.).
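As a sketch of what that split looks like with pip-tools (package names and versions here are just illustrative), you hand-maintain a small input file and generate the full pin set from it:

```text
# requirements.in -- hand-maintained, direct dependencies only
requests>=2.28

# requirements.txt -- generated by `pip-compile requirements.in`
certifi==2023.7.22          # via requests
charset-normalizer==3.2.0   # via requests
idna==3.4                   # via requests
requests==2.31.0            # via -r requirements.in
urllib3==2.0.4              # via requests
```

The "via" comments are what keeps point 3 manageable: when you drop a direct dependency and recompile, its sub-dependencies disappear from the lockfile automatically.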
Ok. I think that's all handled by my workflow, but it does involve taking responsibility for the requirements files.
If I want to install something, I pip install it and then add the explicit version to the base file. I can then freeze the current state to requirements.txt to lock in all the sub-dependencies.
It's a bit manual (though you only need a couple of CLI commands), but it's simple and robust.
This is my workflow too. And it works fine. I think the disconnect here is that I grew up fighting dependencies when compiling other programs from source on Linux. I know how painful it can be and I’ve accepted the pain and when I came to python/venv I thought “This isn’t so bad!”
But if someone is coming from data science rather than dev-ops, then no matter how much we say "all you have to do...", the response will be: why do I have to do any of this?
I don't think that manually handling requirements.txt in a collaborative environment is a robust process; it will be a waste of time and resources to handle it like that. And whatever your workflow is, it is obviously not standard, and it does not address the first and fourth points.
Problems 1 and 2 can be solved by using a virtualenv/venv per project.
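For completeness, a minimal stdlib sketch of the per-project environment (`with_pip=False` only keeps this example fast and offline; in practice you would leave pip enabled):

```python
# Sketch: create one isolated environment per project using the stdlib
# venv module -- equivalent to `python -m venv .venv` on the command line.
import pathlib
import venv

venv.create(".venv", with_pip=False)  # with_pip=False just to keep this offline
print(pathlib.Path(".venv", "pyvenv.cfg").exists())  # True if creation worked
```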
3 is solved by the workflow of manually adding requirements and not including transitive dependencies. It may not work for everyone; something like pipreqs might work for many people.
I do not understand why 4 is such a problem. Can you explain further?
I follow a similar approach -- top-level dependencies in pyproject.toml and then a pip freeze to get a reproducible set for applications. I know there are edge cases but this has worked really well for me for a decade without much churn in my process (other than migrating from setup.py to setup.cfg to pyproject.toml).
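A minimal sketch of that layout (PEP 621 style; the project name and versions are illustrative): only the top-level dependencies live in pyproject.toml, and the frozen set is kept alongside it.

```toml
# pyproject.toml -- hand-maintained, top-level dependencies only
[project]
name = "example-app"        # illustrative
version = "0.1.0"
dependencies = [
    "requests>=2.28",
]
# The reproducible set then comes from: pip freeze > requirements.txt
```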
After trying to migrate everything to pipenv and then getting burned, I went back to this and can't imagine I'll use another third-party packaging project (other than nix) for the foreseeable future.
The post you’re responding to said that there are many Python packaging options, not that they don’t work. Pip freeze works reasonably well for a lot of situations but that doesn’t necessarily mean it’s the best option for their notebook tool, especially if they want to attract users who are used to conda.
I regularly observe it stalling at the dependency resolution stage when changing the version requirements for one of the packages (or the Python version requirement).
The link redirect does not specify which point in the list you are referring to, but I guess it is "Install missing packages from...". If so, I wonder whether you mean supporting something like '!pip install numpy' as in Jupyter, or something else?
I don't think this is really a solution, and it raises the question: does it support running shell commands with '!' like Jupyter Notebook does?
Thanks. That is precisely what I was talking about in my comment. It would solve the problem if something like that were integrated natively. I understand that between pip, conda, mamba, and all the others it would be a hard problem to solve, but at least auto-generating requirements.txt would be easier. To be honest, though, the hard part is identifying the packages and where they came from, not what to do with that information. Good luck with the development.