That's one interesting project. As someone who relies heavily on collaboration with people using Jupyter Notebook, the two most annoying obstacles to reproducing their work are the environment and the hidden state of the notebooks.
This does directly address the second problem, but it does so by sacrificing flexibility. I might need to change a cell just to test something new (without affecting the other cells), but that's an acceptable trade-off if the focus is reproducibility.
I know that requirements.txt is the standard solution to the first problem, but generating and using it is annoying. The command pip freeze lists all installed packages in a bloated way (there are better approaches), and I have always hoped to find a notebook system that would integrate this information natively and embed it into the notebook in a form I can share with other people. Unfortunately, I don't see support for anything like that in any of the available solutions (at least to my knowledge).
People always complain about pip and Python packaging, but it's never been an issue for me. I create a requirements.base.txt that pins the versions of the things I want installed. I then:
There are several problems with this approach. Notably, you don't capture platform-specific details, and you don't record how these packages were installed (conda, mamba, etc.).
It also doesn't account for dependency version conflicts, which make life very hard.
I don’t understand the platform thing, is that something to do with running on Windows? Why wouldn’t you just pip install? Why bring conda etc into the mix?
If you have conflicts then you have to reconcile those at the point of initial install - pip deals with that for you. I've never had a situation in 15 years of Python packaging where there wasn't a working combination of versions.
These are genuine questions btw. I see these common complaints and wonder how I’ve not ever had issues with it.
I will try to summarize the complaints (mine, at least) in a few simple points:
1- pip freeze will miss packages not installed by pip (e.g., ones installed by conda).
2- It includes all installed packages, even ones not used in the project.
3- It just dumps all packages, their dependencies, and their sub-dependencies. Even without conflicts, if you change a package it is very hard to keep track of which dependencies and sub-dependencies need to be removed. At some point, your file will be a hot mess.
4- If you install a platform-specific package version, that information is not tracked.
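Point 3 can be made concrete with a short sketch (standard library only; this is an assumption-laden heuristic, not a real tool): pip freeze output gives no way to tell direct dependencies from sub-dependencies, but the installed metadata does record who requires whom, so you can at least flag the packages that nothing else depends on.

```python
# Sketch: flag installed distributions that no other distribution requires --
# a rough proxy for "direct" dependencies, which `pip freeze` alone cannot
# distinguish. Standard library only (Python 3.8+).
import re
from importlib.metadata import distributions

def direct_candidates():
    installed, required = set(), set()
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        installed.add(name)
        for req in dist.requires or []:
            # Strip version specifiers/extras from strings like
            # "urllib3 (<3,>=1.21.1)" or "idna>=2.5; extra == 'socks'".
            required.add(re.split(r"[\s\[\](<>=!~;]", req, maxsplit=1)[0].lower())
    return sorted(installed - required)

if __name__ == "__main__":
    print(direct_candidates())
```

This is only a proxy: a package can be both a direct dependency and required by something else, and it would not show up here.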
1/4- Ordinary `pip install` works for binary/platform-specific wheels (e.g., numpy) too, and even for non-Python utilities (e.g., shellcheck-py).
2/3- You need to track only the direct dependencies _manually_, but for reproducible deployments you need fixed versions for all dependencies. The latter is easy to generate _automatically_ (`pip freeze`, pip-tools, pipenv/poetry, etc.).
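As a sketch of what that split looks like with pip-tools (package names and versions here are just illustrative), you hand-maintain a small input file and generate the full pin set from it:

```text
# requirements.in -- hand-maintained, direct dependencies only
requests>=2.28

# requirements.txt -- generated by `pip-compile requirements.in`
certifi==2023.7.22          # via requests
charset-normalizer==3.2.0   # via requests
idna==3.4                   # via requests
requests==2.31.0            # via -r requirements.in
urllib3==2.0.4              # via requests
```

The "via" comments are what keeps point 3 manageable: when you drop a direct dependency and recompile, its sub-dependencies disappear from the lockfile automatically.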
Ok. I think that's all handled by my workflow, but it does involve taking responsibility for the requirements files.
If I want to install something, I pip install it and then add the explicit version to the base file. I can then freeze the current state to requirements.txt to lock in all the sub-dependencies.
It's a bit manual (though you only need a couple of CLI commands), but it's simple and robust.
This is my workflow too. And it works fine. I think the disconnect here is that I grew up fighting dependencies when compiling other programs from source on Linux. I know how painful it can be and I’ve accepted the pain and when I came to python/venv I thought “This isn’t so bad!”
But if someone is coming from data science rather than dev-ops, then no matter how much we say "all you have to do...", the response will be: why do I have to do any of this?
I don't think that manually handling requirements.txt in a collaborative environment is a robust process; it will be a waste of time and resources to handle it like that. And whatever your workflow is, it is obviously not standard, and it does not address the first and fourth points.
Problems 1 and 2 can be solved by using a virtualenv/venv per project.
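For completeness, a minimal stdlib sketch of the per-project environment (`with_pip=False` only keeps this example fast and offline; in practice you would leave pip enabled):

```python
# Sketch: create one isolated environment per project using the stdlib
# venv module -- equivalent to `python -m venv .venv` on the command line.
import pathlib
import venv

venv.create(".venv", with_pip=False)  # with_pip=False just to keep this offline
print(pathlib.Path(".venv", "pyvenv.cfg").exists())  # True if creation worked
```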
3 is solved by the workflow of manually adding requirements and not including transitive dependencies. It may not work for everyone; something like pipreqs might work for many people.
I do not understand why 4 is such a problem. Can you explain further?
I follow a similar approach -- top-level dependencies in pyproject.toml and then a pip freeze to get a reproducible set for applications. I know there are edge cases but this has worked really well for me for a decade without much churn in my process (other than migrating from setup.py to setup.cfg to pyproject.toml).
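A minimal sketch of that layout (PEP 621 style; the project name and versions are illustrative): only the top-level dependencies live in pyproject.toml, and the frozen set is kept alongside it.

```toml
# pyproject.toml -- hand-maintained, top-level dependencies only
[project]
name = "example-app"        # illustrative
version = "0.1.0"
dependencies = [
    "requests>=2.28",
]
# The reproducible set then comes from: pip freeze > requirements.txt
```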
After trying to migrate everything to pipenv and then getting burned, I went back to this and can't imagine I'll use another third-party packaging project (other than nix) for the foreseeable future.
The post you’re responding to said that there are many Python packaging options, not that they don’t work. Pip freeze works reasonably well for a lot of situations but that doesn’t necessarily mean it’s the best option for their notebook tool, especially if they want to attract users who are used to conda.
I regularly observe it stalling at the dependency resolution stage when changing the version requirements for one of the packages (or the Python version requirement).
The link redirect does not specify which point in the list you are referring to, but I guess it is "Install missing packages from...". If so, I wonder whether you mean supporting something like '!pip install numpy' as in Jupyter, or something else?
I don't think this is really a solution, and it raises the question: does it support running shell commands with '!' like Jupyter Notebook does?
Thanks. That is precisely what I was talking about in my comment. It would solve the problem if something like that were integrated natively. I understand that between pip, conda, mamba, and all the others it would be a hard problem to solve, but at least auto-generating requirements.txt would be easier. To be honest, though, the hard part is identifying the packages and where they came from, not what to do with that information. Good luck with the development.