One thing is python's dependency management is just insane in itself. When you then additionally have to have it install basically drivers for the gpu, compile loads of native code etc it becomes a hot mess. And half the ML things you want to try are academic experiments not really made for distribution, so they were made to work and that's it. So if you have some different computer setup, minor version mismatch of a dependency etc it will just break. Datasets or models excist on some url that only works for a month.
It's a shame, I think the field is one where reproduction of results should be really welcome and feasible.
Yeah, a docker image would be nice. Even a Dockerfile, even though that in itself may not guarantee reproducibility if you try to build it later. (And may have issues with gpu drivers etc). But at least it documents all assumptions about the setup.
That's the cool thing about publishing the Dockerfile and an image, one is an example that may or may not break, and the other is a functional snapshot of a working config at that point in time.
It's a shame, I think the field is one where reproduction of results should be really welcome and feasible.
Yeah, a docker image would be nice. Even a Dockerfile, even though that in itself may not guarantee reproducibility if you try to build it later. (And may have issues with gpu drivers etc). But at least it documents all assumptions about the setup.