
> Is Python so annoying to use that when a compelling non-Python solution appears, everyone will love it? Less hassle? Or did it take off for a different reason? Interested in hearing thoughts.

For me it's less about the language and more about the dependencies. When I want to run a Python ML program, I need to go through the hassle of figuring out what the minimum version is, which package manager/distribution I need to use, and what system libraries those dependencies need to function properly. If I want to build a package for my distribution, these problems are dialed up to 11 and make it difficult to integrate (especially when using Nix). On top of that, those dependencies typically hide the juicy details of the program I actually care about.

For something like C or C++? Usually the most complicated part is running `make` or `cmake` with `pkgconfig` somewhere in my path. Maybe install some missing system libraries if necessary.

I just don't want to install yet another hundred copies of dependencies in a virtualenv and hope it's set up correctly.



> For me it's less about the language and more about the dependencies. When I want to run a Python ML program, I need to go through the hassle of figuring out what the minimum version is, which package manager/distribution I need to use, and what system libraries those dependencies need to function properly.

This is exactly why I hate Python. They even have a pseudo-package.json-style dependencies file that you're supposedly able to just run "install" with, but it NEVER works. Not once have I downloaded someone's Python project from GitHub, installed its dependencies, and had it run smoothly and without issue.

The Python language itself may be great, I don't know, but I'm forever put off from learning or using it because clearly in all the years it's been around they have yet to figure out reproducibility of builds. And it's obviously possible! JavaScript manages to accomplish it just fine with npm and package.json. But for some reason Python and its community cannot figure it out.
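One building block that npm's lockfile and pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in requirements.txt) have in common is pinning each downloaded artifact to a content hash, so the same lock always means the same bytes. A minimal sketch of that check, assuming you already have the expected SHA-256 for a file; the helper names are hypothetical, not pip's own code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large artifacts (wheels, tarballs) are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise if the artifact differs from the hash recorded in the lock/requirements file."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path.name}: expected {expected_sha256}, got {actual}")
```

Failing loudly on a mismatch is the point: a silently different artifact is exactly the kind of "works on my machine" drift a lockfile is meant to prevent.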


Would the problem of "how can I run this cool ML project from GitHub" be solved if developers published their container images on Docker Hub? The only downside I see is the enormous image sizes.


"Here's a massive glob of mystery-meat state to run my massive glob of mystery-meat tensors".


"and by the way, inside the image I slightly modified one file of the tensors library (and this is undocumented) and this totally changes the output if the fix is not there"


They just need a Dockerfile that builds correctly. There's no need to make an image available; the Dockerfile should be able to build it consistently.


I use Poetry for dependency management in Python projects. It helps a lot, but it doesn't solve the issue of pip packages failing to install because you're missing system libraries. At least it specifies the Python version to use. With many open source ML repos I have to guess which Python version to use.

I'd really like to see more Docker images (images, not Dockerfiles that fail to build). Maybe Flatpak or Snap packages would do the trick, too.
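When a repo doesn't declare its Python version, a cheap mitigation is for the project to fail fast with a readable message instead of an obscure error deep inside a dependency. A minimal sketch, assuming a hypothetical minimum of 3.9; in a real project the number would mirror the declared constraint (e.g. `requires-python` in pyproject.toml, or the `python` constraint Poetry records).

```python
import sys
from typing import Optional, Sequence, Tuple

# Hypothetical minimum; should mirror the project's declared Python requirement.
MIN_PYTHON: Tuple[int, int] = (3, 9)


def check_python(min_version: Tuple[int, int] = MIN_PYTHON,
                 current: Optional[Sequence[int]] = None) -> None:
    """Exit with a readable message if the interpreter is older than min_version."""
    # Compare only (major, minor); `current` lets callers test without switching interpreters.
    found = tuple(sys.version_info[:2] if current is None else current[:2])
    if found < tuple(min_version):
        wanted = ".".join(map(str, min_version))
        got = ".".join(map(str, found))
        raise SystemExit(f"This project needs Python >= {wanted}, but this is Python {got}.")


if __name__ == "__main__":
    check_python()
    print("Python version OK")
```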


> Not once have I downloaded someone's Python project from GitHub, installed its dependencies, and had it run smoothly and without issue.

Same here. Either it can't resolve the dependencies or whatever, or there will almost certainly be some kind of showstopping runtime error (probably because of API changes or something). I avoid Python programs at all costs nowadays.


This.

`pip` is the package manager that almost works.

Python is the language that almost supports package distribution.

I'll keep using `apt` on vanilla Debian.


Strong agree. I'll willingly install a handful of dependencies from my distro's package manager, where the dependencies are battle-hardened Unixy tools and I can clearly see what they do and how they do it.

I'm not going to install thousands of dodgy-looking packages from pip, the only documentation for which is a Discord channel full of children exchanging 'dank memes'.

I like Python, but I simply do not trust the pip ecosystem at this point (same for npm, etc.).


> I'm not going to install thousands of dodgy-looking packages from pip, the only documentation for which is a Discord channel full of children exchanging 'dank memes'.

This made me laugh. It’s true, isn’t it? That’s really what we deal with day to day (for me in the JS world, the Create React App dependencies make my head spin).


> For something like C or C++? Usually the most complicated part is running `make` or `cmake` with `pkgconfig` somewhere in my path. Maybe install some missing system libraries if necessary.

I don't mean to detract from your main point re Python dependencies, but I rarely find this to be true of C. `make`-style build flows usually result in dependency-related compile errors, or may require a certain operating system. I notice this frequently with OSS or academic software that doesn't ship binaries.


Yep. I've given up on any C or C++ projects because I find they almost never work and waste hours of my time. Part of the issue might be the fact that I'm often using Windows or macOS, but I've had bad experiences on Linux also.


> `make` etc build flows usually result in dependency-related compile errors,

Which are displayed to me during the `./configure` step before the `make`, and usually require me to type "apt-get install [blah] [blah] [blah]", and to run configure again.


Not all configure scripts are created equal: a lot of them only tell you about missing dependencies one at a time.

I'm glad make, etc., works for you. But for me, neither C, C++, nor Python is particularly enjoyable dependency-wise.
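The "one missing dependency at a time" loop can be avoided by checking the whole list up front and reporting everything that's missing in one go. A minimal sketch using the standard library's `ctypes.util.find_library`, with a hypothetical requirement list; note it only finds runtime shared libraries, not the development headers (`-dev`/`-devel` packages) a compile step also needs.

```python
import ctypes.util
from typing import Callable, Iterable, List, Optional


def missing_libraries(
    names: Iterable[str],
    finder: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return every library the finder cannot locate, rather than stopping at the first."""
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    # Hypothetical requirement list. find_library takes the bare name,
    # e.g. "z" for libz, "ssl" for libssl.
    required = ["m", "z", "ssl"]
    missing = missing_libraries(required)
    print("missing:", ", ".join(missing) if missing else "none")
```

The `finder` parameter is there so the lookup can be swapped out (or faked in tests) without touching the reporting logic.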


So much this. As someone whose bread and butter is systems programming for things that run on end-user devices, every time I dig into a Python project I feel like I've been teleported into the darkest timeline, where everything is environment management hell.

Even the more complex and annoying dependency-management scenarios in native-land still feel positively idyllic in comparison to Python venvs.


When I initially started to learn Python (1.6), virtualenv was starting to be adopted, and since then things have hardly changed.

It also helps that even minor versions introduce breaking changes.

I doubt anyone really knows Python that well, unless they are on the core team.


It was fine, back in the early days (I started with 1.4-ish). I just downloaded the tarball, unpacked, configured, ran make, installed into /usr/local on my workstation, then downloaded and stuck any packages into site-packages. Numeric was sometimes tricky to compile right, but ye olde "configure && make && make install" worked fine.

Of course, that worked because 1) I was really only doing one project, not juggling multiple ones, 2) there weren't all that many dependencies (Numeric, plotting, etc.), and 3) I was already up to my eyeballs in the build system with SWIG and linking to the actual compute code, so I knew my way around the system.

But every now and then I just shake my fist at the clouds then mutter darkly about just installing the dang thing and maybe not taking on so many dependencies. :-)


I’m newish to python, I’ve only used it for machine learning projects and some web scraping. Could somebody elaborate on venv? I just started using it but now everyone in this thread is saying how much they hate it. Is there an alternative?


venv is fine. Remember that this is a self-selected sample of people. You’re going to bump into the flaws and gotchas but it’s a perfectly usable tool.


Uhm so I was professionally setting up ML distros and ML containers for cloud deployments. Venv is not fine, especially if you've seen how other langs do it.


Could you say more what you don’t like about venv compared to how other languages do it?


Just store your deps in project directory instead of using hidden fucking magic.


The hidden magic is adjusting env vars used by python, LD, etc. Adjusting paths only seems like magic until you understand what it’s doing.

I’ve done this with plenty of languages, including C/C++.


Yes, I know how it works, but try explaining that to someone learning Flask at a boot camp.


I mean I wouldn’t expect someone at a boot camp to understand this nor was this a topic of conversation, so /shrug?


The good news is they're working towards encouraging a standard .venv in project directory.

https://peps.python.org/pep-0704/


IMHO venvs are fine as an implementation detail, a building block for a slicker tool.

The annoyance with venvs is you have to create and activate them. In contrast for cargo (or stack or dotnet or yarn or pipenv or poetry), you just run the build tool in the project directory.

Another limitation of venv is that it doesn't solve the problem of pinning a version of Python, so you need another tool.
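venv doesn't choose or pin a Python version, but it does record which interpreter created the environment in its `pyvenv.cfg`. A small sketch that reads that file and compares it with the running interpreter; the helper names are hypothetical, and it assumes the stdlib `venv` layout (other tools may write a different key, such as `version_info`).

```python
import sys
from pathlib import Path
from typing import Dict


def read_pyvenv_cfg(text: str) -> Dict[str, str]:
    """Parse the simple `key = value` lines found in pyvenv.cfg."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def venv_matches_interpreter(venv_dir: Path) -> bool:
    """True if the venv's recorded major.minor matches the interpreter running this code."""
    cfg = read_pyvenv_cfg((venv_dir / "pyvenv.cfg").read_text())
    # Stdlib venv writes `version = X.Y.Z`; fall back to `version_info` for other tools.
    recorded = cfg.get("version") or cfg.get("version_info", "")
    return recorded.split(".")[:2] == [str(sys.version_info.major), str(sys.version_info.minor)]
```

Pinning which interpreter to create the venv *with* is still left to another tool, as the comment says; this only detects a mismatch after the fact.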


Well I just spent an hour to diagnose the build failure of llama.cpp due to it picking up wrong nvcc path.

Dependency problem still happens even with C/C++.


I had the same issue... It turned out it was because I used the Flatpak version of IntelliJ IDEA, and it had problems with paths. Running from a plain terminal worked fine.


The Flatpak version of IntelliJ isn't officially maintained by JetBrains. JetBrains only maintains the Snap for Linux.


it's fortunate that flatpak solves this problem of reproducible environments so effectively


This is also the reason I like when I see a project in C or C++. It's often a ./configure && make or something. Sometimes running a Python project even if dependencies install, there might be some mystery crash because package dependencies were not set correctly or something similar (I had a lot of trouble with AUTOMATIC1111 StableDiffusion UI when using some extensions that installed their own requirements that might be in conflict with the main project).

With a boring C project, if it compiles it probably works without hassle.

Feels validating that other people have these thoughts too and I'm not just some old fart.


I recently hit the "classic" case. I saw a CLI tool, written in Python, for an API I'd like to use. I tried it and found out it didn't work on my machine. I later found out it was a bug in a dependency of that tool. 100 lines of shell script later, I had the functionality I needed, and a codebase that was actually free of unexpected surprises. I know this is an extreme example, but as personal anecdotes go, Python has lost a lot of trust on my side. I also wonder how people can write 10k+ line codebases without static types, but that's just me...


It's not that Python is bad. It's that the people who want to just hack something together quickly go to Python, so any time you pick up some software written in Python, it's marred with all kinds of compatibility issues and bugs where you can't just run it.

The answer is "yeah, use this other software to make it work in an isolated way, because the whole ecosystem is actually broken", and that's somehow acceptable.


I think it's the opposite for me, but I'm also a fan of system-independent package managers, provided they support easy package configuration.

Otherwise you not only bind yourself to a system architecture and OS, you also bind yourself to a distribution.

I find that Automatic1111 plugins tend not to share dependencies and instead redownload them for their own use. That can make your HDD cry, because some of these are large models. Advantages and disadvantages, probably...

There are package managers for C and some are quite good. But for most projects you are quite dependent on the package manager of your distro to supply you a fitting foundation. Sometimes it is easy, but if there is a problem, I think handling C is far harder than python. And I write quite a bit of C while I can only perhaps read python code.

No code is completely platform-independent, especially a Stable Diffusion project, but Python is still more flexible than C by a long shot here.

Of course Llama is great. Time to get those LLMs on our devices for our personal dystopian AIs running amok.


> which package manager/distribution I need to use, and what system libraries those dependencies need to function properly

I don't understand why things are so complicated in Python+ML world.

Normally, when I have a Python project, I just pick the latest Python version, unless the documentation specifically tells me otherwise (like if it's still Python 2 or if 3.11 is not yet supported). If the project maintainer had some sense, there will be a requirements list with exact locked versions, so I run `pip install -r requirements.txt` (if there is a requirements.txt), `pipenv sync` (if there is a Pipfile), or `poetry install` (if there's a pyproject.toml). That's three commands to remember, and the only reason it isn't one is that pip (the one de-facto package manager) has its limitations, but the community hasn't really settled on a successor. Kinda like `make` vs automake vs `cmake` (vs `bazel` and other less common stuff; same with Python).
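What exact locked versions buy you can be checked directly: after an install, each `name==version` pin can be compared with what is actually installed, using the standard library's `importlib.metadata`. A minimal sketch with hypothetical helper names; it only understands plain `==` pins and simplifies name normalization, whereas pip, pipenv and poetry handle ranges, markers and much more themselves.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional


def parse_exact_pins(requirements_text: str) -> Dict[str, str]:
    """Collect `name==version` pins; ranges, options (-r, -e) and URLs are skipped in this sketch."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers, e.g. `; python_version < "3.10"`.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue
        name, _, rest = line.partition("==")
        fields = rest.split()            # tolerates trailing options such as --hash=...
        if fields:
            pins[name.split("[", 1)[0].strip().lower()] = fields[0]   # strip extras like [socks]
    return pins


def installed_version(name: str) -> Optional[str]:
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins: Dict[str, str],
                   lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Describe every pinned package that is missing or installed at a different version."""
    problems = []
    for name, wanted in sorted(pins.items()):
        found = lookup(name)
        if found != wanted:
            problems.append(f"{name}: wanted {wanted}, found {found or 'nothing'}")
    return problems
```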

External libraries typically aren't needed, because they'll either be provided in binary form as wheels (prebuilt for the most common system types), or be built automatically during installation, assuming that `gcc`, `pkgconfig` and the essential headers are available.

Although, I guess, maybe binary wheels don't cover all those Nvidia driver/CUDA variations? I'm not an ML guy, so I'm not sure how this is handled. I've heard there are binary wheels for CUDA libraries, but I've never used them.
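Which systems a binary wheel covers is spelled out in its filename: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, per the wheel binary distribution format. A small illustrative parser (a hypothetical helper, not pip's internal code) shows the interpreter, ABI and platform a wheel targets. There is no slot in that tag scheme for a CUDA version, which is one reason GPU builds tend to be distributed through separate package indexes or separately named packages.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename}")
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tag sets such as "manylinux_2_17_x86_64.manylinux2014_x86_64" are dot-separated.
    return WheelName(parts[0], parts[1], build,
                     tuple(python_tag.split(".")),
                     tuple(abi_tag.split(".")),
                     tuple(platform_tag.split(".")))
```

For example, a `py3-none-any` wheel is pure Python and installs anywhere, while a `cp311-cp311-manylinux_...` wheel only matches CPython 3.11 on a compatible Linux.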

Ideally, there's Nix (and poetry2nix) that could take care of everything, but only a few folks write Flakes for their projects.

> Usually the most complicated part is running `make` or `cmake` with `pkgconfig` somewhere in my path

Getting the correct versions of all the dependencies is the trickiest part, as there is no universal package manager, so it's all highly OS/distro-specific. Some projects vendor their dependencies just to avoid this (and risk getting stuck with awfully out-of-date stuff).

> Maybe install some missing system libraries if necessary.

And hope their ABIs (if they're just dynamically loaded) or headers (if linked against) are still compatible with what the project expects. At least that is my primary frustration when I try to build something and it turns out it doesn't work anymore with whatever the OS provides (mostly Debian stable's fault, lol). It is not exactly fun to backport a Debian package (doubly so if you do it properly and don't handwave it with checkinstall).


> Ideally, there's Nix (and poetry2nix) that could take care of everything, but only a few folks write Flakes for their projects.

Relevant to "AI, Python, setting up is hard ... nix", there's stuff like:

https://github.com/nixified-ai/flake


The right combo of Nvidia/CUDA/random Python ML library is a nightmare at times. This is especially true if you want to use older hardware like a Tesla M40 (dirt cheap, still capable). And may your maker be with you if you tried to use your distro's native drivers first.

It's fair to say part of the blame is on Nvidia, but wow is it frustrating when you have to find eclectic mixes.


My personal recipe (on NixOS) is a pip-ed virtual environment for quick tests, or conda inside a nix-shell, on top of a dedicated ZFS pool/conda mounted at ~/.conda with dedup=on, so nothing is nixified and nothing is affected by a nixos-rebuild...

Many Python projects, not only in the ML world, tend to be just developers' experiments, meant to be run as experiments and not worth packaging as a stable, released program...

Oh, BTW, projects like home-assistant fall into the same bucket...


I totally agree with you, the irony being that Python and languages like it were built in part to reduce complexity, not only of the language itself but also of building and running the code… I feel machine learning is low-level enough that it should not be tied to a high-level language like Python… so that I can use Node, Ruby, PHP or whatever by adding a C binding, etc. That, to me, is why this is most interesting.


The problem is that python is designed assuming people want to use system-wide packages. In hindsight, that has turned out to be a mistake. Conda / venv try to bridge that gap but they’re kludgy, complex hacks compared to something like cargo or even npm.

Worse, because Python is a dynamic language, you also have to deal with all of that complexity at deployment time. (Vs C/C++/Zig/Rust where you can just ship the compiled binary).


> The problem is that python is designed assuming people want to use system-wide packages.

This wasn't true for decades: `virtualenv` was the de-facto standard isolation solution (now baked in as `python -m venv`, still the de-facto standard), and `pip` is the package manager (we don't talk about setuptools/distutils, shh!). If someone still used system-wide packages, that was either because a) they were building a container or some single-purpose system, or b) they were sloppy or had no idea what they were doing (most likely following some crappy tutorial). Or it was distro people creating packages to satisfy dependencies for Python programs, but that's a whole different story (and one's virtualenv shouldn't inherit system packages unless it's really, really necessary, and only if it makes sense to do so).

The problem started when one needed some external non-Python dependencies. Python invented binary wheels, and they've been around for a while (completely solving issues with e.g. PostgreSQL drivers; no one needs to worry about libpq), but I suppose depending on specific versions of kernel drivers and CUDA libraries is a more complex and nuanced subject.

> Vs C/C++/Zig/Rust where you can just ship the compiled binary

Only assuming that you can either statically link, or that all the libraries' ABIs are stable (or that you're targeting a very specific ABI; I've had my share of "version `GLIBC_2.xx' not found" errors and I'm not fond of those).
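The `GLIBC_2.xx not found` failure comes from glibc being backward compatible only: a binary built against a newer glibc won't load on a system with an older one. The same rule is why manylinux wheel tags such as `manylinux_2_17_x86_64` name a minimum glibc version. A minimal sketch of that check with the standard library's `platform.libc_ver()`; the helper names and the 2.34 requirement are hypothetical.

```python
import platform
from typing import Optional, Tuple


def glibc_version(detected: Optional[Tuple[str, str]] = None) -> Optional[Tuple[int, int]]:
    """(major, minor) of the running glibc, or None on non-glibc systems (musl, macOS, ...)."""
    lib, version = platform.libc_ver() if detected is None else detected
    parts = version.split(".")
    if lib != "glibc" or len(parts) < 2:
        return None
    return int(parts[0]), int(parts[1])


def can_load(required: Tuple[int, int], detected: Optional[Tuple[str, str]] = None) -> bool:
    """A binary needing GLIBC_<required> symbols needs at least that glibc version at runtime."""
    found = glibc_version(detected)
    return found is not None and found >= required


if __name__ == "__main__":
    print("running glibc:", glibc_version())
    # Hypothetical requirement, e.g. a binary built on a newer distro than this one.
    print("can load a GLIBC_2.34 binary:", can_load((2, 34)))
```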

In a similar spirit, any Python project can be distributed as one binary (Python interpreter and a ZIP archive, bundled together) plus a set of zero or more .so files.
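The standard library already covers the ZIP half of that: `zipapp` packs pure-Python code into a single `.pyz` archive that a compatible interpreter can run directly. Bundling the interpreter itself, and shipping compiled `.so` extensions (which can't be imported straight out of a ZIP), is what tools such as PyInstaller add on top. This sketch only shows the archive step, with a hypothetical one-module `app`.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_zipapp(source_dir: Path, target: Path, entry_point: str) -> Path:
    """Bundle a directory of pure-Python code into one executable .pyz archive."""
    zipapp.create_archive(
        source_dir,
        target=target,
        interpreter="/usr/bin/env python3",  # shebang line only; the interpreter is NOT bundled
        main=entry_point,                    # "module:function" called when the archive runs
    )
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        # Hypothetical one-module application.
        (src / "app.py").write_text("def main():\n    print('hello from a zipapp')\n")
        pyz = build_zipapp(src, Path(tmp) / "app.pyz", "app:main")
        result = subprocess.run([sys.executable, str(pyz)],
                                capture_output=True, text=True, check=True)
        print(result.stdout.strip())
```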


> This wasn't true for decades, `virtualenv` was de-facto standard isolation solution (now baked in as `python -m venv`, still de-facto standard)

Right; but python itself doesn’t check your local virtual environment unless you “activate” it (ugh what). And it can’t handle transitive dependency conflicts, like node and cargo can. Both of those problems stem from python assuming that a simple, flat set of dependencies are passed in from its environment variables.


Virtual envs are actually quite simple: they contain a bin/ directory with a linked python binary. When the python binary runs, it checks its sibling directories (it knows it was executed as e.g. /home/user/.venv/bin/python) for what to load. You don't need the activate shell scripts or anything; just running that binary within your venv is enough. The shell script is just for the convenience of inserting the bin directory into the $PATH so just "python" or "pip" runs the right thing.
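That claim is easy to check: create a venv, skip the activate script entirely, and invoke the venv's interpreter by path. Roughly, the interpreter finds the `pyvenv.cfg` next to or one level above its executable and sets `sys.prefix` to the venv, while `sys.base_prefix` still points at the base install. A minimal sketch with the stdlib `venv` module; the helper names are hypothetical.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a venv (bin/ on POSIX, Scripts\\ on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def runs_inside_venv(env_dir: Path) -> bool:
    """Run the venv's python directly (no activation) and ask whether it sees itself in a venv."""
    check = "import sys; print(sys.prefix != sys.base_prefix)"
    out = subprocess.run([str(venv_python(env_dir)), "-c", check],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / ".venv"
        # with_pip=False keeps creation fast; symlinks match `python -m venv` defaults on POSIX.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        print(runs_inside_venv(env_dir))
```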


> the shell script is just for the convenience of inserting the bin directory into the $PATH so just "python" or "pip" runs the right thing.

Or so that any reference in the program you run that launches another binary or loads a DLL by relying on the environment gets the right one, etc. There are some binaries you can run without activating a venv with no problem; others will crash hard, and others will just subtly do the wrong thing if the conditions are “right” in your normal system environment.
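For the child-process case, the relevant part of what activation does can be reproduced without a shell: the standard activate scripts set `VIRTUAL_ENV`, put the venv's bin/ (Scripts\ on Windows) first on PATH and unset `PYTHONHOME`, among other things (the prompt change and the `deactivate` function are not covered here). A rough sketch with a hypothetical helper name; it does not touch loader paths such as `LD_LIBRARY_PATH`, which activation doesn't set either.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def activated_environ(env_dir: Path, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Approximate the environment variables a venv's activate script changes.

    Child processes started with this environment resolve `python`, `pip` and any
    console scripts installed in the venv via PATH, as they would in an activated shell.
    """
    env = dict(os.environ if base is None else base)
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    env["VIRTUAL_ENV"] = str(env_dir)
    # Prepend the venv's bin dir; skip an empty PATH so no stray separator is added.
    env["PATH"] = os.pathsep.join(p for p in (str(bin_dir), env.get("PATH", "")) if p)
    env.pop("PYTHONHOME", None)  # activate unsets PYTHONHOME as well
    return env
```

Usage would look like `subprocess.run(["some-console-script"], env=activated_environ(Path(".venv")))`, where `some-console-script` stands in for any tool installed into the venv.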


Another implication of this is that it's impossible for two mutually incompatible copies of the same package to exist in the same environment. If packageA needs numpy 1.20 and packageB needs numpy 1.21, you're stuck.
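Because each import name resolves to a single module per interpreter process, the environment has to agree on one version of every dependency. A minimal sketch that spots the packageA/packageB situation from the comment, simplified to exact pins (real resolvers, pip's included, work with version ranges); the package names are the comment's placeholders.

```python
from collections import defaultdict
from typing import Dict, Set


def find_conflicts(requirements: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Given {package: {dependency: exact_version}}, report dependencies pinned to more than one version.

    In a flat environment only one version of each dependency can be installed,
    so any dependency with two different pins cannot satisfy both dependents.
    """
    wanted: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for package, deps in requirements.items():
        for dep, version in deps.items():
            wanted[dep][version].add(package)
    return {dep: dict(by_version) for dep, by_version in wanted.items() if len(by_version) > 1}
```

Running it on `{"packageA": {"numpy": "1.20"}, "packageB": {"numpy": "1.21"}}` reports numpy as pinned to both 1.20 (by packageA) and 1.21 (by packageB): exactly the "you're stuck" case. Ecosystems that nest dependencies per package can install both copies instead.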


> Virtual envs are actually quite simple

You have never trashed your system from virtualenv?

Also, there's a problem when wheels assume they can have everything, like TensorFlow from years ago. I don't know about now, but since TF used to be tied to CUDA versions, you could get into trouble installing TF versions, even with venv, conda, etc.


> You have never trashed your system from virtualenv?

Unless one has done something they shouldn't have (in particular, using sudo while working with a virtualenv), this shouldn't be possible.

Due to limitations of the most commonplace system-wide package managers (like dpkg, rpm or ebuild; not modern stuff like Nix), system packages exist to support other system packages. One installs some program, it needs libraries, and the dependencies get pulled in. And then it's the distro package manager's job to ensure compatibility and deal with multiple-version conflicts (not fun).

But if you start or check out some project, the common knowledge was that you shouldn't rely on system packages, even if they're available and could work. With some obligatory exceptions, like when you're working on distribution packaging, or developing something meant to be tightly integrated with a particular distro (like corporate standard stuff).

That is, unless we're talking about some system libraries/drivers needed for CUDA in particular (which is system stuff) rather than virtualenv itself.


> That is, unless we're talking about some system libraries/drivers needed for CUDA in particular (which is system stuff) rather than virtualenv itself.

Sir, this is an ML thread.

Venv interacts with that poorly, though to be fair it could be Google's fault. Still, it shouldn't even be possible.


I mean, virtualenv is not supposed to interact with that at all. System libraries are systems' package manager responsibility. Doubly so, as - as I get it - all this stuff is directly tied to the kernel driver.

What Python's package manager (pip in a virtualenv) should do is build the relevant bindings (or download prebuilt binaries of them), and that's the extent of it. If others say it works this way with C (that comment about cmake and pkgconfig), then it must work this way with Python too.


> If someone still used system-wide packages that was either because a) [...] b) [...]

Or simply because they are packagers for some distro, and their users want a simple way to pull in some software by its name, while the upstream devs imagine people cloning their public repo and running the software from the checkout in their own home directory, regularly pulling and rebuilding the needed surroundings...

Not to mention modern systems/distros with a not-really-POSIX vision, like NixOS or Guix System...

> In a similar spirit, any Python project can be distributed as one binary

A single 10+Gb binary :-D


Ridiculous. You'd only get a 10 GB binary if machine learning models were involved. I've distributed full-stack binaries in 70 MB or less.


> For something like C or C++? Usually the most complicated part is running `make` or `cmake` with `pkgconfig` somewhere in my path. Maybe install some missing system libraries if necessary.

If you don't use virtual environments in python, isn't it basically the same in python? Just run `pip install` and maybe install some missing system libraries if necessary. In practice, it's not that simple in either language, and "maybe install some missing dependencies" sweeps a lot of pain under the rug.


I wonder if having a shared cache would make this easier. FWIW, Nix does that.


You might like Cog. It solves these problems for ML projects specifically.



