I don't care much about memory and CPU usage; I mostly leave the heavy lifting to an SQL engine.
Although the null handling seems very compelling, I guess it comes at the cost of incompatibility with existing libraries; otherwise Pandas would have implemented it as well?
If you mean whether I run it in a distributed fashion, à la Spark, then no. If you mean whether I test it on machines with different RAM sizes, then yes.
> I don't care much about memory and CPU usage; I mostly leave the heavy lifting to an SQL engine.
Well, I care. Both pandas and polars are, in my view, single-machine dataframe libraries, so the memory and CPU constraints are rather stringent.
My comparison is based solely on my experience: when reading CSV files that are 20% to 50% of the size of RAM, pandas takes 2 to 10 minutes (or errors out after that long), while polars finishes in 20 seconds. Queries in pandas are almost always slower than in polars.
But reading your comment, it seems you and I have different use cases for dataframe libraries, which is fine. I mostly use them for exploratory analysis, so the SQL API is not much of a plus for me, but the performance is.
When using Pandas appropriately, that is, with method chaining, lambda expressions (instead of intermediate assignments), and pyarrow data types, you also get much better speed and null-value handling.
I am curious about the SQL API, though.