3eb7988a1663 2 days ago

While I agree the speed benefit is marginal, the API is the differentiator.

Polars is a v2 of a dataframe API with a lot of thought put into offering a consistent experience. Parameter names are seemingly regular across the board (e.g. you do not get `sep` on one method but `delimiter` on another), there is no NumPy int/NaN baggage, and there are no silent data type conversions, all of which does a lot to improve the robustness of the code. That it is faster is nice, but it earns a big shrug for my typical use cases.

The loss of the index is probably the right move - the implicit column has some subtle logic which I do not miss after switching to polars.

Source: over a decade of pandas experience. There are still a few idioms for which I do not have a good polars alternative, but nothing that is a deal breaker. The syntax is overall more verbose, but I am ok with it.

  • mjhay 2 days ago

    The API is indeed the big deal. I think a lot of Pandas users don’t realize how awful the API is, because they have never used a proper dataframe library like Polars or dplyr.

    • 3eb7988a1663 2 days ago

      I really do not see that much difference between polars and pandas. Under the hood, sure the machinery is different, but quite a bit of pandas code will run as-is with polars. If you want to maximize the performance + strictness, you do need to adopt the polars style, but the two are quite similar.

      Which is to say, I have no real problems with the pandas API. In fact, if I could just transplant the polars strictness into pandas, that would let me keep the slightly more terse syntax.

      • ritchie46 2 days ago

        > but quite a bit of pandas code will run as-is with polars

        I highly doubt this. Aside from dataframe generation and series assignment, almost everything in the API surface is different.

        Strictness is also not something you can transplant easily. It comes from checking data types at the IR query-planning level before the query runs, and from being able to resolve schemas independently of the data. In pandas, schemas depend on the data flowing through the operations, so it is not uncommon for data types to change when missing values appear, and pandas cannot check that a correct type is passed to an operation without running the compute.
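
        A small sketch of resolving a schema without any data, assuming polars 1.x (where `LazyFrame.collect_schema()` is available); the column names are hypothetical:

        ```python
        import polars as pl

        # A lazy frame declared with a schema only: it holds no rows at all.
        lf = pl.LazyFrame(schema={"name": pl.String, "qty": pl.Int64})

        plan = lf.with_columns(
            (pl.col("qty") * 2).alias("double_qty"),  # integer stays integer
            (pl.col("qty") / 2).alias("half_qty"),    # true division gives a float
        )

        # The output dtypes come from the query plan, not from any data.
        schema = plan.collect_schema()
        print(schema["double_qty"])  # Int64
        print(schema["half_qty"])    # Float64
        ```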

        • 3eb7988a1663 a day ago

          Depends on how you use pandas. Pre-polars I would do a lot of single column/series manipulation, which works the same way (though it is heavily discouraged by polars because you lose out on optimization opportunities). There are plenty of surface-level keyword API changes (`merge` vs `join`, `sort_values` vs `sort`), but you can operate polars in a very pandas-esque manner, and the two do not seem all that alien to each other.
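
          Roughly what I have in mind, as a sketch with made-up data (recent pandas and polars assumed):

          ```python
          import pandas as pd
          import polars as pl

          orders = {"id": [2, 1, 3], "qty": [5, 7, 1]}
          names = {"id": [1, 2, 3], "name": ["a", "b", "c"]}

          # pandas spelling: merge + sort_values
          pd_out = pd.DataFrame(orders).merge(pd.DataFrame(names), on="id").sort_values("id")

          # polars spelling: join + sort (both libraries default to an inner join)
          pl_out = pl.DataFrame(orders).join(pl.DataFrame(names), on="id").sort("id")

          # Single-column arithmetic on an eager Series reads the same in both,
          pd_double = pd_out["qty"] * 2
          pl_double = pl_out["qty"] * 2

          # while the expression style polars encourages looks like this.
          pl_expr = pl_out.with_columns((pl.col("qty") * 2).alias("double_qty"))
          ```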

          As for strictness, I understand you cannot just slap it in; it was more of an idle thought.

  • mft_ 2 days ago

    I agree. I’m only a hobbyist user of such libraries, and have always found Pandas a little confusing and counter-intuitive. I recently used Polars instead and found it a lot more straightforward.