A dataframe protocol for the PyData ecosystem

As a high-level comment, there seem to be a variety of new requirements that people are raising.

I’ve been raising some issues, but I’m coming from the perspective of programming to an interface. I don’t think they can be classified as new, since that is a stated goal in bold at the top of this thread; however, it may not be relevant to a discussion on a data export interface.

The two concepts are somewhat linked as you have to define an interface to export data, but that might be quite different to defining a common programming interface.

This thread has people discussing both types of interfaces. To avoid conflating one with the other it might be worthwhile starting a new thread for each so that it’s unambiguous what goals and use-cases are associated with each.

It seems that I did not formulate my thoughts correctly: the only interface that I was interested in was to be able to create / pass around a dataframe, and not to manipulate it.
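To make the distinction concrete, here is a minimal sketch of what a "pass around, don't manipulate" interface could look like. Every name in it (`ExportableDataFrame`, `column_names`, `get_column`, `num_rows`, `DictFrame`, `column_means`) is hypothetical and invented for illustration; it is not the interface from any PR or library mentioned in this thread.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class ExportableDataFrame(ABC):
    """Hypothetical read-only interface: just enough to hand data to another library."""

    @abstractmethod
    def column_names(self) -> List[str]:
        """Return the column names in order."""

    @abstractmethod
    def get_column(self, name: str) -> Sequence:
        """Return the values of one column."""

    @abstractmethod
    def num_rows(self) -> int:
        """Return the number of rows."""


class DictFrame(ExportableDataFrame):
    """Toy implementation backed by a dict of equal-length lists."""

    def __init__(self, data: Dict[str, list]):
        lengths = {len(col) for col in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Copy the columns so later changes to the input do not leak in.
        self._data = {name: list(col) for name, col in data.items()}

    def column_names(self) -> List[str]:
        return list(self._data)

    def get_column(self, name: str) -> Sequence:
        return self._data[name]

    def num_rows(self) -> int:
        # An empty frame has zero rows.
        return len(next(iter(self._data.values()), []))


def column_means(frame: ExportableDataFrame) -> Dict[str, float]:
    """A consumer that relies only on the interface, never on the concrete type."""
    return {
        name: sum(frame.get_column(name)) / frame.num_rows()
        for name in frame.column_names()
    }


frame = DictFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})
means = column_means(frame)
```

Note that nothing here covers filtering, joining, or grouping: a consumer such as `column_means` only reads columns, which is the whole point of keeping the surface this small.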

Yes, a universal dataframe API would be great. However, that is roughly what SQL claims to be, and it falls short: there are a lot of variants, and the new kids on the block do not even pretend to be compatible.

I see that many impressive goals have been raised in the discussion. However, who is going to implement them? "Better done than perfect" is an important aspect of the discussion: the current situation in the PyData world is a Tower of Babel. As a consequence, libraries stick to the most popular dataframe: the pandas one. The user is left to convert to this container, which quickly gets tedious. Hence, the user is more likely to give up on other dataframe libraries, because they add mental overhead.


A few weeks have passed. I’m not sure exactly how to proceed, but as an experiment I’d be interested in publishing the data access interface from my PR on PyPI and creating an implementation of it in pyarrow. If other projects want to use it, they are free to do so. Anyone can propose further changes by submitting a pull request.

If someone wants to move the repository somewhere else, or fork it and go in a different direction (or start over), please feel free to do so.
