Supporting object store FFI #7075
Comments
Tagging @kylebarron. I think there are a couple of questions here:
I think there are many people interested in such functionality; however, I suspect there is a significant non-technical aspect to any such initiative. |
It sounds like the OP here is interested in a custom implementation of That's a bit different than what I've been focused on. The solution I've been working towards is to have reusable Python bindings that other Rust-Python developers can use in their own Python bindings. This doesn't require FFI (though by not using FFI it means you can't share I don't think I have the bandwidth to try and implement stable object store FFI. I'll tag @timsaucer who wrote the DataFusion FFI support. |
The use case here would be that unlike Datafusion that has a The idea I have is to have a Python API The main issue here is that there's no ObjectStore FFI at the moment AFAIK, which makes it tricky to implement. |
Polars has both a python and Rust API, and IMO trying to support both with one system using mechanisms like libloading is a bad UX for both. Managing shared libraries is a PIA, especially within the python ecosystem. Instead I'd suggest Rust polars code should be able to provide an ObjectStore directly. Similarly for python, polars could provide an object store shim that delegates to a user-provided python impl (this may even already exist). This would then allow people to plug in python-based object stores. If libloading is important for other reasons, e.g. some GPL dance, then these Rust/python impls can orchestrate that. I think this would satisfy your use-case, whilst not requiring a stable C FFI? |
@tustvold The Rust API to directly provide an object store makes sense, but I'll be interacting with polars via python so I was wondering what's the best way to register a new object store in python. I might be misunderstanding, but regarding your idea for python, does it involve creating some sort of generic ObjectStore python impl that wraps the user-provided python impl and invokes its methods? Something like: I am worried that with this approach, the interactions with the object store will be -> rust -> python -> rust, since the user-provided python impl would just be an object store wrapped with python bindings. This would have some overhead and I'm not sure if there would be performance implications due to the GIL. It seems like libloading is already used in polars for plugins https://docs.pola.rs/api/python/stable/reference/plugins.html, which is where I got the idea from. |
I was somewhat presuming that if you're using the python API you want to author your extension in python. I agree that if the implementation is in Rust, proxying via python seems unnecessary, even if the performance impact is likely irrelevant when compared to network overheads. I'm not familiar with how polars has hooked up its python bindings, but I wonder if you can set up a polars "context" in Rust and then invoke it from python? |
Are you referring to something like the SessionContext that DataFusion has? It would be nice but, AFAIK, such a concept doesn't exist in Polars unfortunately. I did float the idea of using libloading to the Polars devs but the main concern/blocker that was raised was the lack of a stable C FFI for ObjectStore. I agree that from a UX perspective it might be non-ideal, but given that a similar concept already exists in Polars (registering a Rust plugin via python), this wouldn't be unprecedented. Is there any chance for an ObjectStore C FFI to be on the roadmap? |
I will be frank, ObjectStore has a fairly large, async API, and so defining and maintaining an FFI interface for it would be a fairly substantial undertaking. Given what I know of the interests of the various maintainers, I think such an initiative would likely stand the best chance of success incubating as a third-party project.
IMO I would suggest adding support for custom ObjectStore within polars from the Rust API first, i.e. introducing something similar to ObjectStoreRegistry. Once such an abstraction exists, then it will be possible to devise ways to potentially orchestrate that from python code. This wouldn't necessarily require a C FFI, for example, you could potentially build a shared library bundling both polars and your extension code, and then use that from python or something along those lines. |
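To make the registry suggestion above concrete, here is a minimal sketch of what such an abstraction could look like. Everything in it is an assumption for illustration: `Store` is a tiny synchronous stand-in for the real (async, much larger) `ObjectStore` trait, and `StoreRegistry`, `MemoryStore`, and `registry_demo` are hypothetical names, not existing polars or object_store APIs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, heavily simplified stand-in for the real `ObjectStore` trait.
pub trait Store: Send + Sync {
    /// Return the bytes stored at `path`, if any.
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical registry keyed by URL scheme (e.g. "s3", "mystore").
#[derive(Default)]
pub struct StoreRegistry {
    stores: RwLock<HashMap<String, Arc<dyn Store>>>,
}

impl StoreRegistry {
    /// Register a store for a scheme, returning any store it replaced.
    pub fn register(&self, scheme: &str, store: Arc<dyn Store>) -> Option<Arc<dyn Store>> {
        self.stores.write().unwrap().insert(scheme.to_owned(), store)
    }

    /// Resolve a URL such as "mystore://a.parquet" to the store for its scheme.
    pub fn resolve(&self, url: &str) -> Option<Arc<dyn Store>> {
        let (scheme, _rest) = url.split_once("://")?;
        self.stores.read().unwrap().get(scheme).cloned()
    }
}

/// Toy in-memory store used only to exercise the registry.
pub struct MemoryStore(pub HashMap<String, Vec<u8>>);

impl Store for MemoryStore {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.0.get(path).cloned()
    }
}

/// Registers a custom store under a made-up scheme and reads one object back.
pub fn registry_demo() -> Option<Vec<u8>> {
    let registry = StoreRegistry::default();
    let mut data = HashMap::new();
    data.insert("a.parquet".to_owned(), b"bytes".to_vec());
    registry.register("mystore", Arc::new(MemoryStore(data)));
    let store = registry.resolve("mystore://a.parquet")?;
    store.get("a.parquet")
}
```

Because the registry only hands out `Arc<dyn Store>` values, the engine never needs to know the concrete type, which is what would let a custom build bundle its own implementation.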
Ah that's unfortunate :( There's only https://github.com/RelationalAI/object_store_ffi I think.
Do you mind elaborating on this approach? I was thinking that a Rust API for this might not really be helpful since after registering a custom object store into the registry (which maybe can be represented as some sort of global map), how will python polars be aware of this state? |
My understanding is that the polars python API really just acts as glue to orchestrate the underlying Rust execution engine. As such it should be possible to do the initial setup in Rust and then do further orchestration from Python. Ultimately I'd be very surprised if adding support for this from Rust wasn't a necessary precondition to python support. |
Just to make sure I understand correctly, is the idea here along the lines of:
Or am I getting it wrong? I’m not sure this would work, since wouldn’t the Rust binary for the Rust API be different from the underlying Rust binary for python polars which is separately compiled and published? So the state in Rust polars with the registered object store won’t be shared with python polars. |
You wouldn't use the standard polars distribution, just the custom one with your extension built into it. I think this discussion is probably best moved to polars. |
I think the confusion here might be that @tustvold is expecting Polars to act like DataFusion, where it's intended to be fully embedded into your own project. Whereas my understanding is that the Polars Python API isn't designed to be embedded. The intended extension point is via runtime-linked extensions. |
If this is indeed the case, then polars will indeed require a stable C FFI for all such extensions. My point was that this is a very limiting methodology, especially given Rust's lack of a stable ABI, and it seems surprising that polars would not support build-time extension in addition. |
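For readers unfamiliar with why the lack of a stable Rust ABI matters here: a `dyn` trait object cannot safely cross a boundary between separately compiled libraries, so a runtime-linked extension would have to expose a C-compatible struct of function pointers instead. The sketch below illustrates that general technique only. `FfiStore`, `export_store`, `object_size`, and `ffi_demo` are hypothetical names, the single `size` operation is invented for brevity, and the sketch is synchronous, so it sidesteps the async API that makes a real ObjectStore FFI a substantial undertaking.

```rust
use std::ffi::c_void;
use std::slice;

/// Hypothetical C-ABI view of a store: an opaque context plus function pointers.
#[repr(C)]
pub struct FfiStore {
    ctx: *mut c_void,
    /// Returns the byte length of the object at `path`, or -1 if it is missing.
    size: extern "C" fn(ctx: *const c_void, path: *const u8, path_len: usize) -> i64,
    /// Frees `ctx`; called exactly once, from `Drop`.
    release: extern "C" fn(ctx: *mut c_void),
}

/// Provider-side state hidden behind the opaque pointer.
struct OneObject {
    path: String,
    data: Vec<u8>,
}

extern "C" fn size_impl(ctx: *const c_void, path: *const u8, path_len: usize) -> i64 {
    // SAFETY: `ctx` comes from `Box<OneObject>` in `export_store` and has not
    // been released; `path`/`path_len` describe a valid byte slice.
    let store = unsafe { &*(ctx as *const OneObject) };
    let requested = unsafe { slice::from_raw_parts(path, path_len) };
    if requested == store.path.as_bytes() {
        store.data.len() as i64
    } else {
        -1
    }
}

extern "C" fn release_impl(ctx: *mut c_void) {
    // SAFETY: `ctx` was produced by `Box::into_raw` and is released only once.
    drop(unsafe { Box::from_raw(ctx as *mut OneObject) });
}

/// Provider side: wrap Rust state in the C-compatible struct.
pub fn export_store(path: &str, data: Vec<u8>) -> FfiStore {
    let state = Box::new(OneObject { path: path.to_owned(), data });
    FfiStore {
        ctx: Box::into_raw(state) as *mut c_void,
        size: size_impl,
        release: release_impl,
    }
}

impl Drop for FfiStore {
    fn drop(&mut self) {
        (self.release)(self.ctx);
    }
}

/// Consumer side: uses only the function pointers, never the concrete type.
pub fn object_size(store: &FfiStore, path: &str) -> Option<u64> {
    let n = (store.size)(store.ctx, path.as_ptr(), path.len());
    if n >= 0 { Some(n as u64) } else { None }
}

/// Exports a one-object store and queries a present and a missing path.
pub fn ffi_demo() -> (Option<u64>, Option<u64>) {
    let store = export_store("a.parquet", vec![0u8; 5]);
    (object_size(&store, "a.parquet"), object_size(&store, "missing"))
}
```

Every method of the real trait would need an entry like `size`, plus conventions for errors, streaming bodies, and async completion, which gives a sense of the maintenance cost discussed in this thread.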
A potential temporary solution could be to create a fork of Polars to integrate my new object store, though it creates some maintenance overhead which isn't ideal. There seems to be some demand for ObjectStore FFI, e.g. here, so it would be immensely helpful if the Arrow team could consider adding this capability to their roadmap! |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The use case here is to dynamically load libraries containing custom object store implementations (beyond s3/azure/gcp etc.) into libraries like polars, which currently provide no way to register a new object store.
Describe the solution you'd like
An object store FFI would be necessary due to Rust’s unstable ABI.
Describe alternatives you've considered
Additional context