refactor: port DaftContext to rust side #3767
CodSpeed Performance Report
Merging #3767 will improve performances by 54.97%
static DAFT_CONTEXT: OnceCell<DaftContext> = OnceCell::new();

#[cfg(feature = "python")]
pub fn get_context() -> DaftContext {
Currently we only have a singleton context, as this is how it was done in Python, but we may want to think about ways that a user could initialize multiple contexts/sessions.
The global singleton context is pretty useful, so I wonder if there are other ways we can produce the required functionality? In my current mental model, all user interactions with the system go through a session/connection, whether that is implicit or explicit, whereas I think of the Context as global program state for the process itself. I'm curious what functionality you had in mind where multiple contexts would be necessary and which cannot be achieved by pulling the necessary state into the session?
I know that you have said that currently "context" and "session" are synonymous, but perhaps this is the reason to no longer make them so?
- context – process state
- session – connection state
We are pushing up against the same concepts and patterns described in SQL standard Part 1 where our context represents state for the SQL-environment (singleton), whereas our session should represent state for the SQL-connection. This is not meant to be definitive, but hopefully this helps us tease out the role of context vs the role of session.
hmm yeah I was actually thinking a bit about this yesterday as well. I think that seems like a reasonable distinction to make.
Since we're going to make a hard distinction between the context and the session, I'll update the PR to remove the catalog from the context.
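A minimal sketch of the context/session split discussed in this thread, not the code in this PR. It assumes a process-wide `Context` behind `std::sync::OnceLock` and a per-connection `Session` that owns its own catalog; the `runner` field, the `Session` type, and the table names are hypothetical and only there for illustration.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Process-wide state: created once and shared by every session.
/// The `runner` field is a hypothetical placeholder.
#[derive(Debug)]
pub struct Context {
    runner: String,
}

static CONTEXT: OnceLock<Context> = OnceLock::new();

/// Returns the single process-wide context, initializing it on first use.
pub fn context() -> &'static Context {
    CONTEXT.get_or_init(|| Context {
        runner: "native".to_string(),
    })
}

/// Per-connection state: each session owns its own catalog, modeled here
/// as a map from table name to a location string.
pub struct Session {
    ctx: &'static Context,
    catalog: HashMap<String, String>,
}

impl Session {
    pub fn new() -> Self {
        Session {
            ctx: context(),
            catalog: HashMap::new(),
        }
    }

    pub fn register_table(&mut self, name: &str, location: &str) {
        self.catalog.insert(name.to_string(), location.to_string());
    }

    pub fn has_table(&self, name: &str) -> bool {
        self.catalog.contains_key(name)
    }

    /// Process state is read through the shared context.
    pub fn runner(&self) -> &str {
        &self.ctx.runner
    }
}
```

With this shape, two sessions see the same runner but keep separate catalogs, which is the behavior the "context is process state, session is connection state" framing suggests.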
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3767 +/- ##
==========================================
+ Coverage 75.88% 77.82% +1.93%
==========================================
Files 733 737 +4
Lines 96091 93361 -2730
==========================================
- Hits 72919 72656 -263
+ Misses 23172 20705 -2467
def __init__(self, ctx: PyDaftContext | None = None):
    if ctx is not None:
        self._ctx = ctx
    else:
        self._ctx = PyDaftContext()
Is there a reason to stop using `__new__` with the lock? I see how the other methods are protected by the mutex on the Rust side, but I'm not sure I follow how the singleton is thread safe.
we don't need the `__new__` because having multiple python instances doesn't matter anymore. they are all backed by the single rust based context now.
Understood, my question is about how the Rust singleton's construction is protected from multiple instantiations.
Oh got it. It's protected by making the constructor and the fields private so there is no way to create a `DaftContext` outside of using `get_context`.
Ok, and get_context is protected by the OnceCell. So now it's possible to have multiple python context objects, but they are always backed by the same Rust context. I think that's fine? The original issue was in regards to having multiple runners, so this is probably ok.
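The argument in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `context.rs`: it uses the standard library's `OnceLock` in place of the `OnceCell` shown in the diff, and the inner `State` struct, the `runner` field, and the `same_backing` helper are hypothetical. The private field means code outside the module cannot construct a `DaftContext`, and `get_or_init` runs its initializer at most once even when several threads race on the first call.

```rust
pub mod context {
    use std::sync::{Arc, Mutex, OnceLock};

    /// Hypothetical stand-in for the real context state.
    struct State {
        runner: Option<String>,
    }

    /// Cheap-to-clone handle. The field is private, so code outside this
    /// module cannot build a `DaftContext` by hand.
    #[derive(Clone)]
    pub struct DaftContext {
        state: Arc<Mutex<State>>,
    }

    static DAFT_CONTEXT: OnceLock<DaftContext> = OnceLock::new();

    /// The only way to obtain a context. Every call returns a handle to
    /// the same shared state.
    pub fn get_context() -> DaftContext {
        DAFT_CONTEXT
            .get_or_init(|| DaftContext {
                state: Arc::new(Mutex::new(State { runner: None })),
            })
            .clone()
    }

    impl DaftContext {
        pub fn set_runner(&self, name: &str) {
            self.state.lock().unwrap().runner = Some(name.to_string());
        }

        pub fn runner(&self) -> Option<String> {
            self.state.lock().unwrap().runner.clone()
        }

        /// True when both handles point at the same underlying state.
        pub fn same_backing(&self, other: &DaftContext) -> bool {
            Arc::ptr_eq(&self.state, &other.state)
        }
    }
}
```

Any number of wrapper objects (for example, Python-side instances) can hold their own handle, and a change made through one handle is visible through all the others.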
Ideally I think the config objects should be pulled into the session, though I'm not sure yet what makes the most sense for the `runner`. I'm totally open to adapting this to our needs for session as things evolve. This PR is solely to get a 1-1 mapping with our existing Python `DaftContext`.
> This PR is solely to get a 1-1 mapping with our existing python DaftContext
Understood, and this is a huge help for when I lower catalog things into Rust. For this switchover, I believe the proof is in the tests, no concerns beyond the thread safe singleton which you've addressed.
@universalmind303 Looks like we're moving a lot of global state over to the Rust side of things? If so, I don't mind having this merge into I believe this should be a simple update. I will just need to see how you've proposed global state in Rust and follow that pattern for daft broadcasting.
@universalmind303 What's the purpose of decorating functions with `#[cfg(not(feature = "python"))]` and having their bodies remain as `unimplemented!()`? I see this pattern has been used for all methods / functions defined in this file.
pretty much all of the context depends on python as the runners are implemented in python, so this is just needed to not break things when running with `cargo test|check|clippy --no-default-features`.
honestly, I'd like to just remove the python feature flag altogether, but it makes rust testing less portable as it needs to link to python.
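For readers unfamiliar with the pattern described in this thread, here is a small sketch of feature-gated twin functions. It assumes a Cargo feature named `python`, as in the diff; the function names `runner_name` and `describe` and their bodies are hypothetical. Both variants share one signature, so callers compile either way, and the fallback only panics if something actually calls it in a build without the feature.

```rust
/// Built only when the `python` Cargo feature is enabled.
#[cfg(feature = "python")]
pub fn runner_name() -> String {
    // Placeholder body: the real code would call into Python here.
    "python-backed runner".to_string()
}

/// Fallback so the crate still compiles with `--no-default-features`.
/// Calling it panics, which is acceptable as long as nothing calls it
/// in that configuration.
#[cfg(not(feature = "python"))]
pub fn runner_name() -> String {
    unimplemented!("runner_name requires the `python` feature")
}

/// Callers do not need their own `cfg` attributes because the signature
/// is identical in both builds.
pub fn describe() -> String {
    format!("runner: {}", runner_name())
}
```

The trade-off is that a missing feature shows up as a runtime panic rather than a compile error, in exchange for keeping `cargo check`, `cargo test`, and `cargo clippy` usable without linking to Python.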
ports the `DaftContext` to the rust side so we can more easily reuse it for connect & catalog work.

notes for reviewers:

- after removing `daft-local-execution` from daft-connect, `cargo check` started failing, so some of the changes to unrelated `Cargo.toml` files are there to properly feature flag them. We had missed a couple that were causing things to break.
- @rchowell I requested review from you specifically to take a look at [context.rs](https://github.com/Eventual-Inc/Daft/pull/3767/files#diff-5b536482a8303505c0c91a561a632218591bf8f02fff21f9f8a3535a3f0ff8a5). This contains the "context"/"session", and the global state that was previously only available in python. I added the `catalog` to this state as well.