-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feast integration #410
Feast integration #410
Conversation
The error in the
|
I just realized that we also have to ensure that the |
cookbook/case_studies/feature_engineering/feast_integration/feast_workflow.py
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/feast_workflow.py
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/custom_provider/provider.py
Outdated
Show resolved
Hide resolved
Fixed the issue. It's working now. |
project="flytesnacks", | ||
domain="development", | ||
name="feast_integration.feature_eng_tasks.mean_median_imputer", | ||
version="{{ registration.version }}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kumare3 The workflow is getting registered only when registering it twice. Does registration.version
get populated only after the reference tasks get registered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't get this to work, so reverted this change for the time being.
Also, do we really need this for this demo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did it not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it didn't. Now that the bulk of the work is done I can investigate this.
Samitha, can you expand on why you think having these tasks imported this way is important for this demo? I feel like it's already complicated enough and that we should focus on the interaction with the feast SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than importing directly, I thought reference tasks would make more sense. Also, I couldn't serialize the tasks with the relative import (I may have to use .
in front of the file name).
We need not worry about the relative imports with reference tasks, so I included it.
You can remove it if that's not really required.
# Notice the use of a custom provider. | ||
provider="custom_provider.provider.FlyteCustomProvider", | ||
offline_store=FileOfflineStoreConfig(), | ||
online_store=SqliteOnlineStoreConfig(path=self.config.online_store_path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have "redis" online store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean in addition to the sqlite store? I don't see why not. We could expose that as a configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could have it in addition or replace the SQLite one. Both should work!
store_offline_node >> load_historical_features_node | ||
load_historical_features_node >> store_online_node | ||
store_online_node >> retrieve_online_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
cookbook/case_studies/feature_engineering/feast_integration/feast_workflow.py
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/feast_type_transformers.py
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/feast_type_transformers.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
40e7a69
to
e4e0b98
Compare
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
cookbook/case_studies/feature_engineering/feast_integration/README.rst
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/README.rst
Outdated
Show resolved
Hide resolved
cookbook/case_studies/feature_engineering/feast_integration/README.rst
Outdated
Show resolved
Hide resolved
from tqdm import tqdm | ||
|
||
|
||
class FlyteCustomProvider(LocalProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samhita-alla can we add some docs here?
cookbook/case_studies/feature_engineering/feast_integration/custom_provider/provider.py
Outdated
Show resolved
Hide resolved
full_feature_names, | ||
) | ||
|
||
def _localize_feature_view(self, feature_view: FeatureView): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is awesome!
|
||
@dataclass_json | ||
@dataclass | ||
class FeatureStore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think this is the best way. we will have to replicate all Feast FeatureStore methods in Flyte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe i am missing some point, why not a transformer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the same lines, overall, the code looks neat! ✨ However, I presume there's boilerplate code—it looks like we are calling the Feast methods twice, although we don't actually do that. Is there a way we can make the user-facing code even simpler? Do we need to have a "Flyte task" per every Feast operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two very good questions, so let's separate those out. Just to be clear:
- Why not cut the middleman (in this case these dataclasses) and use a type transformer for the Feast SDK
FeatureStore
object directly?- This way we won't need to maintain these dataclasses at all. This is a great idea.
- A Flyte Task per Feast operation.
- The workflow we're using to showcase the integration,
feast_workflow
, goes through 4 distinct phases: (1) ingesting data into the offline store, (2) loading data from the offline store for training, (3) load data into the online store, and finally (4) retrieve data from the online store for model inference. - Having these in separate Flyte Tasks can be used to demonstrate Flyte's strenghts (reproducibility, caching, etc).
- @samhita-alla , how are you envisioning this simplification? From the user's perspective they only interact with Feast objects once, right? We're doing this translation between dataclasses and the actual Feast
FeatureStore
object, but this will be gone once we have the type transformer.
- The workflow we're using to showcase the integration,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after putting some thought into this, I don't think we can have a transformer for the FeatureStore
object because we have custom logic to update online store file. Notice how we upload or download the online store file in each of the methods of the current FeatureStore
dataclass: [1], [2], and [3].
That said, all this code could be replaced with a type transformer if we could have a remote datasource for SqliteOnlineStoreConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feast's registry already does this for us (Feast has S3RegistryStore), notice how we don't have to worry about uploading local changes to the registry to the remote location.
We could use a similar idea for configuring the sqlite online store.
That combined with the support for remote data sources coming in the next release will remove a lot of the custom used to plug in with Flyte (namely (1) the custom provider, (2) and the dataclass methods used to upload the online store file to s3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eapolinario How do you think a type transformer implementation could be different from dataclasses? Would that be beneficial in any way?
Concerning the user-facing code, I think having one task per Feast operation helps not think much about Feast-related code as such. The user needs to give the appropriate values, and the rest has to be taken care of by the plugin. This is a bit more complex to implement, but I think it's a deeper integration.
If we aren't inclined towards making the user-facing code much simpler, this isn't necessary. But if we want to hide some implementation details, this technique could help in doing so. Here's something that I've worked on before to write tasks for the Feast plugin: flyteorg/flytekit#577.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eapolinario How do you think a type transformer implementation could be different from dataclasses? Would that be beneficial in any way?
@samhita-alla , the idea is to have a type transformer for Feast's FeatureStore
, this way we wouldn't have to worry about exposing the functionality of the main object in our dataclasses. In other words, we'd build the feature store object once and that would be the input of each task that had to interact with Feast.
As for making the user-facing code simpler, I see the appeal of hiding some of this behind a structure like what you proposed in flyteorg/flytekit#577, but unless we can bring more functionality to the plugin, I feel like in its current incarnation it doesn't add a lot of value. I feel like once we can have a type transfomer we can simply expose that object in each task and let the client code deal with the Feast sdk directly.
) | ||
|
||
# Ingest the data into feast | ||
feature_store.apply([horse_colic_entity, horse_colic_feature_view]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this not update the feature store manifest? or are you saying that we are still mutating the same file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it does mutate the registry (the Feast manifest) and we write the remote path of the FlyteSchema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the act of localizing specific Feast's FeatureView
objects prior to important operations (like loading historical features or materializing data into the online store) that makes this work.
This is awesome stuff. Thank you @eapolinario. |
…n custom provider Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
This PR adds an integration with Feast, the Feature Store.
TLDR
We define a workflow that showcases the usual steps of a training and serving an ML model that uses a feature store.
Detailed description
The
feast_workflow
is a translation of the workflow proposed in #322, but without the imperative-style of defining workflow. In other words, we define a workflow that ingests data into an offline store, then it retrieves it and uses the retrieved data for training, finally, that workflow materializes the data into an online store and uses it to run an inference.In terms of the integration with the Feast SDK, we defined a custom provider named
FlyteCustomProvider
modeled after the Local provider.FlyteCustomProvider
's sole responsibility is to translate references to s3 files into local files.Note that this example relies on s3 being available. We might need to explore how to generalize this.
Assuming that the sandbox is running, in order to run this locally the env vars used by feast to access s3 need to be defined, in the root of
flytesnacks
run:TODO
TypeTransformer
for Feast's SDK objects, includingFeatureStore
,Feature
, andFeatureView
.Task
.s3://feast-integration
, exists before the workflow runsREADME.rst