eventtime check before writing features, use pipelines, ttl #1961
Conversation
Hi @vas28r13. Thanks for your PR. I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
```python
# this works out for now since the minimum expire time for a set of features for entity will expire
# the entire entity feature set which makes sense to keep whole data together
if table.ttl:
    client.expire(name=redis_key_bin, time=table.ttl)
```
probably can rethink this a bit to make the ttl on the entity level
This seems to work as intended? It expires the materialized feature view specifically, which is the contract we make. However, I think `ttl` today refers more to the historical retrieval TTL, so we might keep this section out of the initial PR we merge. I could see us actually wanting different offline vs. online TTLs.
A "ttl on the entity level" is, I think, what we have in mind as a separate filter that's better suited for the `get_online_features` method directly.
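One possible shape for the entity-level TTL idea, sketched with an in-memory stand-in for the Redis client (the `FakeRedis` class, the `entity_ttls` mapping, and the key names are all hypothetical illustrations, not feast or redis-py API):

```python
from datetime import timedelta

class FakeRedis:
    """Minimal in-memory stand-in for a Redis client, for illustration only.
    Real code would use redis-py, whose expire() accepts seconds or a timedelta."""
    def __init__(self):
        self.hashes = {}  # key -> {field: value}
        self.ttls = {}    # key -> seconds until expiry

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value

    def expire(self, name, time):
        seconds = int(time.total_seconds()) if isinstance(time, timedelta) else int(time)
        self.ttls[name] = seconds

client = FakeRedis()

# Hypothetical per-entity TTLs, instead of the per-feature-view table.ttl in the diff.
entity_ttls = {
    b"driver:1001": timedelta(hours=1),
    b"driver:1002": timedelta(days=7),
}

for redis_key_bin, ttl in entity_ttls.items():
    client.hset(redis_key_bin, b"avg_rating", b"4.8")
    # expire the whole entity hash, mirroring client.expire(...) above,
    # but keyed per entity rather than per feature view
    client.expire(name=redis_key_bin, time=ttl)

print(client.ttls[b"driver:1001"])  # 3600
```

Because `expire` applies to the whole key, the shortest TTL among features stored under one entity key would still expire them together, which matches the "keep whole data together" comment in the diff.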
```python
entity_rows.append(entity_row)
return entity_rows
```
This is specific to our use case: to support a model for ranking entities, we need to pull all relevant entities and their features from the online store.
@vas28r13 thanks, looks great. Are you introducing any breaking changes with this PR?
This seems like a user-facing change. How would users do this? Would this be supported in all stores?
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1961      +/-   ##
==========================================
+ Coverage   82.08%   82.21%   +0.13%
==========================================
  Files         100      100
  Lines        7992     8052      +60
==========================================
+ Hits         6560     6620      +60
  Misses       1432     1432
```
After your other PR goes in, we can probably write a quick test for this. Initially, remove the TTL logic until we figure out what the right UX would be; otherwise LGTM!
Hey @vas28r13, thanks for this PR. We just fixed the issue with the linter that's been blocking all PRs for the last two days. Would you mind rebasing your changes?
fixes: #1969
@woop I'll take the "all entities" lookup out of this PR for now, since it's specific to our use case. It also should probably be packaged with the idea that entities can expire in the online store.
Force-pushed from e39bf5b to e819d8b.
```python
keys = []
# redis pipelining optimization: send multiple commands to redis server without waiting for every reply
with client.pipeline() as pipe:
```
nit: remove extra line
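The round-trip saving this pipelining change is after can be illustrated with a small stand-in for redis-py's client and pipeline (`FakeRedis` and `FakePipeline` are illustrative stubs, not the real library; the real `client.pipeline()` buffers commands the same way and flushes them in one round trip on `execute()`):

```python
class FakePipeline:
    """Stand-in for redis-py's Pipeline: buffers commands locally,
    then sends the whole batch in a single round trip on execute()."""
    def __init__(self, client):
        self.client = client
        self.buffered = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def hgetall(self, name):
        self.buffered.append(("hgetall", name))  # nothing sent yet

    def execute(self):
        self.client.round_trips += 1  # one "network call" for the whole batch
        return [self.client.data.get(name, {}) for _, name in self.buffered]

class FakeRedis:
    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def pipeline(self):
        return FakePipeline(self)

client = FakeRedis({f"key:{i}": {"f": i} for i in range(1000)})

keys = [f"key:{i}" for i in range(1000)]
with client.pipeline() as pipe:
    for k in keys:
        pipe.hgetall(k)        # buffered locally
    results = pipe.execute()   # all 1000 lookups, one round trip

print(client.round_trips)  # 1
```

Without the pipeline, each `hgetall` would be its own round trip, which matches the >1000-entity slowdown described in the PR.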
```python
return result

def _get_features_for_entity(self, values, feature_view, requested_features):
```
type annotations + return type?
```python
    ttl=timedelta(minutes=5),
)
# Register Feature View and Entity
fs.apply([fv1, e])
```
At some point, you probably also want to do a `feast materialize` here after writing to the online store, making sure it only overwrites values that are older.
/lgtm
```python
):
    event_time_seconds = int(utils.make_tzaware(timestamp).timestamp())

    # ignore if event_timestamp is before the event features that are currently in the feature store
```
Maybe add a TODO to investigate check-and-set as a slower but more correct version of this, in case there are (seemingly rare) race conditions?
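The event-time check under discussion reduces to a small guard; a minimal sketch (the `should_write` name is hypothetical, with timestamps as epoch seconds to match `event_time_seconds` in the diff):

```python
from typing import Optional

def should_write(existing_ts: Optional[int], incoming_ts: int) -> bool:
    """Event-time guard: write only if the incoming row is at least as fresh
    as what the store already holds. None means the key is empty."""
    return existing_ts is None or incoming_ts >= existing_ts

assert should_write(None, 1_000)        # empty key: always write
assert should_write(900, 1_000)         # newer event wins
assert not should_write(1_100, 1_000)   # stale event is ignored
```

The race the TODO refers to sits between reading `existing_ts` and writing: another writer can update the key in that window. Redis's WATCH/MULTI transactions provide the check-and-set behavior that would close it, at the cost of extra round trips.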
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Force-pushed from 39fdfbe to f885a1a.
```python
# created_ts=time_3 + timedelta(hours=1),
# write=(96864, "I HAVE A NEWER created_ts SO I WIN"),
# expect_read=(96864, "I HAVE A NEWER created_ts SO I WIN"),
# )
```
I don't believe the `created_ts` tie-breaker was a feature in recent feast versions. I think the only reason this test passed was because it works in sequential write order, not because of tie-breaker logic.
```diff
@@ -16,6 +16,7 @@
 @pytest.mark.parametrize("infer_features", [True, False])
 def test_e2e_consistency(environment, e2e_data_sources, infer_features):
     fs = environment.feature_store
+    fs.config.project = fs.config.project + str(infer_features)
```
Different bins for different tests; otherwise the old data remains in the integration online store.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: adchia, vas28r13. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Signed-off-by: Vitaly Sergeyev vsergeyev@better.com
Several parts here, mostly for discussion; they can then be broken into smaller PRs:

- Check the event timestamp before writing features. The use case comes into play when there are multiple ways to ingest into the OnlineStore (i.e. materialization, direct/streaming ingestion), so timing can differ between scenarios; we check the event timestamp to make sure only the latest features are written.
- Use Redis pipelines to limit the number of network calls. If there are many entities to look up in Redis, the number of network calls to Redis can become the bottleneck. Tested this with >1000 entities, and it's slow without pipelines.
- (removed from this PR) Support expiring records in Redis. This should probably be a separate PR, but putting it out there for discussion. Entities should be able to expire; otherwise they remain in the store and the feature store grows continuously. In many of our use cases an entity has a natural window of relevance, so this could be part of the TTL in the OnlineStore.
- (removed from this PR) Ability to look up all entities. The use case is to support ranking models where we constantly rerank many entities. Works well if entities can also be expired.
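The first bullet's write path can be sketched in memory: two ingestion routes (e.g. streaming and materialization) target the same entity, and the event-time guard keeps only the latest features. The `store` layout and `write_features` helper are illustrative, not feast internals:

```python
# entity_key -> (event_ts, features); a dict stands in for the online store
store = {}

def write_features(entity_key, event_ts, features):
    """Write features only if they are at least as fresh as the stored row."""
    existing = store.get(entity_key)
    if existing is not None and event_ts < existing[0]:
        return False  # stale row: skip the write
    store[entity_key] = (event_ts, features)
    return True

# a streaming ingest delivers a fresh row first...
write_features("driver:1001", 200, {"trips": 42})
# ...then a materialization job replays an older row, which must not win
write_features("driver:1001", 100, {"trips": 17})

print(store["driver:1001"])  # (200, {'trips': 42})
```

Regardless of which ingestion path runs last, the store ends up holding the features with the newest event timestamp, which is the invariant this PR adds.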