eventtime check before writing features, use pipelines, ttl #1961
Conversation
Hi @vas28r13. Thanks for your PR. I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
```python
# this works out for now since the minimum expire time for a set of features for entity will expire
# the entire entity feature set which makes sense to keep whole data together
if table.ttl:
    client.expire(name=redis_key_bin, time=table.ttl)
```
probably can rethink this a bit to make the ttl on the entity level
This seems to work as intended? It expires the materialized feature view specifically, which is the contract we make. However, I think `ttl` today refers more to the historical retrieval TTL, so we might keep this section out of the initial PR we merge. I could see us actually wanting different offline vs. online TTLs.
A "ttl on the entity level" is, I think, what we have in mind as a separate filter that's better suited for the `get_online_features` method directly.
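One possible shape for the entity-level TTL idea, sketched with an in-memory stand-in for the Redis client (the `FakeRedis` class, the `entity_ttls` mapping, and the key names are all hypothetical illustrations, not feast or redis-py API):

```python
from datetime import timedelta

class FakeRedis:
    """Minimal in-memory stand-in for a Redis client, for illustration only.
    Real code would use redis-py, whose expire() accepts seconds or a timedelta."""
    def __init__(self):
        self.hashes = {}  # key -> {field: value}
        self.ttls = {}    # key -> seconds until expiry

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value

    def expire(self, name, time):
        seconds = int(time.total_seconds()) if isinstance(time, timedelta) else int(time)
        self.ttls[name] = seconds

client = FakeRedis()

# Hypothetical per-entity TTLs, instead of the per-feature-view table.ttl in the diff.
entity_ttls = {
    b"driver:1001": timedelta(hours=1),
    b"driver:1002": timedelta(days=7),
}

for redis_key_bin, ttl in entity_ttls.items():
    client.hset(redis_key_bin, b"avg_rating", b"4.8")
    # expire the whole entity hash, mirroring client.expire(...) above,
    # but keyed per entity rather than per feature view
    client.expire(name=redis_key_bin, time=ttl)

print(client.ttls[b"driver:1001"])  # 3600
```

Because `expire` applies to the whole key, the shortest TTL among features stored under one entity key would still expire them together, which matches the "keep whole data together" comment in the diff.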
```python
entity_rows.append(entity_row)
return entity_rows
```
This is specific to our use case: to support a model for ranking entities, we need to pull all relevant entities and their features from the online store.
@vas28r13 thanks, looks great. Are you introducing any breaking changes with this PR?
This seems like a user-facing change. How would users do this? Would this be supported in all stores?
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1961      +/-   ##
==========================================
+ Coverage   82.08%   82.21%   +0.13%
==========================================
  Files         100      100
  Lines        7992     8052      +60
==========================================
+ Hits         6560     6620      +60
  Misses       1432     1432
```
After your other PR goes in, we can probably write a quick test for this. Initially, remove the TTL logic until we figure out what the right UX would be; otherwise LGTM!
Hey @vas28r13, thanks for this PR. We just fixed the issue with the linter that's been blocking all PRs for the last two days. Would you mind rebasing your changes?
fixes: #1969
@woop I'll take the "all entities" lookup out of this PR for now, since it's specific to our use case. It also should probably be packaged with the idea that entities can expire in the online store.
Force-pushed from e39bf5b to e819d8b.
```python
keys = []
# redis pipelining optimization: send multiple commands to redis server without waiting for every reply
with client.pipeline() as pipe:
```
nit: remove extra line
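The round-trip saving this pipelining change is after can be illustrated with a small stand-in for redis-py's client and pipeline (`FakeRedis` and `FakePipeline` are illustrative stubs, not the real library; the real `client.pipeline()` buffers commands the same way and flushes them in one round trip on `execute()`):

```python
class FakePipeline:
    """Stand-in for redis-py's Pipeline: buffers commands locally,
    then sends the whole batch in a single round trip on execute()."""
    def __init__(self, client):
        self.client = client
        self.buffered = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def hgetall(self, name):
        self.buffered.append(("hgetall", name))  # nothing sent yet

    def execute(self):
        self.client.round_trips += 1  # one "network call" for the whole batch
        return [self.client.data.get(name, {}) for _, name in self.buffered]

class FakeRedis:
    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def pipeline(self):
        return FakePipeline(self)

client = FakeRedis({f"key:{i}": {"f": i} for i in range(1000)})

keys = [f"key:{i}" for i in range(1000)]
with client.pipeline() as pipe:
    for k in keys:
        pipe.hgetall(k)        # buffered locally
    results = pipe.execute()   # all 1000 lookups, one round trip

print(client.round_trips)  # 1
```

Without the pipeline, each `hgetall` would be its own round trip, which matches the >1000-entity slowdown described in the PR.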
```python
return result

def _get_features_for_entity(self, values, feature_view, requested_features):
```
type annotations + return type?
```python
    ttl=timedelta(minutes=5),
)
# Register Feature View and Entity
fs.apply([fv1, e])
```
At some point, you probably also want to do a `feast materialize` here after writing to the online store, making sure it only overwrites values that are older.
/lgtm
```python
):
    event_time_seconds = int(utils.make_tzaware(timestamp).timestamp())

    # ignore if event_timestamp is before the event features that are currently in the feature store
```
Maybe add a TODO to investigate check-and-set as a slower but more correct version of this, in case there are (seemingly rare) race conditions?
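The event-time check under discussion reduces to a small guard; a minimal sketch (the `should_write` name is hypothetical, with timestamps as epoch seconds to match `event_time_seconds` in the diff):

```python
from typing import Optional

def should_write(existing_ts: Optional[int], incoming_ts: int) -> bool:
    """Event-time guard: write only if the incoming row is at least as fresh
    as what the store already holds. None means the key is empty."""
    return existing_ts is None or incoming_ts >= existing_ts

assert should_write(None, 1_000)        # empty key: always write
assert should_write(900, 1_000)         # newer event wins
assert not should_write(1_100, 1_000)   # stale event is ignored
```

The race the TODO refers to sits between reading `existing_ts` and writing: another writer can update the key in that window. Redis's WATCH/MULTI transactions provide the check-and-set behavior that would close it, at the cost of extra round trips.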
Signed-off-by: Vitaly Sergeyev <vsergeyev@better.com>
Force-pushed from 39fdfbe to f885a1a.
```python
# created_ts=time_3 + timedelta(hours=1),
# write=(96864, "I HAVE A NEWER created_ts SO I WIN"),
# expect_read=(96864, "I HAVE A NEWER created_ts SO I WIN"),
# )
```
I don't believe the `created_ts` tie-breaker was a feature in recent feast versions. I think the only reason this test passed was because it works in sequential write order, not because of tie-breaker logic.
```diff
@@ -16,6 +16,7 @@
 @pytest.mark.parametrize("infer_features", [True, False])
 def test_e2e_consistency(environment, e2e_data_sources, infer_features):
     fs = environment.feature_store
+    fs.config.project = fs.config.project + str(infer_features)
```
Different bins for different tests; otherwise the old data remains in the integration online store.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: adchia, vas28r13. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Signed-off-by: Vitaly Sergeyev vsergeyev@better.com
Several parts here, mostly for discussion; they can then be broken into smaller PRs:

- Check the event timestamp before writing features. The use case comes into play when there are multiple ways to ingest into the OnlineStore (i.e. materialization, direct/streaming ingestion), so timing can differ between scenarios; we check the event timestamp to make sure only the latest features are written.
- Use Redis pipelines to limit the number of network calls. If there are many entities to look up in Redis, the number of network calls to Redis can become the bottleneck. Tested this with >1000 entities, and it's slow without pipelines.
- (removed from this PR) Support expiring records in Redis. This should probably be a separate PR, but putting it out there for discussion. Entities should be able to expire; otherwise they remain in the store and the feature store grows continuously. In many of our use cases an entity has a natural window of relevance, so this could be part of the TTL in the OnlineStore.
- (removed from this PR) Ability to look up all entities. The use case is to support ranking models where we constantly rerank many entities. Works well if entities can also be expired.
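The first bullet's write path can be sketched in memory: two ingestion routes (e.g. streaming and materialization) target the same entity, and the event-time guard keeps only the latest features. The `store` layout and `write_features` helper are illustrative, not feast internals:

```python
# entity_key -> (event_ts, features); a dict stands in for the online store
store = {}

def write_features(entity_key, event_ts, features):
    """Write features only if they are at least as fresh as the stored row."""
    existing = store.get(entity_key)
    if existing is not None and event_ts < existing[0]:
        return False  # stale row: skip the write
    store[entity_key] = (event_ts, features)
    return True

# a streaming ingest delivers a fresh row first...
write_features("driver:1001", 200, {"trips": 42})
# ...then a materialization job replays an older row, which must not win
write_features("driver:1001", 100, {"trips": 17})

print(store["driver:1001"])  # (200, {'trips': 42})
```

Regardless of which ingestion path runs last, the store ends up holding the features with the newest event timestamp, which is the invariant this PR adds.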