Add historical retrieval for BigQuery and Parquet #1389

Merged (29 commits) on Mar 16, 2021

@woop woop commented Mar 14, 2021

What this PR does / why we need it:

  • Adds support for file (Parquet) based historical retrieval
  • Adds support for BigQuery based historical retrieval

Limitations

  • The original SQL query for generating a training dataset ran as a multi-job query that produced tables. The query in this PR does not produce tables, but it will probably run out of memory once inputs grow large enough, so it needs to be optimized eventually.
  • We are not applying time range filters prior to doing joins.
  • Async (RetrievalJob) behaviour isn't consistent across stores (Parquet, BigQuery). Sometimes work only starts when a user calls to_df(); other times it starts as soon as get_historical_features() is called.
  • apply() has been added but doesn't yet provision infrastructure. That functionality is considered out of scope for this PR.
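The second limitation (no time range filter before the join) can be illustrated with a small sketch. This is not the SQL from this PR; it is a stdlib-only illustration of a point-in-time join in which feature rows outside the window that could possibly match are discarded before joining. The function name `point_in_time_join`, the tuple layouts, and the `ttl` parameter are all assumptions made for the example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_join(entity_rows, feature_rows, ttl):
    """Attach to each entity row the latest feature value at or before its
    timestamp, provided that value is no older than ttl.

    entity_rows:  list of (entity_id, event_timestamp)        (hypothetical layout)
    feature_rows: list of (entity_id, feature_timestamp, value)
    """
    if not entity_rows:
        return []

    # Time range filter applied BEFORE the join: only feature rows inside
    # [earliest event - ttl, latest event] can ever match an entity row.
    lo = min(ts for _, ts in entity_rows) - ttl
    hi = max(ts for _, ts in entity_rows)

    by_entity = {}
    for entity_id, ts, value in feature_rows:
        if lo <= ts <= hi:
            by_entity.setdefault(entity_id, []).append((ts, value))
    for rows in by_entity.values():
        rows.sort(key=lambda row: row[0])

    result = []
    for entity_id, event_ts in entity_rows:
        rows = by_entity.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        # Index of the last feature row with timestamp <= event_ts.
        idx = bisect_right(timestamps, event_ts) - 1
        value = None
        if idx >= 0 and event_ts - rows[idx][0] <= ttl:
            value = rows[idx][1]
        result.append((entity_id, event_ts, value))
    return result
```

Filtering first keeps the per-entity lists small, which is the kind of saving the limitation above is pointing at; a production query would push the same bounds into the source scan.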

Does this PR introduce a user-facing change?:

Added historical retrieval for file and BigQuery sources

@woop woop requested a review from jklegar as a code owner March 14, 2021 05:25
@woop woop added kind/feature New feature or request and removed size/XXL labels Mar 14, 2021
@woop woop changed the title Add historical retrieval for BigQuery and Parquet [WIP] Add historical retrieval for BigQuery and Parquet Mar 14, 2021
@woop woop force-pushed the add-offline-historical-retrieval branch 4 times, most recently from c2c3735 to d1e9fa6 Compare March 15, 2021 03:43
@woop woop changed the title [WIP] Add historical retrieval for BigQuery and Parquet Add historical retrieval for BigQuery and Parquet Mar 15, 2021
@woop woop requested a review from oavdeev March 15, 2021 22:52
sdk/python/feast/offline_store.py (outdated)


class FileRetrievalJob(RetrievalJob):
    def __init__(self, evaluation_function):

I'd add a type signature for evaluation_function here, and a comment explaining what it is for.
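One possible shape for that suggestion is sketched below. It assumes `evaluation_function` is a zero-argument callable that performs the retrieval when invoked; the snippet under review does not show how it is called, so that is a guess. The `RetrievalJob` base here is a stand-in, and `Any` is used as the return type only to keep the sketch free of third-party imports (in practice the result would presumably be a pandas DataFrame).

```python
from typing import Any, Callable


class RetrievalJob:
    """Stand-in for the base class in this PR (assumed interface)."""

    def to_df(self) -> Any:
        raise NotImplementedError


class FileRetrievalJob(RetrievalJob):
    def __init__(self, evaluation_function: Callable[[], Any]):
        # evaluation_function: zero-argument callable that runs the actual
        # file-based retrieval and returns the resulting frame. Storing it
        # here defers all work until to_df() is called.
        self.evaluation_function = evaluation_function

    def to_df(self) -> Any:
        # Work starts only at this point, not at construction time.
        return self.evaluation_function()
```

With this shape, constructing the job is cheap and side-effect free, and the retrieval runs each time `to_df()` is called.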

    """RetrievalJob is used to manage the execution of a historical feature retrieval"""

    def __init__(self):
        pass

Does it need a constructor?
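For context on the question: a Python class with no `__init__` inherits `object.__init__`, so an empty constructor adds nothing. A minimal sketch of the base class without one follows, written as an abstract base class. The `to_df` method name is taken from the PR description; making it abstract, the `Any` return type, and the `InMemoryRetrievalJob` subclass are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class RetrievalJob(ABC):
    """RetrievalJob is used to manage the execution of a historical feature retrieval"""

    # No __init__ needed: subclasses define their own, or inherit object's.

    @abstractmethod
    def to_df(self) -> Any:
        """Return the retrieved features (assumed to be a DataFrame in practice)."""


class InMemoryRetrievalJob(RetrievalJob):
    """Hypothetical subclass used only to show the base class in use."""

    def __init__(self, rows):
        self.rows = rows

    def to_df(self) -> Any:
        return list(self.rows)
```

Marking `to_df` abstract also means the base class cannot be instantiated by accident, which an empty constructor would not prevent.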

@feast-ci-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oavdeev, woop


@woop woop force-pushed the add-offline-historical-retrieval branch from 9c00d3b to 83f173c Compare March 16, 2021 02:29
woop added 9 commits March 15, 2021 19:29
Signed-off-by: Willem Pienaar <git@willem.co>
woop and others added 20 commits March 15, 2021 19:29
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>

oavdeev commented Mar 16, 2021

/lgtm

@feast-ci-bot feast-ci-bot merged commit a31545b into feast-dev:master Mar 16, 2021