Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix materialize for None #1481

Merged
merged 5 commits into from
May 10, 2021
Merged

Fix materialize for None #1481

merged 5 commits into from
May 10, 2021

Conversation

qooba
Copy link
Contributor

@qooba qooba commented Apr 20, 2021

Signed-off-by: qooba dev@qooba.net

What this PR does / why we need it:
This pull request fixes the issue during materialize view.
If in input data there is a None value feast can't parse its type in python_type_to_feast_value_type method

Which issue(s) this PR fixes:
Fixes #1480

Does this PR introduce a user-facing change?:

Added support for materializing sources with None types.

Signed-off-by: qooba <dev@qooba.net>
@feast-ci-bot
Copy link
Collaborator

Hi @qooba. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: qooba <dev@qooba.net>
@woop
Copy link
Member

woop commented Apr 20, 2021

/ok-to-test

@woop
Copy link
Member

woop commented Apr 20, 2021

/kind feature

@feast-ci-bot feast-ci-bot added kind/feature New feature or request and removed needs-kind labels Apr 20, 2021
@woop
Copy link
Member

woop commented Apr 20, 2021

Hey @qooba. Thanks again for the PR. This change seems reasonable. FYI you can run make format and make lint to format your code and test whether linting is passing.

Signed-off-by: qooba <dev@qooba.net>
@woop
Copy link
Member

woop commented Apr 22, 2021

@qooba would you mind updating one of our tests so that we explicitly test for this condition?

Signed-off-by: qooba <dev@qooba.net>
@qooba
Copy link
Contributor Author

qooba commented Apr 23, 2021

Sure :) @woop I have added checks to existing tests.
Please notice that pandas during parquet read/write converts None to NaN which itself causes inconsistency.
Thus I have to add conversion in test:

df = df.where(pd.notnull(df), None)

I'm not sure if such conversion should be added to feast itself (on the other hand different offline stores can behave differently and it should be consistent).

@woop woop changed the title fix materialize for None Fix materialize for None Apr 23, 2021
@qooba
Copy link
Contributor Author

qooba commented Apr 23, 2021

/retest

Signed-off-by: qooba <dev@qooba.net>
@qooba qooba requested a review from a team as a code owner May 7, 2021 21:38
@feast-ci-bot feast-ci-bot added size/M and removed size/S labels May 7, 2021
Copy link
Collaborator

@jklegar jklegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @qooba, this is great. Good catch on the None to NaN conversion that happens with pandas->parquet->pandas. Since we want results from the online and offline stores to be consistent, I think we should have the get_historical_features call just return Nones rather than converting NaNs to Nones within the test, i.e., we'll want to convert in sdk/python/feast/infra/offline_stores/file.py. I think for the BigQuery offline store we're ok since BigQuery nulls will turn into pandas Nones. With that change this should be good to go, thanks for the contribution!

@qooba
Copy link
Contributor Author

qooba commented May 8, 2021

Hi, @jklegar . I have checked native pandas methods df.where(pd.notnull(df), None) and df.where((pd.notnull(df)), None).replace({np.inf: None}) but both are changing the value dtype which introduces another inconsistency and breaks other tests. I think that using custom written methods rather than native pandas will affect historical fetch performance thus I think it is better to leave this and add appropriate comment in the documentation (anyway pandas users also live with this :) ). What do you think, is that solution is OK ?

Copy link
Collaborator

@jklegar jklegar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool it seems like that's the best thing to go with right now then. Thanks @qooba!

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jklegar, qooba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jklegar
Copy link
Collaborator

jklegar commented May 10, 2021

/lgtm

@jklegar jklegar merged commit b482e21 into feast-dev:master May 10, 2021
jklegar pushed a commit to jklegar/feast that referenced this pull request May 12, 2021
* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None - add test

Signed-off-by: qooba <dev@qooba.net>

* Fix materialize - correct lint errors

Signed-off-by: qooba <dev@qooba.net>
jklegar pushed a commit that referenced this pull request May 12, 2021
* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None

Signed-off-by: qooba <dev@qooba.net>

* fix materialize for None - add test

Signed-off-by: qooba <dev@qooba.net>

* Fix materialize - correct lint errors

Signed-off-by: qooba <dev@qooba.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error during feast materialize-incremental: AttributeError: 'NoneType' object has no attribute 'dtype'
5 participants