Shadow entities mapped properly to real entity with entity aliasing #1762

Closed
mavysavydav opened this issue Aug 6, 2021 · 4 comments · Fixed by #1951

@mavysavydav
Collaborator

mavysavydav commented Aug 6, 2021

Is your feature request related to a problem? Please describe.

There may be multiple different entity keys on the entity source data side that should map onto the same entity from the feature data table during the join. For instance, "spammer_id" and "reporter_id" may want the "years_on_platform" feature which lives in a table keyed by "user_id". It is inconvenient to combine the "spammer_ids" and "reporter_ids" into a new additional "user_ids" column and then clean it up after the join. In this case, we can call "spammer" and "reporter" the shadow entities of "user".

In another example, both "rider_id" and "driver_id" may want the "age" feature which lives in a table keyed by "user_id". In this case, we can call "rider" and "driver" the shadow entities of "user".
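To make the inconvenience concrete, here is a minimal sketch of the manual workaround described above, using plain Python and made-up data. The table contents, the `rides` rows, and the `rider_age` / `driver_age` output names are assumptions for illustration only and are not part of Feast.

```python
# Hypothetical feature table keyed by the real entity key, user_id.
USER_TABLE = {101: {"age": 34}, 102: {"age": 27}}

# Hypothetical entity source rows that only carry the shadow keys.
rides = [{"rider_id": 101, "driver_id": 102}]

# Step 1: stack both shadow keys into one temporary user_id column.
stacked = []
for index, ride in enumerate(rides):
    for role in ("rider", "driver"):
        stacked.append({"row": index, "role": role, "user_id": ride[role + "_id"]})

# Step 2: join the temporary column against the user_id-keyed table.
for entry in stacked:
    entry["age"] = USER_TABLE[entry["user_id"]]["age"]

# Step 3: clean up by pivoting the joined values back onto the original rows.
result = [dict(ride) for ride in rides]
for entry in stacked:
    result[entry["row"]][entry["role"] + "_age"] = entry["age"]
```

The stack and pivot steps are pure bookkeeping, and they are what the proposed aliasing would make unnecessary.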

Describe the solution you'd like

Via aliasing at the FeatureService level, the shadow entities can be mapped to the correct FeatureView entity.

abuse_feature_service = FeatureService(
    name="my_abuse_model_v1",
    features=[
        spammer_features,
        reporter_features,
        user_features.alias({'spammer_id', 'reporter_id'})
    ],
)
@mavysavydav changed the title from "Shadow entities mapped properly with entity aliases" to "Shadow entities mapped properly to real entity with entity aliasing" on Aug 6, 2021
@woop
Member

woop commented Aug 13, 2021

@mavysavydav happy to stab out a quick proposal for you here, but I am a bit confused by what is contained within each of these feature views and their source tables.

What is the union of join keys in this feature service? Why would aliasing spammer_id on user_features as reporter_id actually solve the join? Wouldn't spammer_features have a spammer_id?

If you could show me basic examples of what these tables look like using https://ozh.github.io/ascii-tables/ then it would really help a lot!

@mavysavydav
Collaborator Author

So what I meant by user_features.alias({'spammer_id', 'reporter_id'}) is that user_features is the FV, and it already has an entity key defined in the feature view definition (e.g. user_id). "spammer_id" and "reporter_id" would then be aliased to whatever the entity key of that feature view is. In this case, since we're calling user_features.alias(..), spammer_id and reporter_id are both mapped to user_id. Let me know if ascii tables would still help or if this clears things up.
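As a rough illustration of the mapping described here, the sketch below treats each alias column as if it were user_id when looking up the feature table. It is plain Python with made-up data, not Feast code. The function name `join_with_aliases` and the `alias__feature` output naming are assumptions; the thread does not specify how the resulting columns would be named.

```python
# Hypothetical feature table keyed by the real entity key, user_id.
USER_FEATURES = {
    1: {"years_on_platform": 5},
    2: {"years_on_platform": 1},
}


def join_with_aliases(entity_rows, feature_table, aliases):
    """Look up each alias column in feature_table as if it were user_id.

    Output columns are prefixed with the alias name (an assumed convention)
    so the spammer's and the reporter's values do not collide.
    """
    joined_rows = []
    for row in entity_rows:
        joined = dict(row)
        for alias in aliases:
            features = feature_table.get(row[alias], {})
            for name, value in features.items():
                joined[f"{alias}__{name}"] = value
        joined_rows.append(joined)
    return joined_rows


entity_rows = [{"spammer_id": 1, "reporter_id": 2}]
result = join_with_aliases(entity_rows, USER_FEATURES, ["spammer_id", "reporter_id"])
```

Here the entity rows never need a user_id column: each shadow key is resolved against the same table directly.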

@adchia
Collaborator

adchia commented Aug 13, 2021

We probably need to rev on the exact API, but having this kind of functionality makes sense to me.

Couple thoughts (think this is def worth writing an RFC for though):

  1. Logically, this breaks down really to me as wanting the final feature vector as being:

    entity_df + (spammer_features+spammer_user_features) + (reporter_features + reporter_user_features).

    Is there a way to shape this API so that the structure above is more visible? Right now, the API doesn't declare these groupings and allows for more flexibility, which may or may not be a good thing.

  2. Could imagine that e.g. the spammer_features + spammer_user_features combo is really the reusable block across models that we want to isolate. Wondering if there's a way to make that combo more discoverable (e.g. as a feature view, but that could get hairy)?

  3. Another alternative might be to expose this aliasing at the FeatureView level. Then what users would go through is to make one "spammer_user_features" FV and one "reporter_user_features" FV both with the same batch source, but with an entity mapping that maps user_id -> spammer_id or user_id -> reporter_id.

    We'd then ideally do smarts so that we aren't actually repulling the same info from the batch source twice.
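A minimal sketch of the third idea, assuming two feature views that share one batch source and differ only in their join key mapping. Everything here (the `VIEWS` dictionaries, `read_source`, `retrieve`, and the `view__feature` column naming) is hypothetical and written in plain Python; it is not the Feast API. The point is only to show grouping views by source so the source is pulled once.

```python
from collections import defaultdict

# Counts how often each hypothetical batch source is pulled.
READ_COUNT = defaultdict(int)

# Hypothetical batch source keyed by user_id.
SOURCES = {"user_table": {1: {"age": 30}, 2: {"age": 41}}}


def read_source(name):
    """Stand-in for an expensive batch source read."""
    READ_COUNT[name] += 1
    return SOURCES[name]


# Two hypothetical feature views over the same source, each mapping
# user_id to a different shadow key.
VIEWS = [
    {"name": "spammer_user_features", "source": "user_table", "join_key": "spammer_id"},
    {"name": "reporter_user_features", "source": "user_table", "join_key": "reporter_id"},
]


def retrieve(entity_rows, views):
    """Join every view onto the entity rows, reading each source only once."""
    views_by_source = defaultdict(list)
    for view in views:
        views_by_source[view["source"]].append(view)

    output = [dict(row) for row in entity_rows]
    for source_name, group in views_by_source.items():
        table = read_source(source_name)  # one pull shared by the whole group
        for view in group:
            for row_in, row_out in zip(entity_rows, output):
                features = table.get(row_in[view["join_key"]], {})
                for feature, value in features.items():
                    row_out[f"{view['name']}__{feature}"] = value
    return output


rows = retrieve([{"spammer_id": 1, "reporter_id": 2}], VIEWS)
```

The output row has the shape from point 1: the entity columns, then the spammer-side features, then the reporter-side features.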

@mavysavydav
Collaborator Author

Appreciate the thoughts. We were considering the approach of just having a bunch of FV copies but were concerned this would pollute the registry space. And regarding bullet point 1, the final feature vector would look like that, but spammer_features and spammer_user_features don't need to be grouped together in some particular way, right? Each would have its own column just like any other features coming from any feature views.

Here's the RFC that I'm in the process of putting together - https://docs.google.com/document/d/1TsCwKf3nVXTAfL0f8i26jnCgHA3bRd4dKQ8QdM87vIA/edit#
