Aggregated LOCOs of SmartTextVectorizer outputs #308
Is this check here to make sure we don't accidentally pick up text features that were determined to be categorical by the smart text vectorizer? If so, we should probably make an easier way to tell what transformations were applied to the text.
Not only SmartTextVectorizer. Any feature that has indicator/descriptor values and is derived from a text transformation is likely to be easy to interpret once LOCO is done.
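To make the intent of the check concrete, here is a minimal sketch under stated assumptions. `ColumnInfo`, `TextColumnCheck`, `needsAggregation`, and the `textTypes` set are hypothetical names made up for illustration, not the library's API. The assumption is that only text-derived columns without an indicator or descriptor value are hard to read on their own and so are candidates for aggregation.

```scala
// Hypothetical stand-in for per-column vector metadata; not the library's actual class.
final case class ColumnInfo(
  parentFeatureName: String,
  parentFeatureType: String,
  indicatorValue: Option[String] = None,
  descriptorValue: Option[String] = None
)

object TextColumnCheck {
  // Assumed set of text-like parent types, for illustration only.
  val textTypes: Set[String] = Set("Text", "TextArea", "TextMap")

  // True when the column comes from a text feature and carries neither an
  // indicator value nor a descriptor value.
  def needsAggregation(c: ColumnInfo): Boolean =
    textTypes.contains(c.parentFeatureType) &&
      c.indicatorValue.isEmpty &&
      c.descriptorValue.isEmpty
}
```

Columns that do carry an indicator or descriptor value would keep their individual LOCO scores under this reading.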
The key value can get quite large if we concatenate all the parent stages. What's the rationale behind it? @mweilsalesforce
It is possible to apply different transformations to the same feature.
On second thought, maybe we should aggregate all the derived features even if two different transformations were applied.
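A rough sketch of what aggregating across transformations could look like, assuming the grouping key is just the parent feature name and that the aggregate is a plain average. Both are assumptions for illustration; `LocoAggregation` and `aggregateByParent` are hypothetical names, not code from this PR.

```scala
object LocoAggregation {
  // Input: (parentFeatureName, LOCO diff) pairs, one per derived column.
  // Output: one averaged LOCO score per parent feature, regardless of which
  // transformation produced each derived column.
  def aggregateByParent(diffs: Seq[(String, Double)]): Map[String, Double] =
    diffs.groupBy(_._1).map { case (name, xs) =>
      name -> xs.map(_._2).sum / xs.size
    }
}
```

Keying on the parent feature name alone also keeps the key short, since it no longer grows with the number of parent stages.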
Ok fixed
I am not following: why does the prediction value become an index?!
We want to return the LOCO score of the predicted class. Let's say for a row the LOCOs are 0 -> LOCO_0, 1 -> LOCO_1, 2 -> LOCO_2, and the model predicts class 1 on this row. Then we want to return LOCO_1.
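The indexing described above can be sketched as follows, assuming the per-class LOCO scores are stored in an array ordered by class index and the prediction arrives as a `Double` class label. `PredictedClassLoco` and `locoForPrediction` are hypothetical names used only for illustration.

```scala
object PredictedClassLoco {
  // locoPerClass(i) holds the LOCO score for class i.
  // The prediction is a class label encoded as a Double (e.g. 1.0), so
  // converting it to Int gives the position of the predicted class's score.
  def locoForPrediction(locoPerClass: Array[Double], prediction: Double): Double =
    locoPerClass(prediction.toInt)
}
```

For example, with three classes and a prediction of 1.0, the call returns the second entry of the array, which is the score for class 1.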
Let's add some docs to explain it, because it looks weird to me.