Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to calculate LOCO for dates/texts by Leaving Out Entire Vector. #418
Option to calculate LOCO for dates/texts by Leaving Out Entire Vector. #418
Changes from 9 commits
62de0be
93afc12
b117b9a
b7bd84f
8286d7d
47f2ae1
97483a6
719fe73
2620076
aa45766
356e79f
9346c68
de23c9c
2fcb05e
90d759b
9e82af4
75934d1
ad1dca7
80556ca
3ebda0f
9776e79
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really want/need to expose strategies for each feature type? I imagine that there are other types of features me may eventually want to control as either average or leave out vector and putting in individual settings for each seems excessive...maybe one parameter to control how ALL vector treated features are handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree. One parameter should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.view
here? either add it before filter of not add it at all.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost of computeDiff is O(n).
When
VectorAggregationStrategy=Avg
note that, we do computeDiff m times andWhen
VectorAggregationStrategy=LeaveOutVector
, we do computeDiff only 1 time.So the former computation is O(n*m) and the latter is O(n)
where n is the size of entire feature vector (i.e containing all the features) and m is the size of individual text/date feature vector which is to be aggregated.
@tovbinm ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
foreach
instead ofmap
?Also doesn't
copyFeatureSparse
need to be avar
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i will use
foreach
But I am not sure about whether i should make it
var
. Reassignment ofval
is not allowed, but object can have its internal state modified, So guessval
is fine.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you manage to remove the zero val indices logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From
featureSparse
vector we know the active indices count, but to calculate theaverage
LOCO for each date/text field, we needed the zero val indices logic. We used to calculate the total count of indices for a date/text feature in each transformation of a individual record/row intransformFn
function. There is no need to do this, we can just calculate the total count of indices per date/text feature only once usingOpVectorColumnHistory
at the global level (i.e outsidetransformFn
function ), this is what I did in line139