Cyclical Encoder + Column Ordering fix #208
Can you update the existing standard transformer integration test to use this encoder?
I added several comments, but overall the PR is already looking good @karzuo
Currently, there's no validation for the encoder name. Can you add encoderInputSchema validation here: https://github.com/gojek/merlin/blob/main/ui/src/pages/version/components/forms/validation/schema.js#L123
@@ -218,26 +218,19 @@ func (t *Table) Sort(sortRules []*spec.SortColumnRule) error {

// UpdateColumnsRaw add or update existing column with values specified in columnValues map
func (t *Table) UpdateColumnsRaw(columnValues map[string]interface{}) error {
	origColumns := t.Columns()
	df := t.DataFrame()
Can you explain what happens here and why you changed this to use t.DataFrame()? Is there a performance gain, and is there a benchmark that supports it?
The behaviour we intended is already available in the Mutate function, which works on the dataframe directly.
Re-implementing the same logic ourselves would be redundant and complex, and from what I can see our own logic may be less efficient.
Can you share benchmarking for this? Something like this: https://github.com/gojek/merlin/blob/main/api/pkg/transformer/symbol/time_bench_test.go#L68-L132
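For reference, a Go benchmark for a column update generally takes the shape below. This is only a minimal sketch: `updateColumns` is a hypothetical stand-in for the code under test, not the `UpdateColumnsRaw` implementation from this PR, and the column size of 1000 rows is an arbitrary assumption.

```go
package main

import "testing"

// updateColumns is a hypothetical stand-in for the function being measured.
// It adds the column if it is missing and overwrites it otherwise.
func updateColumns(cols map[string][]float64, name string, values []float64) {
	cols[name] = values
}

// BenchmarkUpdateColumns measures repeated updates of a single column.
func BenchmarkUpdateColumns(b *testing.B) {
	// Build the inputs once, outside the timed loop.
	values := make([]float64, 1000)
	cols := map[string][]float64{"a": make([]float64, 1000)}

	b.ReportAllocs() // include allocation statistics in the report
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		updateColumns(cols, "b", values)
	}
}
```

Placed in a `_test.go` file, a benchmark like this runs with `go test -bench=.`, and running it against both the old and the new implementation gives numbers that can be compared directly.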
message ByEpochTime {
    PeriodType period = 1;
Can you also change this to PeriodType periodType?
- name: "daily_cycle"
  cyclicalEncoderConfig:
    byEpochTime:
      period: "DAY"
Please update it to periodType
- name: payday_trend
  cyclicalEncoderConfig:
    byEpochTime:
      period: MONTH
Please update it to periodType
LGTM!
* updated protobuf definition and added dummy files and test for cyclical encoder
* change cyclical encoder range to float from int
* cyclical encoding done
* test encoder
* encode test (half-done)
* fixed bugs and completed test cases for encoder
* clean up redundant codes
* added interface for cyclical encoder (incomplete: range not ready)
* completed ui for cyclical encoder
* update the generated codes to use same version as previously
* added test for encoder_op
* added server test and function to compare json with tolerance given to float type
* update column ordering to remain if modified inplace, append if new
* fixed period type
* improved hint on UI to explain cyclical encoding option
* improved test to better illustrate feature usage
* update wrong comment
* fixes comments from PR
* updated md doc
* change constants to private
* added UI validations for encoders
* added e2e, updated doc with examples
* refactor var name
* added benchmark test for col update
* refactor period to periodType
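The column ordering rule named in the commits above (a column keeps its position if modified in place and is appended if new) can be sketched as below. This is a simplified illustration, not the PR's code: `mergeColumnOrder` is a hypothetical helper that works on column names only, and it takes the updated names as a slice, whereas `UpdateColumnsRaw` receives a map, which in Go has no defined iteration order.

```go
package main

// mergeColumnOrder returns the column order after an update. Columns that
// already exist keep their original position; columns that are new are
// appended in the order they appear in updated. Hypothetical helper.
func mergeColumnOrder(existing, updated []string) []string {
	seen := make(map[string]struct{}, len(existing))
	order := make([]string, 0, len(existing)+len(updated))

	// Existing columns come first, in their original order.
	for _, name := range existing {
		seen[name] = struct{}{}
		order = append(order, name)
	}

	// Only names not seen before are appended.
	for _, name := range updated {
		if _, ok := seen[name]; ok {
			continue // modified in place: position unchanged
		}
		seen[name] = struct{}{}
		order = append(order, name)
	}
	return order
}
```

For example, updating a table with columns `a, b, c` using `b, d, a, e` yields the order `a, b, c, d, e`.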
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?:
Checklist