Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cyclical Encoder + Column Ordering fix #208

Merged
merged 26 commits into from
Feb 9, 2022
Merged

Conversation

karzuo
Copy link
Contributor

@karzuo karzuo commented Jan 11, 2022

What this PR does / why we need it:

  • Cyclical encoder implementation
  • UI Support for cyclical encoder
  • Fix to column ordering after data modifications (such as encoding)

image

image

Which issue(s) this PR fixes:
Fixes #

Does this PR introduce a user-facing change?:

1. Support for cyclical encoder
2. Fix column ordering issue: Previously columns modified by transformation steps get moved to the front even if the column name remains unchanged. Now, columns ordering or columns modified inplace remain unchanged and new columns created will be appended to the back.

Checklist

  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduce API changes

@karzuo karzuo self-assigned this Jan 11, 2022
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
api/pkg/transformer/types/encoder/cyclical_encoder.go Outdated Show resolved Hide resolved
ui/src/services/transformer/TransformerConfig.js Outdated Show resolved Hide resolved
@tiopramayudi
Copy link
Collaborator

can you update existing standard transformer integration test to use this encoder?

@tiopramayudi
Copy link
Collaborator

I add several comments, but the overall the PR already looking good @karzuo

@ariefrahmansyah
Copy link
Contributor

Currently, there's no validation for the encoder name. Can you add encoderInputSchema validation here: https://github.com/gojek/merlin/blob/main/ui/src/pages/version/components/forms/validation/schema.js#L123

@@ -218,26 +218,19 @@ func (t *Table) Sort(sortRules []*spec.SortColumnRule) error {

// UpdateColumnsRaw add or update existing column with values specified in columnValues map
func (t *Table) UpdateColumnsRaw(columnValues map[string]interface{}) error {
origColumns := t.Columns()
df := t.DataFrame()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what happens here and why do you change and use t.DataFrame()? Is there any performance gain and benchmark that support it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the behaviour we intended is already available in the function Mutate, which works on the dataframe directly.
It will be redundant and complex to re-implement the same logic ourselves, and from what I see our own logic may be less efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


message ByEpochTime {
PeriodType period = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also change this to PeriodType periodType

- name: "daily_cycle"
cyclicalEncoderConfig:
byEpochTime:
period: "DAY"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update it to periodType

-name: payday_trend
cyclicalEncoderConfig:
byEpochTime:
period: MONTH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update it to periodType

Copy link
Contributor

@ariefrahmansyah ariefrahmansyah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@karzuo karzuo merged commit 13d9007 into caraml-dev:main Feb 9, 2022
@karzuo karzuo mentioned this pull request Feb 9, 2022
5 tasks
pradithya pushed a commit that referenced this pull request Oct 3, 2022
* updated protobuf definition and added dummy files and test for cyclical encoder

* change cyclical encoder range to float from int

* cyclical encoding done

* test encoder

* encode test (half-done)

* fixed bugs and completed test cases for encoder

* clean up redundant codes

* added interface for cyclical encoder (incomplete: range not ready)

* completed ui for cyclical encoder

* update the generated codes to use same version as previously

* added test for encoder_op

* added server test and function to compare json with tolerance given to float type

* update column ordering to remain if modified inplace, append if new

* fixed period type

* improved hint on UI to explain cyclical encoding option

* improved test to better illustrate feature usage

* update wrong comment

* fixes comments from PR

* updated md doc

* change constants to private

* added UI validations for encoders

* added e2e, updated doc with examples

* refactor var name

* added benchmark test for col update

* refactor period to periodType
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants