Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-3922] parse record key + partition path config consistently between keygens and HiveSync #5520

Closed
wants to merge 3 commits into from

Conversation

qjqqyy
Copy link
Contributor

@qjqqyy qjqqyy commented May 6, 2022

What is the purpose of the pull request

hive-sync and the key generators (such as ComplexKeyGenerator) handles the degenerate case (empty string) differently.
Previously this was merely a configuration annoyance, but now it causes issues (see #5101).

We parse configs consistently between the various key generators and HiveSyncConfig

Brief change log

  • Filter empty strings in TypedProperties#getStringList (this causes HoodieSyncConfig to treat hoodie.datasource.hive_sync.partition_fields= the same as not specifying it)
  • Parse PARTITIONPATH_FIELD_NAME similarly in the various key generators
    • in particular, it now accepts both not specifying the prop key and setting it as the empty string to denote a non-partitioned table
  • While here, also parse RECORDKEY_FIELD_NAME similarly, and allow leaving not specifying RECORDKEY_FIELD_NAME to mean the default.

Verify this pull request

This PR modifies the handling of degenerate configurations. The affected test cases are updated to match the new behaviour.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codope codope self-assigned this May 7, 2022
while here, delete the "recordkey field not specified" test for
ComplexKeyGenerator
@hudi-bot
Copy link

hudi-bot commented May 7, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan changed the title HUDI-3922 parse record key + partition path config consistently between keygens and HiveSync [HUDI-3922] parse record key + partition path config consistently between keygens and HiveSync May 11, 2022
@nsivabalan nsivabalan added the priority:critical production down; pipelines stalled; Need help asap. label May 11, 2022
Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qjqqyy Thanks for putting up a fix. I have some concern as mentioned below. I would suggest to not make any change in the keygen logic, and isolate the fix to hive-sync module only.

@qjqqyy qjqqyy closed this May 30, 2022
@qjqqyy qjqqyy deleted the HUDI-3922 branch May 30, 2022 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:critical production down; pipelines stalled; Need help asap.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants