Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-3336][HUDI-FLINK] Support custom hadoop config options for flink #4699

Closed
wants to merge 10 commits into from

Conversation

cuibo01
Copy link
Contributor

@cuibo01 cuibo01 commented Jan 27, 2022

…t take effect

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@xiarixiaoyao
Copy link
Contributor

@hudi-bot run azure

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch 2 times, most recently from 8b1a75a to 8da29b1 Compare January 27, 2022 13:57
@cuibo01 cuibo01 changed the title [HUDI][HUDI-FLINK] Configurations transferred through Flink SQL canno… [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno… Jan 28, 2022
@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from 8da29b1 to 6e9036e Compare January 28, 2022 12:24
@xiarixiaoyao
Copy link
Contributor

@hudi-bot run azure

@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

@danny0405 pls review thx

@danny0405
Copy link
Contributor

Thanks, can you explain a little why we need theses hadoop. prefixed config options ?

@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

Thanks, can you explain a little why we need theses hadoop. prefixed config options ?

In the same application, the same core/hdfs-site is used
but some configurations may be different for different jobs, for example, the compact memory size, the FileSystem...

@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

@hudi-bot run azure

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from 4cc376c to ec269f1 Compare January 29, 2022 11:51
@danny0405
Copy link
Contributor

You may need to change the commit title: such as: Support custom hadoop config options for flink

@cuibo01 cuibo01 changed the title [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno… [HUDI-3336][HUDI-FLINK] Support custom hadoop config options for flink Jan 29, 2022
@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

You may need to change the commit title: such as: Support custom hadoop config options for flink

ok Thanks for the review.

@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

@hudi-bot run azure

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch 2 times, most recently from 5085945 to 33f419e Compare January 29, 2022 12:22
@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 29, 2022

@hudi-bot run azure

@cuibo01
Copy link
Contributor Author

cuibo01 commented Jan 30, 2022

@danny0405 can u approval workflow and merge the pr, thx

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from 33f419e to 099a340 Compare February 5, 2022 08:26
@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 6, 2022

@hudi-bot run azure

@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 10, 2022

@danny0405 can u merge the PR? thx

/**
* Collects the config options that start with specified prefix {@code prefix} into a 'key'='value' list.
*/
public static Map<String, String> getHoodiePropertiesWithPrefix(Map<String, String> options, String prefix) {
public static Map<String, String> getPropertiesWithPrefix(Map<String, String> options, String subprefix) {
final Map<String, String> hoodieProperties = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the prefix directly ? There is no need to prefix the option with properties., prefix the option with hadoop. directly is okey.

And can we also add a too method named getHadoopOptions(Configuration conf) here ?

Copy link
Contributor Author

@cuibo01 cuibo01 Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parqurt also uses the getHoodiePropertiesWithPrefix, but the getHoodiePropertiesWithPrefix only use properties., not append parqurt.. To ensure consistency, hadoop and parquet add prefix(properties.) .
properties.parqurt.* and parqurt.*, which one you think is better

Copy link
Contributor Author

@cuibo01 cuibo01 Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @danny0405 pls review :)
replace getPropertiesWithPrefix with DelegatingConfiguration#toMap of Flink
and Added validation to HoodieTableFactory

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from ccfcae2 to 1cb3039 Compare February 11, 2022 02:17
@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from 1cb3039 to 0c496e6 Compare February 11, 2022 02:45
@danny0405
Copy link
Contributor

  1. The DelegatingConfiguration would prefix all the options with specified prefix which is not what we want
  2. Remove the validation of HoodieTableFactory because we want to support all the hoodie builtin options
  3. can we move all the prefix into FlinkOptions and in there, add a util to add options with specified prefix into the given hadoop configuration

@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 11, 2022

sorry ,i dont understand
DelegatingConfiguration#toMap returns all the specified prefixe options, the logic is the same as that of getHoodiePropertiesWithPrefix.
https://github.com/apache/flink/blob/0caed30f9aaa0ce9b1488000f843f9932c19ce5a/flink-core/src/main/java/org/apache/flink/configuration/DelegatingConfiguration.java#L314

@danny0405
Copy link
Contributor

sorry ,i dont understand DelegatingConfiguration#toMap returns all the specified prefixe options, the logic is the same as that of getHoodiePropertiesWithPrefix. https://github.com/apache/flink/blob/0caed30f9aaa0ce9b1488000f843f9932c19ce5a/flink-core/src/main/java/org/apache/flink/configuration/DelegatingConfiguration.java#L314

Yes, the toMap has similar functionality, but the getXXX and setXXX methods always prefix the option with prefix, which is not what we want, i still think there is no need to introduce this too class.

@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 11, 2022

Yes, the toMap has similar functionality, but the getXXX and setXXX methods always prefix the option with prefix, which is not what we want, i still think there is no need to introduce this too class.

got it, thx @danny0405 , i will update the PR :)

@cuibo01 cuibo01 force-pushed the flinkConfigtoHdfsConfig branch from d83f5bd to eb0b3c9 Compare February 12, 2022 07:07
@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 16, 2022

@danny0405 hi pls review :)

@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 19, 2022

@danny0405 can u merge the PR?

@danny0405
Copy link
Contributor

I think a little and we should hold for this patch, people usually do not pass hadoop config options through SQL options, can you describe again your use cases again, what kind of options do you want to pass around in per-job level ?

@cuibo01
Copy link
Contributor Author

cuibo01 commented Feb 21, 2022

I think a little and we should hold for this patch, people usually do not pass hadoop config options through SQL options, can you describe again your use cases again, what kind of options do you want to pass around in per-job level ?

1, Multiple SQL Job in the same process have different hadoop config in our production environment, we can configure different jobs by with
2, fix getParquetConf logic. :)

@XuQianJin-Stars
Copy link
Contributor

hi @cuibo01 can rebase this pr?

cuibo01 added 3 commits May 7, 2022 19:43
…gtoHdfsConfig2

Conflicts:
	hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java
	hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
@hudi-bot
Copy link

hudi-bot commented May 7, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@cuibo01 cuibo01 closed this May 7, 2022
@cuibo01
Copy link
Contributor Author

cuibo01 commented May 7, 2022

#5528

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants