NIFI-11858 Configurable Column Name Normalization in PutDatabaseRecord and UpdateDatabaseTable #7544

Closed
wants to merge 56 commits into from

ravinarayansingh
Contributor

Summary

NiFi 11858

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@exceptionfactory exceptionfactory left a comment

@ravinarayansingh Thanks for the contribution. There are still some static code analysis issues. You can test locally by running the following command:

./mvnw clean install -Pcontrib-check -am -pl :nifi-standard-processors

@ravinarayansingh
Contributor Author

Hi @exceptionfactory ,

The following is the RAT report:


Summary

Generated at: 2023-08-01T08:57:42-07:00

Notes: 24
Binaries: 5
Archives: 3
Standards: 27

Apache Licensed: 13
Generated Documents: 0

JavaDocs are generated, thus a license header is optional.
Generated files do not require license headers.

14 Unknown Licenses


Files with unapproved licenses:

nifi-h2/nifi-h2-database/nifi-h2-database.iml
nifi-h2/nifi-h2-database/target/.plxarc
nifi-h2/nifi-h2-database/target/maven-archiver/pom.properties
nifi-h2/nifi-h2-database-migrator/nifi-h2-database-migrator.iml
nifi-h2/nifi-h2-database-migrator/target/.plxarc
nifi-h2/nifi-h2-database-migrator/target/maven-archiver/pom.properties
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/createdFiles.lst
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/inputFiles.lst
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/groovy-tests/createdFiles.lst
nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/groovy-tests/inputFiles.lst
nifi-h2/nifi-h2.iml
nifi-h2/target/.plxarc


Archives:


Files with Apache License headers will be marked AL
Binary files (which do not require any license headers) will be marked B
Compressed archives will be marked A
Notices, licenses etc. will be marked N
AL .asf.yaml
AL .github/PULL_REQUEST_TEMPLATE.md
AL .github/workflows/ci-workflow.yml
AL .github/workflows/stale.yml
AL .github/workflows/system-tests.yml
AL .mvn/wrapper/maven-wrapper.properties
AL checkstyle.xml
B jffi10666034361742842335.dll
N KEYS
N LICENSE
AL mvnw
AL mvnw.cmd
AL nifi-dependency-check-maven/suppressions.xml
!????? nifi-h2/nifi-h2-database/nifi-h2-database.iml
!????? nifi-h2/nifi-h2-database/target/.plxarc
N nifi-h2/nifi-h2-database/target/classes/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database/target/classes/META-INF/LICENSE
N nifi-h2/nifi-h2-database/target/classes/META-INF/NOTICE
!????? nifi-h2/nifi-h2-database/target/maven-archiver/pom.properties
N nifi-h2/nifi-h2-database/target/maven-shared-archive-resources/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database/target/maven-shared-archive-resources/META-INF/LICENSE
N nifi-h2/nifi-h2-database/target/maven-shared-archive-resources/META-INF/NOTICE
A nifi-h2/nifi-h2-database/target/nifi-h2-database-1.18.0-SNAPSHOT.jar
A nifi-h2/nifi-h2-database/target/original-nifi-h2-database-1.18.0-SNAPSHOT.jar
N nifi-h2/nifi-h2-database/target/test-classes/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database/target/test-classes/META-INF/LICENSE
N nifi-h2/nifi-h2-database/target/test-classes/META-INF/NOTICE
!????? nifi-h2/nifi-h2-database-migrator/nifi-h2-database-migrator.iml
!????? nifi-h2/nifi-h2-database-migrator/target/.plxarc
N nifi-h2/nifi-h2-database-migrator/target/classes/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database-migrator/target/classes/META-INF/LICENSE
N nifi-h2/nifi-h2-database-migrator/target/classes/META-INF/NOTICE
B nifi-h2/nifi-h2-database-migrator/target/classes/org/apache/nifi/h2/database/migration/H2DatabaseMigrator.class
B nifi-h2/nifi-h2-database-migrator/target/classes/org/apache/nifi/h2/database/migration/H2DatabaseUpdater.class
!????? nifi-h2/nifi-h2-database-migrator/target/maven-archiver/pom.properties
N nifi-h2/nifi-h2-database-migrator/target/maven-shared-archive-resources/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database-migrator/target/maven-shared-archive-resources/META-INF/LICENSE
N nifi-h2/nifi-h2-database-migrator/target/maven-shared-archive-resources/META-INF/NOTICE
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/createdFiles.lst
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/inputFiles.lst
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/groovy-tests/createdFiles.lst
!????? nifi-h2/nifi-h2-database-migrator/target/maven-status/maven-compiler-plugin/testCompile/groovy-tests/inputFiles.lst
A nifi-h2/nifi-h2-database-migrator/target/nifi-h2-database-migrator-1.18.0-SNAPSHOT.jar
N nifi-h2/nifi-h2-database-migrator/target/test-classes/META-INF/DEPENDENCIES
N nifi-h2/nifi-h2-database-migrator/target/test-classes/META-INF/LICENSE
N nifi-h2/nifi-h2-database-migrator/target/test-classes/META-INF/NOTICE
B nifi-h2/nifi-h2-database-migrator/target/test-classes/nifi-flow-audit.mv.db
B nifi-h2/nifi-h2-database-migrator/target/test-classes/org/apache/nifi/h2/database/migration/TestH2DatabaseUpdater.class
!????? nifi-h2/nifi-h2.iml
!????? nifi-h2/target/.plxarc
N nifi-h2/target/maven-shared-archive-resources/META-INF/DEPENDENCIES
N nifi-h2/target/maven-shared-archive-resources/META-INF/LICENSE
N nifi-h2/target/maven-shared-archive-resources/META-INF/NOTICE
N NOTICE
AL pom.xml
AL README.md
AL SECURITY.md


Let me know if I need to change anything.
Thanks

@mattyb149 changed the title from "NiFi 11858 Improve column name normalization in PutDatabaseRecord and UpdateDatabaseTable processor" to "NIFI-11858 Improve column name normalization in PutDatabaseRecord and UpdateDatabaseTable processor" on Aug 1, 2023
@exceptionfactory
Contributor

@ravinarayansingh The Static Analysis build lists the particular problem:

https://github.com/apache/nifi/actions/runs/5719909353/job/15509139654?pr=7544#step:5:5763

It looks like those nifi-h2 directories are leftover on your local system from the support branch, so they could be removed, but that will not impact the pull request build.

@ravinarayansingh
Contributor Author

Hi @exceptionfactory,
I have made the required changes.
Thanks

@MikeThomsen
Contributor

@ravinarayansingh you have a merge conflict.

…NIFI-11858

# Conflicts:
#	nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/UpdateDatabaseTable.java
#	nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/TableSchema.java
@ravinarayansingh
Contributor Author

ravinarayansingh commented Aug 12, 2023

@MikeThomsen

resolved

@ravinarayansingh
Contributor Author

Hi @exceptionfactory,
When will this feature be available in NiFi?
Thanks

@exceptionfactory
Contributor

> Hi @exceptionfactory, when will this feature be available in NiFi? Thanks

@ravinarayansingh These changes still need substantive review, along with a number of other pull requests, but thanks for addressing the initial feedback so far.

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for your patience on this pull request review @ravinarayansingh. Reviewing the implementation details, the general concept of a configurable column normalization strategy is helpful.

As a general recommendation, it looks like it would be helpful to make ColumnNameNormalizer an interface, with getNormalizedName(String columnName) as the interface method. Although the current implementation is simple enough, providing separate implementations would remove the need for the conditional logic. Given that this method will be called for every column, there seems to be value in having narrowly defined implementation classes.
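
To illustrate the recommendation, here is a minimal sketch of what an interface with narrowly defined implementations could look like. Only `ColumnNameNormalizer` and `getNormalizedName(String columnName)` come from the comment above; the two implementation classes and their exact behavior are assumptions made for illustration, not the code in this pull request.

```java
import java.util.regex.Pattern;

// Interface named in the review comment; one method, called once per column.
interface ColumnNameNormalizer {
    String getNormalizedName(String columnName);
}

// Hypothetical implementation: removes underscores only.
final class RemoveUnderscoreNormalizer implements ColumnNameNormalizer {
    @Override
    public String getNormalizedName(final String columnName) {
        return columnName.replace("_", "");
    }
}

// Hypothetical implementation: removes whitespace only.
final class RemoveSpaceNormalizer implements ColumnNameNormalizer {
    // Compiled once and shared, since the method runs for every column.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    @Override
    public String getNormalizedName(final String columnName) {
        return WHITESPACE.matcher(columnName).replaceAll("");
    }
}
```

With this shape, each class carries no conditional logic: the choice of strategy is made once when the normalizer is selected, not on every call.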

@ravinarayansingh
Contributor Author

> Thanks for your patience on this pull request review @ravinarayansingh. Reviewing the implementation details, the general concept of a configurable column normalization strategy is helpful.
>
> As a general recommendation, it looks like it would be helpful to make ColumnNameNormalizer an interface, with getNormalizedName(String columnName) as the interface method. Although the current implementation is simple enough, providing separate implementations would remove the need for the conditional logic. Given that this method will be called for every column, there seems to be value in having narrowly defined implementation classes.

Hi @exceptionfactory,
Thanks for the suggestion. I will make the required changes.

@exceptionfactory changed the title from "NIFI-11858 Improve column name normalization in PutDatabaseRecord and UpdateDatabaseTable processor" to "NIFI-11858 Configurable Column Name Normalization in PutDatabaseRecord and UpdateDatabaseTable" on Nov 16, 2023
…NIFI-11858

# Conflicts:
#	nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java
@ravinarayansingh
Contributor Author

Hi @exceptionfactory ,

I have made the required changes, please have a look
Thanks

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for the updates @ravinarayansingh. The general approach looks good, but there are a few more naming recommendations. In particular, any regular expression pattern should be parsed to a Pattern object for more efficient processing in the normalizer implementation.
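
As a rough sketch of the point about regular expressions: the pattern string can be compiled to a `java.util.regex.Pattern` once, when the normalizer is constructed, and reused for every column name. The class name `PatternNormalizer` and the choice to remove every match are assumptions for illustration, not the implementation under review.

```java
import java.util.regex.Pattern;

// Hypothetical normalizer that strips every match of a user-supplied regex.
final class PatternNormalizer {
    private final Pattern pattern;

    PatternNormalizer(final String regex) {
        // Compile once at configuration time instead of on every call.
        this.pattern = Pattern.compile(regex);
    }

    String getNormalizedName(final String columnName) {
        // Reuses the compiled Pattern; String.replaceAll(regex, ...) would
        // recompile the expression for each column name.
        return pattern.matcher(columnName).replaceAll("");
    }
}
```

For example, `new PatternNormalizer("[^A-Za-z0-9]").getNormalizedName("order-id #1")` returns `orderid1`.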

ravinarayansingh and others added 2 commits December 3, 2023 20:53
…/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java

Co-authored-by: David Handermann <[email protected]>
…/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java

Co-authored-by: David Handermann <[email protected]>
@ravinarayansingh
Contributor Author

> @ravinarayansingh Thanks for your work and patience on this pull request.
>
> Unfortunately I may not have been clear in previous comments regarding the interface-based approach for the ColumnNameNormalizer. I did not intend for the ColumnNameNormalizer to be implemented by each Processor, but instead, the ColumnNameNormalizer should have separate implementations for each strategy.
>
> If it would be helpful, I could follow up with a commit that implements the approach I described.

Hi @exceptionfactory,
I have made the required changes. Please have a look.

@exceptionfactory
Contributor

Thanks for the updates @ravinarayansingh. The ColumnNameNormalizer with individual implementations matches what I had in mind. I appreciate your efforts and will take a closer look soon.
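
A minimal sketch of how one implementation per strategy might be selected, assuming an enum and a factory. The type names `TranslationStrategy` and `ColumnNameNormalizerFactory` match file names that appear in this pull request, but the enum constants, the method name `getNormalizer`, and the behavior of each strategy are hypothetical.

```java
import java.util.regex.Pattern;

interface ColumnNameNormalizer {
    String getNormalizedName(String columnName);
}

// Hypothetical strategy values; the real enum constants may differ.
enum TranslationStrategy {
    REMOVE_UNDERSCORE, REMOVE_SPACE, REMOVE_ALL_SPECIAL_CHAR, PATTERN
}

final class ColumnNameNormalizerFactory {
    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    private ColumnNameNormalizerFactory() {
    }

    // The switch runs once per configuration, not once per column.
    static ColumnNameNormalizer getNormalizer(final TranslationStrategy strategy, final Pattern customPattern) {
        switch (strategy) {
            case REMOVE_UNDERSCORE:
                return name -> name.replace("_", "");
            case REMOVE_SPACE:
                return name -> WHITESPACE.matcher(name).replaceAll("");
            case REMOVE_ALL_SPECIAL_CHAR:
                return name -> NON_ALPHANUMERIC.matcher(name).replaceAll("");
            case PATTERN:
                if (customPattern == null) {
                    throw new IllegalArgumentException("PATTERN strategy requires a compiled Pattern");
                }
                return name -> customPattern.matcher(name).replaceAll("");
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + strategy);
        }
    }
}
```

A processor would then hold a single `ColumnNameNormalizer` obtained from the factory and share it, instead of implementing the interface itself.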

@mattyb149
Contributor

This needs a rebase against the latest main, then I will take a look also, thanks!

@mattyb149 mattyb149 self-requested a review April 17, 2024 16:24
@ravinarayansingh
Contributor Author

ravinarayansingh commented Apr 25, 2024

Hi @mattyb149,
The build is failing, but I think that is not related to this PR.
Please let me know if I need to change anything from my side.

…NIFI-11858

# Conflicts:
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/ColumnNameNormalizer.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/ColumnNameNormalizerFactory.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/TranslationStrategy.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/impl/PatternNormalizer.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/impl/RemoveAllSpecialCharNormalizer.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/impl/RemoveSpaceNormalizer.java
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/impl/RemoveUnderscoreNormalizer.java
Contributor

@jrsteinebrey jrsteinebrey left a comment

I just noticed this PR today. I would like at least 24 hours more to review this and see if I have feedback on the code. Right now I am giving some wording feedback.

…NIFI-11858

# Conflicts:
#	nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/PutDatabaseRecordTest.java
@ravinarayansingh
Contributor Author

ravinarayansingh commented May 22, 2024

Hi @jrsteinebrey,
I have updated the code as per your suggestion. Please have a look.

@jrsteinebrey
Contributor

Thanks for the changes you made. Some other files in this PR now have checkstyle violations:

Warning: src/main/java/org/apache/nifi/processors/standard/db/TableSchema.java:[155,47] (whitespace) WhitespaceAround: '!=' is not followed by whitespace.
Warning: src/main/java/org/apache/nifi/processors/standard/db/TableSchema.java:[155,47] (whitespace) WhitespaceAround: '!=' is not preceded with whitespace.
Warning: src/test/java/org/apache/nifi/processors/standard/db/impl/TestOracleDatabaseAdapter.java:[124,122] (whitespace) WhitespaceAfter: ',' is not followed by whitespace.
Warning: src/test/java/org/apache/nifi/processors/standard/db/impl/TestOracle12DatabaseAdapter.java:[166,122] (whitespace) WhitespaceAfter: ',' is not followed by whitespace.
Warning: src/test/java/org/apache/nifi/processors/standard/PutDatabaseRecordTest.java:[1832,11] (whitespace) WhitespaceAfter: 'catch' is not followed by whitespace.
Warning: src/test/java/org/apache/nifi/processors/standard/PutDatabaseRecordTest.java:[1832,11] (whitespace) WhitespaceAround: 'catch' is not followed by whitespace.
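
For reference, a small sketch of code that satisfies the three whitespace rules named in these warnings. The class and method are hypothetical and unrelated to the flagged files; they only show the spacing that the checks expect.

```java
// Hypothetical example; not taken from TableSchema or the test classes.
final class WhitespaceExample {
    private WhitespaceExample() {
    }

    // WhitespaceAfter: ',' is followed by a space in the parameter list.
    static int parseOrDefault(final String value, final int fallback) {
        // WhitespaceAround: '!=' has a space on both sides (not value!=null).
        if (value != null) {
            try {
                return Integer.parseInt(value.trim());
            } catch (final NumberFormatException e) { // 'catch' is followed by a space (not catch(...)).
                return fallback;
            }
        }
        return fallback;
    }
}
```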

@mattyb149
Contributor

The checkstyle rules were recently made more stringent. It looks like this needs another rebase, and please run your Maven build with the contrib-check profile activated.

Contributor

@jrsteinebrey jrsteinebrey left a comment

I would like @mattyb149, @exceptionfactory and anyone else interested to respond to my questions or comment if they disagree with any of my suggestions.

@@ -177,7 +181,25 @@ public class UpdateDatabaseTable extends AbstractProcessor {
            .allowableValues("true", "false")
            .defaultValue("true")
            .build();
    public static final PropertyDescriptor TRANSLATION_STRATEGY = new PropertyDescriptor.Builder()
            .required(true)
            .name("Column Name Translation Strategy")
Contributor

@ravinarayansingh Same comment as above: these properties apply to both field and column names.

    public static final PropertyDescriptor TRANSLATION_STRATEGY = new PropertyDescriptor.Builder()
            .required(true)
            .name("Column Name Translation Strategy")
            .description("The strategy used to normalize table column name")
Contributor

@ravinarayansingh The code uses all three of these Translate properties (Translate Field Names, Column Name Translation Strategy, and Column Name Translation Pattern) to translate BOTH field names AND column names. I recommend changing the display names and descriptions to reflect the fact that they apply to both field and column names.
This comment applies equally to the UpdateDatabaseTable class.
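
To make the point concrete, here is a hedged sketch of why the same normalization has to run on both sides of the comparison: a record field matches a table column only when their normalized names are equal. The class `FieldColumnMatcher` and its method are hypothetical and are not the processor code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper; illustrates the comparison, not the NiFi implementation.
final class FieldColumnMatcher {
    private FieldColumnMatcher() {
    }

    // Returns record field name -> table column name for every pair whose normalized names are equal.
    static Map<String, String> matchFieldsToColumns(final List<String> fieldNames,
                                                    final List<String> columnNames,
                                                    final UnaryOperator<String> normalizer) {
        // Index the table columns by their normalized name.
        final Map<String, String> columnsByNormalizedName = new HashMap<>();
        for (final String columnName : columnNames) {
            columnsByNormalizedName.put(normalizer.apply(columnName), columnName);
        }

        // Normalize each record field the same way before looking it up.
        final Map<String, String> matches = new LinkedHashMap<>();
        for (final String fieldName : fieldNames) {
            final String columnName = columnsByNormalizedName.get(normalizer.apply(fieldName));
            if (columnName != null) {
                matches.put(fieldName, columnName);
            }
        }
        return matches;
    }
}
```

With a normalizer that upper-cases and removes underscores, the field `first_name` would match the column `FIRSTNAME`, which is why property names that mention only columns describe half of what happens.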

@@ -270,6 +274,7 @@ public class PutDatabaseRecord extends AbstractProcessor {
            .build();

    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new Builder()
            .required(true)
            .name("put-db-record-translate-field-names")
            .displayName("Translate Field Names")
Contributor

I feel it is valuable to make these property names as clear as possible because this is a sophisticated feature.
I think another word besides "Translate" could be found to communicate what this feature does.
Here are some property display names I have thought of. I think Normalize is clearer. What do people think?
"Translate Field and Column Names for Comparison"
"Normalize Field and Column Names for Comparison"
"Adjust Field and Column Names for Comparison"
"Filter Field and Column Names for Comparison"

Contributor Author

Hi @exceptionfactory and @exceptionfactory
If you have any thoughts or suggestions regarding @jrsteinebrey's comment above, please let me know.

@jrsteinebrey
Contributor

Thanks, @ravinarayansingh. My comments that are automatically marked as Outdated are all resolved.
I still have some open comments in which I recommend changing property names. I understand if you are holding off on those until you see whether other reviewers have feedback on my property renaming comments.

@mattyb149
Contributor

Looks like there are some merge and/or rebase issues here, might be worth a new PR applying the desired commits to a feature branch based off the latest main branch.

@ravinarayansingh
Contributor Author

> Looks like there are some merge and/or rebase issues here, might be worth a new PR applying the desired commits to a feature branch based off the latest main branch.

Hi @mattyb149,
I have created a new PR. Please have a look.

@mattyb149
Contributor

Closing this in favor of the new PR, thanks!

6 participants