Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-12313 - Update PutDatabaseRecord.java add property use database table column datatype #7981

Closed
wants to merge 3 commits into from

Conversation

avasquezkudaw
Copy link

@avasquezkudaw avasquezkudaw commented Nov 6, 2023

NIFI-12313 - Update PutDatabaseRecord.java add property use database table column table column datatype

Summary

We have encountered an issue when migrating from version 1.12.1 to version 1.23.2, which is related to the change associated with this issue: "NIFI-8223 - PutDatabaseRecord should use table column datatype instead of field datatype."

This issue introduced a change in the data transformation logic in the PutDatabaseRecord processor, which converts a field (e.g., of type string) to the data type of its corresponding database column.

This is causing problems because our field in the database is of type datetime2 and our string is "2023-10-19 14:14:59.999314" However, the processor is converting it to "2023-10-19 14:14:59" without microseconds, despite setting the format in the csvreader.

To address this, we have made an adjustment in the PutDatabaseRecord processor so that users at the processor level can choose whether to apply the transformation to the database column type or maintain the Nifi format.

NIFI-12313

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @avasquezkudaw. As this presents an important behavioral change in the Processor, it would be helpful to add a unit test method that verifies the expected behavior when the property is disabled. Also note that property names should use Title Case for naming.

@mattyb149
Copy link
Contributor

Thanks for the contribution @avasquezkudaw. As this presents an important behavioral change in the Processor, it would be helpful to add a unit test method that verifies the expected behavior when the property is disabled. Also note that property names should use Title Case for naming.

Agreed, when the capability for using database types was added it was decided purposely to not offer a property, to simplify the user experience and be flexible in which types were used, to be as smart as possible at inferring the user's intent when putting data of one type in the record into a column in the database with a different type, such as String -> Timestamp. This PR should have at least a handful of tests (or multiple different types in one or more tests) for covering possible users' intents.

@mattyb149
Copy link
Contributor

Is your CSVReader using an explicit schema or Infer Schema? If the latter, I believe this issue is actually related to NiFi's handling of microseconds rather than the fact that it is using the target DB's column datatype (I have seen the same issue with JSON). If it worked before, I think it was "lucky" in the sense that the value was always treated like a string and in the SQL the driver accepts the string as a timestamp literal and preserves the microseconds. I'm not sure that if the value had been converted to a NiFi timestamp that the microseconds would be retained (and perhaps not the milliseconds either as you are seeing). I am still investigating.

I'm not strongly opposed to having a property to configure this but we might be better served by fixing the code that truncates the fractional seconds, if that's what's going on here.

@avasquezkudaw
Copy link
Author

Yes, I agree that fixing the millisecond truncation issue is the best option.
Regarding your question, we have tested both functionalities, inferschema and setting a schema, and the result is the same in both cases.
I am attaching both flows for versions 1.12 and 1.23.
NiFi_Flow_112.json
NiFi_Flow_123.json

Another test we conducted is to transform the record into Avro format, but the result is the same as well.
avro_schema
convert_record_Avro

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the issue described, and the additional complexity that would be involved introducing this property, it sounds like it would be much better to address the source of the problem. The fundamental change will involve refactor the use of SimpleDateFormat to the DateTimeFormatter, which supports microsecond and nanosecond precision.

@exceptionfactory
Copy link
Contributor

@avasquezkudaw I submitted pull request #8248 for NIFI-9458 implementing framework for microsecond and nanosecond precision in schema field definitions. There may be other changes required for this particular use case. However, in light of those changes and previous discussion, I am closing this pull request for now. Feel free to revisit this issue in light of those date formatting changes, and we can also evaluate other options as needed on this Jira issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants