Debug functionality of EBCDIC data #69

Closed
bprasen opened this issue Apr 21, 2019 · 4 comments
Labels: accepted (Accepted for implementation), enhancement (New feature or request)

Comments

bprasen commented Apr 21, 2019

Hi,
This is not a bug report but a feature request, mainly for debugging purposes. When Cobrix creates a dataframe, it decodes the EBCDIC data according to the data type of each primitive field. Would it be possible to also show the hex value of each column when an option such as add_hex = true is set? This functionality would be used only for debugging, to check the data.
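
To illustrate the request, here is a minimal Python sketch (not Cobrix code) that returns a decoded value together with the hex of its raw bytes. It assumes the field is plain text in EBCDIC code page 037, available in Python as the `cp037` codec; the helper name `decode_with_hex` is hypothetical.

```python
from typing import Tuple


def decode_with_hex(raw: bytes, codec: str = "cp037") -> Tuple[str, str]:
    """Return (decoded text, upper-case hex of the raw bytes).

    Assumption: the field is character data in EBCDIC code page 037.
    """
    return raw.decode(codec), raw.hex().upper()


# Six raw EBCDIC bytes: the letters A, B, C, a space, and the digits 1, 2.
raw = bytes([0xC1, 0xC2, 0xC3, 0x40, 0xF1, 0xF2])
value, hex_value = decode_with_hex(raw)
print(repr(value), hex_value)  # prints 'ABC 12' C1C2C340F1F2
```

Seeing both values side by side makes it easier to tell whether an unexpected result comes from the source bytes or from the decoding step.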

Collaborator

yruslan commented Apr 22, 2019

Interesting idea. This could be helpful to diagnose decoding issues.
Although we cannot make Spark show the original bytes in hex, we can add extra fields to the output dataframe. For instance, if a schema has ID, FIRST-NAME and LAST-NAME and the debug option is turned on, the schema will contain additional ID_DEBUG, FIRST-NAME_DEBUG and LAST-NAME_DEBUG fields holding the HEX values of the original data before decoding.

Please clarify a couple of things about your use case:

  • Do you want to debug a particular column or all columns in the schema?
  • The HEX values should correspond to the original data before conversion to ASCII/Unicode, right?
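
The extra `_DEBUG` fields described above can be sketched in plain Python as follows. This is only an illustration of the idea, not the Cobrix implementation: the fixed-length layout, the `decode_record` helper and its `debug` flag are hypothetical, and every field is assumed to be text in EBCDIC code page 037 (`cp037`).

```python
from typing import Dict, List, Tuple

# Hypothetical fixed-length layout: (field name, offset, length in bytes).
LAYOUT: List[Tuple[str, int, int]] = [
    ("ID", 0, 4),
    ("FIRST-NAME", 4, 6),
    ("LAST-NAME", 10, 6),
]


def decode_record(
    record: bytes,
    layout: List[Tuple[str, int, int]],
    debug: bool = False,
    codec: str = "cp037",
) -> Dict[str, str]:
    """Decode each field; with debug=True also add NAME_DEBUG hex fields."""
    row: Dict[str, str] = {}
    for name, offset, length in layout:
        raw = record[offset:offset + length]
        # Decoded value, with trailing padding spaces removed.
        row[name] = raw.decode(codec).rstrip()
        if debug:
            # Hex of the original bytes, taken before any decoding.
            row[name + "_DEBUG"] = raw.hex().upper()
    return row


# Build a 16-byte sample record by encoding known text to EBCDIC.
record = "0042JOHN  SMITH ".encode("cp037")
row = decode_record(record, LAYOUT, debug=True)
for key, val in row.items():
    print(key, "=", val)
```

With `debug=False` the result contains only ID, FIRST-NAME and LAST-NAME, so the debug fields stay out of the way in normal use.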

yruslan added the enhancement label Apr 22, 2019
Author

bprasen commented Apr 23, 2019

I was thinking about all the columns, and yes, the HEX values are the original data before conversion. To be honest, I have also been trying to modify the source code to add this functionality, for the FixedLengthNested option only for now. I can share the code with you if you want, though it may need some standardisation. Thanks for your interest. Please let me know your email so that I can send the code for your review.

Collaborator

yruslan commented Apr 24, 2019

Great, thanks for the answers! I think this is a helpful feature and we are going to implement it.
You can send your code as a pull request, but it is not necessary. The feature seems pretty straightforward.

yruslan added the accepted label Apr 24, 2019
yruslan self-assigned this Apr 26, 2019
Collaborator

yruslan commented Mar 23, 2020

🎉 @bprasen, this very helpful feature is finally implemented and is part of 2.0.5, released today.

yruslan closed this as completed Mar 23, 2020