Bytelength #423

Closed
Loganhex2021 opened this issue Sep 20, 2021 · 5 comments
Labels: accepted (Accepted for implementation), enhancement (New feature or request)

Loganhex2021 commented Sep 20, 2021

Background [Optional]

We are using the Cobrix library to read an EBCDIC file in Databricks. We have a validation requirement to check the byte length of each record in the file.

Question

Is there an option to generate the byte length of each record while reading an EBCDIC file?

Loganhex2021 added the question (Further information is requested) label Sep 20, 2021
Loganhex2021 (author) commented

@yruslan - Could you please let me know if you have any idea how to calculate the byte length of each record in an EBCDIC file?

yruslan commented Sep 21, 2021

Do you need the record size for each record, or the file size for each record?

You can get a file name for each record using either

.option("with_input_file_name_col", "input_file_name")

or

df.withColumn("input_file_name", input_file_name())

depending on the type of file (variable length vs. fixed length).
You can then use a filesystem API (Hadoop Client, etc.) to get the file size for each file.
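
A minimal sketch of the file-size lookup described above, in Python and assuming the files are reachable through the local filesystem. `file_sizes` and its `get_size` parameter are hypothetical names used for illustration; they are not part of Cobrix or Spark. On HDFS or DBFS you would pass a `get_size` backed by a Hadoop or cloud storage client instead of `os.path.getsize`, and the values returned by `input_file_name()` may carry a URI scheme (such as `file:`) that would need to be handled first.

```python
import os
from typing import Callable, Dict, Iterable


def file_sizes(
    paths: Iterable[str],
    get_size: Callable[[str], int] = os.path.getsize,
) -> Dict[str, int]:
    """Map each distinct file path to its size in bytes.

    `get_size` is a hypothetical hook: swap in a function that queries
    your actual storage (Hadoop client, cloud SDK, ...) when the files
    are not on the local filesystem.
    """
    # De-duplicate first so each file is queried only once, even though
    # the same file name appears on every record read from that file.
    return {path: get_size(path) for path in set(paths)}
```

The paths would typically come from the distinct values of the `input_file_name` column. Note that this yields one size per file, not one size per record.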

Loganhex2021 (author) commented

Thanks @yruslan, I need the record size for each record.

Loganhex2021 (author) commented

@yruslan, could you please help here?

yruslan added the accepted (Accepted for implementation) and enhancement (New feature or request) labels and removed the question (Further information is requested) label Oct 5, 2021
yruslan self-assigned this Oct 5, 2021

yruslan commented Oct 5, 2021

Hi, sorry for the late reply. This is not currently supported. I've added it to the feature requests.
We can make

.option("generate_record_id", "true")

generate the record length as well.
