Missing proper numeric(little endian type) values after parsing #740

jaysara · 2025-01-10T23:21:13Z

Background

I am trying to parse a fixed-record-size file whose numeric fields are stored as little-endian binary. I changed COMP-5 in the copybook to COMP-9. All the text fields parse fine, but some numeric fields are not coming out correctly. The example copybook is Copybook.
The sample input file is Input File
The file produced by Cobrix is Cobrix Output
The expected output file from the original system is Expected
This is my Java code:

```
String custInputCopyBook = readCopyBook(configPath); // getCustInputCopyBook(configPath, isLocal);

Dataset<Row> df1 = spark.read()
        .format("za.co.absa.cobrix.spark.cobol.source")
        .option("copybook_contents", custInputCopyBook)
        .option("encoding", "ascii")
        .option("schema_retention_policy", "collapse_root")
        .option("record_start_offset", "2")
        .load(inputFile);

df1.printSchema();

df1.show(false);
System.out.println("Count " + df1.count());
df1.repartition(1).write().mode("overwrite").option("header", "true").csv(outputPath);
```

## Question

Can you please help me identify why some numeric values are missing (coming out as null)?

Examples of missing values are:
Column 1 -> Row 11,
Column 3 -> Row 11,
Column 4 -> Row 10,
Column 5 -> Row 6,
Column 8 -> Row 7, Row 9
All missing values are in numeric fields only. All character fields seem to populate correctly.

Thanks


yruslan · 2025-01-20T15:31:15Z

Hi @jaysara ,

First of all, thanks a lot for such a detailed explanation of the issue and for providing the copybook and examples!

From what I can see, the copybook has PICs like 9(9), i.e. the maximum length is 9 digits. But Cobrix encounters numbers that have 10 digits, so it drops them.

I see this as an issue that cannot be worked around in the copybook, because if you use 9(10), Cobrix will use an 8-byte long instead of a 4-byte integer.

We can add a special option to treat binary fields to their full extent and ignore the maximum length for binary fields.
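To illustrate the range mismatch described above, here is a minimal standalone sketch in plain Java. It is not Cobrix code, and the class name, method names, and sample bytes are made up for the example. It decodes a 4-byte little-endian unsigned integer and checks whether the result fits in the 9 decimal digits that PIC 9(9) declares.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class for illustration only; not part of Cobrix.
class UnsignedBinaryRange {

    // Decode exactly 4 bytes as an unsigned little-endian integer, widened to a long.
    static long decodeUnsignedLE(byte[] bytes) {
        int raw = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return Integer.toUnsignedLong(raw);
    }

    // True when the value has no more decimal digits than the PIC declares.
    static boolean fitsPic(long value, int declaredDigits) {
        return String.valueOf(value).length() <= declaredDigits;
    }

    public static void main(String[] args) {
        // 1,000,000,000 is 0x3B9ACA00; stored little-endian it is 00 CA 9A 3B (sample data, assumed).
        byte[] sample = {(byte) 0x00, (byte) 0xCA, (byte) 0x9A, (byte) 0x3B};
        long value = decodeUnsignedLE(sample);

        // Prints: 1000000000 fits PIC 9(9)? false
        System.out.println(value + " fits PIC 9(9)? " + fitsPic(value, 9));
    }
}
```

A 4-byte unsigned field can hold values up to 4294967295, which has 10 digits, so a value can be valid in the 4-byte storage yet exceed the 9 digits of the PIC clause.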

Converting this question to a feature request.

yruslan · 2025-01-20T15:33:43Z

Proposed option to add:
(just a proposal, can change)

Current behavior:

.option("binary_numbers", "strict")

New behavior in order to get expected values:

.option("binary_numbers", "full_range")

jaysara · 2025-01-21T15:42:01Z

Hi @yruslan, thanks for the reply. I tried with

.option("binary_numbers", "full_range")

However, that has not changed anything. Is there a specific version of Cobrix I should use? I am using 2.6.9.
For the enhancement, approximately how long will that take? Is there any temporary solution I can implement?
Thanks,

yruslan · 2025-01-21T15:57:39Z

Yes, the feature is not implemented yet. The option is just a proposal.

But after looking at the code, we won't add the option. We will just make the default behavior support the wider range of values. This is almost like a bugfix.

The feature is going to be implemented soon.

jaysara · 2025-01-21T16:45:45Z

Thanks @yruslan. Sorry for pushing on this a little; it will help with our internal planning timeline. Can you tell us what "soon" means for this kind of bug fix? Should we plan for 1 week, 1 month, or less than that? Also, will you let us know here when the bug is fixed and the new version is available? Thanks again for your help.

yruslan · 2025-01-21T19:08:16Z

Sure. The plan is to implement it this week and release a new version next week. Will let you know when the change is available in the master branch so you can also test it on your use case.


jaysara added the question Further information is requested label Jan 10, 2025

yruslan added enhancement New feature or request and removed question Further information is requested labels Jan 20, 2025

yruslan added a commit that referenced this issue Jan 22, 2025

#740 Add s unite test the illustrates the current behavior of handlin…

6d48657

…g big binary fields.

yruslan added a commit that referenced this issue Jan 22, 2025

#740 Make sure unsigned binary fields can fit data types.

888d212
