Textual representation of the export parquet output file #4056

vga91 · 2024-04-23T22:41:01Z

Check whether the apoc.export.parquet.* procedures can export a Parquet file in a textual representation instead of a binary one,
and add a flag if so.

See related Slack thread: https://neo4j.slack.com/archives/C136J23GE/p1713856995225349


jexp · 2024-05-16T11:33:34Z

Isn't that just CSV?

Perhaps just wrap it in base64 encoding?

vga91 · 2024-12-17T16:58:39Z

The Apache Parquet format is binary (not text-based) and not human-readable by definition, so there is no way to create a textual output.
Some references:

So, we should use other formats to create a readable file.

Even wrapping the byte[] with new String(..) does not produce human-readable output, for example:

PAR1�������ō�������������������������ō�������������������������ō�������������������������ō������������������������������������������������������������ō���������������������R�R���ς	������������������������
���1999-01-01
���2000-01-01���.�.��֛������������������������������1����2�������ō��������������������������ʘ������������������������������������������������������������������0�0���������������������������������Another�������ō�������������������������ō�������������������������ō�������������������������ō�������������������������ō������������������������������������&���������������������&���������������������&���������������������&������������������&���F�����������������������&��������
1999-01-01��
2000-01-01������&���F�����������1���2������&���F�����������������������&�������������������������������������&�������������������������������������&���������Another���Another������&���F�����������������������&���������������������&���������������������&���������������������&���������������������&��������:����������B�:����������|�:����������:����������H�������������:�����������������(�������\�������������:�������������H����������J����������^�������������:�������������:����������:����������:����������:������������H
apocExport�"��%���name��%���place���%���male���%���age�5���kids����L<���5���list����%���element���%���born%�L���������5���listDate

The same applies to Base64.getEncoder().encodeToString(...), which yields text that is still not human-readable:

UEFSMRUAFSoVKhWWz96QChwVBBUAFQYVCAAAAgAAAAMDBAAAAEFkYW0DAAAASmltFQAVhgEVhgEVl5mAygccFQQVABUGFQgAAAIAAAADATkAAABwb2ludCh7eDogMzMuNDY3ODksIHk6IDEzLjEsIHo6IDEwMC4wLCBjcnM6ICd3Z3MtODQtM2QnfSkVABUOFQ4VssfL6gwcFQQVABUGFQgAAAIAAAADAQEVBBUQFRAV7oel2QE8FQIVBAAAKgAAAAAAAAAVABUQFRAVzZ TkAocFQQVBBUGFQgAAAIAAAADAwADFQAVWBVYFdqq2rkKHBUKFQAVBhUGAAACAAAAAw4DAAAAA/8AAwAAAFNhbQQAAABBbm5hBQAAAEdyYWNlAwAAAFF3ZRUAFRwVHBWAwerUDRwVBBUAFQYVCAAAAgAAAAMBQEmDaE0BAAAVABUaFRoVkK/b0AwcFQQVABUGFQYAAAIAAAADAAMAAAADAAAVABUaFRoVkK/b0AwcFQQVABUGFQYAAAIAAAADAAMAAAADAAAVABUMFQwVqbLFjQUcFQQVABUGFQgAAAIAAAADABUAFQwVDBWpssWNBRwVBBUAFQYVCAAAAgAAAAMAFQAVLBUsFdGq1OUGHBUEFQAVBhUIAAACAAAAAwMYAAAAAAAAABkAAAAAAAAAFQQVEBUQFY
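
A minimal sketch of the two wrappings compared above, using only the JDK. The byte array is a hypothetical stand-in (the "PAR1" magic followed by a few arbitrary binary bytes), not real APOC output, and the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class ParquetTextDemo {
    // Hypothetical stand-in for exported Parquet bytes: the "PAR1" magic
    // followed by a few arbitrary binary bytes (not real APOC output).
    static byte[] sampleBytes() {
        return new byte[] {'P', 'A', 'R', '1', 0x15, 0x00, (byte) 0x96, (byte) 0xCF, 0x04, 0x00};
    }

    // Wrapping the bytes in a String: byte sequences that are not valid UTF-8
    // are decoded to the replacement character U+FFFD, so data is lost.
    static String asString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Base64 gives plain ASCII text that can be decoded back to the same bytes,
    // but a person still cannot read the records from it.
    static String asBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println(asString(bytes));
        System.out.println(asBase64(bytes));
        // The Base64 round trip restores the original bytes; the String one does not.
        System.out.println(Arrays.equals(fromBase64(asBase64(bytes)), bytes));
        System.out.println(Arrays.equals(asString(bytes).getBytes(StandardCharsets.UTF_8), bytes));
    }
}
```

Base64 is at least lossless, so it could carry the file through a text-only channel, but the consumer would still need a Parquet reader after decoding.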

The only usable way to read the content seems to be a ParquetReader.

The batch that is passed to ParquetWriter.write() looks something like this:

name: Adam
place: point({x: 33.46789, y: 13.1, z: 100.0, crs: 'wgs-84-3d'})
male: true
age: 42
kids
  list
    element: Sam
  list
    element: Anna
  list
    element: Grace
  list
    element: Qwe
born: 1431977544000
__id: 4
__labels
  list
    element: User
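
To illustrate that a readable form has to come from the decoded record rather than from the file bytes, here is a rough sketch that prints a record in a layout similar to the one above. It does not use the Parquet API: plain JDK maps and lists stand in for the record, and the class name, method names and sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordTextDemo {
    // Hypothetical sample record; a LinkedHashMap keeps the field order stable.
    static Map<String, Object> sample() {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "Adam");
        record.put("age", 42);
        record.put("kids", List.of("Sam", "Anna"));
        return record;
    }

    // Renders scalar fields as "name: value" and list fields as a
    // field name followed by one indented "list / element" pair per item.
    static String render(Map<String, Object> record) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> field : record.entrySet()) {
            Object value = field.getValue();
            if (value instanceof List<?>) {
                out.append(field.getKey()).append('\n');
                for (Object element : (List<?>) value) {
                    out.append("  list\n");
                    out.append("    element: ").append(element).append('\n');
                }
            } else {
                out.append(field.getKey()).append(": ").append(value).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```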

Therefore, instead of Parquet, we could use apoc.export.yaml.*;
apoc.export.csv.* or apoc.export.json.* would also work.

vga91 added the extended-functionality label Apr 23, 2024

vga91 added this to APOC Extended Larus Apr 23, 2024

vga91 moved this to Todo in APOC Extended Larus Apr 23, 2024

vga91 moved this from Todo to Blocked in APOC Extended Larus May 27, 2024

vga91 moved this from Blocked to Todo in APOC Extended Larus Dec 16, 2024

vga91 closed this as completed Dec 17, 2024

github-project-automation bot moved this from Todo to Done (check if cherry-pick) in APOC Extended Larus Dec 17, 2024

vga91 moved this from Done (check if cherry-pick) to Done in APOC Extended Larus Dec 17, 2024
