clp-s: Add support for serializing structured arrays. #413

gibber9809 · 2024-05-23T18:58:37Z

Description

This PR implements serialization for structurized arrays. All of the code changes are confined to SchemaReader and JsonSerializer. Most of the heavy lifting to support serializing arrays was implemented in #355, so this change mostly consists of code that walks the schema for the different unordered object types and pushes JsonSerializer ops as appropriate.
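
For readers unfamiliar with the op-based approach, the following is a minimal sketch of the general idea: record a list of ops once per schema, then replay it against each record's values. It is not the actual JsonSerializer from this PR. OpKind, Op, and replay are hypothetical names, values are assumed to arrive as pre-rendered JSON tokens, and keys are assumed not to need escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical op set for illustration; the real JsonSerializer ops may differ.
enum class OpKind { BeginObject, EndObject, BeginArray, EndArray, AddValue };

struct Op {
    OpKind kind;
    std::string key;  // empty means unnamed (root or array element)
};

// Replays a recorded op template against one record's values.
// Assumption: each entry in `values` is already a valid JSON token.
std::string replay(std::vector<Op> const& ops, std::vector<std::string> const& values) {
    std::string out;
    bool need_comma = false;
    std::size_t next_value = 0;
    for (auto const& op : ops) {
        bool const closes = OpKind::EndObject == op.kind || OpKind::EndArray == op.kind;
        if (false == closes) {
            if (need_comma) {
                out += ',';
            }
            if (false == op.key.empty()) {
                out += '"' + op.key + "\":";  // no escaping in this sketch
            }
        }
        switch (op.kind) {
            case OpKind::BeginObject:
                out += '{';
                need_comma = false;
                break;
            case OpKind::BeginArray:
                out += '[';
                need_comma = false;
                break;
            case OpKind::EndObject:
                out += '}';
                need_comma = true;
                break;
            case OpKind::EndArray:
                out += ']';
                need_comma = true;
                break;
            case OpKind::AddValue:
                out += values.at(next_value++);
                need_comma = true;
                break;
        }
    }
    return out;
}
```

Because the template is built once per schema, two records with the same schema reuse the same ops and differ only in the values passed to replay.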

Validation performed

Validated that decompression works correctly for several array corner cases
Validated that search on arrays returns expected results

wraymo

Can you show the logs you are using for testing? (It would be better if they cover all structured array cases in generate_structured_array_template, generate_structured_object_template and find_intersection_and_fix_brackets.)

wraymo · 2024-05-27T20:14:23Z

components/core/src/clp_s/SchemaReader.hpp

    /**
     * @param schema
     * @return the first column ID found in the given schema, or -1 if the schema contains no
     * columns
     */
    static inline int32_t get_first_column_in_span(std::span<int32_t> schema);

+    void find_intersection_and_fix_brackets(


Do you want to add a description for this method?

wraymo · 2024-05-27T20:48:00Z

components/core/src/clp_s/SchemaReader.cpp

+                int32_t global_child_id = m_local_id_to_global_id[child_id];
+                auto structured_it = m_global_id_to_unordered_object.find(global_child_id);
+                if (m_global_id_to_unordered_object.end() != structured_it) {
+                    size_t column_start = structured_it->second.first;
+                    std::span<int32_t> structured_schema = structured_it->second.second;
+                    generate_structured_array_template(
+                            global_child_id,


Suggested change

-                int32_t global_child_id = m_local_id_to_global_id[child_id];
-                auto structured_it = m_global_id_to_unordered_object.find(global_child_id);
-                if (m_global_id_to_unordered_object.end() != structured_it) {
-                    size_t column_start = structured_it->second.first;
-                    std::span<int32_t> structured_schema = structured_it->second.second;
-                    generate_structured_array_template(
-                            global_child_id,
+                auto structured_it = m_global_id_to_unordered_object.find(child_global_id);
+                if (m_global_id_to_unordered_object.end() != structured_it) {
+                    size_t column_start = structured_it->second.first;
+                    std::span<int32_t> structured_schema = structured_it->second.second;
+                    generate_structured_array_template(
+                            child_global_id,

wraymo · 2024-05-27T20:48:37Z

components/core/src/clp_s/SchemaReader.hpp

+     * @param id
+     * @param column_start
+     * @param schema
+     */


What about the return value?

wraymo · 2024-05-27T20:49:46Z

components/core/src/clp_s/SchemaReader.hpp

+     * @param id
+     * @param column_start
+     * @param schema
+     */


Same as the last comment

wraymo · 2024-05-28T19:37:20Z

components/core/src/clp_s/SchemaReader.cpp

+        int32_t cur_root,
+        int32_t next_root,


Suggested change

-        int32_t cur_root,
-        int32_t next_root,
+        int32_t cur_node_id,
+        int32_t next_node_id,

I think I'd prefer to keep the cur_root/next_root names. They carry some meaning because they're supposed to indicate the root of the subtree for which we've pushed all of the required keys/brackets to m_json_serializer before the field we are about to push. I'm open to other names that carry meaning though.

Got it. Personally, I prefer using ...id for int32_t node IDs, as we use ...column_id and ...child_id extensively elsewhere. But it's fine to leave them as they are, since it seems we're following ...node for SchemaNode and ...root for int32_t.

gibber9809 · 2024-05-29T17:51:12Z

Can you show the logs you are using for testing? (It would be better if they cover all structured array cases in generate_structured_array_template, generate_structured_object_template and find_intersection_and_fix_brackets.)

Right now I'm using

{"a":["a",{"b":"c"},{},[],{"e":{"f":{"g":{}}}},null,10,false,"h"],"b":[]}
{"a":["b",{"b":"d"},{},[],{"e":{"f":{"g":{}}}},null,11,true,"i"],"b":[]}
{"b":[0,1,2,3,[[]],[[4]],{"c":["d",{"e":{"f":["g",{}]}}]}]}
{"c":[{"a":{"b":"c","d":"e"},"f":{"g":{"h":{"i":"j"}},"k":"l"}}]}

which covers all of the cases for structured array and find_intersection_and_fix_brackets, and decompresses to the original input byte-for-byte.

wraymo · 2024-05-31T20:38:40Z

components/core/src/clp_s/SchemaReader.cpp

+        } else {
+            cur_root = cur_node->get_parent_id();
+            cur_node = &m_global_schema_tree->get_node(cur_root);
+            m_json_serializer.add_op(JsonSerializer::Op::EndObject);
+            path_to_intersection.push_back(next_root);
+            next_root = next_node->get_parent_id();
+            next_node = &m_global_schema_tree->get_node(next_root);
+        }


What if cur_node and next_node have the same parent? The code will never enter this branch. For example, after decompression, the log {"a": [1, {"b": {"c":3}, "d": [4]}]} becomes {"a":[1,{"b":{"c":3,4]}]}

This case works on the most recent commit. I pushed a fix for this issue when I re-requested review: I added a code block after the loop that checks whether cur_node and next_node are different and, if so, fixes the last bracket and adds the last node to the path.
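
To make the sibling case above concrete, here is a minimal, self-contained sketch of walking two schema-tree nodes up to their common ancestor while counting the brackets to close and recording the nodes to open. It is not the code in SchemaReader.cpp: Node, BracketFix, and fix_brackets are hypothetical names, and the sketch assumes each node stores its parent ID and depth. Here cur is taken to be the innermost object that is currently open, and next is the object that must be open before the next field is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a schema tree node.
struct Node {
    int32_t parent;  // -1 for the root
    int32_t depth;   // 0 for the root
};

struct BracketFix {
    int num_closes = 0;            // objects to close on the current side
    std::vector<int32_t> to_open;  // nodes to open, ordered top-down
};

// Walks `cur` and `next` up until they meet at their common ancestor.
// Assumption: both IDs index into `tree` and belong to the same tree.
BracketFix fix_brackets(std::vector<Node> const& tree, int32_t cur, int32_t next) {
    BracketFix fix;
    while (cur != next) {
        int32_t const cur_depth = tree[cur].depth;
        int32_t const next_depth = tree[next].depth;
        if (cur_depth >= next_depth) {
            // Leaving `cur` means closing its bracket.
            cur = tree[cur].parent;
            ++fix.num_closes;
        }
        if (next_depth >= cur_depth) {
            // `next` lies below the intersection, so it must be opened later.
            fix.to_open.push_back(next);
            next = tree[next].parent;
        }
    }
    // Nodes were collected bottom-up; reverse to get the order in which to open them.
    std::reverse(fix.to_open.begin(), fix.to_open.end());
    return fix;
}
```

Because the loop condition compares the nodes themselves rather than their parents, two siblings such as "b" and "d" in the example above still yield one close and one open. A loop that stops as soon as the parents match needs the extra post-loop step described in the reply.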

wraymo

Great work! Just one small thing.

wraymo · 2024-06-03T18:25:23Z

components/core/src/clp_s/SchemaReader.cpp

+        int32_t cur_root,
+        int32_t next_root,


Got it. Personally, I prefer using ...id for int32_t node IDs, as we use ...column_id and ...child_id extensively elsewhere. But it's fine to leave them as they are, since it seems we're following ...node for SchemaNode and ...root for int32_t.

wraymo · 2024-06-05T18:10:31Z

components/core/src/clp_s/ArchiveReader.cpp

+            append_unordered_reader_columns(
+                    m_schema_reader,
+                    column_id,
+                    schema.get_view(i, 0),


Do you want to change it to std::span<int32_t>() given it's an empty span?

wraymo · 2024-06-06T21:04:30Z

Can you fix the lint error?

wraymo

The PR title looks good to me.

kirkrodrigues · 2024-06-06T23:08:17Z

How about:

clp-s: Add support for serializing structured arrays.

gibber9809 added 2 commits May 23, 2024 18:55

Implement serialization for structurized arrays

1e08a2e

Rename some JsonSerializer ops

a906592

gibber9809 marked this pull request as ready for review May 24, 2024 20:14

gibber9809 requested a review from wraymo May 24, 2024 20:14

gibber9809 changed the title ~~[WIP] Implement serialization for structurized arrays.~~ clp-s: Implement serialization for structurized arrays. May 27, 2024

wraymo reviewed May 28, 2024

gibber9809 added 2 commits May 29, 2024 17:24

Fix a bug where empty structured arrays do not get marshalled

9d14226

Fix bug where find_intersection_and_fix_brackets can sometimes miss one level of bracket fixing

a9aff7c

gibber9809 added 3 commits May 29, 2024 18:32

Improve comments in SchemaReader.hpp

1fe417f

Address review comment

da90ddb

Update comment

249f7f6

gibber9809 requested a review from wraymo May 31, 2024 13:23

wraymo reviewed May 31, 2024

gibber9809 requested a review from wraymo June 3, 2024 14:12

wraymo reviewed Jun 3, 2024

Update components/core/src/clp_s/SchemaReader.cpp

cf42270

Co-authored-by: wraymo <[email protected]>

gibber9809 requested a review from wraymo June 3, 2024 18:32

wraymo reviewed Jun 5, 2024

wraymo reviewed Jun 5, 2024

Address review comment

e8235bb

gibber9809 requested a review from wraymo June 6, 2024 16:31

Update components/core/src/clp_s/ArchiveReader.cpp

2e0b6db

Co-authored-by: wraymo <[email protected]>

Fix lint

e8d8218

wraymo approved these changes Jun 6, 2024

gibber9809 changed the title ~~clp-s: Implement serialization for structurized arrays.~~ clp-s: Add support for serializing structured arrays. Jun 7, 2024

gibber9809 merged commit 2725b9a into y-scope:main Jun 7, 2024
11 checks passed

gibber9809 deleted the structurized-array-serialization branch June 7, 2024 01:24
