Optimize JSON_REMOVE on `IndexedJsonDocument` #8103

nicktobey · 2024-07-03T22:06:38Z

This PR includes a new implementation of the JSON_REMOVE function that leverages the new indexed JSON storage format.

For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the removed value instead of the size of the entire document.
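To make the scaling claim concrete, here is a toy model of a document split into key-ordered chunks. It is only a sketch: `chunk`, `chunkedDoc`, and `remove` are hypothetical names invented for illustration, not the prolly-tree code in this PR, and the model ignores re-chunking after an edit. The `loads` counter shows that a removal reads one chunk regardless of how many chunks the document has.

```go
package main

import "sort"

// chunk is a hypothetical stand-in for one stored chunk of a large document.
// keys are sorted, and lastKey is an upper bound on every key in the chunk.
type chunk struct {
	lastKey string
	keys    []string
}

// chunkedDoc models a document split across chunks that are ordered by key.
// loads counts how many chunks were read, to make the cost visible.
type chunkedDoc struct {
	chunks []chunk
	loads  int
}

// remove deletes key from the document, reading only the chunk that could
// contain it. It reports whether the key was present.
func (d *chunkedDoc) remove(key string) bool {
	// Binary search over chunk boundaries only; no chunk contents are read.
	i := sort.Search(len(d.chunks), func(i int) bool {
		return d.chunks[i].lastKey >= key
	})
	if i == len(d.chunks) {
		return false // key sorts after every chunk, so nothing is loaded
	}
	d.loads++ // exactly one chunk is read for this removal
	c := &d.chunks[i]
	j := sort.SearchStrings(c.keys, key)
	if j == len(c.keys) || c.keys[j] != key {
		return false
	}
	// Drop the entry in place. lastKey may become a loose upper bound,
	// which is still valid for the boundary search above.
	c.keys = append(c.keys[:j], c.keys[j+1:]...)
	return true
}
```

In this model the cost of a removal depends on the chunk that holds the removed key, not on the total number of chunks, which is the property the description above attributes to the indexed format.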

coffeegoddd · 2024-07-03T22:39:39Z

@nicktobey DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| `56eaca2` | ok | 5937457 |

| version | total_tests |
| --- | --- |
| `56eaca2` | 5937457 |

correctness_percentage: 100.0

Imran-imtiaz48

Overview:
The Go code involves JSON document manipulation, with functions handling insertion and removal of JSON elements. The recent changes introduce refactoring for improved modularity and readability.

Review:

Code Clarity and Modularity:
- The introduction of the `lookupByLocation` and `insertIntoCursor` functions improves code modularity, making the main functions cleaner and more readable.
- Refactoring the path lookup and insertion logic into separate functions allows for easier maintenance and potential reuse.

Error Handling:
- Consistent error handling by returning `nil, err` ensures that any issues are propagated back to the caller for handling.
- The refactoring maintains the existing error handling patterns, which is good for consistency.

Logic Flow:
- The changes maintain the logical flow of the original code while separating concerns. This improves both readability and potential debugging.

Improvements:

Documentation:
- Consider adding comments to the new functions (`lookupByLocation` and `insertIntoCursor`) to explain their purpose and parameters. This will help future developers understand the code faster.
- Inline comments for key operations, such as path equivalence checks and cursor manipulations, can enhance understanding.

Error Messaging:
- Enhance error messages where possible to provide more context. For example, include the path being processed in the error messages to aid in debugging.

Testing:
- Ensure thorough unit testing for the new functions (`lookupByLocation` and `insertIntoCursor`). Test various edge cases, such as non-existent paths and boundary conditions.
- Validate performance implications, especially with large JSON documents, to ensure the refactoring doesn't introduce inefficiencies.

Code Consistency:
- Verify that the style and naming conventions are consistent across the refactored and existing code. Consistency aids readability and maintainability.

Summary:
The refactor improves code modularity and readability, making the functions cleaner and more maintainable. Enhancing documentation, error messaging, and thorough testing will further improve the robustness and usability of the code.

fulghum

Implementation looks great. Just a couple minor suggestions on method documentation that could make the code a little easier for new eyes to read.

fulghum · 2024-07-08T16:53:32Z

go/store/prolly/tree/json_cursor.go

@@ -50,24 +50,25 @@ func getPreviousKey(ctx context.Context, cur *cursor) ([]byte, error) {

 // newJsonCursor takes the root node of a prolly tree representing a JSON document, and creates a new JsonCursor for reading
 // JSON, starting at the specified location in the document.
-func newJsonCursor(ctx context.Context, ns NodeStore, root Node, startKey jsonLocation) (*JsonCursor, error) {
+func newJsonCursor(ctx context.Context, ns NodeStore, root Node, startKey jsonLocation, forRemoval bool) (*JsonCursor, bool, error) {


(minor) Would be nice to give the return values names, or to document them (the boolean in particular) so that it's easier for future readers to quickly understand what the boolean return param indicates.

nicktobey · 2024-07-08

Added a comment and return variable names.
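For readers unfamiliar with the pattern discussed in this thread, named return values let the signature itself say what a bare `bool` means. The function below is a hypothetical example over a sorted slice, not the actual `newJsonCursor` from the PR, whose return names and semantics are not shown in this conversation.

```go
package main

import "sort"

// seek finds target in keys, which must be sorted in ascending order.
//
// Return values:
//   - idx: the position of target, or the position where it would be inserted.
//   - found: true only if keys[idx] is exactly target.
func seek(keys []string, target string) (idx int, found bool) {
	idx = sort.SearchStrings(keys, target)
	found = idx < len(keys) && keys[idx] == target
	return idx, found
}
```

Compared with an unnamed `(int, bool)` signature, the names `idx` and `found` appear in generated documentation and editor tooltips, so a caller does not need to open the implementation to interpret the boolean.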

fulghum · 2024-07-08T16:58:44Z

go/store/prolly/tree/json_cursor.go

@@ -122,20 +123,20 @@ func (j *JsonCursor) isKeyInChunk(path jsonLocation) bool {
 	return compareJsonLocations(path, nodeEndPosition) <= 0
 }

-func (j *JsonCursor) AdvanceToLocation(ctx context.Context, path jsonLocation) error {
+func (j *JsonCursor) AdvanceToLocation(ctx context.Context, path jsonLocation, forRemoval bool) (found bool, err error) {


Some method docs would help readers quickly understand the params without having to read the implementation; especially since this method is exposed outside the package, it would be helpful to add docs. For example, it seems important for callers to know that if the third param is passed as true, then the cursor position returned will be different. Currently, readers would have to read the implementation to figure out how to use this method correctly.

nicktobey · 2024-07-08

Added a docstring.
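The kind of docstring requested in this thread might look like the sketch below. Everything here is invented for illustration: `sliceCursor`, `advanceTo`, and the `stopBefore` flag are hypothetical, and the flag's behavior is an assumption chosen to show how a doc comment can describe a parameter that changes the final cursor position. It does not describe what `forRemoval` actually does in `AdvanceToLocation`.

```go
package main

import "sort"

// sliceCursor is a hypothetical cursor over sorted keys.
// It is not Dolt's JsonCursor.
type sliceCursor struct {
	keys []string
	pos  int
}

// advanceTo moves the cursor toward target and reports whether target exists.
//
// The final position depends on stopBefore (an invented flag for this sketch):
//   - false: the cursor rests on target, or on the first key greater than
//     target when target is absent.
//   - true: the cursor rests one entry earlier (clamped at 0), so a caller
//     can rewrite the entry that precedes target.
func (c *sliceCursor) advanceTo(target string, stopBefore bool) (found bool) {
	i := sort.SearchStrings(c.keys, target)
	found = i < len(c.keys) && c.keys[i] == target
	if stopBefore && i > 0 {
		i--
	}
	c.pos = i
	return found
}
```

With the two cases spelled out in the comment, a caller can tell from the documentation alone where the cursor will be after the call.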

github-actions · 2024-07-08T21:56:29Z

Additional work is required for integration with DoltgreSQL.

coffeegoddd · 2024-07-08T22:25:12Z

@nicktobey DOLT

comparing_percentages: 100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| `716d65d` | ok | 5937457 |

| version | total_tests |
| --- | --- |
| `716d65d` | 5937457 |

correctness_percentage: 100.0

nicktobey added 3 commits July 3, 2024 14:53

Implement IndexedJSONDocument::Remove

d182f70

Add tests for removing from large (multi-chunk) JSON documents.

f1ba971

Refactor IndexedJsonDocument

56eaca2

coffeegoddd added the correctness_approved label Jul 3, 2024

nicktobey requested a review from fulghum July 3, 2024 22:59

Imran-imtiaz48 reviewed Jul 8, 2024

fulghum approved these changes Jul 8, 2024

Respond to PR feedback.

716d65d

nicktobey merged commit 094ba06 into main Jul 8, 2024
21 checks passed

nicktobey deleted the nicktobey/json-remove branch July 8, 2024 22:28

BrewTestBot mentioned this pull request Jul 12, 2024

dolt 1.41.4 Homebrew/homebrew-core#177098

Merged