Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize JSON_REMOVE on IndexedJsonDocument #8103

Merged
merged 4 commits into from
Jul 8, 2024
Merged

Conversation

nicktobey
Copy link
Contributor

This PR includes a new implementation of the JSON_REMOVE function that leverages the new indexed JSON storage format.

For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the removed value instead of the size of the entire document.

@coffeegoddd
Copy link
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
56eaca2 ok 5937457
version total_tests
56eaca2 5937457
correctness_percentage
100.0

Copy link

@Imran-imtiaz48 Imran-imtiaz48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview:
The Go code involves JSON document manipulation, with functions handling insertion and removal of JSON elements. The recent changes introduce refactoring for improved modularity and readability.

Review:

  1. Code Clarity and Modularity:

    • The introduction of lookupByLocation and insertIntoCursor functions improves code modularity, making the main functions cleaner and more readable.
    • Refactoring the path lookup and insertion logic into separate functions allows for easier maintenance and potential reuse.
  2. Error Handling:

    • Consistent error handling with returning nil, err ensures that any issues are propagated back to the caller for handling.
    • The refactoring maintains the existing error handling patterns, which is good for consistency.
  3. Logic Flow:

    • The changes maintain the logical flow of the original code while separating concerns. This improves both readability and potential debugging.

Improvements:

  1. Documentation:

    • Consider adding comments to the new functions (lookupByLocation and insertIntoCursor) to explain their purpose and parameters. This will help future developers understand the code faster.
    • Inline comments for key operations, such as path equivalence checks and cursor manipulations, can enhance understanding.
  2. Error Messaging:

    • Enhance error messages where possible to provide more context. For example, include the path being processed in the error messages to aid in debugging.
  3. Testing:

    • Ensure thorough unit testing for the new functions (lookupByLocation and insertIntoCursor). Test various edge cases, such as non-existent paths and boundary conditions.
    • Validate performance implications, especially with large JSON documents, to ensure the refactoring doesn't introduce inefficiencies.
  4. Code Consistency:

    • Verify that the style and naming conventions are consistent across the refactored and existing code. Consistency aids readability and maintainability.

Summary:
The refactor improves code modularity and readability, making the functions cleaner and more maintainable. Enhancing documentation, error messaging, and thorough testing will further improve the robustness and usability of the code.

Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks great. Just a couple minor suggestions on method documentation that could make the code a little easier for new eyes to read.

@@ -50,24 +50,25 @@ func getPreviousKey(ctx context.Context, cur *cursor) ([]byte, error) {

// newJsonCursor takes the root node of a prolly tree representing a JSON document, and creates a new JsonCursor for reading
// JSON, starting at the specified location in the document.
func newJsonCursor(ctx context.Context, ns NodeStore, root Node, startKey jsonLocation) (*JsonCursor, error) {
func newJsonCursor(ctx context.Context, ns NodeStore, root Node, startKey jsonLocation, forRemoval bool) (*JsonCursor, bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Would be nice to give the return values names, or to document them (the boolean in particular) so that it's easier for future readers to quickly understand what the boolean return param indicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and return variable names.

@@ -122,20 +123,20 @@ func (j *JsonCursor) isKeyInChunk(path jsonLocation) bool {
return compareJsonLocations(path, nodeEndPosition) <= 0
}

func (j *JsonCursor) AdvanceToLocation(ctx context.Context, path jsonLocation) error {
func (j *JsonCursor) AdvanceToLocation(ctx context.Context, path jsonLocation, forRemoval bool) (found bool, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some method docs would help readers quickly understand the params without having to read the implementation; especially since this method is exposed outside the package, it would be helpful to add docs. For example, it seems important for callers to know that if the third param is passed as true, then the cursor position returned will be different. Currently, readers would have to read the implementation to figure out how to use this method correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a docstring.

Copy link

github-actions bot commented Jul 8, 2024

Additional work is required for integration with DoltgreSQL.

@coffeegoddd
Copy link
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
716d65d ok 5937457
version total_tests
716d65d 5937457
correctness_percentage
100.0

@nicktobey nicktobey merged commit 094ba06 into main Jul 8, 2024
21 checks passed
@nicktobey nicktobey deleted the nicktobey/json-remove branch July 8, 2024 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants