Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor storage index #3196

Merged
merged 47 commits into from
Nov 8, 2021

Conversation

cangfengzhs
Copy link
Contributor

What type of PR is this?

  • bug
  • feature

Which issue(s) this PR fixes:

Is part of issue 2745

What this PR does / why we need it?

In this PR, the index execution plan in storage is completely rewritten. In the original Index execution plan, querying index data and querying base data are completely separated, and the kv pairs of base data or kv pairs of index data are passed between the nodes of the execution plan, which leads to serious coupling between nodes , And can not deal with the problem of String type index being truncated.

This PR logically mainly contains the following modifications

  1. Graphd no longer performs interval conversion when the expression is converted to ColumnHint, and forcibly converts all intervals into left-closed and right-opened intervals, but transmits the original interval opening and closing information to storage without any truncation, otherwise the query will be found The result is wrong. (Currently only storage has been modified to adapt to this logic, but the modification of graphd requires another PR)
  2. Each node in the execution plan of the Index transfers Row instead of the original key-value uniformly. In addition, the processing method has been completely changed to iterative processing (may have a slight decrease in performance, not very sure. If necessary, batch processing can be added).
  3. Added a unittest to the Index execution plan. The previous unittest can no longer be compiled successfully.
  4. Encoding of double type index. Just a draft, in order to pass the unittest.

Special notes for your reviewer, ex. impact of this fix, etc:

This is a PR that has not yet been fully completed, and some comments need to be added, as well as the processing of some corner cases, and the supplement of unittest.

Just look at the overall execution logic.

Additional context:

Checklist:

  • Documentation affected (If need to modify document, please label it.)
  • Incompatible (If it is incompatile, please describle it and label it.)
  • Need to cherry pick (If need to cherry pick to some branchs, please label the destination version(s).)
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory

Release notes:

Please confirm whether to reflect in release notes and how to describe:

                                                            `

Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

/* Copyright (c) 2021 vesoft inc. All rights reserved.
*
* This source code is licensed under Apache 2.0 License,
* attached with Common Clause Condition 1.0, found in the LICENSES directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this be removed ?

@bright-starry-sky
Copy link
Contributor

LGTM

@critical27
Copy link
Contributor

I'm ok with this PR

Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, great work

@codecov-commenter
Copy link

Codecov Report

Merging #3196 (2ff0617) into master (58f0b44) will increase coverage by 0.08%.
The diff coverage is 91.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3196      +/-   ##
==========================================
+ Coverage   85.24%   85.32%   +0.08%     
==========================================
  Files        1295     1304       +9     
  Lines      118190   119900    +1710     
==========================================
+ Hits       100748   102306    +1558     
- Misses      17442    17594     +152     
Impacted Files Coverage Δ
src/storage/ExprVisitorBase.h 0.00% <0.00%> (ø)
src/storage/exec/StorageIterator.h 94.33% <ø> (+5.45%) ⬆️
src/storage/test/LookupIndexTest.cpp 100.00% <ø> (ø)
src/storage/exec/IndexSelectionNode.h 26.47% <26.47%> (ø)
src/storage/ExprVisitorBase.cpp 29.31% <29.31%> (ø)
src/storage/test/IndexTestUtil.h 66.53% <66.53%> (ø)
src/storage/exec/IndexLimitNode.cpp 66.66% <66.66%> (ø)
src/storage/exec/IndexVertexScanNode.cpp 90.41% <90.41%> (ø)
src/storage/exec/IndexNode.cpp 91.66% <91.66%> (ø)
src/storage/exec/IndexScanNode.cpp 92.50% <92.50%> (ø)
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58f0b44...2ff0617. Read the comment docs.

@liuyu85cn
Copy link
Contributor

LGTM

@critical27 critical27 merged commit 219d000 into vesoft-inc:master Nov 8, 2021
This was referenced Nov 12, 2021
yixinglu pushed a commit to yixinglu/nebula that referenced this pull request Mar 21, 2022
* stash

* commit IndexEdge/VertexScanNode

* commit IndexSelectionNode

* commit Projection/Dedup Node

* commit IndexLimitNode

* stash

* stash

* stash

* stash

* add index int test

* add double/string1 test

* add string2/string3 test

* finish index_scan_node unittest

* add index node test

* pass lookupindex test

* pass ttl test

* pass all unittest

* remove debug log

* fix bug

* fix bug

* fix bug

* clear file

* clear some useless code

* add comments for IndexNode

* add comments to IndexScanNode

* ad comment to Selection/Projection/DedupNode

* fix some init bug

* fix bug to support geo

* converge qualified strategy define together

* address comments

* address some comments; Modify IndexNode::next return Type

* fix bug

* add some comments

* Add blank lines between function definitions

* fix compile error

* modify new file license

* Modify test to avoid the bug mentioned in issue3191

* modify license error

Co-authored-by: hs.zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants