Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chain export inefficiencies #4994

Open
austinabell opened this issue Nov 24, 2020 · 3 comments
Open

Chain export inefficiencies #4994

austinabell opened this issue Nov 24, 2020 · 3 comments
Assignees
Labels
area/chain Area: Chain kind/enhancement Kind: Enhancement need/analysis Hint: Needs Analysis

Comments

@austinabell
Copy link
Contributor

Describe the bug
A clear and concise description of what the bug is.

When walking a snapshot, ipld nodes are loaded 2x more than they need to, as well as redundant data structures (2 hashsets and 1 array to do what it seems 1 hashset could)

For extra blockstore reads:
Whenever the callback happens, the Cid is loaded from the blockstore (

blk, err := cs.bs.Get(c)
), but also in addition to this, it's also read when it's used during walking (
data, err := cs.bs.Get(blk)
,
data, err := bs.Get(root)
, ...)

Extra data structures:

  • seen and walked hashsets (

    seen := cid.NewSet()
    ) They are updated in lockstep, but why would you need two hashsets for this, either it's been seen and it should be ignored, or it hasn't and it should be recursed.

  • cids and out slices (

    var cids []cid.Cid
    ,
    out := cids
    ) My memory is failing as to if this would reallocate a new slice or it's just assigning the pointer in go, but neither of these should be needed. They are being added to this vector, then checked against the seen hashset, when this could be done before it's added

Let me know if I'm missing something here, I've implemented without these (using the callback to pass back the loaded bytes and only one hashset) and it seems to work the same, but I could be missing an edge case. This also doesn't matter that much to optimize, so probably a low priority ticket.

To Reproduce
Steps to reproduce the behavior:

  1. Run '...'
  2. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Version (run lotus version):

Additional context
Add any other context about the problem here.

@raulk
Copy link
Member

raulk commented Nov 25, 2020

@austinabell ditto. I've ran into these inefficiencies when trying to produce a full snapshot. The current logic tries to remove duplicates, but it could be done differently.

@jennijuju
Copy link
Member

@vyzo mentioned he may be interested in this😉

@jennijuju jennijuju added kind/enhancement Kind: Enhancement and removed kind/improvement labels Jan 11, 2021
@jennijuju jennijuju added area/chain Area: Chain and removed area/chain/state labels Jan 20, 2021
@jennijuju jennijuju added this to the █Blockstore Improvements milestone May 10, 2021
@TippyFlitsUK TippyFlitsUK added the need/analysis Hint: Needs Analysis label May 3, 2022
@Kubuxu
Copy link
Contributor

Kubuxu commented Feb 7, 2023

Another thing of note is that lotus doesn't seem to stream the export CARs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/chain Area: Chain kind/enhancement Kind: Enhancement need/analysis Hint: Needs Analysis
Projects
None yet
Development

No branches or pull requests

6 participants