Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve DAG sync with highly concurrent updates #1031

Merged
merged 6 commits into from
Jan 17, 2023

Conversation

fredcarle
Copy link
Collaborator

Relevant issue(s)

Resolves #1029
Resolves #1030

Description

This PR resolves the eventual consistency bug when highly concurrent updates would cause the DAG syncer to fail reaching consistency. It also adds a simple retry mechanic to avoid sync issues when there is a transaction conflict on the head store.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

make test

Specify the platform(s) on which this was tested:

  • MacOS

@fredcarle fredcarle added bug Something isn't working area/crdt Related to the (Merkle) CRDT system area/p2p Related to the p2p networking system action/no-benchmark Skips the action that runs the benchmark. labels Jan 13, 2023
@fredcarle fredcarle added this to the DefraDB v0.5 milestone Jan 13, 2023
@fredcarle fredcarle requested a review from a team January 13, 2023 02:33
@fredcarle fredcarle self-assigned this Jan 13, 2023
@fredcarle fredcarle changed the title fix: Improve eventual consistency with highly concurrent updates fix: Improve DAG sync with highly concurrent updates Jan 13, 2023
@codecov
Copy link

codecov bot commented Jan 13, 2023

Codecov Report

Merging #1031 (a856805) into develop (cb36b2d) will increase coverage by 0.02%.
The diff coverage is 61.84%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1031      +/-   ##
===========================================
+ Coverage    57.79%   57.82%   +0.02%     
===========================================
  Files          173      173              
  Lines        19462    19495      +33     
===========================================
+ Hits         11249    11273      +24     
- Misses        7223     7232       +9     
  Partials       990      990              
Impacted Files Coverage Δ
client/errors.go 60.00% <0.00%> (-4.29%) ⬇️
config/configfile.go 75.00% <ø> (ø)
core/crdt/lwwreg.go 68.96% <40.00%> (-5.43%) ⬇️
net/process.go 74.62% <63.26%> (-0.97%) ⬇️
cli/start.go 44.04% <66.66%> (+0.75%) ⬆️
db/db.go 72.47% <66.66%> (-0.53%) ⬇️
config/config.go 75.58% <100.00%> (+0.07%) ⬆️
core/crdt/base.go 73.07% <100.00%> (ø)
net/pb/net.pb.go 15.49% <0.00%> (-0.17%) ⬇️
... and 3 more

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single pretty basic item is needed to change. Made a quick note aswell.

Comment on lines +152 to +156
// Do not use the first byte of the current value in the comparison.
// It's metadata that will falsify the result.
if len(curValue) > 0 {
curValue = curValue[1:]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't exceptionally important. It can stay since technically its more accurate, but it wouldn't change the comparison either way, since its s a byte of metadata that corresponds to the CRDT type, which only get compared within the same type, so it does technically canel itself out.

But yes, technically this is more accurate, just wanted to add some insight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually fails in Andy's PR if this is not done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh, yes you are right. I see. Thanks for the clarification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask @fredcarle have you tested this using the tests in my PR? How do you know this PR makes a tangible improvement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have yes. With a couple minor changes. Everything passes as it should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that's awesome - cheers Fred

net/process.go Outdated
Comment on lines 47 to 49
// TODO: Implement better transaction retry mechanics
// Github issue #1028
for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: In the short term, can we apply a (perhaps configurable/option) limit to the number of retry attempts, instead of "infinite". Its unlikely to spin forever trying and constantly failing to commit, but just for santy / cleanlyness

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Will do.

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion.

net/process.go Outdated
logging.NewKV("DocKey", dockey),
logging.NewKV("CID", c),
)
height := delta.GetPriority()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Move this below line 89 or do the suggestion on line 90.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing code that was simply moved inside a for loop. Do really want me to make the change even if it's unrelated to the the fix? Same would apply for your suggestion bellow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do it in a separate commit, you are touching this part of the code anyways and is a one liner.

This isn't worth making an issue for and I doubt we will remember to improve this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I just wanted to know your opinion on that situation :)

net/process.go Outdated
)
height := delta.GetPriority()
ng := p.createNodeGetter(crdt, getter)
cids, err := crdt.Clock().ProcessNode(ctx, ng, c, height, delta, nd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion:

Suggested change
cids, err := crdt.Clock().ProcessNode(ctx, ng, c, height, delta, nd)
cids, err := crdt.Clock().ProcessNode(ctx, ng, c, delta.GetPriority(), delta, nd)

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Fred, looks good to me so long as there is some protection added against the infinite loop that John flagged :)

@fredcarle fredcarle force-pushed the fredcarle/fix/I1029-register-priority-fix branch from e129f6e to c949565 Compare January 16, 2023 23:17
@fredcarle fredcarle requested a review from jsimnz January 16, 2023 23:18
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor suggestions and nitpicks, can change/merge as you see fit!

db/db.go Outdated Show resolved Hide resolved
net/process.go Outdated Show resolved Hide resolved
cli/start.go Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
client/db.go Outdated Show resolved Hide resolved
@fredcarle fredcarle force-pushed the fredcarle/fix/I1029-register-priority-fix branch from f92fc30 to a856805 Compare January 17, 2023 00:35
@fredcarle fredcarle merged commit 210423c into develop Jan 17, 2023
@fredcarle fredcarle deleted the fredcarle/fix/I1029-register-priority-fix branch January 17, 2023 01:43
shahzadlone pushed a commit that referenced this pull request Apr 13, 2023
Relevant issue(s)
Resolves #1029
Resolves #1030

Description
This PR resolves the eventual consistency bug when highly concurrent updates would cause the DAG syncer to fail reaching consistency. It also adds a simple retry mechanic to avoid sync issues when there is a transaction conflict on the head store.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
)

Relevant issue(s)
Resolves sourcenetwork#1029
Resolves sourcenetwork#1030

Description
This PR resolves the eventual consistency bug when highly concurrent updates would cause the DAG syncer to fail reaching consistency. It also adds a simple retry mechanic to avoid sync issues when there is a transaction conflict on the head store.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/crdt Related to the (Merkle) CRDT system area/p2p Related to the p2p networking system bug Something isn't working
Projects
None yet
4 participants