Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

large refactor of bitswap, implement wantmanager to manage wantlist #1218

Merged
merged 15 commits into from
May 21, 2015

Conversation

whyrusleeping
Copy link
Member

This PR moves all wantlist update logic into the wantmanager, this ensures that outgoing messages are synchronized and coalesced as needed. This greatly improves bitswap performance and correctness.

@whyrusleeping whyrusleeping added the status/in-progress In progress label May 9, 2015
@@ -35,30 +34,6 @@ func TestClose(t *testing.T) {
bitswap.Exchange.GetBlock(context.Background(), block.Key())
}

func TestProviderForKeyButNetworkCannotFind(t *testing.T) { // TODO revisit this
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test made my head hurt. Pretty sure it wasnt testing anything useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it's testing what error pops out of bitswap when: (a) we have a provider, but (b) it is not reachable. we need to keep these tests. if the name is too complicated, maybe:

TestUnreachableProvider

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you describe is not what is being tested here. all it appears to be testing is that a context expires

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test does appear to be testing it fine, see below. But the timeout may be way too small. maybe 1us not ns.

// create an identity and provide a block.
pinfo := p2ptestutil.RandTestBogusIdentityOrFatal(t)
rs.Client(pinfo).Provide(context.Background(), block.Key()) // but not on network   

solo := g.Next()    
defer solo.Exchange.Close()   

ctx, _ := context.WithTimeout(context.Background(), time.Nanosecond)    
// getting the block should fail, because the provider isn't there.
_, err := solo.Exchange.GetBlock(ctx, block.Key())    

// the way it fails is that thectx expires.
if err != context.DeadlineExceeded {

@whyrusleeping
Copy link
Member Author

The big problem here, is that we want backpressure everywhere but in bitswap...

@jbenet
Copy link
Member

jbenet commented May 9, 2015

So, doing some 'real world' testing on this, it doesnt appear that the net notifee thing actually works, i get messages going to and from peers that ive never received a 'Connected' notification from.

that's concerning. can you repro in an isolated case? i can sink into it and check it out

// TODO move this to peerstore
)

type PeerManager struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to give an overview of what PeerManager is for here

@jbenet
Copy link
Member

jbenet commented May 9, 2015

without yet understanding it fully, this seems to me like a great step forward.

@whyrusleeping
Copy link
Member Author

without yet understanding it fully, this seems to me like a great step forward.

I hope so! 😄

@whyrusleeping whyrusleeping mentioned this pull request May 11, 2015
30 tasks
@whyrusleeping
Copy link
Member Author

@jbenet this is RFCR, i'm going to rebase in a little bit, but the big stuff is there. This is MUCH faster for transfer speeds on the open internet, theres an occasional pause of ten seconds that i'm still tracking down, but it never truly hangs or degrades. It will always continue after bitswaps 10 second 're-request' timer ticks. In practice, ive only seen the pause once every three or four runs.

if err != nil {
log.Error(err)
// TODO: cant connect, what now?
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok to check if you really wil send to mq.p. if there's a connection, this call returns early.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, thoughts on what to do when it fails?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ConnectTo fails-- connection failed. So it's like a disconnection, i'd say?

@jbenet
Copy link
Member

jbenet commented May 13, 2015

@whyrusleeping one thing i'm missing is: how do partial updates happen?

@whyrusleeping
Copy link
Member Author

partial wantlist updates happen the same way as they normally do. you just send your message through the peermanager instead of the network, and if multiple messages pile up while youre either dialing or sending a previous message, they get combined

@jbenet
Copy link
Member

jbenet commented May 13, 2015

you just send your message through the peermanager instead of the network

could you point me to funcs + flow? i'm failing to see it :(

@whyrusleeping
Copy link
Member Author

a partial update happen here: https://github.com/ipfs/go-ipfs/pull/1218/files#diff-9d5aff71f348a6f1cdcd472cd7bd4415R391

via a broadcast (which just means, send to everyone)

}()

if mq.wlmsg == nil || msg.Full() {
mq.wlmsg = msg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if partial updates come this way too, not sure this should be getting set? maybe only if Full() ? not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is mq.wlmsg a message "staged" to be sent out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the wlmsg is the one we are staging to be sent out, and in runQueue we grab it, and send it out upon receiving the work signal

@jbenet
Copy link
Member

jbenet commented May 13, 2015

@whyrusleeping i don't grasp this changeset well-- the datastructs aren't obviously clear, for example:

type msgQueue struct {
 p peer.ID

 lk    sync.Mutex
 wlmsg bsmsg.BitSwapMessage

 work chan struct{}
 done chan struct{}
}

turns out this msgQueue is not as generic as the name implies-- no blocks go through it, only wantlist messages. and wlmsg is mostly a full wantlist message, unless it's nil. in which case it can be a partial one. it seems that this is just a message staged to be sent out. perhaps something like this is clearer:

type wantlistMsgQueue struct {
 p peer.ID

 lk   sync.Mutex  // guards  ... only next? or what else?
 next bsmsg.BitSwapMessage

 work chan struct{}
 done chan struct{}
}

the other datastructures could use some clarifying too. the control flow is hard.

maybe try drawing it? it's possible that drawing it will either reveal the complexity OR the simplicity -- i'm maybe not seeing it right.

@whyrusleeping
Copy link
Member Author

its only a full wantlist if the caller makes it a full wantlist. im not sure what youre seeing here

@whyrusleeping
Copy link
Member Author

    if mq.wlmsg == nil || msg.Full() {
        mq.wlmsg = msg
        return
    }

here, if we arent holding a message, we take the one being added.
OR
if the message being added is full, we overwrite whatever else we had.

Otherwise, down here, we add the entries from the passed in wantlist to the one we are holding, to combine them into one, saving us a send call

    // TODO: add a msg.Combine(...) method
    for _, e := range msg.Wantlist() {
        if e.Cancel {
            mq.wlmsg.Cancel(e.Key)
        } else {
            mq.wlmsg.AddEntry(e.Key, e.Priority)
        }
    }

@@ -326,7 +316,7 @@ func (bs *Bitswap) sendWantlistToProviders(ctx context.Context, entries []wantli

// TODO(brian): handle errors
func (bs *Bitswap) ReceiveMessage(ctx context.Context, p peer.ID, incoming bsmsg.BitSwapMessage) error {
defer log.EventBegin(ctx, "receiveMessage", p, incoming).Done()
//defer log.EventBegin(ctx, "receiveMessage", p, incoming).Done()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is version controlled, so we should delete instead of commenting-out code we want to kill.

@jbenet jbenet force-pushed the refactor/bitswap branch 2 times, most recently from 3c1e74f to b827e4e Compare May 20, 2015 17:21
@jbenet
Copy link
Member

jbenet commented May 21, 2015

@whyrusleeping b0eb0d2 breaks. all others pass.

@whyrusleeping
Copy link
Member Author

noted, ill rebase that out

jbenet added a commit that referenced this pull request May 21, 2015
large refactor of bitswap, implement wantmanager to manage wantlist
@jbenet jbenet merged commit 5853eac into master May 21, 2015
@jbenet jbenet removed the status/in-progress In progress label May 21, 2015
@jbenet jbenet deleted the refactor/bitswap branch May 21, 2015 19:47
@jbenet
Copy link
Member

jbenet commented May 21, 2015

After speaking with @whyrusleeping we decided to merge this in now, as it is already better than master.

@whyrusleeping
Copy link
Member Author

yeap, its better, not perfect. but we can address the new issues as they come up

@wking wking mentioned this pull request Jun 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants