Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBGT: Single manager per SG instance #1087

Closed
tleyden opened this issue Aug 21, 2015 · 7 comments
Closed

CBGT: Single manager per SG instance #1087

tleyden opened this issue Aug 21, 2015 · 7 comments
Assignees
Milestone

Comments

@tleyden
Copy link
Contributor

tleyden commented Aug 21, 2015

Currently we are spinning up a new cbgt manager for each bucket, however there should be a single manager for the entire sync gateway process, and it should create indexes for each bucket.

@tleyden tleyden added the chore label Aug 21, 2015
@tleyden tleyden self-assigned this Aug 21, 2015
@tleyden
Copy link
Contributor Author

tleyden commented Aug 24, 2015

@steveyen I ended up creating a separate pindex type for each bucket. Code snippet:

func (bucket CouchbaseBucket) StartTapFeed(..) {

                // Create the TapEvent feed channel that will be passed back to the caller
                eventFeed := make(chan sgbucket.TapEvent, 10)

                // wrap functions with closures that have references to channel
                newSyncGatewayPIndexImpl := NewSyncGatewayPIndexImplFactory(eventFeed, bucket, args)
                openSyncGatewayPIndexImpl := OpenSyncGatewayPIndexImplFactory(eventFeed, bucket, args)

                // register with CBGT -- we're creating a new index type for each bucket because
                // we need to pass in the closure-wrapped per-bucket eventFeed along with the
                // factory.
                cbgt.RegisterPIndexImplType(bucket.Name, &cbgt.PIndexImplType{
                        New:         newSyncGatewayPIndexImpl,
                        Open:        openSyncGatewayPIndexImpl,
                        Count:       nil,
                        Query:       nil,
                        Description: "Sync Gateway Pindex",
                })
}

The reason being is that for each bucket, the New and Open callback functions passed to cbgt.RegisterPIndexImplType must wrap (via a closure) a bucket-specific event channel. Using the closure was a way to avoid using globals to be able to find the eventFeed channel -- basically every time we receive a DataUpdate callback on the PIndex, we create an event and send it down this channel.

Can you foresee any problems with having a separate pindex type for every index?

@steveyen
Copy link
Member

Can you foresee any problems with having a separate pindex type for every index?

Yeah, underneath the hood, the cbgt.RegisterPIndexImplType is just using a simple global lookup map, and it was not envisioned to be a thing that changes after process init()'lization. So there's no locking around it.

And, code probably has additional assumptions that the lookup map is kinda small ("there shouldn't be too many PIndex types, right?"). :-)

So, your approach is for sure an unanticipated usage approach. Gut intuition is that's like like rubbing the wood against the grain, and might result in splinter(s).

Would recommend you just have a global lookup map from name to eventFeed channel somewhere in SGW code.

That'll put you more into same kinda of usage territory as how the rest of the world (well, cbft) is using cbgt... which will help remove a set of future mental anguish (like, "hmmm, maybe this bug or issue is just because of how SGW's using the pindex types in a special way that no-one else is using?")

tleyden pushed a commit that referenced this issue Aug 24, 2015
@tleyden
Copy link
Contributor Author

tleyden commented Aug 24, 2015

@steveyen thanks. While this approach does seem to be working so far, and the map would be small, we'll look into the alternative approach you suggested.

@steveyen
Copy link
Member

While this approach does seem to be working so far, ...

@tleyden - good to know. Has that feel of "wow! it works? cool!" :-)

Also, one more thing to help tip you over the fence -- all nodes should ideally have the exact same PIndex impl type registrations. When they're kinda static (like the "bleve" pindex type, and "sync-gw" pindex type), then that's easy to ensure -- registrations are hardly ever changing (at the speed of new feature releases).

When they're dynamic, like based on buckets that are coming & going on end user whims... that's harder to reason about what might happen in the planner, etc.

@tleyden
Copy link
Contributor Author

tleyden commented Aug 25, 2015

@steveyen I ran into a snag during implementation.

Here is the cbgt.RegisterPIndexImplType we are using:


func (sc *ServerContext) registerCbgtPindexType() error {

    // Register with CBGT
    cbgt.RegisterPIndexImplType(base.IndexTypeSyncGateway, &cbgt.PIndexImplType{
        New:         sc.NewSyncGatewayPIndexFactory,
        Open:        sc.OpenSyncGatewayPIndexFactory,
        Count:       nil,
        Query:       nil,
        Description: "Sync Gateway Pindex",
    })
    return nil

}

We've moved the NewSyncGatewayPIndexFactory up to the Sync Gw server initialization, and it uses the indexParams parameter to figure out which Sync Gateway database/bucket this CBGT index corresponds to. Code snippet below:

func (sc *ServerContext) NewSyncGatewayPIndexFactory(indexType, indexParams, path string, restart func()) (cbgt.PIndexImpl, cbgt.Dest, error) {

    // When we create the CBGT index, the indexParams is a string with the bucket name
    bucketName := indexParams

    // lookup the database context based on the indexParams
    dbName := sc.findDbByBucketName(bucketName)
    if dbName == "" {
        return nil, nil, fmt.Errorf("Could not find database for bucket name: %v", bucketName)
    }

    dbContext, ok := sc.databases_[dbName]
    if !ok {
        return nil, nil, fmt.Errorf("Could not find database for name: %v", dbName)
    }

    // get the bucket, 
    base := ...

    result := base.NewSyncGatewayPIndex(bucket.SendTapEvents, bucket, ..)
    sc.CBGTPindexImpl = result

    return result, result, nil

}

But when the OpenSyncGatewayPIndexFactory is called (as opposed to NewSyncGatewayPIndexFactory), it has no indexParams being passed to it, and so it just looks up the cbgt.Dest / cbgt.PIndexImpl that was previously saved on the ServerContext and returns it:

func (sc *ServerContext) OpenSyncGatewayPIndexFactory(indexType, path string, restart func()) (cbgt.PIndexImpl, cbgt.Dest, error) {

    log.Printf("OpenSyncGatewayPIndexFactory indexType: %v path: %v", indexType, path)

    return sc.CBGTPindexImpl, sc.CBGTPindexImpl, nil

}

This should work as long as NewSyncGatewayPIndexFactory is called before OpenSyncGatewayPIndexFactory for a given index on a given Sync Gateway process. Is that guaranteed? Actually, it would be useful if you could give some review on when OpenSyncGatewayPIndexFactory would be called.

If OpenSyncGatewayPIndexFactory can be called without a prior call to NewSyncGatewayPIndexFactory, would it be possible to add the indexParams parameter?

@tleyden
Copy link
Contributor Author

tleyden commented Aug 25, 2015

Answer: OpenSyncGatewayPIndexFactory will be called on restart if previous run called NewSyncGatewayPIndexFactory

The indexParams should be stashed by NewSyncGatewayPIndexFactory, and retrieved by OpenSyncGatewayPIndexFactory

Examples:

https://github.com/couchbaselabs/cbmirror/blob/master/dcr_pindex.go#L44

https://github.com/couchbaselabs/cbft/blob/master/pindex_bleve.go#L144

tleyden pushed a commit that referenced this issue Aug 26, 2015
Only cbgt.RegisterPIndexImplType once during server context init

#1087
@tleyden
Copy link
Contributor Author

tleyden commented Aug 26, 2015

TODO:

  • Pass channel_index bucket
  • Rework sg-bucket to have WriteEvents

tleyden pushed a commit that referenced this issue Aug 27, 2015
Only cbgt.RegisterPIndexImplType once during server context init

#1087
tleyden pushed a commit to couchbase/sg-bucket that referenced this issue Aug 27, 2015
tleyden pushed a commit to couchbaselabs/walrus that referenced this issue Aug 27, 2015
tleyden pushed a commit that referenced this issue Aug 27, 2015
Only cbgt.RegisterPIndexImplType once during server context init

#1087
@tleyden tleyden closed this as completed Aug 28, 2015
@zgramana zgramana added this to the 1.2.0 milestone Aug 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants