Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-3688 make sure kv_pool_size is overriden correctly for DCP #6680

Merged
merged 6 commits into from
Feb 15, 2024

Conversation

torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Feb 12, 2024

  • create two functions for getting the connection string
    • GetGoCBConnString
    • GetGoCBConnStringForDCP

Simplifications to API:

  • Fix a confusing invariant where GetGoCBConnString would initialize KvPoolSize on BucketSpec by requiring a constructor that does string parsing of server value to pull out kv_pool_size. This will fail if an invalid kv_pool_size value is created.

  • Move the creation of modified connection string for the bootstrap connection to the caller (rest/main.go). This means that both serverless and regular callers will have values set on the connection string for bootstrap connection. Before these values were only set for serverless bootstrap:

    • max_perhost_idle_http_connections
    • max_idle_http_connections
    • idle_http_connection_timeout

Questions:

  • We only set ca_cert_path for Bucket connections and not for the bootstrap connection? The only way that I imagine that this works is if the ca cert is in the system already or embedded in the pem file. I tried to use our x509 tests but they are broken because they don't run automatically. I tried to fix them but it is somewhat non trivial.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

- create two functions for getting the connection string
	- GetGoCBConnString
	- GetGoCBConnStringForDCP

Simplifications to API:

- Fix a confusing invariant where GetGoCBConnString would initialize
KvPoolSize on BucketSpec by requiring a constructor that does string
parsing of server value to pull out kv_pool_size. This will fail if an
invalid kv_pool_size value is created.

- Move the creation of modified connection string for the bootstrap
  connection to the caller (rest/main.go). This means that both
  serverless and regular callers will have values set on the connection
  string for bootstrap connection. Before these values were only set for
  serverless bootstrap:
  	- max_perhost_idle_http_connections
	- max_idle_http_connections
	- idle_http_connection_timeout
Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions.

base/bucket.go Outdated
@@ -119,6 +118,11 @@ type CouchbaseBucketType int

// Full specification of how to connect to a bucket
type BucketSpec struct {
BucketSpecOptions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor feels arbitrary - I don't think KvPoolSize is any more or less of an 'option' than things like BucketOpTimeout. What's the problem that this refactor fixes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KvPoolSize isn't an input parameter, it is only ever read from the connection string. It is parsed from the connection string and then stored when creating a bucket to determine how to throttle kvops in the below code.

	numPools := 1
	if spec.KvPoolSize > 0 {
		numPools = spec.KvPoolSize
	}
	gocbv2Bucket.kvOps = make(chan struct{}, MaxConcurrentSingleOps*nodeCount*numPools)

A different way of addressing the immutability of the struct would be to never set this value and avoid the use of a constructor for this struct. Then, inside GetGocbV2BucketFromCluster you could call getKvPoolSize(spec.Server) which would reparse the connection string. The reason I didn't do it like this was to only parse the connection string once but this is such a cheap operation (string parsing) compared to bucket opening that it would be fine.

Do you think this would be more readable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not set on 2f038b9 but it simplifies the PR.

This will behave identically to the original PR but in order for serverless values to work, you have to modify the connection string on the BucketSpec to have kv_pool_size=1. This is done in GetBucketSpec anyway to add dcp_buffer_size and kv_buffer_size custom values so I think this is probably a reasonable refactor.

The downsides (which aren't bad):

  • the setting of an implicit kv_pool_size is set as DefaultGocbKvPoolSize inside some functions - BucketSpec.getGoCBConnString and BucketSpec.GetKvPoolSize and they have the potential to get out of sync.
  • the place that we will validate that we haven't passed kv_pool_size twice in a connection string will take place when opening a bucket and not loading a database. This is unlikely behavior we didn't test for anyway.
  • we parse the connection string twice when creating a Bucket, once to get the connection string for a cluster and another for figuring out the kv_pool_size. This is a cheap string operation vs a kv op for a rare condition.

Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this makes sense. Added a suggestion to go further with removing KvPoolSize from the bucket spec and just extract it from a computed connstring as needed.

if spec.KvPoolSize > 0 {
numPools = spec.KvPoolSize

numPools, err := spec.GetKvPoolSize()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like GetKvPoolSize should be a helper function that takes a connString, and doesn't use the spec? That way we can be certain that the value here matches what's actually getting sent to gocb (on the conn string), and we don't need to have any logic inside GetKvPoolSize around default values, etc.

That would also let us get rid of the test-only getKVConnectionPol [sic], and just use the new connString based function for test verification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to a generic helper function so I could use it for kv_pool_size and kv_buffer_size.

@@ -296,7 +291,7 @@ func initCBGTManager(ctx context.Context, bucket Bucket, spec BucketSpec, cfgSG

// Since cbgt initializes a buffer per CBS node per partition in most cases (vbuckets in partitions can't be grouped by CBS node),
// setting the small buffer size used in cbgt 1.3.2. (see CBG-3341 for potential optimization of this value)
options["kvConnectionBufferSize"] = "16384"
options["kvConnectionBufferSize"] = "16384" // This value will be overridden by the BucketSpec connection string.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be appropriate to log something at warn in the case where the connection string is setting a larger value for this, to help us triage possible memory issues that would result. The message would be along the lines of "dcp_buffer_size is set to %d on the connection string - this will result in increased memory usage"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this idea.

@adamcfraser adamcfraser assigned torcolvin and unassigned adamcfraser Feb 14, 2024
@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin Feb 15, 2024
Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, wanted one bit of clarification.

@@ -455,9 +455,33 @@ func (sc *ServerContext) getOrAddDatabaseFromConfig(ctx context.Context, config
return sc._getOrAddDatabaseFromConfig(ctx, config, options)
}

func GetBucketSpec(ctx context.Context, config *DatabaseConfig, serverConfig *StartupConfig) (spec base.BucketSpec, err error) {
// GetBucketSpec returns a BucketSpec from a given DatabaseConfig and StartupConfig.
func GetBucketSpec(ctx context.Context, config *DatabaseConfig, serverConfig *StartupConfig) (*base.BucketSpec, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit in changing to a pointer return for BucketSpec here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had changed it to a pointer in a previous iteration because the value that it returns is effectively nil and you wouldn't want to use it. It doesn't matter in this case because we check the error value of the function and performing a construction/copy on BucketSpec is rare and cheap.

I restored the previous behavior.

@torcolvin torcolvin merged commit 6cd38f1 into master Feb 15, 2024
30 checks passed
@torcolvin torcolvin deleted the CBG-3688 branch February 15, 2024 23:54
bbrks pushed a commit that referenced this pull request Mar 28, 2024
* CBG-3688 make sure kv_pool_size is overriden correctly for DCP

- create two functions for getting the connection string
	- GetGoCBConnString
	- GetGoCBConnStringForDCP

Simplifications to API:

- Fix a confusing invariant where GetGoCBConnString would initialize KvPoolSize on BucketSpec but making BucketSpec stateless and moving operations to be performed on an connection string
- Remove redundant ca_cert_path, set by auth
- Centralize modifying the connection string defaults for serverless on Bootstrap and Bucket creation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants