Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the schemas declarative, add a new schema for better load balancing. #262

Merged
merged 2 commits into from
Feb 6, 2017

Conversation

tomwilkie
Copy link
Contributor

@tomwilkie tomwilkie commented Feb 2, 2017

Part of #254

  • Define a Schema interface for calculating the hash and range keys to use.
  • Make implementations of this interface for all the different schema's we've had.
  • Make a "CompositeSchema" which delegates to various schema's depending on when they we activated.
  • Update chunk store to use this; in particular, invert the order of the merge and intersect in the query path.
  • Bunch of testing

TODO:

@tomwilkie tomwilkie self-assigned this Feb 2, 2017
@tomwilkie tomwilkie force-pushed the 254-label-in-hash-key branch 2 times, most recently from a1b68bd to 97dc83d Compare February 2, 2017 20:58
@tomwilkie tomwilkie force-pushed the 254-label-in-hash-key branch 2 times, most recently from e304e0e to ef4929c Compare February 3, 2017 16:11
Copy link
Contributor

@jml jml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick look at chunk/schema.go for now. More to follow later.

chunk/schema.go Outdated
// to write or read chunks from the external index.
type Schema interface {
// When doing a write, use this method to return the list of Buckets you should write to.
WriteBuckets(from, through model.Time, userID string, metricName model.LabelValue) []WriteBucket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this to GetWriteBuckets because it doesn't actually write buckets.

chunk/schema.go Outdated

// When doing a read, use these method to return the list of Buckets you should query
ReadBucketsForMetric(from, through model.Time, userID string, metricName model.LabelValue) []ReadBucket
ReadBucketsForMetricLabel(from, through model.Time, userID string, metricName model.LabelValue, labelName model.LabelName) []ReadBucket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, GetRead...

chunk/schema.go Outdated
TableName() string
StartTime() model.Time

// For writing, pass metrics here and we'll tell you what range keys to write
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning how / why this might fail?

chunk/schema.go Outdated
HashKey() string
StartTime() model.Time

// For reading from this bucket, use this functions to construct a key that will prefix match the desired chunks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/this/these/

chunk/schema.go Outdated
StartTime() model.Time

// For reading from this bucket, use this functions to construct a key that will prefix match the desired chunks
AllLabelsPrefix() ([]byte, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going this far, why not type RangeKey = []byte (if that is the right thing)

chunk/schema.go Outdated
}

if !sort.IsSorted(byStart(schemas)) {
fmt.Printf("%+v", schemas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly

return a
}
return b
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lolgo

chunk/schema.go Outdated
nextSchemaStarts = c.schemas[i+1].start
}
end := min(through, nextSchemaStarts)
callback(start, end, c.schemas[i].Schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback thing seems a bit weird. I can see how it leads to briefer code below, but surely defining a struct and returning a slice / sending to a channel is more traditional.

chunk/schema.go Outdated
f.Var(&cfg.V4SchemaFrom, "dynamodb.v4-schema-from", "The date (in the format YYYY-MM-DD) after which we enable v4 schema.")
}

type compositeSchema struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a doc comment saying what it's for.

chunk/schema.go Outdated
func (b *originalBucket) SpecificValuePrefix(value model.LabelValue) ([]byte, error) {
if b.labelName == "" {
panic("SpecificValuePrefix called on bucket not bound to specific label!")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a constructive alternative (yet), but these panics point to the interface not being quite right.

@tomwilkie tomwilkie changed the title [WIP] Make the schemas declarative. Make the schemas declarative, add a new schema for better load balancing. Feb 5, 2017
chunk/schema.go Outdated
return cfg.TablePrefix + strconv.Itoa(int(bucketStart/int64(cfg.TablePeriod/time.Second)))
}

// componsiteSchema is a Schema which deletegates to various schemas depending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

componsite -> composite
deletegates -> delegates

chunk/schema.go Outdated
})
i--

// next, find the schema with the highest start _after_ through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the lowest, not the highest, here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is. I think code is correct; will update comment.

chunk/schema.go Outdated
i := sort.Search(len(c.schemas), func(i int) bool {
return c.schemas[i].start > from
})
i--
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the found schema is the 0th, can this not become -1 and then the code below crashes (c.schemas[i])?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True; in reality I didn't think this would happen (as we stick a schema in at 0), but I guess someone could send a sample from before 1970 and crash us...

@juliusv
Copy link
Contributor

juliusv commented Feb 6, 2017

Looks pretty good to me besides minor comments!

- Define a Schema interface for calculating the hash and range keys to use.
- Make implementations of this interface for all the different schema's we've had.
- Make a "CompositeSchema" which delegates to various schema's depending on when they we activated.
- Update chunk store to use this; in particular, invert the order of the merge and intersect in the query path.
- Add v4 schema, which puts the label key in the hash key for better load distribution
- Remove query related-metrics, they're not used.

Bunch of testing:
- Test the hash key generators, range key generators and composite scheme
- Run the chunk store tests against each schema.
- Resolve issue found from extending the tests - don't check results match label name, thats implied.
- DynamoDB won't accept empty keys; they should just be omitted.
- Deal with case where two schemas start at the same time.
@tomwilkie tomwilkie force-pushed the 254-label-in-hash-key branch from fae601f to ee0abb5 Compare February 6, 2017 14:42
@juliusv
Copy link
Contributor

juliusv commented Feb 6, 2017

👍

@tomwilkie tomwilkie merged commit f66f505 into master Feb 6, 2017
@tomwilkie tomwilkie deleted the 254-label-in-hash-key branch February 6, 2017 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants