feat: Allow new fields to be added locally to schema #1139
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1139 +/- ##
===========================================
+ Coverage 68.23% 68.28% +0.05%
===========================================
Files 181 181
Lines 16617 17007 +390
===========================================
+ Hits 11338 11613 +275
- Misses 4337 4426 +89
- Partials 942 968 +26
I did a quick overview but didn't spend enough time on the details to approve it and I'm not sure how much time I'll have to spend on this in the coming days. As far as I can tell though, it looks good.
client/db.go
Outdated
// collection. Will return false and an error if it fails validation.
ValidateUpdateCollectionTxn(context.Context, datastore.Txn, CollectionDescription) (bool, error)
suggestion: "Will return false and an error" is a bit misleading here, as the doc string makes it sound as if we either get true and no error, or false and an error. But looking at the function I see that it can also return false and no error. To avoid that confusion I would simply say "Will return an error if it fails validation".
agreed, cheers Fred - I'll change this.
- reword bool return documentation
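The three return combinations discussed here can be sketched in a self-contained toy. The `CollectionDescription` struct and validation rule below are simplified, hypothetical stand-ins, not the real defradb types:

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Hypothetical, simplified stand-in for the real defradb type.
type CollectionDescription struct {
	Name   string
	Fields []string
}

var errEmptyName = errors.New("collection name cannot be empty")

// validateUpdateCollection sketches the three possible outcomes discussed above:
//   - (false, err): the proposed description fails validation
//   - (false, nil): valid, but identical to the persisted state (nothing to do)
//   - (true, nil):  valid and different, so an update would be applied
func validateUpdateCollection(current, proposed CollectionDescription) (bool, error) {
	if proposed.Name == "" {
		return false, errEmptyName
	}
	hasChanged := !reflect.DeepEqual(current, proposed)
	return hasChanged, nil
}

func main() {
	current := CollectionDescription{Name: "Users", Fields: []string{"Name"}}

	// Valid, no change: (false, nil) - the case the original doc string missed.
	changed, err := validateUpdateCollection(current, current)
	fmt.Println(changed, err) // false <nil>

	// Valid, changed: (true, nil).
	proposed := CollectionDescription{Name: "Users", Fields: []string{"Name", "Foo"}}
	changed, err = validateUpdateCollection(current, proposed)
	fmt.Println(changed, err) // true <nil>
}
```

This is why the reworded doc comment only promises an error on validation failure, rather than implying the bool and error always move together.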
Does this need to be publicly exposed on the client interface? Can't it just be private within UpdateCollection? What's the benefit of publicly calling Validate then Update?
Seems like it's only used internally anyway?
What's the benefit of publicly calling Validate then Update?
It is quite common to want to validate something without immediate execution, and many places consider it best practice to expose validation functions by default. I'd probably want to use this as an app dev, and it seems reasonably likely that there'd be others too.
EDIT: Just noting that Orpheus also wanted this via the CLI and there is an open ticket for it.
In that case, should this not be done from the perspective of the Patch DDL then, for external use? At the moment they would need to somehow (not that hard since it's a patch, but we are delegating that responsibility to the dev in this API) produce the new description objects, and then call this API.
Compared to calling something like db.ValidateUpdatePatch(ctx, patch), which would essentially apply the patch in a temp object and call validateDescription anyway.
Putting this API alongside the Schema related APIs instead of the Collection APIs.
We'd want both, as updates can be either a json patch or a CollectionDescription update. This (the CollectionDescription variant) already existed as a function, so adding it to the interface was all the effort required.
Adding a ValidatePatch function would be slightly more effort and a scope expansion (ValidateUpdateCollectionTxn is already tested via UpdateCollectionTxn, whereas ValidatePatch would be untested).
Do you really dislike adding just this one now?
db/collection.go
Outdated
// ValidateUpdateCollectionTxn validates that the given collection description is a valid update.
//
// Will return true if the given description differs from the current persisted state of the
// collection. Will return false and an error if it fails validation.
suggestion: Same as above :)
	{ "op": "add", "path": "/Users/Schema/Fields/-", "value": {"Name": "Foo", "Kind": 2, "Typ":3} }
]
`,
ExpectedError: "only default or LWW (last writer wins) CRDT types are supported. Name: Foo, CRDTType: 3",
Using an exact string to assert certain behaviour is in general bad practice.
Is it our code convention, or more of a let's-stick-to-it-for-now approach?
Here I wish to match the whole string, as it is all relevant to whether the error is the expected one.
ExpectedError does work on partials, and we use it that way, but here I do want the whole string.
In this case the problem is not with entire vs partial string matching, but with string matching in general.
The test should assert that a certain error occurred, and should fail only if no error occurred or if some other error occurred instead of the expected one.
With the current change the test might fail for absolutely unrelated changes like text message formatting.
For this we will have to change how we go about error handling.
But I guess this is a discussion for a bigger round.
Ah, you mean protection against line breaks etc?
I think that would be a really easy change - should just be a couple of lines in tests/integration/utils2.go (~ln 848).
Shouldn't matter in the short term as the error text/formatting is managed by devs atm, and if they are deliberately changing the formatting they can also deliberately change the expected error (or just make that tweak to errors.Is).
{
	"_key": "bae-43deba43-f2bc-59f4-9056-fef661b22832",
	"Name": "John",
	"Email": nil,
looks like this test consumed the previous one.
func TestSchemaUpdatesAddFieldWithCreateWithUpdateAfterSchemaUpdateAndVersionJoin(t *testing.T) {
	initialSchemaVersionId := "bafkreicg3xcpjlt3ecguykpcjrdx5ogi4n7cq2fultyr6vippqdxnrny3u"
	updatedSchemaVersionId := "bafkreicnj2kiq6vqxozxnhrc4mlbkdp5rr44awaetn5x5hcdymk6lxrxdy"
why not const?
why bother with the extra word?
it's semantics. If it's const I, the reader of the code, will note straight away that the values aren't supposed to change. Otherwise I will have to go through the code to see why it is a variable.
:) I don't think that is something to worry about in our tests; the possibilities for mutation are quite small here, and if anyone found a reason to mutate anything in a test declaration then that would probs/hopefully be flagged and removed in review anyway.
If you feel strongly about this, I'd suggest we add a linter (outside this PR). In most other langs I'd consider it a no-brainer, as it is a conscious choice between var and const, however Go's := makes it a bit awkward and a poor use of time and mental energy.
Request: `query {
	Users {
		Name
		Foo
I don't see anything bool-array-related here
Kind: 3 represents a field of type boolean array.
We plan on allowing string-based defining of these values in the near future, and when that happens we'll get tests that use the more descriptive strings, but for the short term we (and any users) will need to deal with the uint8 representations.
type Users {
	Name: String
}
`,
This schema is repeated several times, as are some other things. Why not extract them to helper functions?
The test could look like this:
test := testUtils.TestCase{
	Description: "Test schema update, add field with kind datetime (10)",
	Actions: []any{
		testUtils.SchemaUpdate{
			Schema: getTestSchema(),
		},
		testUtils.SchemaPatch{
			Patch: getTestPatchWithKind(10),
		},
		testUtils.Request{
			Request: getTestQueryWithFoo(),
			Results: []map[string]any{},
		},
	},
}
testUtils.ExecuteTestCase(t, []string{"Users"}, test)
or even like this:
test := testUtils.TestCase{
	Description: "Test schema update, add field with kind datetime (10)",
	Actions: []any{
		getTestSchema(),
		getTestPatchWithKind(10),
		testUtils.Request{
			Request: getTestQueryWithFoo(),
			Results: []map[string]any{},
		},
	},
}
testUtils.ExecuteTestCase(t, []string{"Users"}, test)
I have quite a strong preference for not using helpers like that in tests; whilst they reduce the number of lines, they often obscure what is under test.
EDIT: To Expand: Here I want to see the actions that will be used by the users. Not only do I think it makes it much easier to see what is actually being done in the test, I have a much better idea as to how it might look and feel to the users. It also reduces the complexity of the test, something I consider particularly important for part-timers/newcomers to the test - including Defra users - the documentation aspect of these tests is very important to me, not just what the host machine executes.
EDIT2: Helper functions and over-sharing of code often results in tests that do far more than they need to, further obscuring what is actually under test. For example we currently do this a lot with test schema - most of the fields are not relevant to most of the tests - it is a tolerable trade-off for just schema perhaps, but I really don't want it spreading (and e.g. standardising test queries so they return 10 fields when each test only cares about 1).
I see your points. They are absolutely valid concerns. And, as you mentioned earlier in one of the comments, there are always trade-offs.
I just would like to bring up some other points:
One of the (often underrated) properties of tests is low-level documentation. Which means that in order to understand how a system under test works one should just read the tests. And to do this effectively the documentation should be as concise and informative as possible. For tests to be concise they should contain only information relevant to the test itself. All the boiler-plate should be hidden away in some test set up stage.
In our specific case there are many repetitive parts that are not necessarily related to the test, hence the tests look very similar. As a result, it takes some effort and mental text-diff skills to spot what exactly makes one test special in comparison to others.
I think your approach would fit better for systems with less complexity and therefore fewer test cases.
But maybe it's a more general discussion.
		},
	},
}
testUtils.ExecuteTestCase(t, []string{"Users"}, test)
Again the same test. Looks like there are many of them here.
testUtils.SchemaPatch{
	Patch: `
		[
			{ "op": "add", "path": "/Users/Schema/Fields/-", "value": {"Name": "Foo", "Kind": 19} }
Why not use a const or enum for Kind values?
As in fmt.Sprintf them in (the enum does already exist in the prod code)?
The users can supply raw 19s, so that is what we should test; hiding it behind a test const hides the ugliness of the production interface, and using the production const changes the concept under test (as well as hiding the ugliness).
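For illustration, a self-contained sketch of the fmt.Sprintf approach mentioned here. The patchAddField helper is hypothetical; the path and field shape mirror the patches used in these tests, and the raw uint8 Kind stays visible at the call site:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchAddField builds the JSON-patch string used by the schema tests,
// keeping the raw Kind value (e.g. 19) visible, since that is what users
// must supply today. Hypothetical helper name.
func patchAddField(collection, name string, kind int) string {
	return fmt.Sprintf(
		`[{ "op": "add", "path": "/%s/Schema/Fields/-", "value": {"Name": %q, "Kind": %d} }]`,
		collection, name, kind,
	)
}

func main() {
	patch := patchAddField("Users", "Foo", 19)
	fmt.Println(patch)

	// Sanity check: the generated patch is valid JSON.
	var ops []map[string]any
	if err := json.Unmarshal([]byte(patch), &ops); err != nil {
		panic(err)
	}
	fmt.Println(ops[0]["op"]) // add
}
```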
@@ -49,5 +50,5 @@ type Parser interface {
	ParseSDL(ctx context.Context, schemaString string) ([]client.CollectionDescription, error)

	// Adds the given schema to this parser's model.
question: Is the documentation still valid for SetSchema?
yes, what it does conceptually has not changed. The only difference is that changes will only be applied if/until the txn is successfully committed.
The doc could probably be expanded to mention this (although it is somewhat a universal implicit-given for anything transaction-based), but it is internal, not public, and if left up to me laziness would probably win out. Do you want it expanded?
errCollectionIDDoesntMatch string = "CollectionID does not match existing"
errSchemaIDDoesntMatch     string = "SchemaID does not match existing"
question: Why are some error messages starting with a capitalized char?
These are property names; the name of the property is CollectionID, collectionID does not exist.
The alternative would be to describe the property instead of naming it: collection ID does not match existing
Couple of little things. Overall I like the direction and functionality.
db/collection.go
Outdated
var hasChanged bool
existingCollection, err := db.GetCollectionByNameTxn(ctx, txn, proposedDesc.Name)
if err != nil {
	if err.Error() == "datastore: key not found" {
todo: Similar to standup convo on tests, but let's avoid doing direct string comparison. This particular error comes from ds.ErrNotFound.
😆 Cheers lol - I spent a while looking for that err const as I thought it existed but never found it.
- replace with ds.ErrNotFound
👍
err = txn.Commit(ctx)
return col, err
nitpick: this can be merged into a single line
not sure how I feel about that; it reduces the importance of the Commit call, making it appear as something of an afterthought. May or may not change.
fair :)
err = txn.Commit(ctx)
return col, err
nitpick: merge lines
@@ -161,32 +161,46 @@ func (db *db) initialize(ctx context.Context) error {
	db.glock.Lock()
	defer db.glock.Unlock()

	txn, err := db.NewTxn(ctx, false)
question: Does initialize really need a txn? Not against it, but it's run once on startup before anything else. Just a thought.
This was documented in the commit message:
DB init was also covered almost as a side-effect, as the sequence needed protecting, and TBH it is probably a very good thing to protect against mutating the database state in the case of a failed init.
Thanks for pointing that out. Realized things afterwards as well 👍
// The collections (including the schema version ID) will only be updated if any changes have actually
// been made, if the net result of the patch matches the current persisted description then no changes
// will be applied.
func (db *db) PatchSchema(ctx context.Context, patchString string) error {
question: Obviously we have a transaction in place to handle the PatchSchema call. But should there be any additional locks for safety? E.g. an RWLock that is read-locked on basically all ops but write-locked on PatchSchema, acting as a schema lock.
What safety do you think you would gain?
A more explicit semantic that is enforced beyond the transactions. Makes it clear that schema updates are intended to be a protected action that locks the DB until completed.
Since at the moment, other transactions can run alongside the schema update. The only thing the transaction protects us from is shared keysets, which is unlikely.
Most databases don't support "online" or "lock-free" schema updates as they're rather complex to get right. We're not going to lose anything by adding a lock as it's not intended to be a highly concurrent/perf-sensitive action. And it gives us a lot of safety regarding the semantics of other ongoing transactions.
I know we descoped the concurrent tests, but this feels independent of those tests, since it's a global lock. Overall it provides much clearer and stronger semantics and guarantees about the nature of ongoing events.
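A minimal sketch of the suggested schema lock, assuming a db-wide sync.RWMutex. The db type and method bodies below are hypothetical stand-ins, not the real defradb implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Sketch of the reviewer's suggestion: ordinary operations take a read
// lock, while PatchSchema takes the write lock, so a schema update waits
// for in-flight ops and blocks new ones until it finishes.
type db struct {
	schemaMu sync.RWMutex
	schema   string
}

func (d *db) Query(q string) string {
	d.schemaMu.RLock() // many readers may hold this concurrently
	defer d.schemaMu.RUnlock()
	return fmt.Sprintf("ran %q against schema %q", q, d.schema)
}

func (d *db) PatchSchema(newSchema string) {
	d.schemaMu.Lock() // exclusive: no queries run while the schema changes
	defer d.schemaMu.Unlock()
	d.schema = newSchema
}

func main() {
	d := &db{schema: "v1"}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			d.Query("query { Users { Name } }")
		}()
	}
	d.PatchSchema("v2")
	wg.Wait()

	fmt.Println(d.Query("query { Users { Name } }"))
}
```

As discussed below, a lock scoped to just the PatchSchema function body would not by itself prevent stale handles created before the lock was taken.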
Especially in lieu of concurrency tests tbh. This would ensure that there are basically no concurrency tests to worry about.
We can look at online/lock-free schema updates in the future without breaking anything.
This would ensure that there are basically no concurrency tests to worry about.
Adding a global lock won't change the testing required. The tests will not be designed around the current implementation, only the public behaviour/surface-area.
Most databases don't support "online" or "lock-free" schema updates as they're rather complex to get right.
I am very confident that the SQLs that I have worked with do not lock the entire database on the add of a single field. At least some of them lock the full table on create of an index, but I think there are very very few operations that lock the entire database.
I can add a lock here if you really want, but I do see it as an ugly, untested (i.e. dead code) bottleneck that should be removed as soon as we have the bandwidth+tests.
Actually, a lock within this function would offer very little/no protection at all. If stale data is able to make it past the txn, then a lock here will do nothing to help.
I can see how our setup could keep things safe, as we load the collection description from the system store on queries, which would be tracked for conflicts that a schema update would trigger.
I believe we talked about it elsewhere, but we wanted to verify schemaVersionID on transactions (data reads/writes), correct?
but we wanted to verify schemaVersionID on transactions (data reads/writes) correct?
That is a different issue, related to the use of stale client.Collections for doc writes. Not relevant here.
I can't tell what you are asking for in your other comment starting with Yes, my reference to... . A mutex in this location will add no/almost-no protection against concurrent schema update clashes; any mutex-based protection needs to align with the transaction scope. Wrapping this function behind a mutex will do pretty much nothing, as it still leaves massive room for stale data to accumulate if the transaction itself does not catch everything.
The potential flow would be:

// col will reference CollectionDescription with schema v1
col := db.GetCollectionByName("users")

// then another request (another goroutine for example)
// now users description has schema v2
db.UpdateSchema(desc) // users desc

// col variable still holding old description
col.UpdateWithKey(key, update)
So here the update is going to apply things relative to schema v1, even though there is now schema v2.
My suggested lock won't help this situation regardless, which is why I asked about comparing/verifying schemaVersionIDs as previously discussed (can't remember when).
tldr; I'm OK dropping the suggested lock as described in this thread if we are comparing/validating schemaVersionIDs on the transaction. This would A) manually enforce a check, B) cause the schemaVersionID key to be read from the transaction, and ensure we protect against stale (inconsistent) data.
That is a different issue, related to the use of stale client.Collections for doc writes. Not relevant here.
fairly relevant imo. My initial hope for the lock was to prevent the stale accumulation of CollectionDescription, which has the potential to get by the transaction semantics.
I realize now, as noted, that the lock won't actually give us my intended desire, but verifying the schemaVersionID would, and I don't believe it's implemented in this PR, correct?
// The collection (including the schema version ID) will only be updated if any changes have actually
// been made, if the given description matches the current persisted description then no changes will be
// applied.
UpdateCollectionTxn(context.Context, datastore.Txn, CollectionDescription) (Collection, error)
suggestion: UpdateCollection should take a name (string) as a parameter to make it clear which "collection" you're trying to update.
Obviously the description itself has the name too, but it should be more explicit from an API perspective.
I disagree with this quite strongly; it creates additional points of failure and confusion where the name parameter does not pair up with the description. Such a param can also be added later if we feel the need.
Not really convinced by that argument, but looking at the current system, CreateCollection also doesn't take a named value, so it's good for now 👍.
txn.OnSuccess(
	func() {
		p.schemaManager = schemaManager
question: I'm wondering if there's any race condition potential here?
The transaction should protect against any race I think, and I do not wish to go down some kind of static analysis hellhole when the concurrency tests that would test stuff like this were pushed out of scope. If those tests are not worth doing now, then spending significant amounts of time staring at this line for the sake of potential concurrency bugs is also out of scope.
That's fair, we did descope the concurrency tests. It just jumped out at me in case there was an assumption that the txn protects things here, but the hook functions don't necessarily inherit that semantic. Good for now.
cheers :)
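The OnSuccess semantics discussed in this thread can be shown in a self-contained sketch: the new state is only published when the transaction commits, so a discarded transaction leaves the parser untouched. Txn and parser below are simplified stand-ins for the real types:

```go
package main

import "fmt"

// Txn is a minimal stand-in for a datastore transaction that can run
// callbacks on successful commit.
type Txn struct {
	onSuccess []func()
}

func (t *Txn) OnSuccess(f func()) {
	t.onSuccess = append(t.onSuccess, f)
}

func (t *Txn) Commit() {
	for _, f := range t.onSuccess {
		f()
	}
}

// parser is a stand-in for the type holding the schemaManager.
type parser struct {
	schemaManager string
}

func (p *parser) SetSchema(txn *Txn, newManager string) {
	// Build the new state eagerly, but defer publishing it until commit.
	txn.OnSuccess(func() {
		p.schemaManager = newManager
	})
}

func main() {
	p := &parser{schemaManager: "v1"}

	discarded := &Txn{}
	p.SetSchema(discarded, "v2")
	fmt.Println(p.schemaManager) // v1: never committed, so no change

	committed := &Txn{}
	p.SetSchema(committed, "v2")
	committed.Commit()
	fmt.Println(p.schemaManager) // v2
}
```

Note this sketch says nothing about concurrent commits; as agreed in the thread, the hook itself does not inherit the transaction's isolation.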
LGTM - excited for this to land
Spotted and quickly corrected. Has nothing to do with this PR though, but is small enough to include.
It is not supported and does not work. Leaving it here will just confuse users, especially as they start to use the schema update system.
It is not supported and does not work. Leaving it here will just confuse users, especially as they start to use the schema update system.
They are not supported and do not work. Leaving them here will just confuse users, especially as they start to use the schema update system.
I'm not sure any more than this is really needed, especially given that we will largely be hiding this from users shortly after merge.
Previously this would partially succeed - the gql types would be updated, but the saving of the collection descriptions would fail (as there is another uniqueness check there), leaving the database in an invalid state until restart.
Was incorrectly querying the version history, returning a collection per version, instead of just the current version. Went unnoticed as previously each collection could only have a single version.
Collection stuff needs to be protected by transactions. Partial success of either a create or a mutate cannot be permitted and the use of transactions protects against this. The transactions are also needed to protect against the use of stale data, by including the collection in the transaction used for P2P and query/planner stuff we should ensure that stuff is done against a single, complete version of the collection, and that it is not possible for the collection to mutate whilst something is using the collection. At the moment such concurrent use should now result in a transaction conflict error - this is not ideal and the logic here should probably grow to permit the queuing of such clashes (e.g. through use of a mutex) instead of making the users retry until it succedes - such a change will very likely need to be done within the scope declared by the transaction anyway, so I see no wasted code/time by the changes in this commit here - it is a start, and prevents odd/damaging stuff from happening. DB init was also covered almost as a side-affect, as the sequence needed protecting, and TBH it is probably a very good thing to protect against mutating the database state in the case of a failed init.
Renames and changes AddSchema to SetSchema. SetSchema is now transactional: GQL type changes are only 'committed' on transaction commit, whilst SetSchema can be called safely at any point during the lifetime of the transaction. This allows the schema to be validated against GQL constraints before any changes have been persisted elsewhere, and allows those other changes to be executed/validated before any changes have been made to the GQL types.
Please note that the client interfaces will be reworked in the near future so that the transaction related items clutter the primary interface less, and are more consistent. For now I have just followed the existing `Txn` suffix naming.
Thanks for answering the questions, sorry for not being as involved in this review (did follow the other conversations). LGTM overall
* Correct P2P error message: spotted and quickly corrected. Has nothing to do with this PR, but is small enough to include.
* Remove field kind decimal: it is not supported and does not work. Leaving it here will just confuse users, especially as they start to use the schema update system.
* Remove field kind bytes: it is not supported and does not work. Leaving it here will just confuse users, especially as they start to use the schema update system.
* Remove field embedded object kinds: they are not supported and do not work. Leaving them here will just confuse users, especially as they start to use the schema update system.
* Add documentation to FieldDesc.Kind: I'm not sure anything more than this is really needed, especially given that we will largely be hiding this from users shortly after merge.
* Assert for collection name uniqueness within SDL: previously this would partially succeed - the GQL types would be updated, but saving the collection descriptions would fail (as there is another uniqueness check there), leaving the database in an invalid state until restart.
* Correctly query all existing collections: was incorrectly querying the version history, returning a collection per version instead of just the current version. Went unnoticed as previously each collection could only have a single version.
* Make collection persistence transactional: collection operations need to be protected by transactions. Partial success of either a create or a mutate cannot be permitted, and transactions protect against this. They are also needed to protect against the use of stale data: by including the collection in the transaction used for P2P and query/planner work, everything operates against a single, complete version of the collection, and the collection cannot mutate whilst something is using it. At the moment such concurrent use results in a transaction conflict error - this is not ideal, and the logic should probably grow to permit the queuing of such clashes (e.g. through use of a mutex) instead of making users retry until the operation succeeds. Such a change will very likely need to be done within the scope declared by the transaction anyway, so no code/time is wasted by the changes in this commit - it is a start, and prevents odd/damaging behaviour. DB init was also covered almost as a side effect, as the sequence needed protecting, and it is probably a very good thing to protect against mutating database state in the case of a failed init.
* Make SetSchema (GQL) transactional: renames and changes AddSchema to SetSchema. SetSchema is now transactional: GQL type changes are only 'committed' on transaction commit, whilst SetSchema can be called safely at any point during the lifetime of the transaction. This allows the schema to be validated against GQL constraints before any changes have been persisted elsewhere, and allows those other changes to be executed/validated before any changes have been made to the GQL types.
* Add support for adding fields to schema: please note that the client interfaces will be reworked in the near future so that the transaction-related items clutter the primary interface less and are more consistent. For now I have just followed the existing `Txn` suffix naming.
Relevant issue(s)
Resolves #1004
Description
Allows new simple (non-relational) fields to be added locally to schema.
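Conceptually, "adding fields locally" means a schema update is valid only if it leaves every existing field untouched and merely appends new ones. A rough sketch of that rule, with an illustrative stand-in for the real field descriptor (the actual `client` descriptor also carries IDs, kinds, and CRDT metadata):

```go
package main

import "fmt"

// fieldDesc is a simplified field descriptor used only for this sketch.
type fieldDesc struct {
	Name string
	Kind string
}

// validateFieldAddition accepts a proposed field set only if it keeps
// every existing field unchanged, in order, and appends new fields at
// the end. Mutating or dropping an existing field is rejected.
func validateFieldAddition(existing, proposed []fieldDesc) error {
	if len(proposed) < len(existing) {
		return fmt.Errorf("fields cannot be removed")
	}
	for i, f := range existing {
		if proposed[i] != f {
			return fmt.Errorf("existing field cannot be mutated: %s", f.Name)
		}
	}
	return nil
}

func main() {
	old := []fieldDesc{{"name", "String"}}
	added := []fieldDesc{{"name", "String"}, {"age", "Int"}}
	fmt.Println(validateFieldAddition(old, added) == nil) // prints true: pure addition is fine
	mutated := []fieldDesc{{"name", "Int"}}
	fmt.Println(validateFieldAddition(old, mutated) == nil) // prints false: mutation rejected
}
```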
The following items are out of scope of this PR and will be done later:
I very strongly recommend reviewing commit by commit; there are (hopefully) useful explanations of what has been done in each commit body. The feature itself is introduced in the final commit - everything else is preparation for that feature.