AWS DynamoDB Membership Provider #2008

Merged
merged 3 commits into from
Aug 15, 2016

galvesribeiro
Member

@galvesribeiro galvesribeiro commented Aug 2, 2016

Implementation of #2006 which is part of #2005

/// <summary>
/// AWS DynamoDB basic Storage provider
/// </summary>
public class DynamoDBStorage : IStorageProvider
Contributor

Why is the storage provider part of the membership table pull request? I thought you had a separate PR for it.

@gabikliot
Contributor

@galvesribeiro, my recommendation (based on #2005 (comment)) is to reopen this PR and remove all the code unrelated to the MBR provider (i.e. remove the storage provider); then I will review it thoroughly.

@galvesribeiro galvesribeiro reopened this Aug 9, 2016
@galvesribeiro galvesribeiro changed the title from "[WIP] AWS DynamoDB Membership Provider" to "AWS DynamoDB Membership Provider" Aug 9, 2016
@galvesribeiro
Member Author

galvesribeiro commented Aug 9, 2016

OK, this PR has the final membership and statistics code. All of its changes are in OrleansAWSUtils\Membership and OrleansAWSUtils\Statistics.

This PR still needs to be rebased on master once #2007 is merged, since it depends on a common class (Storage\DynamoDBStorage.cs). All the tests for both the MembershipTable and Liveness are included and passing.

@galvesribeiro galvesribeiro changed the title from "AWS DynamoDB Membership Provider" to "AWS DynamoDB Membership Provider and Statistics publisher" Aug 9, 2016
@galvesribeiro
Member Author

galvesribeiro commented Aug 14, 2016

@gabikliot, I rebased on the latest master, which already has the Storage Provider merged. Can you review it, please?

Btw, kudos to @shayhatsor for the tips on the membership protocol :)

@gabikliot
Contributor

Please move the statistics/metrics out of this PR into a separate PR.

@galvesribeiro
Member Author

What is the problem with that? The PR was intended to include it. It is 3 no-brainer classes...

toDelete.Add(record.GetKeys());
}

if (records.Count <= 25)
Contributor

I don't think you need a special case here; the general loop below is good. Also, 25? At least let's use a constant.

Member Author

DynamoDB has a MAX_BATCH_SIZE of 25. OK, created a constant and removed the IF.
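
To illustrate the point made in this exchange, here is a minimal sketch of a general batching loop that needs no special case for small inputs. `BatchHelper`, `MaxBatchSize` and `InBatches` are hypothetical names chosen for this example and are not taken from the PR; the only assumption about DynamoDB is the limit of 25 requests per batch write mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class BatchHelper
{
    // Assumed limit: a DynamoDB batch write accepts at most 25 requests per call.
    public const int MaxBatchSize = 25;

    // Lazily splits items into consecutive batches of at most batchSize elements.
    // An empty input yields no batches, so callers need no "<= 25" special case.
    public static IEnumerable<List<T>> InBatches<T>(IReadOnlyList<T> items, int batchSize = MaxBatchSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < items.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - start);
            var batch = new List<T>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(items[start + i]);
            }
            yield return batch;
        }
    }
}
```

With 60 keys to delete, this loop produces batches of 25, 25 and 10, each of which could then be sent as one batch write request.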

@gabikliot
Contributor

It is a no-brainer for you. For me, as a reviewer who takes responsibility for what he reviews, it makes MY life much harder. Specifically, I want to review the MBR now with 100% attention. The extra unrelated code confuses me.
Plus, I may not even review the metrics and stats providers, so I want it to be very clear what I did and did not review.
Plus, it is just good practice to keep unrelated features in unrelated PRs. The providers are definitely unrelated to each other.

@gabikliot
Contributor

In your AWS Storage Put method:
https://github.com/dotnet/orleans/blob/master/src/OrleansAWSUtils/Storage/DynamoDBStorage.cs#L208

What happens if you call this method without conditionExpression and without conditionValues? That is: you THINK there is no previous version of the row, but when the call arrives at AWS the row is there. Would the call overwrite the previous data, or would it fail?
If it does not fail, it means you have a blind write method, which is ALWAYS WRONG.

@gabikliot
Contributor

I think you have to go back and add a detailed explanation to the AWS storage class about the concurrency semantics of each operation.
https://github.com/dotnet/orleans/blob/master/src/OrleansAWSUtils/Storage/DynamoDBStorage.cs#L201 "Create or Replace an entry in a DynamoDB Table" but the conditional is optional? I don't think it should be optional.
If you want a blind write method, add another method and name it accordingly, but in 99% of cases we will be using CreateIfNotExist with a valid condition and Update with a valid condition.

}

if (result == false)
logger.Warn(ErrorCode.MembershipBase,
Contributor

Could this logging be moved inside the preceding catch clause?

Member Author

Done

try
{
var conditionalValues = new Dictionary<string, AttributeValue> { { CURRENT_ETAG_ALIAS, new AttributeValue { N = etag } } };
var expression = $"{SiloInstanceRecord.ETAG_PROPERTY_NAME} = {CURRENT_ETAG_ALIAS}";
Contributor

This generic expression could have a more descriptive name.

Member Author

Done

@galvesribeiro
Member Author

Ok... All the comments addressed.

@veikkoeeva thanks for getting back to me on the statistics, but I'm removing that from this PR as @gabikliot requested and will open another one for it. I'd appreciate it if you could find some time to review that one.

I noticed there is probably a lack of knowledge of how DynamoDB works, which is OK; I should have detailed it in the PR before submitting it, so let me explain a bit here...

DynamoDB has only 3 Write operations:

  1. PUT == Create/Insert or Replace
  2. UPDATE == Update or Create/Insert
  3. DELETE

None of them deals with ETags natively (unlike Azure Table Storage, or auto-increment fields in SQL, which have native ways to deal with it), and no conditional check is implied by default. So we have to handle it with Conditional Expressions, which you can see in the strings around the code.

Basically, if you want to PUT something and make sure there is no other row with the same ID (as we do when creating a new record), we need to use an expression like this:

var expression = $"attribute_not_exists({SiloInstanceRecord.DEPLOYMENT_ID_PROPERTY_NAME}) AND attribute_not_exists({SiloInstanceRecord.SILO_IDENTITY_PROPERTY_NAME})";
await storage.PutEntryAsync(TABLE_NAME_DEFAULT_VALUE, tableEntry.GetFields(true), expression);

So if there is a row with the same keys as the ones in the expression, it will throw ConditionalCheckFailedException.

Otherwise, if you want to replace it (blind write), just call Put without any conditional expression.
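
The difference between the two kinds of PUT described here can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `InMemoryTable`, `PutIfNotExists` and `ConditionFailedException` are hypothetical names invented for the example (the last one stands in for the SDK's ConditionalCheckFailedException), and no AWS SDK call is involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's ConditionalCheckFailedException.
public class ConditionFailedException : Exception
{
    public ConditionFailedException(string message) : base(message) { }
}

// Hypothetical single-threaded model of a table keyed by (deploymentId, siloIdentity).
public class InMemoryTable
{
    private readonly Dictionary<(string, string), string> rows =
        new Dictionary<(string, string), string>();

    // Blind write: creates the row or silently replaces an existing one.
    public void Put(string deploymentId, string siloIdentity, string value)
    {
        rows[(deploymentId, siloIdentity)] = value;
    }

    // Conditional write: models a PUT guarded by attribute_not_exists on both key attributes.
    public void PutIfNotExists(string deploymentId, string siloIdentity, string value)
    {
        var key = (deploymentId, siloIdentity);
        if (rows.ContainsKey(key))
        {
            throw new ConditionFailedException("A row with the same keys already exists.");
        }
        rows[key] = value;
    }

    public string Get(string deploymentId, string siloIdentity)
    {
        return rows[(deploymentId, siloIdentity)];
    }
}
```

In this model, a second `PutIfNotExists` with the same keys throws and leaves the stored value untouched, while `Put` overwrites it without any check.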

The same applies when we need to check the ETag:

var etagConditionalExpression = $"{SiloInstanceRecord.ETAG_PROPERTY_NAME} = {CURRENT_ETAG_ALIAS}";
await storage.UpsertEntryAsync(TABLE_NAME_DEFAULT_VALUE, siloEntry.GetKeys(),
                        siloEntry.GetFields(), etagConditionalExpression, conditionalValues);

Expressions are optional, and that is the way AWS works. I understand your concern about the danger of blind writes. But think of it this way: "just because the USA allows people to carry guns, it doesn't mean they should shoot each other". The same applies here. All the classes are internal, and the provider is only accessible by the Orleans Runtime, never directly by users. DynamoDBStorage makes no assumptions about whether the caller wants an ETag or not, nor about whether it should create or update. It is just a thin, easy wrapper on top of the AWS DynamoDB SDK, reused by any provider that wants it. If the consumer wants to enforce an ETag condition, it can do so by passing the parameters as expected; the expression will be validated by the DynamoDB runtime, which will throw the appropriate exception if the condition is not met.

IMHO a blind write is not ALWAYS wrong. It is a matter of decision. Do you need a condition to be met? Just express it. Remember that SQL can delete all the records of a table if you don't specify a WHERE condition. That doesn't mean nobody ever wants to delete a whole table without a condition. It's a matter of judging, for your case, what the conditions are (if any) and expressing them. In our case, all the conditions are expressed using the APIs provided by AWS. I came up with this code after consulting them and @shayhatsor, who saw basically all the steps while I was building it.

Please let me know if you need anything else to change or more clarification on how DynamoDB works.

@galvesribeiro galvesribeiro changed the title from "AWS DynamoDB Membership Provider and Statistics publisher" to "AWS DynamoDB Membership Provider" Aug 14, 2016
@veikkoeeva
Contributor

@galvesribeiro Sure, no problem with the other PR. :) About my comments: it would help to have more comments for people who'd like to see how things are implemented, variables named according to their specific purpose, and an explanation of why some default numbers were chosen, maybe just via an appropriately named constant. However, this again is only how I feel.

@galvesribeiro
Member Author

@veikkoeeva sure! I addressed all the comments you made. :)

try
{
var conditionalValues = new Dictionary<string, AttributeValue> { { CURRENT_ETAG_ALIAS, new AttributeValue { N = etag } } };
var etagConditionalExpression = $"{SiloInstanceRecord.ETAG_PROPERTY_NAME} = {CURRENT_ETAG_ALIAS}";
Contributor

Shouldn't the conditional expression here also include, in addition to the ETag, $"attribute_exists({SiloInstanceRecord.DEPLOYMENT_ID_PROPERTY_NAME}) AND attribute_exists({SiloInstanceRecord.SILO_IDENTITY_PROPERTY_NAME})"?

Member Author

I'll change that to be more explicit, but it isn't required. A row is never saved with a null ETag (the initial value is 0), so this basically removes the need for the double check on the key. But I agree with you that it would make the intent of the expression clearer to the reader, even if DynamoDB does not require it in this case.

Member Author

On second thought, no... I think we shouldn't change the filter; let's leave only the ETag. Since the ETag is never null (the initial value is 0), and a row will never be inserted without DEPLOYMENT_ID and SILO_IDENTITY since they are the keys, a row can never have an ETag but no keys :)

Contributor

Unless there is a bug?

@gabikliot
Contributor

@galvesribeiro, thanks for the explanation! It makes much more sense now.
It would help a lot if you heavily documented DynamoDBStorage manager, and included all the explanation you gave here.
Agree that "blind write is not ALWAYS wrong", but usually it is. I just think that concurrency control is hard in general and distributed concurrency control even more so, so being explicit in the method signatures forces people to think harder about what they are doing. It puts the concurrency control choices in their face: what should I use? TryInsert and TryUpdate are much more explicit than Write("bunch of concurrency options"). But I agree it's a matter of personal taste, so I won't insist on changing that. You can keep it as it is now.

Overall, looks good now, except for a couple of comments I made.
I think you were too eager in the tests and excluded too much code when !extendedProtocol. The vast majority of the checks should stay, even if the version is not persisted.
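
The explicit method signatures discussed in this comment could look roughly like the following sketch. It is a single-threaded, in-memory illustration only: `ExplicitStore` and `VersionedRow` are hypothetical names, and the numeric ETag starting at 0 is assumed from the discussion above rather than copied from the PR's code.

```csharp
using System.Collections.Generic;

// Hypothetical row: a payload plus a numeric ETag that starts at 0.
public class VersionedRow
{
    public string Value;
    public int ETag;
}

// Hypothetical store whose method names state the concurrency choice up front.
public class ExplicitStore
{
    private readonly Dictionary<string, VersionedRow> rows =
        new Dictionary<string, VersionedRow>();

    // Succeeds only if no row with this key exists yet.
    public bool TryInsert(string key, string value)
    {
        if (rows.ContainsKey(key)) return false;
        rows[key] = new VersionedRow { Value = value, ETag = 0 };
        return true;
    }

    // Succeeds only if the row exists and its ETag matches the caller's expected ETag.
    public bool TryUpdate(string key, string value, int expectedETag)
    {
        if (!rows.TryGetValue(key, out var row) || row.ETag != expectedETag)
        {
            return false;
        }
        row.Value = value;
        row.ETag = expectedETag + 1;
        return true;
    }
}
```

A caller holding a stale ETag gets `false` from `TryUpdate` and has to re-read the row before retrying, which is the optimistic concurrency behaviour the ETag condition expression is meant to provide.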

{
var membershipEntry = CreateMembershipEntryForTest();

var data = await membershipTable.ReadAll();
Assert.NotNull(data);
Assert.Equal(0, data.Members.Count);

bool ok = await membershipTable.InsertRow(membershipEntry, data.Version.Next());
TableVersion nextTableVersion = data.Version.Next();
Member

I think this line was added by mistake. Please run the tests on Azure to make sure you haven't mistakenly broken anything related to the extended protocol.

@shayhatsor
Member

@galvesribeiro, as most of this work was done while consulting me - it looks good to me 😄
But the changes to the tests make me a bit 😟. As I mentioned earlier, please make another pass and make sure you move as few checks as possible into the extended block. Then run all tests with Azure.

@galvesribeiro
Member Author

OK guys, thanks for the feedback @shayhatsor and @gabikliot. I made small changes to the tests so that we get more Asserts, as you pointed out, and also ran the Azure tests to make sure nothing was affected. As expected (the default for extendedProtocol is true), everything is passing just fine.

@galvesribeiro
Member Author

It would help a lot if you heavily documented DynamoDBStorage manager, and included all the explanation you gave here.

Just for the record, I'll be adding that explanation in a final PR across the whole AWS lib, which will make some cosmetic changes such as the error enum, messages, and comments. Rest assured.

@shayhatsor shayhatsor merged commit e3a6918 into dotnet:master Aug 15, 2016
@shayhatsor
Member

@galvesribeiro, thanks for the hard work and dedication !

@galvesribeiro
Member Author

Thanks to everyone for the feedback! 😄

@galvesribeiro galvesribeiro deleted the aws-membership branch August 15, 2016 20:00
@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2023
5 participants