Convenience methods for new DynamoDB data types #357

Closed
Sazpaimon opened this issue Oct 8, 2014 · 12 comments
Labels
feature-request A feature should be added or improved. needs-discussion

Comments

@Sazpaimon
Contributor

Version 2.7.0 deprecated the formatValue() and formatAttributes() methods, as well as the ItemIterator, Item, and Attribute classes, because they do not support the new data types. Will these be replaced with something else?

I know this is a very new feature, but I have some projects that could take advantage of the new document model support right away, and I want to know whether the SDK will add better support for these data types or whether the recommended approach from now on is to construct and parse items manually.

@jeremeamia added the discussion and feature-request labels Oct 8, 2014
@jeremeamia
Contributor

TL;DR maybe.

This is a great question, and it's one we've definitely asked ourselves. We have been trying to come up with some ideas, but it turns out to be a little difficult. The current format*() methods (and the underlying Attribute class) only support S, N, SS, and NS, which were the only types that existed when that code was written. Since then, DynamoDB has added B, BS, BOOL, NULL, L, and M, and it could add more in the future. Who knows? Someday there could be different types of numbers, dates, times, etc.

What makes supporting something like format*() difficult now is that there is a lot of ambiguity in how PHP native types map to DynamoDB types. For example, PHP only has an array type, but DynamoDB has lists (L), maps (M), and sets (*S). Also, DynamoDB has string (S) and binary/blob (B) values, while PHP just has strings. Sure, you could introspect values (e.g., does this array have numerical indexes or string keys?) and/or declare a set of conventions (e.g., all stream resources will be assumed to be binary/blob values), but there are still problems with consistency across consecutive round trips of the data. Let me give a step-by-step example regarding the list/map/set types.

  1. Let's say an $item is provided that looks like this: ['foo' => ['bar']] (or ['foo' => [0 => 'bar']])
  2. Well, that would get serialized into {"foo":{"SS":["bar"]}}
  3. Oh wait, with the new list type available now, you probably wanted {"foo":{"L":[{"S":"bar"}]}}
  4. OK, let's go with list. Now let's say we retrieved the item later, and the SDK had a way to convert the item's DynamoDB serialization back to a native PHP array. We'd end up with ['foo' => [0 => 'bar']] again, whether it was a list or a set.
  5. What if we then set $item['foo']['fizz'] = 'buzz'? That would give us an array like ['foo' => [0 => 'bar', 'fizz' => 'buzz']].
  6. This is no longer a list or a set, it's now a map, and would be serialized to {"foo":{"M":{"0":{"S":"bar"},"fizz":{"S":"buzz"}}}}
  7. You know what, I didn't need that data, let's just unset($item['foo'][0], $item['foo']['fizz']). Now I have just ['foo' => []], which could be a list, map, or set. 😦
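To make the steps above concrete, here is a minimal sketch of the kind of introspection-based marshaler being ruled out. It is hypothetical (not SDK code) and written in Python; because a PHP array is an ordered map with int or string keys, PHP arrays are modeled as Python dicts rather than lists. Only strings and arrays are handled, to keep the focus on the L/M/set ambiguity.

```python
# Hypothetical, introspection-based marshaler, used only to illustrate the
# ambiguity; it is not the SDK's code. A PHP array is an ordered map whose keys
# may be ints or strings, so it is modeled here as a Python dict.


def guess_type(php_array: dict) -> str:
    """Guess a DynamoDB type for a PHP-style array by inspecting its keys."""
    if not php_array:
        # No keys to inspect: an empty array could be an L, an M, or a set.
        return "AMBIGUOUS"
    if list(php_array) == list(range(len(php_array))):
        # Keys 0..n-1 look like a list (though the caller may have meant a set).
        return "L"
    # Any other key layout can only be represented as a map.
    return "M"


def marshal_value(value):
    """Convert a string or PHP-style array into DynamoDB's attribute-value format."""
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        kind = guess_type(value)
        if kind == "L":
            return {"L": [marshal_value(v) for v in value.values()]}
        if kind == "M":
            # DynamoDB map keys are strings, so int keys are stringified.
            return {"M": {str(k): marshal_value(v) for k, v in value.items()}}
        raise ValueError("empty array: cannot choose between L, M, and a set")
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    item = {"foo": {0: "bar"}}               # step 1: ['foo' => [0 => 'bar']]
    print(marshal_value(item["foo"]))        # {'L': [{'S': 'bar'}]}
    item["foo"]["fizz"] = "buzz"             # step 5
    print(marshal_value(item["foo"]))        # {'M': {'0': {'S': 'bar'}, 'fizz': {'S': 'buzz'}}}
    del item["foo"][0], item["foo"]["fizz"]  # step 7
    print(guess_type(item["foo"]))           # AMBIGUOUS
```

The same attribute silently changes from L to M after one assignment, and after the unset there is nothing left to inspect, which is exactly the round-trip problem described.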

OK, so doing something along the same lines as the format*() methods might not work, but we've considered some other options as well.

  1. Define and require the use of some objects to "box" values that are ambiguous (e.g., Set, Map, List, Binary). This has usability drawbacks, since you would need to import one or more additional classes. When reading these values out of DynamoDB, you would need to re-box them so they remain consistent if they end up being put back into DynamoDB. The boxed values would probably also be more difficult to work with.
  2. Some kind of item/document builder object/methods. This would be fairly easy to work with and could even provide autocomplete support in IDEs, but would end up being more verbose than the raw API.
  3. Do something like format*(), but support only a subset of types. This could work, but if more types are added in the future, they may or may not fit into this model, and even if they do, it might not be in a backwards-compatible way.
  4. Use a light schema-based approach, where a schema can be defined and an object/method would transform the array into the API structure. This would work, but defining the schema could be tedious, especially if the document is large or complex. Also, heterogeneous lists could not be supported, since defining a schema for them would not make sense.
  5. Accept a literal JSON document and transform it into the API structure. This could work, but it would only support the types that JSON natively supports, not set or binary values.
  6. Full blown ORM. This would be a lot of work and might be overkill for most use cases.
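Option 1 can be sketched as follows, assuming hypothetical box types named after the list above (Set, Map, List, Binary); none of these belong to any SDK, and only string sets are covered. The key point is that unmarshal_value re-boxes what it reads, so a value written back keeps its type, and an empty list or map is no longer ambiguous.

```python
import base64
from dataclasses import dataclass

# Hypothetical box types for option 1; the names follow the option list in this
# thread (Set, Map, List, Binary) but are not part of any SDK.


@dataclass(frozen=True)
class Binary:
    data: bytes


@dataclass(frozen=True)
class Set:
    values: frozenset  # treated as a string set (SS) in this sketch


@dataclass(frozen=True)
class Map:
    entries: dict


@dataclass(frozen=True)
class List:
    items: tuple


def marshal_value(value):
    """Boxed value -> DynamoDB attribute value; only plain strings need no box."""
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, Binary):
        # DynamoDB's JSON wire protocol carries B values as base64 text.
        return {"B": base64.b64encode(value.data).decode("ascii")}
    if isinstance(value, Set):
        if not value.values:
            raise ValueError("DynamoDB does not allow empty sets")
        return {"SS": sorted(value.values)}
    if isinstance(value, List):
        return {"L": [marshal_value(v) for v in value.items]}
    if isinstance(value, Map):
        return {"M": {k: marshal_value(v) for k, v in value.entries.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def unmarshal_value(av: dict):
    """DynamoDB attribute value -> re-boxed value, so a later write keeps its type."""
    [(kind, raw)] = av.items()
    if kind == "S":
        return raw
    if kind == "B":
        return Binary(base64.b64decode(raw))
    if kind == "SS":
        return Set(frozenset(raw))
    if kind == "L":
        return List(tuple(unmarshal_value(v) for v in raw))
    if kind == "M":
        return Map({k: unmarshal_value(v) for k, v in raw.items()})
    raise ValueError(f"type not covered by this sketch: {kind}")
```

For example, `Map({"foo": List(()), "tags": Set(frozenset({"b", "a"}))})` marshals to `{"M": {"foo": {"L": []}, "tags": {"SS": ["a", "b"]}}}` and unmarshals back to an equal boxed value. The trade-off is the one stated in option 1: every ambiguous value has to be wrapped, both on the way in and after reading.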

So, yeah. We'd like to provide something to help DynamoDB users, but we want it to:

  • Be easier to use than the raw API.
  • Support all the types.
  • Support potential future types without breaking changes.
  • Not be significantly more verbose than the raw API.

I'm just not sure we've thought of something yet that provides all of these benefits. If you or anyone else has ideas/suggestions/opinions/etc., we'd be happy to hear them.

@Sazpaimon
Contributor Author

I believe a literal JSON document would be the best option, as I think that's what most people who want to use maps and lists would expect. I don't see the lack of support for sets as particularly problematic, since lists seem to cover all the use cases for sets unless you're looking at the raw API requests and responses. If the lack of binary support is an actual problem, one completely far-out suggestion is that supporting BSON might be worth looking into, though I don't know what kind of can of worms that would open.

@jeremeamia
Contributor

Note: I'm removing one of @Sazpaimon's previous comments from this thread, since it's off topic, and will reply to it where it was cross-posted on #368.

@jeremeamia
Contributor

@Sazpaimon I've been toying with the idea (for both V2 and V3) of a DocumentMarshaler class that would look like this gist. Do you think something like this would be helpful?

@Sazpaimon
Contributor Author

This seems to be a good start as far as supporting raw JSON goes. My hope is that we can get to a point where you can input and output native PHP types and have it just automagically work, but I'm still not sure how we can get there, given everything that's been stated.

@annoyingmouse

Thank you @jeremeamia for inviting my input. I'm afraid I'm not sure what I can offer, though. It looks like the best way to get the JSON into DynamoDB would be to pass a correctly formatted string... I'm not sure how this would mess with the current SDK, though, as I'm only taking baby steps with DynamoDB at the minute. Sorry to be of such little help, but you did ask ;-)

@jeremeamia
Contributor

@Sazpaimon Hey, I updated my gist/idea with some more code and docblocks. Along with the (un)marshalDocument methods, which handle JSON documents, there are also the (un)marshalItem and (un)marshalValue methods. These are basically replacements for the deprecated formatAttributes() and formatValue() methods, respectively. They can work without ambiguity because I am not providing support for set (*S) or binary (B) types. What do you think about this idea? Any suggestions?
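A rough sketch of what those six methods could look like, written in Python with snake_case versions of the method names mentioned above. This is not the code from the gist or the later PR; it assumes the mapping null → NULL, bool → BOOL, number → N, string → S, array → L, object → M. Because JSON itself distinguishes arrays from objects, JSON input avoids the list/map ambiguity, and set and binary types are simply rejected.

```python
import json

# A sketch of (un)marshal_value, (un)marshal_item and (un)marshal_document with
# no set (*S) or binary (B) support; not the code from the gist or PR.


def marshal_value(value):
    """Native value -> DynamoDB attribute value."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [marshal_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": marshal_item(value)}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def marshal_item(data: dict) -> dict:
    """Top-level dict -> item whose attribute values are in DynamoDB format."""
    return {str(k): marshal_value(v) for k, v in data.items()}


def unmarshal_value(av: dict):
    """DynamoDB attribute value -> native value."""
    [(kind, raw)] = av.items()
    if kind == "NULL":
        return None
    if kind in ("S", "BOOL"):
        return raw
    if kind == "N":
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if kind == "L":
        return [unmarshal_value(v) for v in raw]
    if kind == "M":
        return unmarshal_item(raw)
    # SS, NS, BS and B are deliberately left unsupported in this sketch.
    raise ValueError(f"unsupported DynamoDB type: {kind}")


def unmarshal_item(item: dict) -> dict:
    return {k: unmarshal_value(v) for k, v in item.items()}


def marshal_document(json_text: str) -> dict:
    """JSON object text -> DynamoDB item."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("a DynamoDB item must be a JSON object")
    return marshal_item(data)


def unmarshal_document(item: dict) -> str:
    """DynamoDB item -> JSON object text."""
    return json.dumps(unmarshal_item(item))
```

For instance, `marshal_document('{"id": "a1", "tags": ["x"]}')` yields `{"id": {"S": "a1"}, "tags": {"L": [{"S": "x"}]}}`.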

@Sazpaimon
Contributor Author

Looks pretty good. If I had to make any suggestion as a workaround to support sets and binary types, it would be to implement classes that basically give the marshaler type hints for those attributes. I don't know how practical such a thing would be, though.

Now that I think about it, I think I just basically described the old Attribute class.

Also, for unmarshalling binary types, I would unbase64 the data (if it isn't already)

@jeremeamia
Contributor

I don't think I'm going to worry about sets, but I might be able to use Guzzle's stream class to represent binary types. I'm going to play around with that after I write up some unit tests.

"Also, for unmarshalling binary types, I would unbase64 the data (if it isn't already)"

Good idea. I think V3 does it automatically, but I'll have to double check that for V2.
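A minimal sketch of that decoding step, in Python. One caveat: in PHP both the base64 text and the decoded bytes are plain strings, so the "if it isn't already" check cannot be done by type there; this sketch leans on Python's str/bytes distinction, and the function name unmarshal_binary is hypothetical.

```python
import base64


def unmarshal_binary(attribute_value: dict) -> bytes:
    """Return the raw bytes of a DynamoDB B attribute value such as {"B": "..."}."""
    raw = attribute_value["B"]
    if isinstance(raw, (bytes, bytearray)):
        # Already decoded by some lower layer (an assumption of this sketch).
        return bytes(raw)
    # Over DynamoDB's JSON wire protocol, B values arrive as base64 text.
    return base64.b64decode(raw, validate=True)
```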

@Sazpaimon
Contributor Author

Supporting sets would be useful if the user is going to do something like ADD/DELETE later on (AFAIK, the list and map types do not support appending via the UpdateItem method), or if you want to ensure at the data layer that all of a set's elements are the same type and unique. So lists cannot replace sets 100% of the time.
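To illustrate why set semantics matter here, the sketch below pairs an example UpdateItem parameter array that ADDs values to a string set with a local simulation of what ADD and DELETE do to a set. The table name, key, and the apply_set_action helper are hypothetical; the simulation is a model of the semantics, not an SDK call.

```python
# Local model of DynamoDB's ADD and DELETE actions on a set attribute, to show
# why a set (SS) behaves differently from a list. This is a simulation, not an
# SDK call.

# The kind of UpdateItem parameters that would ADD to a string set; the table
# name and key are hypothetical placeholders.
UPDATE_PARAMS = {
    "TableName": "example-table",
    "Key": {"id": {"S": "item-1"}},
    "UpdateExpression": "ADD tags :new_tags",
    "ExpressionAttributeValues": {":new_tags": {"SS": ["php", "sdk"]}},
}


def apply_set_action(current: frozenset, action: str, values: frozenset) -> frozenset:
    """Return the set that results from applying ADD or DELETE to `current`."""
    if action == "ADD":
        # Union: adding a value that is already present changes nothing.
        return current | values
    if action == "DELETE":
        # Difference: values that are not present are ignored in this model.
        return current - values
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    tags = frozenset({"php"})
    tags = apply_set_action(tags, "ADD", frozenset({"php", "sdk"}))
    print(sorted(tags))  # ['php', 'sdk']
    tags = apply_set_action(tags, "DELETE", frozenset({"php"}))
    print(sorted(tags))  # ['sdk']
```

Because the union absorbs the duplicate "php", uniqueness is enforced by the data type itself, which a plain list does not give you.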

@jeremeamia
Contributor

@Sazpaimon and @ando-masaki, I've put together a PR for my marshaler idea: #406.

@jeremeamia
Contributor

Resolved by 0a9c4fd. /cc @Sazpaimon

Note: In Version 2 of the SDK, binary (B) and set (*S) types will not be supported. When this is added to Version 3 (which requires PHP 5.4+), I'll add support, because I will be able to use the JsonSerializable interface to make sure the wrapper objects for binary and set types can be json_encode()'d.
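The JsonSerializable idea can be sketched in Python, which has no such interface; the closest analogue is the `default` hook of `json.dumps`, which is called for objects the encoder cannot handle. The wrapper classes and their `json_serialize` method are hypothetical stand-ins for PHP's `jsonSerialize()`, showing how a document containing binary and set values could still be encoded as JSON.

```python
import base64
import json


class Binary:
    """Hypothetical wrapper for a DynamoDB B value."""

    def __init__(self, data: bytes):
        self.data = data

    def json_serialize(self):
        # Stand-in for PHP's JsonSerializable::jsonSerialize(): base64 text.
        return base64.b64encode(self.data).decode("ascii")


class StringSet:
    """Hypothetical wrapper for a DynamoDB SS value."""

    def __init__(self, values):
        self.values = frozenset(values)

    def json_serialize(self):
        # JSON has no set type, so emit a sorted array.
        return sorted(self.values)


def encode_default(obj):
    """Fallback for json.dumps: delegate to the wrapper's own serializer."""
    if hasattr(obj, "json_serialize"):
        return obj.json_serialize()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


if __name__ == "__main__":
    doc = {"avatar": Binary(b"hello"), "tags": StringSet({"b", "a"})}
    print(json.dumps(doc, default=encode_default))
    # {"avatar": "aGVsbG8=", "tags": ["a", "b"]}
```

Note that the JSON output loses the B/SS distinction (it becomes a string and an array), so reading it back still needs the wrappers or some other type hint.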
