[BEAM-9331] Add better Row builders #10883

Merged
merged 4 commits into from
Mar 27, 2020

reuvenlax
Contributor

This PR adds two builders to the Row object. The first allows building a Row by specifying fields by name:

Row row = Row.withSchema(schema)
.withFieldValue("userId", "user1")
.withFieldValue("location.city", "seattle")
.withFieldValue("location.state", "wa")
.build();

The second allows building a Row based on a previous row by specifying only the fields to change:

Row modifiedRow =
Row.fromRow(row)
.withFieldValue("location.city", "tacoma")
.build();
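
For readers who want to see the mechanics, here is a minimal, self-contained sketch of how dotted field paths such as "location.city" could be resolved, and how a builder could start from a copy of an existing row. It models a row as nested maps. The names NestedFieldSketch and ToyRowBuilder are hypothetical and exist only for this illustration; this is not Beam's Row, Schema, or builder code, and it does no schema validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a row as nested maps. Illustrative only; not Beam's Row class. */
public class NestedFieldSketch {

  /** Hypothetical builder that accepts values keyed by dotted field paths. */
  static final class ToyRowBuilder {
    private final Map<String, Object> values;

    private ToyRowBuilder(Map<String, Object> initial) {
      // Copy so that later changes never touch the row we started from.
      this.values = deepCopy(initial);
    }

    /** Start with no fields set. */
    static ToyRowBuilder empty() {
      return new ToyRowBuilder(new LinkedHashMap<>());
    }

    /** Start from an existing row; only the fields set afterwards will differ. */
    static ToyRowBuilder fromRow(Map<String, Object> source) {
      return new ToyRowBuilder(source);
    }

    /** Set a value; "a.b" descends into nested row "a" and sets its field "b". */
    @SuppressWarnings("unchecked")
    ToyRowBuilder withFieldValue(String path, Object value) {
      String[] parts = path.split("\\.");
      Map<String, Object> current = values;
      for (int i = 0; i < parts.length - 1; i++) {
        // Assumes intermediate path segments are nested rows (maps), created on demand.
        current =
            (Map<String, Object>)
                current.computeIfAbsent(parts[i], k -> new LinkedHashMap<String, Object>());
      }
      current.put(parts[parts.length - 1], value);
      return this;
    }

    Map<String, Object> build() {
      return deepCopy(values);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> deepCopy(Map<String, Object> in) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<String, Object> e : in.entrySet()) {
        Object v = e.getValue();
        out.put(e.getKey(), v instanceof Map ? deepCopy((Map<String, Object>) v) : v);
      }
      return out;
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row =
        ToyRowBuilder.empty()
            .withFieldValue("userId", "user1")
            .withFieldValue("location.city", "seattle")
            .withFieldValue("location.state", "wa")
            .build();
    Map<String, Object> modified =
        ToyRowBuilder.fromRow(row).withFieldValue("location.city", "tacoma").build();
    System.out.println(row);
    System.out.println(modified);
  }
}
```

The deep copy in the constructor is the design choice that keeps the source row unchanged when a modified row is built from it.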

R: @rezarokni

Contributor

@rezarokni rezarokni left a comment

Added some minor comments.

@alexvanboxel alexvanboxel self-requested a review February 18, 2020 13:52
// passed in, it could result in strange errors later in the pipeline. This method is largely
// used internal
// to Beam.
@Internal
public Builder attachValues(List<Object> values) {
Contributor

Maybe this is an opportunity to change the attachValues. No values should be set before or after the attach. I see 2 options to improve this:

  • In attachValues, first check whether values are already set, and have attachValues return the new Row directly. This is perhaps a bit strange, as it departs from the builder pattern.
  • Have four built-in builders. The starting one includes attachValues, add, and withFieldValue, and each of those returns a specific builder: the new ModifyingBuilder, a new AddValuesBuilder that only has the add methods, and an AttachBuilder that only has build. This also eliminates some elaborate if/then/else's in the builder().
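
The two options above can be sketched together as staged builders. This is a rough illustration under stated assumptions, not the code in this PR: a row is modelled as a plain List<Object>, the schema as a list of field names, and StartBuilder, AddValuesBuilder, and ModifyingBuilder here are hypothetical stand-ins. The attach path follows the first option and returns the finished row directly, so no separate AttachBuilder is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of staged builders. All names are illustrative, not Beam's API. */
public class StagedBuilderSketch {

  /** Entry point: each method hands off to a narrower builder, or finishes. */
  static final class StartBuilder {
    private final List<String> fieldNames;

    StartBuilder(List<String> fieldNames) {
      this.fieldNames = fieldNames;
    }

    /** Positional path: afterwards only addValue and build are reachable. */
    AddValuesBuilder addValue(Object value) {
      return new AddValuesBuilder(fieldNames).addValue(value);
    }

    /** By-name path: afterwards only withFieldValue and build are reachable. */
    ModifyingBuilder withFieldValue(String name, Object value) {
      return new ModifyingBuilder(fieldNames).withFieldValue(name, value);
    }

    /** Attach path: returns the finished row, so nothing can be set before or after. */
    List<Object> attachValues(List<Object> values) {
      // Deliberately unvalidated in this sketch; the caller is trusted.
      return Collections.unmodifiableList(values);
    }
  }

  /** Collects values in schema order. */
  static final class AddValuesBuilder {
    private final List<String> fieldNames;
    private final List<Object> values = new ArrayList<>();

    AddValuesBuilder(List<String> fieldNames) {
      this.fieldNames = fieldNames;
    }

    AddValuesBuilder addValue(Object value) {
      values.add(value);
      return this;
    }

    List<Object> build() {
      if (values.size() != fieldNames.size()) {
        throw new IllegalStateException(
            "Expected " + fieldNames.size() + " values, got " + values.size());
      }
      return Collections.unmodifiableList(new ArrayList<>(values));
    }
  }

  /** Sets values by field name; fields never set stay null. */
  static final class ModifyingBuilder {
    private final List<String> fieldNames;
    private final Object[] values;

    ModifyingBuilder(List<String> fieldNames) {
      this.fieldNames = fieldNames;
      this.values = new Object[fieldNames.size()];
    }

    ModifyingBuilder withFieldValue(String name, Object value) {
      int index = fieldNames.indexOf(name);
      if (index < 0) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
      values[index] = value;
      return this;
    }

    List<Object> build() {
      return Collections.unmodifiableList(Arrays.asList(values.clone()));
    }
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("userId", "city");
    System.out.println(new StartBuilder(schema).addValue("user1").addValue("seattle").build());
    System.out.println(new StartBuilder(schema).withFieldValue("city", "seattle").build());
  }
}
```

Because each stage exposes only its own methods, mixing add with withFieldValue becomes a compile error rather than a runtime check.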

Contributor Author

I like making attachValues return a Row. I think the same applies to withFieldValueBuilders. This simplifies the builder code.

}

/** Builder for {@link Row} that bases a row on another row. */
public static class ModifyingBuilder {
Contributor

Can't this be tweaked a bit so that it is the builder specifically for use with withFieldValue? That is, when it doesn't have a source row, it just assumes null values for the fields not set. See the remark on withFieldValue on the initial builder as well.

Contributor Author

Yes, good suggestion. This should simplify the code
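
A minimal sketch of the behaviour suggested in this thread, assuming a row is a map from field name to value: one builder serves both cases, falling back to the source row when there is one and to null when there is not. FieldValueBuilder and OptionalSourceBuilderSketch are hypothetical names for illustration, not classes from this PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of one by-name builder with an optional source row. Illustrative only. */
public class OptionalSourceBuilderSketch {

  static final class FieldValueBuilder {
    private final List<String> fieldNames;
    private final Map<String, Object> source; // null when there is no source row
    private final Map<String, Object> overrides = new HashMap<>();

    FieldValueBuilder(List<String> fieldNames, Map<String, Object> source) {
      this.fieldNames = fieldNames;
      this.source = source;
    }

    FieldValueBuilder withFieldValue(String name, Object value) {
      if (!fieldNames.contains(name)) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
      overrides.put(name, value);
      return this;
    }

    Map<String, Object> build() {
      Map<String, Object> row = new LinkedHashMap<>();
      for (String name : fieldNames) {
        if (overrides.containsKey(name)) {
          row.put(name, overrides.get(name));
        } else {
          // No override: take the source value, or null if there is no source row.
          row.put(name, source == null ? null : source.get(name));
        }
      }
      return row;
    }
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("userId", "city");
    Map<String, Object> fresh =
        new FieldValueBuilder(fields, null).withFieldValue("city", "seattle").build();
    Map<String, Object> changed =
        new FieldValueBuilder(fields, fresh).withFieldValue("userId", "user1").build();
    System.out.println(fresh);
    System.out.println(changed);
  }
}
```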

* Set a field value using the field name. Nested values can be set using the field selection
* syntax.
*/
public Builder withFieldValue(String fieldName, Object value) {
Contributor

Maybe this one can return the ModifyingBuilder so that no other methods can be used (no attachValues, no add's).

Contributor Author

done

return sourceRow.getSchema();
}

public ModifyingBuilder withFieldValue(String fieldName, Object value) {
Contributor

Wouldn't it be useful to have a withFieldValue with an index?

Contributor Author

added
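
As a rough illustration of what an index-based overload could look like, here is a small self-contained sketch in which the by-name variant resolves the name to a position and delegates to the by-index variant. IndexedBuilder and IndexedFieldSketch are hypothetical names, and the row is modelled as a List<Object>; the signature actually added in this PR may differ.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of withFieldValue overloads by name and by index. Illustrative only. */
public class IndexedFieldSketch {

  static final class IndexedBuilder {
    private final List<String> fieldNames;
    private final Object[] values;

    IndexedBuilder(List<String> fieldNames) {
      this.fieldNames = fieldNames;
      this.values = new Object[fieldNames.size()];
    }

    /** Set a field by its position in the schema. */
    IndexedBuilder withFieldValue(int index, Object value) {
      if (index < 0 || index >= values.length) {
        throw new IndexOutOfBoundsException("No field at index " + index);
      }
      values[index] = value;
      return this;
    }

    /** Set a field by name; resolves the name to an index and delegates. */
    IndexedBuilder withFieldValue(String name, Object value) {
      int index = fieldNames.indexOf(name);
      if (index < 0) {
        throw new IllegalArgumentException("Unknown field: " + name);
      }
      return withFieldValue(index, value);
    }

    List<Object> build() {
      return Arrays.asList(values.clone());
    }
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("userId", "city");
    System.out.println(new IndexedBuilder(schema).withFieldValue(1, "seattle").build());
    System.out.println(new IndexedBuilder(schema).withFieldValue("city", "seattle").build());
  }
}
```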

Contributor

@rezarokni rezarokni left a comment

LGTM

@alexvanboxel
Contributor

Maybe rebase, then we could get this in. This is a useful addition.

@reuvenlax
Contributor Author

@alexvanboxel there are still some bugs in this PR related to logical types, which is why I haven't pushed it in yet.

@reuvenlax
Contributor Author

Run Java PreCommit

@reuvenlax reuvenlax force-pushed the better_row_builder branch from 778bf70 to 52695aa Compare March 26, 2020 22:38
@reuvenlax
Contributor Author

@alexvanboxel rebased and fixed bugs. Previously I was blocked on getting logical types to work, but now that we natively store logical types in Row, it's become much easier.

@reuvenlax
Contributor Author

run sql postcommit

Contributor

@alexvanboxel alexvanboxel left a comment

Really like it. Would you consider making RowUtils public? It looks like a nice set of utilities I could use in my personal pipelines...

@reuvenlax
Contributor Author

@alexvanboxel I would rather not in this PR, because the RowUtils API wasn't designed for public usage. If we were to make it public, I would prefer to spend a lot more time on the API design, and I would also want to understand the use cases a bit better.

@alexvanboxel
Contributor

> @alexvanboxel I would rather not in this PR, because the RowUtils API wasn't designed for public usage. If we were to make it public, I would prefer to spend a lot more time on the API design, and I would also want to understand the use cases a bit better.

Understood. LGTM

@reuvenlax
Contributor Author

run sql postcommit

@reuvenlax reuvenlax merged commit 267f76f into apache:master Mar 27, 2020