-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Loader implement java.util.stream.Collector
#14
Make Loader implement java.util.stream.Collector
#14
Conversation
|
||
### Release date: | ||
|
||
2021/11/14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously wishful thinking
pom.xml
Outdated
<dependency> | ||
<groupId>com.google.guava</groupId> | ||
<artifactId>guava</artifactId> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying this duplicate dependency currently causes Maven warnings which should be resolved with an OSS parent update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up until now we explicitly avoided using Guava, is this a conscious decision to drop that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guava was already used in test classes and is specified in the OSS parent so I wasn't aware of this intention.
The current Guava usage is not exposed in the public API, though.
What concerns do/did you have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the Guava dependency in non-test code.
for (var cell : other.entityAndIdToObject.cellSet()) { | ||
add(cell.getRowKey(), cell.getColumnKey(), cell.getValue()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entityAndIdToObject.putAll(other.entityAndIdToObject)
would override existing cells but we want to give precedence to existing ones in case of conflicts.
&& Objects.equals(leftSetter, relation.leftSetter) | ||
&& Objects.equals(rightSetter, relation.rightSetter) | ||
&& Objects.equals(relationLoader, relation.relationLoader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function equivalence is generally undecidable so I'm not sure if these setters should be included in this equals
check. WDYT?
(object, successors) -> { | ||
validate( | ||
successors.size() <= 1, | ||
"N-to-1 relation between %s (%s) and %s (%s) contains conflicting tuples", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is now slightly less helpful as it's missing examples.
* @see <a href="https://github.com/jOOQ/jOOQ/issues/8509">jOOQ bug report</a> | ||
*/ | ||
@Test | ||
public void testEqualsOnFields() { | ||
Field<Long> withSchema = DSL.field(DSL.name("PUBLIC", "FOO", "ID"), Long.class); | ||
Field<Long> noSchema = DSL.field(DSL.name("FOO", "ID"), Long.class); | ||
assertTrue(FOO.ID.equals(withSchema)); | ||
assertFalse(withSchema.equals(FOO.ID)); | ||
assertFalse(FOO.ID.equals(noSchema)); | ||
assertFalse(noSchema.equals(FOO.ID)); | ||
|
||
// Also documenting this oddity: the qualified name of FOO.ID is not "PUBLIC.FOO.ID", even | ||
// though it is only considered equal to a field that has the "PUBLIC" qualifier. | ||
assertEquals(FOO.ID.toString(), "\"PUBLIC\".\"FOO\".\"ID\""); | ||
assertEquals(FOO.ID.getQualifiedName(), DSL.name("FOO", "ID")); | ||
assertEquals(withSchema.getQualifiedName(), DSL.name("PUBLIC", "FOO", "ID")); | ||
assertEquals(noSchema.getQualifiedName(), DSL.name("FOO", "ID")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been fixed upstream. The following tests would now pass.
I have changed the implementation to now use Field#equals()
directly.
* @see <a href="https://github.com/jOOQ/jOOQ/issues/8509">jOOQ bug report</a> | |
*/ | |
@Test | |
public void testEqualsOnFields() { | |
Field<Long> withSchema = DSL.field(DSL.name("PUBLIC", "FOO", "ID"), Long.class); | |
Field<Long> noSchema = DSL.field(DSL.name("FOO", "ID"), Long.class); | |
assertTrue(FOO.ID.equals(withSchema)); | |
assertFalse(withSchema.equals(FOO.ID)); | |
assertFalse(FOO.ID.equals(noSchema)); | |
assertFalse(noSchema.equals(FOO.ID)); | |
// Also documenting this oddity: the qualified name of FOO.ID is not "PUBLIC.FOO.ID", even | |
// though it is only considered equal to a field that has the "PUBLIC" qualifier. | |
assertEquals(FOO.ID.toString(), "\"PUBLIC\".\"FOO\".\"ID\""); | |
assertEquals(FOO.ID.getQualifiedName(), DSL.name("FOO", "ID")); | |
assertEquals(withSchema.getQualifiedName(), DSL.name("PUBLIC", "FOO", "ID")); | |
assertEquals(noSchema.getQualifiedName(), DSL.name("FOO", "ID")); | |
} | |
* @see <a href="https://github.com/jOOQ/jOOQ/issues/8509">jOOQ bug report</a> | |
*/ | |
@Test | |
public void testEqualsOnFields() { | |
Field<Long> withSchema = DSL.field(DSL.name("PUBLIC", "FOO", "ID"), Long.class); | |
Field<Long> noSchema = DSL.field(DSL.name("FOO", "ID"), Long.class); | |
assertFalse(FOO.ID.equals(withSchema)); | |
assertFalse(withSchema.equals(FOO.ID)); | |
assertTrue(FOO.ID.equals(noSchema)); | |
assertTrue(noSchema.equals(FOO.ID)); | |
// Also documenting this oddity: the qualified name of FOO.ID is not "PUBLIC.FOO.ID", even | |
// though it is only considered equal to a field that has the "PUBLIC" qualifier. | |
assertEquals(FOO.ID.toString(), "\"FOO\".\"ID\""); | |
assertEquals(FOO.ID.getQualifiedName(), DSL.name("FOO", "ID")); | |
assertEquals(withSchema.getQualifiedName(), DSL.name("PUBLIC", "FOO", "ID")); | |
assertEquals(noSchema.getQualifiedName(), DSL.name("FOO", "ID")); | |
} |
cc @scranen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the semantics changed but is still a bit strange. assertFalse(FOO.ID.equals(withSchema))
seems unexpected; can this cause any issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because the default schema assumed by Jooq's code generator is empty, hence FOO.ID.toString()
(Field#equals
is based on the String representation) returns "FOO"."ID"
and the check fails.
If I declare the default schema used by jooq-code-gen
to be "public"
<property>
<key>unqualifiedSchema</key>
<value>public</value>
</property>
then the following test passes:
@Test
public void testEqualsOnFields() {
Field<Long> withSchema = DSL.field(DSL.name("PUBLIC", "FOO", "ID"), Long.class);
Field<Long> noSchema = DSL.field(DSL.name("FOO", "ID"), Long.class);
assertTrue(FOO.ID.equals(withSchema));
assertTrue(withSchema.equals(FOO.ID));
assertFalse(FOO.ID.equals(noSchema));
assertFalse(noSchema.equals(FOO.ID));
assertEquals(FOO.ID.toString(), "\"PUBLIC\".\"FOO\".\"ID\"");
// this is now unexpected?
assertEquals(FOO.ID.getQualifiedName(), DSL.name("FOO", "ID"));
assertEquals(withSchema.getQualifiedName(), DSL.name("PUBLIC", "FOO", "ID"));
assertEquals(noSchema.getQualifiedName(), DSL.name("FOO", "ID"));
}
I think the H2 in-memory DB does not create a default public
schema like Postgres does, hence the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude, fields from tables with the same name but from different schemas are considered different.
Given that schemas name-space tables, that makes sense to me.
2e3868d
to
dd12846
Compare
@@ -61,6 +60,31 @@ private Entity(Table<R> table, Class<T> type, Field<Long> primaryKey, Field<?>[] | |||
this.fields = fields; | |||
} | |||
|
|||
@Override | |||
public boolean equals(@Nullable Object o) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this? Two distinct Entity objects should never be considered equal IMO. You can create two different Entity objects for the same table, and use both to load the result of a single query (e.g. a self join).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two entity objects e1 and e2 for the same table and with the same result fields will Entity#load
the same objects for the same record as calling e.g. e1.load(record)
twice, right?
Can you provide an example that is not possible with this equals
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, sorry. You'd of course use 2 different aliases for the tables. Then it makes sense. Still not sure why we're adding a definition for equals though, just because we can?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It felt natural to override equals
such that it is in sync with hashCode
but it is indeed not necessary.
API-wise what is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would not override equals
/hashCode
, simply because I cannot think of a natural use case for wanting to have 2 instances of this class that should be considered equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But this is a highly subjective of course, so also happy to be convinced otherwise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to guard against accidental misuse since 2 entities that refer to the same table and same fields must effectively be equivalent, otherwise the corresponding SQL query would be invalid.
In that case the loader would automatically squash these entities into one.
It's indeed an unlikely scenario, though (guessing here since I haven't seen Jolo examples outside of Picnic). Adding this API characteristic would also constrain further API evolution and possibly prevent some unforeseen use cases.
Another option would be to emit a warning log if aforementioned misuse is detected during loader-building.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, there's already a check in the builder that states that "Distinct entities cannot refer to the same primary key"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave it out then. The default semantics meet the requirements for Jolo, and the most obvious form of abuse is guarded by a fail fast mechanism. If use cases do pop up, then we'll have some real requirements to work with and we can open an new PR based on those.
pom.xml
Outdated
<dependency> | ||
<groupId>com.google.guava</groupId> | ||
<artifactId>guava</artifactId> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up until now we explicitly avoided using Guava, is this a conscious decision to drop that?
Custom relation loader: the In general, we should probably check if there's any performance difference with the new code if applied to a large data set. |
Just for clarification: The
There's actually a test covering this functionality, see the
I can add a |
Thanks for pointing out the unit test; I forgot about the array use case. I did try to minimise the overhead when writing the first version of Jolo, but I do not know whether I dodged any bullets with that effort; I just chose the implementation with the least overhead if there was no compelling reason to not do so. No performance tests were written explicitly I'm afraid, just kept an eye on WMS performance when we started using it there. I guess a good performance test could be to just feed it a set of records that represents a complete carthesian product of a few largish tables. |
25bb5e9
to
570d02d
Compare
@scranen Besides the open questions I'm considering removing the factory pattern and just keeping a |
I made the separation to create a more foolproof API. The idea wasn't completely finished because after calling BTW a performance reduction of almost 100% still seems on the high side to me. Did you see if the |
Another update on the performance difference: The benchmark output still reports a ~100% slowdown for 100 records (higher number of records are affected as well). The JFR results indicate the slowdown occurs in I would appreciate if someone could run the benchmarks locally as well and have a critical look at the benchmark itself. |
I found the issue and fixed it by lazily loading records into objects only when the collector hasn't come across the record's ID before. This optimisation was already in place but not applied to the collector implementation. JMH results without collector implementation: JMH results with collector implementation: Btw, nice improvement unlocked by JDK 11 and/or jOOQ v3.15 (compare it to my earlier comment). |
@@ -8,8 +8,7 @@ | |||
[![SonarCloud Maintainability][sonarcloud-badge-maintainability]][sonarcloud-measure-maintainability] | |||
[![BCH compliance][bettercodehub-badge]][bettercodehub-results] | |||
|
|||
Short for _jOOQ Loader_. A utility library to add basic object-relation mapping | |||
to your [jOOQ][jooq] code. | |||
Short for _jOOQ Loader_. A utility library to add basic object-relation mapping to your [jOOQ][jooq] code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we are switching to unwrapped text? MarkDown is a lot easier to read in plaintext viewers if it's wrapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text is still wrapped but IntelliJ defaults to a 120 character limit instead of 80.
I don't know of anyone who doesn't have screen estate for at least 120 characters these days so that default seems sensible to me.
Let me check how other OSS projects handle that.
@Nullable private Field<?>[] resultFields; | ||
|
||
/** | ||
* Creates a mapping from a jOOQ table to the given class. | ||
* Creates a mapping from a {@link Table table} to the given {@link Class class}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you want to make a code-base wide style change in the scope of a functional change. Personally I prefer the original. Note that the exact types are already clear from the signature, and that most developers will not feel like a link to the documentation of Class
is very helpful. The context that it's the jOOQ table class is now missing so you'll have to do something extra (hover over the text?) to recover that information.
*/ | ||
public void load(Record record) { | ||
private void check(Record record) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method no longer belongs here. It's keeping state about the data set being loaded inside the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we even need resultFields
? Shouldn't it always be equivalent to fields
?
A mismatch IMO indicates that the loader definition doesn't match what the query returns and thus means some columns are SELECT
ed but unused. Following best practices that is discouraged.
public Function<ObjectGraph, List<T>> finisher() { | ||
return objectGraph -> { | ||
for (Relation relation : relations) { | ||
relation.link(objectGraph.getObjectMapping(relation)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make objectGraph
responsible for doing the linking, that could simplify the code a bit. You would not have to instantiate this object mapping, and instead of using extra containers to store the intermediate results you could probably just have a relation.link(left, right)
method that just accepts the left and right side of the relation. I think it would probably also shave a little bit of the performance overhead of this new implementation again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make objectGraph responsible for doing the linking, that could simplify the code a bit.
My intention was to cleanly separate state from (linking) logic, hence why the objectGraph
doesn't apply any linking which IMO is associated with the relation definition.
If I understand you correctly merging these concerns would not simplify the overall design. Did I understand you correctly?
An option to avoid instantiating an ObjectMapping
while keeping that separation would be to add a signature similar to ObjectGraph#apply(Entity left, Entity right, BiConsumer<T, U> setter
.
That looks unwieldy and some validation logic would still need to be moved inside ObjectGraph
.
Closing in favour of https://github.com/ferdinand-swoboda/folo. Thanks for all the feedback and comments. The reason for going with a new project is that I want to see this code being available to the world and I felt this was required to unblock myself and potential users. |
jOOQ
v3.15
deprecated theorg.jooq.RecordHandler
interface which Jolo's publictech.picnic.jolo.Loader
API implemented.One way to deal with this upstream change is to make
Loader
implementjava.util.function.Consumer
directly and changing the usage advice fromto
However, this leads to an unnecessarily cumbersome API. Since jOOQ
v3.15
also introducedorg.jooq.ResultQuery#collect(Collector<? super R, A, X> collector)
a more convenient & generic API is possible:Both options require downstream users to update their usage of Jolo.
This PR foregoes the first option and makes the necessary changes to implement the
Collector
interface for theLoader
. Consequently, it's a non-goal of this PR to maintain a backwards-compatible API surface.This PR includes a number of dependency updates (JDK 11 among others) that I suggest to extract into a separate PR and possibly a release of the OSS parent (cc @Stephan202 ). I've also addressed some upstream bug references and outdated comments along the way.
Some details to pay attention to:
java.util.stream.Collector#combiner
I've created a mutable object graph data structure to separate state from entity & relation definitions, makingEntity
andRelation
effectively immutable.Collector
contract will callfinisher()
exactly once, thus there's no need for memoizing this step.CHANGELOG.md
for details.BiConsumer<Relation, Set<IdPair>>
toFunction<Record, Set<IdPair>
. I failed to understand why the former was chosen and the latter makes for a cleaner implementation. @scranen would you mind chipping in?A potential future improvement is making the loader
java.util.stream.Collector.Characteristics#CONCURRENT
i.e. suitable for concurrent parallel collection. This is not relevant for streams directly stemming fromorg.jooq.ResultQuery
s (which are ordered) but downstream users might opt to do some intermediate processing resulting in unordered streams in which case it might be handy.