Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's your take on jOOQ 3.15 MULTISET and MULTISET_AGG #15

Open
lukaseder opened this issue Dec 9, 2021 · 8 comments
Open

What's your take on jOOQ 3.15 MULTISET and MULTISET_AGG #15

lukaseder opened this issue Dec 9, 2021 · 8 comments

Comments

@lukaseder
Copy link

Hi there folks, long time no see! :)

Time has passed, and in the meantime, jOOQ 3.15 has shipped with the awesome MULTISET and MULTISET_AGG operators:

Introductory blog post also here:

User created query example here:

Given the new functionality, I'm curious about your thoughts on MULTISET, given that with Jolo, you've tried to tackle solving this problem on the client side via join deduplication.

Looking forward,
Lukas

@ferdinand-swoboda
Copy link

ferdinand-swoboda commented Dec 11, 2021

Hi @lukaseder,

Thanks for reaching out for feedback. We still make significant use of Jolo in many places, though we haven't updated Jolo in quite some time. I'm intent to fix this and there's a PR in review that makes use of another feature introduced in jOOQ v3.15, ResultQuery#collect (your feedback would be appreciated 🙂 ).

That being said we are closely following jOOQ's development as well as your amazing blog and have started using MULTISET and MULTISET_AGG in some of our services. One of my colleagues filed this related issue in the past (thanks for fixing, btw).

I personally have used the new operators only superficially in toy examples and they're particularly neat to fetch rows into immutable data structures. It seems it doesn’t support cyclic dependencies or any non-tree-shaped object graph, though. Not a big deal in most cases but something that works with Jolo and is IMO worth noting.

Some colleagues have mentioned concerns around the performance impact of the PostgreSQL emulation (PG is our go-to SQL DB), in particular with multiple nesting levels. I can't provide further details myself but I'll ask them to chime in.

As mentioned in your blogpost it’s very useful to save on roundtrips to the DB and avoid wasteful joins in some cases e.g. loading data from multiple independent 1-n relations.

To conclude, we're actively using them and will continue to do so where possible but so far it doesn't qualify as a 100% replacement of Jolo. Maybe this will change with jOOQ v3.16?

cc @PicnicSupermarket/distribution @Marco-Z @scranen in case you have more to add (or correct 😄 )

@nadirbelarouci
Copy link

Hi @lukaseder,

When we filed this issue, I remember there was no exception loggin to that error, instead null was return when the conversion failed, maybe throwing it would make it easy to debug

@lukaseder
Copy link
Author

(your feedback would be appreciated 🙂 ).

Well, I've read the description. Certainly makes sense. I'm sure that a collector based implementation of Jolo will be much more powerful than a RecordHandler based one, especially, since you can also opt into fetching the Result in memory first, and collecting it later from a JDK Stream. Not sure if parallel collection is important, but might be.

It seems it doesn’t support cyclic dependencies or any non-tree-shaped object graph, though

Yes, by design. ORDBMS have introduced the weird concept of a REF type, which has found almost no adoption in the SQL ecosystem, outside of Oracle (and not even with Oracle. Most users aren't using these features).

The classic jOOQ and SQL world sees data as sets of tuples, which are immutable values. The MULTISET (as well as nested ROW or ARRAY) ORDBMS extensions just allow for nesting tuples. They re-introduce the hierarchical model into the relational world, despite this being "forbidden" in 1NF, but if it's used for projections only, not for storage, then the "forbidden-ness" doesn't have to be enforced as thoroughly. There have also been other movements in this direction, see also the https://en.wikipedia.org/wiki/The_Third_Manifesto. Given that the native implementation, SQL/XML, and SQL/JSON all follow this hierarchical approach, it is natural to support it in jOOQ.

Supporting graphs (networked model), however is a lot harder. Outside of Eclipse's EMF, I've never seen a decent implementation of such client side graph representations. Most other implementations are happy to accept graphs in inconsistent, intermediary or even incomplete states (parent references child but not vice versa). I haven't checked, does Jolo do this correctly, like EMF?

I'm not opposed to eventually supporting this as well, but it requires a lot more thought on the jOOQ side. The complexity of correctly materialising all the cycles that can appear this way in a more complex query than the usual single-parent-single-child examples seem quite challenging.

Anyway, definitely worth mentioning that Jolo is doing all that.

Some colleagues have mentioned concerns around the performance impact of the PostgreSQL emulation (PG is our go-to SQL DB), in particular with multiple nesting levels. I can't provide further details myself but I'll ask them to chime in.

I'll be very curious. My own benchmarks are still on the TODO list, but I'd say, as soon as you have multiple nesting levels, the amount of repetition in the parent entity that can now be avoided should be worth it, alone! My guess would have been that these approaches are less useful when there are fewer nesting levels, but the more the better? Of course, nesting means nested loops rather than hash joins, so for reporting style queries on large data sets, this approach might play less well, but a report is hardly going to nest stuff, rather aggregate and denormalise, instead. The queries where MULTISET is useful typically work with nested loop joins already.

Maybe this will change with jOOQ v3.16?

Graphs are not anywhere on the short term roadmap.

@nadirbelarouci When we filed this issue, I remember there was no exception loggin to that error, instead null was return when the conversion failed, maybe throwing it would make it easy to debug

There are a few issues already to improve that. We can't just break the current behaviour, which turns all incovertible values into null, even logging this might be a touch decision given that people rely on this behaviour and would now be flooded by logs.

@Marco-Z
Copy link

Marco-Z commented Dec 15, 2021

Hi @lukaseder,

I was playing around with the multisets these days, comparing it with some previous attempt that I did when trying to fetch JSON from the DB.
The problem that I had there was that I used WITH statements to define the nested queries. This caused the query plan to ignore the indexes on the joined fields and loop through the whole tables to find the relevant rows from the second level of nesting onward 😬 .

The jOOQ implementation is different, even with the emulation on PSQL, I didn't observe the same problem (though I haven't run thorough benchmarks).
And as you said, the more nesting, the more data transfer is saved because we don't need to fetch duplicated data.

Question: is there a similar construct in jOOQ for 1-to-1 nested records?
I'm trying to see if queries can be built in such a way that they match closely the entity model.

@lukaseder
Copy link
Author

@Marco-Z: I'm assuming this is about PostgreSQL, so can you try WITH .. AS NOT MATERIALIZED? See https://www.postgresql.org/docs/current/queries-with.html#id-1.5.6.12.7

Question: is there a similar construct in jOOQ for 1-to-1 nested records?

I'm not sure what you mean by 1-to-1 here, but you can nest records with DSL.row(). Row[N] types now extend SelectField<Record[N]> for precisely that purpose, see:
https://www.jooq.org/doc/latest/manual/sql-building/column-expressions/nested-records/

Along with ad-hoc conversion, you can map any nested Record[N] to a constructor reference with N parameters. See examples on the above manual page.

@lukaseder
Copy link
Author

Of course, there are always embedded records, too to nest data: https://www.jooq.org/doc/latest/manual/code-generation/codegen-embeddable-types/

@Marco-Z
Copy link

Marco-Z commented Dec 16, 2021

I'm assuming this is about PostgreSQL, so can you try WITH .. AS NOT MATERIALIZED?

Didn't know about this one. I guess it may work. But now multiset allow to do the same with a better syntax (in the application code at least) and with good performance as well.

you can nest records with DSL.row(). Row[N] types now extend SelectField<Record[N]> for precisely that purpose

Nice! Exactly what I was looking for. Using this and multiset we can build the query with exactly the same grouping as our Java entities (with a couple of caveats as noted in #15 (comment)), so instantiating Pojos becomes a matter of defining a simple mapper function/constructor.

@lukaseder
Copy link
Author

so instantiating Pojos becomes a matter of defining a simple mapper function/constructor.

Yeah. All you have to do is patch up the resulting POJO tree to become a graph, where appropriate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants