N + 1 round trip problem #35

itaied246 · 2017-02-16T16:34:58Z

Does this library handle nested models (joins) in a single query from the server to the DB?
For example

user {
  id
  posts {
    id
  }
}

The text was updated successfully, but these errors were encountered:

yfilali · 2017-02-24T21:33:25Z

No, it is something you'll need to handle on your own. See a few code samples to deal with this here: graphql-python/graphene#348

Use the code from that issue to know what is being asked at the very first resolve level, then combine all of it into the query you return in the resolve (using subqueryload or joinedload)

yfilali · 2017-03-16T22:08:28Z

In case you are still looking for a complete solution:
https://yacine.org/2017/02/27/graphqlgraphene-sqlalchemy-and-the-n1-problem/

Cito · 2017-05-27T17:37:11Z

Thank you for that blog post @yfilali, that was really helpful.

Your optimize_resolve method works pretty well for me. By using joinedload instead of subqueryload, I was able to reduce a deeply nested GraphQL query into a single database query. I had to add all paths, however, not only the leaves (simply removed the if not p.isLeaf part when assembling the joins).

Another small problem is that in resolve_related you need to translate the field names from GraphQL (camel case) to SQLAlchemy (snake case) with field = to_snake_case(field), otherwise it doesn't work with table columns with underscores in them. You can import the to_snake_case method from graphene.utils.str_converters.

I feel something like this should be really integrated into graphene-sqlalchemy. Or at least it should provide some helper functions which make it easy to implement such optimized querying, with various loading strategies as options. After all, one of the main advantages of GraphQL over REST is that it solves the N+1 problem. This advantage would be greater if it existed also on the level of the database, not only on the network level.

chanind · 2017-06-08T08:06:15Z

Another option is to use a lazy-loader for SQLAlchemy that is more intelligent about bulk loading. We wrote a custom loader that will inspect the DB session and bulk-load all relations on similar models whenever a relation is lazy-loaded. The result should be the equivalent to using subqueryload on all relations upfront, but without needing any custom code to do so. The loader we wrote is available here in case it's helpful to anyone else: https://github.com/operator/sqlalchemy_bulk_lazy_loader

flewellyn · 2018-02-06T01:36:04Z

@yfilali Is that code you wrote for optimize_resolve under the same MIT license as the rest of the library? I would love to make use of it.

yfilali · 2018-02-07T22:56:50Z

@flewellyn absolutely!

mekhami · 2018-07-16T21:34:15Z

@yfilali I can't seem to access your website, is it still up? Would you mind posting the contents here if not?

yfilali · 2018-07-17T02:13:46Z

@mekhami it seems up at the moment so it could have been temporary or regional (I didn't get any alerts either)

mekhami · 2018-10-21T18:02:19Z

@yfilali sorry to bother again but it's still down and I'm not able to access it. Could you post the text of it here or somewhere? I remember it being extremely valuable but didn't have the time then to implement it.

ocavue · 2018-10-22T03:17:26Z

@mekhami I think it's valuable and I can't access it too. I read it from google cache several months ago and I do think it's worth to read. But the catch itself is 404 now, maybe because the original page is down for too long. I saved the cache just in case: GraphQL_Yacine.zip
(github don't allow .htm file so I compress it)

@yfilali I will delete the cache page if you are not ok with this.

yfilali · 2018-10-22T16:59:02Z

@mekhami it's completely fine. Sorry about that. I'll have to find time to work on my personal site's uptime at some point :)

iksteen · 2019-02-24T22:45:55Z

A bit late to the party. Also: shameless self-promotion. But I've just published an article about this subject which includes a solution where you specify what and how to eagerly load related data in a node's metadata on a per-field basis: https://blurringexistence.net/graphene-sqlalchemy-n-plus-1.html

flewellyn · 2019-04-26T17:45:04Z

@iksteen Your eager loader looks fascinating. Question: will it be able to handle custom-defined object types that are not SQLAlchemy models? I have a couple defined to handle special cases, like geoalchemy2 geometries.

jnak · 2019-10-30T17:46:44Z

Just a heads that there is a PR currently opened that addresses the N+1 problem. Would love to get your input. Cheers.

See #253 (review)

flewellyn · 2019-10-30T18:14:29Z

Intriguing. When that's available, I will have to compare it with @yfilali's solution for performance and usability.

sreerajkksd · 2020-08-04T10:33:53Z

Did we solve this in #253 and #254 ?

jnak · 2020-08-04T23:57:11Z

Yes that's available as of 2.3.0. See release notes how to enable it. Would love feedback on performance and usability.

sreerajkksd · 2020-08-05T12:20:49Z

I tested the batching option and it works flawlessly.

I have one more related question:

From the given example, If user field is meant to return multiple users, then it is still going to iterate on all users and then posts to return the data. Can that be solved in anyway ?

To be more clear: can we stop iterating all users one by one and then go through the posts in any way ?

I know I'm asking a bit too much, but just wanted to know your opinion.

adrianschneider94 · 2020-10-13T12:49:07Z

Hey, I tested #253, however it's targeting Relay, right? At least I couldn't make it work without it.
I guess I reimplemented something that was discussed here before :D
But it works and maybe could be of use for somebody.

https://gist.github.com/adrianschneider94/90f662ffab9dce06e2f291579ad480b7

Usage:

class Query(graphene.ObjectType):
    get_all = graphene.List(ModelSchema)

    def resolve_get_all(self, info):
        query: Query = ModelSchema.get_query(info)
        query = smart_load(query, info)
        return query.all()

You can select the loading strategy:

query = smart_load(query, info, strategy="select-in")
# or
query = smart_load(query, info, strategy="lazy")
# or 
query = smart_load(query, info, strategy="joined")

Best wishes!

adrianschneider94 · 2021-05-21T16:07:28Z

And here the dataloader approach which is superior I guess.

Concept:

SmartSQLAlchemyObjectType inherits from SQLAlchemyObjectType and creates a promise.DataLoader for each relationship
The data loaders are created with a factory function that only depends on the SQLAlchemy attribute
All data loaders are stored in a class variable as mapping RelationshipProperty -> data loader
Additionally SmartSQLAlchemyObjectType overwrites the default resolver for each schema class which uses the data loaders.

I get the session from a ContextVar but I guess there should be a solution where you use the session of root in sqlalchemy_default_resolver.

I haven't written tests yet, but it works really well so far.

from functools import partial

from graphene.types.resolver import default_resolver
from graphene_sqlalchemy import SQLAlchemyObjectType
from promise import Promise
from promise.dataloader import DataLoader
from sqlalchemy import inspect, tuple_
from sqlalchemy.orm import RelationshipProperty
from sqlalchemy.orm.attributes import InstrumentedAttribute
from sqlalchemy.orm.base import MANYTOONE

from my_api.context import use_session


def get_identity(obj):
    return inspect(obj).identity


def sqlalchemy_default_resolver(attname, default_value, root, info, **kwargs):
    parent_type = info.parent_type

    class_manager = getattr(root, "_sa_class_manager", None)
    if class_manager:
        class_ = getattr(class_manager, "class_", None)
        if class_:
            attribute = getattr(class_, attname, None)
            if attribute:
                prop = attribute.property
                data_loader = parent_type.graphene_type.data_loaders.get(prop, None)
                if data_loader:
                    data = data_loader.load(get_identity(root))
                    if prop.direction == MANYTOONE:
                        return data.then(lambda value: value[0] if len(value) == 1 else None)
                    else:
                        return data

    return default_resolver(attname, default_value, root, info, **kwargs)


def create_relationship_loader(attribute: InstrumentedAttribute):
    class RelationshipLoader(DataLoader):
        def batch_load_fn(self, keys):
            def resolver(resolve, reject):
                session = use_session()
                Remote = attribute.entity.class_
                Local = attribute.parent.class_
                primary_keys = tuple(key.expression for key in inspect(Local).primary_key)
                order_by_property = attribute.property.order_by
                order_by = tuple(item.expression for item in order_by_property) if order_by_property else tuple()

                result = session \
                    .query(tuple_(*primary_keys), Remote) \
                    .join(attribute) \
                    .filter(tuple_(*primary_keys).in_(keys)) \
                    .order_by(*order_by, *primary_keys) \
                    .all()

                res = [[v for k, v in result if k == key or k == key[0]] for key in keys]
                return resolve(res)

            return Promise(resolver)

    return RelationshipLoader


class SmartSQLAlchemyObjectType(SQLAlchemyObjectType):
    data_loaders = {}

    class Meta:
        abstract = True

    @classmethod
    def __init_subclass_with_meta__(cls, *args, **kwargs):
        model = kwargs['model']
        for key, attribute in model._sa_class_manager.local_attrs.items():
            property = attribute.property
            if isinstance(property, RelationshipProperty):
                data_loader = create_relationship_loader(attribute)(cache=False, max_batch_size=50)
                cls.data_loaders[property] = data_loader

                if property.direction in [MANYTOONE]:
                    resolver_name = f"resolve_{key}"
                    if resolver_name not in dir(cls):
                        setattr(cls, resolver_name, partial(sqlalchemy_default_resolver, key, None))

        if "default_resolver" not in kwargs:
            kwargs['default_resolver'] = sqlalchemy_default_resolver

        super().__init_subclass_with_meta__(*args, **kwargs)

thaonc97 · 2021-07-06T08:46:37Z

Could anyone provide a working example with batching please? I 've tried to figure it out for days but still no luck.

thaonc97 · 2021-07-12T07:21:49Z

OK I figured it out, the batching option doesn't work with the current release of SQLAlchemy (1.4.20), I reinstalled SQLAlchemy to version 1.3.18 and it works fine. Haven't tested on others.

PaulSchweizer · 2022-04-19T19:22:21Z

I am also a bit confused about the batching feature.
I inspected the tests and I can see two SELECT statements where I would expect only one.

https://github.com/graphql-python/graphene-sqlalchemy/blob/master/graphene_sqlalchemy/tests/test_batching.py#L121

Printing the logging messages that are captured in the test, I can see two SELECT statements:

[
    'BEGIN (implicit)', 
    'SELECT articles.id AS articles_id, articles.headline AS articles_headline, articles.pub_date AS articles_pub_date, articles.reporter_id AS articles_reporter_id \nFROM articles', 
    '[generated in 0.00017s] ()', 
    'SELECT reporters.id AS reporters_id, (SELECT CAST(count(reporters.id) AS INTEGER) AS count_1 \nFROM reporters) AS anon_1, reporters.first_name AS reporters_first_name, reporters.last_name AS reporters_last_name, reporters.email AS reporters_email, reporters.favorite_pet_kind AS reporters_favorite_pet_kind \nFROM reporters \nWHERE reporters.id IN (?, ?)', 
    '[generated in 0.00059s] (1, 2)'
]

I get the same result with SQLAlchemy==1.3.18 and 1.4.35 and also ran sqltap to verify this behavior.

We have solved the n+1 issue similar to how @adrianschneider94 demonstrated above so I thought that batching could make our custom solution superfluous but that seems to not be the case.

Am I maybe misunderstanding what batching is supposed to do?

PaulSchweizer · 2022-04-23T12:25:21Z

So I read up on it a bit more and it makes sense to me now:

Batching uses the "select in" loading mechanic of sqlalchemy which loads all related objects upfront by using the primary keys returned from the first query. It then emits a second SELECT statement to load the related objects in bulk as opposed to loading each one of them individually at the point of access. This is just the gist however and not the whole truth, better to read up the official docs: https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html#selectin-eager-loading

So in conclusion if you want just ONE SINGLE SELECT statement, you have to implement your own solution like mentioned above.

flipbit03 · 2022-09-02T15:10:38Z

(SQLAlchemy 1.4.40)

I've activated batching = True on all my graphene-sqlalchemy translated models, and whenever I try to access some
relationships, the graphql resolver is failing with "future is attached to a different loop"

class GConstruct(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Construct
        batching = True

class GConstructPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = ConstructPart
        batching = True

class GPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Part
        batching = True

This is the GraphQL Query:

{
readConstruct(constructId:"ad500676-5503-aac0-ec38-d3097958c6d0")
   {
    id
    constructParts { <--- this is a relationship
      id
      part { <-- this is a relationship
        id
      }
    }
  }
}

I get an error on the part resolver (but also in several other places in the system, basically it's not working correctly in most places)

Task <Task pending name='Task-347' coro=<ExecutionContext.resolve_field.<locals>.await_result()
 running at /home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-
packages/graphql/execution/execute.py:625> cb=[gather.<locals>._done_callback() at
 /home/cadu/.pyenv/versions/3.10.6/lib/python3.10/asyncio/tasks.py:720]> 
got Future <Future pending> attached to a different loop

This is the traceback:

Traceback (most recent call last):

> File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py", line 625, in await_result
    return_type, field_nodes, info, path, await result
    │            │            │     │           └ <coroutine object get_batch_resolver.<locals>.resolve at 0x7fa8a930d7e0>
    │            │            │     └ Path(prev=Path(prev=Path(prev=Path(prev=None, key='readConstruct', typename='Query'), key='constructParts', typename='GConstr...
    │            │            └ GraphQLResolveInfo(field_name='part', field_nodes=[FieldNode at 116:141], return_type=<GrapheneObjectType 'GPart'>, parent_ty...
    │            └ [FieldNode at 116:141]
    └ <GrapheneObjectType 'GPart'>
  File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphene_sqlalchemy/batching.py", line 87, in resolve
    return await loader.load(root)
                 │      │    └ <ConstructPart PKID=01d1cab8-2bb3-b064-1250-54c9940fd582>
                 │      └ <function DataLoader.load at 0x7fa8df12f910>
                 └ <graphene_sqlalchemy.batching.get_batch_resolver.<locals>.RelationshipLoader object at 0x7fa8b09d9960>

RuntimeError: Task <Task pending name='Task-391' coro=<ExecutionContext.resolve_field.<locals>.await_result() running at /home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py:625> cb=[gather.<locals>._done_callback() at /home/cadu/.pyenv/versions/3.10.6/lib/python3.10/asyncio/tasks.py:720]> got Future <Future pending> attached to a different loop

This query is happening from the GraphQL Playground from a FastAPI app

thejcannon mentioned this issue Apr 5, 2019

Consider sharing implementation with graphene-django #202

Open

erikwrede mentioned this issue Apr 14, 2022

Question: Support sqlalchemy AsyncSession ? #324

Closed

jendrikjoe mentioned this issue May 16, 2022

feat(async): add support for async sessions #350

Merged

2 tasks

zeptonaut mentioned this issue Jun 13, 2023

Recreate loader if old loader is on different loop #395

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

N + 1 round trip problem #35

N + 1 round trip problem #35

itaied246 commented Feb 16, 2017 •

edited

Loading

yfilali commented Feb 24, 2017 •

edited

Loading

yfilali commented Mar 16, 2017

Cito commented May 27, 2017

chanind commented Jun 8, 2017

flewellyn commented Feb 6, 2018

yfilali commented Feb 7, 2018

mekhami commented Jul 16, 2018

yfilali commented Jul 17, 2018

mekhami commented Oct 21, 2018

ocavue commented Oct 22, 2018 •

edited

Loading

yfilali commented Oct 22, 2018

iksteen commented Feb 24, 2019 •

edited

Loading

flewellyn commented Apr 26, 2019

jnak commented Oct 30, 2019

flewellyn commented Oct 30, 2019

sreerajkksd commented Aug 4, 2020

jnak commented Aug 4, 2020

sreerajkksd commented Aug 5, 2020 •

edited

Loading

adrianschneider94 commented Oct 13, 2020

adrianschneider94 commented May 21, 2021 •

edited

Loading

thaonc97 commented Jul 6, 2021

thaonc97 commented Jul 12, 2021 •

edited

Loading

PaulSchweizer commented Apr 19, 2022

PaulSchweizer commented Apr 23, 2022

flipbit03 commented Sep 2, 2022 •

edited

Loading

N + 1 round trip problem #35

N + 1 round trip problem #35

Comments

itaied246 commented Feb 16, 2017 • edited Loading

yfilali commented Feb 24, 2017 • edited Loading

yfilali commented Mar 16, 2017

Cito commented May 27, 2017

chanind commented Jun 8, 2017

flewellyn commented Feb 6, 2018

yfilali commented Feb 7, 2018

mekhami commented Jul 16, 2018

yfilali commented Jul 17, 2018

mekhami commented Oct 21, 2018

ocavue commented Oct 22, 2018 • edited Loading

yfilali commented Oct 22, 2018

iksteen commented Feb 24, 2019 • edited Loading

flewellyn commented Apr 26, 2019

jnak commented Oct 30, 2019

flewellyn commented Oct 30, 2019

sreerajkksd commented Aug 4, 2020

jnak commented Aug 4, 2020

sreerajkksd commented Aug 5, 2020 • edited Loading

adrianschneider94 commented Oct 13, 2020

adrianschneider94 commented May 21, 2021 • edited Loading

thaonc97 commented Jul 6, 2021

thaonc97 commented Jul 12, 2021 • edited Loading

PaulSchweizer commented Apr 19, 2022

PaulSchweizer commented Apr 23, 2022

flipbit03 commented Sep 2, 2022 • edited Loading

itaied246 commented Feb 16, 2017 •

edited

Loading

yfilali commented Feb 24, 2017 •

edited

Loading

ocavue commented Oct 22, 2018 •

edited

Loading

iksteen commented Feb 24, 2019 •

edited

Loading

sreerajkksd commented Aug 5, 2020 •

edited

Loading

adrianschneider94 commented May 21, 2021 •

edited

Loading

thaonc97 commented Jul 12, 2021 •

edited

Loading

flipbit03 commented Sep 2, 2022 •

edited

Loading