Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N + 1 round trip problem #35

Open
itaied246 opened this issue Feb 16, 2017 · 25 comments
Open

N + 1 round trip problem #35

itaied246 opened this issue Feb 16, 2017 · 25 comments

Comments

@itaied246
Copy link

itaied246 commented Feb 16, 2017

Does this library handle nested models (joins) in a single query from the server to the DB?
For example

user {
  id
  posts {
    id
  }
}
@yfilali
Copy link
Contributor

yfilali commented Feb 24, 2017

No, it is something you'll need to handle on your own. See a few code samples to deal with this here: graphql-python/graphene#348

Use the code from that issue to know what is being asked at the very first resolve level, then combine all of it into the query you return in the resolve (using subqueryload or joinedload)

@yfilali
Copy link
Contributor

yfilali commented Mar 16, 2017

In case you are still looking for a complete solution:
https://yacine.org/2017/02/27/graphqlgraphene-sqlalchemy-and-the-n1-problem/

@Cito
Copy link
Member

Cito commented May 27, 2017

Thank you for that blog post @yfilali, that was really helpful.

Your optimize_resolve method works pretty well for me. By using joinedload instead of subqueryload, I was able to reduce a deeply nested GraphQL query into a single database query. I had to add all paths, however, not only the leaves (simply removed the if not p.isLeaf part when assembling the joins).

Another small problem is that in resolve_related you need to translate the field names from GraphQL (camel case) to SQLAlchemy (snake case) with field = to_snake_case(field), otherwise it doesn't work with table columns with underscores in them. You can import the to_snake_case method from graphene.utils.str_converters.

I feel something like this should be really integrated into graphene-sqlalchemy. Or at least it should provide some helper functions which make it easy to implement such optimized querying, with various loading strategies as options. After all, one of the main advantages of GraphQL over REST is that it solves the N+1 problem. This advantage would be greater if it existed also on the level of the database, not only on the network level.

@chanind
Copy link

chanind commented Jun 8, 2017

Another option is to use a lazy-loader for SQLAlchemy that is more intelligent about bulk loading. We wrote a custom loader that will inspect the DB session and bulk-load all relations on similar models whenever a relation is lazy-loaded. The result should be the equivalent to using subqueryload on all relations upfront, but without needing any custom code to do so. The loader we wrote is available here in case it's helpful to anyone else: https://github.com/operator/sqlalchemy_bulk_lazy_loader

@flewellyn
Copy link

@yfilali Is that code you wrote for optimize_resolve under the same MIT license as the rest of the library? I would love to make use of it.

@yfilali
Copy link
Contributor

yfilali commented Feb 7, 2018

@flewellyn absolutely!

@mekhami
Copy link

mekhami commented Jul 16, 2018

@yfilali I can't seem to access your website, is it still up? Would you mind posting the contents here if not?

@yfilali
Copy link
Contributor

yfilali commented Jul 17, 2018

@mekhami it seems up at the moment so it could have been temporary or regional (I didn't get any alerts either)

@mekhami
Copy link

mekhami commented Oct 21, 2018

@yfilali sorry to bother again but it's still down and I'm not able to access it. Could you post the text of it here or somewhere? I remember it being extremely valuable but didn't have the time then to implement it.

@ocavue
Copy link

ocavue commented Oct 22, 2018

@mekhami I think it's valuable and I can't access it too. I read it from google cache several months ago and I do think it's worth to read. But the catch itself is 404 now, maybe because the original page is down for too long. I saved the cache just in case: GraphQL_Yacine.zip
(github don't allow .htm file so I compress it)

@yfilali I will delete the cache page if you are not ok with this.

@yfilali
Copy link
Contributor

yfilali commented Oct 22, 2018

@mekhami it's completely fine. Sorry about that. I'll have to find time to work on my personal site's uptime at some point :)

@iksteen
Copy link

iksteen commented Feb 24, 2019

A bit late to the party. Also: shameless self-promotion. But I've just published an article about this subject which includes a solution where you specify what and how to eagerly load related data in a node's metadata on a per-field basis: https://blurringexistence.net/graphene-sqlalchemy-n-plus-1.html

@flewellyn
Copy link

@iksteen Your eager loader looks fascinating. Question: will it be able to handle custom-defined object types that are not SQLAlchemy models? I have a couple defined to handle special cases, like geoalchemy2 geometries.

@jnak
Copy link
Collaborator

jnak commented Oct 30, 2019

Just a heads that there is a PR currently opened that addresses the N+1 problem. Would love to get your input. Cheers.

See #253 (review)

@flewellyn
Copy link

Intriguing. When that's available, I will have to compare it with @yfilali's solution for performance and usability.

@sreerajkksd
Copy link

Did we solve this in #253 and #254 ?

@jnak
Copy link
Collaborator

jnak commented Aug 4, 2020

Yes that's available as of 2.3.0. See release notes how to enable it. Would love feedback on performance and usability.

@sreerajkksd
Copy link

sreerajkksd commented Aug 5, 2020

I tested the batching option and it works flawlessly.

I have one more related question:

From the given example, If user field is meant to return multiple users, then it is still going to iterate on all users and then posts to return the data. Can that be solved in anyway ?

To be more clear: can we stop iterating all users one by one and then go through the posts in any way ?

I know I'm asking a bit too much, but just wanted to know your opinion.

@adrianschneider94
Copy link

Hey, I tested #253, however it's targeting Relay, right? At least I couldn't make it work without it.
I guess I reimplemented something that was discussed here before :D
But it works and maybe could be of use for somebody.

https://gist.github.com/adrianschneider94/90f662ffab9dce06e2f291579ad480b7

Usage:

class Query(graphene.ObjectType):
    get_all = graphene.List(ModelSchema)

    def resolve_get_all(self, info):
        query: Query = ModelSchema.get_query(info)
        query = smart_load(query, info)
        return query.all()

You can select the loading strategy:

query = smart_load(query, info, strategy="select-in")
# or
query = smart_load(query, info, strategy="lazy")
# or 
query = smart_load(query, info, strategy="joined")

Best wishes!

@adrianschneider94
Copy link

adrianschneider94 commented May 21, 2021

And here the dataloader approach which is superior I guess.

Concept:

  • SmartSQLAlchemyObjectType inherits from SQLAlchemyObjectType and creates a promise.DataLoader for each relationship
  • The data loaders are created with a factory function that only depends on the SQLAlchemy attribute
  • All data loaders are stored in a class variable as mapping RelationshipProperty -> data loader
  • Additionally SmartSQLAlchemyObjectType overwrites the default resolver for each schema class which uses the data loaders.

I get the session from a ContextVar but I guess there should be a solution where you use the session of root in sqlalchemy_default_resolver.

I haven't written tests yet, but it works really well so far.

from functools import partial

from graphene.types.resolver import default_resolver
from graphene_sqlalchemy import SQLAlchemyObjectType
from promise import Promise
from promise.dataloader import DataLoader
from sqlalchemy import inspect, tuple_
from sqlalchemy.orm import RelationshipProperty
from sqlalchemy.orm.attributes import InstrumentedAttribute
from sqlalchemy.orm.base import MANYTOONE

from my_api.context import use_session


def get_identity(obj):
    return inspect(obj).identity


def sqlalchemy_default_resolver(attname, default_value, root, info, **kwargs):
    parent_type = info.parent_type

    class_manager = getattr(root, "_sa_class_manager", None)
    if class_manager:
        class_ = getattr(class_manager, "class_", None)
        if class_:
            attribute = getattr(class_, attname, None)
            if attribute:
                prop = attribute.property
                data_loader = parent_type.graphene_type.data_loaders.get(prop, None)
                if data_loader:
                    data = data_loader.load(get_identity(root))
                    if prop.direction == MANYTOONE:
                        return data.then(lambda value: value[0] if len(value) == 1 else None)
                    else:
                        return data

    return default_resolver(attname, default_value, root, info, **kwargs)


def create_relationship_loader(attribute: InstrumentedAttribute):
    class RelationshipLoader(DataLoader):
        def batch_load_fn(self, keys):
            def resolver(resolve, reject):
                session = use_session()
                Remote = attribute.entity.class_
                Local = attribute.parent.class_
                primary_keys = tuple(key.expression for key in inspect(Local).primary_key)
                order_by_property = attribute.property.order_by
                order_by = tuple(item.expression for item in order_by_property) if order_by_property else tuple()

                result = session \
                    .query(tuple_(*primary_keys), Remote) \
                    .join(attribute) \
                    .filter(tuple_(*primary_keys).in_(keys)) \
                    .order_by(*order_by, *primary_keys) \
                    .all()

                res = [[v for k, v in result if k == key or k == key[0]] for key in keys]
                return resolve(res)

            return Promise(resolver)

    return RelationshipLoader


class SmartSQLAlchemyObjectType(SQLAlchemyObjectType):
    data_loaders = {}

    class Meta:
        abstract = True

    @classmethod
    def __init_subclass_with_meta__(cls, *args, **kwargs):
        model = kwargs['model']
        for key, attribute in model._sa_class_manager.local_attrs.items():
            property = attribute.property
            if isinstance(property, RelationshipProperty):
                data_loader = create_relationship_loader(attribute)(cache=False, max_batch_size=50)
                cls.data_loaders[property] = data_loader

                if property.direction in [MANYTOONE]:
                    resolver_name = f"resolve_{key}"
                    if resolver_name not in dir(cls):
                        setattr(cls, resolver_name, partial(sqlalchemy_default_resolver, key, None))

        if "default_resolver" not in kwargs:
            kwargs['default_resolver'] = sqlalchemy_default_resolver

        super().__init_subclass_with_meta__(*args, **kwargs)

@thaonc97
Copy link

thaonc97 commented Jul 6, 2021

Could anyone provide a working example with batching please? I 've tried to figure it out for days but still no luck.

@thaonc97
Copy link

thaonc97 commented Jul 12, 2021

OK I figured it out, the batching option doesn't work with the current release of SQLAlchemy (1.4.20), I reinstalled SQLAlchemy to version 1.3.18 and it works fine. Haven't tested on others.

@PaulSchweizer
Copy link
Collaborator

I am also a bit confused about the batching feature.
I inspected the tests and I can see two SELECT statements where I would expect only one.

https://github.com/graphql-python/graphene-sqlalchemy/blob/master/graphene_sqlalchemy/tests/test_batching.py#L121

Printing the logging messages that are captured in the test, I can see two SELECT statements:

[
    'BEGIN (implicit)', 
    'SELECT articles.id AS articles_id, articles.headline AS articles_headline, articles.pub_date AS articles_pub_date, articles.reporter_id AS articles_reporter_id \nFROM articles', 
    '[generated in 0.00017s] ()', 
    'SELECT reporters.id AS reporters_id, (SELECT CAST(count(reporters.id) AS INTEGER) AS count_1 \nFROM reporters) AS anon_1, reporters.first_name AS reporters_first_name, reporters.last_name AS reporters_last_name, reporters.email AS reporters_email, reporters.favorite_pet_kind AS reporters_favorite_pet_kind \nFROM reporters \nWHERE reporters.id IN (?, ?)', 
    '[generated in 0.00059s] (1, 2)'
]

I get the same result with SQLAlchemy==1.3.18 and 1.4.35 and also ran sqltap to verify this behavior.

We have solved the n+1 issue similar to how @adrianschneider94 demonstrated above so I thought that batching could make our custom solution superfluous but that seems to not be the case.

Am I maybe misunderstanding what batching is supposed to do?

@PaulSchweizer
Copy link
Collaborator

So I read up on it a bit more and it makes sense to me now:

Batching uses the "select in" loading mechanic of sqlalchemy which loads all related objects upfront by using the primary keys returned from the first query. It then emits a second SELECT statement to load the related objects in bulk as opposed to loading each one of them individually at the point of access. This is just the gist however and not the whole truth, better to read up the official docs: https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html#selectin-eager-loading

So in conclusion if you want just ONE SINGLE SELECT statement, you have to implement your own solution like mentioned above.

@flipbit03
Copy link
Contributor

flipbit03 commented Sep 2, 2022

(SQLAlchemy 1.4.40)

I've activated batching = True on all my graphene-sqlalchemy translated models, and whenever I try to access some
relationships, the graphql resolver is failing with "future is attached to a different loop"

class GConstruct(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Construct
        batching = True

class GConstructPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = ConstructPart
        batching = True

class GPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Part
        batching = True

This is the GraphQL Query:

{
readConstruct(constructId:"ad500676-5503-aac0-ec38-d3097958c6d0")
   {
    id
    constructParts { <--- this is a relationship
      id
      part { <-- this is a relationship
        id
      }
    }
  }
}

I get an error on the part resolver (but also in several other places in the system, basically it's not working correctly in most places)

Task <Task pending name='Task-347' coro=<ExecutionContext.resolve_field.<locals>.await_result()
 running at /home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-
packages/graphql/execution/execute.py:625> cb=[gather.<locals>._done_callback() at
 /home/cadu/.pyenv/versions/3.10.6/lib/python3.10/asyncio/tasks.py:720]> 
got Future <Future pending> attached to a different loop

This is the traceback:

Traceback (most recent call last):

> File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py", line 625, in await_result
    return_type, field_nodes, info, path, await result
    │            │            │     │           └ <coroutine object get_batch_resolver.<locals>.resolve at 0x7fa8a930d7e0>
    │            │            │     └ Path(prev=Path(prev=Path(prev=Path(prev=None, key='readConstruct', typename='Query'), key='constructParts', typename='GConstr...
    │            │            └ GraphQLResolveInfo(field_name='part', field_nodes=[FieldNode at 116:141], return_type=<GrapheneObjectType 'GPart'>, parent_ty...
    │            └ [FieldNode at 116:141]
    └ <GrapheneObjectType 'GPart'>
  File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphene_sqlalchemy/batching.py", line 87, in resolve
    return await loader.load(root)
                 │      │    └ <ConstructPart PKID=01d1cab8-2bb3-b064-1250-54c9940fd582>
                 │      └ <function DataLoader.load at 0x7fa8df12f910><graphene_sqlalchemy.batching.get_batch_resolver.<locals>.RelationshipLoader object at 0x7fa8b09d9960>

RuntimeError: Task <Task pending name='Task-391' coro=<ExecutionContext.resolve_field.<locals>.await_result() running at /home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py:625> cb=[gather.<locals>._done_callback() at /home/cadu/.pyenv/versions/3.10.6/lib/python3.10/asyncio/tasks.py:720]> got Future <Future pending> attached to a different loop

This query is happening from the GraphQL Playground from a FastAPI app

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests