N + 1 round trip problem #35
Comments
No, it is something you'll need to handle on your own. See a few code samples that deal with this here: graphql-python/graphene#348. Use the code from that issue to find out what is being asked for at the very first resolve level, then combine all of it into the query you return from the resolver (using subqueryload or joinedload). |
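For illustration only, a minimal sketch of that idea, assuming a hypothetical `User` model with a `posts` relationship, a matching `UserType` SQLAlchemyObjectType, and graphql-core 3's `info.field_nodes` (older graphene versions expose `info.field_asts` instead):

```python
from graphene import List, ObjectType
from sqlalchemy.orm import joinedload

class Query(ObjectType):
    users = List(UserType)  # UserType: hypothetical SQLAlchemyObjectType for the User model

    def resolve_users(self, info):
        query = UserType.get_query(info)
        # Field names requested at the very first resolve level
        # (inline fragments are simply skipped in this sketch).
        requested = {
            selection.name.value
            for node in info.field_nodes
            for selection in node.selection_set.selections
            if hasattr(selection, "name")
        }
        # Fold the eager loads into the one query returned from the resolver.
        if "posts" in requested:
            query = query.options(joinedload(User.posts))
        return query.all()
```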
In case you are still looking for a complete solution: |
Thank you for that blog post @yfilali, that was really helpful. I feel something like this should really be integrated into graphene-sqlalchemy, or at least it should provide some helper functions that make it easy to implement such optimized querying, with various loading strategies as options. After all, one of the main advantages of GraphQL over REST is that it solves the N+1 problem. This advantage would be greater if it existed at the level of the database as well, not only on the network level. |
Another option is to use a lazy-loader for SQLAlchemy that is more intelligent about bulk loading. We wrote a custom loader that inspects the DB session and bulk-loads all relations on similar models whenever a relation is lazy-loaded. The result should be equivalent to using |
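That custom loader is not shown in the thread. As a rough stand-in, and not the same as loading on access, SQLAlchemy's built-in `selectin` strategy gives a similar "one extra IN query for the whole batch" result. A minimal sketch, assuming hypothetical `Reporter`/`Article` models and SQLAlchemy 1.4:

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Reporter(Base):
    __tablename__ = "reporters"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # "selectin": articles for every Reporter returned by a query are fetched
    # with one additional SELECT ... WHERE reporter_id IN (...) right after the
    # parent query, instead of one lazy SELECT per reporter.
    articles = relationship("Article", lazy="selectin")

class Article(Base):
    __tablename__ = "articles"
    id = Column(Integer, primary_key=True)
    reporter_id = Column(Integer, ForeignKey("reporters.id"))
    headline = Column(String)
```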
@yfilali Is that code you wrote for optimize_resolve under the same MIT license as the rest of the library? I would love to make use of it. |
@flewellyn absolutely! |
@yfilali I can't seem to access your website, is it still up? Would you mind posting the contents here if not? |
@mekhami it seems up at the moment so it could have been temporary or regional (I didn't get any alerts either) |
@yfilali sorry to bother again but it's still down and I'm not able to access it. Could you post the text of it here or somewhere? I remember it being extremely valuable but didn't have the time then to implement it. |
@mekhami I think it's valuable and I can't access it either. I read it from the Google cache several months ago and I do think it's worth reading. But the cache itself is 404 now, maybe because the original page has been down for too long. I saved the cache just in case: GraphQL_Yacine.zip @yfilali I will delete the cached page if you are not ok with this. |
@mekhami it's completely fine. Sorry about that. I'll have to find time to work on my personal site's uptime at some point :) |
A bit late to the party. Also: shameless self-promotion. But I've just published an article about this subject which includes a solution where you specify what and how to eagerly load related data in a node's metadata on a per-field basis: https://blurringexistence.net/graphene-sqlalchemy-n-plus-1.html |
@iksteen Your eager loader looks fascinating. Question: will it be able to handle custom-defined object types that are not SQLAlchemy models? I have a couple defined to handle special cases, like geoalchemy2 geometries. |
Just a heads up that there is a PR currently open that addresses the N+1 problem. Would love to get your input. Cheers. See #253 (review) |
Intriguing. When that's available, I will have to compare it with @yfilali's solution for performance and usability. |
Yes that's available as of |
I tested the batching option and it works flawlessly. I have one more related question about the given example. To be more clear: can we stop iterating over all users one by one and then going through the posts, in any way? I know I'm asking a bit too much, but just wanted to know your opinion. |
Hey, I tested #253, however it's targeting Relay, right? At least I couldn't make it work without it.

https://gist.github.com/adrianschneider94/90f662ffab9dce06e2f291579ad480b7

Usage:

```python
class Query(graphene.ObjectType):
    get_all = graphene.List(ModelSchema)

    def resolve_get_all(self, info):
        query: Query = ModelSchema.get_query(info)
        query = smart_load(query, info)
        return query.all()
```

You can select the loading strategy:

```python
query = smart_load(query, info, strategy="select-in")
# or
query = smart_load(query, info, strategy="lazy")
# or
query = smart_load(query, info, strategy="joined")
```

Best wishes! |
And here is the dataloader approach, which is superior, I guess. Concept:

I get the session from a ContextVar, but I guess there should be a better solution for obtaining the session. I haven't written tests yet, but it works really well so far.

```python
from functools import partial
from graphene.types.resolver import default_resolver
from graphene_sqlalchemy import SQLAlchemyObjectType
from promise import Promise
from promise.dataloader import DataLoader
from sqlalchemy import inspect, tuple_
from sqlalchemy.orm import RelationshipProperty
from sqlalchemy.orm.attributes import InstrumentedAttribute
from sqlalchemy.orm.base import MANYTOONE

from my_api.context import use_session


def get_identity(obj):
    return inspect(obj).identity


def sqlalchemy_default_resolver(attname, default_value, root, info, **kwargs):
    parent_type = info.parent_type
    class_manager = getattr(root, "_sa_class_manager", None)
    if class_manager:
        class_ = getattr(class_manager, "class_", None)
        if class_:
            attribute = getattr(class_, attname, None)
            if attribute:
                prop = attribute.property
                data_loader = parent_type.graphene_type.data_loaders.get(prop, None)
                if data_loader:
                    data = data_loader.load(get_identity(root))
                    if prop.direction == MANYTOONE:
                        return data.then(lambda value: value[0] if len(value) == 1 else None)
                    else:
                        return data
    return default_resolver(attname, default_value, root, info, **kwargs)


def create_relationship_loader(attribute: InstrumentedAttribute):
    class RelationshipLoader(DataLoader):
        def batch_load_fn(self, keys):
            def resolver(resolve, reject):
                session = use_session()
                Remote = attribute.entity.class_
                Local = attribute.parent.class_
                primary_keys = tuple(key.expression for key in inspect(Local).primary_key)
                order_by_property = attribute.property.order_by
                order_by = tuple(item.expression for item in order_by_property) if order_by_property else tuple()
                result = session \
                    .query(tuple_(*primary_keys), Remote) \
                    .join(attribute) \
                    .filter(tuple_(*primary_keys).in_(keys)) \
                    .order_by(*order_by, *primary_keys) \
                    .all()
                res = [[v for k, v in result if k == key or k == key[0]] for key in keys]
                return resolve(res)

            return Promise(resolver)

    return RelationshipLoader


class SmartSQLAlchemyObjectType(SQLAlchemyObjectType):
    data_loaders = {}

    class Meta:
        abstract = True

    @classmethod
    def __init_subclass_with_meta__(cls, *args, **kwargs):
        model = kwargs['model']
        for key, attribute in model._sa_class_manager.local_attrs.items():
            property = attribute.property
            if isinstance(property, RelationshipProperty):
                data_loader = create_relationship_loader(attribute)(cache=False, max_batch_size=50)
                cls.data_loaders[property] = data_loader
                if property.direction in [MANYTOONE]:
                    resolver_name = f"resolve_{key}"
                    if resolver_name not in dir(cls):
                        setattr(cls, resolver_name, partial(sqlalchemy_default_resolver, key, None))
        if "default_resolver" not in kwargs:
            kwargs['default_resolver'] = sqlalchemy_default_resolver
        super().__init_subclass_with_meta__(*args, **kwargs)
```
|
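Usage would presumably look like any other SQLAlchemyObjectType subclass. A minimal sketch, assuming a hypothetical `Reporter` model with an `articles` relationship:

```python
import graphene

# Relationships on the mapped model are resolved through the generated
# DataLoaders, so sibling objects in the same GraphQL response are fetched in batches.
class ReporterType(SmartSQLAlchemyObjectType):
    class Meta:
        model = Reporter  # hypothetical SQLAlchemy model

class Query(graphene.ObjectType):
    reporters = graphene.List(ReporterType)

    def resolve_reporters(self, info):
        return ReporterType.get_query(info).all()

schema = graphene.Schema(query=Query)
```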
Could anyone provide a working example with batching please? I've tried to figure it out for days but still no luck. |
OK I figured it out, the |
I am also a bit confused about the batching feature. Printing the logging messages that are captured in the test, I can see two SELECT statements:

```python
[
    'BEGIN (implicit)',
    'SELECT articles.id AS articles_id, articles.headline AS articles_headline, articles.pub_date AS articles_pub_date, articles.reporter_id AS articles_reporter_id \nFROM articles',
    '[generated in 0.00017s] ()',
    'SELECT reporters.id AS reporters_id, (SELECT CAST(count(reporters.id) AS INTEGER) AS count_1 \nFROM reporters) AS anon_1, reporters.first_name AS reporters_first_name, reporters.last_name AS reporters_last_name, reporters.email AS reporters_email, reporters.favorite_pet_kind AS reporters_favorite_pet_kind \nFROM reporters \nWHERE reporters.id IN (?, ?)',
    '[generated in 0.00059s] (1, 2)'
]
```

I get the same result with SQLAlchemy==1.3.18 and 1.4.35 and also ran sqltap to verify this behavior. We have solved the n+1 issue similar to how @adrianschneider94 demonstrated above, so I thought that batching could make our custom solution superfluous, but that seems to not be the case. Am I maybe misunderstanding what batching is supposed to do? |
So I read up on it a bit more and it makes sense to me now: batching uses the "select in" loading mechanism of SQLAlchemy, which loads all related objects up front by using the primary keys returned from the first query. It then emits a second SELECT statement to load the related objects in bulk, as opposed to loading each one of them individually at the point of access. This is just the gist, however, and not the whole truth; it is better to read the official docs: https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html#selectin-eager-loading So in conclusion, if you want just ONE SINGLE SELECT statement, you have to implement your own solution like the ones mentioned above. |
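To make that two-query pattern concrete, here is a minimal hand-written equivalent using SQLAlchemy's `selectinload` (the `Article`/`Reporter` models and the `session` object are assumed, matching the tables in the log above):

```python
from sqlalchemy.orm import selectinload

# One SELECT for all articles, then one SELECT ... WHERE reporters.id IN (...)
# for their reporters: two round trips in total, instead of 1 + N.
articles = (
    session.query(Article)
    .options(selectinload(Article.reporter))
    .all()
)
for article in articles:
    print(article.reporter.last_name)  # already loaded, no extra query
```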
(SQLAlchemy 1.4.40) I've activated batching on my types:

```python
class GConstruct(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Construct
        batching = True

class GConstructPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = ConstructPart
        batching = True

class GPart(GraphQLPrimaryKeyIsUUIDMixin, AuthorizeCreatorMixin):
    class Meta:
        model = Part
        batching = True
```

This is the GraphQL query:

```graphql
{
  readConstruct(constructId: "ad500676-5503-aac0-ec38-d3097958c6d0") {
    id
    constructParts {   # <--- this is a relationship
      id
      part {           # <-- this is a relationship
        id
      }
    }
  }
}
```

I get an error on the `part` relationship. This is the traceback:

```
Traceback (most recent call last):
> File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py", line 625, in await_result
    return_type, field_nodes, info, path, await result
    │            │            │     │     └ <coroutine object get_batch_resolver.<locals>.resolve at 0x7fa8a930d7e0>
    │            │            │     └ Path(prev=Path(prev=Path(prev=Path(prev=None, key='readConstruct', typename='Query'), key='constructParts', typename='GConstr...
    │            │            └ GraphQLResolveInfo(field_name='part', field_nodes=[FieldNode at 116:141], return_type=<GrapheneObjectType 'GPart'>, parent_ty...
    │            └ [FieldNode at 116:141]
    └ <GrapheneObjectType 'GPart'>
  File "/home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphene_sqlalchemy/batching.py", line 87, in resolve
    return await loader.load(root)
           │      │     └ <ConstructPart PKID=01d1cab8-2bb3-b064-1250-54c9940fd582>
           │      └ <function DataLoader.load at 0x7fa8df12f910>
           └ <graphene_sqlalchemy.batching.get_batch_resolver.<locals>.RelationshipLoader object at 0x7fa8b09d9960>
RuntimeError: Task <Task pending name='Task-391' coro=<ExecutionContext.resolve_field.<locals>.await_result() running at /home/cadu/.local/share/virtualenvs/kernel-backend-N3YlRYFF/lib/python3.10/site-packages/graphql/execution/execute.py:625> cb=[gather.<locals>._done_callback() at /home/cadu/.pyenv/versions/3.10.6/lib/python3.10/asyncio/tasks.py:720]> got Future <Future pending> attached to a different loop
```

This query is happening from the GraphQL Playground of a FastAPI app. |
Does this library handle nested models (joins) in a single query from the server to the DB?
For example