Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about PrimaryKeyRelatedField with many=True #9607

Open
cheehong1030 opened this issue Dec 16, 2024 · 10 comments
Open

Question about PrimaryKeyRelatedField with many=True #9607

cheehong1030 opened this issue Dec 16, 2024 · 10 comments

Comments

@cheehong1030
Copy link
Contributor

cheehong1030 commented Dec 16, 2024

I have some questions regarding the following code snippet:

model = PrimaryKeyRelatedField(
    pk_field=IntegerField(),
    queryset=MachineModel.objects.filter(active=True),
    many=True,
    required=True,
)

When PrimaryKeyRelatedField is set with many=True, it generates these queries. It seems like they are not using join.

SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 1) LIMIT 21; args=(1,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 2) LIMIT 21; args=(2,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 3) LIMIT 21; args=(3,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 4) LIMIT 21; args=(4,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 5) LIMIT 21; args=(5,); alias=default

Is this behavior a bug, or is it expected not to use join?

@browniebroke
Copy link
Member

Are you prefeching the relationship in your view? You'll need to use Django's prefetch_related().

@cheehong1030
Copy link
Contributor Author

Are you prefeching the relationship in your view? You'll need to use Django's prefetch_related().

This is for validation purposes, so I cannot use prefetch_related.

def put(self, request):
    serializer = MachinePartUpdateSerializer(data=request.data)
    serializer.is_valid()

Here is the serializer:

class MachinePartSerializer(ModelSerializer):
    brand = PrimaryKeyRelatedField(
        pk_field=IntegerField(),
        queryset=MachineBrand.objects.filter(active=True),
        required=True,
    )
    # TODO: performance issue n+1
    model = PrimaryKeyRelatedField(
        pk_field=IntegerField(),
        queryset=MachineModel.objects.filter(active=True),
        many=True,
        required=True,
    )

    class Meta:
        model = MachinePart
        fields = ["id", "brand", "model", "name", "description"]

Here is my model using a many-to-many relationship:

class MachinePart(Model):
    brand = ForeignKey(MachineBrand, on_delete=models.CASCADE)
    model = ManyToManyField(MachineModel, related_name="parts")
    name = CharField(max_length=255)
    part_number = CharField(max_length=255)
    description = TextField(blank=True, null=True)
    active = BooleanField(default=True)

So, how should I handle prefetch the relations

@browniebroke
Copy link
Member

Ah ok, I thought you were doing a read-only request, I didn't realise it was a PUT request. I don't know the answer in this case.

@cheehong1030
Copy link
Contributor Author

This might be a bug. It seems that the case of PrimaryKeyRelatedField with many=True wasn't considered.

@sevdog
Copy link
Contributor

sevdog commented Dec 23, 2024

The problem lies in the implementation of ManyrelatedField which is extremely simple and does not implement any kind of DB-level optimization. The same problem is in ListSerializer.create method. To keep those elements "simple" they just delegate DB operations to children, but this is not-optimal when there are many elements.

This is not technically a bug, because it does what it is expected, yet it is a bad usage of resources and should be addressed.

@cheehong1030
Copy link
Contributor Author

Maybe this could open a PR to optimize it.

@sreedhar742
Copy link

The behavior you're observing is due to how Django's PrimaryKeyRelatedField interacts with the database when many=True is used. Each item in the list of primary keys results in an individual query to fetch the corresponding MachineModel instance. This can lead to N+1 query problems, where N is the number of related objects being fetched.
Validation Per Instance:

PrimaryKeyRelatedField :
fetches each related object separately to ensure it exists and matches the given queryset.
The queries you see are part of this validation process.
No Bulk Query:
By default, PrimaryKeyRelatedField does not perform a single bulk query to fetch all related objects at once.

@cheehong1030
Copy link
Contributor Author

So, is there no solution for this issue?

@sevdog
Copy link
Contributor

sevdog commented Jan 7, 2025

This may be handled like it is done in Django core MultipleChoiceField: build a "bulk" select and than check if every provided key is in the returned queryset:

https://github.com/django/django/blob/40d5516385448a73426aad396778f369a363eda9/django/forms/models.py#L1622-L1658

This is not an hard task, since an implementation is ready for use for the same use-case.

@tomviner
Copy link
Contributor

tomviner commented Jan 9, 2025

We hit this today. Searching the repo I see an attempt was made a while back:

And there's also a approach suggested here: #8919 (comment) which is to try a single self.child_relation.bulk_to_internal_value(data) call before falling back to the current

[
    self.child_relation.to_internal_value(item)
    for item in data
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants