Prefetch descriptor functions at graph-handling time, not resource-handling time #10453

jacobtylerwalls · 2023-12-18T13:44:50Z

#10439 applied the trick from #10312 in two more places. The trick skips queries for descriptor functions on the assumption that resources are generally grouped by graph:

arches/arches/app/utils/data_management/resources/formats/archesfile.py, lines 393 to 395 in 1af6ea8:

    # Reuse the queryset for FunctionXGraph rows if the graph is the same.
    if last_resource and (last_resource.graph_id == this_resource.graph_id):
        this_resource.descriptor_function = last_resource.descriptor_function

But we could get more savings by removing this "trick" and just prefetching the descriptor functions before we start iterating the resources.

That would go something like this:

When querying for resources, use a Prefetch() object to do the filtering for the graph's descriptor function and save it to descriptor_function on each Resource.
Then remove the "trick" linked above from both files where it appears.
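
To illustrate why fetching up front should beat the "last resource" trick, here is a minimal sketch in plain Python. It is not Arches or Django code: `Resource`, `FakeFunctionTable`, and both `assign_*` helpers are hypothetical stand-ins, and the lookup counter only approximates database queries. In the real code the up-front fetch would be done with Django's Prefetch() as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a resource instance."""
    name: str
    graph_id: str
    descriptor_function: Optional[str] = None


class FakeFunctionTable:
    """Hypothetical stand-in for the FunctionXGraph table; counts lookups."""

    def __init__(self, rows):
        self.rows = rows  # maps graph_id -> descriptor function
        self.queries = 0

    def for_graph(self, graph_id):
        # One simulated query per graph lookup.
        self.queries += 1
        return self.rows.get(graph_id)

    def for_graphs(self, graph_ids):
        # One simulated query covering every requested graph.
        self.queries += 1
        return {graph_id: self.rows.get(graph_id) for graph_id in graph_ids}


def assign_with_last_resource_trick(resources, table):
    """Reuse the previous resource's function only when the graph repeats."""
    last = None
    for res in resources:
        if last is not None and last.graph_id == res.graph_id:
            res.descriptor_function = last.descriptor_function
        else:
            res.descriptor_function = table.for_graph(res.graph_id)
        last = res


def assign_with_upfront_fetch(resources, table):
    """Fetch the functions for all graphs once, before iterating resources."""
    by_graph = table.for_graphs({res.graph_id for res in resources})
    for res in resources:
        res.descriptor_function = by_graph[res.graph_id]


def make_resources():
    # Deliberately NOT grouped by graph, which defeats the trick.
    return [
        Resource("r1", "A"),
        Resource("r2", "B"),
        Resource("r3", "A"),
        Resource("r4", "B"),
    ]


rows = {"A": "descriptor_fn_a", "B": "descriptor_fn_b"}

trick_table = FakeFunctionTable(rows)
trick_resources = make_resources()
assign_with_last_resource_trick(trick_resources, trick_table)

upfront_table = FakeFunctionTable(rows)
upfront_resources = make_resources()
assign_with_upfront_fetch(upfront_resources, upfront_table)

print(trick_table.queries, upfront_table.queries)
```

When resources arrive interleaved by graph, the trick falls back to one lookup per resource, while the up-front fetch stays at a single lookup regardless of ordering.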

Then, with the hood popped on this (looking at callers of save_descriptors()), there are a few more wins to be had:

index_resources_by_type() refetches graph objects because the earlier call in index_resources() only gets a values array
Tile.delete() refetches the resource instance
Tile.delete() saves the resource descriptor once per node
import_business_data_without_mapping() fetches graphs one-by-one


jacobtylerwalls · 2024-04-08T14:50:57Z

The last few bullet-points tacked on here don't provide enough value to work on.

[1] index_resources_by_type() refetches graph objects because the earlier call in index_resources() only gets a values array
[2] Tile.delete() refetches the resource instance
[3] Tile.delete() saves the resource descriptor once per node
[4] import_business_data_without_mapping() fetches graphs one-by-one

[1] -- not very many queries
[2] -- refetching is necessary, because we need the Proxy model
[3] -- already fixed in #10481
[4] -- too hard to refactor


jacobtylerwalls added the Subject: Performance label Dec 18, 2023

chiatt added this to pipeline Dec 18, 2023

jacobtylerwalls moved this to 🏗 In Progress in pipeline Apr 5, 2024

jacobtylerwalls self-assigned this Apr 5, 2024

jacobtylerwalls added a commit that referenced this issue Apr 5, 2024

Prefetch descriptor functions and tiles when indexing #10453

8c6540f

jacobtylerwalls mentioned this issue Apr 5, 2024

Prefetch graphs and tiles when indexing resources #10453 #10743

Merged

6 tasks

jacobtylerwalls linked a pull request Apr 8, 2024 that will close this issue

Prefetch graphs and tiles when indexing resources #10453 #10743

Merged

6 tasks

jacobtylerwalls moved this from 🏗 In Progress to 👀 In Review in pipeline Apr 8, 2024

jacobtylerwalls added a commit that referenced this issue Apr 17, 2024

Add iterator() with chunk_size re #10453

f5f4845

jacobtylerwalls added a commit that referenced this issue Apr 17, 2024

Prefetch graphs and tiles when indexing resources #10453 (#10743)

f2861f2

jacobtylerwalls closed this as completed Apr 17, 2024

github-project-automation bot moved this from 👀 In Review to ✅ Done in pipeline Apr 17, 2024

jacobtylerwalls added a commit that referenced this issue Apr 18, 2024

Use count() for indexing progress bar re #10453

f102945

Follow-up to f2861f2.

whatisgalen added a commit that referenced this issue Apr 22, 2024

Merge pull request #10790 from archesproject/jtw/index-progress-bar

8d49ccf

Use count() for indexing progress bar re #10453
