Archive: Respect filter_size in query for existing nodes
The `QueryParams` dataclass defines the `filter_size` attribute, which is
used in all queries to limit the number of parameters per query. This is
necessary because, without it, large archives would result in queries with
so many parameters that some database backends raise exceptions. SQLite,
for example, caps the number of bound parameters per statement by default
(999 before version 3.32.0, 32766 since).
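
To illustrate the class of failure this guards against, here is a minimal
standalone sketch using plain `sqlite3` (not AiiDA code); the exact limit
depends on the SQLite version:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE node (uuid TEXT)')

    # Bind far more parameters than SQLite allows in a single statement.
    params = [f'uuid-{i}' for i in range(100_000)]
    placeholders = ','.join('?' * len(params))

    # Raises sqlite3.OperationalError: too many SQL variables
    conn.execute(f'SELECT uuid FROM node WHERE uuid IN ({placeholders})', params)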

The `aiida.tools.archive._import_nodes` function did not respect this
setting when determining which nodes from the archive already exist in the
target storage. As a result, importing a large archive into a storage
backed by SQLite could raise an exception. The problem is fixed by using
the `batch_iter` utility to retrieve the existing UUIDs in batches of size
`filter_size`; a sketch of such a helper follows below.
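
For reference, `batch_iter` chunks an iterable into fixed-size batches.
Below is a minimal sketch of such a helper, assuming it yields
`(length, batch)` pairs to match the `for _, batch in batch_iter(...)`
usage in the diff; the actual utility shipped with aiida-core may differ
in details, such as supporting an optional transform argument.

    from typing import Any, Iterable, Iterator, List, Tuple

    def batch_iter(iterable: Iterable[Any], size: int) -> Iterator[Tuple[int, List[Any]]]:
        """Yield successive batches of at most `size` items as (length, batch) pairs."""
        batch: List[Any] = []
        for item in iterable:
            batch.append(item)
            if len(batch) == size:
                yield len(batch), batch
                batch = []
        if batch:  # yield the final, possibly shorter, batch
            yield len(batch), batch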
sphuber committed May 20, 2024
1 parent be0db3c commit 26c4c2a
Showing 1 changed file with 10 additions and 5 deletions.
src/aiida/tools/archive/imports.py (10 additions, 5 deletions)

@@ -460,12 +460,17 @@ def _import_nodes(

     # get matching uuids from the backend
     backend_uuid_id: Dict[str, int] = {}
+    input_id_uuid_uuids = list(input_id_uuid.values())
+
     if input_id_uuid:
-        backend_uuid_id = dict(
-            orm.QueryBuilder(backend=backend_to)
-            .append(orm.Node, filters={'uuid': {'in': list(input_id_uuid.values())}}, project=['uuid', 'id'])
-            .all(batch_size=query_params.batch_size)
-        )
+        for _, batch in batch_iter(input_id_uuid_uuids, query_params.filter_size):
+            backend_uuid_id.update(
+                dict(
+                    orm.QueryBuilder(backend=backend_to)
+                    .append(orm.Node, filters={'uuid': {'in': batch}}, project=['uuid', 'id'])
+                    .all(batch_size=query_params.batch_size)
+                )
+            )

     new_nodes = len(input_id_uuid) - len(backend_uuid_id)
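
For context, the same batching pattern outside AiiDA internals: a
standalone sketch in plain `sqlite3` showing how splitting an IN query
into `filter_size`-sized batches keeps each statement under the parameter
limit. The `fetch_existing` function and the `node` table are illustrative
names only, not part of aiida-core.

    import sqlite3
    from typing import List, Set

    def fetch_existing(conn: sqlite3.Connection, uuids: List[str], filter_size: int = 999) -> Set[str]:
        """Return the subset of `uuids` already present, querying in parameter-safe batches."""
        found: Set[str] = set()
        for start in range(0, len(uuids), filter_size):
            batch = uuids[start:start + filter_size]
            placeholders = ','.join('?' * len(batch))
            rows = conn.execute(f'SELECT uuid FROM node WHERE uuid IN ({placeholders})', batch)
            found.update(row[0] for row in rows)
        return found

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE node (uuid TEXT PRIMARY KEY)')
    conn.executemany('INSERT INTO node VALUES (?)', [(f'uuid-{i}',) for i in range(5_000)])
    print(len(fetch_existing(conn, [f'uuid-{i}' for i in range(10_000)])))  # prints 5000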
