Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use explicit values for MaxResults in all Glue APIs #20189

Merged
merged 1 commit into from
Dec 28, 2023

Conversation

hashhar
Copy link
Member

@hashhar hashhar commented Dec 21, 2023

Description

Most listing Glue APIs accept a MaxResults parameter which decides how many objects are returned in a single API call. The default values are not documented but observed behaviour is that the default values change
on some basis.

This commit adds explicit values for MaxResults in the APIs which support it to the maximum possible values.

This possibly reduces the number of Glue calls when listing tables, databases or functions in some cases.

This is similar to 4f22b0e and 45dc37d.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Iceberg, Delta
* Reduce number of requests made to AWS Glue when listing tables, schemas or functions in some cases. ({issue}`20189`)

@cla-bot cla-bot bot added the cla-signed label Dec 21, 2023
@github-actions github-actions bot added tests:hive hive Hive connector labels Dec 21, 2023
@hashhar hashhar marked this pull request as ready for review December 21, 2023 08:58
@findinpath
Copy link
Contributor

findinpath commented Dec 21, 2023

I've checked with debugging (in io.trino.plugin.hive.metastore.glue.AwsSdkUtil#getPaginatedResults) with & without your changes (at least for listing databases) and haven't noticed actually any difference.
The schema list result had in both cases retrieved 100 schemas at a time.
Pls update the description of the issue to reflect this. (the number of schemas retrieved by default is not 1 , but 100).
Nevertheless, I don't see any reason why not to have explicit values used for the AWS Glue calls.

Most listing Glue APIs accept a MaxResults parameter which decides how
many objects are returned in a single API call. The default values are
not documented but observed behaviour is that the default values change
on some basis.

This commit adds explicit values for MaxResults in the APIs which
support it to the maximum possible values.

This possibly reduces the number of Glue calls when listing tables,
databases or functions in some cases.

This is similar to 4f22b0e and
45dc37d.
@hashhar hashhar force-pushed the hashhar/glue-page-size branch from 640bd0e to 38008a4 Compare December 28, 2023 05:24
@hashhar hashhar changed the title Reduce number of Glue calls when listing tables, databases or functions Use explicit values for MaxResults in all Glue APIs Dec 28, 2023
@hashhar hashhar merged commit f124589 into master Dec 28, 2023
64 checks passed
@hashhar hashhar deleted the hashhar/glue-page-size branch December 28, 2023 06:44
@github-actions github-actions bot added this to the 436 milestone Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

3 participants