avoid fail of table discover on a big number of tables in database (#308)

* avoid fail of table discover on a big number of tables in database
Co-authored-by: Iakov Gan <[email protected]>
iakov-aws authored Aug 10, 2022
1 parent 51f650a commit dd2fa2a
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions cid/helpers/athena.py
@@ -67,6 +67,11 @@ def DatabaseName(self) -> str:
         """ Check if Athena database exist """
 
         if not self._DatabaseName:
+            if get_parameters().get('athena-database'):
+                self._DatabaseName = get_parameters().get('athena-database')
+                if not self.get_database(self._DatabaseName):
+                    logger.critical(f'Database {self._DatabaseName} not found in Athena catalog {self.CatalogName}')
+                    exit(1)
             # Get AWS Athena databases
             athena_databases = self.list_databases()
             if not len(athena_databases):
@@ -78,7 +83,10 @@ def DatabaseName(self) -> str:
             elif len(athena_databases) > 1:
                 # Remove empty databases from the list
                 for d in athena_databases:
-                    tables = self.list_table_metadata(DatabaseName=d.get('Name'))
+                    tables = self.list_table_metadata(
+                        DatabaseName=d.get('Name'),
+                        max_items=1000,  # This is an empirical limit. A user can have up to 200k tables in one DB, so we need to draw a line somewhere
+                    )
                     if not len(tables):
                         athena_databases.remove(d)
                 # Select default database if present
@@ -144,10 +152,13 @@ def get_database(self, DatabaseName: str=None) -> bool:
             logger.debug(e, stack_info=True)
             return False
 
-    def list_table_metadata(self, DatabaseName: str=None) -> dict:
+    def list_table_metadata(self, DatabaseName: str=None, max_items: int=None) -> dict:
         params = {
             'CatalogName': self.CatalogName,
-            'DatabaseName': DatabaseName if DatabaseName else self.DatabaseName
+            'DatabaseName': DatabaseName or self.DatabaseName,
+            'PaginationConfig':{
+                'MaxItems': max_items,
+            },
         }
         table_metadata = list()
         try:
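
For readers unfamiliar with the `PaginationConfig` entry added to `list_table_metadata`: boto3 paginators accept a `PaginationConfig` dictionary, and its `MaxItems` value caps the total number of items returned across all pages. That is what lets the database discovery stop after 1000 tables instead of walking a database with up to 200k of them. The following is a minimal sketch of the idea, not the repository's code. The function name `list_tables_capped` and the `client` argument are assumptions made for illustration; `get_paginator('list_table_metadata')` and the `TableMetadataList` response key come from the boto3 Athena client.

```python
from typing import Any, Dict, List, Optional


def list_tables_capped(
    client: Any,
    catalog: str,
    database: str,
    max_items: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Collect table metadata, stopping once max_items tables were returned.

    `client` is assumed to behave like boto3.client('athena').
    With max_items=None the paginator returns every table.
    """
    paginator = client.get_paginator('list_table_metadata')
    pages = paginator.paginate(
        CatalogName=catalog,
        DatabaseName=database,
        # MaxItems limits the total across pages, not the size of one page.
        PaginationConfig={'MaxItems': max_items},
    )
    tables: List[Dict[str, Any]] = []
    for page in pages:
        tables.extend(page.get('TableMetadataList', []))
    return tables
```

With a real client this would be called as `list_tables_capped(boto3.client('athena'), 'AwsDataCatalog', 'my_db', max_items=1000)`, where the catalog and database names are placeholders. Because the check in `DatabaseName` only needs to know whether a database has any tables at all, a cap loses nothing there.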

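A side note on the unchanged context lines of the second hunk: the loop calls `athena_databases.remove(d)` while iterating over `athena_databases`. In Python, removing an element from a list during iteration makes the loop skip the element that follows it, so two adjacent empty databases would not both be dropped. The sketch below reproduces the pitfall and shows one way around it. Both helper names and the `is_empty` callback are hypothetical and are not part of the commit.

```python
from typing import Callable, List


def remove_in_loop(items: List[str], is_empty: Callable[[str], bool]) -> List[str]:
    """Mirror the pattern in the diff: mutate the list being iterated."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for item in items:
        if is_empty(item):
            items.remove(item)  # shifts later elements left; the next one is skipped
    return items


def filter_non_empty(items: List[str], is_empty: Callable[[str], bool]) -> List[str]:
    """Build a new list instead of mutating during iteration."""
    return [item for item in items if not is_empty(item)]


empty_dbs = {'a', 'b'}
buggy = remove_in_loop(['a', 'b', 'c'], lambda name: name in empty_dbs)
fixed = filter_non_empty(['a', 'b', 'c'], lambda name: name in empty_dbs)
```

Here `buggy` still contains `'b'` even though it is empty, while `fixed` contains only `'c'`.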
2 comments on commit dd2fa2a

@barrettje

Attempted to set up CID dashboards using cid-deploy today on a new instance of CloudShell. This commit appears to have broken the deployment process: running cid-cmd deploy fails with an error referencing line 70 in athena.py. I was able to get the tool to run by editing line 3 of athena.py to read:
from cid.utils import get_parameter, get_parameters

@darken99
Contributor


Will fix in #322. @iakov-aws, it needs an approving review.
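
The failure reported in the first comment is consistent with how Python resolves names: a missing import does not fail when the module is loaded, only when the line that uses the name actually runs. That would explain why the error pointed at line 70, where `get_parameters()` is called, rather than at the import on line 3. The thread does not state the exact exception, so the sketch below assumes a `NameError`; the function `database_name` is a hypothetical stand-in for the property in the diff.

```python
def database_name():
    # get_parameters is neither imported nor defined in this module,
    # so the lookup fails only when this line is executed.
    return get_parameters().get('athena-database')


# Defining the function above raised nothing; calling it does.
try:
    database_name()
except NameError as exc:
    error = str(exc)
```

After the `try` block, `error` holds the message naming the undefined `get_parameters`. Adding the name to the import, as the commenter did, removes the failure.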
