Track Glue API calls that were untracked #11059

Merged
merged 3 commits into from
Feb 22, 2022

Member

@homar homar commented Feb 16, 2022

Description

One call to Glue was not being tracked.

Rename GlueMetastoreStats.renameTable to GlueMetastoreStats.updateTable.

General information

Is this change a fix, improvement, new feature, refactoring, or other?

a fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

a connector

How would you describe this change to a non-technical end user or system administrator?

It allows calls to the external Glue API to be tracked more precisely.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`5678`)

Member

@findepi findepi left a comment

are you positive this is the only untracked one?

@@ -370,10 +370,10 @@ public void updateTableStatistics(String databaseName, String tableName, AcidTra
final Map<String, String> statisticsParameters = updateStatisticsParameters(table.getParameters(), updatedStatistics.getBasicStatistics());
tableInput.setParameters(statisticsParameters);
table = Table.builder(table).setParameters(statisticsParameters).build();
-glueClient.updateTable(new UpdateTableRequest()
+stats.getReplaceTable().call(() -> glueClient.updateTable(new UpdateTableRequest()
Member

the stats should be called "updateTable"
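
For readers unfamiliar with the pattern in the diff above, the following is a minimal sketch of what wrapping a client call in a per-API stats object can look like. `SketchApiStats` and its counters are hypothetical stand-ins written for this note, not the actual `GlueMetastoreApiStats` code; only the `call(() -> ...)` shape is taken from the diff.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a per-API stats object; not the class from this PR.
class SketchApiStats
{
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Runs the callable, always recording the elapsed time and
    // counting a failure if the callable throws.
    <T> T call(Callable<T> callable)
    {
        long start = System.nanoTime();
        try {
            return callable.call();
        }
        catch (Exception e) {
            failures.incrementAndGet();
            if (e instanceof RuntimeException) {
                throw (RuntimeException) e;
            }
            throw new RuntimeException(e);
        }
        finally {
            calls.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long getCalls()
    {
        return calls.get();
    }

    long getFailures()
    {
        return failures.get();
    }

    long getTotalNanos()
    {
        return totalNanos.get();
    }
}
```

With something like this in place, an untracked call such as `client.updateTable(request)` becomes `stats.call(() -> client.updateTable(request))`, and naming the stats object after the API it wraps keeps the exported counters easy to match to Glue operations.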

@findepi
Member

findepi commented Feb 16, 2022

cc @losipiuk @alexjo2144

@homar
Member Author

homar commented Feb 16, 2022

> are you positive this is the only untracked one?

I looked through other usages of glueClient and didn't spot anything else.

@findepi
Member

findepi commented Feb 16, 2022

What about getAllDatabases and other places where we call getPaginatedResults?
or

partitionUpdateRequestsFutures.add(glueClient.batchUpdatePartitionAsync(new BatchUpdatePartitionRequest()
.withCatalogId(catalogId)
.withDatabaseName(table.getDatabaseName())
.withTableName(table.getTableName())
.withEntries(partitionUpdateRequestsPartition)));
?

@homar
Member Author

homar commented Feb 16, 2022

> What about getAllDatabases and other places where we call getPaginatedResults? or
>
> partitionUpdateRequestsFutures.add(glueClient.batchUpdatePartitionAsync(new BatchUpdatePartitionRequest()
> .withCatalogId(catalogId)
> .withDatabaseName(table.getDatabaseName())
> .withTableName(table.getTableName())
> .withEntries(partitionUpdateRequestsPartition)));
>
> ?

getPaginatedResults is covered everywhere, just look 1 or 2 lines above

Regarding the one you pointed out: maybe I am wrong, but I thought there is no good way to measure this. It is an async call, so we would pretty much measure only the time of the async invocation itself, which is not very informative.

@homar homar closed this Feb 16, 2022
@homar homar reopened this Feb 16, 2022
@homar homar force-pushed the homar/track_glue_api_calls branch from ff65249 to 72d5f58 Compare February 16, 2022 12:04
@losipiuk
Member

that was -> that were

Member

@losipiuk losipiuk left a comment

LGTM. The commit message could be a bit more accurate, as you are also changing the counter name.

@losipiuk
Member

> > What about getAllDatabases and other places where we call getPaginatedResults? or
> >
> > partitionUpdateRequestsFutures.add(glueClient.batchUpdatePartitionAsync(new BatchUpdatePartitionRequest()
> > .withCatalogId(catalogId)
> > .withDatabaseName(table.getDatabaseName())
> > .withTableName(table.getTableName())
> > .withEntries(partitionUpdateRequestsPartition)));
> >
> > ?
>
> getPaginatedResults is covered everywhere, just look 1 or 2 lines above
>
> Regarding the one you pointed out: maybe I am wrong, but I thought there is no good way to measure this. It is an async call, so we would pretty much measure only the time of the async invocation itself, which is not very informative.

You may want to add a callback on the returned future to bump the statistic.
May be a followup.
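
A rough sketch of the callback idea, for illustration only. It uses `CompletableFuture` as a stand-in for whatever the async client call returns, and `SketchCounter` is a hypothetical stats holder; neither is taken from this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stats holder standing in for a per-API stats object.
class SketchCounter
{
    final AtomicLong successes = new AtomicLong();
    final AtomicLong failures = new AtomicLong();
    final AtomicLong totalMillis = new AtomicLong();
}

class FutureStatsSketch
{
    // Attaches a completion callback so the recorded time covers the whole
    // async operation, not just the time spent submitting it.
    static <T> CompletableFuture<T> track(CompletableFuture<T> future, SketchCounter counter, long startMillis)
    {
        return future.whenComplete((result, error) -> {
            counter.totalMillis.addAndGet(System.currentTimeMillis() - startMillis);
            if (error != null) {
                counter.failures.incrementAndGet();
            }
            else {
                counter.successes.incrementAndGet();
            }
        });
    }
}
```

The point of the callback is that the statistic is updated when the operation finishes, which addresses the concern that timing the submitting call alone is not informative.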

@homar homar force-pushed the homar/track_glue_api_calls branch 2 times, most recently from c2cf1ca to e739f6b Compare February 17, 2022 13:44
@homar homar changed the title Track Glue API calls that was untracked Track Glue API calls that were untracked Feb 17, 2022
@homar homar force-pushed the homar/track_glue_api_calls branch from e739f6b to a1aa458 Compare February 17, 2022 15:05
@alexjo2144
Member

Not strictly necessary but a possible improvement from what we have. Would it be better if getPaginatedResults took a GlueMetastoreApiStats as a parameter and tracked individual API calls rather than wrapping the whole method with one?

Seems like that might be a more helpful way of tracking those stats to me.
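
To make the suggestion concrete, here is a minimal sketch of a paginating helper that takes a stats object and records each page request separately. The parameter order loosely follows the call sites quoted in this thread, but `PaginationSketch` and `PageCallStats` are hypothetical names, and the real helper's signature may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical per-API stats: one count and one time sample per request.
class PageCallStats
{
    final AtomicLong calls = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();
}

class PaginationSketch
{
    // Fetches every page, timing each individual request rather than
    // wrapping the whole loop in a single measurement.
    static <Request, Result> List<Result> getPaginatedResults(
            Function<Request, Result> submission,
            Request request,
            BiConsumer<Request, String> setNextToken,
            Function<Result, String> extractNextToken,
            PageCallStats stats)
    {
        List<Result> pages = new ArrayList<>();
        String nextToken = null;
        do {
            setNextToken.accept(request, nextToken);
            long start = System.nanoTime();
            Result result;
            try {
                result = submission.apply(request);
            }
            finally {
                stats.calls.incrementAndGet();
                stats.totalNanos.addAndGet(System.nanoTime() - start);
            }
            pages.add(result);
            nextToken = extractNextToken.apply(result);
        }
        while (nextToken != null);
        return pages;
    }
}
```

Tracking inside the loop means a listing that needs three pages shows up as three API calls, which is closer to what the remote service actually sees.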

.withDatabaseName(table.getDatabaseName())
.withTableName(table.getTableName())
.withEntries(partitionUpdateRequestsPartition)));
.withCatalogId(catalogId)
Member

there are more *async calls than just this one. We should address them all together.

Comment on lines 57 to 63
public void updateFailures()
{
totalFailures.update(1);
}

public void updateTime(long elapsedMilliseconds)
{
time.add(elapsedMilliseconds, MILLISECONDS);
}
Member

replace with a single method

void recordCall(long executionTime, boolean failure)
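
A small sketch of what the combined method could look like. `RecordCallSketch` uses plain `AtomicLong` fields as an assumption for illustration; the code under review records into `time` and `totalFailures` instead.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplified stats class, not the one from this PR.
class RecordCallSketch
{
    private final AtomicLong totalMillis = new AtomicLong();
    private final AtomicLong totalFailures = new AtomicLong();

    // Single entry point: always record the elapsed time,
    // and count a failure only when the call failed.
    void recordCall(long executionTimeInMillis, boolean failure)
    {
        totalMillis.addAndGet(executionTimeInMillis);
        if (failure) {
            totalFailures.incrementAndGet();
        }
    }

    long getTotalMillis()
    {
        return totalMillis.get();
    }

    long getTotalFailures()
    {
        return totalFailures.get();
    }
}
```

Having one method makes it harder for a caller to record the time but forget the failure, or the other way around.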

@losipiuk
Member

> Not strictly necessary but a possible improvement from what we have. Would it be better if getPaginatedResults took a GlueMetastoreApiStats as a parameter and tracked individual API calls rather than wrapping the whole method with one?
>
> Seems like that might be a more helpful way of tracking those stats to me.

Agreed. Currently lots of calls are still untracked.

@homar homar force-pushed the homar/track_glue_api_calls branch from a1aa458 to 1fc15e3 Compare February 18, 2022 13:12
.flatMap(List::stream)
.map(com.amazonaws.services.glue.model.Database::getName)
.collect(toImmutableList());
return databaseNames;
Member

If you retained the variable, the diff would be readable.

(btw the IDE asks me to inline such variables, but I actually find them useful during debugging)


public StatsRecordingAsyncHandler(GlueMetastoreStats stats, long startTimeInMillis)
{
this.stats = stats;
Member

requireNonNull

.filter(tableFilter)
.map(com.amazonaws.services.glue.model.Table::getName)
.collect(toImmutableList());
return tableNames;
Member

same, no idea what changed

@Override
public void onError(Exception e)
{
stats.getBatchUpdatePartition().recordCall(System.currentTimeMillis() - startTimeInMillis, true);
Member

You should use a different GlueMetastoreApiStats instance for each place where you use StatsRecordingAsyncHandler, not stats.getBatchUpdatePartition() all the time.
Change the parametrization of StatsRecordingAsyncHandler to use GlueMetastoreApiStats instead of GlueMetastoreStats.

Member Author

@homar homar Feb 18, 2022

yes definitely, I forgot to change that, thanks for catching this
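
As an illustration of the requested change, the sketch below passes the per-API stats object into the handler, so each call site can supply its own. `SketchAsyncHandler` is a local stand-in for the SDK's handler interface, declared here only so the example compiles on its own, and `HandlerApiStats` is a hypothetical stats class; the `requireNonNull` check reflects the earlier review comment.

```java
import static java.util.Objects.requireNonNull;

import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for the SDK's async handler callback interface.
interface SketchAsyncHandler<Request, Result>
{
    void onError(Exception exception);

    void onSuccess(Request request, Result result);
}

// Hypothetical per-API stats object.
class HandlerApiStats
{
    final AtomicLong calls = new AtomicLong();
    final AtomicLong failures = new AtomicLong();
    final AtomicLong totalMillis = new AtomicLong();

    void recordCall(long executionTimeInMillis, boolean failure)
    {
        calls.incrementAndGet();
        totalMillis.addAndGet(executionTimeInMillis);
        if (failure) {
            failures.incrementAndGet();
        }
    }
}

// Records into whichever per-API stats instance the call site passes in.
class StatsRecordingHandlerSketch<Request, Result>
        implements SketchAsyncHandler<Request, Result>
{
    private final HandlerApiStats stats;
    private final long startTimeInMillis;

    StatsRecordingHandlerSketch(HandlerApiStats stats, long startTimeInMillis)
    {
        this.stats = requireNonNull(stats, "stats is null");
        this.startTimeInMillis = startTimeInMillis;
    }

    @Override
    public void onError(Exception exception)
    {
        stats.recordCall(System.currentTimeMillis() - startTimeInMillis, true);
    }

    @Override
    public void onSuccess(Request request, Result result)
    {
        stats.recordCall(System.currentTimeMillis() - startTimeInMillis, false);
    }
}
```

A call site for one API would then pass that API's stats object, and a call site for another API would pass a different one, so the counters no longer all land in the same bucket.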

@homar homar force-pushed the homar/track_glue_api_calls branch 2 times, most recently from edafa22 to 61df66b Compare February 21, 2022 11:02
.collect(toImmutableList());
return databaseNames;
});
ImmutableList<String> databases = getPaginatedResults(
Member

use List

And keep variable name

Member

Also add a followup commit which just gets rid of variables.

.withDatabaseName(databaseName),
GetTablesRequest::setNextToken,
GetTablesResult::getNextToken,
stats.getGetAllTables())
Member

I think we should merge the stats counters for getting views and tables.

Member

Can you also rename the dropDatabase stats counter to deleteDatabase?

(you can put all the renames in single commit)

Member

Same for renameDatabase -> updateDatabase

Member

And dropTable -> deleteTable

Member

batchGetPartitionAsync is not covered with stats from what I see.

Member

dropPartition -> deletePartition

@homar homar force-pushed the homar/track_glue_api_calls branch from 61df66b to fabf601 Compare February 21, 2022 12:43
@homar homar force-pushed the homar/track_glue_api_calls branch from fabf601 to 442a800 Compare February 21, 2022 14:24
Member

@findepi findepi left a comment

skimmed, lgtm

@@ -93,13 +92,6 @@ public GlueMetastoreApiStats getGetTable()
return getTable;
}

-@Managed
-@Nested
-public GlueMetastoreApiStats getGetAllViews()
Member

This belongs to previous commit, where the last usage was removed.

@homar homar force-pushed the homar/track_glue_api_calls branch from 442a800 to 017d982 Compare February 22, 2022 08:47
@@ -1126,4 +1130,29 @@ public void revokeTablePrivileges(String databaseName, String tableName, String
{
return ImmutableSet.of();
}

+static class StatsRecordingAsyncHandler<Request extends AmazonWebServiceRequest, Result>
Member

Seems like this belongs in GlueMetastoreApiStats

Member

Let's change it as a followup :)

@losipiuk losipiuk merged commit 6293d52 into trinodb:master Feb 22, 2022
@github-actions github-actions bot added this to the 372 milestone Feb 22, 2022
4 participants