
Allow scans with the number of partitions exceeding the limit #14225

Merged · 3 commits · Oct 7, 2022

Conversation

arhimondr
Contributor

Description

Treat such scans as scans over non-partitioned tables, without applying partition pruning. This is needed to avoid storing too much information in HiveTableHandle.

Non-technical explanation

Allow a single table scan to read more partitions than the configured limit

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:


@arhimondr
Contributor Author

This is an attempt to remove the limitation on the number of partitions per scan. The change doesn't remove the setting that defines the limit; instead, when the number of partitions is higher than the limit, the system avoids loading partitions eagerly. Certain optimizations (such as partition pruning) are not possible when partitions cannot be loaded in memory.

It also changes how HiveSplitSource works to minimize the memory footprint when the number of partitions is high.
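
To illustrate the approach, here is a toy sketch (not the PR's code; Metastore, Partition, and lazyPartitions are hypothetical stand-ins): only the partition names stay resident, and full partition objects are materialized one at a time as splits are generated.

import com.google.common.collect.Iterators;

import java.util.Iterator;
import java.util.List;

class LazyPartitionEnumeration
{
    record Partition(String name) {}

    interface Metastore
    {
        Partition getPartition(String name);
    }

    // Only the partition names (small strings) are held in memory; each Partition object,
    // which in the real connector carries a map of partition keys, is created lazily.
    static Iterator<Partition> lazyPartitions(Metastore metastore, List<String> partitionNames)
    {
        return Iterators.transform(partitionNames.iterator(), metastore::getPartition);
    }

    public static void main(String[] args)
    {
        Metastore metastore = Partition::new;
        lazyPartitions(metastore, List.of("ds=2022-01-01", "ds=2022-01-02"))
                .forEachRemaining(partition -> System.out.println(partition.name()));
    }
}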

@electrum @findepi @Praveen2112 Could you please take a look and let me know what you think?

@raunaqmorarka
Member

hive.max-partitions-per-scan could be used by admins to block queries that scan a huge number of partitions. If we're going to allow running such queries now without partition pruning, that totally changes the meaning of that config, and it might not be the behaviour an admin wants.
Could we define a separate config for the threshold on the number of partitions after which we avoid loading partitions eagerly, and use that to cap memory usage? If someone wants to allow queries over a large number of partitions, they can raise hive.max-partitions-per-scan in their config.

@arhimondr
Contributor Author

The hive.max-partitions-per-scan setting was added in 2016 by @electrum to avoid memory pressure on the coordinator. This change resolves that memory pressure by not enumerating partitions eagerly in one shot. Since the underlying problem is resolved, I wonder whether a configured limit is necessary at all. I agree that the property name may need to change, and I'm open to suggestions.

@hashhar
Member

hashhar commented Sep 21, 2022

One practical way this config is used is to prevent people from running queries that scan too many partitions (to control costs and to force users to add predicates on partition columns so queries are efficient).

For cost there's now a better option, query.max-scan-physical-bytes (and the query_max_scan_physical_bytes session property), which was added in 339/341.

I don't think anyone would still actually want to enforce limits based on the number of partitions, since that's fairly arbitrary, but I agree with @raunaqmorarka that we should not re-purpose existing configs.

Ideally we can mark the existing config as @DefunctConfig so that a proper error is thrown for people who have it set, and we can add a new config for the new behaviour to act as a kill switch for some time while the new implementation is proven out.
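
For reference, a minimal sketch of what marking the property defunct could look like (assuming the usual airlift configuration setup; this is not a proposed patch, and the class shown is illustrative):

import io.airlift.configuration.DefunctConfig;

// With the property listed as defunct, airlift's configuration framework rejects a catalog
// that still sets it, instead of silently ignoring the value.
@DefunctConfig("hive.max-partitions-per-scan")
public class ExampleHiveConfig
{
    // remaining configuration properties ...
}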

@findepi
Member

findepi commented Sep 21, 2022

I don't think anyone would still actually want to enforce limits based on the number of partitions, since that's fairly arbitrary

Looking at configs like hive.query-partition-filter-required, hive.max-partitions-per-scan is an even more reasonable safety net preventing queries from accessing "too much".

cc @JamesRTaylor

we should not re-purpose existing configs.

Not like this, yes

Ideally we can mark the existing config as @DefunctConfig so that a proper error is thrown for people who have it set

Do we have any migration path? Do we need any?

@raunaqmorarka
Member

For cost there's now a better option, query.max-scan-physical-bytes

query.max-scan-physical-bytes is useful, but the problem with it is that the limit is enforced only after the specified amount of data has already been read. This could waste a lot of resources before killing the query. hive.max-partitions-per-scan has the advantage of stopping the query before we spend worker resources to scan the data.

The hive.max-partitions-per-scan setting was added in 2016 to avoid memory pressure on the coordinator

I don't think users take into account the intent of the original author when using a config. In this case there is no documentation or description available to see that this had anything to do with memory. So anyone already using it can't possibly know that this was not meant to be used as a way to block queries touching a large number of partitions.

I think we should look into deprecating and removing hive.max-partitions-per-scan as a separate issue if we're convinced that it's not useful anymore. Maybe start a discussion in Slack to see if there are people relying on its current behaviour.
We can have a new config for switching to lazy loading of partitions beyond some threshold, and assume that those who want to query a large number of partitions will raise hive.max-partitions-per-scan to a larger value to get past the errors it throws.

I'm also wondering whether this trade-off of reducing coordinator memory usage but giving up on partition pruning makes sense. If someone wanted to run queries on a large number of partitions and coordinator memory was the limiting factor, why wouldn't they get a bigger coordinator instead of incurring the higher cost of running the query without partition pruning? It seems cheaper to upgrade one node rather than consume a lot more resources on the workers.

Is it possible that we can still prune splits on the workers using the predicate on partition columns? E.g. for dynamic partition pruning we have HivePageSourceProvider#shouldSkipSplit; could we fall back to similar split pruning for static predicates as well when partition pruning on the coordinator was skipped, to keep memory usage low?
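
A rough sketch of that idea (this is not Trino's shouldSkipSplit; the String-keyed TupleDomain and the helper below are simplifications for illustration):

import io.trino.spi.predicate.Domain;
import io.trino.spi.predicate.NullableValue;
import io.trino.spi.predicate.TupleDomain;

import java.util.Map;

final class StaticPartitionPruning
{
    private StaticPartitionPruning() {}

    // Returns true when the split's partition key values cannot satisfy the static predicate
    // on partition columns, so the worker could drop the split without reading any data.
    static boolean shouldSkipSplit(Map<String, NullableValue> partitionKeys, TupleDomain<String> partitionPredicate)
    {
        return partitionPredicate.getDomains()
                .map(domains -> domains.entrySet().stream()
                        .anyMatch(entry -> {
                            NullableValue value = partitionKeys.get(entry.getKey());
                            return value != null && !entry.getValue().includesNullableValue(value.getValue());
                        }))
                // TupleDomain.none() means the predicate is unsatisfiable: skip every split
                .orElse(true);
    }
}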

@arhimondr force-pushed the remove-partition-limit branch from 28bcba0 to 3148f77 on September 23, 2022 at 18:59
@arhimondr
Contributor Author

Thanks everybody for the feedback.

I updated the PR, preserving hive.max-partitions-per-scan. That property can still be used to enforce a limit on the maximum number of partitions per scan.

Instead, I introduced a new property, hive.max-partitions-for-eager-load. It controls how many partitions can be loaded eagerly on the coordinator and is set to 100,000 by default.

This PR also makes it possible to scan tables with more partitions than the value set by hive.max-partitions-for-eager-load, by refactoring HiveSplitManager to avoid loading partitions eagerly. However, when the number of partitions exceeds that threshold, some optimizations (such as filter pushdown) are not performed. While this is not ideal (and we should probably improve it), the PR is an incremental improvement that allows the engine to be used on tables with a high number of partitions, although with potentially reduced efficiency.
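
For reference, a sketch of the shape the two properties take (assuming a typical airlift-style config class; the descriptions are illustrative, and the 1,000,000 per-scan default reflects the change in this PR):

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;

public class PartitionLimitsConfig
{
    private int maxPartitionsPerScan = 1_000_000;
    private int maxPartitionsForEagerLoad = 100_000;

    public int getMaxPartitionsPerScan()
    {
        return maxPartitionsPerScan;
    }

    // Hard limit: a scan over more partitions than this fails
    @Config("hive.max-partitions-per-scan")
    @ConfigDescription("Maximum number of partitions a single table scan may read")
    public PartitionLimitsConfig setMaxPartitionsPerScan(int maxPartitionsPerScan)
    {
        this.maxPartitionsPerScan = maxPartitionsPerScan;
        return this;
    }

    public int getMaxPartitionsForEagerLoad()
    {
        return maxPartitionsForEagerLoad;
    }

    // Above this threshold partitions are enumerated lazily and coordinator-side
    // partition pruning is skipped
    @Config("hive.max-partitions-for-eager-load")
    @ConfigDescription("Maximum number of partitions loaded eagerly on the coordinator")
    public PartitionLimitsConfig setMaxPartitionsForEagerLoad(int maxPartitionsForEagerLoad)
    {
        this.maxPartitionsForEagerLoad = maxPartitionsForEagerLoad;
        return this;
    }
}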

Please take another look.

@@ -2722,7 +2722,9 @@ public OptionalLong executeDelete(ConnectorSession session, ConnectorTableHandle
         metastore.truncateUnpartitionedTable(session, handle.getSchemaName(), handle.getTableName());
     }
     else {
-        for (HivePartition hivePartition : partitionManager.getOrLoadPartitions(metastore, handle)) {
+        Iterator<HivePartition> partitions = partitionManager.getPartitions(metastore, handle);
Member

Do we skip the partition check during DELETE?

Contributor Author

Yeah, I don't think there's a real reason to limit the number of partitions for DELETE. Also, the property name says "scan".

@@ -221,9 +227,6 @@ public ConnectorSplitSource getSplits(
         // validate bucket bucketed execution
         Optional<HiveBucketHandle> bucketHandle = hiveTable.getBucketHandle();

-        // sort partitions
Member

Don't we have to sort them?

Contributor Author

It looks like it is not necessary. It could be that the partitions were sorted for debugging convenience. @electrum / @dain, do you remember why it might be necessary?

addLoaderIfNecessary();
}

private void addLoaderIfNecessary()
Member

nit: Can we extract this as a preparatory commit?

Contributor Author

What specifically? The commit is "Avoid loading partitions eagerly in HiveSplitManager", and the changes to BackgroundHiveSplitLoader are necessary to achieve that.

@@ -70,7 +70,7 @@
     private boolean singleStatementWritesOnly;

     private DataSize maxSplitSize = DataSize.of(64, MEGABYTE);
-    private int maxPartitionsPerScan = 100_000;
+    private int maxPartitionsPerScan = 1_000_000;
Member

Won't it impact memory? IIRC this check was added to restrict the memory usage of HivePartition, which internally has a Map for the partition keys.

cc: @findepi / @electrum

Contributor Author

The "Avoid loading partitions eagerly in HiveSplitManager" commit improves split enumeration to avoid parsing/loading partitions eagerly, so memory should no longer be a concern. The property is now mostly for system administrators to prevent unnecessarily large scans.

ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
executor.execute(() -> checkState(!lock.isHeldByCurrentThread(), "executor is a direct executor"));
Member

Do we actually need the lock? This seems to depend on the fact that the exception is thrown in the calling thread, so we could simply do

executor.execute(() -> {
    throw new IllegalArgumentException("executor is a direct executor");
});

Contributor Author

I thought about that. I'm worried that if an executor is configured to handle uncaught exceptions (for example, by logging them), it will create unnecessary noise in the output.
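
A self-contained illustration of why the lock-based check sidesteps that concern (assumed test shape, not the PR's code): with a real executor the checkState passes quietly on the pool thread, so no exception ever reaches an uncaught-exception handler, while a direct executor runs the task on the calling thread, which still holds the lock, and fails synchronously inside execute().

import com.google.common.util.concurrent.MoreExecutors;

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

import static com.google.common.base.Preconditions.checkState;

class DirectExecutorCheck
{
    static void assertNotDirectExecutor(Executor executor)
    {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            executor.execute(() -> checkState(!lock.isHeldByCurrentThread(), "executor is a direct executor"));
        }
        finally {
            lock.unlock();
        }
    }

    public static void main(String[] args)
    {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        assertNotDirectExecutor(pool);                            // passes: task runs on the pool thread
        pool.shutdown();

        assertNotDirectExecutor(MoreExecutors.directExecutor());  // throws IllegalStateException
    }
}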

Contributor Author

[screenshot attached]

@arhimondr force-pushed the remove-partition-limit branch from 3148f77 to 1161609 on October 6, 2022 at 10:31
@arhimondr
Contributor Author

Updated

@Chaho12
Member

Chaho12 commented Apr 27, 2023

Thanks for a very interesting feature. We were hoping for exactly this kind of feature to limit the maximum number of partitions.

Before this, we were using ByteBuddy to hook the HivePartitionManager class to prevent users from running abusive queries and slowing down the cluster, by limiting the maximum number of partitions they can query in a single query.

We simply check and return an error if the number of partitions read is greater than the configured value, for specific tables only (tables that we are sure have a huge amount of data per partition).
