Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust logging level to debug #20214

Merged
merged 3 commits into from
Dec 22, 2023
Merged

Adjust logging level to debug #20214

merged 3 commits into from
Dec 22, 2023

Conversation

martint
Copy link
Member

@martint martint commented Dec 22, 2023

They result in very verbose logs in tests and are not actionable for anyone managing a Trino deployment

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

They result in very verbose logs in tests and are not actionable
for anyone managing a Trino deployment
@cla-bot cla-bot bot added the cla-signed label Dec 22, 2023
Arguments should go on the same line or each on a separate line.
@martint martint requested review from dekimir, losipiuk and dain December 22, 2023 20:08
@@ -251,15 +251,15 @@ synchronized void updateTaskInfo(TaskInfo newTaskInfo)
TaskStatus newRemoteTaskStatus = newTaskInfo.getTaskStatus();

if (!newRemoteTaskStatus.getTaskId().equals(taskId)) {
log.warn("Task ID mismatch on remote task status. Member task ID is %s, but remote task ID is %s. This will confuse finalTaskInfo listeners.",
log.debug("Task ID mismatch on remote task status. Member task ID is %s, but remote task ID is %s. This will confuse finalTaskInfo listeners.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these fill the logs, then I don't understand the implicit contract of this method. These mismatched IDs can end up in finalTaskInfo here:

finalTaskInfo.compareAndSet(Optional.empty(), Optional.of(newValue));

... and then any listeners added to this TaskInfoFetcher via addFinalTaskInfoListener will be called with the task ID of some other task. That other ID will be looked up in StagePartition.tasks here:

... which results in action on some other task than where the listener was registered. That can't be right, can it?

Are these warnings perhaps needlessly triggering when the task state is not FINISHED? If so, we can add an additional guard to the if condition.

/cc @losipiuk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see lots of Future calls to retrieveAndDropSpoolingOutputStats will fail. And actually it makes sense as those are only set for FTE execution, and not for pipelined.
Let's move those logs to debug and enable logging in configuration.
Separatly @dekimir please send out PR to filter out this line for non FTE queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

And Martin is correct in saying this isn't actionable for Trino administrators, just developers. Perhaps I should've used a check call instead of the log?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkargument would be fine here I think. We need to be sure that condition is fine - and will not backfire killing proper queries :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please send out PR to filter out this line for non FTE queries.

#20216

@losipiuk losipiuk merged commit b2e98e9 into trinodb:master Dec 22, 2023
79 of 88 checks passed
@github-actions github-actions bot added this to the 436 milestone Dec 22, 2023
@martint martint deleted the fix-logging branch December 22, 2023 22:26
dekimir added a commit to dekimir/trino that referenced this pull request Dec 22, 2023
In pipelined mode, the spooling stats are always null, which fills the log needlessly.

trinodb#20214 (comment)
dekimir added a commit to dekimir/trino that referenced this pull request Dec 26, 2023
In pipelined mode, the spooling stats are always null, which fills the log needlessly.

trinodb#20214 (comment)
wendigo pushed a commit that referenced this pull request Dec 26, 2023
In pipelined mode, the spooling stats are always null, which fills the log needlessly.

#20214 (comment)
dprophet added a commit to bloomberg/trino that referenced this pull request Jan 22, 2024
* Use correct HostnameVerifier when verify-hostnames set to true

Previously we were using `NoopHostnameVerifier` which turns off hostname verification,
if the HostnameVerifier is not set it uses `DefaultHostnameVerifier` which performs
hostname verification

* Update AWS SDK v1 to 1.12.610

* Update Azure SDK to 1.2.19

* Update grpc to 1.60.0

* Update reactor-core to 3.4.34

* Update JNA to 5.14.0

* Update kafka-clients to 3.6.1

* Update postgresql to 42.7.1

* Update AWS SDK v2 to 2.21.43

* Add example SQL routine for formatting bytes

Co-authored-by: Jan Waś <[email protected]>

* Upgrade Hudi test version to 0.14.0

* Remove product tests targeting Hive 2 or older

* Add a config to toggle local scheduling of Mongo splits

Additionally disable local scheduling of Mongo splits

* Test Iceberg metadata file access for materialized view column queries

* Separate properties from MaterializedViewDefinition

Materialized view properties can be more expensive to retrieve
and are unused except for SHOW CREATE queries. Splitting them
from the main definition reduces file system calls for metadata
queries against the Iceberg connector.

* Move dependency to root pom

Multiple modules depend on org.threeten.bp.Duration.
Move to the root pom.

* Fix GcsFileSystemConfig validation message

Cleanup

* Configure retries for trino-filesystem-gcs

* Replace TableToPartitionMapping with hiveColumnCoercion map

* Unblock driver based on configurable timeout

* Add support for closing idle writers

This change will help in reducing memory
footprint in long-running partitioned
writes. Thus, we can scale more to increase
the write performance.

* Fix incorrect LIKE for patterns with repeating substrings

After the initial KMP mismatch, the next match was being skipped
at position i == longest match.

* Add docs for mongodb.allow-local-scheduling

* Add support for JSON_TABLE

* Add docs for elasticsearch.tls.verify-hostnames

* Revert "Allow differentiating materialized views from tables efficiently"

This needs more though and a different approach that does not involve modifying the
information_schema tables.

This reverts commit 1c5e5eb.

* Add Trino 435 release notes

* [maven-release-plugin] prepare release 435

* [maven-release-plugin] prepare for next development iteration

* Remove redundant code from Kudu TypeHelper.getObject

* Reorder conditions in Kudu TypeHelper

* Remove unnecessary test

CREATE TABLE with supported type is tested in type mapping tests already

* Use default stats column filter for the retrieval of active files

* Disable failing bigquery test

* Fix serialization failure when invoking json_table

The concrete type declared in the JsonTableFunctionHandle is not directly serializable.
The deserializer binding is only declared for the Type class.

* Disallow changing the location of a table in CREATE OR REPLACE  statement

* Flush dictionaries before releasing partitioners

PagePartitioners should either flatten their dictionary mode appenders
into direct mode, or force flush their current page to preserve the
dictionary encoding before being released to the pool for reuse since it
is not possible for the appender to observe the same dictionary input
when used from a different driver and the dictionary appenders do not
accurately report their size and therefore may have been preventing a
flush from occurring up until this point.

Also fixes an issue where dictionary block outputs were severely under
reporting their output size in bytes.

* Ensure "nessie.commit.id" table property is set when updating the table

Spark sets the table property NESSIE_COMMIT_ID_PROPERTY in
NessieTableOperations#loadTableMetadata.
Then NessieIcebergClient.commitTable uses this property.

In Trino, this property is never set but used in
NessieIcebergClient.commitTable as it is a common code.
Hence, the commit id is old and doesn't allow new commits.

Use the common code (available From Iceberg 1.4.0)
NessieUtil.updateTableMetadataWithNessieSpecificProperties in Trino,
which handles setting the property like "nessie.commit.id".

* Make logPaths and listeners final

* Process only files smaller than the threshold for OPTIMIZE

Filter for processing the files which are smaller than
the threshold and consider a partition for optimization
if and only if it contains at least two such files.

* Fix identation

* Fix test comment typo

* Fix negative cost estimates in LocalQueryRunner

Fixes negative cost estimated failure in
`TestQueryPlanDeterminism.testUnionAllAboveBroadcastJoin`.

In `LocalQueryRunner` we were assuming 0 "node count for stats", and
getting negative cost in the formula (somewhere deep in the stats
calculator):

    double cpuCost = buildSideSize * (estimatedSourceDistributedTaskCount - 1);

where `estimatedSourceDistributedTaskCount` is the node count.

* Add guards for 0 #nodes in CBO

Current formulas like

    double cpuCost = buildSideSize * (estimatedSourceDistributedTaskCount - 1);

would produce negative values when node count is 0.
In fact, 0 nodes cannot execute a query, so 0 is not a valid #nodes
estimate.

* Reject negative cost components

No work has negative cost. Previously such negative costs could go
unnoticed, or noticed only if explain plan is produced.

* Remove unused field

* Make DistributedQueryRunner.withTracing idempotent

* Log finalQueryInfo when FTE scheduler stalled

* Increase GCLocker retries for Maven

The build was observed failing with

    Warning: [325.033s][warning][gc,alloc] mvn-builder-trino-parquet: Retried waiting for GCLocker too often allocating 4194306 words

on CI.

* Fix table listing concurrency in FileHiveMetastore

Before the change, `FileHiveMetastore` would rely on
`fileSystem.listFiles` to find schemas or tables within a schema.
`listFiles` is recursive and, on a local file system, fails when
files/directories are being modified concurrently. This commit replaces
recursive `listFiles` listing with non-recursive `listDirectories`
leveraging the fact that listed entities are represented as directories.

* Increase timeout for TestWindow.testManyFunctionsWithSameWindow

The test sometimes failed due to a timeout on CI.

* Adjust BigQuery CI profile

* Rearrange candidate-host logic in bin-packing allocator

Results should be identical, but the new code is easier to extend for failover.

* Factor out dropCoordinatorsIfNecessary

* Update requirement to Oracle 19

* Reduce repetitions of testCreateFileRetry

* Convert PTL tests to JUnit

Convert product tests launcher's test to JUnit. This does not change
product tests themselves.

* Remove TestNG from test classpath

Remove TestNG from classpath of modules that do not use it anymore.

* Migrate trino-record-decoder off TestNG assertions

* Migrate trino-prometheus off TestNG assertions

* Update flyway to 10.3.0

* Update RoaringBitmap to 1.0.1

* Update redshift-jdbc42 to 2.1.0.24

* Update metrics-core to 4.2.23

* Update nimbus-jose-jwt 9.37.3

* Update oauth2-oidc-sdk to 11.8

* Update AWS SDK v1 to 1.12.618

* Update AWS SDK v2 to 2.21.45

* Update plexus-xml to 4.0.3

* Use sf1 table for testMultipleWritersWithSkewedData

* Add example for listagg with filter

* Add example SQL routines for charts

* Add comment why disabling pushdown on MariaDB REAL type

* Expand catalog routine docs

Add explicit mention what connector support storage so readers dont have
to go looking through all connectors to find out.

* Document filtering during aggregation for `listagg`

* Randomize identifiers in BaseSqlServerConnectorTest

* Fail with proper error on overflow in from_unixtime

The exception is due to the value user entered hence should be reported as a TrinoException rather than a GENERIC_INTERNAL_ERROR

* Fix page partitioner output bytes on release handling

* Inline pom property for test profiles

When defining test profiles, we don't use properties.

* Make createExclusive atomic

`GcsTransactionLogSynchronizer` requires that the new transaction log file
is created exclusively and atomically. Otherwise concurrent readers of a
table may see a file partially written (or not written yet, just empty)
and fail. This commit:

- fixes the GCS native filesystem implementation so that it's atomic
- changes the method signature to indicate atomic creation and
  remove default not atomic implementation.
- makes it clear in-memory buffering occurs (previously it was
  implicitly done in `HdfsOutputFile` which could be considered
  surprising)
- in `AbstractTestTrinoFileSystem` decouples "is create() exclusive" and
  "supports createExclusive" behaviors. For example local filesystem has
  the former, GCS filesystem has both and S3 filesystem has none.

* Remove redundant classloader wrapper from exclude_columns

Engine-provided functions do not need
`ClassLoaderSafeConnectorTableFunction` wrapper.

* Remove redundant classloader wrapper from sequence

Engine-provided functions do not need
`ClassLoaderSafeConnectorTableFunction` wrapper.

* Remove redundant air.test.parallel property

* Fix ConstantExpression serialization

When `ConstantExpression` was created with a Trino type and a value
that didn't match this type's java type (for example Trino INTEGER with
value being Integer, instead of Long), the `ConstantExpression` would
fail to serialize to JSON. Such failure happens during sending task
status updates to workers and is currently logged and ignored, leading
to query hang. The problem could be triggered with SQL routines, which
utilize `RowExpression` (including `ConstantExpression`) serialization.

Thus, this fixes execution of SQL routines involving Row field
dereference.

* Fix TestAccessControl execution

The test must run single-threaded and was run like that until
9a7cf10.

* Add tracing for query start

Query start may be time consuming operation and should be traced.
Especially in case of `DataDefinitionExecution`, the start may cover all
of the query execution.

* Fix code indentation

* Remove redundant PER_CLASS & CONCURRENT annotations

Tests extending `AbstractTestQueryFramework` inherit `PER_CLASS`
lifecycle and `CONCURRENT` behavior and that should be expected given
how `AbstractTestQueryFramework` works. No need to define that in
subclasses, and indeed most subclasses do not define these behaviors
again.

* Remove unused field from TestIcebergMetadataListing

* Factor out common code in TestUniformNodeSelector

* Warn about unexpected task updates

We observed duplicate attempts to drop spoolingOutputStats in some logs.  These warnings will help diagnose what's going on.

* Add example SQL routine for formatting topn results

* Fix incorrectly concurrent tests

Some tests being fixed simply fail because they do not support
concurrent test method execution.
Others worked, but could allocate lots of resources, potentially leading
to OOM failures on CI.

The tests where found by looking at
9a7cf10 commit:

    git show 9a7cf10 | grep -A 3 '@test(singleThreaded = true)'

* Set up only one concurrent query runner in TestQueryManager

`TestQueryManager` defined a default `queryRunner` used by one test
method, but other test methods created additional runners. This commit
moves the default one into the only test method that used it. In the
result, there is at most 1 query runner at the same time created during
test execution.

* Add tracing for SplitManager.getSplits

Sometimes, `ConnectorSplitManager.getSplits` can take long to construct
`ConnectorSplitSource`. For example, in Delta, there is IO work being
done before `ConnectorSplitSource` is returned. This work would better
be delayed until `ConnectorSplitSource.getNextBatch` is invoked, but
currently this is not the case. Let's add tracing so that time spent in
`ConnectorSplitManager.getSplits` is attributable.

* Rename parameter in helper method

The parameter refers to "a stage span", not to the top level "query
span".

* Hide FTE tests configs

Failure injection is a for-tests functionality. Hide its config, they
are not meant to be used.

* Fix rendering of time/timestamp with timezone type name

* Update hudi test resource for hudi_non_part_cow table

The change is to make the schema sync with hudi_cow_pt_tbl.

* Optional check for query partition filter for Hudi

* Throw IndexOutOfBoundsException in InputStream implementations

* Catch exception in sendUpdate

Catch exception in sendUpdate. Currently in case we get unexpected
exception from that method query would fail as thread responsible for
updating task state will terminate.

* Restore lifecycle annotations in tests

They are not redundant, as the tests contain lifecycle annotations
such as BeforeAll and AfterAll.

The fact that AbstractTestQueries also has them is an implementation
detail to manage its own private internal state, and subclasses
should not rely on them being present.

This reverts commit f6ea5b4.

* Adjust logging level to debug

They result in very verbose logs in tests and are not actionable
for anyone managing a Trino deployment

* Fix formatting

Arguments should go on the same line or each on a separate line.

* Remove unnecessary calls to toString()

* Remove unnecessary simplification step

This step is not needed anymore as
corresponding tests do not fail.

* Do not generate redundant straddling predicates by equality inference

When there are predicates like a1 = b1 and a2 = a1 + 1, then
equality inference would derive staddling predicate for
a2 = b1 + 1, which is redundant to a1 = b1, a2 = a1 + 1.
This commit makes sure that redundant straddling predicates
are not generated.

* Update JLine to 3.25.0

* Fix SQL Server DATETIMEOFFSET for old dates

The value returned via the microsoft.sql.DateTimeOffset when
converted to an OffsetDateTime is changed for old dates due
to an issue in the JDBC driver.

This changes retrieving datetimeoffset types from SQL Server
to use getString instead of getObject and OffsetDateTime.

Predicate pushdown is now disabled for this type due to test
failures with IS NOT DISTINCT FROM predicates if the value
is before the year 1583.

* Only log null spooling stats in FTE mode

In pipelined mode, the spooling stats are always null, which fills the log needlessly.

trinodb#20214 (comment)

* Fix failing test on master

* Use explicit values for MaxResults in all Glue APIs

Most listing Glue APIs accept a MaxResults parameter which decides how
many objects are returned in a single API call. The default values are
not documented but observed behaviour is that the default values change
on some basis.

This commit adds explicit values for MaxResults in the APIs which
support it to the maximum possible values.

This possibly reduces the number of Glue calls when listing tables,
databases or functions in some cases.

This is similar to 4f22b0e and
45dc37d.

* Add Delta table read version to connectorInfo

* Fix case in RN section header

* Refactor IcebergSplitSource to reduce per split operations

We can apply pruning logic at file level before splitting it into
splits to avoid redundant operations per split

* Extract pruneFileScanTask in IcebergSplitSource

* Avoid unnecessary string to slice conversion

* Assert position count of page instead of individual blocks

* Refactor test class to be able to host eventually multiple tests

* Short circuit page source in case of partition mismatch

In case that the dynamic filter completes after scheduling of split
on the worker, the results in the split will be getting pruned in
the situation that there is a partition predicate mismatch.

* Strip the partition columns domains from the effective predicate

* Unignore re2j dependency conflicts

`trino-re2j` is a fork of `com.google.re2j:re2j`. The forks diverged
over time. It is not safe to ignore this dependency conflict e.g.
because `com.google` version has `com.google.re2j.Pattern.matcher`
accepting `CharSequence` or `byte[]` while Trino version accepts only
`Slice`. They cannot be used interchangeably at all.

* Prune dependency management for alluxio-shaded-client

Last use of this dependency was removed in
15cc6e5, while the main use was removed
probably in 59d4859.

* Put related modernizer exclusions together

* Remove unused test helper method

* Mark wether a PlanFragement is contains TableScanNode or not

There may be a little bit performance issue while response for api of cluster stats, if cluster has huge amounts of tasks.
Most of time was spent on distinguish the type of fragements temporarily.

* fix ST_Centroid and ST_Buffer for tiny geometries
  backporting from https://github.com/prestodb/presto/pull/13323/files

* Use the effective predicate when doing partition matching

Use the effective predicate instead of the dynamic filter predicate
to check for partition matching.
This change results in short circuiting the page source and not
having to read anymore the data file footer in the exotic case
where a partition filter acts as unenforced
predicate due to table partition spec evolution.

* Remove Deprecated annotation from parquet.ignore-statistics

Deprecation was originally intended for parquet.fail-on-corrupted-statistics
which is no longer in the code base.
Reading files with incorrect/corrupted statistics by ignoring statistics
is a valid use case and safe to use.

* Add parquet_ignore_statistics session property to iceberg

* Add parquet_ignore_statistics session property to delta lake

* Add documentation for parquet.ignore-statistics

* Update docs for java 17.0.5

* Increase threshold for stalled FTE scheduler logging

* Decouple debug logging frequency and scheduler stalled time

Count separately how much time passed since FTE scheduler processed a
meaningful event and how much time passed since debug information was
last logged.

* Sometimes EventListenerMangager.queryCompleted() called too early

For happy path and parsable queries, we have the following order of calls:
- query is added to the QueryTracker
- EventListener.queryCompleted() is called

For really bad queries, the ones that do not even parse to a proper SQL, the order is inverted.
As a result, when implementing an EventListener that wants to access query detailed information
we have to use a workaround for this edge case and artificially invert the sequence of calls.
EventListener will typically use a retry mechanism combined with switching to a separate thread,
which adds unnecessary complexity and multi-threading to a code that otherwise would be
single-threaded.

* Update to Iceberg 1.4.3

This e.g. fixes data loss on retried commit under some circumstances.

* Add scaffolding for unit testing dynamic filtering

* Short circuit page source in case of partition mismatch

In case that the dynamic filter completes after scheduling of split
on the worker, the results in the split will be getting pruned in
the situation that there is a partition predicate mismatch.

* Fix TestRedshiftTypeMapping

* Add DomainUserDefinedPredicate#toString

* Remove unnecessary forceTestNgToRespectSingleThreaded

* Rename io.trino.plugin.hive.parquet.TestParquetReader

* Rename TestParquetReaderMemoryUsage

* Fix handling of pruned parquet row groups with column indexes

Filters on multiple columns with page indexes may yield non-overlapping row ranges
and eliminate a parquet row group.
In this scenario, the reader is fixed to advance to the next non-pruned row group in
the split instead of stopping early and missing rows from the remaining row groups

* Upgrade required JVM version to 17.0.8

Previous versions are affected by https://bugs.openjdk.org/browse/JDK-8294677

* Update AWS SDK v1 to 1.12.630

* Update AWS SDK v2 to 2.22.10

* Update JDBI to 3.43.0

* Update flywaydb to 10.4.1

* Update Jetty to 11.0.19

* Update swagger to 2.2.20

* Update datasketches-java to 5.0.1

* Update keycloak to 23.0.3

* Update databricks-jdbc to 2.6.36

* Update byte-buddy to 1.14.11

* Update javassist to 3.30.2-GA

* Update MariaDB driver to 3.3.2

* Update s3mock-testcontainers to 3.3.0

* Updater oauth2-oidc-sdk to 11.9

* Update checker-equal to 3.42.0

* Remove redundant ? from SHOW COLUMNS in SqlBase.g4

* Specify table name in testPartitionProjectionIgnore

* Fixes failure when building trino-server alone

Added a dummy test in trino-server

* Increase stale operations count

- Currently uses default 30 and thats too low
- Also update to latest version of the action
- Also configure to only process PRs

* Allow union data to conform to smaller union

HIVE/AVRO: It is possible for data that is written using a 3 element
union to be read with a 2 element union provided that either all
data types can be coerced (already possible) or the offending data
type(s) isn't present. This change delays all type errors to read time
to allow more type leniency.

* Update oshi-core to 6.4.10

* Update grpc-bom to 1.60.1

* Update joda-time to 2.12.6

* Update arrow to 14.0.2

* Detect leaked containers when running with JUnit

With TestNG, `ManageTestResources` was attempting to prevent resource
leaks, including container leaks, in tests. It relied on certain common
test patterns to operate (like storing resource on instance fields).

This commit attempts to provide similar functionality for JUnit. For now
it's limited to containers. As an added bonus, it works regardless of
how the test class is written.

* Return empty table list in Iceberg REST Catalog when invalid namespace is provided

* Fix typo in stalebot config

* Require JDK 21 to run Trino

* Use isDynamicFilterFunction in ConnectorExpressionTranslator

* Document encryption of FTE spool

* Cache result of evaluating constraint per partition in iceberg

* Reduce test boilerplate

* Remove redundant variable

* Unify table type switch handling

* Remove obsolete table property checks

These were made obsolete by commit
62acd1b.

* Don't fail Iceberg on `$partitions` table name

Like in Hive connector, return table not found instead of exception when
querying `$partitions` (not prefixed with a table name).

* Simplify IcebergTableName exception handling

Some names were rejected with `TrinoException` and some with graceful
fallback (`return Optional.empty`). Since name validity is now checked
by `IcebergTableName.isIcebergTableName` we do not need to be lax in all
other methods.

* Improve argument validation message

* Define tables required by AbstractTestAggregations

* Test hive metastore operations involving retries

* Ignore TableNotFoundException on retries when dropping the table

* Add trino-opensearch plugin

This is a preparatory step: opensearch plugin is an exact copy of elasticsearch plugin

* Rename classes and drop Elasticsearch tests

* Rename config properties to opensearch

Also cleanup defunct configs

* Rename connector factory to opensearch

* Rename exported JMX beans to match package names

* Update lucene analyzer to 9.7.0

* Migrate OpenSearch plugin to OpenSearch java client

Add tests for both OpenSearch 1.x and 2.x compatibility

* Enable testing of trino-opensearch module

* Add OpenSearch connector docs

* Add testing coverage against latest OpenSearch server

* Move OpenSearch plugin to it's own package

* Fix Javadoc for largeInValuesCountData

* Fix remaining Elasticsearch references

* Cleanup opensearch plugin pom

* Migrate to official OpenSearch testcontainers

* Drop legacy full-text search support

* Fix formatting

* Extract version embedding to a helper

* Extract query executor binding to a method

* Embed version in scheduler stacktraces

* Fix code indentation

* Add some traceability which option is failing

* Move extractVariables utility

* Pull RewriteComparison.ComparisonOperator as top level class

* Implement complex join pushdown in JDBC connectors

Implement non-deprecated `ConnectorMetadata.applyJoin` overload in
`DefaultJdbcMetadata`. Thew old implementation is retained as a safety
valve. The new implementation is not limited to the
`List<JdbcJoinCondition>` model, so allows pushdown of joins involving
more complex expressions, such as arithmetics.

The `BaseJdbcClient.implementJoin` and
`QueryBuilder.prepareJoinQuery` methods logically changed, but the old
implementation is left as the fallback. These methods were extension
points, so the old implementations are renamed to ensure implementors
are updated. For example, if an implementation was overriding
`BaseJdbcClient.implementJoin` it most likely wants to override the new
`implementJoin` method as well, and this is reminded about by rename of
the old method.

* Re-add expression-less join pushdown as fallback

Restore older JDBC join pushdown implementation not based on
`ConnectorExpression` as a fallback.

This comes as a separate commit so that the introduction of
`ConnectorExpression`-based join pushdown can be seen (e.g. reviewed) as
a _change_, not as an _addition_.

* Fix auto formatting of throwing blocks

`{ throw ...; }` lambdas read very well. However, all recent IntelliJ
versions insist on removing whitespace before `throw` and after final
`;`. This results in checkstyle violation when IDE automatic formatting
is used.

Reformat lambdas in a way that IDE automatic formatting does not violate
checkstyle rules, in order to save time when working with code.

* Iterate over keys only when only keys needed

* Remove redundant and misnamed updatePartitionStatistics "overload"

Due to recent changes,
`updatePartitionsStatistics` was denoting update of single partition (despite looking plural)
`updatePartitionStatistics` was denoting update of multiple partitions (despite looking singular)

This commit removes `updatePartitionsStatistics`, redirecting all usages
to the other method.

* Fix cast time w/ tz exception message

* Support translation of JsonPathType to ConnectorExpression

* Move Vale vocabulary to conform to Vale v3.0.0

* Document single quote escape

* Give Hive ACID PT suite a name

`suite-8-non-generic` runs only Hive ACID tests. Since these tests are
very long and more flaky than others, no other tests will be added to
this suite. Rename it.

The `hdp3_only` test group verifies environment setup, and as such is
redundant, removed.

* Remove Hive transactional tests from suite1

No need to run these tests in Suite1. They are covered in a different
suite.

* Reduce Hive transactional coverage in suites

Hive transactional tables are tested as part of the STORAGE_FORMATS
group to ensure test coverage of transactional tables with various
metastore and HDFS setups (kerberized or not, impersonation or not).

Even so, this coverage can be reduced. We do not need to test e.g.
various bucketing versions.

* Correct FTE encryption detail in docs

* Clean up OpenSearch connector docs

* Clean up Query Management doc

* Add note about Nessie catalogs for views

* Fix map assertion in TestLazyMap

`org.testng.Assert.assertEquals(Map, Map)` does not really check  map
equality when the actual map contains null values.

Migrate to AssertJ and fix the test.

* Remove redundant generics in expression mapping

* Rename rewrite continuation interface

Prepare for adding another

* Prevent mutable state leak in expression mapping

* Support conditional rewrites

Add support for conditional rewrites in
`JdbcConnectorExpressionRewriterBuilder`.  This allows merging
`PostgreSqlClient`'s two different rewrite objects into one.

* Inline function calls in hot path

* Revert "Fix auto formatting of throwing blocks"

This reverts commit fd048d0.

* Fix TestTrinoGlueCatalog.testListTables

Make it use unique namespace names and add cleanup.

* Migrate off testng assertions in TestHiveFileFormats

* Add support iceberg parquet predicate pushdown with column id

* Upgrade Elasticsearch client to 7.x

This removes support for Elasticsearch 6.x and Opensearch 1.x while introducing support and test coverage for 8.x.

Both http security and user authentication are enabled by default.

* Cleanup elasticsearch pom

* Fix impacted features inclusion in product tests

In few suites `configured_features` was misspelled as
`configured-features`.

* Reformat array initializers

Prepare for upcoming airbase change requiring exactly one space in array
initializers. This is to help ensure that the code that passes CI does
not change upon automatic formatting.

* Remove redundant lambda braces

* Fix sentence in SQL routines docs

* Add Snowflake JDBC Connector (#11)

Had to redo the connector because all the rebases caused havoc

* Update the github CI (#12)

* Add Snowflake JDBC Connector

* Add snowflake in the ci

* Various style fixes and cleanup (#15)

* Various style fixes and cleanup (#15) (#17)

Co-authored-by: Martin Traverso <[email protected]>

* Update according to reviews 11/01/2024

---------

Co-authored-by: praveenkrishna <[email protected]>
Co-authored-by: Mateusz "Serafin" Gajewski <[email protected]>
Co-authored-by: Manfred Moser <[email protected]>
Co-authored-by: Jan Waś <[email protected]>
Co-authored-by: Y Ethan Guo <[email protected]>
Co-authored-by: Alex Jo <[email protected]>
Co-authored-by: Elon Azoulay <[email protected]>
Co-authored-by: Bhargavi Sagi <[email protected]>
Co-authored-by: Gaurav Sehgal <[email protected]>
Co-authored-by: Martin Traverso <[email protected]>
Co-authored-by: kasiafi <[email protected]>
Co-authored-by: Yuya Ebihara <[email protected]>
Co-authored-by: Marius Grama <[email protected]>
Co-authored-by: James Petty <[email protected]>
Co-authored-by: ajantha-bhat <[email protected]>
Co-authored-by: chenjian2664 <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Łukasz Osipiuk <[email protected]>
Co-authored-by: Dejan Mircevski <[email protected]>
Co-authored-by: dahn <[email protected]>
Co-authored-by: Athul T R <[email protected]>
Co-authored-by: Naoki Takezoe <[email protected]>
Co-authored-by: Vikash Kumar <[email protected]>
Co-authored-by: Jonas Irgens Kylling <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
Co-authored-by: Adam J. Shook <[email protected]>
Co-authored-by: Ashhar Hasan <[email protected]>
Co-authored-by: Raunaq Morarka <[email protected]>
Co-authored-by: XuPengfei-1020 <[email protected]>
Co-authored-by: Chenren Shao <[email protected]>
Co-authored-by: Jaeho Yoo <[email protected]>
Co-authored-by: Dominik Zalewski <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Jack Klamer <[email protected]>
Co-authored-by: Amogh Jahagirdar <[email protected]>
Co-authored-by: Will Morrison <[email protected]>
Co-authored-by: Slawomir Pajak <[email protected]>
Co-authored-by: yanxiangqin <[email protected]>
Co-authored-by: Assaf Bern <[email protected]>
Co-authored-by: Star Poon <[email protected]>
Co-authored-by: Heltman <[email protected]>
Co-authored-by: sheajamba <[email protected]>
Co-authored-by: Erik Anderson <[email protected]>
everypp pushed a commit to everypp/trino that referenced this pull request Jan 30, 2024
In pipelined mode, the spooling stats are always null, which fills the log needlessly.

trinodb#20214 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants