Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46410][SQL] Assign error classes/subclasses to JdbcUtils.classifyException #44358

Closed
wants to merge 15 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Dec 14, 2023

What changes were proposed in this pull request?

In the PR, I propose to raise exceptions with only error classes from JdbcUtils.classifyException, and introduce new error class FAILED_JDBC with sub-classes linked to a particular JDBC operation.

Why are the changes needed?

To improve user experience with Spark SQL by migrating on new error framework when all Spark exceptions from the JDBC datasource have an error class.

Does this PR introduce any user-facing change?

Yes, if user's code depends on exceptions from the JDBC datasource.

How was this patch tested?

By running the affected test suites:

$ build/sbt "test:testOnly *JDBCV2Suite"
$ build/sbt "test:testOnly *JDBCTableCatalogSuite"
$ build/sbt "test:testOnly *JDBCSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Dec 14, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-46410][SQL] Assign error classes/subclasses to JdbcUtils.classifyException [SPARK-46410][SQL] Assign error classes/subclasses to JdbcUtils.classifyException Dec 15, 2023
@MaxGekk MaxGekk marked this pull request as ready for review December 15, 2023 09:48
val tableName = regex.findFirstMatchIn(message).get.group(2)
case 42111 if errorClass == "FAILED_JDBC.CREATE_INDEX" =>
val indexName = messageParameters("indexName")
val tableName = messageParameters("tableName")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement!

@@ -1096,6 +1096,79 @@
],
"sqlState" : "38000"
},
"FAILED_JDBC" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FAILED_JDBC -> FAILED_JDBC_OPERATION

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too long, does _OPERATION bring any benefits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because FAILED_JDBC looks confused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDBC means Java Database Connectivity, the connectivity might fails at any time, ops or while establishing connections or while keeping it. For instance, this is real example:

org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:331)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the connectivity is not changed if create index failed

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

val newTable = messageParameters("newName")
throw QueryCompilationErrors.tableAlreadyExistsError(newTable)
case "42P07" if pgAlreadyExistsRegex.findFirstMatchIn(sqlException.getMessage).nonEmpty =>
val tableName = pgAlreadyExistsRegex.findFirstMatchIn(sqlException.getMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make pgAlreadyExistsRegex.findFirstMatchIn(sqlException.getMessage) execute only once

]
}
},
"sqlState" : "40000"
Copy link
Contributor

@srielau srielau Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be "transaction rollback"
This is all about reaching out to foreign data right?
HV000 FDW-specific condition?

Is it obvious from the context what we are connecting to? Should we add some JDBC level context (URL?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all about reaching out to foreign data right?

Right.

HV000 FDW-specific condition?

Seems like, but what do you propose? to get sqlState from SQLExpression?

Is it obvious from the context what we are connecting to?

Should be if an user reads data from one JDBC datasource.

Should we add some JDBC level context (URL?)

I can extract and pass option.url to AnalysisException with new error class FAILED_JDBC but how about other (classified exceptions) like:

          case "42P07" =>
            if (errorClass == "FAILED_JDBC.CREATE_INDEX") {
              throw new IndexAlreadyExistsException(
                indexName = messageParameters("indexName"),
                tableName = messageParameters("tableName"),
                cause = Some(e))

The IndexAlreadyExistsException has its own error class INDEX_ALREADY_EXISTS without any parameters related to JDBC url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was proposing to replace 40000 with HV004.

This example is interesting. How do we get a 42P07 with FAILED_JDBC errorclass?

Copy link
Member Author

@MaxGekk MaxGekk Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was proposing to replace 40000 with HV004.

@srielau I didn't get the logic why HV004 but not HV000? In postgresql for instance, it is related to fdw_invalid_data_type

How do we get a 42P07 with FAILED_JDBC errorclass?

From the doc https://www.postgresql.org/docs/14/errcodes-appendix.html and experiments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I fat-fingered. Yes HV000

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 16, 2023

@srielau @cloud-fan I am going to merge this PR soon if you don't mind.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 17, 2023

Merging to master. Thank you, @beliefer @srielau for review.

@MaxGekk MaxGekk closed this in 14a933b Dec 17, 2023
@@ -1180,12 +1180,15 @@ object JdbcUtils extends Logging with SQLConfHelper {
}
}

def classifyException[T](message: String, dialect: JdbcDialect)(f: => T): T = {
def classifyException[T](
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a breaking change, isn't it? JDBCDialect is a public developer API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when was this API added? @beliefer do you know?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should keep two versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how will you combine exceptions from the both versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no nested exception here. We pass a default error class to classifyException and if a dialect cannot classify an exception from JDBC driver, it throws the default one otherwise it forms and throws another one. The exception from JDBC driver is added as cause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a dialect can classify an exception, we will lose the error class which is actually worse?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also enable MiMa for this? I believe this is being skipped because it's under execution package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a dialect can classify an exception, we will lose the error class which is actually worse?

@cloud-fan A dialect can override proposed error class and make our default error class more precise using the driver specific info. And yes, the original error class will be lost.

Can we also enable MiMa for this? I believe this is being skipped because it's under execution package.

@HyukjinKwon I made the modification because MiMa complained. Did you change MiMa to expect different behaviour?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think losing the original error class is bad. We added classifyException for getting more precise java exception type, but I don't think this is useful anymore as we have error classes. My suggestion is to deprecate classifyException, saying that Spark never calls it anymore.

MaxGekk added a commit that referenced this pull request Jan 8, 2024
…ssifyException`

### What changes were proposed in this pull request?
In the PR, I propose to restore `classifyException()` of `JdbcDialect` before the commit 14a933b, and extends `classifyException()` with the error class parameter by `description`:
```scala
  def classifyException(
      e: Throwable,
      errorClass: String,
      messageParameters: Map[String, String],
      description: String): AnalysisException
```

The `description` parameter has the same meaning as `message` in the old version of `classifyException()` which is deprecated.

Also old implementation of `classifyException()` has been restored in JDBC dialects: MySQL, PostgreSQL and so on.

### Why are the changes needed?
To restore compatibility with existing JDBC dialects.

### Does this PR introduce _any_ user-facing change?
No, this PR restores the behaviour prior #44358.

### How was this patch tested?
By running the affected test suite:
```
$ build/sbt "core/testOnly *SparkThrowableSuite"
```
and modified test suite:
```
$ build/sbt "test:testOnly *JDBCV2Suite"
$ build/sbt "test:testOnly *JDBCTableCatalogSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44449 from MaxGekk/restore-jdbc-classifyException.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
MaxGekk added a commit that referenced this pull request Jan 16, 2024
…or classes

### What changes were proposed in this pull request?
In the PR, I propose to port the existing `classifyException()` method which accepts a description to new one w/ an error class added by #44358. The modified JDBC dialects are: DB2, H2, Oracle, MS SQL Server, MySQL and PostgreSQL.

### Why are the changes needed?
The old method `classifyException()` which accepts a `description` only has been deprecated already by ...

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By existing integration tests, and the modified test suite:
```
$ build/sbt "test:testOnly *JDBCV2Suite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44739 from MaxGekk/port-jdbc-classifyException.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants