Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an optional JDBC connector type mapping to varchar #186

Closed
wants to merge 1 commit into from

Conversation

guyco33
Copy link
Member

@guyco33 guyco33 commented Feb 7, 2019

Add optional catalog parameters that map Jdbc types to varchar:

jdbc-types-mapped-to-varchar = jdbcTypeName[,jdbcTypeName[,...]]

Now catalogs can control whether or not to fetch unsupported Presto types and to stay compatible when future support will be implemented.

For example:
PostgreSQL catalog with the following configuration will map tsrange and inet to Presto VARCHAR:

jdbc-types-mapped-to-varchar = tsrange,inet

Ref prestodb/presto#11833

@cla-bot cla-bot bot added the cla-signed label Feb 7, 2019
@findepi
Copy link
Member

findepi commented Feb 7, 2019

BTW if we treat an unsupported type as a varchar

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from 6e99107 to b1a19a6 Compare February 8, 2019 14:58
@guyco33
Copy link
Member Author

guyco33 commented Feb 10, 2019

AFAIK most of the use-cases with such non-standard types in Presto are for reading and extracting needed values for analytic processing to gain valuable insights.

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from b1a19a6 to 62405ff Compare February 10, 2019 09:45
@guyco33
Copy link
Member Author

guyco33 commented Feb 20, 2019

I suggest to change the parameter name to column-mapping with the following values:

  • skip_unsupported_types (default)
  • unsupported_types_to_varchar (non-compatible. Queries might fail for new supported types like JSON)
  • nonstandard_types_to_varchar (compatible. Queries won't break for new supported types)

@findepi
Copy link
Member

findepi commented Feb 20, 2019

@guyco33 i don't fully understand.
what's the difference between unsupported_types and nonstandard_types?
assuming nonstandard_types are the ones reported in JDBC as java.sql.Types.OTHER, where does "compatibility guarantee" come from?

@guyco33
Copy link
Member Author

guyco33 commented Feb 21, 2019

@findepi nonstandard_types are all the types that are not implemented in StandardColumnMappings class.
A catalog that is using nonstandard_types_to_varchar value will skip all the current and future implementations that are done in the connector code (eg: JSON in postgres)

@guyco33
Copy link
Member Author

guyco33 commented Feb 21, 2019

Maybe it will be better to define it with 2 parameters like this:

special-varchar-column-mapping = true | false (default is false)
special-varchar-column-mapping-skip-types = type_name[,type_name,...]

When special-varchar-column-mapping is set to true all types in special-varchar-column-mapping-skip-types will be mapped to their specific implementation that was done in the connector code. This way connectors that set the special-varchar-column-mapping won't break when new implementations come until they set the skip parameter

@findepi
Copy link
Member

findepi commented Feb 21, 2019

nonstandard_types are all the types that are not implemented in StandardColumnMappings class.

I don't think this is a sufficient definition.
StandardColumnMappings#jdbcTypeToPrestoType can be extended (for example, it lacks Types.TIMESTAMP_WITH_TIMEZONE today).
Or, we may eventually remove this method and let all the connectors handle the mappings explicitly (a connector knows which database it connects to, so it knows what types are possible).

When special-varchar-column-mapping is set to true all types in special-varchar-column-mapping-skip-types will be mapped to their specific implementation that was done in the connector code.

This is intersting idea. Actually, this could be simplified (we don't need to have two parameters):

types-mapped-to-varchar = typename1, typename2, ...

some concerns:

  • if we want to promise future-compatibility, this option needs to be applied before determining if type mapping is known
  • is type name (TYPE_NAME in java.sql.DatabaseMetaData#getColumns) always sufficient to distinguish types?
    • for example decimals that cannot be mapped to Presto's DECIMAL?
  • do we need to support patterns? I.e. in each database, is TYPE_NAME one of a few possible values (e.g. char, integer, varchar, ..) or can it be parameterized (e.g. char(2), decimal(3,1))?
    • in the latter case, we could accept a regex, but this is getting a bit ugly
  • can we come up with a better name for the switch?

@guyco33
Copy link
Member Author

guyco33 commented Feb 26, 2019

@findepi
To summarise, I suggest to add the following configurations:

  • types-mapped-to-varchar-include = jdbcType|jdbcTypeName[,jdbcType|jdbcTypeName]
  • types-mapped-to-varchar-exclude = jdbcType|jdbcTypeName[,jdbcType|jdbcTypeName]

For example:
types-mapped-to-varchar-include = OTHER,ARRAY
types-mapped-to-varchar-exclude = json,jsonb

@guyco33 guyco33 changed the title Add an optional catalog parameter (fetch-all-table-columns) Add an optional Jdbc type mapping to varchar Feb 26, 2019
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 4 times, most recently from a105792 to 503a36b Compare March 4, 2019 12:57
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 3 times, most recently from 73b057e to 2562c56 Compare March 13, 2019 15:50
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from 2562c56 to ce0829e Compare March 17, 2019 06:57
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 2 times, most recently from e1bede8 to da4019d Compare April 14, 2019 09:31
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline:
I would strongly prefer this mapping to be done based on JdbcTypeHandle#jdbcTypeName only, so that we have one, easy to understand configuration toggle.

@dain dain changed the title Add an optional Jdbc type mapping to varchar Add an optional JDBC connector type mapping to varchar Apr 22, 2019
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from da4019d to edee55b Compare April 23, 2019 09:31
@guyco33
Copy link
Member Author

guyco33 commented Apr 25, 2019

As discussed offline:
I would strongly prefer this mapping to be done based on JdbcTypeHandle#jdbcTypeName only, so that we have one, easy to understand configuration toggle.

Using jdbcTypeName only might not be sufficient for RDBMS connectors with types without names.
see: #523

@findepi
Copy link
Member

findepi commented Apr 25, 2019

Using jdbcTypeName only might not be sufficient for RDBMS connectors with types without names.

@guyco33 i am aware of this.

@guyco33
Copy link
Member Author

guyco33 commented May 7, 2019

As discussed offline, using typeName only is not sufficient without solving issue #682 either by JSON mapping or native array with PR #659

@findepi
Copy link
Member

findepi commented May 7, 2019

@guyco33 thanks for checking. So the plan would be:

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from edee55b to dfcc52c Compare May 21, 2019 09:42
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 2 times, most recently from d218031 to 774c50c Compare June 1, 2019 14:44
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from 774c50c to 22f973c Compare June 17, 2019 07:12
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from 22f973c to 68140a1 Compare June 29, 2019 14:06
@guyco33
Copy link
Member Author

guyco33 commented Aug 6, 2019

@findepi
Waiting for #1148 and #1023 to be merged in order to simplify this PR by using jdbcTypeName only toggle.

jdbc-types-mapped-to-varchar = jdbcTypeName[,jdbcTypeName[,...]]

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 2 times, most recently from 163903f to e26d20f Compare August 18, 2019 07:05
@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from f49eb88 to 2867771 Compare August 19, 2019 16:39
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

{
if (jdbcTypesMappedToVarchar != null) {
for (String type : jdbcTypesMappedToVarchar.split(",")) {
this.jdbcTypesMappedToVarchar.add(type.trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You should reset jdbcTypesMappedToVarchar when setter is called.
  • avoid String#split for all its weirdnesses
this.jdbcTypesMappedToVarchar = ImmutableSet.copyOf(Splitter.on(",").omitEmptyStrings().split(nullToEmpty(jdbcTypesMappedToVarchar)));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whops.. add .trimResults() too.

{
this.identifierQuote = requireNonNull(identifierQuote, "identifierQuote is null");
this.connectionFactory = requireNonNull(connectionFactory, "connectionFactory is null");
requireNonNull(caseInsensitiveNameMatchingCacheTtl, "caseInsensitiveNameMatchingCacheTtl is null");

this.caseInsensitiveNameMatching = caseInsensitiveNameMatching;
this.jdbcTypesMappedToVarchar = jdbcTypesMappedToVarchar;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.jdbcTypesMappedToVarchar = jdbcTypesMappedToVarchar;
this.jdbcTypesMappedToVarchar = ImmutableSet.copyOf(requireNonNull(jdbcTypesMappedToVarchar, "jdbcTypesMappedToVarchar is null"));

@@ -125,20 +126,23 @@ public BaseJdbcClient(BaseJdbcConfig config, String identifierQuote, ConnectionF
identifierQuote,
connectionFactory,
requireNonNull(config, "config is null").isCaseInsensitiveNameMatching(),
config.getCaseInsensitiveNameMatchingCacheTtl());
config.getCaseInsensitiveNameMatchingCacheTtl(),
config.getJdbcTypesMappedToVarchar());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the new arg right after identifierQuote and connectionFactory (same in assignments below)
(i'm aware the identifierQuote and connectionFactory are misordered)

@@ -47,6 +51,22 @@ public BaseJdbcConfig setConnectionUrl(String connectionUrl)
return this;
}

public Set<String> getJdbcTypesMappedToVarchar()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add getter/setter in same order you added the field (ie field is last, so move getter/setter to the end of the class)

{
if (jdbcTypesMappedToVarchar.contains(type.getJdbcTypeName().get())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't fail when type.getJdbcTypeName() is empty

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it won't fail since jdbcTypesMappedToVarchar doesn't contain empty strings


assertEquals(expected.getJdbcTypesMappedToVarchar().contains("mytype"), true);
assertEquals(expected.getJdbcTypesMappedToVarchar().contains("struct_type1"), true);
assertEquals(expected.getJdbcTypesMappedToVarchar().contains("supported_type"), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertEquals(expected.getJdbcTypesMappedToVarchar().contains("supported_type"), false);
assertEquals(expected.getJdbcTypesMappedToVarchar(), ImmutableSet.of("mytype", "struct_type1"));

@@ -260,7 +264,7 @@ public BaseJdbcClient(
@Override
public Optional<ColumnMapping> toPrestoType(ConnectorSession session, Connection connection, JdbcTypeHandle typeHandle)
{
return jdbcTypeToPrestoType(session, typeHandle);
return jdbcTypeToPrestoType(session, typeHandle, jdbcTypesMappedToVarchar);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be handled before calling jdbcTypeToPrestoType, so let's keep it inside JdbcClient and don't change that method.

Since the mapping is not that trivial (eg disabling predicate pushdown), I think a helper method could be useful.
It could be used like that:

Optional<ColumnMapping> mapping = getForcedMappingToVarchar(typeHandle);
if (mapping.isPresent()) {
    return mapping;
}

where getForcedMappingToVarchar is defined in BaseJdbcClient:

protected Optional<...> getForcedMappingToVarchar(...) { ... }

@@ -33,6 +36,7 @@
private boolean caseInsensitiveNameMatching;
private Duration caseInsensitiveNameMatchingCacheTtl = new Duration(1, MINUTES);
private CredentialProviderType credentialProviderType = INLINE;
private Set<String> jdbcTypesMappedToVarchar = new HashSet<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private Set<String> jdbcTypesMappedToVarchar = new HashSet<>();
private Set<String> jdbcTypesMappedToVarchar = ImmutableSet.of();

@@ -122,7 +123,8 @@ public PhoenixClient(PhoenixConfig config, ConnectionFactory connectionFactory)
ESCAPE_CHARACTER,
connectionFactory,
config.isCaseInsensitiveNameMatching(),
config.getCaseInsensitiveNameMatchingCacheTtl());
config.getCaseInsensitiveNameMatchingCacheTtl(),
new HashSet<>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new HashSet<>());
ImmutableSet.of());

@@ -261,6 +263,9 @@ protected ResultSet getTables(Connection connection, Optional<String> schemaName
String jdbcTypeName = typeHandle.getJdbcTypeName()
.orElseThrow(() -> new PrestoException(JDBC_ERROR, "Type name is missing: " + typeHandle));

if (jdbcTypesMappedToVarchar.contains(jdbcTypeName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be added to MySqlClient and SqlServerClient as well, since they too override toPrestoType()

@findepi findepi requested review from kokosing and findepi August 19, 2019 18:58
this.includeTypeNamesMappingToVarchar = new HashSet<String>();
if (typesMappedToVarcharInclude != null) {
for (String type : typesMappedToVarcharInclude.split(",")) {
if (typesMap.containsKey(type.toUpperCase())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toUpperCase(ENGLISH) here and below

this.typesMappedToVarcharInclude = typesMappedToVarcharInclude;
this.includeTypesMappingToVarchar = new HashSet<Integer>();
this.includeTypeNamesMappingToVarchar = new HashSet<String>();
if (typesMappedToVarcharInclude != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invert the condition, that way you will avoid long nested blocks

private Set<Integer> includeTypesMappingToVarchar;
private Set<String> includeTypeNamesMappingToVarchar;
private Set<Integer> excludeTypesMappingToVarchar;
private Set<String> excludeTypeNamesMappingToVarchar;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract these collection of the config classes to some new entity like MappingToVarchar. IMO config should not have any logic.

@@ -115,6 +115,7 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add "end-to-end" test where you SELECT unsupported type as varchar. Maybe to io.prestosql.plugin.jdbc.TestJdbcIntegrationSmokeTest or somewhere close.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test in TestPostgreSqlTypeMapping would be nice too.

}
}
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please squash the commits. Changes in single PR should not go back and forth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) sorry @kokosing i didn't warn you this has significantly altered since the beginning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi @kokosing Rebased and squashed to one commit
Will add now the end-to-end test to TestPostgreSqlTypeMapping

{
this.identifierQuote = requireNonNull(identifierQuote, "identifierQuote is null");
this.connectionFactory = requireNonNull(connectionFactory, "connectionFactory is null");
requireNonNull(caseInsensitiveNameMatchingCacheTtl, "caseInsensitiveNameMatchingCacheTtl is null");

this.caseInsensitiveNameMatching = caseInsensitiveNameMatching;
this.jdbcTypesMappedToVarchar = jdbcTypesMappedToVarchar;
this.jdbcTypesMappedToVarchar = ImmutableSet.copyOf(requireNonNull(jdbcTypesMappedToVarchar, "jdbcTypesMappedToVarchar is null"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this up, after this.connectionFactory =

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed that one ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


protected Optional<ColumnMapping> getForcedMappingToVarchar(JdbcTypeHandle typeHandle)
{
if (jdbcTypesMappedToVarchar.contains(typeHandle.getJdbcTypeName().get())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (typeHandle.getJdbcTypeName().isPresent() && jdbcTypesMappedToVarchar.contains(typeHandle.getJdbcTypeName().get()))

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch 3 times, most recently from 35644d3 to 21bae0e Compare August 20, 2019 12:26
@guyco33
Copy link
Member Author

guyco33 commented Aug 20, 2019

@findepi @kokosing
Done implementing requested changes

@guyco33 guyco33 force-pushed the handle_special_jdbc_types branch from 21bae0e to 50276f2 Compare August 20, 2019 12:49
@findepi
Copy link
Member

findepi commented Aug 20, 2019

Merged as 52c7c39, thanks!

I fixed an unused import and introduced StandardColumnMappings#varcharReadFunction.

@findepi findepi closed this Aug 20, 2019
@findepi findepi added this to the 318 milestone Aug 20, 2019
@findepi findepi mentioned this pull request Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants