Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with huge Int data types (Tableau) #771

Closed
artshevchenko opened this issue Dec 3, 2021 · 20 comments · Fixed by #774
Closed

Problems with huge Int data types (Tableau) #771

artshevchenko opened this issue Dec 3, 2021 · 20 comments · Fixed by #774
Milestone

Comments

@artshevchenko
Copy link

Hi! Our team is developing a Tableau connector to ClickHouse based on this JDBC driver. During testing, we encountered problems when working with data types UInt64, UInt256, Int128, Int256:

  • In the case of UInt64, we get an error parsing a number as Long
    1951F9AA-1E93-43F0-A775-144FF03CE904
    It looks like problem in lines 171-173 of ClickHouseRowBinaryStream.java

  • In the case of the other listed data types, the number is truncated
    6B3E2AEC-863A-4342-B0E3-142465AA0A75
    FB4F196A-C1DE-4066-8502-C10480F1B693
    Looks like problems with declared precision in ClickHouseDataType.java

Is this a bug or a feature?

@den-crane
Copy link
Collaborator

den-crane commented Dec 3, 2021

It's the Tableau problem. It does not support numbers bigger than Int64 such as UInt64/Int128/UInt128/Int256/UInt256.

To overcome this issue, we should add to JDBC a new setting -- expose_long_numbers_as = float64/string.

In case of expose_long_numbers_as = float64 select toFloat64(18446744073709551615) -> 18446744073709552000

In case of expose_long_numbers_as = string 18446744073709551615 -> '18446744073709551615'

@zhicwu
Copy link
Contributor

zhicwu commented Dec 3, 2021

Thanks @artshevchenko for reporting the issue. Is the project analytikaplus/clickhouse-tableau-connector-jdbc? I'd strongly suggest you guys start to integrate with new JDBC driver for a couple of reasons:

  1. it's been completely rewritten with improved jdbc compliance, especially metadata part
  2. it's smaller(700KB vs 3MB) and faster(see benchmarks at [HELP] 0.3.2 pre-release for public testing #768) with less memory consumption

Back to the issue, JDBC supports custom type mapping(between database native type and jdbc standard type), but I'm not sure if tableau supports that. If not, I can enhance the driver to expose additional parameters for type mapping.

We're closing to release the new driver. Let me know your findings with the test build, and I'm happy to help.

@yurifal
Copy link

yurifal commented Dec 7, 2021

Hi @zhicwu ,
This is Yuri (the other guy from the A+ team).

Testing the 0.3.2 pre-release drivers with Tableau
would be rather problematic, please take a look:

Screenshot 2021-12-07 at 01 00 38

Tried both *_shaded and *_all driver pack -- no luck.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 7, 2021

Thanks @yurifal and sorry it didn't work for you.

It seems a ServiceLoader issue related to JPMS, which happens on a few clients(e.g. SQuirrel SQL). The workaround is to put the driver in class path explicitly. Let me download a Tableau Destop and see if I can fix the issue and provide a new build.

Update: it's not related to JPMS but the class loader passed to ServiceLoader. I'm now able to connect to ClickHouse on both SQuirrel SQL and Tableau Desktop. Will publish a new build tonight.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 8, 2021

Tried both *_shaded and *_all driver pack -- no luck.

Please use *_http or *_all of v0.3.2-test2 to test. If you keep seeing error like HTTP/1.1 header parser received no bytes(similar as this one?), you may run zip -d <path-to-jdbc.jar> "META-INF/version*" to use HttpURLConnection which is slower but more stable.

@yurifal
Copy link

yurifal commented Dec 9, 2021

Better this time, can connect and read a schema.

The attempt to read from any table gives the error.
Screenshot 2021-12-09 at 02 08 56

Please find the attached jprotocolserver.log
jprotocolserver.log

@zhicwu zhicwu linked a pull request Dec 9, 2021 that will close this issue
@zhicwu
Copy link
Contributor

zhicwu commented Dec 9, 2021

The attempt to read from any table gives the error.

Sorry, I'd not call that better :p Could you download the driver and try again? I re-published the test build using same version.

@yurifal
Copy link

yurifal commented Dec 9, 2021

@zhicwu No prob, I've connected & made it working
using a "shaved" _http driver (HttpURLConnection).

Minor glitches do far (RIP http session_id ;-)
btw, do you know if it's possible to establish
a valid client session_id via Properties?

I'd like to SET join_use_nulls=1 (in the Initial SQL
or wherever, but not make it server-wide)
for a full support of Tableau Sets functionality.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 9, 2021

@zhicwu No prob, I've connected & made it working using a "shaved" _http driver (HttpURLConnection).

Good to know. Sounds like the default implementation using HttpClient didn't work very well :<

Minor glitches do far (RIP http session_id ;-) btw, do you know if it's possible to establish a valid client session_id via Properties?

session_id will be automatically set when executing multi-statement query, but currently it cannot be set via connection URL or properties. Why do you need manually set session_id? Is that just for execute SET statement and/or creating temporary tables?

I'd like to SET join_use_nulls=1 (in the Initial SQL or wherever, but not make it server-wide) for a full support of Tableau Sets functionality.

All settings(except session_id and query_id I think) can be customized in below two ways:

  • url - jdbc:ch://localhost/system?custom_http_params=join_use_nulls%3D1%26something_else%3D0
  • properties - set custom_http_params to URL encoded value like join_use_nulls%3D1%26something_else%3D0
    Similarly, you can set custom_http_headers if you're connecting to a load balance / gateway instead of a ClickHouse server.

@yurifal
Copy link

yurifal commented Dec 9, 2021

OK, I could live with the join_use_nulls=1 written to the designated Profile in the users.xml )

Back to the original issue (on huge int / uint), could you please try
to add the following Aliases to the respective Data Types?

Int8 -- I1
Int16 -- I2
Int32 -- I4
Int64 -- I8

UInt8 -- UI1
UInt16 -- UI2
UInt32 -- UI4
UInt64 -- UI8

Thanks in advance.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 9, 2021

Are all remote types defined at here should be added?

I'm not going to mess ClickHouseDataType by adding these aliases directly, because they do not exist in ClickHouse. However, I can add a system property(either -Dname=value in CLI, or a property file located at home directory, for example: ~/.clickhouse/data-type-aliases.properties) for customization, and it's not just for Tableau but also M language types etc. Make sense?

Moving forward, there'll be a clickhouse-jdbc-ext package, which can be used to simplify this(e.g. making join_use_nulls=1 as default, and all kinds of data type mappings for convenience).

@yurifal
Copy link

yurifal commented Dec 9, 2021

Are all remote types defined at here should be added?
I hope so. Frankly, I have no idea what is the most convenient place for them, up to you.

@nvm1
Copy link

nvm1 commented Dec 23, 2021

It's the Tableau problem. It does not support numbers bigger than Int64 such as UInt64/Int128/UInt128/Int256/UInt256.

To overcome this issue, we should add to JDBC a new setting -- expose_long_numbers_as = float64/string.

In case of expose_long_numbers_as = float64 select toFloat64(18446744073709551615) -> 18446744073709552000

In case of expose_long_numbers_as = string 18446744073709551615 -> '18446744073709551615'

Hi there,

ClickHouse/clickhouse-odbc#356 - I see that in ODBC driver the same problem has been resolved with HugeIntAsString parameter. Though your solution provides more flexibity, it might be a good idea to align how drivers solve the issue. Or not... I'll be happy either way ;)

@artshevchenko
Copy link
Author

Hi there!

To sum up:

  1. We need the expose_long_numbers_as=string parameter, since in most cases aggregates (like COUNTD) are considered for fields with large Integers or these fields are used as Dimensions (to specify the detail of visualizations). It is unlikely that anyone will sum up the values of such fields, knowing that the result most likely will not fit into the acceptable range of values.
  2. In theory, Tableau supports UInt64 (UI8 in dialect.tdd), but we can't use mapping since UInt64 is passed as java.lang.Long on the driver side, so the easiest way is to use the expose_long_numbers_as=string parameter when it is available

@zhicwu
Copy link
Contributor

zhicwu commented Dec 24, 2021

@artshevchenko & @nvm1, could you try clickhouse-jdbc-0.3.2-test3-http.jar? I'm about to release 0.3.2 tonight, which is almost same as 0.3.2-test3.

You can set connection property typeMappings to something like UInt64=java.lang.String(e.g. jdbc:ch://localhost:8123/default?typeMappings=UInt64%3Djava.lang.String), and then you'll get String instead of Long for all UInt64 values. My Tableau desktop was expired so I only tested on DBeaver:
image

@yurifal suggested to add aliases like UI8 etc. but I think we probably no longer need that.

@zhicwu zhicwu added this to the 0.3.2 Release milestone Dec 24, 2021
@yurifal
Copy link

yurifal commented Dec 24, 2021

@zhicwu wrote "... My Tableau desktop was expired... "

There's an old beta key (still valid as of 2021-12-31)
TCJD-8646-8CA0-9845-1C12

FYI starting from October 2021, Tableau does no longer issue Desktop Creator keys.
Customers have to activate their Desktops/Preps using their on-prem Servers with LBLM.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 24, 2021

Thank you @yurifal. Just actived Tableau and ?typeMappings=UInt64%3Djava.lang.String gave me expected output :)
image

@yurifal
Copy link

yurifal commented Dec 24, 2021

nice to see @zhicwu .

could you please share which version of the driver
you're using with DBeawer for testing?

I've run the TDVT tests from Tableau Connector SDK
and found a bunch of errors mainly with Dates / Datetimes
(cast, operators & dateadd / datetrunc ones).

You can test for some of them manually (right in Tableau)
by placing a Date / Datetime pill on a view as the Exact Date,
or Week green one -- the calc is DATETRUNC('week', [your_date])

@zhicwu
Copy link
Contributor

zhicwu commented Dec 25, 2021

could you please share which version of the driver you're using with DBeawer for testing?

clickhouse-jdbc-0.3.2-test3-http.jar

I've run the TDVT tests from Tableau Connector SDK and found a bunch of errors mainly with Dates / Datetimes (cast, operators & dateadd / datetrunc ones).

Is there stack trace you can provide? To me, it seems working fine for DateTime but not Date. I had to map Date to String.

You can test for some of them manually (right in Tableau) by placing a Date / Datetime pill on a view as the Exact Date, or Week green one -- the calc is DATETRUNC('week', [your_date])

I added a calculation field using above expression against a DateTime field and it works.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 29, 2021

I think the custom type mapping should address the issue - if not, please feel free to reopen the issue or create a new one.

@zhicwu zhicwu closed this as completed Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants