Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treating the Empty String as NULL since 0.3.2-patch6 #931

Closed
mafiore opened this issue May 11, 2022 · 29 comments · Fixed by #932
Closed

Treating the Empty String as NULL since 0.3.2-patch6 #931

mafiore opened this issue May 11, 2022 · 29 comments · Fixed by #932
Labels
Milestone

Comments

@mafiore
Copy link

mafiore commented May 11, 2022

I can reproduce this misbehaviour since version 0.3.2-patch6.

Up to version 0.3.2-patch5 everything worked fine.

I poked around an bit and found out that the jdbc-driver has problems with empty strings.
It seems that it handles empty strings like null values.

Before patch6 everything worked fine. This behavior still exists in patch9

This relates to #896
Unexpected error inserting/updating row in part [insertRow add batch]
...
com.clickhouse.client.data.BinaryStreamUtils.writeString (BinaryStreamUtils.java:1667)

If is replace the empty string value with any character the error is gone.

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

Thanks @mafiore. It sounds like empty string result in a null flag. Let me add a few more test cases to see if I can reproduce the issue - would be great if you can share a minimal test case.

@zhicwu zhicwu added the bug label May 11, 2022
@mafiore
Copy link
Author

mafiore commented May 11, 2022

thanks for your quick reply @zhicwu. My test-case is build in Apache Hop, so this is no raw code to share.
but I can explain how I reproduced the behaviour.

Up to 0.3.2-patch5 everthing worked fine. When we tried to change to a higher revision, the mess began.
First I thought it has something to do with the timestamp-format.

I ended up with a Simple Table like

CREATE TABLE TEST_EMPTY_STRING
(
    `ID` UInt64,
    `SOMETHING` String
)
ENGINE = MergeTree
PRIMARY KEY (ID)
ORDER BY (ID);

I set the NULL values within the test records to empty with SOMETHING==null?"":SOMETHING

This leads to the described error. If I set the value to e.g. "-" the record will be accepted an written.

There must be a related change between rev5 and rev6

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

Below test case works for me. Did I miss something?

try (ClickHouseConnection conn = newConnection(new Properties());
        Statement s = conn.createStatement()) {
    s.execute("drop table if exists test_read_write_strings;"
            + "create table test_read_write_strings(id Int32, s1 String, s2 Nullable(String), s3 Array(Nullable(String)))engine=Memory");
    try (PreparedStatement stmt = conn
            .prepareStatement("insert into test_read_write_strings")) {
        stmt.setInt(1, 1);
        stmt.setString(2, "");
        stmt.setString(3, "");
        stmt.setObject(4, new String[] { "" });
        stmt.addBatch();
        stmt.setInt(1, 2);
        stmt.setString(2, "");
        stmt.setString(3, null);
        stmt.setObject(4, new String[] { null });
        stmt.addBatch();
        int[] results = stmt.executeBatch();
        Assert.assertEquals(results, new int[] { 1, 1 });
    }

    ResultSet rs = s.executeQuery("select * from test_read_write_strings order by id");
    Assert.assertTrue(rs.next());
    Assert.assertEquals(rs.getInt(1), 1);
    Assert.assertEquals(rs.getString(2), "");
    Assert.assertEquals(rs.getString(3), "");
    Assert.assertEquals(rs.getObject(4), new String[] { "" });
    Assert.assertTrue(rs.next());
    Assert.assertEquals(rs.getInt(1), 2);
    Assert.assertEquals(rs.getString(2), "");
    Assert.assertEquals(rs.getString(3), null);
    Assert.assertEquals(rs.getObject(4), new String[] { null });
    Assert.assertFalse(rs.next());
}

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

I set the NULL values within the test records to empty with SOMETHING==null?"":SOMETHING

This leads to the described error. If I set dthe value to e.g. "-" the record will be accepted an written.

This might not related but what's the version of ClickHouse you're using?

@mafiore
Copy link
Author

mafiore commented May 11, 2022

22.3.2 revision 54455.

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

Hmm... I can only reproduce the issue when trying to write null to a non-nullable string column. Also I didn't see suspicious change in patch6. Could that be some feature in Hop to treat empty string as null?

@zhicwu zhicwu linked a pull request May 11, 2022 that will close this issue
zhicwu added a commit that referenced this issue May 11, 2022
@mafiore
Copy link
Author

mafiore commented May 12, 2022

No there is absolutely nothing like that in Hop. What I do ist to filter out the nulls and set them explicitely to an empty String to avoid Nulls. to be wriiten to a non nullable column.I tried it in every direction. leave the filed null, set the field to empty String, set it to a defaut String "-" .

The crazy is, that it works with the same setup since month, but the switch away from rev5 is breaking it.

For the moment it is ok for us to stay at rev5, but this no longterm solution.

@zhicwu
Copy link
Contributor

zhicwu commented May 12, 2022

Thanks @mafiore and sorry for the inconvenience at your end. In general, we should avoid writing null to a non-nullable column. However, it's not always true - for example, when we want to use default value of a column, we may pass null even we know the column is not nullable. This is also the reason why #935 didn't pass build check :)

I think both cases can be addressed by adding a JDBC-specific option nullParameterHandler (defaults to none for backward compatibility, and only for PrepareStatement parameters) to customize the behavior:

  • convert - convert null to default value of corresponding data type(not the default value of the column)
  • exception - throw SQLException instead of ugly NPE to prevent user from setting null to a non-null column
  • none - do nothing

Make sense?

@mafiore
Copy link
Author

mafiore commented May 12, 2022

I agree with you @zhicwu but I don't tried write NULL to non nullable. I change NULL to an empty String which is not NULL at all and this had worked until rev5. Thank you !

@zhicwu
Copy link
Contributor

zhicwu commented May 12, 2022

I agree with you @zhicwu but I don't tried write NULL to non nullable. I change NULL to an empty String which is not NULL at all and this had worked until rev5. Thank you !

I understand. It could be a breaking change I made without noticing. But with the new option, it can simplify your ETL work too, because you'll no longer need additional step to convert null to empty string. Have you tried insert_null_as_default before? It could be a server-side solution.

@mafiore
Copy link
Author

mafiore commented May 12, 2022

@zhicwu this is a very cool helpful feature. The clickhouse-developers do a great job. I curious how it will be when the jdbc driver supports native connections (hot http based)

@zhicwu
Copy link
Contributor

zhicwu commented May 12, 2022

@mafiore, did you mean clickhouse-tcp-client? I don't think it will help much, except native data format support(for parallel deserialization & serialization). If the concern is performance, you may watch #928 as I'll provide more updates in the next following days. To put in short, the JDBC driver is very slow. Java client and streaming make things better, but still far behind ClickHouse native client.

@mafiore
Copy link
Author

mafiore commented May 13, 2022

@zhicwu It may be that JDBC ist slow compared to the native CH Client. But for our needs it is fast like a flash :-)
JDBC is often the only way to interact with other systems.
Maybe it could be a way to have a JDBC wrapper around the clickhouse-client. (with the caveat of the separate deployment of platform native libs). When I remember it correctly, this was done with SQL Server for a long time

@zhicwu
Copy link
Contributor

zhicwu commented May 13, 2022

Maybe it could be a way to have a JDBC wrapper around the clickhouse-client.

Exactly. There'll be a clickhouse-cli-client, which is simply a wrapper of the native command line :D

@zhicwu zhicwu added this to the 0.3.2-patch10 milestone May 16, 2022
@zhicwu
Copy link
Contributor

zhicwu commented Jun 20, 2022

nullAsDefault connection property(defaults to 0) was added in #956, but it's only for JDBC driver(not supported in Java client for performance reason).

Example: jdbc:ch://localhost/db?nullAsDefault=1

0 - treat null as is and throw exception when trying to insert null value into non-nullable column
1 - treat null as is and disable null-check for insertion
2 - convert null to default value of corresponding data type for both insertion and query

@mafiore
Copy link
Author

mafiore commented Jun 27, 2022

hello @zhicwu
I am sorry to say, that there is no difference in the behaviour. The Empty String ist not accepted.
I am using the patch 10

Error inserting/updating row
2022/06/27 14:40:57 - store barvalue event 2.0 - Cannot set null to non-nullable column #6 [event_unit LowCardinality(String)]

image

Why can't this not work, like it did before patch 6 ?

I tried every combination and ended up with setting the value to "empty", which is not what I want. The empty String is a String with the length of 0, not NULL

@zhicwu
Copy link
Contributor

zhicwu commented Jun 27, 2022

Sorry to hear about that @mafiore. Let's reopen the issue.

Can you provide minimum pipeline for reproducing the issue? I'll then install Apache hop and debug from there.

@zhicwu zhicwu reopened this Jun 27, 2022
@zhicwu
Copy link
Contributor

zhicwu commented Jun 27, 2022

@mafiore, I tried on hop 2.0.0 and was not able to reproduce the issue. nullAsDefault works exactly as expected. Also the exception above will only be thrown when nullAsDefault < 1:

https://github.com/ClickHouse/clickhouse-jdbc/blob/ea5aaf579b0612bcf1825eb1ec31bf9b170a7a65/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/internal/InputBasedPreparedStatement.java#L320-L323

On a side note, it looks like they started to enhance clickhouse plugin by replacing clickhouse4j to official driver, so that nobody has to use generic database plugin to access ClickHouse :D

@mafiore
Copy link
Author

mafiore commented Jun 28, 2022

@zhicwu Sorry, But I have a lack of time thees days.
I will try this again, but I thought I tested it with different setups

On a side note, it looks like they started to enhance clickhouse plugin by replacing clickhouse4j to official driver, so that nobody has to use generic database plugin to access ClickHouse :D

Who do you think, has triggered this? :-)
yes - you are right. It was me ...

@uklft
Copy link

uklft commented Nov 24, 2022

@zhicwu Hey there, unfortunately this bug still occurs for us.
Currenty we are testing on Apache Hop 2.1.0.
We could reproduce it using driver clickhouse-jdbc-0.3.2-patch9-http.jar (which ships with Hop) and clickhouse-jdbc-0.3.2-patch11-http.jar

Independently from the value nullAsDefault is set to following error is thrown:

2022/11/24 11:02:35 - Table output.0 - Error inserting row into table [Test] with values: [null], [null]
2022/11/24 11:02:35 - Table output.0 - 
2022/11/24 11:02:35 - Table output.0 - Unexpected error inserting/updating row in part [insertRow add batch]
2022/11/24 11:02:35 - Table output.0 -  at java.lang.Thread.run (Thread.java:829)
2022/11/24 11:02:35 - Table output.0 -  at org.apache.hop.pipeline.transform.RunThread.run (RunThread.java:51)
2022/11/24 11:02:35 - Table output.0 -  at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.processRow (TableOutput.java:117)
2022/11/24 11:02:35 - Table output.0 -  at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable (TableOutput.java:251)
2022/11/24 11:02:35 - Table output.0 -  at org.apache.hop.core.database.Database.insertRow (Database.java:1097)
2022/11/24 11:02:35 - Table output.0 -  at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch (InputBasedPreparedStatement.java:313)
2022/11/24 11:02:35 - Table output.0 -  at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.serialize (ClickHouseRowBinaryProcessor.java:486)
2022/11/24 11:02:35 - Table output.0 -  at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.lambda$buildMappingsForDataTypes$65 (ClickHouseRowBinaryProcessor.java:337)
2022/11/24 11:02:35 - Table output.0 -  at com.clickhouse.client.data.BinaryStreamUtils.writeString (BinaryStreamUtils.java:1667)

One therory is that the empty string interferes with the check for a null character here:
https://github.com/ClickHouse/clickhouse-jdbc/blob/42b1b2193931dc68b86cb5dfc86392d4c6d6601b/clickhouse-client/src/main/java/com/clickhouse/client/data/BinaryStreamUtils.java#L1758-L1764

I created an example repo to demonstrate this behaviour if you want to check it out yourself:
https://github.com/uklft/clickhouse-jdbc-hop-test

edit: nullAsDefault seems to be ignored from DBeaver + ClickHouse JDBC, too.
Using the URL jdbc:clickhouse:http://localhost:18123?nullAsDefault=0.

@zhicwu
Copy link
Contributor

zhicwu commented Nov 25, 2022

Hi @uklft, thanks for sharing your findings. Just tried the example you provided using nullAsDefault=2 and I was able to reproduce the issue using patch9 but it works for both patch11 and nightly build. I use below table for testing:

drop table if exists test_empty_string;
create table test_empty_string(id Int32, name String)engine=MergeTree() order by tuple();

To upgrade JDBC driver:

$ cd hop/plugins/databases/clickhouse/lib
$ rm -f *.jar
$ wget https://repo1.maven.org/maven2/com/clickhouse/clickhouse-jdbc/0.3.2-patch11/clickhouse-jdbc-0.3.2-patch11-all.jar

Did I miss anything?

nullAsDefault seems to be ignored from DBeaver + ClickHouse JDBC, too. Using the URL jdbc:clickhouse:http://localhost:18123?nullAsDefault=0.

What's the expectation? The option is only useful when you pass null values to PreparedStatement, but I'm not sure how I can do that in DBeaver. If you're talking about binding variables like insert into test_empty_string values( :id, :name), it seems DBeaver does not support binding null value.

Update: submitted https://issues.apache.org/jira/browse/HOP-4629 for upgrading JDBC driver and remove http-client jar, which might be the cause of the issue.

@uklft
Copy link

uklft commented Nov 28, 2022

Hey @zhicwu thanks for looking into it and creating an Issue for Hop!

You are right, the parameter works as expected with patch11.
Indeed the http-client jar was present during my test, so that may be the reason the problem still occurred.

Concerning DBeaver: I expected the parameter to work also on a direct insert. Sorry for the misunderstanding.

@zhicwu
Copy link
Contributor

zhicwu commented Nov 28, 2022

Thanks @uklft for confirming the issue has been resolved.

@zhicwu zhicwu closed this as completed Nov 28, 2022
@uklft
Copy link

uklft commented Dec 2, 2022

@zhicwu Sorry to bother you again, but we noticed that the original problem of this issue unfortunately is not resolved.

nullAsDefault being ignored by the driver is resolved.
But the problem that (at least in Hop) empty strings are passed as null values remains.
When i send rows containing ("", "") i get this error:

2022/12/02 12:29:14 - Table output.0 - Prepared statement : INSERT INTO EmptyStringTest.Test (id, name) VALUES ( ?,  ?)
2022/12/02 12:29:14 - clickhouse - Rollback on database connection [clickhouse]
2022/12/02 12:29:14 - Table output.0 - ERROR: Because of an error, this transform can't continue:
2022/12/02 12:29:14 - Table output.0 - ERROR: org.apache.hop.core.exception.HopException: 
2022/12/02 12:29:14 - Table output.0 - Error inserting row into table [Test] with values: [], []
2022/12/02 12:29:14 - Table output.0 - 
2022/12/02 12:29:14 - Table output.0 - Error inserting/updating row
2022/12/02 12:29:14 - Table output.0 - Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 - 
2022/12/02 12:29:14 - Table output.0 - 
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable(TableOutput.java:383)
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.processRow(TableOutput.java:117)
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:51)
2022/12/02 12:29:14 - Table output.0 - 	at java.base/java.lang.Thread.run(Thread.java:829)
2022/12/02 12:29:14 - Table output.0 - Caused by: org.apache.hop.core.exception.HopDatabaseException: 
2022/12/02 12:29:14 - Table output.0 - Error inserting/updating row
2022/12/02 12:29:14 - Table output.0 - Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 - 
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.core.database.Database.insertRow(Database.java:1135)
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable(TableOutput.java:251)
2022/12/02 12:29:14 - Table output.0 - 	... 3 more
2022/12/02 12:29:14 - Table output.0 - Caused by: java.sql.SQLException: Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 - 	at com.clickhouse.jdbc.SqlExceptionUtils.clientError(SqlExceptionUtils.java:73)
2022/12/02 12:29:14 - Table output.0 - 	at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:328)
2022/12/02 12:29:14 - Table output.0 - 	at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:110)
2022/12/02 12:29:14 - Table output.0 - 	at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeLargeUpdate(InputBasedPreparedStatement.java:185)
2022/12/02 12:29:14 - Table output.0 - 	at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeUpdate(AbstractPreparedStatement.java:135)
2022/12/02 12:29:14 - Table output.0 - 	at org.apache.hop.core.database.Database.insertRow(Database.java:1100)
2022/12/02 12:29:14 - Table output.0 - 	... 4 more

Version is clickhouse-jdbc-0.3.2-patch11-all.jar.
I updated the example repo in case you want to try yourself.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 2, 2022

Hi @uklft, just tried your repo again and it works for me. Did you update database configuration by setting nullAsDefault to 2? You can either do that via manual URL(e.g. jdbc:ch:http://<server>/?nullAsDefault=2) or add an option accordingly.

@uklft
Copy link

uklft commented Dec 2, 2022

Thanks for looking into it.
So when you run the project as it is (with nullAsDefault=0) you get no error?

If i add nullAsDefault=2 it works as expected but that is not the point.

The problem is that the driver sends a null value although i am passing an empty string.
But i may want to set an empty string on a not nullable column and fail if it receives a null value. I would expect that behaviour by default.

@zhicwu
Copy link
Contributor

zhicwu commented Dec 2, 2022

I see. By default, Apache Hop treats empty string as null value, but you can change the behavior by setting variable HOP_EMPTY_STRING_DIFFERS_FROM_NULL to Y (Tools -> Edit config variables).

@uklft
Copy link

uklft commented Dec 2, 2022

Oh wow, that is the key. Thank you so much for pointing that out!

@Flu-iid
Copy link

Flu-iid commented Aug 1, 2023

today I updated clickhouse and now same python code that passed None data to clickhouse-connect and made Null values in tables, now makes empty string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants