KSQL forces Avro field names to upper case #2415

Open
rmoff opened this issue Feb 6, 2019 · 22 comments
Labels
breaking-change P1 Slightly lower priority to P0 ;) streaming-engine Tickets owned by the ksqlDB Streaming Team

Comments

@rmoff
Member

rmoff commented Feb 6, 2019

Given an Avro schema in which the field name is mixed case:

$ curl -s "http://localhost:8081/subjects/AVRO_WITH_MIXED_CASE_FIELDS-value/versions/1"|jq '.schema|fromjson'

{
  "type": "record",
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "fields": [
    {
      "name": "FooBar",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

KSQL reads the field as mixed case:

ksql> print 'AVRO_WITH_MIXED_CASE_FIELDS' from beginning;
Format:AVRO
06/02/19 09:34:58 GMT, null, {"FooBar": "FOO"}

But when registered as a stream, KSQL forces the field name to upper case:

ksql> CREATE STREAM TEST WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='AVRO_WITH_MIXED_CASE_FIELDS');

 Message
----------------
 Stream created
----------------
ksql> DESCRIBE TEST;

Name                 : TEST
 Field   | Type
-------------------------------------
 ROWTIME | BIGINT           (system)
 ROWKEY  | VARCHAR(STRING)  (system)
 FOOBAR  | VARCHAR(STRING)
-------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;
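As a rough illustration of the behavior above, the following Python sketch parses the registered Avro schema and applies the rule this thread describes: a column name inferred from the schema is treated like an unquoted identifier and therefore upper-cased. This is a simplified model under that assumption, not ksqlDB's actual implementation, and `infer_column_names` is a hypothetical helper.

```python
import json

# The value schema shown above. In the Schema Registry response the "schema"
# field is itself a JSON-encoded string, which is why the curl call uses jq's fromjson.
AVRO_SCHEMA = """
{
  "type": "record",
  "name": "KsqlDataSourceSchema",
  "namespace": "io.confluent.ksql.avro_schemas",
  "fields": [
    {"name": "FooBar", "type": ["null", "string"], "default": null}
  ]
}
"""


def infer_column_names(schema_json: str) -> list:
    """Model of schema inference (assumption): each Avro field name is treated
    as an unquoted identifier, and unquoted identifiers are upper-cased."""
    schema = json.loads(schema_json)
    return [field["name"].upper() for field in schema["fields"]]


if __name__ == "__main__":
    print(infer_column_names(AVRO_SCHEMA))  # ['FOOBAR']
```

The mixed-case `FooBar` in the schema becomes `FOOBAR`, matching the `DESCRIBE TEST` output.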
@j-halbert

JSON is affected in the same way. This turns topics produced by KSQL into special cases for the Streams API: they need extra work before they can be used like any other topic.

@agavra agavra self-assigned this Oct 28, 2019
@agavra
Contributor

agavra commented Oct 28, 2019

I think this is the expected default behavior, but a workaround is provided. In general, we assume all fields are case-insensitive unless otherwise specified (e.g. if my Avro schema is the one you have above, I should be able to SELECT FOOBAR FROM ... and get data). If you want to preserve the case, you can specify the schema explicitly (as long as it does not clash with the Avro schema registered under the <subject>-value subject in Schema Registry):

CREATE STREAM foobar (`FooBar` VARCHAR) WITH (...);

Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all avro fields as case-sensitive...
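The workaround relies on how identifiers are normalized. The Python sketch below models the rule as it is described in this thread: an unquoted identifier is upper-cased, a backtick-quoted identifier keeps its case, and a column reference must match the normalized column name exactly. `normalize_identifier` and `resolve_column` are hypothetical helpers for illustration, not part of ksqlDB.

```python
from typing import List, Optional


def normalize_identifier(identifier: str) -> str:
    """Upper-case an unquoted identifier; strip the backticks from a quoted one and keep its case."""
    if len(identifier) >= 2 and identifier.startswith("`") and identifier.endswith("`"):
        return identifier[1:-1]
    return identifier.upper()


def resolve_column(columns: List[str], reference: str) -> Optional[str]:
    """Return the column a reference in a query points to, or None if nothing matches."""
    wanted = normalize_identifier(reference)
    return wanted if wanted in columns else None


# Columns inferred from the topic: the field FooBar is treated as unquoted.
inferred = [normalize_identifier("FooBar")]
# Columns declared explicitly, as in: CREATE STREAM foobar (`FooBar` VARCHAR) ...
declared = [normalize_identifier("`FooBar`")]

if __name__ == "__main__":
    print(inferred)                              # ['FOOBAR']
    print(declared)                              # ['FooBar']
    print(resolve_column(inferred, "FooBar"))    # FOOBAR
    print(resolve_column(declared, "FooBar"))    # None: the unquoted reference becomes FOOBAR
    print(resolve_column(declared, "`FooBar`"))  # FooBar
```

Under this model, once the column is declared with its original case, queries must also quote it to reach it.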

@tmbull

tmbull commented Dec 5, 2020

I am trying to use KSQL to filter messages from a debezium SQL Server source that uses a case-sensitive collation. KSQL makes this filtering very easy, but I am currently unable to use it due to this issue. The SQL statements I generate at my sink fail because the casing of (for example) column names does not match what is in the database.

IMO, the default behavior should be to leave the schema alone.

@saadshahd

Perhaps it would be valuable to provide a shortcut in the WITH clause to treat all avro fields as case-sensitive...

Please, this is badly needed 😄

@greendad

We have a similar use case where we want to retain the case when filtering the messages from a topic using ksqldb. Can someone please tell me if this change can be expected in the near future?

@agavra agavra added needs-triage streaming-engine Tickets owned by the ksqlDB Streaming Team labels Apr 26, 2021
@vcrfxia vcrfxia added breaking-change P1 Slightly lower priority to P0 ;) and removed needs-triage labels Apr 27, 2021
@emerzonic

emerzonic commented May 3, 2021

I don't know why KSQL decided to upper-case field names when Kafka does not. Could KSQL leave them exactly as they appear in the schema in Schema Registry, or as it receives them from the source Kafka topic? This makes working with KSQL a pain, especially with a large data set.

@dimagoldin

I also find this a problem. All fields are automatically upper-cased, which makes working with KSQL alongside Kafka Connect sinks impossible.

To make matters worse, even though aliasing fields with quoted lower-case names works, it is extra difficult and annoying when the column is nested: you can't get away with aliasing just the top-level field name.

Any news on a fix in the near future?
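One consumer-side workaround for the nested case, sketched in Python, is to rename keys recursively before handing records to a sink. This assumes the records have already been deserialized into plain dicts and lists (for example by an Avro or JSON deserializer) and that no two fields differ only by case; `lowercase_keys` and the sample record are hypothetical.

```python
from typing import Any


def lowercase_keys(value: Any) -> Any:
    """Recursively lower-case dict keys, descending into nested dicts and lists."""
    if isinstance(value, dict):
        return {str(key).lower(): lowercase_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    # Scalars (strings, numbers, None) are returned unchanged.
    return value


# Hypothetical record as it might look after KSQL upper-cased every field name.
record = {"ID": 1, "ADDRESS": {"STREET": "Main St", "TAGS": [{"KIND": "home"}]}}

if __name__ == "__main__":
    print(lowercase_keys(record))
    # {'id': 1, 'address': {'street': 'Main St', 'tags': [{'kind': 'home'}]}}
```

Unlike aliasing a top-level column in a query, this also reaches fields inside nested structs, at the cost of an extra processing step outside KSQL.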

@ratskates

Any movement on this ?

@akotb89

akotb89 commented Aug 16, 2021

PLEASE RELEASE A NEW VERSION FOR KSQL WITH A RESOLUTION TO THIS. Create stream from a topic with a registered avro schema using ksql -> ksql must respect the schema and not force the field names to uppercase.

@ReasonDuan

This is a very useful feature.

@ethanl-indeed

This feature matters to our company, which uses Confluent Cloud ksqlDB to process Kafka topic data and feed the results back to Kafka.

@sscots

sscots commented Sep 23, 2021

+1 We want to use the JDBC Sink Connector but the fields in ksql are uppercase while the fields in our db are lowercase.

@spancespants

Is there going to be any movement on this anytime soon? We are running into issues with this as well.

@mjsax
Member

mjsax commented Nov 3, 2021

KLIP-56 will help with this issue and improve the situation.

We are still considering giving users even more control over this behavior (depending on user demand) by adding a new property that enables or disables upper-casing of names.

@jchambondynadmic

Same problem here. Is there any news on this?

@ghost

ghost commented Apr 4, 2023

I still see this issue in recent versions. It's a pain.

@jchambondynadmic

Yeah, agreed. We've been waiting for this for a long time!

@ghost

ghost commented Apr 4, 2023

I did some further investigation. This feature was implemented as schema inference with a schema ID: https://docs.ksqldb.io/en/latest/operate-and-deploy/schema-inference-with-id/

The only issue is that you have to reference the schema by its schema ID; it cannot be inferred from the Kafka topic name.

@jchambondynadmic

It still forces uppercase. Which is the purpose of this issue.

@guilhermeneves

Any updates on this issue? Is there any workaround when using Schema Registry?

@fapinheiro

Any updates on this issue?

@chainhead

I am not even using Schema Registry and still see column names forced to upper case. I wish there were a global start-up setting to control this, e.g. PRESERVE_COLUMN_CASE=true (or false to force upper case).
