Allow statement to specify the casing (camel case, uppercase, etc) for field names when serialized to output topic #1039
Comments
This is not possible with KSQL at the moment. We would have to implement this as a feature that lets you specify (to some degree) how you want your field names serialized. Let's use this issue to track the feature request. |
For the time being, I was thinking about implementing an SMT that uses a schema to rename the fields. Is that a plausible workaround for now? The SMT would match each field name against the schema (case-insensitively) and then replace the source field name with the schema field name. |
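The case-insensitive matching described above can be sketched roughly as follows. This is plain Python for illustration only, not Kafka Connect code; a real SMT would be written in Java against Kafka Connect's `Transformation` interface. The function name `rename_to_schema_casing` and the sample field names are hypothetical.

```python
def rename_to_schema_casing(record, schema_field_names):
    """Return a copy of `record` whose keys use the casing found in the schema.

    `record` is a flat dict standing in for a message value;
    `schema_field_names` is the list of field names as the schema spells them.
    Both are assumptions made for this sketch.
    """
    # Index the schema names by their lower-cased form for case-insensitive lookup.
    by_lower = {name.lower(): name for name in schema_field_names}
    # Keys with no match in the schema are passed through unchanged.
    return {by_lower.get(key.lower(), key): value for key, value in record.items()}


if __name__ == "__main__":
    uppercased = {"FIRSTNAME": "Ada", "LASTNAME": "Lovelace", "ROWTIME": 1}
    print(rename_to_schema_casing(uppercased, ["firstName", "lastName"]))
```

Note that this only works if no two schema fields differ by case alone; such fields would collide in the lookup table.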
You mean your sink connector writing to Couchbase will transform the field names to the correct casing? That should work. |
Any news since April on avoiding UPPERCASING? Thank you :) |
I haven't seen any updates yet. I think this is quite a large feature, so I am hesitant to create a PR myself as I don't have an overview of all the parts involved (to be honest, I wouldn't know where to start). We are using a simple SMT at the moment to rename the fields before they are written to the db. Unfortunately, that code is in a company-internal repo. The SMT is configured with an Avro schema, with the uppercased field names in an alias property. When a new message arrives, the SMT looks up each field name in the aliases and then substitutes the name from the schema.
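Since the actual SMT is not public, here is only a rough sketch of the alias lookup as described, in plain Python rather than as a Kafka Connect transform. The schema, the `Customer` record, and the helper names are made up for illustration; the `aliases` attribute on fields is part of the Avro schema format.

```python
import json

# Hypothetical Avro schema: each field lists the uppercased KSQL name as an alias.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstName", "type": "string", "aliases": ["FIRSTNAME"]},
    {"name": "lastName", "type": "string", "aliases": ["LASTNAME"]}
  ]
}
"""


def build_alias_map(schema):
    """Map every alias in the schema to the field name it belongs to."""
    alias_to_name = {}
    for field in schema["fields"]:
        for alias in field.get("aliases", []):
            alias_to_name[alias] = field["name"]
    return alias_to_name


def apply_aliases(record, alias_to_name):
    """Rename keys that appear as aliases; leave all other keys untouched."""
    return {alias_to_name.get(key, key): value for key, value in record.items()}


ALIASES = build_alias_map(json.loads(SCHEMA_JSON))

if __name__ == "__main__":
    print(apply_aliases({"FIRSTNAME": "Ada", "LASTNAME": "Lovelace"}, ALIASES))
```

Building the alias map once at configuration time keeps the per-message work to a dictionary lookup per field.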
|
I'm not sure in which version this changed, but as of 5.1 you can quote fields (and objects) to retain their casing.
|
Hi everyone, I'm facing an issue when I'm trying to Here is my command The error is the following one
It seems that the column name is not resolved. |
When I rename a column and the new name contains any lowercase character, the column becomes null, even though the source topic is receiving data. As long as the column name does not contain any lowercase characters, everything works fine. |
In the example above, |
👍 for getting this looked at. This is unexpected behavior and causes integration issues. |
related to #2589 |
We should allow setting a configuration value so that materialised tables that are created based on streams with AVRO schemas will honour original casing. |
Any update regarding this issue? Will it be solved in future release? |
Maybe I did not entirely understand the bug behind #4018, but I just want to be sure that this fix also takes care of the bug mentioned above, doing From @maeglindeveloper's example: I'm trying to clarify this because #4018 doesn't say anything about quoted column names. I just want to be sure we are talking about the same thing. |
I don't believe that #3477 really fixed this issue; while you can now specify the casing manually for each column, this is not feasible for data sources with many columns. Is there some rationale behind the upper-casing? When auto-creating columns from Avro, KSQL already has case information which it can use. When creating tables, streams and topics, the user has just typed in the identifier and overriding what they typed without reason seems like a bad idea. Worse, not all databases are case insensitive. In particular, postgres is case sensitive in exactly the opposite way to KSQL; unquoted identifiers are coerced to lower case by default. This makes a very bad user experience! To me the logical default where identifiers absolutely must be coerced to some case or another would be to lower them. Not only because of postgres, but because identifiers should contrast with SQL keywords. If there is no particular rationale I firmly believe that the default should be to leave column names unmolested or, at worst, lower-case them. This is of course a backwards incompatible change and would have to wait for a major version bump. Pending that, a setting which can alter the behaviour would be a great benefit. |
@agavra sorry to highlight you directly on an old issue but do you have any comment on the above? I can open a new issue if it is felt that this is too different. |
hey @fish-face - thanks for your detailed thoughts! You definitely have a lot of valid concerns and we should address them.
I totally agree that manually casing each column for something that already exists in Schema Registry is bad behavior. We should verify that this is the behavior and, if so, create a new ticket to track it. On the other hand, I don't think we should change the default case sensitivity to "case sensitive", as this is not SQL standard (5.2.13 in SQL92):
You can see here that the SQL standard definition of identifier equivalence is actually to replace lower-case letters with upper-case ones, and I'm guessing that's why we implemented it that way historically. I agree that's a little aggressive, though, and it would have identical behavior if we flipped that logic to lowercase instead of uppercase.
I think this is the default behavior for most databases. I believe postgres is actually also case-insensitive (but you are right that it implements this by lowercasing everything as opposed to uppercasing):
postgres=# CREATE TABLE foo (ID VARCHAR, COL2 BIGINT);
CREATE TABLE
postgres=# \d foo
Table "public.foo"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
id | character varying | | |
col2 | bigint | | |
postgres=# INSERT INTO foo (Id, cOL2) VALUES ('hi', 1);
INSERT 0 1
postgres=# SELECT * FROM FOO;
id | col2
----+------
hi | 1
(1 row)
I don't think contrasting with SQL keywords is a valid concern. SQL keywords are case insensitive and some people prefer typing the keywords in lower case (again an example from postgres):
postgres=# cReaTe Table FOo3 (ID varchar);
CREATE TABLE
postgres=# \d foo3;
Table "public.foo3"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
id | character varying | | |
Let me know if this answered any lingering doubts you may have! |
@agavra thanks for the reply. I will double-check the schema case and raise another issue if I can confirm it. There is also the matter of creating topics, which we notice KSQL upper-cases as well - let me know if you would like that as a separate issue. I agree introducing case sensitivity would be a backwards step! (Of course topic names are case sensitive due to kafka itself so we don't have a choice there) Postgres is in fact case-sensitive but the lowering masks this somewhat:
postgres=# CREATE TABLE "FOO" ("ID" VARCHAR, "COL2" BIGINT);
CREATE TABLE
postgres=# SELECT * FROM FOO;
ERROR: relation "foo" does not exist
LINE 1: select * from FOO
postgres=# SELECT * FROM "FOO";
ID
----
(0 rows)
Behaviour for columns is identical. Notice the lowercase " Would it help to look at some other major databases to see if postgres (and hence our particular irritation here) is an anomaly? Regarding case-contrast, I agree this is less important, though whether by convention or by standard the advantage remains with upper case keywords and lower/mixed case identifiers IMO :) |
This is to emulate what happens if a KTable is pushed, via connect, to postgres, with upper-case columns. Sorry I wasn't clear. To further clarify: Postgres is never case insensitive, but if you don't double-quote, it will forcibly lower case everything. Thus if you, for any reason, have upper case column or table names, you must from that point on quote those names every time you use them, which is quite annoying for users. This remains, of course, a bit of a postgres oddity. However I think it does highlight the dangers of interfering with cases.
It looks like the case where you're creating a stream from an avro-formatted topic is covered by this issue. |
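To make the mismatch concrete, here is a toy model of identifier folding. It is a simplification for illustration (it ignores escaped quotes and any dialect-specific rules), and the `fold` helper is hypothetical rather than taken from KSQL or Postgres.

```python
def fold(identifier, unquoted_case):
    """Resolve an identifier roughly the way a SQL engine does before lookup.

    Double-quoted identifiers keep their exact casing; unquoted ones are
    folded to `unquoted_case`, which is "upper" (SQL-92 style, as KSQL does)
    or "lower" (as Postgres does).
    """
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        return identifier[1:-1]
    return identifier.upper() if unquoted_case == "upper" else identifier.lower()


if __name__ == "__main__":
    # A column typed as col2 in an upper-folding engine is stored as COL2.
    stored = fold("col2", "upper")
    # A lower-folding engine asked for COL2 without quotes looks for col2 ...
    print(fold("COL2", "lower") == stored)    # False: the column is not found
    # ... so every reference has to be double-quoted from then on.
    print(fold('"COL2"', "lower") == stored)  # True
```

Within a single engine either folding direction gives the same case-insensitive behaviour; the trouble only appears when names folded one way are pushed into a system that folds the other way.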
@fish-face I had realized what you meant after I had submitted my comment and deleted it, but I suppose not before you had responded! (For context for people looking at this discussion) I'm giving it another thought |
Haha, I happened to be writing part of a reply at the time so I saw it immediately :P |
All the databases I've worked on use a lowercase convention for field names. KSQL is the only exception, and it has caused a lot of confusion and trouble, especially when connecting with other data sources. |
I don't know why KSQL decided to uppercase data when Kafka does not. Could KSQL just leave the data as it is in the schema in the Schema Registry, or as it receives it from the source Kafka topic? This makes working with KSQL impossible, especially when you have a large data set. |
hey @Deninc @emerzonic - thanks for your (totally valid) concerns, since this ticket is specifically to allow casing can you add your comments here: #2415? Otherwise comments on closed issues tend to slip through the cracks. |
We are getting data from several sources using kafka connect and transform the data using ksql. One of the target databases is a document database (couchbase). When we store the data that is transformed by ksql all fields are uppercased. Aliasing fields did not change the behavior. The systems that rely on the data in couchbase expect the data to be camelcased which is not possible with v0.4 of ksql. Any workarounds or ways to retain the casing set by an alias (or source topic)?