Expected Behavior
Default Vector output timestamp format is expected to be compatible with default ClickHouse timestamp input format.
Actual Behavior
It is not. ClickHouse's DateTime parsing is determined by the date_time_input_format setting, which allows two options:
basic — as documentation says, "ClickHouse can parse only the basic YYYY-MM-DD HH:MM:SS or YYYY-MM-DD format". This is the default behavior.
best_effort — can also parse ISO 8601 formats.
Vector's ClickHouse sink has the option encoding.timestamp_format, which also accepts two values: rfc3339 (the default) or unix.
When the Vector ClickHouse sink is used with the default encoding.timestamp_format against a ClickHouse server with the default date_time_input_format, ClickHouse rejects the data with this error:
Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected '\"' before: 'Z\"}\\n': (while read the value of key timestamp): (at row 1)\n`
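To make the mismatch concrete, here is a rough Python sketch. It is not ClickHouse's parser: `parses_as_basic` is a hypothetical helper that uses a strict `strptime` pattern as a stand-in for the basic format quoted above. It also rejects the `T` separator, whereas the error above shows ClickHouse itself stopping at the trailing `Z`. The sample values are made up for illustration.

```python
from datetime import datetime

# Stand-in for the "basic" YYYY-MM-DD HH:MM:SS format (an assumption, not ClickHouse code).
BASIC_FORMAT = "%Y-%m-%d %H:%M:%S"


def parses_as_basic(value: str) -> bool:
    """Return True if the value matches the strict basic pattern."""
    try:
        datetime.strptime(value, BASIC_FORMAT)
        return True
    except ValueError:
        return False


# Illustrative values: the first has the shape of an rfc3339 timestamp,
# the second the shape the basic format describes.
rfc3339_value = "2020-01-01T12:34:56Z"
basic_value = "2020-01-01 12:34:56"

print(parses_as_basic(rfc3339_value))
print(parses_as_basic(basic_value))
```

The point is only that the two defaults describe different string shapes, so one side has to be changed.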
Proposed solutions
Setting Vector to use Unix timestamps in the ClickHouse sink solves the problem for me:
encoding.timestamp_format = "unix"
However, I think this must at least be noted in the documentation.
Also, it may be worth considering changing the default value of encoding.timestamp_format to unix, to get the correct behavior "out of the box".
Adding a new timestamp_format option, basic, may also be a possibility. However, IMHO it cannot be the optimal default, because ClickHouse's timezone interpretation may depend on the column setting or, if none is set, on the system timezone. That is useful only when selected explicitly, i.e. when the user knows what she/he is doing.
I'm using the following workaround in VRL to format my timestamp in the default way ClickHouse expects:
.ts = format_timestamp(.ts, "%F %T%.9f") ?? .ts
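For anyone doing the same conversion outside of VRL, a rough Python equivalent might look like the sketch below. `rfc3339_to_basic` is a hypothetical helper name, and it assumes the input is an RFC 3339 string with an explicit offset or `Z`. Note that Python's `%f` gives microseconds (6 digits), not the 9 digits that `%.9f` asks for in the VRL line above.

```python
from datetime import datetime, timezone


def rfc3339_to_basic(value: str) -> str:
    """Reformat an RFC 3339 timestamp as 'YYYY-MM-DD HH:MM:SS.ffffff' in UTC."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Normalise to UTC so the output carries no offset suffix.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")


print(rfc3339_to_basic("2020-01-01T12:34:56.123456Z"))
print(rfc3339_to_basic("2020-01-01T14:34:56+02:00"))
```

Converting to UTC before dropping the offset is a choice made for this sketch; how ClickHouse interprets the resulting value still depends on the column or server timezone.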
I read in a ClickHouse issue that the basic date format is faster than the best_effort setting. I can't find the issue anymore, but the best-effort path tries parsing in a bunch of ways, so it makes sense.
The downside to using a unix timestamp is that it is in seconds. best_effort may be slower, but at the moment it is the only way to handle DateTime64. The best solution would be to add an explicit RFC3339 option to ClickHouse.
P.S. I'm trying to add support for best_effort in #15644.
P.P.S. There may be an argument for having it "on" by default in future releases, so that the sink works out of the box. The option to optimize by switching to unix and second-precision timestamps would be retained.
But this did not work for some reason (perhaps wrong parameters or format). My guess is that the sink setting overwrites the transform format or something like that.
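A quick Python illustration of the second-precision point, using only the standard library. The sample timestamp is made up; this shows what truncating to whole unix seconds does to a value, not what Vector does internally.

```python
from datetime import datetime, timezone

# A timestamp with a sub-second part (789 ms), as a DateTime64 column might hold.
ts = datetime(2020, 1, 1, 12, 34, 56, 789000, tzinfo=timezone.utc)

# Whole unix seconds: the fractional part is dropped.
unix_seconds = int(ts.timestamp())

# Converting back shows the milliseconds are gone.
restored = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

print(unix_seconds)
print(ts.isoformat(), "->", restored.isoformat())
```

So unix is fine for plain DateTime columns, but anything that needs sub-second precision has to go through a string format instead.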
I don't know if this would even result in a better performance.
Your date_time_best_effort: true option works perfectly out of the box so I just keep it! :)
Part of my yaml configuration of the clickhouse sink:
Vector Version
Vector Configuration File
Field timestamp was previously coerced to a Vector timestamp.
Column timestamp in ClickHouse is DateTime.
References
#1634