Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to parse ISO8601 #1528

Closed
MOZGIII opened this issue Apr 5, 2017 · 10 comments
Closed

Unable to parse ISO8601 #1528

MOZGIII opened this issue Apr 5, 2017 · 10 comments

Comments

@MOZGIII
Copy link

MOZGIII commented Apr 5, 2017

Currently fluentd branch 0.12 does not contain code to allow parsing ISO8601 time format.
For some reason unknown to me, when time_format is omitted in JSON parser, time is interpreted as integer value (UNIX-time). However, there is no way to parse ISO 8601: it cannot be described as a Time#strptime format, since it allows ambiguities. On the onther hand, it is trivial to implement UNIX-time parising with Time#strptime: Time.strptime(value.to_s, '%s'). This may be further optimized to parse integers directly (like it's done currently in the code), but it's only applicable if time_format == '%s'!
It just seems like a poor design decision to make default format different from ISO8601 like in the rest of the parsers and inputs.

Can this be fixed now, or should this maintain backward-compatibility with the previous design? In case we have to maintain backwards compatibility, maybe we could add a special case to allow proper ISO8601 parsing? Maybe add logic for %iso8601 format string to force parsing with Time.iso8601?

@repeatedly
Copy link
Member

Is specifying time_format not enough?
time_format is for string type time field like ISO8601.

@MOZGIII
Copy link
Author

MOZGIII commented Apr 6, 2017

Is doesn't work. ISO8601 can be formatted in different ways, just check out the source for Time.iso8601.

@repeatedly
Copy link
Member

repeatedly commented Apr 6, 2017

It means your log has different formatted ISO8601 format in one stream/file?

@MOZGIII
Copy link
Author

MOZGIII commented Apr 6, 2017

Yes. We have a generic log format parser for all of out docker containers. Some of them are outputting timestams with milliseconds or not depending on user input on particular request (users are allowed to pass timestamps in ISO 8601).

@repeatedly
Copy link
Member

repeatedly commented Apr 6, 2017

I see.
We should keep backward compatibility and similar parsing performance.
So adding special case, like %iso8601, in TimeParser is better.

@VelorumS
Copy link

VelorumS commented May 5, 2017

Is specifying time_format not enough?

@repeatedly, actually, I've got a similar problem, but the time_format doesn't work with precise formats either: something is wrong with nanoseconds.

<source>
  @type tail
  path /log.log
  pos_file /log.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

<match kubernetes.**>
  @type             stdout
  include_time_key  true
  time_format       %Y-%m-%dT%H:%M:%S.%NZ
</match>

log.log:

{"log":"2017-05-05 06:54:05.025581 W | etcdserver: apply entries took too long [31.226255ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:05.029910755Z"}
{"log":"2017-05-05 06:54:05.029733 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:05.029980733Z"}
{"log":"2017-05-05 06:54:13.233914 W | etcdserver: apply entries took too long [39.581817ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:13.234191617Z"}
{"log":"2017-05-05 06:54:13.234019 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:13.234233351Z"}
{"log":"2017-05-05 06:54:19.925164 W | etcdserver: apply entries took too long [56.329248ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:19.925451571Z"}
{"log":"2017-05-05 06:54:19.925485 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:19.92562947Z"}
{"log":"2017-05-05 06:54:27.166801 W | etcdserver: apply entries took too long [39.754859ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:27.167138721Z"}
{"log":"2017-05-05 06:54:27.166902 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:27.167220711Z"}
{"log":"2017-05-05 06:55:22.232082 W | etcdserver: apply entries took too long [38.132101ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:55:22.232374977Z"}
{"log":"2017-05-05 06:55:22.232175 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:55:22.232465125Z"}

Output (fractional seconds are now zero in the last "time" field):

2017-05-05 06:54:05 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:05.025581 W | etcdserver: apply entries took too long [31.226255ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:05.000000000Z"}
2017-05-05 06:54:05 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:05.029733 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:05.000000000Z"}
2017-05-05 06:54:13 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:13.233914 W | etcdserver: apply entries took too long [39.581817ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:13.000000000Z"}
2017-05-05 06:54:13 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:13.234019 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:13.000000000Z"}
2017-05-05 06:54:19 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:19.925164 W | etcdserver: apply entries took too long [56.329248ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:19.000000000Z"}
2017-05-05 06:54:19 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:19.925485 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:19.000000000Z"}
2017-05-05 06:54:27 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:27.166801 W | etcdserver: apply entries took too long [39.754859ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:54:27.000000000Z"}
2017-05-05 06:54:27 +0000 kubernetes.log.log: {"log":"2017-05-05 06:54:27.166902 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:54:27.000000000Z"}
2017-05-05 06:55:22 +0000 kubernetes.log.log: {"log":"2017-05-05 06:55:22.232082 W | etcdserver: apply entries took too long [38.132101ms for 1 entries]\n","stream":"stderr","time":"2017-05-05T06:55:22.000000000Z"}
2017-05-05 06:55:22 +0000 kubernetes.log.log: {"log":"2017-05-05 06:55:22.232175 W | etcdserver: avoid queries with large range/delete range!\n","stream":"stderr","time":"2017-05-05T06:55:22.000000000Z"}

@MOZGIII
Copy link
Author

MOZGIII commented May 5, 2017

I forgot about this. Maybe someone else could pick up from here and add the %iso8601 format to parse with Time.iso8601?

@repeatedly
Copy link
Member

@ChipmunkV Do you use fluentd v0.12?

@VelorumS
Copy link

VelorumS commented May 8, 2017

@repeatedly, it's not very obvious from the docs that v0.12 doesn't support anything except seconds and you need v0.14. Other non-obvious thing is the versioning of the td-agent: you would think that v2.3.5 is greater than v0.12, but it's not and not related at all.

repeatedly added a commit that referenced this issue May 12, 2017
Support %iso8601 special case for time_format. fix #1528
@MOZGIII
Copy link
Author

MOZGIII commented May 12, 2017

Will it be backported to v0.12? Or is it safe to use v0.14 now? We tried v0.14 a year ago and have had problems, so we've switched back to v0.12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants