feat: add PARSE_TIME and FORMAT_TIME functions #7722
Nice! This will need to be cherry-picked to the 0.20.x-ksqldb branch.
private final LoadingCache<String, DateTimeFormatter> formatters =
    CacheBuilder.newBuilder()
        .maximumSize(1000)
        .build(CacheLoader.from(DateTimeFormatter::ofPattern));
I've been wondering about this cache for a long time and haven't asked: why do we need it in the time/date/timestamp functions? If a query calls a time UDF with a specific format, then the query will only use one format pattern for all rows, won't it? Or, if a query calls the UDF more than once (once per column) with different formats, doesn't each column have its own instance of FormatTime, which will end up with a single format pattern for all rows?
I haven't checked the above reasoning, but is that the right assumption?
I think it's because this function gets called every time there's a new record, so having a cache prevents it from having to recreate the formatter each time.
If it is instantiated once per record, then it probably makes sense. But that magic number of 1000 seems too big. We should dig more into this after 0.20. See if we can get rid of that cache or make it hold the exact # of formatters of the row.
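To make the trade-off in this thread concrete, here is a minimal sketch of a per-pattern formatter cache. It uses a plain `ConcurrentHashMap` as a stand-in for the Guava `LoadingCache` in the diff, so it has no size bound and no eviction. The class name `FormatterCacheSketch` and its methods are hypothetical and not part of the PR.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the PR's code: keeps one formatter per pattern string.
final class FormatterCacheSketch {
  // Unbounded map for illustration only; the diff uses a Guava LoadingCache
  // with maximumSize(1000).
  private static final Map<String, DateTimeFormatter> FORMATTERS = new ConcurrentHashMap<>();

  static DateTimeFormatter formatterFor(final String pattern) {
    // DateTimeFormatter.ofPattern runs only the first time a pattern is seen.
    return FORMATTERS.computeIfAbsent(pattern, DateTimeFormatter::ofPattern);
  }

  static int cachedPatterns() {
    return FORMATTERS.size();
  }

  public static void main(final String[] args) {
    // Simulates three records that all use the same format pattern.
    for (int i = 0; i < 3; i++) {
      System.out.println(LocalTime.of(5, 45).format(formatterFor("HH:mm")));
    }
    // Number of distinct patterns seen so far.
    System.out.println(cachedPatterns());
  }
}
```

If a query only ever passes a constant pattern, a cache like this holds a single entry, which is the reason a bound of 1000 looks generous. The bound only matters when the pattern comes from a column and varies per row.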
  return null;
}
try {
  final DateTimeFormatter formatter = formatters.get(formatPattern);
If formatPattern has date characters, such as days or months, would they be added to the resulting string?
No, an exception gets thrown - I'll add a test for that.
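For reference, this behavior can be reproduced with plain `java.time`, outside ksqlDB. The sketch below assumes the function formats a `LocalTime`, as the diff suggests. The class `DateFieldFormatSketch` and the helper `formatFails` are hypothetical names used only for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: shows that LocalTime cannot be formatted with date fields.
final class DateFieldFormatSketch {
  // Returns true if formatting a LocalTime with the given pattern throws.
  static boolean formatFails(final String pattern) {
    try {
      LocalTime.of(5, 45).format(DateTimeFormatter.ofPattern(pattern));
      return false;
    } catch (final DateTimeException e) {
      // LocalTime has no year, month, or day fields, so a pattern that asks
      // for one fails with UnsupportedTemporalTypeException, which is a
      // subclass of DateTimeException.
      return true;
    }
  }

  public static void main(final String[] args) {
    System.out.println(formatFails("HH:mm"));
    System.out.println(formatFails("yyyy HH:mm"));
  }
}
```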
}
try {
  final DateTimeFormatter formatter = formatters.get(formatPattern);
  return LocalTime.ofNanoOfDay(time.getTime() * 1000000).format(formatter);
Suggested change:
- return LocalTime.ofNanoOfDay(time.getTime() * 1000000).format(formatter);
+ return LocalTime.ofNanoOfDay(time.getTime() * 1_000_000).format(formatter);
For easy reading. Perhaps declaring a constant for this is better? Is this a nanoseconds-per-second value?
Used TimeUnit conversion functions instead.
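A minimal sketch of what a `TimeUnit`-based conversion could look like, assuming the value is a millisecond-of-day count as returned by `Time.getTime()`. Under that assumption the literal 1000000 is nanoseconds per millisecond rather than per second. The class and method names are hypothetical, and the final code in the PR may differ.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: replaces the 1000000 literal with TimeUnit conversions.
final class TimeUnitConversionSketch {
  // Formats a millisecond-of-day value using the given pattern.
  static String formatMillisOfDay(final long millisOfDay, final String pattern) {
    final long nanos = TimeUnit.MILLISECONDS.toNanos(millisOfDay);
    return LocalTime.ofNanoOfDay(nanos).format(DateTimeFormatter.ofPattern(pattern));
  }

  // Parses a time string and returns its millisecond-of-day value.
  static long parseToMillisOfDay(final String text, final String pattern) {
    final long nanos = LocalTime.parse(text, DateTimeFormatter.ofPattern(pattern)).toNanoOfDay();
    return TimeUnit.NANOSECONDS.toMillis(nanos);
  }

  public static void main(final String[] args) {
    // 05:45 is 20,700 seconds, or 20,700,000 milliseconds, after midnight.
    System.out.println(formatMillisOfDay(20_700_000L, "HH:mm"));
    System.out.println(parseToMillisOfDay("05:45", "HH:mm"));
  }
}
```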
final DateTimeFormatter formatter = formatters.get(formatPattern);
return new Time(LocalTime.parse(formattedTime, formatter).toNanoOfDay() / 1000000);
Same two questions as in FormatTime.
- Do we want to allow date characters in the format? I don't think we should.
- Can we use a constant variable for the nanoseconds-per-second value?
Currently, if we parse something like parse_time('2021 05:45', 'yyyy HH:mm'), then it will parse everything but only return the time component (so in this case, it returns 05:45). It's weird that LocalTime.parse doesn't throw anything.
Added a check to reject formats with non-time elements
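One way such a check could look, as a sketch only: parse into a generic `TemporalAccessor` first and reject the input if any date-based `ChronoField` was resolved. This is an assumption about the approach, not a copy of the code added in the PR. The class `TimeOnlyParseSketch`, the method `parseTimeOnly`, and the use of `IllegalArgumentException` are all hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical sketch: rejects formats that contain non-time elements.
final class TimeOnlyParseSketch {
  static LocalTime parseTimeOnly(final String text, final String pattern) {
    final TemporalAccessor parsed = DateTimeFormatter.ofPattern(pattern).parse(text);
    for (final ChronoField field : ChronoField.values()) {
      // isDateBased() is true for fields from day-of-week up to era.
      if (field.isDateBased() && parsed.isSupported(field)) {
        throw new IllegalArgumentException("Time format contains date field: " + field);
      }
    }
    return LocalTime.from(parsed);
  }

  public static void main(final String[] args) {
    // Plain LocalTime.parse accepts the year and silently drops it.
    System.out.println(LocalTime.parse("2021 05:45", DateTimeFormatter.ofPattern("yyyy HH:mm")));
    // The stricter helper accepts time-only input...
    System.out.println(parseTimeOnly("05:45", "HH:mm"));
    // ...and rejects input whose format carries a date field.
    try {
      parseTimeOnly("2021 05:45", "yyyy HH:mm");
    } catch (final IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Checking the parsed result rather than the pattern string avoids having to interpret pattern letters by hand, at the cost of only detecting the problem once a value is actually parsed.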
LGTM
Description
Enables TIME data for UDFs and adds the PARSE_TIME and FORMAT_TIME functions + docs.
Testing done
QTT + unit tests
Reviewer checklist