[PPL-Lang]support earliest/latest date-time functions #957

Open
YANG-DB opened this issue Nov 28, 2024 · 8 comments
Labels
enhancement New feature or request Lang:PPL Pipe Processing Language support

Comments

@YANG-DB
Member

YANG-DB commented Nov 28, 2024

Is your feature request related to a problem?
PPL needs a query syntax that makes it easy to express simple predefined time ranges, such as earliest week, latest day, latest month, or earliest hour.

The syntax should support the following predefined time units:

  • second - s
  • minute - m
  • hour - h
  • day - d
  • week - w
  • month - mnt
  • quarter - qrt
  • year - yr

What solution would you like?
source = logs | where status > 200 AND (earliest=-24h AND latest<@d)
source = logs | where status > 200 AND (earliest=-5d@w1 AND latest=@w6)
source = logs | where status > 200 AND (earliest='2023-11-15:20:00:00' AND latest='2023-11-22:20:00:00')
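
For reference, the absolute timestamps in the last example join the date and the time with a colon. A minimal Python sketch of reading that format, assuming the layout is exactly what the example shows (the `parse_absolute` helper and `ABSOLUTE_FORMAT` constant are hypothetical names, not part of PPL):

```python
from datetime import datetime

# Assumed layout, inferred from the example values: date and time joined by ":".
ABSOLUTE_FORMAT = "%Y-%m-%d:%H:%M:%S"


def parse_absolute(text: str) -> datetime:
    """Parse a timestamp such as '2023-11-15:20:00:00' (hypothetical helper)."""
    return datetime.strptime(text, ABSOLUTE_FORMAT)


start = parse_absolute("2023-11-15:20:00:00")
end = parse_absolute("2023-11-22:20:00:00")
window = end - start  # length of the requested time range
```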

@YANG-DB YANG-DB added enhancement New feature or request untriaged Lang:PPL Pipe Processing Language support labels Nov 28, 2024
@YANG-DB YANG-DB moved this to Todo in PPL Commands Nov 29, 2024
@andy-k-improving
Contributor

@YANG-DB, do you have any more documentation to provide as a reference?

If I understand this correctly, earliest and latest will themselves serve as identifiers, each evaluated and replaced with the earliest or latest value of a particular timestamp column in the dataset.

If that is the case, should the function also take the timestamp column as an argument? (Assume the dataset contains multiple timestamp columns.)

Also, what does the @ sign mean in the first usage example?
Thanks,

@YANG-DB
Member Author

YANG-DB commented Dec 3, 2024

Hi

  1. You are correct that we need an explicit reference to the timestamp column.
  2. IMO the format should be as follows: earliest=[+|-]<time-value(integer)><time-unit(s/m/h/w...)>@<time-unit(s/m/h/w...)>
    So in this example (earliest=-24h AND latest<@d), the earliest would be the last 24 hours within the last day (@d).
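
The format above could be read with a small parser. The following Python sketch is only an illustration under stated assumptions: it uses the unit abbreviations listed in the issue description, allows either the offset or the @ part to be omitted (as in -24h and @d), and accepts a weekday digit only after @w (as in @w1). The names `RelativeTime` and `parse_relative` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Unit abbreviations taken from the list in the issue description.
UNITS = ("s", "m", "h", "d", "w", "mnt", "qrt", "yr")

# Try the longer abbreviations first, e.g. "mnt" before "m".
_UNIT = "|".join(sorted(UNITS, key=len, reverse=True))
_PATTERN = re.compile(
    rf"^(?:(?P<sign>[+-])(?P<value>\d+)(?P<unit>{_UNIT}))?"
    rf"(?:@(?P<snap>{_UNIT})(?P<weekday>[0-7])?)?$"
)


class RelativeTime(NamedTuple):
    offset: int             # signed amount, 0 when no offset is given
    unit: Optional[str]     # unit of the offset, None when no offset is given
    snap: Optional[str]     # unit after "@", None when there is no "@" part
    weekday: Optional[int]  # digit after "@w", e.g. 1 in "@w1"


def parse_relative(text: str) -> RelativeTime:
    """Split a string such as '-5d@w1' into its parts (hypothetical helper)."""
    match = _PATTERN.match(text)
    if not text or match is None:
        raise ValueError(f"not a relative time string: {text!r}")
    sign, value, unit, snap, weekday = match.group(
        "sign", "value", "unit", "snap", "weekday"
    )
    if weekday is not None and snap != "w":
        raise ValueError("a weekday digit is only accepted after '@w'")
    offset = int(value) if value is not None else 0
    if sign == "-":
        offset = -offset
    return RelativeTime(offset, unit, snap, int(weekday) if weekday else None)
```

For example, `parse_relative("-5d@w1")` yields an offset of -5 days with a snap to week and weekday 1, while `parse_relative("@d")` yields no offset and a snap to day.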

@YANG-DB YANG-DB moved this from Todo to Design in PPL Commands Dec 5, 2024
@andy-k-improving
Contributor

@YANG-DB To clarify the reference syntax and usage: will this be similar to the Splunk counterpart?
https://docs.splunk.com/Documentation/SCS/current/Search/Timemodifiers

@YANG-DB
Member Author

YANG-DB commented Dec 6, 2024

Yes, in general.

@currantw

@YANG-DB, I've taken a look through the above. It looks like a first step would be to add support for converting relative time strings to the datetime data type. I believe that this would be sufficient, along with the existing comparison operators, to implement the desired functionality. If you also want to better align the syntax with Splunk, we could then add new functions earliest and latest to facilitate this. Let me know what you think!

Add Support for Relative Time

Add support for automatically converting relative time strings to the datetime data type, and then use the existing comparison operators.

 source = logs | field_name >= '-24h' AND field_name <= '<@d'
 source = logs | field_name >= '-5d@w1' AND field_name <= '@w6'
 source = logs | field_name >= '2023-11-15:20:00:00' AND field_name <= '2023-11-22:20:00:00'
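
To make the conversion concrete, here is a rough Python sketch of resolving a relative time string against a reference time. It is an illustration under several assumptions rather than the planned implementation: only the fixed-width units s, m, h, d, and w are handled (month, quarter, and year are left out), the offset is applied before the snap, and the weekday digit follows the Splunk convention where 0 or 7 means Sunday. The names `resolve` and `snap` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Fixed-width units only; month, quarter and year are omitted in this sketch.
_STEP = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}
_PATTERN = re.compile(r"^(?:([+-])(\d+)([smhdw]))?(?:@([smhdw])([0-7])?)?$")


def snap(moment: datetime, unit: str, weekday: int = 0) -> datetime:
    """Round `moment` down to the start of `unit` (hypothetical helper)."""
    if unit == "s":
        return moment.replace(microsecond=0)
    if unit == "m":
        return moment.replace(second=0, microsecond=0)
    if unit == "h":
        return moment.replace(minute=0, second=0, microsecond=0)
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return midnight
    # unit == "w": step back to the most recent requested weekday.
    # Assumption: 0 or 7 is Sunday, 1 is Monday, ..., 6 is Saturday.
    sunday_based = (midnight.weekday() + 1) % 7
    return midnight - timedelta(days=(sunday_based - weekday % 7) % 7)


def resolve(text: str, now: datetime) -> datetime:
    """Turn a string such as '-5d@w1' into a datetime relative to `now`."""
    match = _PATTERN.match(text)
    if not text or match is None:
        raise ValueError(f"not a relative time string: {text!r}")
    sign, value, unit, snap_unit, weekday = match.groups()
    if weekday is not None and snap_unit != "w":
        raise ValueError("a weekday digit is only accepted after '@w'")
    moment = now
    if value is not None:
        delta = int(value) * _STEP[unit]
        moment = moment - delta if sign == "-" else moment + delta
    if snap_unit is not None:
        moment = snap(moment, snap_unit, int(weekday or 0))
    return moment


now = datetime(2024, 12, 11, 15, 30, 45)  # a Wednesday
lower = resolve("-24h", now)  # 2024-12-10 15:30:45
upper = resolve("@d", now)    # 2024-12-11 00:00:00
```

Once both bounds are ordinary datetime values, the existing comparison operators apply to them unchanged.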

New Functions

Add new functions earliest and latest that take a field name and a datetime string. This option will be closest to the syntax used in Splunk (but would duplicate some of the functionality already provided by the comparison operators).

 source = logs | earliest(field_name, '-24h') AND latest(field_name, '<@d')
 source = logs | earliest(field_name, '-5d@w1') AND latest(field_name, '@w6')
 source = logs | earliest(field_name, '2023-11-15:20:00:00') AND latest(field_name, '2023-11-22:20:00:00')
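
As a rough model of how these two functions would behave as filter predicates, here is a Python sketch. It assumes inclusive bounds (matching the >= and <= comparisons in the first option) and handles only the absolute timestamp form; the record layout and sample data are made up for illustration.

```python
from datetime import datetime

# Assumed layout of absolute timestamps, inferred from the examples.
FORMAT = "%Y-%m-%d:%H:%M:%S"


def earliest(record: dict, field_name: str, bound: str) -> bool:
    """True when the field is at or after the bound (inclusive by assumption)."""
    return record[field_name] >= datetime.strptime(bound, FORMAT)


def latest(record: dict, field_name: str, bound: str) -> bool:
    """True when the field is at or before the bound (inclusive by assumption)."""
    return record[field_name] <= datetime.strptime(bound, FORMAT)


# Made-up sample rows standing in for the logs source.
logs = [
    {"status": 500, "timestamp": datetime(2023, 11, 14, 9, 0, 0)},
    {"status": 404, "timestamp": datetime(2023, 11, 18, 12, 0, 0)},
    {"status": 503, "timestamp": datetime(2023, 11, 25, 8, 0, 0)},
]

matches = [
    row
    for row in logs
    if earliest(row, "timestamp", "2023-11-15:20:00:00")
    and latest(row, "timestamp", "2023-11-22:20:00:00")
]
```

Only the middle row falls inside the range, which is the same result the comparison-operator form would give.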

@YANG-DB
Member Author

YANG-DB commented Dec 11, 2024

Yes, sounds good to me.
Also yes to the earliest and latest functions.

@YANG-DB YANG-DB moved this from Design to In Progress in PPL Commands Dec 12, 2024
@currantw

Great, thanks @YANG-DB. I have raised #991 for adding support for relative date-times. Once that is completed, I'll come back and add the earliest/latest functions themselves.

@dblock
Member

dblock commented Dec 16, 2024

[Catch All Triage - 1, 2, 3]

@YANG-DB Can you please help make sure this repo is triaged as part of a team triage weekly? Thanks.

@dblock dblock removed the untriaged label Dec 16, 2024
Projects
Status: In Progress

4 participants