Qualification tool: Filter based on timestamp in event logs #2947
Signed-off-by: Niranjan Artal <[email protected]>
tools/src/main/scala/org/apache/spark/sql/rapids/tool/AppFilterImpl.scala
(for processing newest 100 event logs). eg: 100-oldest
(for processing oldest 100 event logs). Filesystem
based filtering happens before any application based filtering.
Application based filter-criteria are:
I'm debating the names here. I would think most people probably want to filter based on the app time, so I wonder if we should make 100-newest mean app time and then add something like 100-newest-filesystem for filesystem time. I hate to change the config, but I don't think many people use it yet.
Yes, I also don't think many people are using this config yet. I am fine with changing it if you think it makes more sense.
"100-newest-per-app-name" - per application based on app time.
"100-newest" - based on app time.
"100-newest-filesystem" - based on filesystem time.
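To make the three proposed names concrete, here is a minimal sketch of how such a filter-criteria string might be parsed. The type and object names (FilterCriteria, NewestByAppTime, and so on) are hypothetical and are not taken from the tool's source; only the three string forms come from the discussion above.

```scala
// Hypothetical model of the three naming options discussed above.
// None of these type names come from the actual tool.
sealed trait FilterCriteria { def count: Int }
case class NewestByAppTime(count: Int) extends FilterCriteria
case class NewestPerAppName(count: Int) extends FilterCriteria
case class NewestByFilesystemTime(count: Int) extends FilterCriteria

object FilterCriteria {
  // A count (at most 9 digits, so toInt cannot overflow), then "-newest",
  // then an optional suffix selecting the variant.
  private val Pattern = """(\d{1,9})-newest(-per-app-name|-filesystem)?""".r

  def parse(s: String): Option[FilterCriteria] = s match {
    // An unmatched optional group is reported as null by scala.util.matching.Regex.
    case Pattern(n, null) => Some(NewestByAppTime(n.toInt))
    case Pattern(n, "-per-app-name") => Some(NewestPerAppName(n.toInt))
    case Pattern(n, "-filesystem") => Some(NewestByFilesystemTime(n.toInt))
    case _ => None
  }
}
```

With this shape, FilterCriteria.parse("100-newest") yields the app-time variant, and an unrecognized string yields None rather than throwing.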
I agree that filtering based on filesystem metadata isn’t likely to be particularly useful (and find(1) will do that in any case, so it doesn’t necessarily need to be built into a tool).
In the prototype, we had --limit X (to choose the newest X logs) and --per-app-limit X (to choose at most the X newest logs per application name). We also had a --oldest flag to override the ordering. I thought this was a pretty sensible UX and in general prefer multiple options to configure multiple aspects of functionality instead of having a string to parse as an argument.
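As a rough illustration of how the three prototype options could compose, the sketch below applies a per-app limit, an overall limit, and an ordering override to a list of logs. LogInfo, LogSelector, and their fields are hypothetical names chosen for this example; the prototype's actual implementation is not shown in this thread.

```scala
// Hypothetical minimal record for an event log; field names are illustrative.
case class LogInfo(appName: String, startTime: Long, path: String)

object LogSelector {
  def select(
      logs: Seq[LogInfo],
      limit: Option[Int] = None,        // models --limit X
      perAppLimit: Option[Int] = None,  // models --per-app-limit X
      oldest: Boolean = false           // models --oldest
  ): Seq[LogInfo] = {
    // Newest first by default; the oldest flag reverses the ordering.
    val byStart = Ordering.by[LogInfo, Long](_.startTime)
    val ordering = if (oldest) byStart else byStart.reverse
    val sorted = logs.sorted(ordering)

    // Keep at most n logs per application name, then restore global order.
    val perApp = perAppLimit match {
      case Some(n) =>
        sorted.groupBy(_.appName).values.flatMap(_.take(n)).toSeq.sorted(ordering)
      case None => sorted
    }

    // Finally apply the overall limit, if any.
    limit.fold(perApp)(n => perApp.take(n))
  }
}
```

Keeping each aspect as its own parameter mirrors the point made above: the options can be combined freely without parsing a compound string.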
For time-based filtering, there are a couple of interesting cases:
- since a certain (human-readable) time
- between a range of human-readable times
- in the last k days (or hours / weeks / months)
I think that a reasonable command-line UX for these would look like this:
--since 7/1/2021
--since 7/1/2021 --before 7/15/2021
--in-last 5d
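The --since / --before pair could be handled along the following lines. This is only a sketch: it assumes dates are written month/day/year (so 7/1/2021 is July 1, 2021), that day boundaries are taken in UTC, and that application start times are epoch milliseconds. TimeRange and its methods are hypothetical names.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

object TimeRange {
  // Assumption: month/day/year, with one- or two-digit month and day.
  private val Fmt = DateTimeFormatter.ofPattern("M/d/yyyy")

  // Start of the given day in UTC, as epoch milliseconds.
  def parseDate(s: String): Long =
    LocalDate.parse(s, Fmt).atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli

  // True when appStart falls in [since, before); a missing bound is unbounded.
  def inRange(appStart: Long, since: Option[String], before: Option[String]): Boolean =
    since.forall(s => appStart >= parseDate(s)) &&
      before.forall(b => appStart < parseDate(b))
}
```

A real tool would also need to decide which time zone the dates refer to and how to report malformed input; both are left out here.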
Thanks @willb.
We do have a flag for the last k days (mins/hours/weeks/months): start-app-time. The input is not in date format, though. It takes a period, e.g. --start-app-time 2d (past 2 days), 3w (past 3 weeks), and so on.
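A period string such as 2d or 3w can be turned into a cutoff with a small parser. The sketch below is an illustration, not the tool's code: only the d and w suffixes appear in this thread, so the min, h, and m suffixes and the 30-day month are assumptions, and AppTimePeriod is a hypothetical name.

```scala
object AppTimePeriod {
  // Assumed suffixes: min, h, d, w, m. Only "d" and "w" are confirmed above.
  private val Pattern = """(\d{1,9})(min|h|d|w|m)""".r
  private val MsPerDay = 24L * 60 * 60 * 1000

  // Length of the period in milliseconds, or None if the string is malformed.
  def toMillis(period: String): Option[Long] = period match {
    case Pattern(n, unit) =>
      val perUnit = unit match {
        case "min" => 60L * 1000
        case "h" => 60L * 60 * 1000
        case "d" => MsPerDay
        case "w" => 7 * MsPerDay
        case "m" => 30 * MsPerDay // assumption: a month is treated as 30 days
      }
      Some(n.toLong * perUnit)
    case _ => None
  }

  // True when the application started within the given period before `now`.
  def isRecent(appStart: Long, period: String, now: Long): Boolean =
    toMillis(period).exists(ms => appStart >= now - ms)
}
```

Passing `now` explicitly, rather than reading the clock inside isRecent, keeps the function easy to test.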
tools/src/main/scala/com/nvidia/spark/rapids/tool/EventLogPathProcessor.scala
tools/src/main/scala/com/nvidia/spark/rapids/tool/qualification/QualificationArgs.scala
tools/src/test/scala/com/nvidia/spark/rapids/tool/qualification/AppFilterSuite.scala
I think the only minor thing missed was the comment about changing the help/usage output to put the filesystem-based options last, since we think they will be used least.
User story: