[WIP] es/query: request_id-based derivation tasks statistics #187
Open
mgolosova wants to merge 8 commits into master from es-htag-deriv-stat
The query gets information:
* aggregated by output data formats;
* within formats, aggregated by task status;
* for each format+status bucket:
  * total number of input events;
  * total size of input datasets;
  * total number of output events;
  * total size of output datasets;
  * average task walltime;
  * estimated total CPU time.
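To make the format → status → metrics structure concrete, here is a minimal sketch that flattens such a nested ES aggregation response into one row per format+status pair. The aggregation names (`formats`, `status`) and metric names (`input_events`, `avg_walltime`) are hypothetical and need not match the actual query in this PR; any single-value metric in a status bucket is picked up generically.

```python
from typing import Any, Dict, List


def flatten_buckets(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a format -> status -> metrics aggregation into flat rows.

    Hypothetical assumption: the top-level aggregation is named "formats"
    and the nested one "status"; every other sub-aggregation in a status
    bucket is a single-value metric of the form {"value": ...}.
    """
    rows = []
    for fmt_bucket in response["aggregations"]["formats"]["buckets"]:
        for st_bucket in fmt_bucket["status"]["buckets"]:
            row = {
                "format": fmt_bucket["key"],
                "status": st_bucket["key"],
                "tasks": st_bucket["doc_count"],
            }
            # Collect single-value metric aggregations (sum, avg, ...).
            for name, agg in st_bucket.items():
                if isinstance(agg, dict) and "value" in agg:
                    row[name] = agg["value"]
            rows.append(row)
    return rows


# Illustrative response shape, not real data.
sample = {
    "aggregations": {
        "formats": {
            "buckets": [
                {
                    "key": "DAOD_EXOT12",
                    "doc_count": 3,
                    "status": {
                        "buckets": [
                            {"key": "done", "doc_count": 2,
                             "input_events": {"value": 1000.0},
                             "avg_walltime": {"value": 3600.0}},
                            {"key": "failed", "doc_count": 1,
                             "input_events": {"value": 50.0},
                             "avg_walltime": {"value": None}},
                        ]
                    },
                }
            ]
        }
    }
}

for row in flatten_buckets(sample):
    print(row)
```

Note that ES returns `null` for an `avg` over no values, so a bucket's average can legitimately come back as `None`.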
We were told that, for derivation tasks, it is more common to look up tasks by request ID than by hashtag(s).
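A minimal sketch of selecting tasks by request ID (or, as originally planned, by hashtags) with filter clauses in a `bool` query. The field names `request_id` and `hashtag_list` are hypothetical placeholders, not necessarily the ones used in the index this PR queries.

```python
import json
from typing import Any, Dict, List, Optional


def task_filter(request_id: Optional[int] = None,
                hashtags: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a bool query selecting tasks by request ID and/or hashtags.

    The field names "request_id" and "hashtag_list" are hypothetical.
    """
    filters: List[Dict[str, Any]] = []
    if request_id is not None:
        # Exact match on a single request ID (filter context: no scoring).
        filters.append({"term": {"request_id": request_id}})
    if hashtags:
        # Matches tasks carrying at least one of the given hashtags.
        filters.append({"terms": {"hashtag_list": hashtags}})
    return {"bool": {"filter": filters}}


# 12345 is an arbitrary example value.
print(json.dumps({"query": task_filter(request_id=12345)}, indent=2))
```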
By default, the ES "terms" aggregation returns only the top 10 buckets; to get the rest, "size" must be specified explicitly.
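A sketch of the aggregation part of a request body with an explicit `size` on both nested `terms` aggregations, so that buckets beyond the default 10 are not silently dropped. Apart from `output_formats`, the field names are hypothetical task-level fields, and the sizes (100, 20) are arbitrary upper bounds rather than values from this PR.

```python
import json
from typing import Any, Dict


def stats_aggs(format_field: str = "output_formats",
               status_field: str = "status",
               n_formats: int = 100,
               n_statuses: int = 20) -> Dict[str, Any]:
    """Nested terms aggregation: output format -> task status -> sums.

    "terms" returns only the top 10 buckets unless "size" is given, so an
    explicit size is set at both levels. Metric field names are hypothetical.
    """
    metrics = {
        name: {"sum": {"field": name}}
        for name in ("input_events", "input_bytes",
                     "output_events", "output_bytes")
    }
    return {
        "formats": {
            "terms": {"field": format_field, "size": n_formats},
            "aggs": {
                "status": {
                    "terms": {"field": status_field, "size": n_statuses},
                    "aggs": metrics,
                }
            },
        }
    }


# "size": 0 at the top level skips returning individual hits.
body = {"size": 0, "aggs": stats_aggs()}
print(json.dumps(body, indent=2))
```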
The "data_format" field of output datasets is artificially extended with a "general" format: "DAOD_EXOT12" becomes ["DAOD", "DAOD_EXOT12"] (see PR #102, commit 8c5ca49). For per-task statistics this is a problem: we get an extra format, "DAOD", which does not correspond to any specific dataset yet matches all "DAOD_*" datasets. To work around this, the list of data formats can be taken from the task metadata (the "output_formats" field).
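To illustrate the issue, here is a hypothetical re-creation of the extension (taking the part before the first underscore as the "general" format; the actual rule from PR #102 may differ) and a comparison of counting formats from extended dataset fields versus from the task's `output_formats` list. The `output_datasets` key and the sample format names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def extend_format(data_format: str) -> List[str]:
    """Hypothetical "general" format extension:
    "DAOD_EXOT12" -> ["DAOD", "DAOD_EXOT12"]. Splitting on the first
    underscore is an assumption, not necessarily the rule from PR #102.
    """
    general = data_format.split("_", 1)[0]
    return [general, data_format] if general != data_format else [data_format]


def formats_from_datasets(task: Dict) -> Counter:
    """Count formats via the (extended) per-dataset "data_format" field."""
    counts = Counter()
    for ds in task["output_datasets"]:
        counts.update(extend_format(ds["data_format"]))
    return counts


def formats_from_metadata(task: Dict) -> Counter:
    """Count formats from the task's own "output_formats" list."""
    return Counter(task["output_formats"])


# Illustrative task document.
task = {
    "output_formats": ["DAOD_EXOT12", "DAOD_EXOT15"],
    "output_datasets": [
        {"data_format": "DAOD_EXOT12"},
        {"data_format": "DAOD_EXOT15"},
    ],
}
print(formats_from_datasets(task))  # includes a spurious "DAOD" entry
print(formats_from_metadata(task))
```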
For some reason, the ProdSys2 DB contains tasks with `start_time` > `end_time`, so this has to be checked explicitly to get correct results.
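A client-side sketch of the explicit check: tasks with a missing time or with `start_time` > `end_time` are skipped when averaging walltime, instead of contributing negative durations. Where the PR actually performs this check (inside the ES query, e.g. via a script, or in post-processing) is not shown here; representing the times as `datetime` objects is an assumption.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional


def average_walltime(tasks: Iterable[Dict]) -> Optional[float]:
    """Average task walltime in seconds, skipping inconsistent records.

    Tasks with a missing time or with start_time > end_time are ignored.
    Returns None if no task has a valid interval.
    """
    total = 0.0
    n = 0
    for t in tasks:
        start, end = t.get("start_time"), t.get("end_time")
        if start is None or end is None or start > end:
            continue
        total += (end - start).total_seconds()
        n += 1
    return total / n if n else None


# Illustrative data: the second task has start_time > end_time.
tasks = [
    {"start_time": datetime(2019, 1, 10, 8, 0), "end_time": datetime(2019, 1, 10, 10, 0)},
    {"start_time": datetime(2019, 1, 10, 12, 0), "end_time": datetime(2019, 1, 10, 11, 0)},
    {"start_time": datetime(2019, 1, 11, 8, 0), "end_time": datetime(2019, 1, 11, 12, 0)},
]
print(average_walltime(tasks))  # 10800.0 (mean of 2 h and 4 h)
```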
mgolosova changed the title from "[WIP] Hashtag-based derivation tasks statistics" to "[WIP] Request-based derivation tasks statistics" (Jan 11, 2019)
mgolosova changed the title from "[WIP] Request-based derivation tasks statistics" to "[WIP] es/query: request_id-based derivation tasks statistics" (Jan 23, 2019)
The [WIP] status is because we still don't know whether the query does what it was made for.
Initially the query's main parameter was supposed to be a hashtag (or a list of hashtags), but later it was changed to Request ID.
NOTE: the output sample will be updated later, when the data in ES are ready.
Added query to get hashtag-based derivation tasks statistics.
The query gets information:
ToDo: walltime value (see q.json and deriv-stats.json);