Correlate entries in monit_prod_cmssw_pop_* with those in monit_prod_condor_raw_metric* #36351
A new Issue was created by @joseflix Josep Flix, PhD. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
If I understood correctly, the (ref
The next step would be to look if a common identifier could be easily delivered to both. |
Hi Matti,
Best, |
InputSource has a GUID that could be used for this, but I have not looked to see how accessible it is from the monitoring routines (see cmssw/FWCore/Framework/interface/InputSource.h, lines 194 to 195 in ad04da7).
|
Thanks @dan131riley. Just thinking out loud, one (probably not very good) option could be to make the Likely a better place would be cmssw/FWCore/Framework/src/EventProcessor.cc Lines 181 to 195 in ad04da7
Then the GUID would at least be propagated for all Source types. I wouldn't really want FWCore to gain a direct dependence on StatisticsSenderService (i.e. Utilities/StorageFactory), and considering CondorStatusService as well makes me think of adding a new signal to ActivityRegistry for this. I also thought about using postSourceConstructionSignal_ for this, but this information wouldn't play well with "sending signal even with exception".
|
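The idea of a dedicated signal can be illustrated with a small, language-neutral sketch of the observer pattern. It is written in Python for brevity and is not CMSSW code: `ActivityRegistrySketch`, `watch_process_guid`, `emit_process_guid` and the two reporter classes are hypothetical names invented for this illustration. The point is only that the emitter does not need to know which services consume the GUID.

```python
import uuid
from typing import Callable, List, Optional


class ActivityRegistrySketch:
    """Hypothetical stand-in for a signal registry (not the CMSSW class)."""

    def __init__(self) -> None:
        self._guid_watchers: List[Callable[[str], None]] = []

    def watch_process_guid(self, callback: Callable[[str], None]) -> None:
        # Services register a callback; the registry never imports them.
        self._guid_watchers.append(callback)

    def emit_process_guid(self, guid: str) -> None:
        # Deliver the same GUID to every registered watcher.
        for callback in self._guid_watchers:
            callback(guid)


class PopularityReporterSketch:
    """Hypothetical service that would tag popularity records with the GUID."""

    def __init__(self, registry: ActivityRegistrySketch) -> None:
        self.guid: Optional[str] = None
        registry.watch_process_guid(self._on_guid)

    def _on_guid(self, guid: str) -> None:
        self.guid = guid


class CondorReporterSketch:
    """Hypothetical service that would tag condor metrics with the GUID."""

    def __init__(self, registry: ActivityRegistrySketch) -> None:
        self.guid: Optional[str] = None
        registry.watch_process_guid(self._on_guid)

    def _on_guid(self, guid: str) -> None:
        self.guid = guid


registry = ActivityRegistrySketch()
popularity = PopularityReporterSketch(registry)
condor = CondorReporterSketch(registry)

# The framework side generates one GUID and emits it once.
process_guid = str(uuid.uuid4())
registry.emit_process_guid(process_guid)
```

With this shape, both reporters end up holding the same identifier while the emitting side depends only on the registry.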
@joseflix (Mostly out of curiosity, I'd like to understand the bigger picture better) The framework job report (which I think gets uploaded into the monitoring system) reports the processed files, and the wall clock and CPU times (for the CPU efficiency calculation). Is this information not sufficient? (For example, is something missing, or would you like the information while the jobs are still running instead of after the fact?) |
#36570 adds a "process-level GUID" that is used in @joseflix The |
Hi @makortel, I guess what you are proposing in the last message is fine. We need a GUID that is the same in the two views, so that we can later correlate entries that appear in both. Concerning the FJR, I think they don't go to Kibana. Maybe they go somewhere and can be parsed, but I have no clue where. |
Each CMSSW job reports popularity entries in monit_prod_cmssw_pop_*.
Each job also reports CPU utilization and other metrics in monit_prod_condor_raw_metric*.
There is no unique identifier shared between these two sets of entries, and one is needed. For example, if a job accessed a file X from local or remote storage, we want to know how much the CPU efficiency degraded.
Files are listed in monit_prod_cmssw_pop_* and CPU utilization in monit_prod_condor_raw_metric*, but there is currently no way to correlate the two.
Would it be very difficult to add an ID that lets people join both sources of information?
Thanks
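To make the request concrete, here is a minimal sketch of the join that a shared identifier would enable. It assumes both record types carry a common field, here called `guid`; that field name, the other field names (`file`, `access`, `cpu_eff`) and all sample values are made up for illustration and do not reflect the actual schemas of either index.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def join_on_guid(pop_records: List[Record],
                 condor_records: List[Record],
                 key: str = "guid") -> List[Record]:
    """Inner-join popularity and condor records on a shared key.

    The key name is an assumption; neither index is known to carry it.
    """
    # Index the condor records by the shared identifier.
    condor_by_guid = {rec[key]: rec for rec in condor_records if key in rec}

    joined = []
    for pop in pop_records:
        condor = condor_by_guid.get(pop.get(key))
        if condor is not None:
            # Merge both views into a single record per file access.
            joined.append({**condor, **pop})
    return joined


# Made-up sample records, for illustration only.
pop_records = [
    {"guid": "aaa", "file": "/store/example/X.root", "access": "remote"},
    {"guid": "bbb", "file": "/store/example/Y.root", "access": "local"},
    {"guid": "zzz", "file": "/store/example/Z.root", "access": "local"},
]
condor_records = [
    {"guid": "aaa", "cpu_eff": 0.55},
    {"guid": "bbb", "cpu_eff": 0.90},
]

joined = join_on_guid(pop_records, condor_records)
```

Each joined record then holds both the file access information and the CPU efficiency of the job that read the file, which is what is needed to compare local and remote reads. Records without a match on the other side are dropped.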