EventSetup Records with large payloads #33436
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core, alca |
New categories assigned: core,alca @Dr15Jones,@smuzaffar,@christopheralanwest,@tlampen,@pohsun,@yuanchao,@makortel,@francescobrivio,@malbouis you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@cms-sw/alca-l2 I can quickly think of
as Records that can have large payloads, could you comment if this is correct and what other such Records we have? (let's say "large" is more than 10 MB) |
assign db I think that @ggovi is probably the best person to answer this question. |
New categories assigned: db @ggovi you have been requested to review this Pull request/Issue and eventually sign? Thanks |
these are not even close to the absolute largest which is the per-pixel Gain Calibration used for offline reconstruction.
you can find the complete list here: |
Thanks Marco for this prompt answer. We need then to identify the threshold. How will the exclusion list be implemented? Hard-coded or configurable? |
Another possibility is to avoid keeping the payloads in memory at all, given that they are all cached permanently in Frontier... |
Thanks @mmusich! Does the list contain only the payloads in the CondDB? The EventSetup products created within CMSSW contribute to the memory requirement too. Does anyone have a hunch about those, or do they have to be found with a profiler? |
The simplest way is to disable concurrent IOV support for the relevant Records in the C++ code, along the lines of cmssw/FWCore/Framework/test/Dummy2Record.h Lines 13 to 16 in 53993f8
The level of concurrency can also be set in the configuration per Record (for those Records for which concurrency is not disabled), along the lines of

    process.options.eventSetup = cms.untracked.PSet(
        numberOfConcurrentIOVs = cms.untracked.uint32(2), # default concurrency
        forceNumberOfConcurrentIOVs = cms.untracked.PSet(
            SiPixelGainCalibrationOfflineRcd = cms.untracked.uint32(1),
            EcalPulseCovariancesRcd = cms.untracked.uint32(1),
            ...
        )
    )

I believe hardcoding is good enough to get most of the threading efficiency benefits (also, I can't think of a natural place for a configuration that would automatically propagate to all applications). |
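To illustrate the threshold idea discussed in this thread, here is a minimal sketch of how the entries of `forceNumberOfConcurrentIOVs` could be derived from a table of payload sizes. The function name `records_to_serialize`, the record `SomeSmallRcd`, and all sizes are hypothetical placeholders, not measurements; only the 10 MB threshold and the two real Record names come from the discussion above.

```python
# Hypothetical payload sizes in MB. These numbers are placeholders for
# illustration only; they are NOT measured values.
EXAMPLE_PAYLOAD_SIZES_MB = {
    "SiPixelGainCalibrationOfflineRcd": 50.0,
    "EcalPulseCovariancesRcd": 20.0,
    "SomeSmallRcd": 0.5,  # hypothetical Record name
}


def records_to_serialize(payload_sizes_mb, threshold_mb=10.0):
    """Map every Record whose payload exceeds the threshold to a concurrency of 1."""
    return {
        record: 1
        for record, size in sorted(payload_sizes_mb.items())
        if size > threshold_mb
    }


forced = records_to_serialize(EXAMPLE_PAYLOAD_SIZES_MB)

# Print lines that could be pasted into the forceNumberOfConcurrentIOVs PSet.
for record, n_iovs in forced.items():
    print(f"{record} = cms.untracked.uint32({n_iovs}),")
```

Whether such a list ends up hard-coded in the Record headers or generated into the configuration is exactly the open question of this thread; the sketch only shows the selection step.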
I probably misunderstood, but I believe requesting the payloads from Frontier for each event would (significantly) decrease the event processing throughput. |
correct
I am wondering whether it would be possible to get the (non-persisted) Record data by modifying this? |
If I got it right (from condDbBrowser), the largest payload with non-Run IOV is
Actually, how easy would it be to get a list of tags that have non-Run IOVs? |
straightforward.
For the record, one can get it with:

    import CondCore.Utilities.conddblib as conddb

    con = conddb.connect(url=conddb.make_url("pro"))
    session = con.session()
    IOV = session.get_dbtype(conddb.IOV)
    TAG = session.get_dbtype(conddb.Tag)
    GT = session.get_dbtype(conddb.GlobalTag)
    GTMAP = session.get_dbtype(conddb.GlobalTagMap)
    RUNINFO = session.get_dbtype(conddb.RunInfo)

    # All (record, label, tag) entries of the Global Tag
    GTMap = session.query(GTMAP.record, GTMAP.label, GTMAP.tag_name).\
        filter(GTMAP.global_tag_name == "112X_dataRun3_Prompt_v5").\
        order_by(GTMAP.record, GTMAP.label).\
        all()

    print("| Record | Label | Tag | Time Type | Synchronization |")
    print("| ------ | ----- | --- | --------- | --------------- |")
    for Record, Label, Tag in GTMap:
        # Look up the synchronization and time type of this tag
        TagInfo = session.query(TAG.synchronization, TAG.time_type).filter(TAG.name == Tag).all()[0]
        # Print only the tags whose IOVs are not Run-based
        if TagInfo[1] != "Run":
            print("|", Record, "|", Label, "|", Tag, "|", TagInfo[1], "|", TagInfo[0], "|")
Thanks @mmusich! Correlating those to your earlier list gives
so ~1.9 MB in total. That alone sounds like something we could live with (i.e. at most a 2 MB memory increase per job during any IOV transition period). This number still misses all the ESProducts constructed within the job, but I'd imagine even a factor of 10 increase to be tolerable. |
Given that the largest possible increase from DB payloads would be around 2 MB, and that the transient ESProducts in non-Run IOV Records are unlikely to be (many) orders of magnitude larger, we could enable concurrent IOVs by default (when concurrent lumis are enabled), and deal with possible problems if they arise. |
+1 |
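The memory estimate discussed above can be written out as a small worked calculation. This is a simplified model, not something taken from the framework: it assumes that with N concurrent IOVs each affected Record may hold up to N payloads instead of one, so the extra memory is bounded by (N - 1) times the total payload size. The function name and the `transient_factor` parameter (an allowance for ESProducts built within the job) are hypothetical; the 1.9 MB total and the factor of 10 are the figures quoted in this thread.

```python
def worst_case_extra_memory_mb(total_payload_mb, concurrent_iovs=2, transient_factor=1.0):
    """Upper bound (MB) on the extra memory held during an IOV transition.

    Assumed model: each affected Record keeps up to `concurrent_iovs`
    payloads alive instead of one. `transient_factor` scales the DB payload
    total to allow for ESProducts constructed within the job.
    """
    if concurrent_iovs < 1:
        raise ValueError("concurrent_iovs must be at least 1")
    if transient_factor < 0:
        raise ValueError("transient_factor must be non-negative")
    extra_copies = concurrent_iovs - 1
    return extra_copies * total_payload_mb * transient_factor


# ~1.9 MB of DB payloads with non-Run IOVs, as quoted in the thread.
db_only = worst_case_extra_memory_mb(1.9)
# Same total with a factor of 10 allowance for transient ESProducts.
with_transients = worst_case_extra_memory_mb(1.9, transient_factor=10.0)

print(f"DB payloads only:        {db_only:.1f} MB")
print(f"With factor-10 allowance: {with_transients:.1f} MB")
```

Under these assumptions the increase stays in the range of tens of MB per job at most, which is consistent with the conclusion to enable concurrent IOVs by default.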
I was supposed to do this at the same time as cms-sw#35302 that followed cms-sw#34231 and the conclusion in cms-sw#33436
Enabling concurrent IOVs risks increasing memory usage, because the payloads for all active IOVs need to be kept in memory as long as events from those IOVs are being processed. One way to limit this memory increase is to disable concurrency for EventSetup Records that have large payloads (and hopefully have long IOVs). The purpose of this issue is to identify such Records.