Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HLT/Patatrack] Duplicate Output Selection Two (or more) equivalent branches have been selected for output. #37207

Open
silviodonato opened this issue Mar 11, 2022 · 21 comments

Comments

@silviodonato
Copy link
Contributor

cmsrel CMSSW_12_3_0_pre6
cd CMSSW_12_3_0_pre6/src/
cmsenv
hltGetConfiguration /dev/CMSSW_12_3_0/GRun    --globaltag auto:run3_hlt    --data    --unprescale    --eras Run2_2018    --process MYHLT    --full    --offline    --max-events 8    --output full    --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrack    --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root      > hltData.py
CUDA_VISIBLE_DEVICES= cmsRun hltData.py

%MSG-i ThreadStreamSetup:  (NoModuleName) 11-Mar-2022 09:54:44 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i CUDAService:  (NoModuleName) 11-Mar-2022 09:54:47 CET pre-events
CUDAService disabled by configuration
%MSG
11-Mar-2022 09:54:56 CET  Initiating request to open file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
11-Mar-2022 09:54:57 CET  Successfully opened file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
2022-03-11 09:55:18.578515: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
11-Mar-2022 09:55:24 CET  Closed file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
----- Begin Fatal Exception 11-Mar-2022 09:55:24 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(SiPixelRecHitedmNewDetSetVector, hltSiPixelRecHits, , MYHLT)
#2: BranchKey(SiPixelRecHitedmNewDetSetVector, hltSiPixelRecHitSoA, , MYHLT)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------

The error is self-explaining and related to the keep *.
Not sure what is the best way to avoid this. Perhaps adding ad drop of all the EDAlias (or of the input used in the EDAlias)

@silviodonato
Copy link
Contributor Author

assign hlt
@fwyzard

@cmsbuild
Copy link
Contributor

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@silviodonato
Copy link
Contributor Author

(I cannot anymore assign the issue)
@makortel @Martin-Grunewald @missirol

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

assign core

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

assign hlt

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

assign heterogeneous

@cmsbuild
Copy link
Contributor

New categories assigned: heterogeneous,core,hlt

@missirol,@fwyzard,@Dr15Jones,@smuzaffar,@makortel,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@silviodonato silviodonato changed the title [HLT/Patrack] Duplicate Output Selection Two (or more) equivalent branches have been selected for output. [HLT/Patatrack] Duplicate Output Selection Two (or more) equivalent branches have been selected for output. Mar 11, 2022
@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

The issue is due to

        # SwitchProducer wrapping the legacy pixel rechit producer or the transfer of the pixel rechits to the host and the conversion from SoA
        process.hltSiPixelRecHits = SwitchProducerCUDA(
            # legacy producer
            cpu = cms.EDAlias(
                hltSiPixelRecHitSoA = cms.VPSet(
                    cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector")),
                    cms.PSet(type = cms.string("uintAsHostProduct"))
                )
            ),
            # conversion from SoA to legacy format
            cuda = ...
        )

right ?

@silviodonato
Copy link
Contributor Author

The issue is due to

        # SwitchProducer wrapping the legacy pixel rechit producer or the transfer of the pixel rechits to the host and the conversion from SoA
        process.hltSiPixelRecHits = SwitchProducerCUDA(
            # legacy producer
            cpu = cms.EDAlias(
                hltSiPixelRecHitSoA = cms.VPSet(
                    cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector")),
                    cms.PSet(type = cms.string("uintAsHostProduct"))
                )
            ),
            # conversion from SoA to legacy format
            cuda = ...
        )

right ?

yes, I think so.
Adding drop *_hltSiPixelRecHitSoA_*_*
the error disappeared

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

Looks like a problem of possibly more general interest, though.

If we put the SwitchProducer branches "inline", like

product = SwitchProducer(
    cpu = cms.EDProducer("CPUProducer", ...),
    acc = cms.EDProducer("AccProducer", ...)
)

then

  • we can do "keep *" to keep only the one that runs
  • cannot keep both branches, because the OutputModules do not accept "keep *_product@cpu_*_*", "keep *_product@acc_*_*".

If we put the SwitchProducer branches "out of line" and use EDAliases, like

productCpu = cms.EDProducer("CPUProducer", ...)

productAcc = cms.EDProducer("AccProducer", ...)

product = SwitchProducer(
    cpu = cms.EDAlias("productCpu"),
    acc = cms.EDAlias("productAcc")
)

then

  • we cannot do "keep *" (this issue), and we probably do not want to
  • we can use "keep *_product_*_*" to keep only the one that runs
  • we can use "keep *_productCpu_*_*", "keep *_productAcc_*_*" to run and keep both

@silviodonato
Copy link
Contributor Author

Thanks a lot @fwyzard for the clarification.
I wanted to save the hltPixelTracks, so I used

hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events 10 --input file:aaa.root --output minimal --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets --open

and I added

        'keep *_hltPixelTracks_*_*',

to process.hltOutputMinimal.

The tracks are not visible (both running with GPU and with CPU)

Events->Scan("[email protected]()")
************************
*    Row   * recoTrack *
************************
*        0 *         0 *
*        1 *         0 *
*        2 *         0 *
*        3 *         0 *
*        4 *         0 *
*        5 *         0 *
*        6 *         0 *
*        7 *         0 *
*        8 *         0 *
*        9 *         0 *
************************

If I drop the Patatrack customization function, everything looks ok

root [2] Events->Scan("[email protected]()")
************************
*    Row   * recoTrack *
************************
*        0 *      2277 *
*        1 *         0 *
*        2 *      2716 *
*        3 *      2398 *
*        4 *      3702 *
*        5 *      1700 *
*        6 *      3209 *
*        7 *      6061 *
*        8 *      2488 *
*        9 *      2783 *
************************

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

That's weird !?

hltPixelTracks is a standard EDProducer, not a SwitchProducer or an EDAlias, and the product should be in the legacy format.

@makortel @Dr15Jones is there anythng that would prevent storing the collection in the root file ?
For example, the fact that its ancestors are transient ?

@silviodonato
Copy link
Contributor Author

It seems that the problem is not related to SwitchProducer but with Task.
In /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks
you can find two configuration: hltNew_dump_ok.py and hltNew_dump_ok.py.

The first configuration is ok, while the second one shows no pixel tracks.

The difference is just:

+process.HLTRecopixelvertexingTask = cms.Task(process.HLTRecoPixelTracksTask, process.hltPixelVertices, process.hltPixelVerticesSoA, process.hltTrimmedPixelVertices)
-process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter+process.HLTRecoPixelTracksSequence+process.hltPixelVerticesSoA+process.hltPixelVertices+process.hltTrimmedPixelVertices)
+process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter, process.HLTRecopixelvertexingTask)

From the log file I see these changes:

  • hltPixelTracks: 10 executions -> 0 executions
  • hltPixelVerticesSoA: 10executions -> 0 executions
  • hltPixelVertices: 10 executions -> module disappeared (ie. not scheduled?)
  • hltTrimmedPixelVertices: 10 executions -> module disappeared (ie. not scheduled?)

It sounds like the well-known problem of a module which is explicitly both in a Sequence and in a Task, but I don't find any of them

@silviodonato
Copy link
Contributor Author

ahh, probably I've understood it.
The reason is that the module

process.hltL2TauTagNNProducer = cms.EDProducer("L2TauNNProducer",
   [...]
    pataTracks = cms.InputTag("hltPixelTracksSoA"),
    pataVertices = cms.InputTag("hltPixelVerticesSoA"),
)

takes directly the hltPixelTracksSoA and hltPixelVerticesSoA, that is the reason why the other modules are not scheduled!

@silviodonato
Copy link
Contributor Author

Yes, sorry for the noise, my mistake.
Let's keep focused on the issue summarized by Andrea ( #37207 (comment) )

@Dr15Jones
Copy link
Contributor

@silviodonato what do you get when you do

Events->Scan("recoTracks_hltPixelTracks__HLTX.present")

That is what is actually used by the framework to determine if something was stored. Else we can't tell if the data product was stored or it was just empty.

@Dr15Jones
Copy link
Contributor

Also, is the OutputModule on an EndPath or a FinalPath? If on a FinalPath that means data products consumed by the OutputModule will not be prefetched (i.e. will not cause unscheduled execution).

@makortel
Copy link
Contributor

Also, is the OutputModule on an EndPath or a FinalPath?

I checked the two configurations in /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks and the OutputModule was in FinalPath in both cases.

About

Let's keep focused on the issue summarized by Andrea ( #37207 (comment) )

could you elaborate the problem? Is it just about having to adjust the keep/drop statements?

@fwyzard
Copy link
Contributor

fwyzard commented Mar 11, 2022

One problem is that this approach does not allow us to keep both "branches" (for example, to send them from the HLT jobs to the DQM jobs and perform an online validation):

product = SwitchProducer(
    cpu = cms.EDProducer("CPUProducer", ...),
    acc = cms.EDProducer("AccProducer", ...)
)

and I'm not sure if the other approach has any downsides or not:

productCpu = cms.EDProducer("CPUProducer", ...)

productAcc = cms.EDProducer("AccProducer", ...)

product = SwitchProducer(
    cpu = cms.EDAlias("productCpu"),
    acc = cms.EDAlias("productAcc")
)

In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.

@makortel
Copy link
Contributor

The design goals included

  • the consumers of a SwitchProducer do not care which of the case-EDProducers get run
    • product@cpu etc being available for consumers in the same process is accidental
  • allow running CPU/CUDA/etc versions, but those need to be explicit

In the first case, even if the "inline products" of SwitchProducer were allowed to be persisted, they couldn't be stored at the same time with the SwitchProducer "output product" for the same "duplicate branch" reason as the with the second case.

(for example, to send them from the HLT jobs to the DQM jobs and perform an online validation)

For this use case, would the aim be to

  • let the SwitchProducer control the running of the case EDProducer, and store the cpu/acc versions just as they exist
  • run both cpu and acc versions on each Event
  • use other Path(s) to control how often the cpu and acc versions get run for the output
  • something else?

In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.

Currently with Tasks each consumer of productCpu/productAcc will cause the consumed EDProducer to be run, and each consumer of product will cause the chosen case of the two to be run. With ConditionalTask (#36938), only the consumers in the same Path as the ConditionalTask would cause the EDProducers to run.

@silviodonato
Copy link
Contributor Author

#37359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants