Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NanoAOD prototype [RFC] #20626

Merged
merged 46 commits into from
Oct 11, 2017
Merged

NanoAOD prototype [RFC] #20626

merged 46 commits into from
Oct 11, 2017

Conversation

gpetruc
Copy link
Contributor

@gpetruc gpetruc commented Sep 22, 2017

Rebased #20563, removed data files, relocated dataformats, added first cmsDriver integration (but not runTheMatrix nor DQM yet).
Github doesn't allow me to re-open that PR because the branch was force-pushed, so I had to make a new one.

Some files are still to be relocated from PhysicsTools/NanoAOD to PhysicsTools/PatAlgos (or PatUtils)
Comment #20563 (comment) from @lgray also not implemented yet, it will probably come later in the follow-up of other multithreaded performance issues in https://hypernews.cern.ch/HyperNews/CMS/get/edmFramework/3791.html

@arizzi @peruzzim @emanueledimarco

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-20626/907

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @gpetruc (Giovanni Petrucciani) for master.

It involves the following packages:

Configuration/Applications
DataFormats/NanoAOD
DataFormats/PatCandidates
PhysicsTools/NanoAOD
RecoEgamma/PhotonIdentification

The following packages do not have a category, yet:

DataFormats/NanoAOD
PhysicsTools/NanoAOD
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories.py to assign category

@perrotta, @monttj, @cmsbuild, @franzoni, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @Sam-Harper, @jainshilpi, @rovere, @lgray, @Martin-Grunewald, @gouskos, @cbernet this is something you requested to watch as well.
@davidlange6, @slava77 you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Sep 22, 2017 via email

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 6, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/23490/console Started: 2017/10/06 11:19

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 6, 2017

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 6, 2017

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 6, 2017

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-20626/23490/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 26
  • DQMHistoTests: Total histograms compared: 2761427
  • DQMHistoTests: Total failures: 246
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2760994
  • DQMHistoTests: Total skipped: 187
  • DQMHistoTests: Total Missing objects: 0
  • Checked 107 log files, 10 edm output root files, 26 DQM output files

@@ -150,7 +150,8 @@ float pat::PackedCandidate::dxy(const Point &p) const {
float pat::PackedCandidate::dz(const Point &p) const {
maybeUnpackBoth();
const float phi = float(p4_.load()->Phi())+dphi_;
return (vertex_.load()->Z()-p.Z()) - ((vertex_.load()->X()-p.X()) * std::cos(phi) + (vertex_.load()->Y()-p.Y()) * std::sin(phi)) * p4_.load()->Pz()/p4_.load()->Pt();
const float pzpt = deta_ ? std::sinh(etaAtVtx()) : p4_.load()->Pz()/p4_.load()->Pt();
return (vertex_.load()->Z()-p.Z()) - ((vertex_.load()->X()-p.X()) * std::cos(phi) + (vertex_.load()->Y()-p.Y()) * std::sin(phi)) * pzpt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in this file are merged already as a part of #20491
it looks like in this PR 60f6f31 comes before 852157f while in #20491 the order is reversed and there is also 5db31d6 in between, which is not also in this PR.
Some cleanup of the history would be nice.

@@ -42,13 +42,13 @@ class PhotonIDValueMapProducer : public edm::stream::EDProducer<> {
public:

explicit PhotonIDValueMapProducer(const edm::ParameterSet&);
~PhotonIDValueMapProducer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in this file are in the release already (the same commit IDs and in the same order).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so?

@gpetruc
Copy link
Contributor Author

gpetruc commented Oct 9, 2017

Hi Slava,

#20626 (and #20563 before it) was done on top of #20491 so it's no surprise that you find the same commits in the two; if you see them in the same order or you see other commits in between it's just the github display that doesn't do the sorting properly when mapping a graph into a 1d timeline (since it's the same commits and not copies, by definition they are in the same order in the history graph, as it's the same graph)

New developments have been already done by many people on top of this PR during, so I'll not rebase it as that would mess up the history of all the rest remaining developments.

@slava77
Copy link
Contributor

slava77 commented Oct 9, 2017 via email

@arizzi
Copy link
Contributor

arizzi commented Oct 9, 2017

dear all,
given that

  • this is PR is large and it may take a lot of time to fully review the code
  • it doesn't by design (and by tests results) affect any existing workflow
  • all technical tests are ok
  • the code is far from final, being a prototype, but is nevertheless urgently needed in a release
  • we have a growing pipeline of developments (the development is very active being a prototype)

I propose that we proceed as follow:

  • merge now
  • open github issues on anything that you think should be improved with asignment to me&giovanni and we guarantee that we will address them all in the very next PRs

isInit_(false)
{
auto vwp = iConfig.getParameter<std::vector<std::string>>("WorkingPoints");
for (auto wp : vwp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto const &

lheVpt = std::hypot( pup[v.first][0] + pup[v.second][0], pup[v.first][1] + pup[v.second][1] );
}

out.addColumnValue<uint8_t>("Njets", lheNj, "Number of jets (partons) at LHE step", nanoaod::FlatTable::UInt8Column);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look normal that every event we have to specify types and documentation description string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an overhead but it gives us the flexibility we need (eg allowing the schema to depend on the configuration) while keeping the output module generic and the code simple (avoid percolating stuff through provenance or run/lumi branches).
we may re-engineer things in the future if the overhead becomes large (now i/o and other physics object code dominate the costs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just the overhead: this implementation apparently allows changes in format event by event, which sounds dangerous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the nanoaod format is by design less structured.
We made some design choice balancing risks and benefits. We would be happy to discuss this design in any of the XPOG meeting where the NanoAOD format is discussed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this response means.
Does the Nano format support event-by-event changes in branch/table content?
Please clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in some cases (not in the one above) it does. E.g. HLT bits are allowed to change from one event to another and proper (back)filling of the branches is implemented.
As for the rest, while it is not meant to change event by event, there is no "configuration" of the event content specified explicitelly anywhere (not at compile time in the DataFormats, not explicitely at config time).
The configuration of the branches to write is configured basically at first event.

I do not think this discussion belong to a github PR. If you want to understand why we designed it so, we can have a chat on this topic.

reader_.AddVariable(v,(&values_.front())+i);
i++;
}
// reader_.BookMVA(name_,iConfig.getParameter<edm::FileInPath>("weightFile").fullPath() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this commented out code needed?
Please remove or add comments inline in the code why the commented out block is relevant

@@ -0,0 +1,229 @@
// -*- C++ -*-
//
// Package: PhysicsTools/NanoAOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't match the file location.
full path given in the line with \class doesn't match either.

@slava77
Copy link
Contributor

slava77 commented Oct 9, 2017

+1

for #20626 144f8b9

  • mostly nano-AOD package/algorithm developments; several helper tools relevant for PAT object manipulation were added to PhysicsTools/PatAlgos
  • jenkins tests pass (the unit test with a nanoAOD workflow is running OK)
  • local test on 3K events with ttbar PU shows somewhat reasonable technical performance
    • output size is about 3.6 kiB, roughly as expected
    • memory peak with one thread is about 1.1 GiB and about 1.3 GiB with 8 threads
    • CPU efficiency with 8 threads is ~60%
    • CPU use per event is ~0.1 s/event (compared to about 1.6 s/event on the same machine to make miniAOD or about 17 s/event to make RECO/AOD)
      • about 50% of the CPU is taken by photonIDValueMapProducer. Some optimization here could help to speedup the full workflow.

commented out code noted in PhysicsTools/PatAlgos can be cleaned up in a follow up PR

@arizzi
Copy link
Contributor

arizzi commented Oct 11, 2017

@davidlange6 any chance merging this before the next IB ?

@davidlange6
Copy link
Contributor

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants