-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make validate_histos.py from the IRIS-HEP repo pass on RDF's output histograms #28
Comments
@andriiknu let's use this issue to track the state of the agreement between the histograms produced by the RDF implementation and the reference implementation. At the moment we are implementing v1 so I guess the references are the ones here: https://github.com/iris-hep/analysis-grand-challenge/tree/v1.1.0/analyses/cms-open-data-ttbar/reference (tag v1.1.0 of the reference implementation). Can you summarize what the current state is? |
I guess the important underlying question is: do you think the current agreement (and the current state of the implementation) is good enough that we can tag a v1? |
I just checked for the current version of main (6c7e045). It turns out that the only mismatches are caused by:
The IRIS-HEP implementation should tag a v1.2 version soon that solves these issues. Then our outputs should pass the validation script, as far as I can tell. I'll leave this open until then. One question remaining (for @andriiknu and @alexander-held): the RDF implementation does not produce histograms 4j1b_pseudodata and 4j2b_pseudodata: should it? |
I would say we do not explicitly require those histograms to be written out in the analysis task. They are needed for the statistical inference stage, and https://agc.readthedocs.io/en/latest/taskbackground.html#statistical-model describes how to obtain them, but they are built from other histograms (which by themselves are required). In that sense I would argue that it is left up to the implementation to decide whether to write out dedicated histograms (and then use those for the dataset to fit the statistical model to) or to build that dataset on-the-fly from the required histograms at a later stage. |
@eguiraud , @alexander-held
|
Thank you @andriiknu , the reference implementation will soon release a v1.2 tag which will use EDIT: |
@alexander-held then should |
Good point, it doesn't do harm to have them in the reference files but we should just skip that comparison if the histograms were not produced by a given implementation. |
Cool, I'll open a PR in a few minutes |
Alright, I think the little mismatches we currently have are all understood. I think we'll just wait for the reference implementation to tag a v1.2, adapt to that (e.g. making the btag cuts more uniform) and we'll tag a v1.0 of the RDF implementation! If v1.2 will also remove the 1e-6 offsetting of the histogram bin contents mentioned at iris-hep/analysis-grand-challenge#166 (comment) we should have perfect agreement between our output histograms. Otherwise that's the one remaining snag and we'll sync up in v2. |
This is fixed: current main (6557e89) passes validation against v1.2 of the reference implementation. |
No description provided.
The text was updated successfully, but these errors were encountered: