Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the sqlite theorydb into a folder of yaml files #1997

Merged
merged 2 commits into from
Mar 14, 2024

Conversation

scarlehoff
Copy link
Member

@scarlehoff scarlehoff commented Mar 11, 2024

This is my proposal for what we discussed in Amsterdam.

At the moment (for validphys) this doesn't change anything. All fields are mandatory as they were before. I think we should first change it to the layout we want (yaml file or other) and afterwards decide what fields are important and what are no.

All interactions with the theory database for validphys happened through the validphys2/src/validphys/theorydbutils.py module. All other changes to validphys are such that the database points to theory_cards instead of theory.db.

I've also modified the docs (I've searched for theory.db and changed stuff whenever they were mentioned).

I have to admit I am not fully convinced RE having a file per theory. Having them in a single file would allow us to do this:

theory_41: &theory_41
  Comments: theory 41
  <other parameters>

theory_42:
  <<: *theory_41
  Comments: Theory inheriting from 41

40 000 000.yml --- baseline
40 000 001.yml

40 001 000.yml --- baseline
40 001 001.yml

40 001 000: 40 000 000.yml

which might be very nice.

Another option is that we leave the ones we currently have as "one per file" (but with the syntax id: <data>) and going forward organize the theories as 41_012.yaml and inside all the theories 41_012_000: <data>, 41_012_000: <data> etc.

(edit: the regression test failed, maybe by casting some of the quantities as int or float changed slightly the result, but I'll deal with it at the end)

@felixhekhorn
Copy link
Contributor

I have to admit I am not fully convinced RE having a file per theory. Having them in a single file would allow us to do this:

  • Implementing (simple) inheritance should be easy enough for us to use
  • having a huge theory file (~600*30 lines) is not much better then a sql database; it wouldn't be much readable either and we would couple (more or less) again several changes to the db together

so I would still prefer one unique theory per file

@scarlehoff
Copy link
Member Author

Implementing (simple) inheritance should be easy enough for us to use

Actually, inheritance might also work across files with the default yaml loading.

I guess you are also against the mixture?

Another option is that we leave the ones we currently have as "one per file" (but with the syntax id: ) and going forward organize the theories as 41_012.yaml and inside all the theories 41_012_000: , 41_012_000: etc.

Copy link
Contributor

@giacomomagni giacomomagni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

  • Can you fix also:

nnpdf/pyproject.toml

Lines 30 to 34 in 41471d8

# The default profile is included together with the validphys package
"validphys2/src/validphys/nnprofile_default.yaml",
# Same for commondata and theory.db
"validphys2/src/validphys/datafiles/commondata/*",
"validphys2/src/validphys/datafiles/theory.db",

*And:

https://github.com/NNPDF/nnpdf/blob/theory_to_yaml_files/doc/sphinx/source/tutorials/reproduce.rst

@RoyStegeman
Copy link
Member

https://github.com/NNPDF/nnpdf/blob/theory_to_yaml_files/doc/sphinx/source/tutorials/reproduce.rst

Why would this need to be fixed? Note that both the dockerfile and yaml environment contain a specific (now outdated) version of the nnpdf code

@giacomomagni
Copy link
Contributor

Why would this need to be fixed? Note that both the dockerfile and yaml environment contain a specific (now outdated) version of the nnpdf code

Ah okay, then no need to fix it.

@RoyStegeman
Copy link
Member

If we wish to search the files using grep as you suggested it would also help to have a single file per theory such. Also in that case we'll want to be very strict about formatting e.g. ktThr: 1.0 should always be that and not ktThr: 1.0. Perhaps a formatting check in the tests can help with that?

XIF: float
NfFF: int
MaxNfAs: int
MaxNfPdf: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would start removing parameters that we already know are useless. For example this MaxNfPdf but also MaxNfAs and EScaleVar (and most likely others...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do that in subsequent PRs. The idea is that this leaves validphys essentially unchanged.

@scarlehoff
Copy link
Member Author

should always be that and not ktThr: 1.0. Perhaps a formatting check in the tests can help with that?

This is important, yes. Maybe it is time to add a pre-commit hook to this repo...

@RoyStegeman
Copy link
Member

Maybe it is time to add a pre-commit hook to this repo...

Something like that, but at least initially I'd be happy to have it check only this theory database and not the entire repository

@felixhekhorn
Copy link
Contributor

I guess you are also against the mixture?

Another option is that we leave the ones we currently have as "one per file" (but with the syntax id: ) and going forward organize the theories as 41_012.yaml and inside all the theories 41_012_000: , 41_012_000: etc.

I was a bit hesitant, but then this

If we wish to search the files using grep as you suggested it would also help to have a single file per theory such. Also in that case we'll want to be very strict about formatting e.g. ktThr: 1.0 should always be that and not ktThr: 1.0. Perhaps a formatting check in the tests can help with that?

is a good argument against and have really one file per theory

doc/sphinx/source/data/data-config.rst Outdated Show resolved Hide resolved
doc/sphinx/source/data/data-config.rst Show resolved Hide resolved
doc/sphinx/source/data/data-config.rst Outdated Show resolved Hide resolved
doc/sphinx/source/theory/theoryparamsinfo.rst Outdated Show resolved Hide resolved
doc/sphinx/source/theory/theoryparamsinfo.rst Show resolved Hide resolved
@scarlehoff scarlehoff force-pushed the theory_to_yaml_files branch from 41471d8 to 01b5e18 Compare March 13, 2024 13:53
Update validphys2/src/validphys/datafiles/disp_theory.py

Co-authored-by: Felix Hekhorn <[email protected]>

apply comments and rebase

cast explicitly as float
@scarlehoff scarlehoff force-pushed the theory_to_yaml_files branch from 01b5e18 to 44d5fee Compare March 13, 2024 13:53
Copy link
Contributor

@andreab1997 andreab1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As promised, I had a second look to the code part and it seems fine to me

@scarlehoff scarlehoff merged commit d577591 into master Mar 14, 2024
6 checks passed
@scarlehoff scarlehoff deleted the theory_to_yaml_files branch March 14, 2024 11:46
@scarlehoff scarlehoff mentioned this pull request May 20, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants