
Support records for theoretical calculations associated with experimental papers #58

Closed
GraemeWatt opened this issue Jul 4, 2016 · 6 comments · Fixed by #683

@GraemeWatt (Member)

Various theoretical frameworks provide multiple analyses/code/predictions that are each closely tied to an experimental paper (which might have its own HEPData record). Some examples are Rivet, MadAnalysis5, fastNLO and APPLgrid. A normal HEPData record can be used to store these results, and the normal Coordinator-Uploader-Reviewer workflow can be followed, where the Coordinator would be a senior person responsible for a particular theoretical framework.

But instead of prompting the Coordinator for a single Inspire ID (of the experimental paper), there should be an option "Theoretical analysis associated with experimental paper", where the Coordinator would enter two Inspire IDs: one for the experimental paper and one for the theoretical paper describing the framework. The HEPData record of the analysis would then be linked both to the Inspire (and HEPData, if it exists) record of the experimental paper and to the Inspire (and HEPData, if it exists) record of the theoretical paper.

A possible HEPData record of the theoretical paper might contain core code that is independent of particular experimental results. Each theoretical paper could be associated with multiple experimental analyses, and each experimental paper could be associated with multiple theoretical analyses.

@eamonnmag (Contributor)

Nice! I think that can work nicely.

@GraemeWatt (Member Author) commented Jul 4, 2016

Some comments from Klaus Rabbertz (fastNLO author) on this issue in July 2015:

> A point that is still worth considering is how to add and reference new theory tables to existing experimental data. For example, we might provide updated tables including additional features like access to electroweak corrections. Or theory colleagues might want to provide tables for new NNLO predictions that didn't even exist at the time the data were published.

At the moment (in the old HepData) I have added fastNLO tables to each of the corresponding HEPData records (for the experimental publications). But it would be better to do this in a more elegant way: give the theorists control over uploading their theoretical analyses and separate this information from the experimental data. The theoretical analyses might not be endorsed by the experimental collaborations, so they should not appear together with the experimental data, but should be linked from it.

@eamonnmag (Contributor)

Would also attach the code to these records.

@GraemeWatt (Member Author)

After discussion at this morning's IPPP workshop we decided to revive this issue following a slightly different approach to allow linking at the level of data tables rather than whole records. Similar to the proposal for linking error matrices to measurements in #140, the input submission.yaml file could contain a field like:

```yaml
related_to_table_dois: [10.17182/hepdata.34567.v1/t1, 10.17182/hepdata.89012.v1/t2]
```

A field like related_tables would be added to the DataSubmission object. The process_data_file function would persist the new field from the input YAML file to the database.
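As a rough sketch of that data flow (field and function names are taken from this comment, but the stand-in class is hypothetical; the real DataSubmission is an SQLAlchemy model):

```python
# Minimal sketch of the proposed change: process_data_file() copies the
# optional related_to_table_dois list from the parsed table YAML into a new
# related_tables field on the DataSubmission object. The class below is a
# hypothetical stand-in, not the real database model.

class DataSubmission:
    def __init__(self):
        self.related_tables = []  # proposed new field


def process_data_file(datasubmission, table_yaml):
    """Persist the new field from the input YAML (default: empty list)."""
    datasubmission.related_tables = table_yaml.get("related_to_table_dois", [])


datasub = DataSubmission()
process_data_file(datasub, {
    "related_to_table_dois": ["10.17182/hepdata.34567.v1/t1",
                              "10.17182/hepdata.89012.v1/t2"],
})
print(datasub.related_tables[0])  # → 10.17182/hepdata.34567.v1/t1
```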

When rendering a data table, the get_table_details function would retrieve related_tables from the database:

```python
table_contents["related_tables"] = datasub_record.related_tables
```

The hepdata_tables.js and table_details.html files would be modified to render the list of related tables, with the DOIs as links, e.g. "This table is related to: 10.17182/hepdata.34567.v1/t1, 10.17182/hepdata.89012.v1/t2".

A database query should also be made to find entries where the DOI of the current table matches an item in the related_tables field of other DataSubmission objects, restricted to those whose corresponding HEPSubmission object has overall_status='finished'. In that case, the matching DOIs could again be rendered as links, e.g. "This table is referred to by: 10.17182/hepdata.12345.v1/t3".
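A pure-Python sketch of that reverse lookup (hypothetical in-memory stand-ins for the database tables; in production this would be a database query joining DataSubmission and HEPSubmission):

```python
# Hypothetical sketch of the reverse query: find tables that list the
# current table's DOI in their related_tables, keeping only those whose
# parent submission has overall_status='finished'.

def tables_referring_to(current_doi, data_submissions, submission_status):
    """Return DOIs of tables that link to current_doi from finished records."""
    return [
        sub["doi"]
        for sub in data_submissions
        if current_doi in sub.get("related_tables", [])
        and submission_status.get(sub["publication_recid"]) == "finished"
    ]


subs = [
    {"doi": "10.17182/hepdata.12345.v1/t3",
     "publication_recid": 12345,
     "related_tables": ["10.17182/hepdata.34567.v1/t1"]},
    {"doi": "10.17182/hepdata.99999.v1/t1",
     "publication_recid": 99999,  # still in review: must be excluded
     "related_tables": ["10.17182/hepdata.34567.v1/t1"]},
]
status = {12345: "finished", 99999: "todo"}
print(tables_referring_to("10.17182/hepdata.34567.v1/t1", subs, status))
# → ['10.17182/hepdata.12345.v1/t3']
```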

It's probably not necessary to add information on related tables to the OpenSearch index at this stage. The submission_schema.json file of the hepdata-validator package would need to be modified to add the new related_to_table_dois field.

To allow for linking between whole HEPData records as well as individual tables (as was the original idea), a similar field named related_to_hepdata_recids could be added to the first document of the submission.yaml file, with a corresponding field (e.g. related_recids) on the HEPSubmission object of the database model. Again, bidirectional links could be created between the two related records.

@GraemeWatt (Member Author)

@ItIsJordan: following our discussion on Tuesday about the best database model for this feature, instead of adding a new field like related_tables to the existing DataSubmission object, I was thinking that it might be better to create a new object RelatedTable (__tablename__ = "relatedtable") with fields id, table_doi and related_to_table_doi. This would simplify the bidirectional linking, i.e. instead of searching the related_tables field of all DataSubmission objects, you would query the related_to_table_doi field of the (few) RelatedTable objects.

Similarly, instead of adding a new field related_recids to the existing HEPSubmission object, you would create a new object RelatedRecids (__tablename__ = "relatedrecids") with fields id, recid and related_to_recid.

The names of the new objects and their fields should be chosen to fit into the existing database model.
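A sketch of the proposed objects as plain dataclasses (the real implementation would be SQLAlchemy models with these columns; names as proposed above):

```python
# Hypothetical stand-ins for the proposed database objects. In the real
# code base these would be SQLAlchemy models; dataclasses are used here
# only to illustrate the columns and the simplified reverse lookup.
from dataclasses import dataclass


@dataclass
class RelatedTable:            # __tablename__ = "relatedtable"
    id: int
    table_doi: str
    related_to_table_doi: str


@dataclass
class RelatedRecids:           # __tablename__ = "relatedrecids"
    id: int
    recid: int
    related_to_recid: int


def referring_table_dois(doi, related_tables):
    """Bidirectional lookup: scan the (few) RelatedTable rows whose
    related_to_table_doi matches the DOI of the current table."""
    return [r.table_doi for r in related_tables
            if r.related_to_table_doi == doi]


rows = [RelatedTable(1, "10.17182/hepdata.12345.v1/t3",
                     "10.17182/hepdata.34567.v1/t1")]
print(referring_table_dois("10.17182/hepdata.34567.v1/t1", rows))
# → ['10.17182/hepdata.12345.v1/t3']
```

This keeps the forward links (what a table declares) and the reverse links (who points at this table) in one small table, instead of scanning a list column on every DataSubmission.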

@GraemeWatt (Member Author) commented May 19, 2023

Summarising the tasks needed to complete this feature:

  • Update the hepdata-validator package to support new related_to_table_dois and related_to_hepdata_recids fields (Modify JSON schema to support bidirectional linking hepdata-validator#50). Initially, modifications can be made on a new branch, but a new release of the hepdata-validator package should be made after everything is working.
  • Extend the database model to add new tables to support the bidirectional linking. Check if a migration will be required for the production database or if the new tables will be created automatically.
  • Extend the submission code to persist the new fields from the input submission.yaml file to the database. This probably only needs to be done for normal records (not in the Sandbox). DOIs are not assigned for Sandbox records and bidirectional linking will not be required, so there is no need to store information in the new database tables for Sandbox records.
  • Extend the Python/JavaScript/HTML code to extract information on related data tables or records from the database and render it on the web pages.
  • Check that deletion is working as expected, i.e. if a submission is made with the new fields, followed by another upload with different fields (or none), check that the original fields are deleted from the new database tables. Also check that deletion of a record removes the relevant information on related data tables or records.
  • Extend the tests to automatically test the new functionality, i.e. that the new fields are persisted to the database and that they appear rendered on the web pages.
  • Extend the submission documentation to mention the new fields (Explain how to use bidirectional linking hepdata-submission#13).
  • Probably also need to extend the hepdata_lib package to provide methods to write the new fields.
  • Check that the hepdata-converter package can convert from YAML to other formats (CSV, ROOT, YODA) if the new fields are present. Modifications might be needed to write the new fields in the various output formats.

@ItIsJordan, this is my understanding of what's needed after our discussion yesterday, but feel free to edit or add to these tasks.
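For reference, a submission.yaml fragment using both proposed fields might look like this (DOIs, record IDs and table metadata are illustrative):

```yaml
# First document of submission.yaml: record-level links
related_to_hepdata_recids: [12345]
---
# Per-table document: table-level links
name: "Table 1"
description: "NNLO prediction compared to the measured cross section."
data_file: data1.yaml
related_to_table_dois: [10.17182/hepdata.34567.v1/t1]
```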

@GraemeWatt GraemeWatt moved this from To do to In Progress in @HEPData Jun 28, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in @HEPData Aug 18, 2023