Sünje edited this page Feb 7, 2017 · 12 revisions

Notes from the CERN Analysis Preservation/DASPOS/RECAST workshop

February 2nd, 2016 at CERN

Agenda: https://github.com/cernanalysispreservation/analysispreservation.cern.ch/wiki/Joint-CERN-Analysis-Preservation-DASPOS-RECAST-workshop

Additional Materials (slides): https://indico.cern.ch/event/611389/

Attendees: Representatives from LHC experiments, CAP team, members from DASPOS and RECAST

Notes

Introduction

  • Reminder of the original use case: preserving the analysis for later access and reuse. As researchers submit content, CAP becomes an aggregator of analysis content, so the use cases extend to, for example, easier internal discoverability of analysis elements and better search functionality. See the user stories for more details.
  • This mini workshop aimed to bring everyone involved up to date on recent work, show the latest prototype, gather feedback and decide on the next steps.
  • The DASPOS project (presented by Mike Hildreth), a team of computer scientists, digital librarians and members of the scientific community, has recently been working very closely with CAP and RECAST. The focus has been on metadata aspects (e.g. ontologies) and the reuse part (with Umbrella and RECAST). Another phase will focus on integrating CAP into the research workflow, i.e. building connectors to it.
  • RECAST collaborates closely with CAP to build the “reuse” part of the CAP environment (see details below)
  • Work on CERN Analysis Preservation is organized in three pillars:
    • Describe: To understand the steps and results of an analysis, it is crucial to identify its main elements. These vary by collaboration/working group, so the challenge is to balance standardisation against completeness.
    • Capture: To access and use the content later, it first has to be captured. Additional challenges arise from large files, reused information and versioning.
    • Reuse: Users accessing content on CAP should be able to instantiate it.

Describe Pillar

  • Overview chart of the describe pillar
  • The description of the analysis is done in JSON format. A range of schemas exist for CAP now. The CAP team aims to standardize these as much as possible (with the limitation that only very few preservation standards exist) while allowing flexibility to adjust to community practices. Schemas are versioned. https://github.com/cernanalysispreservation/analysispreservation.cern.ch/tree/master/cap/jsonschemas
  • The forms for each collaboration (accessible through CAP) are a representation of these JSON schemas. Depending on the preferences and work environment of the collaboration, the functionality of the form can be adjusted, e.g. to capture the physics details and dependencies in sufficient depth.
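
To make the idea of a versioned JSON schema describing an analysis more concrete, here is a minimal sketch in Python. The schema, its field names (`basic_info`, `final_state_particles`) and the `check_record` helper are hypothetical and heavily simplified; the actual CAP schemas are in the repository linked above and would normally be checked with a full JSON Schema validator.

```python
import json

# Hypothetical, heavily simplified schema. It only illustrates the idea of a
# versioned description; it is not one of the real CAP schemas.
EXAMPLE_SCHEMA = {
    "title": "example-analysis",
    "version": "0.0.1",
    "required": ["basic_info", "$schema"],
    "properties": {
        "$schema": {"type": "string"},
        "basic_info": {"type": "object"},
        "final_state_particles": {"type": "array"},
    },
}

# Map the JSON Schema type names used above to Python types.
JSON_TYPES = {"string": str, "object": dict, "array": list}


def check_record(record, schema):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    # Fields that are present must have the declared type.
    for name, rule in schema.get("properties", {}).items():
        if name in record and not isinstance(record[name], JSON_TYPES[rule["type"]]):
            problems.append(f"field {name} should be of type {rule['type']}")
    return problems


# An example record that names the (hypothetical) schema version it follows.
record = json.loads("""
{
  "$schema": "example-analysis-v0.0.1.json",
  "basic_info": {"analysis_name": "Example search"},
  "final_state_particles": ["e", "mu"]
}
""")

print(check_record(record, EXAMPLE_SCHEMA))
```

Because each record names the schema version it was written against, older records can stay valid against their own version while the schema evolves.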

Capture Pillar

  • Overview chart of the capture pillar
  • The fundamental architecture for preserving files has been set up using CERN EOS, the Invenio Digital Library Software (with e.g. JSON schema management and Elasticsearch) and AngularJS for the frontend. See slides: https://indico.cern.ch/event/611389/attachments/1406852/2150011/project_architecture.pdf
  • The new web page has been set up and is now in a prototype state. The prototype will evolve further over the coming weeks and months so that it can be tested further and files can be stored. Access restrictions follow those established through CERN SSO/Egroups.

Reuse Pillar

From the discussion - “to be done”

  • Help the CAP team build the connectors, which are considered crucial for the adoption of CAP as a tool
    • A clearly identified responsible person on each side (collaboration and CAP team)
    • CAP team can support building the connector, but we need help to know which information goes where, which fields are relevant/change/update regularly
    • If possible, an API would be great to interface with internal tools
  • A draft version of a CAP “record”/“entry” should also be versioned and should be shareable within a small group. Request: enhance the draft mode so that a draft can be shared with selected people or the working group. By default, a draft record can currently be viewed and edited only by its creator.
  • Investigate different types of records: ordinary analysis records and “reference records” (e.g. for Rivet) that can be referenced/used by other records
    • Need for many-to-many relationships.
    • Towards an analysis registry (suggestion by Kyle)
  • Need to underline that the definition of an analysis within CAP can vary slightly from collaboration to collaboration, e.g. in when it should be put into CAP: from the start of the analysis or towards the end. This influences cross-linking functionality, for example. If researchers submit early, the analysis may never end up being published. If the content is submitted only at publication approval, there are fewer versions and there is always a publication associated with it. CAP can support both, as it features versioning.
  • Need for export to HEPData or others (where to?). Lukas Heinrich will start building a trial connector to HEPData.
  • One use case of CAP for REANA/RECAST: run analyses from CAP every month or so to see whether any break due to external dependencies.
  • Question of payment for storage space. A contribution by Tim Smith indicated that this needs more discussion. The current suggestion is to include it in the standard plans and allocations for the LHC experiments.
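
Two of the requests above, sharing drafts with selected people and many-to-many links between analysis records and “reference records”, can be sketched as a small data model. This is only an illustration under stated assumptions: the `Record` and `Registry` classes, the record ids and the user names are hypothetical and do not reflect how CAP stores records or enforces permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    creator: str
    kind: str = "analysis"  # "analysis" or "reference" (hypothetical labels)
    is_draft: bool = True
    shared_with: set = field(default_factory=set)

    def can_view(self, user):
        # Assumption: non-draft records are visible to any authorised user.
        if not self.is_draft:
            return True
        # Drafts: the creator plus anyone the draft was explicitly shared with.
        return user == self.creator or user in self.shared_with


class Registry:
    """Holds records and the many-to-many links between them."""

    def __init__(self):
        self.records = {}
        self.links = set()  # (source_id, target_id) pairs

    def add(self, record):
        self.records[record.record_id] = record

    def link(self, source_id, target_id):
        if source_id not in self.records or target_id not in self.records:
            raise KeyError("both records must be registered before linking")
        self.links.add((source_id, target_id))

    def references_of(self, source_id):
        # Everything this record points to.
        return sorted(t for s, t in self.links if s == source_id)

    def referenced_by(self, target_id):
        # Everything that points to this record.
        return sorted(s for s, t in self.links if t == target_id)


registry = Registry()
registry.add(Record("ana-1", creator="alice"))
registry.add(Record("ana-2", creator="bob"))
registry.add(Record("ref-rivet", creator="carol", kind="reference", is_draft=False))
registry.add(Record("ref-other", creator="carol", kind="reference", is_draft=False))

# One analysis uses several reference records; one reference record is used
# by several analyses.
registry.link("ana-1", "ref-rivet")
registry.link("ana-1", "ref-other")
registry.link("ana-2", "ref-rivet")

# Share alice's draft with one selected colleague.
registry.records["ana-1"].shared_with.add("bob")

print(registry.references_of("ana-1"))
print(registry.referenced_by("ref-rivet"))
```

Keeping the links in a separate set of pairs, rather than inside each record, lets the same structure answer both “what does this analysis use?” and “which analyses use this reference record?”, which is the kind of lookup an analysis registry would need.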

Next steps

  1. Test the prototype with a first set of analyses (for robustness) at CERN
  2. Open URLs outside CERN as soon as possible and share which ones to use
  3. Establish connectors with the experiments’/collaborations’ databases (partially in parallel with other activities)
  4. Test outside CERN, i.e. submission and retrieval. Diversify testing according to the use cases presented. Deadline: mid-March (DPHEP workshop)
  5. First internal political note (internal to the collaborations and the CERN hierarchy)
  6. Widen the testing scope, based on the internal communications in step 5 and hopefully with internal support from the collaborations
  7. Beta-release
  8. Political approval (internal in the collaborations and CERN)

Next meeting at DPHEP workshop March 13th to 15th. Time depending on the final workshop agenda.