Fifth CAP meeting
The CERN Analysis Preservation fifth meeting took place on Thursday, 12 May 2016, 4-6 PM CEST / 10-12 AM EDT.
https://indico.cern.ch/event/526921/
Prototype available at: https://analysis-preservation-qa.cern.ch/
With focus on the following features and interfaces:
- New backend (Invenio 3) underlying CAP
- Revised analysis schemas (JSON) for LHCb, CMS and ATLAS; testing in progress.
- Integration with more experiments’ databases for autocomplete functionality
- New functionalities around an analysis: sub-schemas, permissions (works with CERN e-groups), a files tab to quickly assess preservation readiness
- Search (results) and API
- Workflow integration with RECAST
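As a rough illustration of how the revised JSON (sub)schemas might be exercised, the sketch below checks an analysis record against a list of required top-level sections. The section names and the LHCb-style sub-schema are invented for illustration; they are not the actual CAP schemas.

```python
# Sketch only: check an analysis record against required top-level
# sections, in the spirit of the JSON (sub)schemas discussed above.
# All field names below are invented for illustration.

def missing_fields(record, required):
    """Return the required top-level sections absent from the record."""
    return [field for field in required if field not in record]

# A hypothetical LHCb-style sub-schema: required top-level sections.
LHCB_REQUIRED = ["basic_info", "stripping_line", "ntuple_production"]

record = {
    "basic_info": {"title": "Example analysis"},
    "stripping_line": "SomeStrippingLine",
}

print(missing_fields(record, LHCB_REQUIRED))  # ['ntuple_production']
```

In a production system this per-experiment list of required sections would live in the JSON Schema itself, so that the same validation runs in the submission interface and in the backend.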
CAP will be rolled out this year. Different pillars need to be worked on:
1st pillar: Connecting the content sources and standardizing the content
- Increase JSON schema testing with targeted working groups to refine the (sub)schemas and use cases. Meet the evolution of an analysis with a usable interface and JSON schema.
- Standardize JSON metadata schema (Ontology).
- Connect the databases from the experiments.
2nd pillar: Aggregate and grab the content
- Grab files/content from experiments’ databases, GitHub, GitLab.
- Intelligent search.
3rd pillar: Containerizing and rerunning analyses
- Containerize analyses by using Docker with various workflow engines and running environments e.g. GitLab CI (LHCb).
- Rerun analyses on OpenStack Magnum, allowing first steps towards reproducible research.
The long-term goal is to capture the analysis, the descriptive metadata, the physics information, and all the relevant data, containers, and software so that an analysis can be reproduced locally. This is why the 3rd pillar needs the 2nd pillar to build upon.
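A minimal sketch of what rerunning one containerized analysis step under the 3rd pillar could look like, assuming the preserved record names a Docker image and a command. The image name and command here are placeholders, not a real CAP workflow.

```python
import subprocess


def docker_command(image, command, workdir="/analysis"):
    """Assemble the `docker run` invocation for one preserved step.

    Placeholder sketch: a real rerun would pull the image recorded in
    CAP and mount the preserved inputs alongside it.
    """
    return ["docker", "run", "--rm", "-w", workdir, image] + list(command)


cmd = docker_command("gitlab-registry.cern.ch/example/analysis:v1",
                     ["python", "fit.py"])
print(" ".join(cmd))
# To actually execute the step (requires a Docker daemon):
# subprocess.run(cmd, check=True)
```

The same invocation could be scheduled by a workflow engine or a GitLab CI job instead of being run directly, which is where the LHCb GitLab CI work fits in.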
The discussion showed a preference to work on these pillars in parallel rather than in consecutive steps. In particular, a fast track for the execution of workflows should be facilitated; there was overall consensus on this, though it still needs to be understood how to enable that fast track. Facilitating the execution of workflows without hosting the content can be done faster, so the two pillars can be worked on in parallel. A proper balance between short-term and long-term goals is needed.
Val. (CMS) and others request a policy document with a roadmap. The CAP team agrees that this should be prepared, i.e. to lay out the release plans more clearly. This will be helpful to enable the internal discussions for integrations, application and approval procedures. A draft of such a document will be circulated to this group for comments by the end of June. Please note also that the CAP team will refine its development milestones on GitHub.
Discussion of containerization vs. capturing all the (meta)data needed to see how the analysis was/is done. How should CAP be set up:
- CAP as a container of a full analysis, or
- CAP as a store for the detailed components of an analysis, i.e. the JSON schemas and additional information
There is consensus that this is a discussion that happens across disciplines. CAP, however, facilitates both approaches.
Serving the “long tail of files”: Need for a generic file store for additional files, such as plots or other supplementary materials (see HEPData, which already does this). General consensus to support this.
CAP integration with HEPData: CAP should smoothly integrate with HEPData (an open data publishing platform), which lowers the submission burden on the researchers’/experiments’ side. General consensus to support this.
Access to API:
- Open for now, but access granularity should be revisited later, as more content will have been stored
- Different use cases for each experiment
Edit/Deletion of an analysis:
- Versioning in CAP allows “fall back options”
- CMS needs flexibility during the active analysis steps
- Agreement: after an approval “stamp” has been given to an analysis, it should not be possible to alter/delete it.
Representation
- What is a good representation of an analysis
- DASPOS has done work on that and can share their experience
ATLAS
- Internal policy documents are entering the final phase
- Interested to use CAP when analysis is in a more final phase, i.e. before publication approval.
- Technical problem to solve: need for a reference/link from a Glance record to an AMI record. Both are needed for CAP, but the information is not linked.
CMS
- Interested to use CAP from the beginning of the analysis and throughout the progress to capture all the needed information in all important steps. This means it is expected to be used before the approval phase.
- Started to test the analysis submission interface with a specific WG, i.e. the Heavy-Ion groups. The result: additions to the general schema. See for example: https://github.com/cernanalysispreservation/analysis-preservation.cern.ch/issues/132
LHCb
- Interested to use CAP as part of the publication approval procedures
- Interested to know how a Docker container (and post n-tuple analysis steps run via GitLab CI - S. N. is working on this) can be integrated into CAP.
- Asking for feedback from its working groups. Testing is under way.
- Next collaboration meeting in June, where the prototype will be presented
DASPOS: Collaboration with CAP planned, focus on computational workflows to rerun analysis and ontology descriptions.
RECAST: Update: the execution backend will move towards CERN’s new container technology (OpenStack Magnum).
EVERWARE: well suited to running workflows with several containers.
Next steps
- Forthcoming weeks: testing analysis forms. More detailed use cases and examples are needed from each experiment so that workflows can be developed accordingly. CMS and LHCb are already reaching out to their working groups. Once ATLAS is set up and integrated with their platforms, testing should begin there as well.
- Need for clarification regarding working on the 3rd pillar without requiring the 2nd pillar (see above): ask RECAST whether they want to use CAP only to store their JSON schemas and run the analyses on their side, or whether they will give CAP access to their data so that CAP can rerun the analyses.
- G. to circulate DASPOS examples of analysis representation
- CAP team to refine milestones on GitHub (publicly) and prepare policy document for circulation in this group
Link to GitHub repo: https://github.com/cernanalysispreservation
Previous meeting notes: Fourth CAP meeting