Skip to content

Latest commit

 

History

History
84 lines (42 loc) · 5.1 KB

2020-06-16.md

File metadata and controls

84 lines (42 loc) · 5.1 KB

I3S meeting notes

16 June 2020: Istat-Insee mini-hackathon

Context

A virtual hackathon between Istat and Insee is held in order to progress on the integration between ARC and Relais.

Roadmap

The following actions were identified in preparation of the hackathon, by decreasing order of priority. Meeting notes are added in each section.

Fix problem with database connection script.

Manu has described the problem in a GitHub issue.

Notes

All agree to implement schema creation and optionally data loading via scripts launched at setup. Two scripts will be added at the root of the project: Manu will propose their creation in a pull request. On a related subject, Romain will create a GitHub issue proposing the use of Liquibase or similar technology.

Other problems were also identified while using IS2/Relais:

  • JavaScript/JQuery problem affecting multi-selection of matching variable. Istat will look into it (Insee offers help of JavaScript experts), but the target UI will use Vue.js.
  • No cascade update on deleting an operation. Istat encountered and fixed the problem on MySQL, but it seems to persist with PostgreSQL. Istat will study the question.
  • Non-convergence of FS linking when using STREET on the example files: Istat will consult the methodology team.
  • Various UI bugs or suggested enhancements: Insee will open issues on GitHub.
  • User manual: a light version of IS2 workbench manual is available on GitHub (Istat to verify), which should be improved.

Re-use of ARC by Istat.

Istat has decided to change the use case and reuse ARC for administrative data. Pina will provide more details.

Insee has provided a tutorial and an online ARC test instance.

Notes

Pina indicates that Istat wishes to select a re-use case more related to how ARC is used in Insee. Monica will contact subject-matter experts in order to further define the re-use case. In the meantime, Manu will make a demonstration of a real use Insee case (Siera) during a specific online session, and provide presentations to be studied beforehand by Istat.

Pina mentions that the tutorial was useful to understand how ARC works. She will continue to use the software with Mauro, with support from Manu in case of questions.

Re-use of Relais by Insee.

The re-use case is currently tentatively defined. The data files and the currrent process description are available. Nevertheless, the current process is based on a chain of deterministic linkages, which does not seem to be easily adaptable in Relais. More detail are available at https://hackmd.io/@EgVaFRsUQ-ywTiFcXIsWig/BkcKkvr6L.

Notes

A closer look shows that the current chain of linkages actually corresponds to different combinations of using all or only two of the three matching variables. Since Relais produces all the combinations in one pass, it should be possible to use it for the linking. Nevertheless, there are still questions about the definition of the matching variables. Insee will get more information from the subject-matter experts for validation by Istat methodology team next week. Insee will also make more tests with deterministic or probabilistic linking.

Abstraction of the database schema.

A data access layer should allow to implement the original schema and the new proposal agreed upon in Toulouse.

Notes

As a first step, an "export as table" option will be included in the IS2 interface. Manu will provide the relevant parameters. The functionality could be added to the web service layer later on.

Service deployment.

Deployment procedures for ARC and Relais should be strenghtened, with specifications from SSB.

Insee has experimented deployment procedures based on the following process steps:

  • The continuous integration process run by Travis CI under the responsability of the Developing Organization publishes the Docker image(s).
  • The actual deployment is then carried out by the Reusing Organization via Docker (Kubernetes, Helm...) scripts and possibly GitOps techniques (ArgoCD, Flux...)

Franck and Romain will follow-up with SSB on this subject.

Notes

Istat is interested in using this process for Relais. The first part of the process seems to be active already, with publication of the images on https://hub.docker.com/u/mecdcme.

Data and workflow sharing between ARC and Relais.

Manu has adapted the existing "Zoo" toy use case in order to have 3 matching variables, but this needs to be discussed since Istat plans to partially implement deterministic linkage soon. In any case, this will allow to test the communication between ARC and Relais.

Notes

Manu will modify the use case in order to use only open datasets. All agree on the idea of showing two-way communication (processing in ARC, integration of result in Relais for linking, retrieval of results in ARC for enrichement). This requires the "export as table" already mentioned.