The following report summarizes our work on the FieldLab project "Support implementation Archive law with e-mail box analysis". The work was supervised by the Data Science in Practice team of Leiden University and carried out in collaboration with the Ministry of Economic Affairs and Climate Policy and the Ministry of Agriculture, Nature and Food Quality of The Netherlands. We follow the Cross-Industry Standard Process for Data Mining (CRISP-DM) and cover each of its steps in turn. The result has two parts. The first is a text pre-processing pipeline coupled with a Latent Dirichlet Allocation (LDA) topic model, which can be trained on a corpus of documents and accurately assigns topics to each document. The second is a Locality Sensitive Hashing (LSH) method for identifying the most important or most similar e-mails.
Authors: Atish Kulkarni (Leiden University), Nick van der Linden (Leiden University), Maxime Casara (Leiden University), Thijs van Meurs (Leiden University).