This repository contains HOTMapper, a tool that allows users to manage their historical data using a mapping protocol.
Below is a short usage tutorial. For a more complete tutorial, or to learn more about HOTMapper, please see our wiki page.
The Open Data sources extracted and processed by the tool can be found at the INEP link, in the sections "Censo Escolar" and "Censo da Educação Superior".
To make it easier to run the tool, we have downloaded all the "Local Oferta" data into the directory open_data. This way it is not necessary to search for the original sources.
NOTE: It is important to verify that the dataset has a column identifying its year.
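The check mentioned in the note can be done by hand, or with a few lines of Python. The sketch below is only an illustration: the column names `YEAR` and `ID` and the sample content are made up, and the `|` separator is assumed only because the demo commands in this document use it. For a real file, pass its first lines (or its whole content) as `csv_text`.

```python
import csv
import io


def has_year_column(csv_text, year_column, sep="|"):
    """Return True if the CSV header contains a column named `year_column`."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=sep)
    header = next(reader, [])  # an empty input yields an empty header
    return year_column in [name.strip() for name in header]


# Made-up sample data; real datasets use their own column names.
sample = "YEAR|ID\n2010|1\n2010|2\n"
print(has_year_column(sample, "YEAR"))
```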
- Python 3 (It's recommended to use a virtual environment, such as virtualenv)
- MonetDB (We plan to support other databases in the future)
NOTICE: We assume that Python 3.x is installed on the local computer and that every Python command below runs under Python 3.x.
- Install virtualenv
1a) On Linux/macOS
$ sudo -H pip3 install virtualenv
1b) On Windows (with administrator privileges)
$ pip install virtualenv
- Clone this repository
$ git clone [email protected]:tools/hotmapper.git
or
$ git clone https://github.com/C3SL/hotmapper.git
- Go to the repository
$ cd hotmapper
- Create a virtual environment
$ virtualenv env
- Start the virtual environment
5a) On Linux/macOS
$ source env/bin/activate
5b) On Windows (with administrator privileges)
$ .\env\Scripts\activate
- Install dependencies
$ pip install -r requirements.txt
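After these steps, a quick sanity check can confirm that the interpreter is Python 3 and that the virtual environment is active. This is a minimal sketch and not part of HOTMapper; the function names are made up for illustration. It relies on the fact that inside a virtual environment `sys.prefix` differs from `sys.base_prefix` (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def python_is_3():
    """Return True when the running interpreter is Python 3.x."""
    return sys.version_info.major == 3


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    if hasattr(sys, "real_prefix"):  # set by older virtualenv releases
        return True
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print("Python 3:", python_is_3())
print("Virtual environment active:", in_virtual_environment())
```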
The CLI (Command Line Interface) uses the standard actions provided by manage.py, so every command is invoked using the following pattern:
$ python manage.py [COMMAND] [POSITIONAL ARGUMENTS] [OPTIONAL ARGUMENTS]
Where COMMAND can be:
- create: Creates a table using the mapping protocol.
$ python manage.py create <table_name>
NOTICE: HOTMapper uses the name of the protocol as the name of the table.
- insert: Inserts a CSV file in an existing table.
$ python manage.py insert <full/path/for/the/file> <table_name> <year> [--sep separator] [--null null_value]
<full/path/for/the/file>: The absolute path of the file
<table_name>: The name of the table where the file will be inserted
<year>: The column of the mapping protocol that HOTMapper should use to insert the data
[--sep separator]: The custom separator of the CSV file. Replace 'separator' with the token your file uses
[--null null_value]: The value that will replace null values. Replace 'null_value' with the value you want to use
- drop: Deletes a table from the database
$ python manage.py drop <table_name>
NOTICE: The command does not handle foreign keys that point to the table being deleted.
- remap: Synchronizes a table with the mapping definition.
$ python manage.py remap <table_name>
This command should be run every time a mapping definition is updated.
The remap command can create new columns, drop existing columns, rename columns, and change column types. Be aware that the bigger the table, the higher the RAM usage.
- update_from_file: Updates the data in the table
$ python manage.py update_from_file <csv_file> <table_name> <year> [--columns="column_name1","column_name2"] [--sep=separator]
- generate_pairing_report: Generates reports to compare data from different years.
$ python manage.py generate_pairing_report [--output xlsx|csv]
The reports will be created in the folder "pairing"
- generate_backup: Creates or updates a backup file of the database.
$ python manage.py generate_backup
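When these commands are driven from a script, the pattern above can be assembled as an argument list. The helper below is a hypothetical sketch, not part of HOTMapper: `build_command` is a made-up name, and it assumes that optional arguments can be written in the `--name=value` form used by the `update_from_file` example above.

```python
import shlex
import sys


def build_command(command, *positional, **options):
    """Build the argument list for `python manage.py COMMAND ...`."""
    argv = [sys.executable, "manage.py", command]
    argv.extend(str(argument) for argument in positional)
    for name, value in options.items():
        # Assumption: options are accepted as --name=value
        argv.append("--{}={}".format(name, value))
    return argv


argv = build_command("insert", "/full/path/for/the/file.csv", "my_table", 2013, sep="|")
# Print the command with shell quoting so it can be copied into a terminal.
print(" ".join(shlex.quote(argument) for argument in argv))
```

To actually run a command built this way from the repository directory, the list can be passed to `subprocess.run(argv, check=True)`. Passing a list avoids shell quoting problems with separators such as `|`.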
In this section we explain how to execute the demo scenarios that were submitted to EDBT 2019. Demo scenario 1 uses the dataset "local oferta", which is included in the directory open_data. Demo scenario 2 uses the dataset "matricula", which can be downloaded from the INEP link, in the section "Censo Escolar".
In both scenarios, we assume that you have started the virtual environment as explained in Section Installation - 5.
This section contains the commands used in scenario 1, which is the creation of a new table and the insertion of the corresponding data.
- First we need to create the table in the database. To do so, we execute the following command:
$ ./manage.py create localoferta_ens_superior
- Now, as we already have the mapping definition, we need to insert the open data into the database. To do so, we execute the following commands:
NOTE: In the commands below, FILEPATH is the full path of the directory that contains the repository, so that the complete path of a file is, for example (in a Linux environment): /home/c3sl/hotmapper/open_data/DM_LOCAL_OFERTA_2010.CSV
a) To insert 2010:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2010.CSV localoferta_ens_superior 2010 --sep="|"
b) To insert 2011:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2011.CSV localoferta_ens_superior 2011 --sep="|"
c) To insert 2012:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2012.CSV localoferta_ens_superior 2012 --sep="|"
d) To insert 2013:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2013.CSV localoferta_ens_superior 2013 --sep="|"
e) To insert 2014:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2014.CSV localoferta_ens_superior 2014 --sep="|"
f) To insert 2015:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2015.CSV localoferta_ens_superior 2015 --sep="|"
g) To insert 2016:
$ ./manage.py insert FILEPATH/hotmapper/open_data/DM_LOCAL_OFERTA_2016.CSV localoferta_ens_superior 2016 --sep="|"
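The seven insert commands above differ only in the year, so they can also be generated in a loop. The following is a minimal sketch: the function name `scenario1_commands` and the example value `/home/c3sl` are made up, and `filepath` plays the role of FILEPATH in the commands above. The script only prints the commands; it does not run them.

```python
def scenario1_commands(filepath, first_year=2010, last_year=2016):
    """Return one `insert` argument list per year of the local oferta data."""
    commands = []
    for year in range(first_year, last_year + 1):
        csv_path = "{}/hotmapper/open_data/DM_LOCAL_OFERTA_{}.CSV".format(filepath, year)
        commands.append([
            "./manage.py", "insert", csv_path,
            "localoferta_ens_superior", str(year), "--sep=|",
        ])
    return commands


# "/home/c3sl" is only an example value for FILEPATH.
for command in scenario1_commands("/home/c3sl"):
    print(command)
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository directory, which stops at the first year that fails to load.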
This section contains the commands used in scenario 2, which is an update of an existing table.
- First we need to create the table in the database. To do so, we execute the following command:
$ ./manage.py create matricula
- Now, as we already have the mapping protocol, we need to insert the open data into the database. To do so, we execute the following commands:
NOTE: FILEPATH is the full path of the directory where the open data file is. For example (in a Linux environment), if the file is /home/c3sl/HOTMapper/open_data/MATRICULA_2013.CSV, then FILEPATH is /home/c3sl/HOTMapper/open_data
a) To insert 2013:
$ ./manage.py insert FILEPATH/MATRICULA_2013.CSV matricula 2013 --sep="|"
b) To insert 2014:
$ ./manage.py insert FILEPATH/MATRICULA_2014.CSV matricula 2014 --sep="|"
c) To insert 2015:
$ ./manage.py insert FILEPATH/MATRICULA_2015.CSV matricula 2015 --sep="|"
d) To insert 2016:
$ ./manage.py insert FILEPATH/MATRICULA_2016.CSV matricula 2016 --sep="|"
- Change the mapping protocol of matricula. You can use the file matricula_remap.csv (to do so, rename the current matricula.csv to something else and rename matricula_remap.csv to matricula.csv). In that case, the only column that will change is "profissionalizante", because its ELSE branch now returns 9 instead of 0.
- Run the remap command
$ ./manage.py remap matricula
The above command will update the table Fonte and the schema of the table matricula.
- Update the table
$ ./manage.py update_from_file FILEPATH/MATRICULA_2013.CSV matricula 2013 --columns="profissionalizante" --sep="|"
The above command will update the data in the table matricula.
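The file renaming described in the mapping protocol step of this scenario can be scripted. The sketch below is only an illustration and not part of HOTMapper: `swap_mapping`, the `_old` backup suffix and the `mapping_dir` parameter are made up, and the directory that holds the mapping protocols has to be supplied by the caller. The demonstration runs in a temporary directory with placeholder files, so it does not touch a real repository.

```python
import os
import tempfile


def swap_mapping(mapping_dir, table="matricula", backup_suffix="_old"):
    """Rename <table>.csv to a backup and <table>_remap.csv to <table>.csv."""
    current = os.path.join(mapping_dir, table + ".csv")
    remap = os.path.join(mapping_dir, table + "_remap.csv")
    backup = os.path.join(mapping_dir, table + backup_suffix + ".csv")
    if not os.path.isfile(current):
        raise FileNotFoundError(current)
    if not os.path.isfile(remap):
        raise FileNotFoundError(remap)
    if os.path.exists(backup):
        raise FileExistsError(backup)  # never overwrite an earlier backup
    os.rename(current, backup)
    os.rename(remap, current)
    return backup


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, content in (("matricula.csv", "old"), ("matricula_remap.csv", "new")):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(content)
    swap_mapping(demo_dir)
    print(sorted(os.listdir(demo_dir)))
```

After the swap, the remap and update_from_file commands shown above are run as usual.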
- Henrique Varella Ehrenfried, Eduardo Todt, Daniel Weingaertner, Luis Carlos Erpen de Bona, Fabiano Silva, and Marcos Didonet Del Fabro. Managing Open Data Evolution through Bi-dimensional Mappings. IEEE/ACM BDCAT ’19, pp. 159-162, December 2–5, 2019, Auckland, New Zealand. Extended version available.
- Henrique Varella Ehrenfried, Rudolf Eckelberg, Hamer Iboshi, Eduardo Todt, Daniel Weingaertner, and Marcos Didonet Del Fabro. HOTMapper: Historical Open Data Table Mapper. EDBT 2019, Demo paper, pp. 550-553, March 2019, Lisbon, Portugal. Open Proceedings available.