Open Redatam is software for extracting raw information from REDATAM databases.
For the standalone C++ command line application and desktop app, see the main directory of this repository.
Install the Python package using a virtual environment:
git clone https://github.com/pachadotdev/open-redatam.git
cd open-redatam/pypkg
python -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install pandas numpy pybind11
pip install --use-pep517 .
As a developer, be sure to delete the previous build after making changes and before re-installing:
rm -rf build dist redatam.egg-info
pip install --use-pep517 .
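The cleanup step above relies on `rm`, which is not available in every shell. As a minimal sketch, the same cleanup can be done from Python. The helper name `clean_build` is hypothetical and not part of the package; the directory names are the ones listed in the command above.

```python
import shutil
from pathlib import Path

# Build artifacts named in the cleanup command above
BUILD_DIRS = ("build", "dist", "redatam.egg-info")


def clean_build(root="."):
    """Remove previous build directories under root and return the names removed.

    Hypothetical helper for illustration; it only touches the three
    directories listed in BUILD_DIRS and ignores the ones that do not exist.
    """
    removed = []
    for name in BUILD_DIRS:
        target = Path(root) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Calling `clean_build()` from the `pypkg` directory and then running `pip install --use-pep517 .` should be equivalent to the two commands above.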
As an optional step, you can run the tests:
python tests/basic-test.py
If you only need the processed data, you can download the microdata repository. It is available in RDS format for easy loading into R.
Available datasets:
- Argentina: 1991, 2001, 2010
- Bolivia: 2001, 2012
- Chile: 2017
- Ecuador: 2010
- El Salvador: 2007
- Guatemala: 2018
- Mexico: 2000
Requirements: Python 3.8 or higher.
For a given census, such as the Chilean Census 2017, run the following Python code, pointing it at the census dictionary file:
import redatam
redatam.read_redatam("input-dir/dictionary.dicx")
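The call above needs the path to the dictionary file, whose name varies between censuses. The following is a minimal sketch of how that file could be located inside a downloaded census directory. The helper `find_dictionary` is hypothetical, and the sketch assumes that dictionaries use the `.dicx` or `.dic` extension.

```python
from pathlib import Path


def find_dictionary(input_dir):
    """Return the path of the first REDATAM dictionary file in input_dir.

    Hypothetical helper for illustration. Assumes dictionaries end in
    .dicx or .dic (any letter case) and prefers .dicx when both exist.
    """
    folder = Path(input_dir)
    for suffix in (".dicx", ".dic"):
        matches = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == suffix
        )
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .dicx or .dic file found in {folder}")


# Usage, assuming the package is installed and "input-dir" exists:
# import redatam
# tables = redatam.read_redatam(str(find_dictionary("input-dir")))
```

Raising `FileNotFoundError` early gives a clearer message than passing a wrong path to the reader.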
Please read the vignette for a more detailed explanation, including how this package can be used in conjunction with dplyr and other packages.
The Python package uses a modified copy of the C++ code that reads REDATAM databases and parses the data into a dictionary of data frames instead of writing CSV files.
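Since the result is a dictionary of data frames, a quick overview of what was read can help before any analysis. The sketch below assumes the returned object maps table names to pandas data frames; the key names depend on the census and are not documented here, and `summarize_tables` is a hypothetical helper.

```python
def summarize_tables(tables):
    """Map each table name to its (rows, columns) shape.

    Hypothetical helper for illustration. It only relies on each value
    exposing a .shape attribute, as pandas data frames do.
    """
    summary = {}
    for name, frame in tables.items():
        n_rows, n_cols = frame.shape
        summary[name] = (n_rows, n_cols)
    return summary


# Usage, assuming the package is installed:
# import redatam
# tables = redatam.read_redatam("input-dir/dictionary.dicx")
# for name, shape in summarize_tables(tables).items():
#     print(name, shape)
```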
Open Redatam was created and is supported by Lital Barkai ([email protected]).
The tests, installation instructions and Python package were created by Mauricio "Pacha" Vargas Sepulveda ([email protected]).
The original converter was created by Pablo De Grande. See here for more information.
This project uses pugixml created by Arseny Kapoulkine to structure a part of the output data.