OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. However, in order to execute that script on a new dataset, you need to manually import it through the graphical interface or set up a BatchRefine server, neither of which is quick.
PyRefine lets you execute OpenRefine JSON scripts against datasets without firing up a full Java/OpenRefine server. It has a command-line tool for quick use, or you can use it as a library and integrate it into your pandas-based data analysis pipeline.
More details in this blog post.
Please note: PyRefine is still very much alpha-quality. It probably doesn't work exactly how you're expecting right now. That said, please try it out, and consider :doc:`contributing`!
- Free software: MIT license
- Documentation: https://pyrefine.readthedocs.io
- Execute OpenRefine JSON against a dataset from the command line
- Execute OpenRefine JSON from a Python script
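
To give a feel for what "executing OpenRefine JSON" involves, here is a minimal standard-library sketch of the general idea. It is not PyRefine's API or implementation: the function names (``apply_script``, ``rename_column``, ``remove_column``) are made up for illustration, rows are plain dicts rather than a pandas DataFrame, and only two operation types are handled. The ``op`` names and their fields follow the shape of OpenRefine's exported operation history, trimmed to the essentials.

```python
import json

# A two-step history in the shape OpenRefine exports (trimmed to the
# fields this sketch needs; real exports carry extra keys such as
# "description").
SCRIPT = """
[
  {"op": "core/column-rename", "oldColumnName": "nm", "newColumnName": "name"},
  {"op": "core/column-removal", "columnName": "notes"}
]
"""


def rename_column(rows, op):
    """Return new rows with one column renamed, preserving column order."""
    old, new = op["oldColumnName"], op["newColumnName"]
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


def remove_column(rows, op):
    """Return new rows without the named column."""
    name = op["columnName"]
    return [{key: value for key, value in row.items() if key != name}
            for row in rows]


# Hypothetical dispatch table: operation name -> handler function.
HANDLERS = {
    "core/column-rename": rename_column,
    "core/column-removal": remove_column,
}


def apply_script(rows, script_text):
    """Apply each operation in the JSON script to the rows, in order."""
    for op in json.loads(script_text):
        handler = HANDLERS.get(op["op"])
        if handler is None:
            # Fail loudly rather than silently skipping an unknown step.
            raise NotImplementedError("unsupported operation: " + op["op"])
        rows = handler(rows, op)
    return rows


rows = [
    {"nm": "Ada", "notes": "first"},
    {"nm": "Grace", "notes": "second"},
]
cleaned = apply_script(rows, SCRIPT)
print(cleaned)
```

The dispatch-table layout makes unsupported operations fail with a clear error instead of being skipped, which matters when a script recorded in the GUI uses a step the executor does not know about.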
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.