GrimoireLab includes a set of interesting tools, but sometimes I need to run specific analyses or proof-of-concept ideas not yet covered by the current platform. Usually, I need easier ways to play with project and community data without setting up the whole GrimoireLab infrastructure.
This repository is my personal playground to test some of these ideas, mostly as Jupyter notebooks.
Feel free to play with them!
For most of the ideas you need:
- Jupyter notebooks
- GrimoireLab/Perceval
- Elasticsearch, elasticsearch-py and elasticsearch-dsl-py
- To play with generated data, you might need Kibana
- utils.py file has some extra dependencies
There is a settings example file where you can define some variables to be used.
Index generators:
- Light Git index generator.ipynb: given a list of Git URLs in the settings file, it generates an Elasticsearch index with items showing info about commits at file level.
- Light Meetup index generator.ipynb: given a list of Meetup group names in the settings file, it generates an Elasticsearch index with items showing info about the Meetup groups' RSVPs.
- Light Github index generator.ipynb: given a list of GitHub repository URLs in the settings file, it generates two Elasticsearch indexes with items showing info about commits at file level, GitHub issues, and GitHub pull requests.
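To give a rough idea of what "commits at file level" means, here is a minimal sketch, not the notebook's actual code. It assumes the usual shape of a Perceval Git item (a `data` dict with `commit`, `Author`, `AuthorDate` and a `files` list). The function names, the output field names and the `index_repository` parameters are hypothetical, chosen just for illustration.

```python
from typing import Dict, Iterator


def to_int(value) -> int:
    """Perceval reports line counts as strings; binary files show '-'."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def commit_to_file_items(item: Dict) -> Iterator[Dict]:
    """Yield one flat document per file touched by a Perceval Git item."""
    commit = item["data"]
    for changed in commit.get("files", []):
        yield {
            "hash": commit["commit"],
            "author": commit.get("Author"),
            # Kept as Perceval's raw date string; convert it if you need a date type.
            "author_date": commit.get("AuthorDate"),
            "file": changed.get("file"),
            "lines_added": to_int(changed.get("added")),
            "lines_removed": to_int(changed.get("removed")),
        }


def index_repository(url: str, gitpath: str, es_host: str, index: str) -> None:
    """Fetch commits with Perceval and bulk-index the file-level documents."""
    # Imported lazily so the helpers above work without these packages installed.
    from elasticsearch import Elasticsearch, helpers
    from perceval.backends.core.git import Git

    es = Elasticsearch(es_host)
    actions = (
        {"_index": index, "_source": doc}
        for item in Git(uri=url, gitpath=gitpath).fetch()
        for doc in commit_to_file_items(item)
    )
    helpers.bulk(es, actions)
```

Calling something like `index_repository(url, "/tmp/repo.git", "http://localhost:9200", "git")` would then clone the repository and fill the index, assuming a reachable Elasticsearch instance. Older Elasticsearch versions may need extra fields in each action.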
Other ideas:
- Genderize Index.ipynb: given an Elasticsearch index, a names field in the index, and an optional names.csv file (containing name, gender, probability, count), it updates each item in the index with gender information for the indicated names field.
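The lookup and update steps could be sketched roughly as follows. This is only an illustration under assumptions, not the notebook's code: it assumes names.csv has a header row with the four columns listed above, and that the first word of the names field is the given name. All function names and output field names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def load_gender_table(handle: TextIO) -> Dict[str, Dict]:
    """Read name,gender,probability,count rows into a lookup keyed by lowercase name."""
    table = {}
    for row in csv.DictReader(handle):
        table[row["name"].strip().lower()] = {
            "gender": row["gender"],
            "gender_probability": float(row["probability"]),
            "gender_count": int(row["count"]),
        }
    return table


def gender_fields(full_name: Optional[str], table: Dict[str, Dict]) -> Dict:
    """Return the gender fields for a name, or an empty dict when unknown."""
    parts = (full_name or "").split()
    if not parts:
        return {}
    # Assumption: the first word is the given name.
    return dict(table.get(parts[0].lower(), {}))


def genderize_index(es_host: str, index: str, names_field: str, table: Dict) -> None:
    """Add gender fields to every item whose names field matches the table."""
    # Imported lazily so the helpers above work without elasticsearch-py installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(es_host)

    def actions():
        for hit in helpers.scan(es, index=index):
            fields = gender_fields(hit["_source"].get(names_field), table)
            if fields:
                yield {"_op_type": "update", "_index": index,
                       "_id": hit["_id"], "doc": fields}

    helpers.bulk(es, actions())


# Small offline demo with made-up data (no Elasticsearch needed).
sample_csv = io.StringIO("name,gender,probability,count\nAlice,female,0.98,1200\n")
sample_table = load_gender_table(sample_csv)
print(gender_fields("Alice Smith", sample_table))
```

Partial updates (`_op_type` set to `update`) leave the rest of each item untouched, which seems the safest choice when enriching an index that other notebooks generated.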
Of course, there will be issues! I am not a computer scientist, and I am self-learning Python, Elasticsearch, etc. during this journey.
If you find any issue, feel free to report it.
Pull requests are also welcome, but I wouldn't recommend losing time with this poor code. If you wanna help, go for the real thing!
If you wanna know more about GrimoireLab, I recommend reading the free and open GrimoireLab Training book.
My colleague Daniel Izquierdo has been developing an interesting toolkit called Ceres, which combines Perceval and Pandas for data massaging. It's worth taking a look at it.
100% free, open source software... of course! MIT License