The goal of this project was to analyze the evolution of the daily returns of four major US stock market indices (DowJones, Nasdaq, Russell2000, SP500) over the period 1987 to 2016 using persistent homology, following the approach proposed by M. Gidea and Y. Katz in [1].
This project is part of the Foundations of Geometric Methods in Data Analysis course at CentraleSupélec. It is a fairly open-ended project: we were given a few questions to guide our work around a research paper [1]. The course is taught by Frederic Cazals and Frederic Chazal, research directors at Inria.
We have developed a command-line interface in Python that reproduces the results of our approach. The interface is documented but does not have automated tests (shame on us!)
The whole module is based on the Gudhi library [2]. The graphics are displayed using matplotlib [3]. Since Gudhi is not available through pip, we advise you to use Anaconda.
We advise you to create and activate a new environment first:
conda create --name='GMDA-env' -y
conda activate GMDA-env
Then, to install the requirements, you can execute the following command:
conda install --file='requirements.txt' -y
You can run the demo script to get a preview of the contents of the package.
chmod +x ./demo.sh
./demo.sh
You can locate the environment using the command
conda env list
In the commands below, "pyconda" stands for the Python interpreter of your Anaconda environment. You can find it by locating your environment; the interpreter is usually found at bin/python inside the environment directory.
All commands go through the "manage.py" file:
pyconda manage.py <command>
On first launch, the program downloads the datasets from the internet.
This documentation is not meant to be exhaustive, but it gives a good idea of the contents of the package. You can find details on the arguments of any command by adding the '-h' argument to it.
Access to the dataset
pyconda manage.py dataset <subcommand>
subcommand | explanation | arguments |
---|---|---|
visualise | visualise the dataset | --log --save |
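
As an illustration of the data that feeds the analysis, here is a minimal sketch of how daily log-returns and sliding-window point clouds can be built, in the spirit of [1]. It assumes that the `--log` option refers to log-returns; the function names `log_returns` and `sliding_point_clouds` and the toy prices are hypothetical and are not taken from the package.

```python
import math


def log_returns(prices):
    """Daily log-returns r_t = ln(P_t / P_{t-1}) of one price series."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def sliding_point_clouds(series_by_index, w_size):
    """Return one point cloud per window of w_size consecutive days.

    Each point has one coordinate per index (four in this project).
    """
    # Transpose: one d-dimensional point per trading day.
    points = list(zip(*series_by_index))
    return [points[start:start + w_size]
            for start in range(len(points) - w_size + 1)]


# Toy prices, made up for illustration (not real market data).
prices = {
    "DowJones": [100.0, 101.0, 99.0, 102.0, 103.0],
    "Nasdaq": [50.0, 50.5, 49.0, 51.0, 52.0],
    "Russell2000": [20.0, 20.1, 19.8, 20.4, 20.3],
    "SP500": [80.0, 81.0, 79.5, 82.0, 82.5],
}
returns = [log_returns(series) for series in prices.values()]
clouds = sliding_point_clouds(returns, w_size=3)
print(len(clouds), "windows of", len(clouds[0]), "points each")
```

Each window is a small point cloud in four dimensions, on which a persistence diagram can then be computed (the package relies on Gudhi [2] for that step).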
Access to the persistence landscape
pyconda manage.py landscape <subcommand>
subcommand | explanation | arguments |
---|---|---|
visualise | plot the landscape graphs | -w_size --end_date --save |
get | get the persistence tree and the landscape | -w_size --end_date |
clean | clean the hidden working database |
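
To make the notion of persistence landscape concrete, the following sketch evaluates the k-th landscape function from a list of (birth, death) pairs. It uses the standard definition (the k-th largest value of the "tent" functions max(0, min(t - b, d - t))); the function name `landscape` and the toy diagram are assumptions for illustration, and the package itself may compute landscapes differently.

```python
def landscape(diagram, k, t):
    """Value at t of the k-th landscape function (k = 1 is the outermost).

    diagram is a list of finite (birth, death) pairs.
    """
    # One tent function per persistence interval, evaluated at t.
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    # The k-th landscape is the k-th largest tent value, or 0 if none is left.
    return tents[k - 1] if k <= len(tents) else 0.0


# Toy diagram, made up for illustration.
diagram = [(0.0, 4.0), (1.0, 3.0)]
for k in (1, 2, 3):
    print(k, landscape(diagram, k, 2.0))
```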
Access to the L1 and L2 norms of the persistence landscape
pyconda manage.py norm <subcommand>
subcommand | explanation | arguments |
---|---|---|
visualise | plot the norm graph | -w_size --start_date --end_date --save |
get | get the norm | -w_size --start_date --end_date |
crash_stats | get and plot statistics on crashes | -w_size -year --test --plot --save |
clean | clean the hidden working database |
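
The L1 and L2 norms summarize a whole landscape as a single number per window. The sketch below approximates the L^p norm by a trapezoidal rule, assuming a non-empty diagram of finite (birth, death) pairs. It relies on the fact that, at each t, the landscape values are just the sorted tent values, so summing over landscape levels is the same as summing over intervals. The names `tent` and `landscape_norm` are hypothetical, not the package's API.

```python
def tent(b, d, t):
    """Tent function of the interval (b, d), evaluated at t."""
    return max(0.0, min(t - b, d - t))


def landscape_norm(diagram, p, n_samples=4001):
    """Approximate L^p norm of a landscape: (sum_k integral of lambda_k^p)^(1/p).

    Assumes a non-empty diagram with finite intervals.
    """
    lo = min(b for b, _ in diagram)
    hi = max(d for _, d in diagram)
    step = (hi - lo) / (n_samples - 1)
    total = 0.0
    for i in range(n_samples):
        t = lo + i * step
        # Trapezoidal rule: half weight on the two end points.
        weight = 0.5 if i in (0, n_samples - 1) else 1.0
        total += weight * sum(tent(b, d, t) ** p for b, d in diagram)
    return (total * step) ** (1.0 / p)


# Single toy interval: a tent of base 4 and height 2.
toy = [(0.0, 4.0)]
print(landscape_norm(toy, 1), landscape_norm(toy, 2))
```

For this single interval the exact values are known in closed form (area 4 for the L1 norm, sqrt(16/3) for the L2 norm), which gives a quick sanity check of the approximation.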
Access to the bottleneck distance between persistence diagrams
pyconda manage.py bottleneck <subcommand>
subcommand | explanation | arguments |
---|---|---|
visualise | plot the bottleneck graph | -w_size --start_date --end_date --save |
get | get the bottleneck | -w_size --start_date --end_date |
crash_stats | get and plot statistics on crashes | -w_size -year --test --plot --save |
clean | clean the hidden working database |
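
For readers unfamiliar with the bottleneck distance, here is a brute-force sketch of its definition for very small diagrams: each point is matched either to a point of the other diagram or to the diagonal, and the distance is the smallest achievable worst-case cost. This is only an illustration with hypothetical names (`cost`, `bottleneck`); it tries every matching and is unusable beyond a handful of points. In practice a library routine such as Gudhi's `bottleneck_distance` should be preferred.

```python
from itertools import permutations


def cost(p, q):
    """L-infinity matching cost; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return (q[1] - q[0]) / 2.0  # distance from q to the diagonal
    if q is None:
        return (p[1] - p[0]) / 2.0  # distance from p to the diagonal
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def bottleneck(diag_a, diag_b):
    """Brute-force bottleneck distance between two tiny diagrams."""
    # Pad each diagram with diagonal slots so that any point may stay unmatched.
    a = list(diag_a) + [None] * len(diag_b)
    b = list(diag_b) + [None] * len(diag_a)
    return min(
        max((cost(p, q) for p, q in zip(a, perm)), default=0.0)
        for perm in permutations(b)
    )


# Toy diagrams, made up for illustration.
diag_a = [(0.0, 4.0), (1.0, 2.0)]
diag_b = [(0.0, 3.5)]
print(bottleneck(diag_a, diag_b))
```

In this example the best matching pairs (0, 4) with (0, 3.5) and sends (1, 2) to the diagonal, each at cost 0.5.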
[1]: Marian Gidea and Yuri Katz. Topological data analysis of financial time series: Landscapes of crashes. Physica A: Statistical Mechanics and its Applications, 491:820–834, 2018.
[2]: https://gudhi.inria.fr
[3]: https://matplotlib.org