by Jérôme E. Blanchet, Senior Analyst | Data Scientist
981 Gulf Pl, Ottawa, ON K1K 3X9 (613) 746-4120 [email protected]
My name is Jérôme Blanchet. My educational background is in pure mathematics and economics, and I am a senior analyst at the CMHC national office. This is my personal data science site on GitHub. This is my first notebook ever, and I am very excited about it. I am interested in Numerai, a Silicon Valley firm focused on predicting the stock market. The Numerai dataset has both advantages and disadvantages. First of all, the dataset is very clean: there are 21 continuous variables, all of them normalized and uniformly distributed, with no outliers and no missing values. The target rate is near 50%. That kind of dataset is perfect for spending less time on preprocessing and more time testing new algorithms. The main drawback of the dataset is structural: stock market data is well known to be very chaotic.
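The properties listed above (number of features, value range, missing values, target rate) are easy to check before any modelling. The following is only a minimal sketch: the column names `feature1` to `feature21` and `target`, the [0, 1] feature range, and the synthetic data itself are assumptions made for illustration, not the actual Numerai file layout.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "target") -> dict:
    """Return quick sanity checks for a table of features plus a binary target."""
    features = df.drop(columns=[target])
    return {
        "n_features": features.shape[1],
        "n_missing": int(df.isna().sum().sum()),
        "feature_min": float(features.min().min()),
        "feature_max": float(features.max().max()),
        "target_rate": float(df[target].mean()),
    }


# Synthetic stand-in for the real data (an assumption, for illustration only):
# 21 uniformly distributed features in [0, 1) and a roughly balanced 0/1 target.
rng = np.random.default_rng(0)
n_rows = 10_000
demo = pd.DataFrame(
    rng.uniform(0.0, 1.0, size=(n_rows, 21)),
    columns=[f"feature{i}" for i in range(1, 22)],  # hypothetical column names
)
demo["target"] = rng.integers(0, 2, size=n_rows)

summary = summarize(demo)
print(summary)
```

With the real data, the same `summarize` helper could be applied to the loaded training table in place of `demo`, after adjusting the target column name.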
Part 1) Data Description
Part 2) Data Interaction
Part 3) Benchmark (Modelling without any Preprocessing)
3.1 Manual Tuning with Various Algorithms
3.2 Manual Tuning with Neural Network
Part 4) Preprocessing
4.1 Dimensionality Reduction with PCA
4.2 Dimensionality Reduction with t-distributed Stochastic Neighbor Embedding (t-SNE) on top of PCA
Part 5) Grid Search, Random Search and Bayesian Hyperparameter Search