Reference-Links

This repository contains a curated list of articles on ML/AI/NLP. I browsed many of these articles while researching problems I faced when applying ML in industry, and they could be useful to other practitioners as well. Although I have tried to arrange the topics at a high level, there is no strict sequence to the references.

General trivia about basic DS libraries

matplotlib plotting in 2D and 3D: http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb
Difference between size and count with groupby in pandas: https://stackoverflow.com/questions/33346591/what-is-the-difference-between-size-and-count-in-pandas
pandas regex to create columns: https://chrisalbon.com/python/data_wrangling/pandas_regex_to_create_columns/
regex in python and pandas: https://www.dataquest.io/blog/regular-expressions-data-scientists/
plotting boxplots in plotly in python: https://plot.ly/python/box-plots/
extract high correlation values: https://stackoverflow.com/questions/17778394/list-highest-correlation-pairs-from-a-large-correlation-matrix-in-pandas
converting group by object to data frame (also how to avoid converting columns to indices when doing group by): https://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe
Progress Bars in jupyter notebook with tqdm: https://towardsdatascience.com/progress-bars-in-python-4b44e8a4c482
list comprehensions: https://www.machinelearningplus.com/python/list-comprehensions-in-python/
Numpy 1 - basic: https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/
Numpy 2 - advanced: https://www.machinelearningplus.com/python/numpy-tutorial-python-part2/
Numpy 101 practice: https://www.machinelearningplus.com/python/101-numpy-exercises-python/
Pandas 101 practice: https://www.machinelearningplus.com/python/101-pandas-exercises-python/

Inferential Statistics

General

p value: https://www.statsdirect.com/help/basics/p_values.htm
normality tests in python: https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/
parametric significance tests in python: https://machinelearningmastery.com/parametric-statistical-significance-tests-in-python/
non-parametric significance tests in python: https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/
multicollinearity in regression analysis: http://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/
Effect of multicollinearity on Ordinary Least Squares solution for regression: https://en.wikipedia.org/wiki/Multicollinearity
What is the difference between likelihood and probability: https://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm

Frequentist AB testing:

http://ethen8181.github.io/machine-learning/ab_tests/frequentist_ab_test.html

ANOVA tests:

Machine Learning

Dimensionality Reduction

PCA

variance in PCA explained: https://ro-che.info/articles/2017-12-11-pca-explained-variance
PCA on large matrices: https://amedee.me/post/pca-large-matrices/
randomized SVD: http://alimanfoo.github.io/2015/09/28/fast-pca.html

t-SNE

Laurens van der Maaten's (creator of t-SNE) website: https://lvdmaaten.github.io/tsne/
Visualising data using t-SNE: Journal of Machine Learning Research: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
How to use t-SNE effectively: https://distill.pub/2016/misread-tsne/

ICA

Stanford notes on ICA: http://cs229.stanford.edu/notes/cs229-notes11.pdf

Model Stacking

stacking: https://dkopczyk.quantee.co.uk/stacking/

Distance Metrics

Mahalanobis distance: https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Cosine similarity: https://www.machinelearningplus.com/nlp/cosine-similarity/
3 basic Distance Measurement in Text Mining: https://towardsdatascience.com/3-basic-distance-measurement-in-text-mining-5852becff1d7
Word Mover’s Distance (WMD): https://towardsdatascience.com/word-distance-between-word-embeddings-cc3e9cf1d632
WMD Tutorial: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html
Word Mover’s Distance in Python: https://vene.ro/blog/word-movers-distance-in-python.html
probability distance metrics: https://markroxor.github.io/gensim/static/notebooks/distance_metrics.html

Clustering

assessing clustering tendency: https://www.datanovia.com/en/lessons/assessing-clustering-tendency/
Hopkins test for cluster tendency: https://matevzkunaver.wordpress.com/2017/06/20/hopkins-test-for-cluster-tendency/
Clustering validation tests: http://www.sthda.com/english/wiki/print.php?id=241
silhouette method for cluster quality: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
K Modes Clustering: https://shapeofdata.wordpress.com/2014/03/04/k-modes/
Hierarchical Clustering: http://www.saedsayad.com/clustering_hierarchical.htm
Linkage methods of hierarchical agglomerative cluster analysis (HAC): https://stats.stackexchange.com/questions/195446/choosing-the-right-linkage-method-for-hierarchical-clustering

Handling imbalanced data set:

how to handle imbalanced data with code: https://elitedatascience.com/imbalanced-classes
good read: https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
concept read: https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
imbalanced-learn library: https://github.com/scikit-learn-contrib/imbalanced-learn
anomaly detection in python: https://www.datascience.com/blog/python-anomaly-detection
scikit learn novelty and outlier detection: https://www.datascience.com/blog/python-anomaly-detection
Imbalanced data handling tutorial in Python: https://blog.dominodatalab.com/imbalanced-datasets/
imbalanced data sets (Good Read): https://www.svds.com/learning-imbalanced-classes/
cost sensitive learning: https://machinelearningmastery.com/cost-sensitive-learning-for-imbalanced-classification/
develop cost sensitive neural network: https://machinelearningmastery.com/cost-sensitive-neural-network-for-imbalanced-classification/

Handling Skewed data:

Top 3 methods for handling skewed data: https://towardsdatascience.com/top-3-methods-for-handling-skewed-data-1334e0debf45

Multi-label Classification

https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff

Probability Calibration:

https://scikit-learn.org/stable/modules/calibration.html

Sparse Matrix

https://dziganto.github.io/Sparse-Matrices-For-Efficient-Machine-Learning/

Design of Experiment

Model Explainability/Interpretable ML/Fairness in AI/Responsible AI

ELI5 - TextExplainer: debugging black-box text classifiers: https://eli5.readthedocs.io/en/latest/tutorials/black-box-text-classifiers.html
interpretable ML book: https://christophm.github.io/interpretable-ml-book/
Interpretable Machine Learning - Christoph Molnar: https://www.youtube.com/watch?v=0LIACHcxpHU
AI Fairness 360: This extensible open source toolkit can help you examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle: http://aif360.mybluemix.net/
ML Interpretability: SHAP/LIME: https://www.youtube.com/watch?v=jhopjN08lTM&t=730s
Fairness using scikit-lego: https://scikit-lego.readthedocs.io/en/latest/fairness.html
Introducing Transformers Interpret — Explainable AI for Transformers: https://towardsdatascience.com/introducing-transformers-interpret-explainable-ai-for-transformers-890a403a9470
Model Interpretability for PyTorch: https://captum.ai/
Interfaces for Explaining Transformer Language Models: https://jalammar.github.io/explaining-transformers/
Alibi Explain: https://docs.seldon.io/projects/alibi/en/latest/index.html
Trust Scores: https://docs.seldon.io/projects/alibi/en/latest/methods/TrustScores.html

Anomaly Detection

Note on anomaly detection: https://towardsdatascience.com/a-note-about-finding-anomalies-f9cedee38f0b
Four Techniques for Anomaly detection: https://dzone.com/articles/four-techniques-for-outlier-detection-knime
Novelty and Outlier Detection: https://scikit-learn.org/stable/modules/outlier_detection.html#novelty-detection
One Class SVM Anomaly detection: https://www.kaggle.com/amarnayak/once-class-svm-to-detect-anomaly
PyOD for anomaly detection: https://github.com/yzhao062/Pyod#quick-start-for-outlier-detection
text anomaly detection: https://arxiv.org/pdf/1701.01325.pdf
Outlier Detection for Text Data: https://epubs.siam.org/doi/pdf/10.1137/1.9781611974973.55
Text Anomaly Detection using Doc2Vec and cosine sim: https://medium.com/datadriveninvestor/unsupervised-outlier-detection-in-text-corpus-using-deep-learning-41d4284a04c8
https://github.com/avisheknag17/public_ml_models/blob/master/outlier_detection_in_movie_plots_ann/notebook/movie_plots_outlier_detector.ipynb

Semi Supervised Learning & Active Learning

How Active Learning can help you train your models with less Data: https://towardsdatascience.com/how-active-learning-can-help-you-train-your-models-with-less-data-389da8a5f7ea
Active Learning Tutorial: https://towardsdatascience.com/active-learning-tutorial-57c3398e34d
From Research to Production with Deep Semi-Supervised Learning: https://medium.com/@nairvarun18/from-research-to-production-with-deep-semi-supervised-learning-7caaedc39093
semi supervised learning resources: https://madewithml.com/topics/semi-supervised-learning/

Model Fracking and Concept Drift:

Time Series

time series analysis in python: https://www.machinelearningplus.com/time-series/time-series-analysis-python/
Introduction to Machine Learning with Time Series | PyData Fest Amsterdam 2020: https://www.youtube.com/watch?v=Wf2naBHRo8Q&t=3831s
sktime: https://github.com/alan-turing-institute/sktime

Open Datasets

https://skymind.ai/wiki/open-datasets
Chest X-ray data: https://www.kaggle.com/nih-chest-xrays
common crawl data (massive): https://commoncrawl.org/
comcrawl, a Python package for easily querying and downloading pages from commoncrawl.org: https://github.com/michaelharms/comcrawl

XGBoost Installation:

Check your Python version by opening CMD, typing python, and pressing ENTER.
Go to this link and search for XGBoost: https://www.lfd.uci.edu/~gohlke/pythonlibs/
Download the wheel that matches your Python version and Windows architecture (32 or 64 bit). For example, download xgboost-0.71-cp36-cp36m-win_amd64.whl for Python 3.6 on a 64-bit machine.
Open CMD in the download location and run the following command: pip install xgboost-0.71-cp36-cp36m-win_amd64.whl
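
Choosing the right wheel in the steps above depends on the interpreter's version and on whether it is a 32 or 64 bit build. The following is a minimal sketch of how to read both from Python itself. The helper name wheel_tags is hypothetical, and the tags it returns (for example cp36 and win_amd64) simply assume the filename pattern shown in the example above on a Windows machine.

```python
import struct
import sys


def wheel_tags(version_info=sys.version_info, pointer_bits=struct.calcsize("P") * 8):
    """Return (python_tag, platform_tag) in the style of the wheel filename above.

    Assumption: a Windows machine, so the platform tag is either
    win_amd64 (64-bit interpreter) or win32 (32-bit interpreter).
    """
    # Major and minor version, e.g. (3, 6) becomes "cp36".
    python_tag = f"cp{version_info[0]}{version_info[1]}"
    # The size of a C pointer tells us the bitness of the running interpreter.
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    py_tag, plat_tag = wheel_tags()
    print(f"Look for a wheel whose name contains {py_tag} and {plat_tag}")
```

Note that the bitness reported here is that of the Python interpreter, which can be 32 bit even on a 64 bit Windows installation, and it is the interpreter's bitness that the wheel has to match.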

Deep Learning:

General

The Perceptron Learning Algorithm and its Convergence: https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-1.pdf
Deep Dive into Math Behind Deep Networks: https://towardsdatascience.com/https-medium-com-piotr-skalski92-deep-dive-into-deep-networks-math-17660bc376ba
Recent Advances for a Better Understanding of Deep Learning − Part I: https://towardsdatascience.com/recent-advances-for-a-better-understanding-of-deep-learning-part-i-5ce34d1cc914
using neural nets to recognize handwritten digits: http://neuralnetworksanddeeplearning.com/chap1.html
Tinker with Neural Networks in browser: https://playground.tensorflow.org
Dimensions and manifolds: https://datascience.stackexchange.com/questions/5694/dimensionality-and-manifold
Play with Generative Adversarial Networks: https://poloclub.github.io/ganlab/
Overfitting and how to prevent it: https://hackernoon.com/memorizing-is-not-learning-6-tricks-to-prevent-overfitting-in-machine-learning-820b091dc42
37 reasons why your neural network is not working: https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
list of cost functions to be used with Gradient Descent: https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
What is validation data used for in a Keras Sequential model? : https://stackoverflow.com/questions/46308374/what-is-validation-data-used-for-in-a-keras-sequential-model

CNN and Image Processing:

Why do we need to normalize the images before we put them into CNN? : https://stats.stackexchange.com/questions/185853/why-do-we-need-to-normalize-the-images-before-we-put-them-into-cnn
Neural Network data type conversion - float from int? : https://datascience.stackexchange.com/questions/13636/neural-network-data-type-conversion-float-from-int
Image Pre-processing (Keras): https://keras.io/preprocessing/image/
Trick to prevent Overfitting: https://hackernoon.com/memorizing-is-not-learning-6-tricks-to-prevent-overfitting-in-machine-learning-820b091dc42
keras callbacks: https://keras.io/callbacks/
How to Check-Point Deep Learning Models in Keras: https://machinelearningmastery.com/check-point-deep-learning-models-keras/
In Depth understanding of Convolutions: http://timdettmers.com/2015/03/26/convolution-deep-learning/
friendly introduction to Cross Entropy: https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
Understanding Cross Entropy Loss - Visual Information Theory: http://timdettmers.com/2015/03/26/convolution-deep-learning/
Papers on imp CNN architectures: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
CNN using numpy: https://becominghuman.ai/only-numpy-implementing-convolutional-neural-network-using-numpy-deriving-forward-feed-and-back-458a5250d6e4
Image Transformations Using OpenCV: https://docs.opencv.org/trunk/d9/d61/tutorial_py_morphological_ops.html
List of Open Source Medical Image Analysis Software: http://www0.cs.ucl.ac.uk/opensource_mia_ws_2012/links.html
Natural Images: https://stats.stackexchange.com/questions/25737/definition-of-natural-images-in-the-context-of-machine-learning
ResNet - understanding the bottleneck unit: https://stats.stackexchange.com/questions/347280/regarding-the-understanding-of-bottleneck-unit-of-resnet
Visual Question Answering: https://github.com/anujshah1003/VQA-Demo-GUI
https://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook
CNN+LSTM: https://machinelearningmastery.com/cnn-long-short-term-memory-networks/

1D - CNNs

Introduction to 1D CNNs: https://blog.goodaudience.com/introduction-to-1d-convolutional-neural-networks-in-keras-for-time-sequences-3a7ff801a2cf
Why does each filter learn a different feature in CNN: https://www.quora.com/Why-does-each-filter-learn-different-features-in-a-convolutional-neural-network

Keras Embedding Layer

Using Embedding Layer in Keras: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
how does keras embedding layer work: https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work

Keras generators

A detailed example of how to use data generators with Keras: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

Saving Keras Models

https://jovianlin.io/saving-loading-keras-models/

Clustering using DL

unsupervised clustering in keras: https://www.dlology.com/blog/how-to-do-unsupervised-clustering-with-keras/
overview of DL based clustering methods: https://divamgupta.com/unsupervised-learning/2019/03/08/an-overview-of-deep-learning-based-clustering-techniques.html

Large Model Support usage in keras

Image Captioning

https://towardsdatascience.com/image-captioning-with-keras-teaching-computers-to-describe-pictures-c88a46a311b8
https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/
A Comprehensive Survey of Deep Learning for Image Captioning: https://arxiv.org/pdf/1810.04020.pdf

Image Segmentation:

Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras: https://github.com/divamgupta/image-segmentation-keras

Natural Language Processing (NLP) and Natural Language Understanding (NLU)

General

text tutorial: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
text classification ref: https://scikit-learn.org/0.19/auto_examples/text/document_classification_20newsgroups.html#sphx-glr-auto-examples-text-document-classification-20newsgroups-py
Multiclass vs Multilabel vs Multioutput classification: https://scikit-learn.org/stable/modules/multiclass.html
Out of core classification of text documents: https://scikit-learn.org/0.15/auto_examples/applications/plot_out_of_core_classification.html#example-applications-plot-out-of-core-classification-py
Library for handling multilabel classification: http://scikit.ml/index.html
NLTK Book: http://www.nltk.org/book/
NLTK tutorial: https://pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/
Extracting Text Meta Features: https://www.kaggle.com/shivamb/spacy-text-meta-features-knowledge-graphs
jaccard distance using NLP: https://python.gotrained.com/nltk-edit-distance-jaccard-distance/#Jaccard_Distance
Text Encoding Unicode: https://docs.python.org/3/howto/unicode.html
Roundup of Python NLP libraries: https://nlpforhackers.io/libraries/
Generate text using word level neural language model: https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/
Generate text using LSTM: https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/
SIF embeddings implementation: https://www.kaggle.com/procode/sif-embeddings-got-69-accuracy
Ontology based text classification: https://sci2lab.github.io/mehdi/icsc2014.pdf
fast text analysis using Vowpal Wabbit : https://www.kaggle.com/kashnitsky/vowpal-wabbit-tutorial-blazingly-fast-learning
sentiment analysis using VADER: https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f

Text Classification using Deep Learning:

what kagglers are using for text classification: https://mlwhiz.com/blog/2018/12/17/text_classification/
text CNN: https://www.kaggle.com/mlwhiz/learning-text-classification-textcnn/comments
text pre-processing methods for DL: https://mlwhiz.com/blog/2019/01/17/deeplearning_nlp_preprocess/
How to pre-process when using embeddings: https://www.kaggle.com/christofhenkel/how-to-preprocessing-when-using-embeddings
A Layman guide to moving from Keras to Pytorch: https://mlwhiz.com/blog/2019/01/06/pytorch_keras_conversion/
Toxic comments classification: https://www.kaggle.com/larryfreeman/toxic-comments-code-for-alexander-s-9872-model
Text Blob: Simplified Text Processing: https://textblob.readthedocs.io/en/dev/

Spacy resources

text classification using spacy: https://www.dataquest.io/blog/tutorial-text-classification-in-python-using-spacy/
ml for text classification using spacy: https://towardsdatascience.com/machine-learning-for-text-classification-using-spacy-in-python-b276b4051a49
tricks for using spacy at scale: https://towardsdatascience.com/a-couple-tricks-for-using-spacy-at-scale-54affd8326cf
Modified skip gram based on spacy dependency parser: https://medium.com/reputation-com-datascience-blog/keywords-extraction-with-ngram-and-modified-skip-gram-based-on-spacy-14e5625fce23
SpaCy Tutorial: https://course.spacy.io/
Spacy NLP faster using Cython: https://medium.com/huggingface/100-times-faster-natural-language-processing-in-python-ee32033bdced
enable previously disabled pipes: https://stackoverflow.com/questions/53052687/spacy-enable-previous-disabled-pipes
Spacy rule based matching - https://github.com/explosion/spaCy/blob/develop/website/docs/usage/rule-based-matching.md#combining-models-and-rules-models-rules
Spacy Information extraction examples: https://spacy.io/usage/examples
Training custom ner model in spacy: https://www.machinelearningplus.com/nlp/training-custom-ner-model-in-spacy/
BRAT: open source annotation tool: http://brat.nlplab.org/examples.html

Topic Modelling:

Topic modeling in gensim: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/
Topic modeling in sklearn (with NMF): https://www.machinelearningplus.com/nlp/topic-modeling-python-sklearn-examples/
sklearn: https://scikit-learn.org/stable/auto_examples/applications/plot_topics_extraction_with_nmf_lda.html
What is topic coherence: https://rare-technologies.com/what-is-topic-coherence/
Evaluation of Topic Modeling: Topic Coherence: https://datascienceplus.com/evaluation-of-topic-modeling-topic-coherence/
Exploring the Space of Topic Coherence Measures (paper): http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf
Good Read - choosing topics using coherence measures in LDA, LSI, HDP etc.: https://markroxor.github.io/gensim/static/notebooks/gensim_news_classification.html
Dynamic Topic Models tutorial: https://markroxor.github.io/gensim/static/notebooks/ldaseqmodel.html
Dynamic topic model google talk: https://www.youtube.com/watch?v=7BMsuyBPx90
LDA using TF-IDF: https://towardsdatascience.com/topic-modeling-and-latent-dirichlet-allocation-in-python-9bf156893c24

Top2Vec

Text Summarization:

https://becominghuman.ai/text-summarization-in-5-steps-using-nltk-65b21e352b65

keyword-phrase extraction/keyphrase extraction/phrase extraction

Intro to Automatic Keyphrase Extraction: https://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/
Beyond bag of words: Using PyTextRank to find Phrases and Summarize text: https://medium.com/@aneesha/beyond-bag-of-words-using-pytextrank-to-find-phrases-and-summarize-text-f736fa3773c5
NLP keyword extraction tutorial with RAKE and Maui: https://www.airpair.com/nlp/keyword-extraction-tutorial
Python Keyphrase Extraction library: https://github.com/boudinfl/pke
YAKE (paper): https://www.sciencedirect.com/science/article/abs/pii/S0020025519308588?via%3Dihub
KeyBERT: keyphrase extraction using BERT extracting phrases most similar to document: https://github.com/MaartenGr/keyBERT
Self-supervised Contextual Keyword and Keyphrase Retrieval with Self-Labelling: https://www.preprints.org/manuscript/201908.0073/download/final_file

Gensim

introduction to gensim: https://www.machinelearningplus.com/nlp/gensim-tutorial/
using soft cosine similarity in gensim: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/soft_cosine_tutorial.ipynb

Natural Language Understanding

https://web.stanford.edu/class/cs224u/

NLG

NLG using markovify: https://github.com/jsvine/markovify
training bot to comment on current affairs: https://www.kaggle.com/aashita/training-a-bot-to-comment-on-current-affairs
Template based NLG: gramex-NLG: https://github.com/gramener/gramex-nlg
gramex NLG notebook: https://github.com/gramener/gramex-nlg/blob/dev/examples/intro-narrative-api.ipynb

EVT

Reducing Uncertainty in Document Classification with Extreme Value Theory: https://medium.com/cognigo/reducing-uncertainty-in-document-classification-with-extreme-value-theory-97508ebd76f

Doc2Vec

Doc2Vec : https://radimrehurek.com/gensim/models/doc2vec.html
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
Doc2Vec : https://medium.com/@mishra.thedeepak/doc2vec-simple-implementation-example-df2afbbfbad5
How to use Doc2Vec as input to Keras model: https://stackoverflow.com/questions/50564928/how-to-use-sentence-vectors-from-doc2vec-in-keras-sequntial-model-for-sentence-s

Information Retrieval, text search & semantic search:

101 ways to solve search: https://www.youtube.com/watch?v=VHm6_uC4vxM
BM25 - https://www.quora.com/How-does-BM25-work
Python library for BM25 - https://pypi.org/project/rank-bm25/
Building NLP text search system: https://towardsdatascience.com/building-a-sentence-embedding-index-with-fasttext-and-bm25-f07e7148d240
https://www.kaggle.com/davidmezzetti/cord-19-analysis-with-sentence-embeddings
Faiss is a library for efficient similarity search and clustering of dense vectors - https://github.com/facebookresearch/faiss
MILVUS: Open source vector similarity search engine: https://milvus.io/
Text similarity search with vector fields: https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
Softcosine similarity paper: http://www.scielo.org.mx/pdf/cys/v18n3/v18n3a7.pdf
Text similarity using Softcosine similarity: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/soft_cosine_tutorial.ipynb
Document Similarity Queries: https://radimrehurek.com/gensim/auto_examples/core/run_similarity_queries.html#sphx-glr-auto-examples-core-run-similarity-queries-py
Document Distance metrics: https://radimrehurek.com/gensim/auto_examples/tutorials/run_distance_metrics.html#sphx-glr-auto-examples-tutorials-run-distance-metrics-py
Similarity Queries with Annoy and Word2Vec: https://radimrehurek.com/gensim/auto_examples/tutorials/run_annoy.html#sphx-glr-auto-examples-tutorials-run-annoy-py
similarities.docsim – Document similarity queries: https://radimrehurek.com/gensim/similarities/docsim.html

Longform Question Answering

https://yjernite.github.io/lfqa.html

NLP spell correction:

Norvig's spell corrector: https://norvig.com/spell-correct.html
Norvig's spell corrector (notebook): https://nbviewer.jupyter.org/url/norvig.com/ipython/How%20to%20Do%20Things%20with%20Words.ipynb
1000x faster Spelling Correction: https://towardsdatascience.com/symspellcompound-10ec8f467c9b
pyspellchecker: https://pypi.org/project/pyspellchecker/
The Unreasonable Effectiveness of the Transformer Spell Checker: http://www.realworldnlpbook.com/blog/ (https://github.com/mhagiwara/xfspell)

Transfer learning in NLP:

BERT: https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
BERT Research Paper: https://arxiv.org/abs/1810.04805
blog: http://jalammar.github.io/
Blog for understanding ELMO and BERT: http://jalammar.github.io/illustrated-bert/
ULMFIT tutorial: https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/
DSSM: https://www.microsoft.com/en-us/research/project/dssm/

Latest Language Models usage & applications

When Not to Choose the Best NLP Model (Comparison of Elmo, USE, BERT & XLNET): https://blog.floydhub.com/when-the-best-nlp-model-is-not-the-best-choice/
Using NLP to Automate Customer Support, Part Two (using Universal Sentence Encoding - USE): https://blog.floydhub.com/automate-customer-support-part-two/
Paper Dissected: “XLNet: Generalized Autoregressive Pretraining for Language Understanding” Explained: https://mlexplained.com/2019/06/30/paper-dissected-xlnet-generalized-autoregressive-pretraining-for-language-understanding-explained/
using USE + Keras: https://www.dlology.com/blog/keras-meets-universal-sentence-encoder-transfer-learning-for-text-data/
NLP as service: Project Insight: https://github.com/abhimishra91/insight

Text Data Augmentation

Adapting Text Augmentation to Industry problems: https://gitlost-murali.github.io/blogs/nlp/augmentation/exploiting-contextual-models-for-data
A Visual Survey of Data Augmentation in NLP: https://amitness.com/2020/05/data-augmentation-for-nlp/

Advanced Topics

Knowledge Graphs

Knowledge Graphs: https://web.stanford.edu/class/cs520/
Domain specific knowledge graphs: https://www.springer.com/gp/book/9783030123741
Ampligraph Open source Python library that predicts links between concepts in a knowledge graph - https://docs.ampligraph.org/en/1.3.1/index.html
IBM technique: https://github.com/IBM/build-knowledge-base-with-domain-specific-documents/blob/master/README.md
KG Intro: https://github.com/kramankishore/Knowledge-Graph-Intro
Neo4j - Graph Data Science - GDS: https://neo4j.com/blog/announcing-neo4j-for-graph-data-science/
Mining Knowledge Graph from text: https://kgtutorial.github.io/
KG pipeline: https://towardsdatascience.com/conceptualizing-the-knowledge-graph-construction-pipeline-33edb25ab831
Graph Data Base: Neo4j - https://neo4j.com/
(Good Summary of resources 2020) https://dzone.com/articles/knowledge-graphs-power-scientific-research-and-bus
From Text to Knowledge: The Information Extraction Pipeline: https://towardsdatascience.com/from-text-to-knowledge-the-information-extraction-pipeline-b65e7e30273e
KG code notebooks from above author: https://github.com/tomasonjo/blogs
grakn graph database: https://grakn.ai/
KGCNs - Knowledge Graph Convolutional Networks: https://github.com/graknlabs/kglib/tree/master/kglib/kgcn
Introduction to Knowledge Graphs: https://www.youtube.com/watch?v=bCxpNNzbz8M
KGCN | Knowledge Graph Convolutional Networks - Machine Learning over a Knowledge Graph: https://www.youtube.com/watch?v=JlcGfwb6CDE

Deep Learning and Graphs:

Graph Neural Networks an Overview: https://towardsdatascience.com/graph-neural-networks-an-overview-dfd363b6ef87
Deep Graph Library: https://www.dgl.ai/
Tensorflow: Neural Structured Learning: https://www.tensorflow.org/neural_structured_learning

Geometric Deep Learning & Graph Learning

(Part 1) What is Geometric Deep Learning & its relation to Graph Learning: https://medium.com/@flawnsontong1/what-is-geometric-deep-learning-b2adb662d91d
(Part 2) Everything you need to know about Graph Theory for Deep Learning: https://towardsdatascience.com/graph-theory-and-deep-learning-know-hows-6556b0e9891b
(Part 3) Graph Embedding for Deep Learning: https://towardsdatascience.com/overview-of-deep-learning-on-graph-embeddings-4305c10ad4a4
(Part 4) Graph Convolutional Networks for Geometric Deep Learning: https://towardsdatascience.com/graph-convolutional-networks-for-geometric-deep-learning-1faf17dee008
Graph Embeddings Summary: https://towardsdatascience.com/graph-embeddings-the-summary-cc6075aba007

Text GCN:

RBF Neural Networks

https://pythonmachinelearning.pro/using-neural-networks-for-regression-radial-basis-function-networks/
http://mccormickml.com/2013/08/15/radial-basis-function-network-rbfn-tutorial/
rbf - custom keras layer: https://www.kaggle.com/residentmario/radial-basis-networks-and-custom-keras-layers
ResearchGate discussion on determining an unknown class: https://www.researchgate.net/post/How_to_determine_unknown_class_using_neural_network
Titanic survivors using RBF: https://medium.com/datadriveninvestor/building-radial-basis-function-network-with-keras-estimating-survivors-of-titanic-a06c2359c5d9
Custom RBF Keras Layer: https://github.com/PetraVidnerova/rbf_keras

Graph Neural Networks (GNN) good resources

CS224W: Machine Learning with Graphs: https://www.youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn
CS224W: Machine Learning with Graphs (Home Page): http://web.stanford.edu/class/cs224w/
Colab notebook references (introduction): https://colab.research.google.com/drive/1h3-vJGRVloF5zStxL5I0rSy4ZUPNsjy8?usp=sharing#scrollTo=H_VTFHd0uFz6
Node Classification: https://colab.research.google.com/drive/14OvFnAXggxB8vM4e8vSURUp1TaKnovzX
Graph Classification: https://colab.research.google.com/drive/1I8a0DfQ3fI7Njc62__mVXUlcAleUclnb
Scaling Graph Neural Networks: https://colab.research.google.com/drive/1XAjcjRHrSR_ypCk_feIWFbcBKyT4Lirs#scrollTo=KDy46FIQ6OWN

Probabilistic programming in tensorflow

Bayesian Optimization

RoBo - Bayesian Optimization Framework: https://automl.github.io/RoBO/tutorials.html
Implementing bayesian optimization from scratch: https://machinelearningmastery.com/what-is-bayesian-optimization/

Variational Autoencoder

VAE, an intuitive explanation: https://hsaghir.github.io/data_science/denoising-vs-variational-autoencoder/
text generation using VAE: https://nicgian.github.io/text-generation-vae/
text VAE in keras: http://alexadam.ca/ml/2017/05/05/keras-vae.html
tutorial on VAE: https://tiao.io/post/tutorial-on-variational-autoencoders-with-a-concise-keras-implementation/
From Autoencoders to Beta-VAE (Disentangled VAE): https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html
Autoencoder - Image Compression: https://ai.googleblog.com/2016/09/image-compression-with-neural-networks.html
video: https://www.youtube.com/watch?v=9zKuYvjFFS8

Reinforcement Learning

Dynamic Programming: https://web.stanford.edu/class/cs97si/04-dynamic-programming.pdf
When are Monte Carlo methods preferred over Temporal Difference methods: https://stats.stackexchange.com/questions/336974/when-are-monte-carlo-methods-preferred-over-temporal-difference-ones
https://simoninithomas.github.io/Deep_reinforcement_learning_Course/#
Off-Policy Monte Carlo Control: https://cs.wmich.edu/~trenary/files/cs5300/RLBook/node56.html
https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/
Reinforcement Learning Course: https://simoninithomas.github.io/deep-rl-course/

PGM/ Causal Inference in ML

Using Deep Neural Network Approximate Bayesian Network: https://arxiv.org/pdf/1801.00282.pdf
A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference: https://arxiv.org/pdf/1901.02731.pdf
Bayesian Methods for Hackers: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Causal Inference survey of current areas of research: https://stats.stackexchange.com/questions/328602/what-are-some-current-research-areas-of-interest-in-machine-learning-and-causal
Causal Inference in everyday ML: https://www.youtube.com/watch?v=HOgx_SBBzn0
Causal Inference in everyday ML notebook: https://colab.research.google.com/drive/1rjjjA7teiZVHJCMTVD8KlZNu3EjS7Dmu#scrollTo=qsuGNCvtVbsr

Information Theory of Deep Learning

Bayesian Deep Learning

Kalman Filters

http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

Geometric deep learning

http://geometricdeeplearning.com/

Neuraxle

Math & Deep learning

Aerin Kim is a senior research engineer at Microsoft and writes about topics related to applied Math and deep learning: https://towardsdatascience.com/@aerinykim
Matrices as tensor network diagrams: https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams

Multitask Learning (MTL)

An Overview of Multi-Task Learning in Deep Neural Networks: https://ruder.io/multi-task/

Weak Supervision & Semi-Supervised Learning

Hazy Research Group (leading research group on Weak Supervision): http://hazyresearch.stanford.edu/
Software 2.0 and Data Programming: Lessons Learned, and What’s Next: http://hazyresearch.stanford.edu/software2
Weak Supervision: A New Programming Paradigm for Machine Learning: http://ai.stanford.edu/blog/weak-supervision/
Introducing Snorkel: https://www.snorkel.org/blog/hello-world-v-0-9
Snorkel Tutorial: https://www.snorkel.org/use-cases/
Fonduer: Knowledge Base Construction from Richly Formatted Data (Intro): https://github.com/HazyResearch/fonduer-tutorials/tree/master/intro
Snorkel (Paper): https://arxiv.org/pdf/1711.10160.pdf
Training Classifiers with Natural Language Explanations (paper): https://arxiv.org/pdf/1805.03818.pdf
Babble Labble: https://github.com/HazyResearch/babble
Sippy Cup (tool for Semantic Parsing): https://github.com/wcmac/sippycup
Data Programming: Creating Large Training Sets, Quickly (paper): https://arxiv.org/pdf/1605.07723.pdf
Snuba: Automating Weak Supervision to Label Training Data (paper): http://www.vldb.org/pvldb/vol12/p223-varma.pdf
Confident Learning: Estimating Uncertainty in Dataset Labels (paper): https://arxiv.org/abs/1911.00068
CleanLab package: https://l7.curtisnorthcutt.com/cleanlab-python-package
CleanLab (github): https://github.com/cgnorthcutt/cleanlab
An Introduction to Confident Learning: https://l7.curtisnorthcutt.com/confident-learning
Simplified Confident Learning tutorial: https://github.com/cgnorthcutt/cleanlab/blob/master/examples/simplifying_confident_learning_tutorial.ipynb
HoloClean: Weakly Supervised Data Repairing: https://holoclean.github.io/gh-pages/blog/holoclean.html
Introduction to Semantic Parsing (SippyCup): https://nbviewer.jupyter.org/github/wcmac/sippycup/blob/master/sippycup-unit-0.ipynb
Using Snorkel for Multilabel classification: https://towardsdatascience.com/using-snorkel-for-multi-label-annotation-cc2aa217986a
MixMatch: A Holistic Approach to Semi-Supervised Learning: https://arxiv.org/abs/1905.02249
Human Learn (humans in the loop ML): https://koaning.github.io/human-learn/
Vincent Warmerdam - Playing by the Rules-Based-Systems: https://www.youtube.com/watch?v=nJAmN6gWdK8
Intro to Data Slicing (Slicing functions paradigm): https://www.snorkel.org/use-cases/03-spam-data-slicing-tutorial
(SliceAwareClassifier code): https://github.com/snorkel-team/snorkel/tree/master/snorkel/slicing
Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices: https://arxiv.org/pdf/1909.06349.pdf
Crowdsourcing label generalization using Snorkel: https://www.snorkel.org/use-cases/crowdsourcing-tutorial

AutoML

AutoML org: https://www.automl.org/book/
AutoML free course: https://ki-campus.org/courses/automl-luh2021
Deep Learning 2.0: How Bayesian Optimization May Power the Next Generation of DL by Frank Hutter: https://www.youtube.com/watch?v=LPFPp7594Zc&t=49s

Future research topics

self supervised learning - https://thenextweb.com/neural/2020/04/05/self-supervised-learning-is-the-future-of-ai-syndication/
Software 2.0 and Data Programming: Lessons Learned, and What’s Next: http://hazyresearch.stanford.edu/software2
NLP Transfer learning Thesis (Sebastian Ruder): https://ruder.io/thesis/neural_transfer_learning_for_nlp.pdf
Mixture Density Networks: https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca
Vincent Warmerdam: Gaussian Progress: https://www.youtube.com/watch?v=aICqoAG5BXQ&t=738s

ML Engineering

Docker

https://www.analyticsvidhya.com/blog/2017/11/reproducible-data-science-docker-for-data-science/
Docker for ML: https://pratos.github.io/2017-04-24/docker-for-data-science-part-1/
conceptual - introduction to VMs and Docker: https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b
lighter docker images: https://medium.com/swlh/build-fast-deploy-faster-creating-lighter-docker-images-11540ce0db14
How to reduce Python docker image size: https://stackoverflow.com/questions/48543834/how-do-i-reduce-a-python-docker-image-size-using-a-multi-stage-build
Docker Multi-stage builds for Python: https://pythonspeed.com/articles/multi-stage-docker-python/

Advanced/Intermediate Python

OOPS & others

Python OOP tutorial: https://www.youtube.com/watch?v=ZDa-Z5JzLYM
Python 10 mins a day: https://python-10-minutes-a-day.rocks/
OOPS illustrated using ML example: https://dziganto.github.io/classes/data%20science/linear%20regression/machine%20learning/object-oriented%20programming/python/Understanding-Object-Oriented-Programming-Through-Machine-Learning/
vectorized string operations in Python (using pandas): https://jakevdp.github.io/PythonDataScienceHandbook/03.10-working-with-strings.html
Use YouTube as a Free Screencast Recorder: https://www.youtube.com/watch?v=0i9C8GpRedc
Parallel processing in Python: https://www.machinelearningplus.com/python/parallel-processing-python/
https://thispointer.com/5-different-ways-to-read-a-file-line-by-line-in-python/
https://www.learnpython.org/en/Map,_Filter,_Reduce
cytoolz: https://pypi.org/project/cytoolz/
https://cmdlinetips.com/2019/03/how-to-get-top-n-rows-with-in-each-group-in-pandas/
intermediate, tips for python: https://book.pythontips.com/en/latest/index.html
Logging best practices: https://tutorialedge.net/python/python-logging-best-practices/
Logging best practices: https://coralogix.com/blog/python-logging-best-practices-tips/
logging cookbook: https://docs.python.org/3/howto/logging-cookbook.html

writing better code for DS:

Generators

Jeff Knupp's blog: 'Yield' and Generator Functions: https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
Corey Schafer (YouTube video): Generator functions: https://www.youtube.com/watch?v=bD05uGo_sVI
Data streaming in Python: generators, iterators, iterables: https://rare-technologies.com/data-streaming-in-python-generators-iterators-iterables/
Python inheritance, multiple inheritance & operator overloading: https://www.programiz.com/python-programming/inheritance
Python Closure, Decorators & Python property: https://www.programiz.com/python-programming/closure
Inheritance and Composition: A Python OOP Guide: https://realpython.com/inheritance-composition-python/
Python’s super() considered super!: https://rhettinger.wordpress.com/2011/05/26/super-considered-super/
Dunder or magic methods in Python: https://rszalski.github.io/magicmethods/
intermediate python - https://realpython.com/intermediate-python/
What Does It Take To Be An Expert At Python?: https://www.youtube.com/watch?v=7lmCu8wz8ro&list=FLAKjydk4USm6YWYdxcdzjoA&index=18&t=138s (notebook: https://github.com/saneshashank/code-vault/blob/master/python_expert_notebook.ipynb)

Data products

Designing great data products: The Drivetrain Approach - https://www.oreilly.com/radar/drivetrain-approach-data-products/
What do machine learning practitioners actually do? - https://www.fast.ai/2018/07/12/auto-ml-1/
From Predictive Modelling to Optimization: The Next Frontier: https://www.youtube.com/watch?v=vYrWTDxoeGg

Architecture considerations

https://towardsdatascience.com/putting-ml-in-production-i-using-apache-kafka-in-python-ce06b3a395c8
https://towardsdatascience.com/putting-ml-in-production-ii-logging-and-monitoring-algorithms-91f174044e4e
https://towardsdatascience.com/getting-started-with-mlflow-52eff8c09c61
https://towardsdatascience.com/creating-a-solid-data-science-development-environment-60df14ce3a34
https://medium.com/bcggamma/an-ensemble-approach-to-large-scale-fuzzy-name-matching-b3e3fa124e3c
DVC version control for ML projects: https://dvc.org/
papermill: https://papermill.readthedocs.io/en/latest/
Minimum Valuable Data Products: From 0 to data science pipeline: https://www.youtube.com/watch?v=UZg45yRTzwo
FastAPI: https://github.com/tiangolo/fastapi
Serve a machine learning model using Sklearn, FastAPI and Docker: https://medium.com/analytics-vidhya/serve-a-machine-learning-model-using-sklearn-fastapi-and-docker-85aabf96729b
Deploying Machine Learning Models with FastAPI and Angular: https://rubikscode.net/2020/11/23/deploying-machine-learning-models-with-fastapi-and-angular/
FastAPI Video Tutorial: https://www.youtube.com/watch?v=Fzdn_ZovZUY&list=PL5gdMNl42qynpY-o43Jk3evfxEKSts3HS
Fullstack DS 1: https://medium.com/applied-data-science/the-full-stack-data-scientist-part-1-productionise-your-models-with-django-apis-7799b893ce7c
Fullstack DS 2: https://medium.com/applied-data-science/the-full-stack-data-scientist-part-2-a-practical-introduction-to-docker-1ea932c89b57
Fullstack DS 3: https://medium.com/applied-data-science/a-case-for-interpretable-data-science-using-lime-to-reduce-bias-e44f48a95f75
Fullstack DS 4: https://medium.com/applied-data-science/the-full-stack-data-scientist-part-4-building-front-ends-in-streamlit-1c2903d4b1fe
Fullstack Deep Learning: https://course.fullstackdeeplearning.com/

Streamlit

Repository of Awesome streamlit apps: https://awesome-streamlit.org/
Summarizer and Named Entity Checker App with Streamlit and SpaCy: https://blog.jcharistech.com/2019/11/28/summarizer-and-named-entity-checker-app-with-streamlit-and-spacy/

programming environments

nbdev: https://www.fast.ai/2019/12/02/nbdev/
https://github.com/fastai/nbdev/
https://medium.com/overstoryai/how-nbdev-helps-us-structure-our-data-science-workflow-in-jupyter-notebooks-9cf6081b051f
How to use Jupyter Notebooks in 2020 (Part 2: Ecosystem growth): https://ljvmiranda921.github.io/notebook/2020/03/16/jupyter-notebooks-in-2020-part-2/

Additional steps for nbdev on Windows 10 (so that `make docs_serve` command runs and documentation is visible locally)

Installing MinGW
- Install MinGW (needed so the make command can be run): http://www.mingw.org/wiki/getting_started (download mingw-get-setup.exe and follow the default instructions)
- In the C:\MinGW\bin location, rename mingw32-make.exe to make.exe
- Add the path C:\MinGW\bin to the system Path variable (https://stackoverflow.com/questions/23723364/windows-7-make-is-not-recognized-as-an-internal-or-external-command-operabl)

You might also have to add Git (C:\Program Files\Git\bin) to Path in the Windows system variables.

Installing Ruby & Jekyll:
- Install Ruby: https://rubyinstaller.org/
- Install the jekyll and bundler gems: `gem install jekyll bundler`
- Go into the docs folder (e.g. S:\deck_of_cards\docs), where you will find the Gemfile, and run the following command: `bundle install`

That completes the setup; `make docs_serve` can now be run.
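A quick way to confirm the steps above took effect is to check that each tool resolves on PATH. The sketch below is only an illustration: the tool list and the helper name `find_missing` are assumptions drawn from the steps above, not something nbdev provides.

```python
import shutil

# Executables the setup steps above are expected to put on PATH.
# This list is an assumption based on those steps, not an nbdev requirement list.
REQUIRED_TOOLS = ["make", "git", "ruby", "gem", "bundle", "jekyll"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be resolved on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found; `make docs_serve` should be runnable.")
```

Run it from a freshly opened terminal, since changes to the Windows Path variable are usually not picked up by terminals that were already open.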

Dashboarding in jupyter notebook

Dashboarding with Jupyter Notebooks, Voila and Widgets: https://www.youtube.com/watch?v=VtchVpoSdoQ
Voila: https://voila.readthedocs.io/en/stable/index.html

The missing CS semester

https://missing.csail.mit.edu/

writing your own blog:

Blog: https://www.fast.ai/2020/01/16/fast_template/

Visual C++ build tools:

https://visualstudio.microsoft.com/downloads/ --> choose --> Build Tools for Visual Studio 2019 --> in the installer choose build tools --> choose win 10 SDK only

Chatbots

Using voice to control a website with Amazon Alexa: https://blog.prototypr.io/using-voice-commands-to-control-a-website-with-amazon-echo-alexa-part-1-6-a35edbfef405
How to build a State-of-the-Art Conversational AI with Transfer Learning: https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313
How to build a State-of-the-Art Conversational AI with Transfer Learning (github link): https://github.com/huggingface/transfer-learning-conv-ai

Scaling ML projects:

Spark

Why should one use spark for ML: https://www.infoworld.com/article/3031690/analytics/why-you-should-use-spark-for-machine-learning.html
Multi-Class Text Classification with PySpark: https://towardsdatascience.com/multi-class-text-classification-with-pyspark-7d78d022ed35

Scaling Pandas: Comparing Dask, Ray, Modin, Vaex, and RAPIDS

https://www.datarevenue.com/en-blog/pandas-vs-dask-vs-vaex-vs-modin-vs-rapids-vs-ray

Modin

Increase pandas speed 4x using modin (also contains comparison b/w pandas & modin - cases where pandas is actually better!!): https://www.kdnuggets.com/2019/11/speed-up-pandas-4x.html
value error while using modin: modin-project/modin#872
Use DataFrame._to_pandas() to convert a modin dataframe to a pandas dataframe

Dask

Dask Overview (Why Dask): https://docs.google.com/presentation/d/e/2PACX-1vSTH2kAR0DCR0nw8pFBe5kuYbOk3inZ9cQfZbzOIRjyzQoVaOoMfI2JONGBz-qsvG_P6g050ddHxSXT/pub?start=false&loop=false&delayms=60000&slide=id.p
Difference b/w Dask & Spark: https://docs.dask.org/en/latest/spark.html

Ray

Tips & Tricks for first time Ray users: https://rise.cs.berkeley.edu/blog/ray-tips-for-first-time-users/

Prefect

Prefect: https://docs.prefect.io/core/getting_started/first-steps.html
Complete Prefect Tutorial: https://docs.prefect.io/core/tutorial/01-etl-before-prefect.html
Why Prefect (Why not Airflow ?): https://docs.prefect.io/core/getting_started/why-not-airflow.html#overview

Project Structure

Manage your Data Science project structure in early stage: https://towardsdatascience.com/manage-your-data-science-project-structure-in-early-stage-95f91d4d0600
Productionizing NLP Models: https://medium.com/modern-nlp/productionizing-nlp-models-9a2b8a0c7d14
Hypermodern Python: https://cjolowicz.github.io/posts/hypermodern-python-01-setup/
Hypermodern python template: https://github.com/cjolowicz/cookiecutter-hypermodern-python
Cookiecutter ds template: https://github.com/drivendata/cookiecutter-data-science
Cookiecutter (creating project templates): https://github.com/cookiecutter/cookiecutter

MLOps

Components of a Production ML System Using Only Python: https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31
Video Link for above: https://www.youtube.com/watch?v=VMj-3S1tku0&t=129s

Machine Learning Design Patterns

Design Patterns in Machine Learning Code and Systems: https://eugeneyan.com/writing/design-patterns/

References (use-case-specific applications):

What is an ad impression: https://www.mediapost.com/publications/article/219695/the-definition-of-an-ad-impression.html
ML in fraud detection: https://www.marutitech.com/machine-learning-fraud-detection/
Customer Segmentation: http://analyticstraining.com/2011/cluster-analysis-for-business/
Telecom churn customer model: https://parcusgroup.com/Telecom-Customer-Churn-Prediction-Models
customer churn in mobile markets: https://arxiv.org/ftp/arxiv/papers/1607/1607.07792.pdf
Survey text analytics: https://www.linkedin.com/pulse/how-choose-survey-text-analysis-software-discussion-draft-fitzgerald?trk=prof-post
What’s Your Customer Effort Score?: https://www.gartner.com/smarterwithgartner/unveiling-the-new-and-improved-customer-effort-score/
A Guide to Customer Satisfaction Metrics - NPS vs CSAT and CES: https://www.retently.com/blog/customer-satisfaction-metrics/
Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project: https://www.kdnuggets.com/2020/03/build-feedback-analysis-solution.html
Predictive Customer Analytics (Good overview of CSAT solution): https://towardsdatascience.com/predictive-customer-analytics-4064d881b649 (part 1)
Latent Aspect Rating Analysis (LARA) for CSAT: https://github.com/redris96/LARA
CSAT Key topics extraction and contextual sentiment of users’ reviews: https://tech.goibibo.com/key-topics-extraction-and-contextual-sentiment-of-users-reviews-20e63c0fd7ca
Predicting Parts required in Field Service Industry: https://medium.com/analytics-vidhya/parts-prediction-given-the-problem-description-6767c3d7e8ed (https://github.com/navraj28/OSPPre)
Graph of Google Query (Can be used in Market Research): https://anvaka.github.io/vs/?query=
Machine learning for predictive maintenance: where to start?: https://medium.com/bigdatarepublic/machine-learning-for-predictive-maintenance-where-to-start-5f3b7586acfb
machine failure prediction dataset: https://www.kaggle.com/c/machine-failure-prediction/data
Making sense of news — the knowledge graph way: https://medium.com/neo4j/making-sense-of-news-the-knowledge-graph-way-d33810ce5005

Some good sites to follow

MWL (Made with ML): Your one-stop platform to explore, learn and build all things machine learning: https://madewithml.com/
Code Calmly: https://calmcode.io/
Practical Business Python: https://pbpython.com/
Sebastian Ruder (NLP Research): https://ruder.io/
Putting ML in Prod: https://madewithml.com/courses/putting-ml-in-production/#lessons
https://www.pratik.ai/
https://www.datarevenue.com/en-blog
Vincent Warmerdam's site: https://koaning.io/
Algorithm Whiteboard: https://www.youtube.com/playlist?list=PL75e0qA87dlG-za8eLI6t0_Pbxafk-cxb
labml.ai Annotated PyTorch Paper Implementations: https://nn.labml.ai/
Chip Huyen (Designing ML systems): https://github.com/chiphuyen
Lilian Weng blog: https://lilianweng.github.io/
eugeneyan blog: https://eugeneyan.com/
