Day 1 Morning
- Course intro (Dafne)
- Introduction slides (1h) (Dafne)
- NB1 (45min) (Sven)
- NB2 (30min) (Dafne)
- Theory on (binary) classification (45min) (Sven)
- Decision tree
- Nearest neighbor
- NB3 (45min) (Sven)
Afternoon
- Theory on performance (20min) (Dafne)
- NB4 (45min) (Dafne)
- Apply skills to LISS (Sven)
- Presentation Joris Mulder
Day 2 Morning
- Theory on regression (45min)
- Linear regression
- Neural net
- NB5 (30min)
- NB6 - Feature selection
- Best practices
Afternoon
- Apply to LISS data (Sven)
- Presentation Wouter van Atteveldt
A laptop with Anaconda, Python 3.9, and the latest versions of the following dependencies:
- scikit-learn
- pandas
- numpy
- matplotlib
- jupyter-notebook
- jupyterlab
- seaborn
Any recent version of Python and the dependencies listed above will probably work fine. However, if you run into problems, the instructions below should give you a working setup.
You will need to have Anaconda installed. The Anaconda website provides installation instructions for your operating system.
Open a terminal (Anaconda Prompt on Windows) and clone our setup git repo:
git clone https://github.com/esciencecenter-digital-skills/SICSS-setup.git
Then install the conda environment as follows:
cd SICSS-setup
conda env create -f environment.yml
Now activate this conda environment:
conda activate SICSS
To check if your environment is set up correctly, you can run our test script:
python check_setup.py
It should output "Your environment is has been correctly set up!" if it ran successfully.
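If the test script is not at hand, a rough manual check is to try importing each dependency. The sketch below is not the contents of check_setup.py; it is a minimal stand-in, and the function name find_missing and the REQUIRED list are assumptions made for illustration. Note that scikit-learn is imported under the name sklearn.

```python
import importlib

# Import names for the libraries listed above (scikit-learn imports as "sklearn").
# The Jupyter packages are left out because they are tools rather than libraries.
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib", "seaborn"]


def find_missing(modules):
    """Return the names in `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages could be imported.")
```

Run it inside the activated SICSS environment; any package reported as missing can usually be installed with conda or pip.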
- Introduction - Slides to be created from introduction content
- What is ML
- AI, ML and DL
- ML and Statistics
- Types of ML
- Supervised learning
- Regression
- Classification
- Unsupervised learning
- Clustering
- Dimensionality Reduction
- Reinforcement learning
- Limitations of machine learning
- Data
- Extrapolation
- Interpretation of Results
- Machine learning glossary
- ML Workflow (with scikit-learn code) - **Adapt notebook 1**
- Formulate / Outline the problem
- Identify inputs and outputs (data exploration)
- Intro Pandas, numpy, seaborn
- Data statistics and plots
- conversion (e.g. from Yes/No to 1/0)
- Prepare data (preprocessing) - notebook 2
- check missing data
- clean data
- splitting data - notebook 2
- Choose an algorithm - notebook 3
- Use sklearn.dummy.DummyRegressor - notebook 3
- Train the model
- Perform a Prediction/Classification (applying the model)
- Measure performance (validate the model) - notebook 4
- Cross validation - notebook 4
- Save model
- Regression example - Create slides on models
- Ordinary Least squares
- SVM
- Classification example - Create slides on models
- Nearest neighbors
- Decision trees
- Random forest
- Metrics - Create slides
- Classification
- F1 score
- Accuracy
- Confusion matrix
- ROC
- Regression
- Feature selection / dimensionality reduction - Create notebook
- Cross correlation
- PCA
- t-SNE
- Hyper-parameter optimizers - notebook 4
- sklearn.model_selection.GridSearchCV - notebook 4
- ML algorithms
- Nearest neighbors
- Ordinary Least squares
- Logistic regression
- Naïve Bayes
- Decision trees
- Random forest
- SVM
- Neural net
- Single-layer perceptron
- Multi-layer perceptron
- Best practices
- Exercise (+Q&A, whole afternoon)
- Setup own experiment (with their own dataset and questions)
- Useful resources
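
The ML workflow steps in the outline above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the content of the course notebooks: the data is synthetic (generated with make_classification), the column names are made up, nearest neighbors is picked only as one of the listed algorithms, and DummyClassifier is swapped in for the DummyRegressor named in the outline because the example task is classification.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 200 rows, 5 numeric features, binary target.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
df["label"] = np.where(y == 1, "Yes", "No")

# Identify inputs and outputs: convert the Yes/No target to 1/0.
df["label"] = df["label"].map({"Yes": 1, "No": 0})

# Prepare data: check for missing values, drop incomplete rows, then split.
n_missing = int(df.isna().sum().sum())
df = df.dropna()
features = df.drop(columns="label")
target = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0, stratify=target
)

# Baseline: a dummy model that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

# Choose an algorithm and tune one hyper-parameter with a cross-validated grid search.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)  # trains one model per candidate and refits the best
model = search.best_estimator_

# Apply the model and measure performance on the held-out test set.
y_pred = model.predict(X_test)
model_accuracy = accuracy_score(y_test, y_pred)
model_f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# Cross validation on the training set gives a spread of scores, not a single number.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Save the model: serialised in memory here; pickle.dump writes to a file instead.
restored = pickle.loads(pickle.dumps(model))

print("missing values:", n_missing)
print("baseline accuracy:", baseline_accuracy)
print("model accuracy:", model_accuracy, "F1:", model_f1)
print("confusion matrix:")
print(cm)
print("cross-validation scores:", cv_scores)
```

Comparing the model against the dummy baseline is the point of the baseline step: a model that does not clearly beat it has learned little from the features.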