esciencecenter-digital-skills/SICSS-odissei-machine-learning

odissei-machine-learning

TOC

Day 1 Morning

  • Course intro (Dafne)
  • Introduction slides (1h) (Dafne)
  • NB1 (45min) (Sven)
  • NB2 (30min) (Dafne)
  • Theory on (binary) classification (45min) (Sven)
    • Decision tree
    • Nearest neighbor
  • NB3 (45min) (Sven)

Afternoon

  • Theory on performance (20min) (Dafne)
  • NB4 (45min) (Dafne)
  • Apply skills to LISS (Sven)
  • Presentation Joris Mulder

Day 2 Morning

  • Theory on regression (45min)
    • Linear regression
    • Neural net
  • NB5 (30min)
  • NB6 - Feature selection
  • Best practices

Afternoon

  • Apply to LISS data (Sven)

  • Presentation Wouter van Atteveldt

Technical Requirements

A laptop with Anaconda, Python 3.9, and the latest versions of the following dependencies:

  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • jupyter-notebook
  • jupyterlab
  • seaborn

Setup instructions

Any recent version of Python together with the dependencies listed above will probably work fine. If you do run into problems, the instructions below should give you a working setup.

You will need to have Anaconda installed. The Anaconda website provides installation instructions for your operating system.

Open a terminal (Anaconda Prompt on Windows) and clone our setup git repo:

git clone https://github.com/esciencecenter-digital-skills/SICSS-setup.git

Then install the conda environment as follows:

cd SICSS-setup
conda env create -f environment.yml

Now activate this conda environment:

conda activate SICSS

To check whether your environment is set up correctly, you can run our test script:

python check_setup.py

If it ran successfully, it should output: Your environment is has been correctly set up!
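For readers curious what such a check involves, here is a minimal sketch of an import check over the dependencies listed above. It is a hypothetical illustration and not the contents of check_setup.py; the function name `missing_packages` and the `REQUIRED` list are assumptions made for this example.

```python
import importlib

# Import names for the dependencies listed above.
# Assumption: scikit-learn is imported as "sklearn"; the Jupyter tools are
# left out here because they are launched from the command line.
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

If the sketch reports missing packages, recreating the conda environment from environment.yml is the most direct fix.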

Potential table of contents

  • Introduction - Slides to be created from introduction content
    • What is ML
    • AI, ML and DL
    • ML and Statistics
    • Types of ML
      • Supervised learning
        • Regression
        • Classification
      • Unsupervised learning
        • Clustering
        • Dimensionality Reduction
      • Reinforcement learning
    • Limitations of machine learning
      • Data
      • Extrapolation
      • Interpretation of Results
    • Machine learning glossary
  • ML Workflow (with scikit-learn code) - Adapt notebook 1
    • Formulate / Outline the problem
    • Identify inputs and outputs (data exploration)
      • Intro Pandas, numpy, seaborn
      • Data statistics and plots
      • conversion (e.g. from Yes/No to 1/0)
    • Prepare data (preprocessing)
      • notebook 2
        • check missing data
        • clean data
        • splitting data
    • Choose an algorithm
    • Train the model
    • Perform a Prediction/Classification (applying the model)
    • Measure performance (validate the model)
    • Save model
  • Regression example - Create slides on models
    • Ordinary Least squares
    • SVM
  • Classification example - Create slides on models
    • Nearest neighbors
    • Decision trees
      • Random forest
  • Metrics - Create slides
    • Classification
      • F1 score
      • Accuracy
      • Confusion matrix
      • ROC
    • Regression
  • Feature selection / dimensionality reduction - Create notebook
    • Cross correlation
    • PCA
    • t-SNE
  • Hyper-parameter optimizers
    • notebook 4
      • sklearn.model_selection.GridSearchCV
  • ML algorithms
    • Nearest neighbors
    • Ordinary Least squares
    • Logistic regression
    • Naïve Bayes
    • Decision trees
    • Random forest
    • SVM
    • Neural net
      • Single-layer perceptron
      • Multi-layer perceptron
  • Best practices
  • Exercise (+Q&A, whole afternoon)
    • Setup own experiment (with their own dataset and questions)
  • Useful resources
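
The ML workflow steps in the outline above (prepare data, choose an algorithm, train, predict, measure performance, save the model) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the course notebooks: the data is synthetic (generated with `make_classification`, not the LISS data), and the choice of a nearest-neighbors classifier, the parameter grid, and the file name `model.joblib` are picked only for the example.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption for this sketch).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Prepare data: hold out a test set that is never used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Choose an algorithm: feature scaling followed by nearest neighbors.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Train the model while tuning the number of neighbors with
# 5-fold cross-validation on the training set only.
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Perform a prediction with the best model found by the search.
y_pred = search.predict(X_test)

# Measure performance on the held-out test set.
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Best parameters:", search.best_params_)
print("Accuracy:", accuracy)
print("F1 score:", f1)
print("Confusion matrix:")
print(cm)

# Save the model and load it again (a temporary directory keeps the
# example free of side effects).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(search.best_estimator_, model_path)
    restored = joblib.load(model_path)
    restored_pred = restored.predict(X_test)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation folds leaks into preprocessing.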