odissei-machine-learning

Technical Requirements

A laptop with Anaconda, Python 3.9, and the latest versions of the following dependencies:

scikit-learn
pandas
numpy
matplotlib
jupyter-notebook
jupyterlab
seaborn

Setup instructions

Any recent version of Python with the dependencies listed above will probably work fine. If you run into problems, however, the instructions below should give you a working setup.

You will need to have Anaconda installed. The Anaconda website provides installation instructions for your operating system.

Open a terminal (Anaconda Prompt on Windows) and clone our setup Git repository:

git clone https://github.com/esciencecenter-digital-skills/SICSS-setup.git

Then install the conda environment as follows:

cd SICSS-setup
conda env create -f environment.yml

Now activate this conda environment:

conda activate SICSS

To check whether your environment is set up correctly, you can run our test script:

python check_setup.py

If it ran successfully, it should output "Your environment is has been correctly set up!".
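If you prefer to check the dependencies by hand, the idea can be sketched as follows. This is a minimal illustration and not the contents of check_setup.py; the helper name `find_missing` and the `REQUIRED` list are assumptions made for this example.

```python
import importlib

# Import names for the dependencies listed above (assumed list for this sketch).
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib", "seaborn"]


def find_missing(modules):
    """Return the names in `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

Running the snippet inside the activated SICSS environment should report no missing packages.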

Potential table of contents

Introduction - Slides to be created from introduction content
- What is ML
- AI, ML and DL
- ML and Statistics
- Types of ML
  - Supervised learning
    - Regression
    - Classification
  - Unsupervised learning
    - Clustering
    - Dimensionality Reduction
  - Reinforcement learning
- Limitations of machine learning
  - Data
  - Extrapolation
  - Interpretation of Results
- Machine learning glossary
ML Workflow (with scikit-learn code) - Adapt notebook 1
- Formulate / Outline the problem
- Identify inputs and outputs (data exploration)
  - Intro Pandas, numpy, seaborn
  - Data statistics and plots
  - conversion (e.g. from Yes/No to 1/0)
- Prepare data (preprocessing)
  - notebook 2
    - check missing data
    - clean data
    - splitting data
- Choose an algorithm
  - notebook 3
    - Use sklearn.dummy.DummyRegressor
- Train the model
  - notebook 3
- Perform a Prediction/Classification (applying the model)
  - notebook 3
- Measure performance (validate the model)
  - notebook 4
    - Cross validation
- Save model
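The workflow steps above could look roughly like this in scikit-learn. This is only a sketch and not taken from the course notebooks: the synthetic dataset, the choice of `LinearRegression`, and the file name `model.joblib` are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Inputs and outputs: synthetic data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Prepare data: hold out a test set that is not used during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Choose an algorithm: a DummyRegressor that always predicts the training
# mean serves as a baseline for the real model.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Train the model.
model = LinearRegression().fit(X_train, y_train)

# Perform a prediction on unseen data.
y_pred = model.predict(X_test)

# Measure performance: R^2 on the test set, plus 5-fold cross-validation
# on the training set.
baseline_r2 = baseline.score(X_test, y_test)
model_r2 = model.score(X_test, y_test)
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5)
print(f"baseline R^2: {baseline_r2:.2f}, model R^2: {model_r2:.2f}")
print(f"cross-validated R^2: {cv_scores.mean():.2f}")

# Save the model and load it again (written to a temporary directory here).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)
    restored_pred = restored.predict(X_test)
```

Comparing against the dummy baseline shows whether the trained model has learned anything beyond predicting the mean.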
Regression example - Create slides on models
- Ordinary Least squares
- SVM
Classification example - Create slides on models
- Nearest neighbors
- Decision trees
  - Random forest
Metrics - Create slides
- Classification
  - F1 score
  - Accuracy
  - Confusion matrix
  - ROC
- Regression
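As a rough illustration of the metrics listed above, the following sketch computes them with `sklearn.metrics` on a handful of made-up labels. The label and score values are invented for this example, and the two regression metrics (mean squared error and R^2) are an assumption, since the outline does not name specific regression metrics.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_squared_error,
    r2_score,
    roc_auc_score,
)

# Made-up binary classification results (assumption for illustration).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
# Predicted probability of class 1, needed for the ROC-based score.
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

# Made-up regression results (assumption for illustration).
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 3.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)

print("accuracy:", accuracy, "F1:", f1, "ROC AUC:", auc)
print("confusion matrix:")
print(cm)
print("MSE:", mse, "R^2:", r2)
```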
Feature selection / dimensionality reduction - Create notebook
- Cross correlation
- PCA
- tSNE
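A minimal sketch of dimensionality reduction with PCA in scikit-learn is shown below. The synthetic data from `make_blobs` is an assumption for illustration; t-SNE (`sklearn.manifold.TSNE`) follows a similar `fit_transform` pattern and is not shown here.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 10-dimensional data with three clusters (assumption).
X, labels = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)

# PCA is sensitive to feature scale, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Project the data onto its first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by each component.
explained = pca.explained_variance_ratio_
print("reduced shape:", X_2d.shape)
print("explained variance ratio:", explained)
```

The two-dimensional output can then be passed to a scatter plot to inspect the cluster structure visually.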
Hyper-parameter optimizers
- notebook 4
  - sklearn.model_selection.GridSearchCV
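A grid search with `GridSearchCV` might be sketched as follows. The synthetic dataset, the nearest-neighbors classifier, and the candidate values for `n_neighbors` are assumptions chosen for this example, not the setup used in notebook 4.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate hyper-parameter values to try (assumed grid).
param_grid = {"n_neighbors": [1, 3, 5, 7]}

# Each candidate is evaluated with 5-fold cross-validation.
search = GridSearchCV(
    KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy"
)
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
print("best n_neighbors:", best_k)
print("best cross-validated accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the best parameters found.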
ML algorithms
- Nearest neighbors
- Ordinary Least squares
- Logistic regression
- Naïve Bayes
- Decision trees
- Random forest
- SVM
- Neural net
  - Single-layer perceptron
  - Multi-layer perceptron
Best practices
Exercise (+Q&A, whole afternoon)
- Set up own experiment (with their own dataset and questions)
Useful resources
