---
title: "Upskilling Researchers in Machine Learning"
# subtitle: "Lightning talk ⚡️"
author: "Maxime Rio, Jens Brinkmann"
institute: "New Zealand eResearch Infrastructure (NeSI), The University of Auckland (UoA)"
date: 2023-10-17
date-format: full
# bibliography: refs.bib
from: markdown+emoji
# embed-resources: false
format:
  revealjs:
    self_contained: true
    # tbl-cap-location: bottom
    # number-sections: true
    theme: UoAtemplate.scss
    multiplex:
      id: 'a140d30bbdb06469'
      secret: '16962778612049606149'
    # css: ./logo.css
    # disableLayout: true
    navigation-mode: vertical
    controls-layout: bottom-right
    controls-tutorial: true
    transition: convex
    view-distance: 10
    width: 1600
    height: 900
    margin: 0.1
    # logo: "./UoALogoDarkBlueRGBLandscape.png"
    title-slide-attributes:
      data-background-image: NeSIAndUoA.jpg
      data-background-size: 20%
      # data-background-postion: right
      data-background-position: 95% 76%
      # data-background-position: bottom 10px right 20px
    # background-image: ./ResBaz_transparent_Logo_cropped.svg
    # background-opacity: 0.5
    # background-position: bottom 10px right 20px
    # background-size: contain
    # data-background-repeat: no-repeat
    # background-size: 80px
    # background-repeat: no-repeat
    # background-position: 0% 100%
    # logo: ResBaz_transparent_cropped.svg
    # self_contained: false
    reveal_plugins: ["menu"]
    reveal_options:
      menu:
        numbers: true
    header: Upskilling Researchers in Machine Learning
    header-logo: NeSIAndUoA.jpg
    hide-from-titleSlide: all
filters:
  - reveal-header
  - line-highlight
editor:
  render-on-save: true
execute:
  enabled: true
---
<!-- based on [this](https://conference.eresearch.edu.au/guidelines-presenter/?utm_source=sendgrid.com&utm_medium=email&utm_campaign=website): *Oral Presentations (15 minutes plus 5 minutes for questions and changeover): Short conversation starters which provide enough information to encourage the audience to engage and seek further information.* -->
# Who are we?
:::: {.columns}
::: {.column width=45%}
**Maxime Rio**
![](./UpskillingResearchersInML_Assets/flying_max.jpg){width=40% fig-align="center"}
- *Data Science Engineer @ NeSI*
- *Data Scientist @ NIWA*
- Help researchers optimise and scale up <br> their code
- Develop ML pipelines and models
- Organise ML and data science **training**
:::
::: {.column width=55%}
**Jens Brinkmann**
![](./UpskillingResearchersInML_Assets/Jens_2022.jpg){width=30% fig-align="center"}
- *Senior eResearch Engagement Specialist @ UoA*
- Mechanical Engineer with a background in Photography/Videography
- Support researchers with their computational needs and the related training
:::
::::
# About this talk
- We want to tell you about our **experience** with Machine Learning (ML) workshops.
- We want to share some **recommendations**.
- *You can do it!*
<!-- MAKE SURE SPACE IS HERE -->
- The [Lightning Talk earlier today](https://conference.eresearch.edu.au/developing-a-carpentries-style-machine-learning-workshop/) was mainly about answering *what* we did and providing metrics.
- [This current talk](https://conference.eresearch.edu.au/upskilling-researchers-in-machine-learning/) is focused on the **delivery**.
- The [BoF](https://conference.eresearch.edu.au/ai-skill-training-pathway-bridging-gaps-and-fostering-inclusivity/) (right after this talk) will be a *broader discussion*.
# A shared goal
:::: {.columns}
::: {.column width=50%}
- Introduce researchers to Machine Learning and Deep Learning
- Start with **foundational** ML aspects and build up from there
- Research-field agnostic
- **Hands-on** approach (teaching by doing)
  - **no** show and tell of existing commercial solutions
  - **no** theoretical lectures
  - **no** deep dive into the maths behind ML
- Not an exhaustive course, but enough to make attendees confident to try things by themselves
:::
::: {.column width=50%}
![](./UpskillingResearchersInML_Assets/BingAIDalle3.jpeg){width=70% fig-align="center"}
*Figure 1: [AI-generated image (DALL·E 3, Bing)](https://www.bing.com/images/create/a-cool-photo-showing-machine-learning/6525c46fdd2b4dabb631b07a2a527ba6?id=BeZ%2bVfneQKjueVSP0cGXoQ%3d%3d&view=detailv2&idpp=genimg&idpclose=1&FORM=SYDBIC&ssp=1&setlang=en-NZ&safesearch=moderate).*
<!-- had to be an AI generated image, we can adjust -->
:::
::::
# Workshops overview
<!-- goal: give some credibility to our recommendations and advise about recruitment process -->
## UoA workshops
:::: {.columns}
::: {.column}
- Audience: The University of Auckland (UoA) researchers
- Two runs
  - Run #1: March 2023
  - Run #2: September 2023
- Well-received
  - filtering by mandatory Expression of Interest (EoI)
  - about 100 applications for 40 spots
:::
::: {.column}
![](./UpskillingResearchersInML_Assets/ZoomParticipantsLine.svg)
:::
::::
## NeSI workshops
:::: {.columns}
::: {.column}
![](./UpskillingResearchersInML_Assets/ml101_1.jpg)
*First ML101 workshop at eResearch NZ 2021*
:::
::: {.column}
- Audience: Aotearoa -- NZ researchers
- ML 101
  - Intro to Machine Learning
  - started in 2021
  - 7 workshops (in person, online)
  - 127 attendees in total (10 to 32 per workshop)
- ML 102
  - Intro to Deep Learning (CNNs)
  - started in 2022
  - 2 workshops (online)
  - 44 attendees in total (20 and 24)
- Mixture of direct registration and EoIs
:::
::::
<!-- numbers (check with Nisha & Matt for ML102)
ML101 2021 eRNZ ~10
ML101 2021 Greta 19
ML101 2021 CRIs 32
ML101 2021 Uni 22
ML101 2022 Pradeesh 11
ML101 2023 May 18
ML101 2023 August 15
ML102 2022 June 20
ML102 2023 July 24
-->
## Recommendations {.center}
There will be **a lot** of interest so...
- use an Expression of Interest for registration and filter,
- 30 participants is a good number for an online training,
- expect some no-shows (if the event is free and online).
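
As a rough sketch of the arithmetic behind planning for no-shows, the hypothetical helper below estimates how many places to offer so that a target number of people actually attend. The function name `places_to_offer` and the 20% no-show rate are illustrative assumptions, not figures measured at our workshops.

```python
import math


def places_to_offer(target_attendees: int, no_show_rate: float) -> int:
    """Return how many places to offer so that, on average,
    `target_attendees` people turn up.

    `no_show_rate` is the assumed fraction of accepted applicants
    who will not attend (a hypothetical value, to be tuned locally).
    """
    if not 0 <= no_show_rate < 1:
        raise ValueError("no_show_rate must be in [0, 1)")
    # Round before taking the ceiling to avoid floating-point noise
    # pushing an exact result up by one.
    return math.ceil(round(target_attendees / (1 - no_show_rate), 9))


# Aim for 30 online participants, assuming 20% of accepted people do not show up.
offers = places_to_offer(30, 0.20)
print(offers)
```

The same function can be re-run with the no-show rate observed at a previous event once one is available.
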
# Platform(s)
## UoA workshops
:::: {.columns}
::: {.column}
- online-only event: [Zoom.us](https://zoom.us/)
- BYOD (*bring your own device*)
- major deviation from [The Carpentries](https://carpentries.org/index.html): No local Python installs
- [Google Colab](https://colab.research.google.com/) ![Colab](./Google_Colaboratory_SVG_Logo.svg){ width=3% }, a browser-based Jupyter Notebook using Google infrastructure (a virtual machine; a GPU can be added)
:::
::: {.column}
![*Google Colab in a Browser*](./UpskillingResearchersInML_Assets/colab2.jpg){fig-align="center"}
:::
::::
## NeSI workshops
:::: {.columns}
::: {.column}
<!-- TODO use a better capture on a larger screen -->
![](./UpskillingResearchersInML_Assets/penguins.png)
*JupyterLab session running on Jupyter-on-NeSI*
:::
::: {.column}
- Online and in person
  - 2 delivered in person (1 had wifi issues 😓)
  - 5 delivered online
- Use Jupyter-on-NeSI
  - JupyterHub platform
  - Requires a NeSI account
  - ML101: 2 cores & 4 GB of RAM
  - ML102: 4 cores & 8 GB of RAM
- Use Slurm jobs for (a little bit of) GPU training
- Tip: make sure the Platform team does not schedule upgrades that day 😬...
:::
::::
## Recommendations {.center}
- Make it online
- Leverage online computational platforms (Google Colab, JupyterHub, Open OnDemand...)
- No need for a GPU to start (small ones are available on Google Colab)
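
To check what a given platform offers before a workshop, a short notebook cell like the one below can help. It is only a sketch: it assumes TensorFlow (the library used in ML 102) may or may not be installed, and the helper name `available_gpus` is ours.

```python
def available_gpus() -> list:
    """List the GPUs TensorFlow can see, or return an empty list if there
    are none (or if TensorFlow is not installed in this environment)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = available_gpus()
if gpus:
    print(f"{len(gpus)} GPU(s) available: {gpus}")
else:
    print("No GPU found, training will run on the CPU.")
```
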
# Schedule
<!-- so this is purely about the hours and the split of content over these hours -->
## UoA workshops
:::: {.columns}
::: {.column}
**Run #1**
| Time Budget | Activity |
|-----------------------|--------------------------|
| two afternoons (8h) | Python |
| one afternoon (4h) | ML |
| two afternoons (8h) | DL |
**Run #2**
| Time Budget | Activity |
|-----------------------|--------------------------|
| two afternoons (8h) | ML |
| two afternoons (8h) | DL |
:::
::: {.column}
- all workshops took place in the **same week**
- no *mixing and matching*, signing up = coming to **all sessions**
- Major adjustment for Run #2: **Python** as a **prerequisite**, not part of the series
:::
::::
## NeSI workshops
:::: {.columns}
::: {.column}
- ML 101
  - 6 hours with 3 breaks ☕
  - at first in one day
  - now split over 2 mornings
- ML 102
  - 3 hours with 2 breaks 🍵
- Independent workshops
  - But organised "close" to each other
:::
::: {.column}
![](./UpskillingResearchersInML_Assets/ml101_runsheet.png)
*ML101 runsheet, used to keep track of time*
:::
::::
## Recommendations {.center}
- Split/shorter sessions (bearing in mind the scheduling challenges for researchers)
- Stick to scheduled breaks
- Follow best practices for online audiences:
  - get a Zoom DJ, some helpers and multiple co-hosts
  - keep a Q&A document
  - prepare your intro and outro
  - ask attendees to join the call from the same computer they run the code on
  - [Webinar: Tips & tricks for hosting a successful online event](https://youtu.be/XTeCHUZ2H_w?si=OD4GDRNSV460zK7O)
# Material
## UoA workshops
| Lesson Title                                       | Status                                                                                       | Run #1   | Run #2   |
|----------------------------------------------------|----------------------------------------------------------------------------------------------|----------|----------|
| Programming with Python                            | [Released](https://swcarpentry.github.io/python-novice-inflammation/)                        | Mon, Tue | -        |
| Introduction to Machine Learning with Scikit Learn | [Alpha](https://mike-ivs.github.io/machine-learning-novice-sklearn/02-regression/index.html) | Wed      | Mon, Tue |
| Introduction to Deep Learning                      | [Beta](https://carpentries-incubator.github.io/deep-learning-intro/aio.html)                 | Thu, Fri | Wed, Thu |
::: {layout="[[-1], [1], [-1]]"}
![](UpskillingResearchersInML_Assets/Carps.jpg){fig-align="center"}
:::
## NeSI workshops
:::: {.columns}
::: {.column}
![](./UpskillingResearchersInML_Assets/jake.png)
*My rehearsal and source of inspiration 💓*
:::
::: {.column}
**NeSI workshops**
- ML 101 -- [github.com/nesi/sklearn_tutorial](https://github.com/nesi/sklearn_tutorial)
  - [Scikit-learn Tutorial](https://github.com/jakevdp/sklearn_tutorial) by [Jake VanderPlas](https://github.com/jakevdp)
  - online recordings exist <br> (very good to rehearse!)
  - Jupyter Notebook based
  - very few changes (updated package versions)
- ML 102 -- [github.com/nesi/ml102_workshop](https://github.com/nesi/ml102_workshop)
  - TensorFlow tutorials
  - Jupyter Notebook based
  - added an introduction
  - added a section on submitting a Slurm job
:::
::::
## Recommendations {.center}
CeR and NeSI independently decided to base the workshops on **existing material**.
- Don't reinvent the wheel
- Reuse/adapt content
<!-- should we mention notebooks? -->
# Content
<!-- this differs from the previous perspective as it is more about the abstract topics covered
IS THIS PREVIOUS STATEMENT WHAT YOU HAD IN MIND? yes :) -->
## {.center}
:::: {.columns}
::: {.column width="60%"}
**Machine Learning**
- *Data preparation*
- Supervised vs. unsupervised learning
  - regression
  - classification
  - clustering
  - dimensionality reduction
- Ensemble models (random forests)
- Validation
  - train/test/validation split
  - cross-validation
  - validation and learning curves
:::
::: {.column width="40%"}
**Deep Learning**
- Model architectures
  - Multi-layer perceptron
  - Convolutional neural network <br> (CNN | computer vision)
- Model training
  - optimisers and mini-batch
  - overfitting and early stopping
  - data augmentation
  - dropout, batch normalisation, ...
- Transfer learning
<!-- we should include the DeepLearning topic here, too? Or was the point for you to show: This was covered by both of us?
we should add this one, good point
-->
:::
::::
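
To illustrate the cross-validation item in the Machine Learning column, here is a minimal pure-Python sketch of how k-fold splitting partitions sample indices. The function name `k_fold_indices` is hypothetical, and the sketch does not shuffle the data; in practice a library such as scikit-learn provides this functionality.

```python
def k_fold_indices(n_samples: int, n_folds: int):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are split into `n_folds` contiguous folds of near-equal size;
    each fold is used once as the test set while the others form the training set.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % n_folds) folds get one extra sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample ends up in exactly one test fold, so each data point contributes once to the validation score.
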
## Recommendations {.center}
- **Random forest** is a **good** first non-linear model to learn
  - intuitive to understand how it works
  - good performance on tabular data
  - doesn't require too much care in terms of data preparation
- Resist the temptation to use an **MLP (multi-layer perceptron)** for an ML intro
  - it requires more concepts (architecture, training, data preprocessing, ...)
  - keep it for the Deep Learning introduction
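
A minimal sketch of why a random forest makes a convenient first non-linear model: with scikit-learn it can be trained on a small tabular dataset without any feature scaling. The iris dataset, the split size and the hyperparameters below are illustrative choices, not taken from the workshop material.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small tabular dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate generalisation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No scaling or other preprocessing: the trees split on raw feature values.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

An MLP on the same data would usually call for feature scaling and several training choices first, which is the extra overhead the recommendation refers to.
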
# Summary
- Use Expression-of-Interest for registrations
- Make it online
- Use an online compute platform
  - (*Google Colab*, *JupyterHub* or *Open OnDemand* at your institution)
- You don't need fancy GPUs, even though you can have some
- Re-use and adapt existing material
- Use shorter sessions and stick to breaks
- Keep MLP for the Deep Learning section, start with Random Forests
# How to get in touch {.center}
:::: {.columns}
::: {.column width="60%"}
![](./UpskillingResearchersInML_Assets/kereru-taieri-mouth-1920-2.webp)
*Kererū / New Zealand pigeon, not use(ful) for mails* 😅
*[Image Credit: Department of Conservation](https://www.doc.govt.nz/nature/native-animals/birds/birds-a-z/nz-pigeon-kereru/)*
:::
:::{.column width="40%"}
**Maxime Rio** <br> 📨 [[email protected]](mailto:[email protected])
**Jens Brinkmann** <br> 📨 [[email protected]](mailto:[email protected])
:::
::::
<!-- maybe a QR code or two? -->