diff --git a/2024/pics/PedroValero-Lara.jpeg b/2024/pics/PedroValero-Lara.jpeg
deleted file mode 100644
index 2c2ae2c..0000000
Binary files a/2024/pics/PedroValero-Lara.jpeg and /dev/null differ
diff --git a/2024/pics/PhilippeTillet.jpeg b/2024/pics/PhilippeTillet.jpeg
new file mode 100644
index 0000000..ece293b
Binary files /dev/null and b/2024/pics/PhilippeTillet.jpeg differ
diff --git a/2024/program.html b/2024/program.html
index f5fc5dd..d7516ad 100644
--- a/2024/program.html
+++ b/2024/program.html
@@ -3,14 +3,77 @@
title: AsHES Workshop
---
-TBD
-
+
+
Opening Remarks
+
10:30 am - 10:40 am
-
+
Session 1: High-Performance Computing
+
10:40 am - 12:00 pm
+
Session Chair: Shintaro Iwasaki, Meta
+
+ -
+ 10:40 am - 11:00 am
+ Performance Versus Maintainability: A Case Study of SCREAM on Frontier
+ James White
+
+ -
+ 11:00 am - 11:30 am
+ ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels
+ Ali Tehranijamsaz, Alok Mishra, Akash Dutta, Abid M. Malik, Barbara Chapman, and Ali Jannesari
+
+ -
+ 11:30 am - 12:00 pm
+ Alternative Quadrant Representations with Morton Index and AVX2 Vectorization for AMR Algorithms within the p4est Software Library
+ Mikhail Kirilin and Carsten Burstedde
+
+
+
+
Lunch Break
+
12:00 pm - 1:00 pm
+
+ -
+ Lunch will not be provided by the conference.
+
+
+
+
Keynote
+
1:00 pm - 2:00 pm
+
Block-based GPU Programming with Triton
+
Philippe Tillet, OpenAI
+
Abstract:
+
+ Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an impact on expressivity. Some such models have recently emerged (e.g., Exo, MLIR Affine), but these are rarely applicable beyond dense tensor algebra, making them a poor fit for workloads that require, for example, custom data structures. We'll describe the design and implementation of Triton, a mid-level programming language that uses block-based abstractions to simplify kernel development and fusion for researchers without any GPU programming expertise.
+
+
Bio:
+ Philippe Tillet began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020 with a dissertation on compilers for blocked algorithms on GPUs. He joined OpenAI full time in 2020 to pursue his work on the Triton compiler, a project he started in 2018 after being frustrated by the difficulty of writing auto-tuners for matrix multiplications in CUDA. Since then, he has grown the Triton language into a reference for block-based programming models and wrote all the training kernels used by GPT-4.
+
+
+
Session 2: Accelerating AI/ML Workloads
+
2:00 pm - 3:10 pm
+
Session Chair: Carl Pearson, Sandia National Laboratories
+
+ -
+ 2:00 pm - 2:30 pm
+ Avoiding Training in the Platform-Aware Optimization Process for Faster DNN Latency Reduction
+ Raúl Marichal, Ernesto Dufrechou, and Pablo Ezzatti
+
+ -
+ 2:30 pm - 2:50 pm
+ A Comparative Study on Simulation Frameworks for AI Accelerator Evaluation
+ Christoffer Åleskog, Håkan Grahn, and Anton Borg
+
+ -
+ 2:50 pm - 3:10 pm
+ Extending the SYCL Joint Matrix for Binarized Neural Networks
+ Zheming Jin
+
+
+
+
Closing Remarks
+
3:10 pm - 3:20 pm
+
+
Presentation
+ All presentations will be in-person.
+ Presenters are expected to target 25 minutes (full papers) or 15 minutes (short papers) for their talks, plus 5 minutes for questions.
+