WisIO project added.
izzet committed Nov 14, 2023
1 parent 0c21d30 commit a4087cf
Showing 8 changed files with 104 additions and 1 deletion.
15 changes: 15 additions & 0 deletions src/data/projects.ts
@@ -114,6 +114,21 @@ const projects: Project[] = [
status: "active",
type: "funded",
},
{
id: "wisio",
name: "WisIO",
title:
"WisIO: Automated I/O Bottleneck Detection via Multi-Perspective Views for HPC Workloads",
shortDescription:
"Explore WisIO, an automated I/O bottleneck detection tool with multi-perspective views for I/O trace data analysis. Overcoming large-scale I/O challenges, WisIO utilizes distributed computing and an extensible rule engine for tailored solutions. Elevate your I/O analysis in HPC environments with WisIO.",
link: "/research/projects/wisio",
isFeatured: false,
isOpenSource: false,
isOurs: true,
researchStatus: "r&d",
status: "active",
type: "student",
},
];

export default projects;
86 changes: 86 additions & 0 deletions src/pages/research/projects/wisio.mdx
@@ -0,0 +1,86 @@
---
title: "WisIO: Automated I/O Bottleneck Detection via Multi-Perspective Views for HPC Workloads"
---

import ProjectBadges from "@site/src/components/projects/ProjectBadges";

<p>
<img
src={require("@site/static/img/projects/wisio/logo.png").default}
width="200"
/>
</p>

# WisIO: Automated I/O Bottleneck Detection via Multi-Perspective Views for HPC Workloads

<ProjectBadges projectId="wisio" />

Highly data-dependent HPC workloads put pressure on storage systems. As storage diversity grows in
modern HPC systems, the added complexity challenges users and leads to I/O bottlenecks. Earlier
solutions combined I/O characteristics with expert insight, while recent approaches use performance
analysis tools. However, the problem is multifaceted and spans numerous metrics, so it remains
difficult to resolve manually, even for experts.

To this end, we introduce **WisIO**, an automated I/O bottleneck detection tool for HPC workloads.
The key contributions of this work are:

- **Design of WisIO** for automating I/O bottleneck detection
- **A novel approach** to reducing the search space within I/O traces
- **A global and extensible rule engine** for bottleneck detection rules

## Motivation

![](/img/projects/wisio/evalmotivation.png)

Out-of-core I/O analysis queries in a memory-limited environment necessitate query and dataset
optimization for distributed analysis.

## Methodology

* Execute HPC workloads and capture I/O traces via Darshan or Recorder
* Convert and optimize I/O traces to Parquet format
* Generate a multi-perspective view with I/O characteristics
* Compute bottleneck severity scores and produce user-friendly text-based diagnoses through a rule-based engine
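
The last step can be pictured with a minimal sketch. Everything in it is hypothetical and for illustration only: the `ViewRecord` fields, the `Rule` shape, the 1 MiB threshold, and the severity formula are assumptions, not WisIO's actual rule engine or scoring.

```typescript
// Hypothetical record from one perspective (process, time range, or directory).
type ViewRecord = { key: string; ioTimeSec: number; opCount: number; bytes: number };

// A diagnosis pairs a severity in [0, 1] with a human-readable message.
type Diagnosis = { rule: string; key: string; severity: number; message: string };

// A rule inspects one record and returns a diagnosis, or null if it does not apply.
type Rule = {
  name: string;
  evaluate: (record: ViewRecord, totalIoTimeSec: number) => Diagnosis | null;
};

// Assumed threshold: accesses averaging under 1 MiB count as "small".
const SMALL_ACCESS_BYTES = 1024 * 1024;

const smallAccessRule: Rule = {
  name: "small-access",
  evaluate: (record, totalIoTimeSec) => {
    if (record.opCount === 0 || totalIoTimeSec <= 0) return null;
    const avgBytes = record.bytes / record.opCount;
    if (avgBytes >= SMALL_ACCESS_BYTES) return null;
    // Assumed scoring: the share of total I/O time spent by this record.
    const severity = Math.min(1, record.ioTimeSec / totalIoTimeSec);
    return {
      rule: "small-access",
      key: record.key,
      severity,
      message:
        `${record.key}: average access size is ${Math.round(avgBytes)} bytes ` +
        `and accounts for ${(severity * 100).toFixed(1)}% of I/O time`,
    };
  },
};

// Apply every rule to every record and list the most severe diagnoses first.
function diagnose(records: ViewRecord[], rules: Rule[]): Diagnosis[] {
  let totalIoTimeSec = 0;
  for (const record of records) totalIoTimeSec += record.ioTimeSec;
  const diagnoses: Diagnosis[] = [];
  for (const record of records) {
    for (const rule of rules) {
      const diagnosis = rule.evaluate(record, totalIoTimeSec);
      if (diagnosis !== null) diagnoses.push(diagnosis);
    }
  }
  return diagnoses.sort((a, b) => b.severity - a.severity);
}
```

Because rules are plain values passed to `diagnose`, adding a custom rule in this sketch means appending one more object to the array.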

## Use Case: Montage (Workflow with Complex Dependencies)

![](/img/projects/wisio/evalmontage.png)

* For efficiency, process IDs are hashed together with their node addresses and hostnames
* This allows us to analyze groups of process IDs effectively
* 1280 ranks perform 3.2M read and 1.6M write operations
* Average bandwidth is low (~3 MB/s)
* Multi-perspective analysis takes **42 seconds**
* Process-based analysis produces diagnoses with specific nodes or apps
* **14 diagnoses** produced with severity scores between 52% and 61.4%
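
One way to picture the process ID hashing is the sketch below. It is only an illustration: the 32-bit FNV-1a hash, the `hostname:pid` key format, and the `processKey` and `groupByNode` helpers are assumptions, not necessarily what WisIO does internally.

```typescript
// 32-bit FNV-1a over the UTF-16 code units of a string (matches the
// byte-wise definition for ASCII input such as hostnames and PIDs).
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 0x01000193 using shifts, modulo 2^32.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical trace record: which process ran on which node.
type ProcessRecord = { hostname: string; pid: number };

// One integer key per (node, process) pair, so PIDs reused on
// different nodes do not collapse into a single process.
function processKey(hostname: string, pid: number): number {
  return fnv1a32(`${hostname}:${pid}`);
}

// Group PIDs under the hash of their node so a whole node can be analyzed at once.
function groupByNode(records: ProcessRecord[]): Record<number, number[]> {
  const groups: Record<number, number[]> = {};
  for (const record of records) {
    const nodeKey = fnv1a32(record.hostname);
    if (groups[nodeKey] === undefined) groups[nodeKey] = [];
    groups[nodeKey].push(record.pid);
  }
  return groups;
}
```

Integer keys of this kind are cheaper to compare and group than hostname strings, which is the efficiency argument made in the list above.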

## Use Case: CM1 (Simulation with Separate I/O Phases)

![](/img/projects/wisio/evalcm1.png)

* Timestamps are converted into integer microseconds, as indexing in Dask works faster with non-decimal values
* For precision, the midpoint of each pair of timestamps is used in the analysis instead of their range
* 94.6% of I/O time is spent during the first 20 seconds
* Rank 0 performs 100% of the writes on small files
* Multi-perspective analysis takes **22 seconds**
* Time-based analysis produces diagnoses with specific time ranges
* Above is an actual text-based diagnosis for CM1
* **20 diagnoses** produced with severity scores between 55% and 85%
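
The timestamp handling described above can be sketched as follows. This assumes timestamps arrive as fractional seconds and that operations are counted in fixed-width time bins. The helper names and the binning step are illustrative assumptions, not WisIO's code.

```typescript
// Convert fractional seconds to integer microseconds.
function toMicros(seconds: number): number {
  return Math.round(seconds * 1000000);
}

// Represent an operation by the midpoint of its start and end timestamps.
function midpointMicros(startMicros: number, endMicros: number): number {
  return Math.floor((startMicros + endMicros) / 2);
}

// Index of the fixed-width time bin that a timestamp falls into.
function timeBin(micros: number, binWidthMicros: number): number {
  return Math.floor(micros / binWidthMicros);
}

// Hypothetical operation record with start and end times in seconds.
type TimedOp = { startSec: number; endSec: number };

// Count operations per time bin, keyed by integer bin index.
function countOpsPerBin(ops: TimedOp[], binWidthSec: number): Record<number, number> {
  const binWidthMicros = toMicros(binWidthSec);
  const counts: Record<number, number> = {};
  for (const op of ops) {
    const mid = midpointMicros(toMicros(op.startSec), toMicros(op.endSec));
    const bin = timeBin(mid, binWidthMicros);
    counts[bin] = (counts[bin] === undefined ? 0 : counts[bin]) + 1;
  }
  return counts;
}
```

Using a single midpoint places each operation in exactly one bin, which keeps per-range totals from double counting operations that straddle a bin boundary.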

## Use Case: 1000 Genomes (Data-Intensive Workflow)

![](/img/projects/wisio/evalgenome.png)

* The filenames are hashed with respect to folder hierarchy
* This allows us to analyze file directories effectively
* There are 21M files, all of which are accessed on a file-per-process basis
* Multi-perspective analysis takes **12 minutes**
* File-based analysis produces diagnoses with specific folder hierarchy
* **30 diagnoses** produced with severity scores between 54% and 67.1%
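
Grouping files by folder hierarchy can be sketched as below. To keep the example short, it uses a simple dictionary encoding (one integer ID per directory prefix) in place of a hash function. The helper names and the example paths are hypothetical, and this is not WisIO's implementation.

```typescript
// List every ancestor directory of a file path, from shallowest to deepest.
function directoryLevels(path: string): string[] {
  const parts = path.split("/").filter((part) => part.length > 0);
  const dirs = parts.slice(0, -1); // drop the file name
  const root = path.charAt(0) === "/" ? "/" : "";
  const levels: string[] = [];
  let current = "";
  for (const dir of dirs) {
    current = current === "" ? root + dir : current + "/" + dir;
    levels.push(current);
  }
  return levels;
}

// Assign a compact integer ID to each distinct directory prefix.
function buildDirectoryIds(paths: string[]): Record<string, number> {
  const ids: Record<string, number> = {};
  let nextId = 0;
  for (const path of paths) {
    for (const level of directoryLevels(path)) {
      if (ids[level] === undefined) ids[level] = nextId++;
    }
  }
  return ids;
}

// Count files under each directory at a given depth (1 = top-level directory).
function countFilesAtDepth(paths: string[], depth: number): Record<number, number> {
  const ids = buildDirectoryIds(paths);
  const counts: Record<number, number> = {};
  for (const path of paths) {
    const levels = directoryLevels(path);
    if (levels.length < depth) continue; // file sits above the requested depth
    const id = ids[levels[depth - 1]];
    counts[id] = (counts[id] === undefined ? 0 : counts[id]) + 1;
  }
  return counts;
}
```

With millions of files accessed file-per-process, aggregating on directory IDs instead of individual file names keeps the number of groups small enough to analyze.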

## Key Takeaways

* WisIO automates I/O bottleneck detection for HPC workloads
* WisIO's novel search space reduction approach enables analysis of large-scale I/O traces
* WisIO's multi-perspective views can detect I/O bottlenecks that might be otherwise overlooked
* WisIO's extensible rule engine allows users to define custom rules
4 changes: 3 additions & 1 deletion src/types.ts
@@ -32,7 +32,9 @@ export type ProjectId =
| "hermes"
| "iris"
| "dtio"
| "labios";
| "labios"
| "viper"
| "wisio";

export type Project = {
id: ProjectId;
Binary file added static/img/projects/wisio/evalcm1.png
Binary file added static/img/projects/wisio/evalgenome.png
Binary file added static/img/projects/wisio/evalmontage.png
Binary file added static/img/projects/wisio/evalmotivation.png
Binary file added static/img/projects/wisio/logo.png
