Update data policy #704

Closed · wants to merge 2 commits
2 changes: 1 addition & 1 deletion ALCFBeginnersGuide
2 changes: 1 addition & 1 deletion GettingStarted
Submodule GettingStarted updated 75 files
+1 −3 Applications/Polaris/LAMMPS/Makefile.polaris_gnu
+8 −5 Applications/Polaris/LAMMPS/Makefile.polaris_gnu_kokkos
+2 −3 Applications/Polaris/LAMMPS/Makefile.polaris_nvhpc
+2 −5 Applications/Polaris/LAMMPS/Makefile.polaris_nvhpc_kokkos
+7 −13 Applications/Polaris/LAMMPS/README.md
+0 −7 Applications/Polaris/LAMMPS/setup_lammps_gnu.sh
+0 −5 Applications/Polaris/LAMMPS/setup_lammps_nvhpc.sh
+1 −3 Applications/Polaris/LAMMPS/submit_gpu.sh
+1 −3 Applications/Polaris/LAMMPS/submit_kokkos.sh
+200 −200 DataScience/DeepHyper/dh_projects/breast_cancer/exp/results.csv
+3 −3 DataScience/DeepHyper/dh_projects/polynome2_hps/exp/results.csv
+4 −4 DataScience/DeepHyper/dh_projects/polynome2_nas/exp/results.csv
+1 −1 Examples/Aurora/affinity_gpu/lz/submit_1pergpu.sh
+1 −1 Examples/Aurora/affinity_gpu/opencl/submit_1pergpu.sh
+1 −1 Examples/Aurora/affinity_gpu/openmp/submit_1pergpu.sh
+1 −1 Examples/Aurora/affinity_gpu/sycl/Makefile
+67 −31 Examples/Aurora/affinity_gpu/sycl/main.cpp
+1 −1 Examples/Aurora/affinity_gpu/sycl/submit_1pergpu.sh
+0 −45 Examples/Crux/affinity/Makefile
+0 −56 Examples/Crux/affinity/README.md
+0 −61 Examples/Crux/affinity/main.cpp
+0 −21 Examples/Crux/affinity/submit.sh
+0 −45 Examples/Crux/affinity_omp/Makefile
+0 −190 Examples/Crux/affinity_omp/README.md
+0 −66 Examples/Crux/affinity_omp/main.cpp
+0 −23 Examples/Crux/affinity_omp/submit.sh
+0 −28 Examples/Crux/affinity_omp/submit_ht.sh
+0 −45 Examples/Crux/ensemble/Makefile
+0 −163 Examples/Crux/ensemble/README.md
+0 −66 Examples/Crux/ensemble/main.cpp
+0 −36 Examples/Crux/ensemble/submit_8x16.sh
+0 −34 Examples/Crux/ensemble/submit_8x16_omp.sh
+0 −39 Examples/Crux/ensemble/submit_multinode.sh
+0 −50 Examples/Crux/ensemble/submit_multinode_8x16.sh
+0 −4 Examples/Crux/python/create_python_env.sh
+0 −13 Examples/Crux/python/hello_world.py
+0 −26 Examples/Crux/python/submit.sh
+1 −1 Examples/Polaris/ensemble/Makefile.nvhpc
+0 −11 Examples/Polaris/ensemble/main.cpp
+0 −43 Examples/Polaris/ensemble/submit_4x8_omp.sh
+0 −12 Examples/Polaris/python/hello_world.py
+0 −24 Examples/Polaris/python/submit.sh
+0 −57 Examples/Sophia/affinity/Makefile.mpi_stub
+0 −63 Examples/Sophia/affinity/README.md
+0 −66 Examples/Sophia/affinity/main.cpp
+0 −23 Examples/Sophia/affinity/submit_1gpu.sh
+0 −23 Examples/Sophia/affinity/submit_1node.sh
+0 −59 Examples/Sophia/affinity_gpu/Makefile.mpi_stub
+0 −85 Examples/Sophia/affinity_gpu/README.md
+0 −100 Examples/Sophia/affinity_gpu/main.cpp
+0 −42 Examples/Sophia/affinity_gpu/offload.cpp
+0 −23 Examples/Sophia/affinity_gpu/submit_1gpu.sh
+0 −23 Examples/Sophia/affinity_gpu/submit_1node.sh
+0 −18 Examples/Sophia/mpi_stub/mpi.h
+0 −67 HelperScripts/Crux/cpu_binding.py
+2 −2 ProgrammingModels/Aurora/OpenMP/vecadd_mpi/main.cpp
+0 −57 ProgrammingModels/Polaris/CUDA/vecadd_mpi/Makefile
+0 −66 ProgrammingModels/Polaris/CUDA/vecadd_mpi/Makefile.gnu
+0 −56 ProgrammingModels/Polaris/CUDA/vecadd_mpi/Makefile.oneapi
+0 −107 ProgrammingModels/Polaris/CUDA/vecadd_mpi/README.md
+0 −27 ProgrammingModels/Polaris/CUDA/vecadd_mpi/main.cpp
+0 −162 ProgrammingModels/Polaris/CUDA/vecadd_mpi/offload.cpp
+0 −19 ProgrammingModels/Polaris/CUDA/vecadd_mpi/submit.sh
+0 −26 ProgrammingModels/Polaris/Julia/test_mpi/main.jl
+0 −15 ProgrammingModels/Polaris/Julia/test_mpi/setup_env.sh
+0 −30 ProgrammingModels/Polaris/Julia/test_mpi/submit.sh
+2 −2 ProgrammingModels/Polaris/OpenACC/vecadd_mpi/main.cpp
+2 −2 ProgrammingModels/Polaris/OpenMP/vecadd_mpi/main.cpp
+9 −13 ProgrammingModels/Polaris/SYCL/vecadd_mpi/README.md
+3 −4 ProgrammingModels/Polaris/SYCL/vecadd_mpi/main.cpp
+1 −1 ProgrammingModels/Polaris/SYCL/vecadd_mpi/submit.sh
+1 −1 ProgrammingModels/ThetaGPU/CUDA/vecadd_fortran_c_mpi/main.F
+2 −2 ProgrammingModels/ThetaGPU/CUDA/vecadd_mpi/main.cpp
+2 −2 ProgrammingModels/ThetaGPU/OpenACC/vecadd_mpi/main.cpp
+2 −2 ProgrammingModels/ThetaGPU/OpenMP/vecadd_mpi/main.cpp
32 changes: 14 additions & 18 deletions docs/policies/data-and-software-policies/data-policy.md
@@ -41,21 +41,17 @@ All principal investigators using ALCF resources and ALCF staff members working

Data stored for any length of time on ALCF resources should only be data directly related to work done on any of the ALCF leadership computing systems. Specific policies apply to the three types of data storage systems maintained at ALCF. Read these policies carefully and plan accordingly in terms of space, usage, and data protection.

-### Home File System Space
-**agile-home**
+### Home File System Space (agile-home, gecko-home)

The home file system (/home) is intended to hold your executable files, configuration files, etc. It is NOT meant to hold the output from your application runs (use the data/parallel file system for that purpose). The home file system space is generally moderate in size and is the best protected. Because of its size, backups are practical to accomplish. The system performs tape backups, enabling the recovery of files more than seven days old or recovery from a catastrophic disk failure. Users should email [[email protected]](mailto:[email protected]) if they need assistance. The table below indicates the capabilities and characteristics of each file system.

**AI Testbed home**

`/home/` shared across the ALCF AI testbed systems, including the AI testbed's login and compute nodes, is different from mira-home. Default user quota on the AI testbed's home is 1 TB storage and 1,000,000 files. This space is backed up.

-### Team Project or Campaign File System
-**Eagle**
+### Team Project or Campaign File System (Eagle, Flare)

-The team project/campaign file system is intended primarily for results output from your computational runs on the ALCF computing systems. This space is accessible to the team members of your project that have an ALCF account. Default storage quota is 1 TB. Consider this space intermediate-term storage. Once any active production and/or analysis is complete and you no longer need regular access to the data, archive it within the ALCF (explained below) or transfer it to your home institution or move it to Eagle to share it with the broader community (explained below).
+The team project/campaign file system is intended primarily for results output from your computational runs on the ALCF computing systems. This space is accessible to the team members of your project that have an ALCF account. Default storage quota is 1 TB and the default period is 1 year. Consider this space intermediate-term storage. Once any active production and/or analysis is complete and you no longer need regular access to the data, archive it within the ALCF (explained below) or transfer it to your home institution or move it to Eagle to share it with the broader community (explained below).

This space has redundancy in the servers and storage but is so large that replication, snapshots, and backups are not practical. Eagle is a Lustre global parallel file system. All new projects will be given storage allocations on Eagle. More information on Lustre File Striping Basics: Lustre File Striping Basics.
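Since striping can have a large effect on parallel I/O performance, it helps to check and adjust the layout of directories that receive large output files. The sketch below uses the standard Lustre `lfs` client commands; the `/lus/eagle/projects/MyProject` path and the stripe count/size values are illustrative assumptions, not ALCF recommendations.

```bash
# Inspect the current stripe layout of a project directory (or a single file).
lfs getstripe /lus/eagle/projects/MyProject/output

# Have new files created in this directory striped across 8 OSTs with a
# 4 MiB stripe size. Values here are illustrative; see the striping guide
# referenced above for guidance on choosing them.
lfs setstripe -c 8 -S 4m /lus/eagle/projects/MyProject/output
```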

@@ -65,14 +61,14 @@ This space has redundancy in the servers and storage but is so large that replic

The team project/campaign file system /projects mounted on AI Testbed's login and compute nodes is intended to facilitate project collaboration and is accessible to the team members of your project that have an ALCF account. Default group storage quota is 2 TB and 2,000,000 files. Please note that this space isn't backed up. Our policy is that data will be purged from disk 6 months after project completion.

-### Shared Community Project or Campaign File System (Eagle)
+### Shared Community Project or Campaign File System (Eagle, Flare)

-The file system Eagle, a Lustre global parallel file system, has community sharing abilities and is useful for sharing the project/campaign data with the broader research community via Globus. This space does not have redundancy in the servers or storage and is so large that replication, snapshots, and backups are not practical. The table below indicates the capabilities and characteristics of each file system. Default storage quota on Eagle is 1 TB and the default period is 1 year. More information on Lustre file striping can be found in this [presentation](https://www.alcf.anl.gov/sites/default/files/2021-05/IO-optimization_mcpheeters.pdf).
+These Lustre global parallel file systems have community sharing abilities and are useful for sharing the project/campaign data with the broader research community via Globus. These spaces do not have redundancy in the servers or storage and are so large that replication, snapshots, and backups are not practical. The table below indicates the capabilities and characteristics of each file system. Default storage quota is 1 TB and the default period is 2 years. More information on Lustre file striping can be found in this [presentation](https://www.alcf.anl.gov/sites/default/files/2021-05/IO-optimization_mcpheeters.pdf).
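One common way to share community project data with external collaborators is a Globus transfer from the project space to another collection. The sketch below uses the Globus CLI with placeholder collection UUIDs and paths; the actual ALCF collection IDs should be taken from the ALCF documentation or the Globus web app.

```bash
# Placeholder collection UUIDs -- replace with the real source and
# destination collection IDs from the Globus web app.
SRC="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
DST="11111111-2222-3333-4444-555555555555"

# Authenticate once, then transfer a results directory recursively.
globus login
globus transfer --recursive --label "share campaign results" \
    "${SRC}:/MyProject/results" \
    "${DST}:/incoming/results"
```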

-**Eagle Data Pullback Policy:**
+**Data Pullback Policy:**
Projects that do not use a minimum of 50% of their allocated space after 6 months will be subject to a quota limit reduction.

-**Eagle Access Termination Policy:**
+**Access Termination Policy:**
Project endpoints that have exhibited no activity* for a period of 6 months will be disabled and the storage space will be reclaimed. Notification will be sent to the PI and project members 30 days prior to and the day of the action.

Activity is defined as, but not limited to:
@@ -90,7 +86,7 @@ The archive space is intended for offline storage of results you wish to retain

### Disk Capacity and Retention Policies

-| ---- | /home | lus/eagle/projects or /eagle or /grand |
+| ---- | /home | lus/eagle/projects, /eagle, /grand, lus/flare/projects or /flare |
|------------------------------------------------|-------------|----------------------------------------------------------------------------------------------------------|
| Default Quota ^1^ | 50 GB | 1 TB / 1 million files |
| Quota Enforcement ^2^ | hard/soft | hard/soft |
@@ -99,20 +95,20 @@ The archive space is intended for offline storage of results you wish to retain
| File Server Metadata Redundancy | yes | yes |
| File Server Metadata Replication ^4^ | yes | yes |
| File Server Data Replication ^5^ | yes | no |
-| Data Purged from Disk | n/a | After 6 months of inactivity (see Eagle Access termination policy listed in the Eagle section above) ^8^ |
+| Data Purged from Disk | n/a | After 6 months of inactivity (see Access termination policy listed in the section above) ^8^ |
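To compare actual usage against the defaults in the table above, the Lustre `lfs quota` command reports per-user and per-group usage. The mount points and group name below are assumptions; substitute the paths and group that apply to your project.

```bash
# Per-user usage and limits on the Lustre project file systems.
lfs quota -u "$USER" /lus/eagle
lfs quota -u "$USER" /lus/flare

# Group/project usage, assuming the Unix group matches the project name.
lfs quota -g MyProjectGroup /lus/eagle
```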

### Tape Capacity and Retention Policies

-| ---- | /home | lus/eagle/projects or /eagle or /grand |
-|------------------------------------------------|-------|----------------------------------------|
-| Automatic Backup to Tape? ^6^ | yes | no |
-| Archived to Tape Before Deleted from Disk? ^8^ | yes | no |
+| ---- | /home | lus/eagle/projects, /eagle, /grand, lus/flare/projects or /flare |
+|------------------------------------------------|-------|-------------------------------------------------------------------|
+| Automatic Backup to Tape? ^6^ | yes | no |
+| Archived to Tape Before Deleted from Disk? ^8^ | yes | no |

1. While quotas are subject to negotiation on a case-by-case basis, disk space is a finite resource and projects must exercise good data management practices for their own sake and the sake of other users of the facility. With Lustre, it has become necessary to enforce file quotas as well, which are also negotiable.
2. “Hard quota enforcement” means a job will fail when writing output if you exceed the hard quota limit. "Soft quota enforcement" means you may exceed the soft quota limit (but never the higher hard quota value) for up to seven days. If you do not drop back below the soft quota limit within seven days, writes will begin to fail.
3. Hard drives are in redundancy groups of 10 disks (8 data + 2 parity). In other words, three out of 10 drives would have to fail before data loss occurred.
4. Metadata (i.e., information listing which blocks are part of which files) is written twice to two different storage arrays. Thus, even if an entire array were lost, the metadata would be preserved.
5. Refers to the fact that data (user output) is written twice with each block on two different storage arrays, so that even if an entire array were lost, the data would be preserved.
-6. “Yes” denotes that ALCF does regular backups without intervention from the user.
+6. “Yes” denotes that ALCF does regular backups without intervention from the user. Currently, gecko-home cannot be backed up.
7. The project directory is available on disk for the stipulated period but project quotas are reduced immediately following project end date. Access to the directory will be removed after 180 days. Requests to restore/extend access or reset the quota are reviewed on a case-by-case basis.
8. Users who wish to retain data must archive or transfer their data elsewhere at the end of the project. Users need an active ALCF account to access archived data on HPSS. See Account Retention Policy for more information. The user is responsible for archiving the data to HPSS or copying it to another facility as desired. Data will be retained on tape for 2 years, at which time it is eligible for removal (subject to change).
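For the archiving described in footnote 8, HPSS is typically driven with the `hsi` and `htar` clients. A minimal sketch, assuming those clients are available in your environment and using illustrative HPSS and project paths:

```bash
# Bundle a results directory into a single tar archive written directly
# into HPSS (htar also stores an index alongside the archive).
htar -cvf /home/$USER/archives/campaign_results.tar /lus/eagle/projects/MyProject/results

# Confirm the archive exists and list its contents.
hsi ls -l /home/$USER/archives/campaign_results.tar
htar -tvf /home/$USER/archives/campaign_results.tar
```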