-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
example_submission_scripts #350
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes involve three job submission scripts for high-performance computing (HPC) environments, each tailored for different systems and configurations. The scripts specify job parameters such as resource allocations, software versions, and execution commands for the VASP software. They include directives for managing CPU and GPU resources, memory requirements, and output logging, ensuring compatibility with the respective computing environments. Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
example_hpc_submission_scripts/australia_pawsey_setonix.sh (1)
7-7
: Consider using a generic job name for the example scriptThe current job name
TSR_RATTLE_struct_1871_2_Mn_8.sh
is very specific. For an example submission script, it might be more appropriate to use a generic name likevasp_job
or include a placeholder (e.g.,${JOB_NAME}
) that users can easily replace.Would you like me to suggest a more generic job name or a way to make it easily customizable?
example_hpc_submission_scripts/australia_nci_gadi_gpu.sh (3)
1-8
: Resource allocation looks good, consider fine-tuning memory.The PBS directives for resource allocation are well-structured and appropriate for a GPU job on the NCI Gadi system. The walltime, CPU, and GPU requests are reasonable.
Consider fine-tuning the memory request if you find that your jobs consistently use less than 160GB. This could potentially improve queue times and resource utilization. Monitor your job's memory usage and adjust accordingly.
14-16
: Helpful storage access note, consider making it more prominent.The comment about storage access requirements is informative and includes a useful link to documentation. This is valuable information for users who need access to additional directories.
Consider making this note more prominent, perhaps by adding a clear TODO comment or moving it to the top of the script. This will ensure users don't overlook this important configuration step. For example:
# TODO: If job needs access to specific directories, uncomment and modify the following line: # #PBS -l storage=scratch/ab12+gdata/yz98
18-18
: Execution command is well-structured, consider adding error handling and job monitoring.The mpirun command is well-constructed, using the correct number of processes and an appropriate mapping strategy for GPU jobs. Redirecting output to a log file is also good practice.
Consider enhancing the script with error handling and job monitoring:
- Add error checking:
mpirun -np $PBS_NGPUS --map-by ppr:1:numa vasp_std-gpu >vasp.log 2>&1 || { echo "VASP job failed"; exit 1; }
- Implement basic job monitoring:
echo "Job started at $(date)" mpirun -np $PBS_NGPUS --map-by ppr:1:numa vasp_std-gpu >vasp.log 2>&1 || { echo "VASP job failed"; exit 1; } echo "Job finished at $(date)"
- Consider adding a simple progress check, e.g., periodically echoing the size of the log file:
(while true; do sleep 300; echo "Log file size: $(wc -c <vasp.log) bytes"; done) & mpirun -np $PBS_NGPUS --map-by ppr:1:numa vasp_std-gpu >vasp.log 2>&1 || { echo "VASP job failed"; exit 1; } kill $! # Kill the background progress checking processThese additions will make the script more robust and informative.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- example_hpc_submission_scripts/australia_nci_gadi.sh (1 hunks)
- example_hpc_submission_scripts/australia_nci_gadi_gpu.sh (1 hunks)
- example_hpc_submission_scripts/australia_pawsey_setonix.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck
example_hpc_submission_scripts/australia_pawsey_setonix.sh
[warning] 14-14: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🔇 Additional comments (8)
example_hpc_submission_scripts/australia_nci_gadi.sh (4)
12-13
: Consider uncommenting and adjusting the storage directive if neededThe comment about including a PBS storage directive is informative. However, if your job actually requires access to specific storage paths, you should uncomment and adjust this directive.
For example, if you need access to
/scratch/ab12/
and/g/data/yz98/
, you would add:#PBS -l storage=scratch/ab12+gdata/yz98
To check if your job might need access to specific storage paths, you could search for file paths in your VASP input files:
#!/bin/bash # Description: Search for file paths in VASP input files # Search for file paths in POSCAR, INCAR, KPOINTS, and POTCAR files rg -t txt '/' POSCAR INCAR KPOINTS POTCARIf you see paths outside the job's default accessible areas, you'll need to include the appropriate storage directive.
15-15
: Consider adding error handling and job completion notificationThe VASP execution command is correct, and redirecting output to a log file is good practice. However, consider enhancing the script with error handling and job completion notification. This can help with monitoring job status and debugging if issues occur.
Here's an example of how you could modify the execution line:
mpirun vasp_std > vasp.log 2>&1 || { echo "VASP execution failed"; exit 1; } echo "VASP job completed successfully" | mail -s "VASP Job Status" [email protected]This modification will:
- Capture both stdout and stderr in the log file.
- Exit with an error message if VASP execution fails.
- Send an email notification upon successful job completion.
To check if error handling is commonly used in other scripts, you could search for similar patterns:
#!/bin/bash # Description: Search for error handling patterns in shell scripts # Search for error handling patterns in shell scripts rg -t sh '(set -e|exit 1||| \{)' # Search for job completion notification patterns rg -t sh '(echo.*completed|mail -s)'Consider implementing similar error handling and notification mechanisms if they're commonly used in other scripts in your repository.
1-7
: Review and possibly adjust resource allocationsThe resource allocations look reasonable for a small VASP job, but consider the following:
- The walltime of 1 hour might be too short for complex VASP calculations. Consider increasing it based on your specific use case.
- 10GB of memory might not be sufficient for larger systems. Adjust this based on your system size and complexity.
- While 8 CPUs is reasonable, VASP can often utilize more for better performance. Consider increasing if your calculations would benefit from more parallelization.
To help determine appropriate resource allocations, you could run a test job and analyze its resource usage. Here's a script to help with that:
Adjust your resource allocations based on the results of these analyses.
9-10
: Consider checking for newer VASP versionsLoading a specific version (5.4.4) is good for reproducibility. However, this might not be the latest version of VASP. Newer versions could offer bug fixes, performance improvements, or new features.
To check available VASP versions on the system, you could use:
Consider updating to the latest stable version if it's available and compatible with your calculations.
example_hpc_submission_scripts/australia_pawsey_setonix.sh (2)
2-2
: Verify intentionality of commented SBATCH directivesThe
--nodes=1
and--exclusive
SBATCH directives are currently commented out. If these are meant to be active, please uncomment them. If they're intentionally commented for flexibility, consider adding a comment explaining why.Could you confirm if these lines should be active or if they're intentionally commented out for flexibility?
Also applies to: 12-12
15-15
: Verify the necessity of unlimited stack sizeSetting an unlimited stack size with
ulimit -s unlimited
is common in some HPC environments, but it's not always necessary and could potentially lead to issues if a program has a memory leak.Could you confirm if VASP 5.4.4 specifically requires an unlimited stack size on the Setonix system? If not, consider either removing this line or setting a large but finite limit.
example_hpc_submission_scripts/australia_nci_gadi_gpu.sh (2)
9-9
: Good use of working directory directive.The
-l wd
directive ensures the job runs in the submission directory, which is a best practice. It simplifies script management and reduces the chance of file path-related errors.
11-12
: Good practice loading a specific VASP version, verify if it's the most appropriate.Loading a specific version of VASP (6.2.1) is excellent for reproducibility. The comment is also helpful.
Please verify that 6.2.1 is the most appropriate VASP version for your needs on this system. You can check available versions with:
Consider updating to a newer version if available and compatible with your research requirements.
#SBATCH --mem=32GB | ||
##SBATCH --exclusive | ||
module load vasp/5.4.4 | ||
cd "$PBS_O_WORKDIR" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using SLURM_SUBMIT_DIR and add error handling
-
This script appears to be for a SLURM-based system (Setonix), but it's using
PBS_O_WORKDIR
. Consider usingSLURM_SUBMIT_DIR
instead, which is the SLURM equivalent. -
Add error handling to the
cd
command as suggested by Shellcheck.
Here's a suggested modification:
-cd "$PBS_O_WORKDIR"
+cd "$SLURM_SUBMIT_DIR" || { echo "Failed to change directory"; exit 1; }
This change ensures that the script exits if it fails to change to the correct directory, preventing potential issues with file access or job execution in the wrong location.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
cd "$PBS_O_WORKDIR" | |
cd "$SLURM_SUBMIT_DIR" || { echo "Failed to change directory"; exit 1; } |
🧰 Tools
🪛 Shellcheck
[warning] 14-14: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
cd "$PBS_O_WORKDIR" | ||
ulimit -s unlimited | ||
run_cmd="srun --export=ALL -N 1 -n 32" | ||
$run_cmd vasp_std &> vasp.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance error handling and output management
While the current command will capture both stdout and stderr in vasp.log, consider implementing more robust error handling and output management. Here are some suggestions:
- Separate stdout and stderr for easier debugging.
- Add a check for the exit status of the VASP run.
- Consider using a timestamp in the log filename to prevent overwriting in case of multiple runs.
Here's a suggested modification:
-$run_cmd vasp_std &> vasp.log
+log_file="vasp_$(date +%Y%m%d_%H%M%S).log"
+$run_cmd vasp_std > "$log_file" 2>&1
+exit_status=$?
+if [ $exit_status -ne 0 ]; then
+ echo "VASP run failed with exit status $exit_status. Check $log_file for details." >&2
+ exit $exit_status
+fi
This change separates stdout and stderr (but still captures both in the log file), checks the exit status of the VASP run, uses a timestamp in the log filename, and provides an informative error message if the run fails.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
$run_cmd vasp_std &> vasp.log | |
log_file="vasp_$(date +%Y%m%d_%H%M%S).log" | |
$run_cmd vasp_std > "$log_file" 2>&1 | |
exit_status=$? | |
if [ $exit_status -ne 0 ]; then | |
echo "VASP run failed with exit status $exit_status. Check $log_file for details." >&2 | |
exit $exit_status | |
fi |
@jan-janssen as discussed
Summary by CodeRabbit
New Features
Documentation