[PRE REVIEW]: strucscan: A lightweight Python-based framework for high-throughput material simulation #4519

editorialbot · 2022-06-27T16:03:42Z

Submitting author: @thohamm (Thomas Hammerschmidt)
Repository: https://github.com/ICAMS/strucscan
Branch with paper.md (empty if default branch):
Version: 1.0
Editor: @ppxasjsm
Reviewers: @mturiansky, @wcwitt
Managing EiC: Kevin M. Moerman

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1"><img src="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg)](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1)

Author instructions

Thanks for submitting your paper to JOSS @thohamm. Currently, there isn't a JOSS editor assigned to your paper.

@thohamm if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands


editorialbot · 2022-06-27T16:03:44Z

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot · 2022-06-27T16:03:45Z

Software report:

github.com/AlDanial/cloc v 1.88  T=0.08 s (967.0 files/s, 123169.0 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                           25            902           1981           3338
Markdown                         13             95              0            403
TeX                               1             13              0            137
Jupyter Notebook                  4              0           1763            121
reStructuredText                  8             51             81             80
YAML                              8             20            135             68
Bourne Again Shell               14             25            137             57
make                              1              4              6              9
--------------------------------------------------------------------------------
SUM:                             74           1110           4103           4213
--------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

editorialbot · 2022-06-27T16:03:47Z

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.cpc.2021.108171 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK

MISSING DOIs

- 10.1007/10968987_3 may be a valid DOI for title: SLURM: Simple Linux Utility for Resource Management

INVALID DOIs

- doi.org/10.1016/j.commatsci.2012.10.028 is INVALID because of 'doi.org/' prefix
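
The INVALID entry above is rejected only because of its `doi.org/` prefix. A minimal sketch of how such a prefix could be stripped before checking, assuming a hypothetical helper `bare_doi` that is not part of editorialbot:

```python
import re

# Hypothetical helper (not editorialbot's code): reduce a DOI string to its
# bare "10.xxxx/..." form by removing a leading resolver prefix.
_RESOLVER_PREFIX = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)


def bare_doi(value: str) -> str:
    """Strip prefixes such as 'doi.org/' or 'https://doi.org/' from a DOI."""
    return _RESOLVER_PREFIX.sub("", value.strip())


# The entry flagged in the reference check, with the prefix removed.
print(bare_doi("doi.org/10.1016/j.commatsci.2012.10.028"))
```

A DOI that is already bare passes through unchanged, so the helper could be applied to every entry in a bib file.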

editorialbot · 2022-06-27T16:03:47Z

Wordcount for paper.md is 1447

editorialbot · 2022-06-27T16:04:42Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

thohamm · 2022-06-27T17:41:03Z

Missing and invalid DOI fixed.

thohamm · 2022-06-27T18:00:09Z

@editorialbot generate pdf

editorialbot · 2022-06-27T18:01:11Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

Kevin-Mattheus-Moerman · 2022-06-29T13:00:03Z

@editorialbot check references

editorialbot · 2022-06-29T13:00:10Z

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1007/10968987_3 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK
- 10.1016/j.commatsci.2012.10.028 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Kevin-Mattheus-Moerman · 2022-06-29T13:03:09Z

@thohamm I have had a quick look at your paper, can you check the following:

Please add city and country to your affiliation, do not use acronyms for countries.

Kevin-Mattheus-Moerman · 2022-06-29T13:05:40Z

@editorialbot invite @jedbrown as editor

editorialbot · 2022-06-29T13:05:41Z

Invitation to edit this submission sent!

thohamm · 2022-06-29T13:14:07Z

Affiliation fixed.

thohamm · 2022-06-29T13:14:11Z

@editorialbot generate pdf

editorialbot · 2022-06-29T13:15:38Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

jedbrown · 2022-06-30T22:02:43Z

👋 Hi @thohamm. A few notes from my initial scan

1. I see there are some "TODO" marks in the document.
2. Can you please share with us a little about the provenance of this project? I see it was a new project importing source from elsewhere just prior to submission.
3. It looks like there is intended to be Sphinx documentation, but I don't see it linked. Does that exist?
4. Those docs say that pytest is used, but it looks like the code doesn't use pytest.
5. Are there unit tests and can they be made to run with continuous integration (such as GitHub Actions)?
6. If the output is tabular, why not put it into a Pandas dataframe for easy plotting and statistics?
7. How does this package relate to generic workflow management tools with support for batch computing (e.g., Snakemake, Parsl, Swift, libensemble)?
8. To what extent does this package provide machine- or human-auditable provenance?
9. To what extent does this package convey research, versus serve as a client to research software (VASP, etc.)? Would it be in scope for strucscan to compute summary statistics, make common figures, and/or do quality control to make the research more efficient or reliable?

danielskatz · 2022-07-07T12:45:28Z

👋 @thohamm - we do need a response from you - if we don't hear back in another week, we'll mark this submission as withdrawn.

thohamm · 2022-07-10T15:47:27Z

Thank you for your comments. We are working on them.

thohamm · 2022-07-12T12:44:14Z

@danielskatz, @jedbrown - Thank you for your detailed comments.

We reply to them point by point below. We would be willing to include corresponding remarks in the paper if you find this helpful.

1. Thank you for your comment; we fixed the TODO marks.
2. The development of this Python code started about 6 months ago on the basis of a 10-year-old collection of shell scripts. After several iterations of restructuring program flow and data handling, we have now arrived at a converged version that is also already in practical use in our group. We didn't see much benefit in sharing older versions and therefore decided to set up a new git repo for this project.
3. The Sphinx documentation has now been prepared and is available at https://strucscan.readthedocs.io/.
4. We prepared tests using pytest but are still struggling with executing them due to write permission problems with GitHub Actions. For the moment we removed the corresponding remark from the documentation in the git repo and will include it again as soon as the tests are up and running.
5. Some of the examples are planned to be used as unit tests as soon as the issues with GitHub Actions are resolved (cf. 4).
6. The data structure and connectivity of the output depend on the particular set of calculations and are therefore not well suited for a pandas dataframe with an a-priori layout. We instead store dictionaries from which the user can collect data selectively for constructing dedicated dataframes. This is more convenient and flexible in our experience.
7. In contrast to generic tools for batch computing, strucscan is dedicated to high-throughput simulations and provides a more sophisticated framework than a generic workflow manager. In particular, it offers generalized interfaces to different simulation software packages, interfaces to schedulers including monitoring tools, a transparent storage structure for convenient post-processing, and measures for data connectivity.
8. The provenance of the strucscan code itself is given by the GitHub versioning that starts at the first converged version of the software (cf. 2). The provenance of individual calculations with strucscan is realized by storing all input and output files in a folder tree. This storage is in addition to the collection of central results in dictionaries. Storing the full data set as provenance of individual calculations gives the user full flexibility in adapting to external meta-data schemas, e.g. of third-party databases.
9. The goal of strucscan is to support the user in conveying high-throughput research. This means in particular coping with the challenges of starting, monitoring, collecting and analyzing very large numbers (thousands to tens of thousands) of calculations. The goal of strucscan is to provide the basis for exactly the mentioned purposes, i.e., summary statistics, common figures and quality control. This means a framework for generating large sets of quality-checked data stored in a transparent data structure. On this basis the user can then use the basic post-processing of strucscan or further tools for data visualization and analytics.
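
To illustrate the reply on pandas dataframes above, here is a minimal sketch of collecting selected entries from nested result dictionaries into flat records. The dictionary layout, the key names and the numbers are made-up placeholders, and `collect` is a hypothetical helper; none of them are taken from strucscan itself.

```python
# Placeholder results: structure name -> property name -> values.
# Layout, keys and numbers are invented for illustration only.
results = {
    "structure_a": {"static": {"energy": -1.0, "volume": 10.0}},
    "structure_b": {
        "static": {"energy": -2.0, "volume": 11.0},
        "eos": {"bulk_modulus": 100.0},
    },
}


def collect(results, wanted):
    """Build one flat record per structure from (property, key) pairs.

    Entries that a given calculation did not produce are filled with None,
    so records stay rectangular even when the calculations differ.
    """
    rows = []
    for structure, props in results.items():
        row = {"structure": structure}
        for prop, key in wanted:
            row[f"{prop}.{key}"] = props.get(prop, {}).get(key)
        rows.append(row)
    return rows


rows = collect(results, [("static", "energy"), ("eos", "bulk_modulus")])
for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` when a dataframe is wanted for plotting or statistics.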

thohamm · 2022-07-12T13:01:54Z

@editorialbot generate pdf

editorialbot · 2022-07-12T13:03:32Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

arfon · 2022-07-18T08:43:20Z

@jedbrown – do these comments satisfy your concerns, and if so, would you be able to edit this submission for JOSS?

Kevin-Mattheus-Moerman · 2022-07-29T15:31:54Z

@jedbrown 👋 ☝️

Kevin-Mattheus-Moerman · 2022-08-08T15:04:57Z

@jedbrown Can you check those comments?

jedbrown · 2022-08-11T05:14:43Z

I'm sorry to be slow catching up after our family got covid (via 2yo's childcare). @thohamm, thanks for your updates and comprehensive response. I think we still need a scope check with other editors in light of it being nominally general purpose (thus user is responsible for app-specific reliability checks/statistics/plotting), but the examples and known uses thus far are for VASP. The main question is to what extent people outside your research group would use this tool for their research or would disseminate research via this tool (say, as a platform for CS research).

Regarding packaging, it looks like you're missing pyyaml. This in a fresh virtualenv:

$ pip install .
Processing /home/jed/joss/strucscan
  Preparing metadata (setup.py) ... done
[...]
Successfully installed strucscan-0.post0.dev68
$  strucscan --help
Traceback (most recent call last):
  File "/home/jed/joss/strucscan/VENV/bin/strucscan", line 5, in <module>
    from strucscan.cli import main
  File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 1, in <module>
    from strucscan.__init__ import *
  File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 7, in <module>
    from strucscan.utils import *
  File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/utils.py", line 2, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'

Installing pyyaml fixes this, but strucscan --help still doesn't work. (I know it doesn't claim to, but this is a normal expectation.)
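
A missing dependency like the one in the traceback can be caught before the first real import. The following is only a sketch, assuming a hypothetical helper `missing_modules`; it uses the standard-library `importlib.util.find_spec` and expects top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'yaml' is the import name provided by the pyyaml distribution.
print(missing_modules(["json", "yaml"]))
```

Running such a check in a fresh virtualenv, for example as part of a continuous integration job, would report `yaml` if pyyaml were not declared as an install requirement.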

ppxasjsm · 2022-08-16T08:52:56Z

@editorialbot assign me as editor

editorialbot · 2022-08-16T08:52:58Z

Assigned! @ppxasjsm is now the editor

thohamm · 2022-08-16T13:31:23Z

@jedbrown Thank you for your comments!

We included pyyaml in the installation setup for pip and conda-forge.
We added a brief help printout when calling strucscan with '--help'.
Regarding the scope, we would like to emphasize that we developed strucscan as a tool for conducting, monitoring and collecting data from high-throughput atomistic simulation calculations. We included an interface to a widely used DFT code (VASP) as an example and provide a simple interface for extension to other codes. We know from our own experience and from discussions with other groups that there is a need for transparent and easily extendable high-throughput tools for atomistic simulations. We are therefore convinced that strucscan will be useful beyond our group, and we will support users in adapting it to their needs.
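
For readers wondering what a `--help` page involves, the sketch below shows that Python's standard `argparse` module generates one automatically. The program name, description and positional argument are illustrative assumptions and do not describe strucscan's actual command-line interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # prog, description and the positional argument are hypothetical;
    # argparse adds the -h/--help option without further code.
    parser = argparse.ArgumentParser(
        prog="strucscan",
        description="Run a high-throughput calculation from an input file.",
    )
    parser.add_argument("input_file", help="path to an input file (hypothetical argument)")
    return parser


# Print the help page that `--help` would show.
print(build_parser().format_help())
```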

jedbrown · 2022-08-16T13:43:13Z

Thanks @thohamm. Also, welcome @ppxasjsm 👋 and thanks for agreeing to edit this submission.

ppxasjsm · 2022-08-17T15:51:30Z

@thohamm, I'll be acting as your editor for this submission. I am new to this role, so bear with me.
However, to get things going, could you please suggest 5-6 reviewers from this list that you consider appropriate to vet this paper?

thohamm · 2022-08-18T06:29:42Z

@ppxasjsm Thank you for acting as editor for our submission. I would like to suggest bocklund, mturiansky, utf, raghurama123, wcwitt, lucydot as reviewers.

ppxasjsm · 2022-08-21T17:46:37Z

Hi @mturiansky, would you be willing to review this submission?

ppxasjsm · 2022-08-21T17:48:12Z

Hi @utf, would you be willing to review this submission?

ppxasjsm · 2022-08-21T17:49:00Z

@thohamm thank you for your suggestions. Let's see what people say.

mturiansky · 2022-08-23T02:23:47Z

> Hi @mturiansky, would you be willing to review this submission?

Hello! I should be able to review this, but I will need a few weeks if that's okay, since I am a bit busy at the moment.

ppxasjsm · 2022-08-23T16:45:09Z

> Hello! I should be able to review this, but I will need a few weeks if that's okay, since I am a bit busy at the moment.

Yes that should be fine! Thank you for agreeing to do the review.

ppxasjsm · 2022-08-23T16:45:52Z

@editorialbot add @mturiansky as reviewer

editorialbot · 2022-08-23T16:45:58Z

@mturiansky added to the reviewers list!

utf · 2022-08-23T16:47:10Z

Hi @ppxasjsm, I would love to review this. It looks right up my alley, but unfortunately I won't have time over the next couple of months.

ppxasjsm · 2022-08-23T16:54:13Z

> Hi @ppxasjsm, I would love to review this. It looks right up my alley, but unfortunately I won't have time over the next couple of months.

No problem! Thank you for letting me know!

ppxasjsm · 2022-08-23T16:54:56Z

@lucydot, would you be willing to review this submission?

ppxasjsm · 2022-08-30T09:57:51Z

@wcwitt, would you be willing to review this submission?

wcwitt · 2022-08-30T21:16:46Z

Yes, no problem.

ppxasjsm · 2022-08-31T07:04:40Z

Brilliant thank you 👍

ppxasjsm · 2022-08-31T07:07:13Z

@editorialbot add @wcwitt as reviewer

editorialbot · 2022-08-31T07:07:14Z

@wcwitt added to the reviewers list!

ppxasjsm · 2022-08-31T07:07:35Z

@editorialbot start review

editorialbot · 2022-08-31T07:07:39Z

OK, I've started the review over in #4719.

editorialbot added the pre-review label Jun 27, 2022

editorialbot added Python Shell TeX labels Jun 27, 2022

arfon added and removed the query-scope (Submissions of uncertain scope for JOSS) label Aug 13, 2022

editorialbot assigned ppxasjsm Aug 16, 2022

editorialbot assigned mturiansky Aug 23, 2022

editorialbot assigned wcwitt Aug 31, 2022

editorialbot closed this as completed Aug 31, 2022
