Add parse.py for reading spectral data from completed jobs #218

chuntian236 · 2023-11-16T21:15:40Z

Comments on the functions for parsing spectra go in this PR.

codecov-commenter · 2023-11-16T23:59:58Z

Codecov Report

Attention: 143 lines in your changes are missing coverage. Please review.

Comparison is base (ff29d13) 77.21% compared to head (dd6feee) 67.87%.
Report is 3 commits behind head on master.

Files	Patch %	Lines
lightshow/postprocess/parse.py	0.00%	143 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #218      +/-   ##
==========================================
- Coverage   77.21%   67.87%   -9.34%     
==========================================
  Files          13       14       +1     
  Lines        1040     1183     +143     
==========================================
  Hits          803      803              
- Misses        237      380     +143

matthewcarbone

I've reviewed extract_FEFF for now. There are a few places where I think we need changes.

One thing in particular: I don't think we should necessarily be catching these errors in try/except statements. Since the user is going to call extract_FEFF directly via our API, we want these errors to be raised if, for example, the files are not found.
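
As a minimal sketch of the behavior described above: the function name `read_two_column_spectrum` and the two-column file layout are hypothetical and are not the actual extract_FEFF code. The point is only that, with no try/except around the file access, the built-in FileNotFoundError reaches the caller unchanged.

```python
from pathlib import Path


def read_two_column_spectrum(path):
    """Hypothetical reader: return (energy, intensity) pairs from a text file.

    There is deliberately no try/except here. If the file is missing,
    open() raises FileNotFoundError and the caller sees it directly.
    """
    rows = []
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            energy, intensity = line.split()[:2]
            rows.append((float(energy), float(intensity)))
    return rows
```

A caller who wants to tolerate missing files can still wrap the call in their own try/except, which keeps that decision out of the parser.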

I'll review the rest of this after we make changes to extract_FEFF!

lightshow/postprocess/parse.py

matthewcarbone · 2023-11-17T03:05:07Z

@chuntian236 we should also clear the output of the Jupyter notebook before pushing so it takes up less space.

lightshow/postprocess/parse.py

FCMeng · 2023-11-20T21:37:20Z

I have a few general comments here:

1. The output quantity differs between codes: some produce epsilon2, while others produce a cross section. We might need a dict key to indicate this distinction.
2. The current parser outputs the spectrum averaged over all polarization directions. It might be better to have an option to output the spectrum for each polarization direction.
3. For XSpectra, it would be great if the total energy from gs_out were also recorded, since it is important for alignment using delta SCF.
4. For OCEAN, the spectra from all sites are recorded, which might be redundant if the sites are equivalent.
5. For OCEAN, the spectra from all elements are collected, which makes the output of the OCEAN parser differ from the others in terms of the spectra key. It might be good to keep the key-value pairs consistent across all codes, especially for the spectrum. In this case, we might need human input for the target element.
6. We might need a key for the site index.
7. I can't remember this one exactly, so I might be wrong. We might need a better way to match the site index between OCEAN and the other codes. The site index for every element in OCEAN starts from 1. For example, take a system with three Ti sites (all equivalent) and three O sites (all equivalent). For codes other than OCEAN, the sites would be Ti_000 and O_003. For Ti in OCEAN, the files would be absspct_Ti.0001_1s_0?, absspct_Ti.0002_1s_0?, absspct_Ti.0003_1s_0?; for O in OCEAN, they would be absspct_O.0001_1s_0?, absspct_O.0002_1s_0?, absspct_O.0003_1s_0?. This makes the site indexing quite different from the other codes.
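
To make the index mismatch in the last point concrete, here is a minimal sketch of one possible mapping. The helper `global_to_ocean_labels` is hypothetical, and it assumes that OCEAN numbers the sites of each element in the same order in which they appear in the structure, which would need to be verified.

```python
from collections import defaultdict


def global_to_ocean_labels(species):
    """Map global 0-based labels (e.g. "O_003") to per-element 1-based
    OCEAN-style labels (e.g. "O.0001").

    `species` is the list of element symbols in structure order.
    Assumption: OCEAN counts sites of each element in that same order.
    """
    counts = defaultdict(int)
    mapping = {}
    for global_index, element in enumerate(species):
        counts[element] += 1  # per-element counter, starting from 1
        global_label = f"{element}_{global_index:03d}"
        mapping[global_label] = f"{element}.{counts[element]:04d}"
    return mapping


labels = global_to_ocean_labels(["Ti", "Ti", "Ti", "O", "O", "O"])
```

For the Ti/O example above, `labels["Ti_000"]` is `"Ti.0001"` and `labels["O_003"]` is `"O.0001"`.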

deyulu · 2023-11-21T20:28:13Z

Thanks @fanchen0121 for the very comprehensive message.
1. Excellent point. OCEAN, exciting and VASP calculate epsilon2; FEFF and XSpectra calculate the cross section.
2. It is a good idea to save the spectra for different polarizations; in that case, we also need to save the polarization directions (perhaps for the next update).
3. I'd like to keep parsing spectra and performing edge alignment separate, as users may not need to do edge alignment.
4. Just from the output, I don't think there is an easy way to remove this redundancy.
5. I think it is OK for the OCEAN parser to generate spectra for multiple elements, as long as the dict keys are clear.
6. That's a good idea.
7. I remember the same. We may need an option to "standardize" the site index.

@chuntian236 please think about these suggestions: which can be addressed in this round, and which can be pushed to a later update?

FCMeng · 2023-11-21T20:41:54Z

Another thought on point 1: either we do the conversion on the fly when parsing the data, or we also save the structure data, which might also be useful for the site index. Otherwise, we cannot do the conversion later, because it requires the structure data.
I want to clarify point 3: I recommend also recording the total energy from gs.out, because I saw that the total energy from es.out is recorded. I think we can try to find gs.out and, if the file is not found, just record a NaN for the gs total energy key.
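
The NaN fallback suggested above could look roughly like the following sketch. The helper `read_total_energy` is hypothetical and is not part of parse.py; it also assumes the Quantum ESPRESSO convention of marking the converged total energy with a line that starts with `!`.

```python
import math
import re
from pathlib import Path

# Assumed line format: "!    total energy              =   -93.45678901 Ry"
_TOTAL_ENERGY = re.compile(r"^!\s+total energy\s+=\s+(-?\d+\.\d+)\s+Ry")


def read_total_energy(path):
    """Return the last converged total energy (Ry) found in `path`.

    Returns NaN if the file does not exist or contains no matching line,
    so a missing gs.out does not abort the parse.
    """
    path = Path(path)
    if not path.is_file():
        return math.nan
    energy = math.nan
    for line in path.read_text().splitlines():
        match = _TOTAL_ENERGY.match(line)
        if match:
            energy = float(match.group(1))  # keep the last occurrence
    return energy
```

Downstream code would then need to check for NaN (for example with `math.isnan`) before using the value for alignment.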

deyulu · 2023-11-21T21:31:42Z

@fanchen0121 For point 1, I think converting epsilon2 to a cross section is not a general use case, and most users don't have to worry about it. I suggest that we treat this separately and, as you said, we would also need the structure. For point 3, gs.out is always generated when a user computes XANES with XSpectra, but gs.out is not required. So I think it makes sense to parse the total energy from es.out and leave the total energy in gs.out for the alignment module.

matthewcarbone · 2023-11-22T14:03:11Z

@deyulu @FCMeng thank you both for the points and discussion. Fanchen, can you open up some issues so we can track these suggestions carefully? Thanks!

chuntian236 · 2023-12-07T03:42:16Z

Replying to Fanchen's comments:
First, thank you Fanchen (and all the others) for the valuable suggestions!
On the 12.3.2023 update to parse.py:

A dict key "label" has been added to indicate whether the result is a cross section or epsilon2.
Spectra for the different polarization directions have been added to the output dictionary.
I decided not to add the total energy from gs.out for XSpectra, because not everyone calculates the ground state, so that file might not exist.
The different structure of the OCEAN output folder is a problem. For now I just make sure that each "sub-dictionary" of the OCEAN parsing dictionary has the same structure and keys as the outputs from the other codes, i.e., dict_ocean['Ti']['0001_1s'] has the same keys as dict_vasp.
About a key for the site index: I decided not to include it, because the function does not get site index information from its inputs, and reading it from the input folder name is not a good idea because the folder name format may differ. I assume users know the site from their folders. For similar reasons, I didn't do site index matching either.
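
The key-consistency guarantee described above can be illustrated with a small check. This is only a sketch with toy data: the "label" key and the dict_ocean['Ti']['0001_1s'] nesting come from the comment, while the other key names ("energy", "spectrum"), the label values, and the helper `has_consistent_keys` are hypothetical.

```python
# Toy parser outputs. Only "label" and the element/site nesting are taken
# from the discussion; "energy" and "spectrum" are hypothetical key names.
dict_vasp = {"label": "epsilon2", "energy": [0.0, 1.0], "spectrum": [0.1, 0.2]}
dict_ocean = {
    "Ti": {
        "0001_1s": {"label": "epsilon2", "energy": [0.0, 1.0], "spectrum": [0.3, 0.4]},
        "0002_1s": {"label": "epsilon2", "energy": [0.0, 1.0], "spectrum": [0.3, 0.5]},
    },
}


def has_consistent_keys(reference, ocean_output):
    """Return True if every OCEAN per-site dictionary has exactly the
    same keys as the reference (non-OCEAN) output dictionary."""
    expected = set(reference)
    return all(
        set(site_result) == expected
        for per_element in ocean_output.values()
        for site_result in per_element.values()
    )


consistent = has_consistent_keys(dict_vasp, dict_ocean)
```

A check along these lines could serve as a unit test for the parsers once example outputs are available.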

matthewcarbone · 2023-12-16T03:02:00Z

@chuntian236 @deyulu Since I know you both want to merge we can basically proceed. Before we do, Chuntian, can you run the Black formatter on this code? It's failing Black and flake8 checks. It might seem petty but these types of formatting differences creep in and can make the code very hard to maintain/read. Once that's done we can merge.

We also (later) might want to look into using something faster than a for loop over every line of the OUTCAR. I have a feeling that's going to be painfully slow.
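
One possible alternative to scanning every line in Python, sketched under assumptions and not benchmarked: read the file once, locate the block of interest with `str.find`, and parse only the rows that follow. The helper `rows_after_marker`, the marker string, and the number of header lines to skip are all hypothetical and would need to be matched to the real OUTCAR layout.

```python
def rows_after_marker(text, marker, skip=2):
    """Parse the numeric rows that follow the first line containing `marker`.

    `skip` is the number of header lines between the marker line and the
    first data row. Parsing stops at the first blank or non-numeric line.
    """
    start = text.find(marker)  # single search over the whole file contents
    if start == -1:
        raise ValueError(f"marker not found: {marker!r}")
    lines = text[start:].splitlines()[1 + skip:]
    rows = []
    for line in lines:
        parts = line.split()
        if not parts:
            break
        try:
            rows.append([float(p) for p in parts])
        except ValueError:
            break
    return rows
```

Only the lines inside the block are visited in the Python loop, so the cost of the loop no longer scales with the full length of the file.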

matthewcarbone · 2024-01-02T19:36:41Z

@chuntian236 I'm going to merge this for now and I'll take care of why Black is failing later.

chuntian236 · 2024-01-02T19:54:08Z

> @chuntian236 I'm going to merge this for now and I'll take care of why Black is failing later.

I think I know why Black is failing. Shall I commit to the master branch, or make a new branch to try?

matthewcarbone · 2024-01-03T14:18:01Z

@chuntian236 go for another PR through another branch if you want! Thanks!

chuntian236 added 2 commits November 16, 2023 16:12

Add files via upload

a367ab6

Add files via upload

6c6989b

matthewcarbone requested changes Nov 17, 2023

matthewcarbone marked this pull request as draft November 17, 2023 03:05

matthewcarbone assigned matthewcarbone and chuntian236 Nov 17, 2023

matthewcarbone changed the title ~~Please comment on "parse.py" code in branch "postprocess-benchmark-cc"~~ Add parse.py for reading spectral data from completed jobs Nov 17, 2023

FabiPi3 reviewed Nov 17, 2023

chuntian236 added 2 commits December 6, 2023 22:20

Update parse.py

ea86153

Add files via upload

c7eb8e6

Update parse.py

ae8f261

chuntian236 added 4 commits December 27, 2023 15:56

Update parse.py

aa9f747

Update parse.py

98610f2

To pass code quality check.

Update parse.py

8bf8c19

To pass code quality check.

Update parse.py

dd6feee

To pass quality check.

matthewcarbone marked this pull request as ready for review January 2, 2024 19:36

matthewcarbone merged commit 030f9eb into master Jan 2, 2024
13 of 14 checks passed

matthewcarbone deleted the postprocess-benchmark-cc branch January 2, 2024 19:37
