Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parse.py for reading spectral data from completed jobs #218

Merged
merged 9 commits into from
Jan 2, 2024

Conversation

chuntian236
Copy link
Collaborator

Comments on the function for parsing spectra goes in this PR.

parse.py

Jupyter notebook:
parse_spectra.ipynb

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2023

Codecov Report

Attention: 143 lines in your changes are missing coverage. Please review.

Comparison is base (ff29d13) 77.21% compared to head (dd6feee) 67.87%.
Report is 3 commits behind head on master.

Files Patch % Lines
lightshow/postprocess/parse.py 0.00% 143 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #218      +/-   ##
==========================================
- Coverage   77.21%   67.87%   -9.34%     
==========================================
  Files          13       14       +1     
  Lines        1040     1183     +143     
==========================================
  Hits          803      803              
- Misses        237      380     +143     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@matthewcarbone matthewcarbone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed extract_FEFF for now. There are a few places where I think we need changes.

One thing in particular, I don't think we necessarily should be catching these errors in try/except statements. Since the user is going to use extract_FEFF directly via our API, we want these errors to be thrown if, for example, the files are not found.

I'll review the rest of this after we make changes to extract_FEFF!

lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
@matthewcarbone
Copy link
Contributor

@chuntian236 we should also clear the output of the Jupyter notebook before pushing so it takes up less space.

@matthewcarbone matthewcarbone marked this pull request as draft November 17, 2023 03:05
@matthewcarbone matthewcarbone changed the title Please comment on "parse.py" code in branch "postprocess-benchmark-cc" Add parse.py for reading spectral data from completed jobs Nov 17, 2023
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
lightshow/postprocess/parse.py Show resolved Hide resolved
lightshow/postprocess/parse.py Outdated Show resolved Hide resolved
@FCMeng
Copy link
Collaborator

FCMeng commented Nov 20, 2023

I have a few general comments here:

  1. The output quantity is different for these codes, some of them are epsilon2, while some are cross section. Might need a dict key to indicate this distinction.
  2. The output from the current parser is the averaged spectra among all polarization direction. Might be better if we can have an option to output spectra for different polarization direction.
  3. For XSpectra, might be great if the total energy from the gs_out is also recorded, which is important for the alignment using delta SCF.
  4. For OCEAN, the spectra from all the sites are recorded, which might be redundant if they are equivalent sites.
  5. For OCEAN, the spectra from all the elements are collected, which leads the output from the OCEAN parser different from others in terms of the spectra key. Might be good if we can keep the key-value pair consistent for all the codes, especially for the spectrum. In this case, might need a human input for the target element.
  6. Might need a key for the site index.
  7. I cannot remember correctly for this one so I might be wrong. We might need a better way to match the site index between OCEAN and other codes.The site index for all elements in OCEAN starts from 1. For example a system has three Ti sites (all equivalent) and three O sites (all equivalent). For other codes other than OCEAN, the site would be Ti_000 and O_003. For Ti in OCEAN, it would be absspct_Ti.0001_1s_0?, absspct_Ti.0002_1s_0?, absspct_Ti.0003_1s_0?; for O in OCEAN, then it would be absspct_O.0001_1s_0?, absspct_O.0002_1s_0?, absspct_O.0003_1s_0?, which leads the site index quite different compared to other codes.

@deyulu
Copy link
Collaborator

deyulu commented Nov 21, 2023

Thank @fanchen0121 for a very comprehensive message. 1. Excellent point. ocean, exciting and vasp calculate epsilon2; feff and xspectra calculate cross section. 2. It is a good idea to save spectra from different polarization; in that event, we also need to save polarization directions. --perhaps for the next update. 3. I'd like to keep parsing spectra and performing edge alignment separately, as users may not need to do edge alignment. 4. Just from output, I don't think there is an easy way to remove this redundancy. 5. I think it is ok that ocean parser generate spectra on multiple elements, as far as the dict keys are clear. 6. That's a good idea. 7. I remember the same. We may need an option to "standardize" the site index. @chuntian236 please think about these suggestions, which can be addressed in this round and which can be pushed to a later update?

@FCMeng
Copy link
Collaborator

FCMeng commented Nov 21, 2023

Another thought on point 1, we do the conversion on the fly when parsing the data or we also save the structure data, which might also be useful for the site index. Otherwise, we cannot do the conversion if we do not have the structure data or don't do the conversion on the fly.
I want to clarify for point 3, I recommend also record the total energy from gs.out because I saw the total energy from es.out is recorded. I think we can try to find gs.out, if the file not found, then just record a nan for the gs total energy key.

@deyulu
Copy link
Collaborator

deyulu commented Nov 21, 2023

@fanchen0121 For point 1, I think epsilon2 to cross section is not a general use case and most of the users don't have to worry about. I suggest that we treat this separately, and as you said we also need structure. For point 3, gs.out is always generated when a user compute xanes with xspectra, but gs.out is not required. So I think it makes sense to parse out total energy from es.out and leave the total energy in gs.out for the alignment module.

@matthewcarbone
Copy link
Contributor

@deyulu @FCMeng thank you both for the points and discussion. Fanchen, can you open up some issues so we can track these suggestions carefully? Thanks!

@chuntian236
Copy link
Collaborator Author

Replying to Fanchen's comments:
First thank you Fanchen (and all others) for the valuable suggestions!
On 12.3.2023 update for parse.py:

  1. A dict key "label" is added to indicate if the result is cross-section or epsilon2.
  2. Different polarization direction is added in the output dictionary.
  3. I decided not to add total energy from gs.out in XSpectra, because not all people calculate ground state, so that file might not exist.
  4. The different structure in OCEAN output folder is a problem. Now I just make sure the "sub-dictionary" of the OCEAN parsing dictionary has the same structure and keys as the outputs from other codes. I.e., dict_ocean['Ti']['0001_1s'] has the same keys as dict_vasp.
  5. About key for site index - I decided not to include it, because the function do not have site index info from the inputs, while getting this info from input folder name is not a good idea because the folder name format may be different. I assume the user should know the site from their folders. For similar reasons, I didn't do site index matching either.

@matthewcarbone
Copy link
Contributor

@chuntian236 @deyulu Since I know you both want to merge we can basically proceed. Before we do, Chuntian, can you run the Black formatter on this code? It's failing Black and flake8 checks. It might seem petty but these types of formatting differences creep in and can make the code very hard to maintain/read. Once that's done we can merge.

We also (later) might want to look into using something faster than a for loop over every line of the OUTCAR. I have a feeling that's going to be painfully slow.

To pass code quality check.
To pass code quality check.
To pass quality check.
@matthewcarbone
Copy link
Contributor

@chuntian236 I'm going to merge this for now and I'll take care of why Black is failing later.

@matthewcarbone matthewcarbone marked this pull request as ready for review January 2, 2024 19:36
@matthewcarbone matthewcarbone merged commit 030f9eb into master Jan 2, 2024
13 of 14 checks passed
@matthewcarbone matthewcarbone deleted the postprocess-benchmark-cc branch January 2, 2024 19:37
@chuntian236
Copy link
Collaborator Author

@chuntian236 I'm going to merge this for now and I'll take care of why Black is failing later.

I think I know why's Black is failing. Shall I commit in the master branch? Or make a new branch to try?

@matthewcarbone
Copy link
Contributor

@chuntian236 go for another PR through another branch if you want! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants