Implement CN bond fraction, System Analysis for binary/ternary, implement cifkit, site-map #22
Merged
bobleesj added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Jun 25, 2024.
What has been added:
.cif files.
cifkit to conduct bonding and CN analysis.
cifkit to preprocess .cif files.
CIF Bond Analyzer (CBA)
The CIF Bond Analyzer (CBA) is an interactive, command-line-based application designed for high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers Site Analysis, System Analysis for binary/ternary systems, and Coordination Analysis. The outputs are saved in .json, .xlsx, and .png formats.
The current README.md serves as a tutorial and documentation.
Value
CBA simplifies crystal structure analysis by automating the extraction of minimum bond lengths, which are crucial for understanding geometric configurations and identifying irregularities. Histograms and figures assist in identifying distinct bond lengths and structural patterns.
Demo
The code is designed for interactive use without the need to write any code.
Installation and tutorial
Copy each line into your command-line application:
Once the code is executed using python main.py, the following prompt will appear, asking you to choose one of the three analysis options:
For any option, CBA will ask you to choose folders containing .cif files:
You may then choose to process the folders sequentially or select specific folders by entering the numbers shown next to them in the prompt.
For each folder, CBA generates site pair data saved in site_pairs.json or site_pairs.xlsx.
Preprocess
The following discusses formatting, supercell generation, and atomic mixing information.
1. Format files
CBA uses the CifEnsemble object from cifkit to conduct preprocessing automatically.
CBA standardizes the site labels in atom_site_label. Some site labels may contain a comma or a symbol such as M due to atomic mixing. CBA reformats each atom_site_label so it can be parsed into an element type that matches atom_site_type_symbol.
CBA removes the content of publ_author_address. This section often has an incorrect format that otherwise requires manual modifications.
CBA relocates any ill-formatted files, such as those with duplicate labels in atom_site_label, missing fractional coordinates, or files that require supercell generation.
2. Supercell generation
For each .cif file, a unit cell is generated by applying the symmetry operations. A supercell is generated by applying ±1 shifts from the unit cell.
3. Atomic mixing info
Each bonding pair is defined with one of four atomic mixing categories:
Analysis Options
CBA provides three options for analysis.
Option 1. Site Analysis
Purpose: Site Analysis determines the shortest distance and the corresponding nearest neighbor for each label in atom_site_label.
Process: For each atom in the unit cell, Euclidean distances are calculated from the atom to all atoms in the supercell. The position of the atom in the unit cell for each site label is determined based on the atom with the greatest number of shortest distances to its neighbors.
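The supercell construction and distance search described above can be illustrated with a minimal sketch. It is not CBA's or cifkit's implementation: the function names, the two-site example, and the cell lengths are hypothetical, and the cell is assumed to be orthorhombic so that fractional coordinates convert to Cartesian by simple scaling. It also does not reproduce the rule for choosing the representative atom of each site label.

```python
import itertools
import math


def build_supercell(frac_sites):
    """Replicate unit-cell sites with -1, 0, +1 shifts along each axis (27 cells)."""
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    return [
        (label, (x + dx, y + dy, z + dz))
        for label, (x, y, z) in frac_sites
        for dx, dy, dz in shifts
    ]


def to_cartesian(frac, cell_lengths):
    # Assumption: orthorhombic cell (all angles 90 degrees), so scaling is enough.
    return tuple(f * length for f, length in zip(frac, cell_lengths))


def shortest_neighbor(site, supercell, cell_lengths):
    """Return (distance, neighbor_label) for the closest supercell atom."""
    _, frac = site
    origin = to_cartesian(frac, cell_lengths)
    best = None
    for other_label, other_frac in supercell:
        dist = math.dist(origin, to_cartesian(other_frac, cell_lengths))
        if dist < 1e-6:
            continue  # skip the atom itself
        if best is None or dist < best[0]:
            best = (dist, other_label)
    return best


# Hypothetical two-site cell, used only to exercise the functions.
unit_cell = [("Er1", (0.0, 0.0, 0.0)), ("Er2", (0.5, 0.5, 0.5))]
cell_lengths = (4.0, 4.0, 4.0)

supercell = build_supercell(unit_cell)
nearest = {
    label: shortest_neighbor((label, frac), supercell, cell_lengths)
    for label, frac in unit_cell
}
```

In this toy cell, each site's nearest neighbor is the other site at the body-diagonal half distance, while the periodic images of the same site are farther away. A general cell would need the full lattice vectors rather than per-axis scaling.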
Example: Suppose a .cif file contains four site labels under atom_site_label: Er1, Er2, Er3, and Er4. The bonding pair from the site label Er4 to its nearest neighbor Er2 is unique and recorded. The bonding pair from Er3 to Er2 is also considered unique. However, the pairs Er4-Er2 and Er2-Er4 are considered identical. Of the two, the pair with the shorter distance is recorded.
Output 1.1 Excel and JSON
Data for each folder is saved in site_pairs.json or site_pairs.xlsx. Below is an example of the JSON structure for bond pairs:
The minimum bond pair for each file is saved in element_pairs.json and element_pairs.xlsx.
Here is a screenshot of element_pairs.xlsx.
Output 1.2 text summary
A summary text file, summary_element.txt, lists the shortest bonding pairs and identifies missing pairs across selected folders:
Output 1.3 histograms
histogram_element_pair.png and histogram_site_pair.png are used to visualize the data, with colors indicating atomic mixing types.
python plot-histogram.py. This script allows you to interactively specify parameters such as the bin width and x-axis range:
Option 2. System Analysis
Purpose: System Analysis provides an overview of bond fractions acquired from Option 1: Site Analysis, or bond fractions in coordination number geometries.
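As a rough illustration of what a bond fraction is, the sketch below counts element pairs and divides each count by the total. The function name and the input format (a list of element-symbol tuples) are assumptions made for this example and are not taken from CBA. It treats A-B and B-A as the same bond, by analogy with how Site Analysis treats reversed site pairs as identical.

```python
from collections import Counter


def bond_fractions(bond_pairs):
    """Return the fraction of each element pair, treating A-B and B-A as one bond type."""
    # Sorting the two symbols gives one canonical key per unordered pair.
    counts = Counter("-".join(sorted(pair)) for pair in bond_pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical shortest-bond pairs collected from a folder of .cif files.
pairs = [("Er", "Co"), ("Co", "Er"), ("Er", "Er"), ("Co", "Co")]
fractions = bond_fractions(pairs)
```

The fractions always sum to 1 for a non-empty input, which makes them convenient to compare across files with different numbers of sites.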
Scope: System Analysis is applicable for folders containing either 2 or 3 unique elements.
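One way to check whether a folder falls within this scope is to collect the element symbols appearing in the chemical formulas of its files. The sketch below assumes the formulas are already available as plain strings such as ErCo2. The helper names are hypothetical, and this is not necessarily how CBA performs the check.

```python
import re


def unique_elements(formulas):
    """Collect element symbols (one capital letter plus an optional lowercase letter)."""
    elements = set()
    for formula in formulas:
        elements.update(re.findall(r"[A-Z][a-z]?", formula))
    return elements


def is_system_analysis_candidate(formulas):
    """A folder qualifies when its files contain exactly 2 or 3 unique elements in total."""
    return len(unique_elements(formulas)) in (2, 3)


# Hypothetical folders, each represented by the formulas of its files.
binary_folder = ["ErCo2", "Er2Co17"]
ternary_folder = ["ErCoIn5", "Er2Co17"]
quaternary_folder = ["ErCoInSb"]
```

Here the first two folders qualify, while the third does not because it contains four unique elements.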
4 types of folders are applicable for System Analysis.
Here is an example of CBA detecting folders containing 2 or 3 unique elements.
Output 2.1 Binary/ternary figures
For Types 2, 3, and 4:
To customize the legend position in the ternary diagram, you may modify the values of X_SHIFT = 0.0 and Y_SHIFT = 0.0 in core/configs/ternary.py.
For Type 1:
All of the individual hexagon figures are also saved in order.
Output 2.2 Color map
For Types 2, 3, and 4, color maps for each bond type and overall are generated.
Output 2.3 Excel
The bond count for each .cif file is recorded in system_analysis_files.xlsx.
Average bond lengths, counts, and statistical values are recorded in system_analysis_main.xlsx.
Option 3. Coordination Analysis
Purpose: This option determines the best coordination geometry using four methods provided in cifkit. Excel and JSON files are saved with nearest neighbor information.
Customization: The Excel file contains Δ, which is defined as the interatomic distance minus the sum of the atomic radii. You may provide your own radii values by modifying the radii.xlsx file.
Output 3.1 JSON
Output 3.2 Excel
A screenshot is provided below. Each sheet contains the file name and the formula associated with the file.
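The Δ column described under Customization follows directly from its definition: the interatomic distance minus the sum of the two atomic radii. The sketch below uses a plain dictionary in place of radii.xlsx. The radii values, the distance, and the function name are placeholders chosen for illustration, not values shipped with CBA.

```python
def delta(distance, radius_a, radius_b):
    """Interatomic distance minus the sum of the two atomic radii (same length unit)."""
    return distance - (radius_a + radius_b)


# Placeholder radii in angstroms, standing in for the contents of radii.xlsx.
radii = {"Er": 1.76, "Co": 1.25}

# Hypothetical Er-Co distance in angstroms.
er_co_delta = delta(2.90, radii["Er"], radii["Co"])
```

A negative Δ means the two atoms sit closer together than the sum of their radii, and a positive Δ means they sit farther apart.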
Installation
If you are interested in using Conda with a new environment, run the following:
Contributors
Questions?
Please feel free to reach out via [email protected] for any questions.
Changelog
black. See Pull #12.