Add new functions/methods to featurize module #330

Draft
wants to merge 96 commits into
base: main
Changes from 6 commits
Commits
328f315
add new utils functions to featurize module
naik-aakash Sep 10, 2024
51a7bdd
attempt to fix coverage files missing in CI
naik-aakash Sep 10, 2024
2804eb8
revert workflow changes
naik-aakash Sep 10, 2024
2085d3e
update artifacts version in CI
naik-aakash Sep 10, 2024
67d6d97
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Sep 11, 2024
6a3f177
add method to get unique atom pair ICOHPs (option to scale by reduced…
naik-aakash Sep 11, 2024
b29aa97
add batch unique bonds df method
naik-aakash Sep 11, 2024
6b7a15e
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Sep 11, 2024
a8f9399
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Sep 16, 2024
272ad66
include unique atom pair bond stats
naik-aakash Sep 16, 2024
3823d08
include unique atom pair bond stats
naik-aakash Sep 16, 2024
d15200a
scaled with / instead of *
naik-aakash Sep 16, 2024
fd2d9ba
update ref test durations file
naik-aakash Sep 16, 2024
030af3c
add rm_weighted_icohps args to batch featu
naik-aakash Sep 16, 2024
e834c2e
add tests for core, utils and batch
naik-aakash Sep 16, 2024
1d5aefd
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Oct 1, 2024
832d7f2
Merge branch 'main' into update_featurizers
naik-aakash Oct 18, 2024
be85afa
pre-commit auto-fixes
pre-commit-ci[bot] Oct 18, 2024
c0e0feb
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Oct 26, 2024
fcdc986
update pytest durations file
naik-aakash Oct 26, 2024
8640bda
remove unique bonds featurizers > redundant
naik-aakash Oct 26, 2024
c9441d0
remove redundant tests
naik-aakash Oct 26, 2024
e8cb3d7
update test durations for new tests
naik-aakash Oct 26, 2024
3f6a718
add BWDF computation featurizer
naik-aakash Oct 26, 2024
d6fdc9b
add tests for BWDF featurizer
naik-aakash Oct 26, 2024
ff6be02
reduce test splits
naik-aakash Oct 26, 2024
60b3a93
add option to get site based bwdf
naik-aakash Oct 28, 2024
a116be5
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Oct 29, 2024
2149508
fix BWDF implementation
naik-aakash Oct 30, 2024
aac8567
fix test
naik-aakash Oct 30, 2024
c79a750
add option to get bwdf for specific label and none normalization
naik-aakash Oct 30, 2024
82204dd
add test for label bwdf
naik-aakash Oct 30, 2024
388c15d
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Nov 5, 2024
0ff13ff
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Nov 18, 2024
d5ccf32
add option to normalize with EIN
naik-aakash Nov 21, 2024
619cf2c
minor refactor to reduce code repetition and remove commented lines
naik-aakash Nov 24, 2024
23758cc
tweak test to refactored outputs
naik-aakash Nov 24, 2024
0efa7d0
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Nov 24, 2024
085393e
add bwdf dist stats
naik-aakash Nov 25, 2024
2a6e5f9
update test to accomate noise exclusion
naik-aakash Nov 25, 2024
53f2b54
add counts norm
naik-aakash Nov 26, 2024
fa21a80
adapt test
naik-aakash Nov 26, 2024
758bb33
fix EIN normalization
naik-aakash Nov 27, 2024
008c584
handle zerodevision errors on EIN norm
naik-aakash Dec 2, 2024
e09ddd3
include tests in lint
naik-aakash Dec 2, 2024
a5b10ad
update bwdf tests
naik-aakash Dec 2, 2024
7761328
add missing doc-strings
naik-aakash Dec 2, 2024
35749d2
pre-commit auto-fixes
pre-commit-ci[bot] Dec 2, 2024
e5976b1
exclude test data from pre-commit
naik-aakash Dec 2, 2024
0a67fd0
add tests
naik-aakash Dec 2, 2024
cf5fb8e
rename method for getting bwdf stats
naik-aakash Dec 2, 2024
b259744
add icoxx feat exception test
naik-aakash Dec 2, 2024
8926ddf
Merge branch 'JaGeo:main' into update_featurizers
naik-aakash Dec 3, 2024
4cc548d
pre-commit auto-fixes
pre-commit-ci[bot] Dec 3, 2024
2736782
address pre-commit remapping warning
naik-aakash Dec 3, 2024
52757bc
fix linting
naik-aakash Dec 3, 2024
51effde
tackle incorrect icoxxlist trans vectors
naik-aakash Dec 4, 2024
90df179
update tests
naik-aakash Dec 4, 2024
30a7805
rename variables for clarity
naik-aakash Dec 5, 2024
2b70cb2
accept suggestion
naik-aakash Dec 6, 2024
7aa750a
move CoxxFingerprint to featurize.utils
naik-aakash Dec 6, 2024
78e5569
replace POSCAR.gz with CONTCAR.gz
naik-aakash Dec 6, 2024
bc9a163
change default structure file read to CONTCAR
naik-aakash Dec 6, 2024
b057c53
adapt tests to default structure file name change
naik-aakash Dec 6, 2024
9fc6381
replace POSCAR with CONTCAR
naik-aakash Dec 6, 2024
294848b
replace POSCAR with CONTCAR in examples
naik-aakash Dec 6, 2024
7097dd5
adapt tutorials to match the change
naik-aakash Dec 6, 2024
c60e7b5
change structure default in cli.py from POSCAR to CONTCAR
naik-aakash Dec 6, 2024
8358ae8
remove ein norm and unique option
naik-aakash Dec 6, 2024
04a3223
remove debug print
naik-aakash Dec 6, 2024
5dace1b
revise tests
naik-aakash Dec 6, 2024
c04a42c
use np.isclose for numerical comparisons
naik-aakash Dec 6, 2024
0c3ca4e
final numerical comparisons fix
naik-aakash Dec 7, 2024
8ac99ab
accomodate minor changes in expected results due to numerical prec is…
naik-aakash Dec 7, 2024
9657efe
fix [0,0,0] rev trans being left out, refactor code a bit for clarity
naik-aakash Dec 8, 2024
68ad785
update test
naik-aakash Dec 8, 2024
40710ef
update tests
naik-aakash Dec 8, 2024
4c37cec
fix number of bins, include wasserstein distance to rdf
naik-aakash Dec 10, 2024
9b82fe6
update tests
naik-aakash Dec 10, 2024
f1f9dca
Added FeaturizeIcoxxlist method that returns BWDF values sorted by di…
kaueltzen Dec 10, 2024
772f77f
Added FeaturizeIcoxxlist method that returns distances sorted by BWDF…
kaueltzen Dec 10, 2024
e879b3e
Changed error handling, added tests.
kaueltzen Dec 10, 2024
4cbfad0
Added test case with pos. and neg. BWDF values, some refactoring and …
kaueltzen Dec 11, 2024
2517fed
Moved exception testing
kaueltzen Dec 11, 2024
528b414
Refactoring and started to add sorted feats into batch
kaueltzen Dec 11, 2024
cfd891f
remove EXE002
naik-aakash Dec 11, 2024
6736817
fix doc-string formatting
naik-aakash Dec 11, 2024
b272d6a
adapt test to reflect lastest change
naik-aakash Dec 11, 2024
e90a660
filter encoding warnings of pymatgen>monty update (new release of pym…
naik-aakash Dec 11, 2024
307bae0
Added more core tests
kaueltzen Dec 11, 2024
2d40372
Added sorting featurizers to batch.py, started adding batch tests.
kaueltzen Dec 12, 2024
8782b93
More batch tests
kaueltzen Dec 12, 2024
3dcc1f5
Batch test fix
kaueltzen Dec 12, 2024
18f63a0
Added batch exception test for sorting featurizer
kaueltzen Dec 12, 2024
7a9947e
accept suggestion
naik-aakash Dec 12, 2024
77a4fbc
accept suggestion
naik-aakash Dec 12, 2024
2 changes: 1 addition & 1 deletion src/lobsterpy/featurize/__init__.py
@@ -3,4 +3,4 @@

"""This package provides the modules for featurzing Lobster data ready for ML studies."""

-from .utils import get_file_paths, get_structure_path
+from .utils import get_electronegativities, get_file_paths, get_reduced_mass, get_structure_path, sort_dict_by_value
75 changes: 74 additions & 1 deletion src/lobsterpy/featurize/core.py
@@ -24,7 +24,7 @@

from lobsterpy.cohp.analyze import Analysis

-from . import get_file_paths
+from . import get_electronegativities, get_file_paths, get_reduced_mass

warnings.filterwarnings("ignore")

@@ -318,6 +318,79 @@ def get_lobsterpy_cba_dict(path_to_lobster_calc: str | Path, bonds: str, orbital

return data

+    @staticmethod
+    def get_unique_bonds_df(
+        path_to_lobster_calc: str | Path,
+        bonds: str,
+        summed_icohps: bool,
+        rm_weighted_icohps: bool,
+        ids: str | None = None,
+    ) -> pd.DataFrame:
+        """
+        Generate a Pandas dataframe with icohps mean / summed icohps for unique bonds.
+
+        :param path_to_lobster_calc: path to lobsterpy lightweight json file
+        :param bonds: "all" or "cation-anion" bonds
+        :param summed_icohps: bool indicating whether to sum the icohps for the unique bonds
+        :param rm_weighted_icohps: bool indicating whether to use reduced mass as weights for icohps
+        :param ids: set index name in the pandas dataframe. Default is None.
+
+        Returns:
+            Returns a pandas dataframe with icohps for unique bonds as columns
+
+        """
+        file_paths = get_file_paths(
+            path_to_lobster_calc=path_to_lobster_calc, requested_files=["structure", "cohpcar", "icohplist", "charge"]
+        )
+
+        try:
+            analyse = Analysis(
+                path_to_poscar=str(file_paths.get("structure")),
+                path_to_icohplist=str(file_paths.get("icohplist")),
+                path_to_cohpcar=str(file_paths.get("cohpcar")),
+                path_to_charge=str(file_paths.get("charge")),
+                cutoff_icohp=0.10,
+                which_bonds=bonds,
+                orbital_resolved=False,
+            )
+
+        except ValueError:
+            analyse = None
+
+        if not ids:
+            ids = Path(path_to_lobster_calc).name
+
+        # define a pandas dataframe
+        df = pd.DataFrame(index=[ids])
+
+        pair_icohps = {}
+        rm_pairs = {}
+        for plot_label in analyse.get_site_bond_resolved_labels():
+            atom_pair = plot_label.split(":")[-1].strip().split("-")
+            pair_rm = get_reduced_mass(atom_pair)
+            pair_en = get_electronegativities(atom_pair)
+            # sort atom names based on electronegativity
+            en_sorted_atom_pair = [x for _, x in sorted(zip(pair_en, atom_pair))]
+            rm_pairs["-".join(en_sorted_atom_pair)] = pair_rm
+            for lab in analyse.get_site_bond_resolved_labels()[plot_label]:
+                val = analyse.chemenv.Icohpcollection.get_icohp_by_label(lab)
+                if "-".join(en_sorted_atom_pair) not in pair_icohps:
+                    pair_icohps["-".join(en_sorted_atom_pair)] = [val]
+                else:
+                    pair_icohps["-".join(en_sorted_atom_pair)].append(val)
+
+        for atom_pair in pair_icohps:
+            if not summed_icohps and not rm_weighted_icohps:
+                df.loc[ids, f"{atom_pair}_icohp_mean"] = np.mean(pair_icohps[atom_pair])
+            elif summed_icohps and not rm_weighted_icohps:
+                df.loc[ids, f"{atom_pair}_icohp_sum"] = np.sum(pair_icohps[atom_pair])
+            elif not summed_icohps and rm_weighted_icohps:
+                df.loc[ids, f"{atom_pair}_rm_icohp_mean"] = np.mean(pair_icohps[atom_pair]) * rm_pairs[atom_pair]
+            else:
+                df.loc[ids, f"{atom_pair}_rm_icohp_sum"] = np.sum(pair_icohps[atom_pair]) * rm_pairs[atom_pair]
+
+        return df
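
Review note: the grouping logic of `get_unique_bonds_df` can be sketched without LOBSTER files. The snippet below is only an illustration under stated assumptions: `ALLEN_EN`, `MASS`, `reduced_mass` and `unique_pair_features` are hypothetical stand-ins (hard-coded approximate values instead of `mendeleev`, a plain dict instead of a dataframe), not lobsterpy API. It mirrors the diff as shown at this commit range, i.e. it multiplies by the reduced mass; the message of commit d15200a ("scaled with / instead of *") suggests this was changed later. It also returns early when no analysis result is available, whereas the method above sets `analyse = None` on `ValueError` and then calls a method on it.

```python
from statistics import fmean

# Hypothetical lookup tables standing in for mendeleev (approximate values).
ALLEN_EN = {"Na": 0.869, "Cl": 2.869, "O": 3.610}
MASS = {"Na": 22.990, "Cl": 35.45, "O": 15.999}


def reduced_mass(a, b):
    """Reduced mass of two elements, looked up in the illustrative MASS table."""
    return MASS[a] * MASS[b] / (MASS[a] + MASS[b])


def unique_pair_features(bond_icohps, summed=False, rm_weighted=False):
    """Aggregate ICOHPs per unique atom pair.

    bond_icohps: iterable of ((atom_a, atom_b), icohp), or None if the
    upstream analysis failed.
    """
    if bond_icohps is None:
        # Guard: nothing to aggregate, return no features instead of crashing.
        return {}

    grouped = {}
    for (a, b), icohp in bond_icohps:
        # Order the pair by electronegativity so "Cl-Na" and "Na-Cl" share a key.
        key = "-".join(sorted((a, b), key=lambda s: (ALLEN_EN[s], s)))
        grouped.setdefault(key, []).append(icohp)

    features = {}
    for key, values in grouped.items():
        stat = sum(values) if summed else fmean(values)
        name = "sum" if summed else "mean"
        if rm_weighted:
            a, b = key.split("-")
            features[f"{key}_rm_icohp_{name}"] = stat * reduced_mass(a, b)
        else:
            features[f"{key}_icohp_{name}"] = stat
    return features


bonds = [(("Cl", "Na"), -0.5), (("Na", "Cl"), -0.7), (("Na", "O"), -0.2)]
features = unique_pair_features(bonds)
```

Here the two Na/Cl bonds end up under the single key `Na-Cl_icohp_mean`, regardless of the order in which the atoms appear in the bond label.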


class CoxxFingerprint(NamedTuple):
"""
38 changes: 38 additions & 0 deletions src/lobsterpy/featurize/utils.py
@@ -7,6 +7,7 @@

from pathlib import Path

+from mendeleev import element
from monty.os.path import zpath


@@ -91,3 +92,40 @@ def get_structure_path(lobster_path: Path) -> Path:
return gz_file_path

raise Exception


+def get_reduced_mass(atom_pair: list[str]) -> float:
+    """
+    Compute reduced mass between a pair of atoms.
+
+    :param atom_pair: list of atomic species symbols in string
+
+    :return: reduced mass
+    """
+    atom1 = element(atom_pair[0])
+    atom2 = element(atom_pair[1])
+    return (atom1.atomic_weight * atom2.atomic_weight) / (atom1.atomic_weight + atom2.atomic_weight)
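
Review note: the formula used here is the usual reduced mass, mu = m1 * m2 / (m1 + m2). A minimal worked example, assuming hard-coded approximate atomic weights instead of a `mendeleev` lookup (the function and constants below are illustrative and not part of the PR):

```python
def reduced_mass(m1: float, m2: float) -> float:
    """Reduced mass mu = m1 * m2 / (m1 + m2), in the units of the inputs."""
    return (m1 * m2) / (m1 + m2)


# Approximate standard atomic weights in u, for illustration only.
M_H = 1.008
M_CL = 35.45

mu_hh = reduced_mass(M_H, M_H)    # equal masses give mu = m / 2
mu_hcl = reduced_mass(M_H, M_CL)  # close to, but below, the lighter mass
```

The reduced mass is always smaller than the lighter of the two masses, so bonds involving a light atom such as H receive a small weight when ICOHPs are multiplied by it.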


+def get_electronegativities(atom_pair: list[str]) -> list[float]:
+    """
+    Get Allen electronegativities for a pair of atoms.
+
+    :param atom_pair: list of atomic species symbols in string
+
+    :return: list of Allen electronegativities
+    """
+    atom1 = element(atom_pair[0])
+    atom2 = element(atom_pair[1])
+    return [atom1.electronegativity_allen(), atom2.electronegativity_allen()]
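
Review note: in `core.py` these values are used to order each atom pair so that the less electronegative element comes first. The sketch below reproduces that `sorted(zip(...))` idiom with a hypothetical hard-coded table (`ALLEN_EN`, approximate values) in place of `mendeleev`; `en_sorted_pair` is an illustrative name, not a function from the PR.

```python
# Approximate Allen electronegativities, hard-coded for illustration only.
ALLEN_EN = {"H": 2.300, "O": 3.610, "Na": 0.869, "Cl": 2.869}


def en_sorted_pair(atom_pair):
    """Return the two symbols ordered by increasing electronegativity."""
    pair_en = [ALLEN_EN[symbol] for symbol in atom_pair]
    # Tuples sort by electronegativity first; equal values fall back to the symbol.
    return [symbol for _, symbol in sorted(zip(pair_en, atom_pair))]


key_a = "-".join(en_sorted_pair(["Cl", "Na"]))
key_b = "-".join(en_sorted_pair(["Na", "Cl"]))
key_same = "-".join(en_sorted_pair(["O", "O"]))
```

Both input orders produce the same key, which is what lets ICOHPs of equivalent bonds be collected under one column name.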


+def sort_dict_by_value(input_dict: dict[str, float]) -> dict:
+    """
+    Sort dictionary by values.
+
+    :param input_dict: input dictionary
+
+    :return: sorted dictionary
+    """
+    return dict(sorted(input_dict.items(), key=lambda item: item[1]))
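
Review note: a short usage sketch of the helper above. The function body is copied from the diff; the `icohps` input is made-up example data.

```python
def sort_dict_by_value(input_dict):
    """Return a new dict whose items are ordered by ascending value."""
    return dict(sorted(input_dict.items(), key=lambda item: item[1]))


# Made-up ICOHP-like values keyed by bond label.
icohps = {"Na-Cl": -0.6, "Na-O": -0.2, "Cl-O": -1.1}

ordered = sort_dict_by_value(icohps)
labels_in_order = list(ordered)
```

The sort is ascending, so the most negative value comes first. The result relies on dicts preserving insertion order (guaranteed since Python 3.7), and the input dict is left unchanged.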