(OTF) Normalization and element references #715

lbluque · 2024-05-24T22:08:09Z

This PR enables on-the-fly (OTF) fitting and estimation of normalization values and element references.

Normalizers and LinearReference modules are now trainer attributes.
This also cleans up the use of linear references, which previously lived inside datasets. They are now saved as part of the checkpoint, so there is no need to insert them into checkpoints after training for testing/inference.
Snuck in a fix for reading ASE Datasets from a list of paths.
Normalization values and/or linear references can be estimated at runtime before training. The config also allows hard-setting a value for the mean or rmsd (root mean square difference). For example, the config below enables this; the forces mean is set to zero, so the estimated rmsd corresponds to the RMS force:

dataset:
  train:
    transforms:
      normalizer:
        fit:
          targets:
            energy: {}
            forces: { mean: 0.0 }
        batch_size: 32
        num_batches: 1000
      element_references:
        fit:
          targets:
            - energy
        batch_size: 32
        num_batches: 1000

Added scripts to fit linear references and/or normalizers using the train dataset in a standard config (with the fitting directive specified above), e.g.:

python src/fairchem/core/scripts/fit_references.py --config path/to/config.yml
python src/fairchem/core/scripts/fit_normalizers.py --config path/to/config.yml --linref-path path/energy_linref.pt

Linear references can also be passed as a file in the dataset/transforms block (for example, if fit with the script above, or from legacy npz files):

      element_references:
        energy:
          file: /path/to/file.pt/or/npz
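
For readers unfamiliar with linear element references, the sketch below shows the general idea under stated assumptions: one reference energy per element is fit by least squares on composition counts, using only the elements present in the data, and is then subtracted from total energies. The function names, the `max_z` parameter, and the toy data are hypothetical; this is not the code added in this PR.

```python
import numpy as np


def fit_element_references(atomic_numbers, energies, max_z=10):
    """Least-squares fit of one reference energy per element (illustrative only)."""
    # counts[i, z] = number of atoms with atomic number z in structure i
    counts = np.zeros((len(atomic_numbers), max_z + 1))
    for i, numbers in enumerate(atomic_numbers):
        counts[i] = np.bincount(numbers, minlength=max_z + 1)

    # Fit only the elements that actually appear; the rest stay at zero.
    present = counts.sum(axis=0) > 0
    refs = np.zeros(max_z + 1)
    solution, *_ = np.linalg.lstsq(
        counts[:, present], np.asarray(energies, dtype=float), rcond=None
    )
    refs[present] = solution
    return refs


def apply_references(numbers, energy, refs):
    """Subtract the summed per-element references from a total energy."""
    return energy - refs[np.asarray(numbers)].sum()


# Toy data: H (Z=1) and O (Z=8) with made-up energies built from
# per-element values of -1.0 and -5.0.
structures = [[1, 1], [8, 8], [1, 1, 8]]
energies = [-2.0, -10.0, -7.0]
refs = fit_element_references(structures, energies)
referenced = apply_references([1, 1, 8], -7.3, refs)
```

The referenced energy is what a model would then be trained on; adding the references back recovers the total energy at inference time.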

Normalization values for multiple targets can also be passed from a single file (the script above generates a dict of targets and normalizers):

      normalizer:
        file: norms.pt

Or they can be passed as individual files (an npz or state_dict.pt containing "mean" and "std"):

      normalizer:
        energy:
          file: energy_norms.pt  # or .npz
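
As a rough sketch of what a per-target normalizer file might contain and how it could be read based on its extension, consider the following. The helper names are hypothetical, the `.pt` branch assumes a plain dict holding "mean" and "std" and needs PyTorch installed, and none of this is the loader implemented in the PR.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_norm_values(path):
    """Read "mean" and "std" from an .npz or .pt file (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".npz":
        with np.load(path) as data:
            return {"mean": float(data["mean"]), "std": float(data["std"])}
    if path.suffix == ".pt":
        import torch  # imported lazily; only needed for .pt files

        state = torch.load(path, map_location="cpu")
        return {"mean": float(state["mean"]), "std": float(state["std"])}
    raise ValueError(f"Unsupported normalizer file extension: {path.suffix}")


def normalize(x, norm):
    return (np.asarray(x, dtype=float) - norm["mean"]) / norm["std"]


def denormalize(x, norm):
    return np.asarray(x, dtype=float) * norm["std"] + norm["mean"]


# Write a toy npz file with made-up values and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    norm_file = Path(tmpdir) / "energy_norms.npz"
    np.savez(str(norm_file), mean=-1.5, std=2.0)
    norm = load_norm_values(norm_file)

energies = np.array([-3.5, -1.5, 0.5])
normalized = normalize(energies, norm)
restored = denormalize(normalized, norm)
```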

Using lin_ref for linear references inside datasets is still supported for backwards compatibility.

TODO:

Make sure that otf_fit does not refit on resubmission
Write unit-tests
Add option to run fit normalizers/element references and save

codecov · 2024-05-24T22:36:50Z

Codecov Report

Attention: Patch coverage is 93.05994% with 22 lines in your changes missing coverage. Please review.

Files	Patch %	Lines
.../fairchem/core/modules/normalization/normalizer.py	92.92%	8 Missing ⚠️
...fairchem/core/modules/normalization/_load_utils.py	80.55%	7 Missing ⚠️
...m/core/modules/normalization/element_references.py	95.69%	4 Missing ⚠️
src/fairchem/core/trainers/ocp_trainer.py	90.00%	2 Missing ⚠️
src/fairchem/core/datasets/ase_datasets.py	87.50%	1 Missing ⚠️

Files	Coverage Δ
src/fairchem/core/common/distutils.py	`56.89% <100.00%> (+2.35%)`	⬆️
src/fairchem/core/modules/transforms.py	`55.17% <100.00%> (ø)`
src/fairchem/core/trainers/base_trainer.py	`89.55% <100.00%> (+0.72%)`	⬆️
src/fairchem/core/datasets/ase_datasets.py	`86.81% <87.50%> (-0.22%)`	⬇️
src/fairchem/core/trainers/ocp_trainer.py	`68.96% <90.00%> (+1.47%)`	⬆️
...m/core/modules/normalization/element_references.py	`95.69% <95.69%> (ø)`
...fairchem/core/modules/normalization/_load_utils.py	`80.55% <80.55%> (ø)`
.../fairchem/core/modules/normalization/normalizer.py	`92.92% <92.92%> (ø)`

... and 1 file with indirect coverage changes

lbluque · 2024-08-02T15:51:18Z

@mshuaibii @misko @wood-b finally here are some validation training runs,
https://fairwandb.org/fairchem/norms-refs-val

we should be set to go now!

wood-b

LGTM. Great job pushing this through and thanks for the validation!

(cherry picked from commit 029d4d3)

* denorm targets in _forward only * linear reference class * atomref in normalizer * raise input error * clean up normalizer interface * add element refs * add element refs correctly * ruff * fix save_checkpoint * reference and dereference * 2xnorm linref trainer add * clean-up * otf linear reference fit * fix tensor device * otf element references and normalizers * use only present elements when fitting * lint * _forward norm and derefd values * fix list of paths in src * total mean and std * fitted flag to avoid refitting normalizers/references on rerun * allow passing lstsq driver * element ref unit tests * remove superfluous type * lint fix * allow setting batch_size explicitly * test applying element refs * normalizer tests * increase distributed timeout * save normalizers and linear refs in otf_fit * remove debug code * fix removing refs * swap otf_fit for fit, and save all normalizers in one file * log loading and saving normalizers * fit references and normalizer scripts * lint fixes * allow absent optim key in config * lin-ref description * read files based on extension * pass seed * rename dataset fixture * check if file is none * pass generator correctly * separate method for norms and refs * add normalizer code back * fix Generator construction * import order * log warnings if multiple inputs are passed * raise Error if duplicate references or norms are set * use len batch * assert element reference targets are scalar * fix name and rename method * load and save norms and refs using same logic * fix creating normalizer * remove print statements * adding new notebook for using fairchem models with NEBs without CatTSunami enumeration (#764) * adding new notebook for using fairchem models with NEBs * adding md tutorials * blocking code cells that arent needed or take too long * warn instead of error when duplicate norm/ref target names * allow timeout to be read from config * test seed noseed ref fits * lotsa refactoring * lotsa fixing * more fixing... 
* num_workers zero to prevent mp issues * add otf norms smoke test and fixes * allow overriding normalization fit values * update tests * fix normalizer loading * use rmsd instead of only stdev * fix tests * correct rmsd calc and fix loading * clean up norm loading and log values * logg linear reference metrics * load element references state dict * fix loading and tests * fix imports in scripts * fix test? * fix test * use numpy as default to fit references * minor fixes * rm torch_tempdir fixture --------- Co-authored-by: Brook Wander <[email protected]> Co-authored-by: Muhammed Shuaibi <[email protected]> Former-commit-id: 4ad6633733df9c76620ee779b6851a119e920f0b

lbluque added 16 commits May 20, 2024 13:50

denorm targets in _forward only

4f0d91a

linear reference class

c5e997b

atomref in normalizer

03a3f66

raise input error

57174cc

clean up normalizer interface

80d71c4

add element refs

2219d2c

add element refs correctly

390a19e

ruff

fb99a52

fix save_checkpoint

bc6b864

reference and dereference

2a7804f

2xnorm linref trainer add

c2914f4

clean-up

8e4f491

otf linear reference fit

578a73f

fix tensor device

7607591

otf element references and normalizers

64eb32d

use only present elements when fitting

8de9a30

lbluque marked this pull request as draft May 24, 2024 22:08

lint

caad844

lbluque and others added 11 commits May 24, 2024 16:06

_forward norm and derefd values

75e72b1

Merge branch 'main' into norms-and-refs

e944a06

fix list of paths in src

ad36406

total mean and std

27b9e7f

fitted flag to avoid refitting normalizers/references on rerun

0ca227a

allow passing lstsq driver

d2af7c9

Merge branch 'main' into norms-and-refs

2913e12

element ref unit tests

4295330

remove superfluous type

75f3a51

lint fix

d7b4a98

Merge branch 'main' of https://github.com/FAIR-Chem/fairchem into norms-and-refs

c26362f

lbluque added 5 commits July 19, 2024 15:15

correct rmsd calc and fix loading

e6c2252

clean up norm loading and log values

02ebac2

logg linear reference metrics

72a5f50

load element references state dict

b82b0dc

fix loading and tests

c2896a5

lbluque requested review from wood-b, mshuaibii and misko July 20, 2024 00:19

lbluque and others added 6 commits July 22, 2024 15:23

fix imports in scripts

f7bae74

fix test?

a8ebdbd

fix test

71d19da

use numpy as default to fit references

6875b93

minor fixes

998c401

Merge branch 'main' into norms-and-refs

86b65e9

misko previously approved these changes Aug 2, 2024

merge upstream

b4f3096

lbluque dismissed misko’s stale review via b4f3096 August 2, 2024 22:22

rm torch_tempdir fixture

f35ce82

wood-b approved these changes Aug 3, 2024

lbluque added this pull request to the merge queue Aug 5, 2024

Merged via the queue into main with commit 029d4d3 Aug 5, 2024
7 checks passed

lbluque deleted the norms-and-refs branch August 5, 2024 04:14

lbluque added a commit that referenced this pull request Aug 6, 2024

(OTF) Normalization and element references (#715)

67ac99a

(cherry picked from commit 029d4d3)

lbluque added a commit that referenced this pull request Aug 6, 2024

(OTF) Normalization and element references (#715)

8388d8f

(cherry picked from commit 029d4d3)

zulissimeta added enhancement New feature or request minor Minor version release labels Aug 13, 2024

lbluque mentioned this pull request Sep 4, 2024

Need tests for loading checkpoints trained with otf normalizers #830

Closed
