Skip to content

Latest commit

 

History

History
16 lines (11 loc) · 642 Bytes

README.md

File metadata and controls

16 lines (11 loc) · 642 Bytes

Regression

The simulation data is obtained from https://www.osti.gov/dataexplorer/biblio/dataset/1198813

The embedding data is generated for each model

The regression task uses a linear model to predict the energy, and we compare the cross-validated MSE on simulation data only with combined simulation data and LLM embeddings

  • Apply LLM embeddings to improve regression, e.g.
python llm-regression.py --data FePt_lsms.csv --emb FePt_emb_matscibert.csv

MSE on simulation data:  0.22071666644181817
MSE on simulation data + LLM embedding:  0.20490496034455158
Improvement: 7%