GitHub - ChemistryLLMs/SMILES-probing

Chemical Language Models Have Problems with Chemistry: A Case Study on Molecule Captioning Task

Augmentations

'./code/augmentation.py' creates four types of SMILES augmentations:

RDKit canonicalization
explicit addition of hydrogens
kekulization
replacement of ring (cycle) identifiers with random numbers

A full description is provided in the paper.
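
The four augmentations listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of './code/augmentation.py': the function names `randomize_ring_labels` and `rdkit_variants` are hypothetical, the label range 1..99 is an assumption, and the paper should be consulted for the exact procedure. The first three variants rely on standard RDKit calls; the ring-identifier replacement is written in pure Python and skips digits inside bracket atoms (isotopes, hydrogen counts, charges), since only digits outside brackets are ring-closure labels.

```python
import random
import re

# Tokenizer: bracket atoms first, then ring-closure labels, then any other char.
_TOKEN = re.compile(r"(?P<bracket>\[[^\]]*\])|(?P<ring>%\d{2}|\d)|(?P<other>.)")


def _format_label(n: int) -> str:
    """SMILES writes labels below 10 as one digit and larger ones as %nn."""
    return str(n) if n < 10 else f"%{n}"


def randomize_ring_labels(smiles: str, rng: random.Random) -> str:
    """Replace every ring-closure label with a random one (hypothetical helper).

    Assumption: new labels are drawn from 1..99 and must not collide with
    a ring that is still open, so the molecule itself is unchanged.
    """
    open_rings = {}  # original label -> replacement, for rings not yet closed
    out = []
    for match in _TOKEN.finditer(smiles):
        token = match.group()
        if match.lastgroup != "ring":
            out.append(token)  # atoms, bonds, branches, bracket atoms
            continue
        key = int(token.lstrip("%"))
        if key in open_rings:
            # Second occurrence: the ring closes, reuse its replacement label.
            out.append(_format_label(open_rings.pop(key)))
        else:
            # First occurrence: the ring opens, pick an unused random label.
            in_use = set(open_rings.values())
            new = rng.choice([n for n in range(1, 100) if n not in in_use])
            open_rings[key] = new
            out.append(_format_label(new))
    return "".join(out)


def rdkit_variants(smiles: str) -> dict:
    """Canonical, explicit-hydrogen, and kekulized SMILES (requires RDKit)."""
    # Imported lazily so the pure-Python helper above works without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse: {smiles!r}")
    kekulized = Chem.Mol(mol)  # copy, because Kekulize modifies in place
    Chem.Kekulize(kekulized, clearAromaticFlags=True)
    return {
        "canonical": Chem.MolToSmiles(mol),
        "explicit_h": Chem.MolToSmiles(Chem.AddHs(mol)),
        "kekulized": Chem.MolToSmiles(kekulized, kekuleSmiles=True),
    }


if __name__ == "__main__":
    print(randomize_ring_labels("c1ccc2ccccc2c1", random.Random(0)))
```

All four outputs describe the same molecule as the input string, so any change in a model's caption across them reflects sensitivity to the SMILES notation rather than to the chemistry.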

Experimental dataset

The experimental dataset is provided in the "data" folder and was created by the augmentation code. The original (non-augmented) sample is the test split of ChEBI-20.

Model evaluation

Four models are used in the experiment:

'laituan245/molt5-base-smiles2caption'
'laituan245/molt5-large-smiles2caption'
'GT4SD/multitask-text-and-chemistry-t5-base-standard'
'GT4SD/multitask-text-and-chemistry-t5-base-augm'

Code for model inference is located in the "code" folder.
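
For orientation, captioning with one of the checkpoints above might look like the sketch below. It is not the inference script from the "code" folder: the function name `caption_smiles` and the generation settings (`num_beams=5`, `max_length=512`) are assumptions, and it uses the generic Hugging Face `transformers` seq2seq interface. The GT4SD multitask checkpoints may expect a task-specific prompt around the SMILES string; that is not handled here, so check their model cards before comparing outputs.

```python
# The four checkpoints named in this README.
MODEL_NAMES = [
    "laituan245/molt5-base-smiles2caption",
    "laituan245/molt5-large-smiles2caption",
    "GT4SD/multitask-text-and-chemistry-t5-base-standard",
    "GT4SD/multitask-text-and-chemistry-t5-base-augm",
]


def caption_smiles(smiles_list, model_name=MODEL_NAMES[0],
                   num_beams=5, max_length=512):
    """Generate one caption per SMILES string (hypothetical helper).

    Downloads the checkpoint on first use; requires `transformers`,
    `torch`, and `sentencepiece` to be installed.
    """
    # Imported lazily so that listing MODEL_NAMES needs no heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    model.eval()

    # Tokenize the whole batch, padding to the longest SMILES string.
    inputs = tokenizer(list(smiles_list), return_tensors="pt",
                       padding=True, truncation=True, max_length=max_length)
    outputs = model.generate(**inputs, num_beams=num_beams,
                             max_length=max_length)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


if __name__ == "__main__":
    # Ethanol written two equivalent ways; the captions can then be compared.
    for caption in caption_smiles(["CCO", "OCC"]):
        print(caption)
```

Running the same molecule through the function in its original and augmented forms gives a quick check of how stable a model's captions are.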

References

If you use our repository, please cite the following related paper:

@inproceedings{probing,
  title={Chemical Language Models Have Problems with Chemistry: A Case Study on Molecule Captioning Task},
  author={Ganeeva, Veronika and Khrabrov, Kuzma and Kadurin, Artur and Savchenko, Andrey and Tutubalina, Elena},
  booktitle={The Second Tiny Papers Track at ICLR 2024},
  url={https://openreview.net/pdf?id=JoO6mtCLHD}
}

Latest commit: KuzmaKhrabrov, "fix", 1e9859c, Jun 13, 2024 (11 commits)

Name	Last commit message	Last commit date
code	fix	Jun 13, 2024
data	Add files via upload	Apr 22, 2024
images	poster image added	May 3, 2024
LICENSE	Initial commit	Apr 1, 2024
README.md	Update README.md	May 3, 2024