scikit-learn classes for molecular vectorization using RDKit

enricogandini/scikit-mol

 
 


scikit-mol


Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model can predict directly on RDKit molecules or SMILES strings.

For example, given the necessary scikit-learn and scikit-mol imports, and with RDKit mol objects in the lists mol_list_train and mol_list_test:

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

  • Descriptors
    • MolecularDescriptorTransformer

  • Fingerprints
    • MorganFingerprintTransformer
    • MACCSKeysFingerprintTransformer
    • RDKitFingerprintTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MHFingerprintTransformer
    • SECFingerprintTransformer
    • AvalonFingerprintTransformer

  • Conversions
    • SmilesToMol

  • Standardizer
    • Standardizer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install the latest tagged release with pip:

pip install scikit-mol

Bleeding edge (latest development version, installed directly from GitHub):

pip install git+https://github.com/EBjerrum/scikit-mol.git

Documentation

A collection of notebooks in the notebooks directory demonstrates different aspects and use cases.

Contributing

More information about how to contribute to the project is available in CONTRIBUTION.md.

BUGS

There are probably still some. Please check the issues on GitHub and report any new ones there.

Contributors:
