With multiple ligand copies (SMILES), sometimes get "Failed to construct RDKit reference structure" #102
Comments
Thanks for the report. This happens when rdkit fails to generate a conformer for some random seeds and no fallback idealised coordinates are given in the CCD CIF defining the ligand input. You can work around this by adding idealised coordinates. When there are no conformer coordinates, we cannot generate frames for PAE, and without a frame we give up on generating a confidence. However, that is behavior we could change: we had single-atom ions in mind for that case (where there were no frames in training either). Full ligands should be fine at inference time, as the frames aren't actually used at inference time. But perhaps, given there are no reference coordinates, it's better to have NaNs here, so that users can see from the output that something is different in these cases (likely not as good a prediction).
I have little experience with rdkit, but it certainly fails on a lot of the SMILES and ccdCodes I have tried lately. If I understand correctly, even with ccdCodes that have coordinates, AF3 first tries to generate initial molecular coordinates with rdkit, then fails (quite often in my recent attempts), and only then uses the ccdCodes coordinates? Is it an initial attempt to generate a random molecular conformation for the ligands?
You are correct: the code first tries to generate a conformer for a ligand, and if that fails, it looks for coordinates in the CCD input (see alphafold3/src/alphafold3/model/features.py, line 1523 in cdbcf41).
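The order described in this comment (try a generated conformer first, fall back to the idealised coordinates from the CCD input, otherwise leave the ligand without reference coordinates) can be sketched as follows. This is only an illustration of the control flow, not the AlphaFold 3 code: `pick_reference_coords`, `try_conformer`, and the source labels are hypothetical names made up for the example.

```python
from typing import Callable, List, Optional, Tuple

Coords = List[Tuple[float, float, float]]


def pick_reference_coords(
    try_conformer: Callable[[], Optional[Coords]],
    ccd_ideal_coords: Optional[Coords],
) -> Tuple[Optional[Coords], str]:
    """Return (coords, source) following the order described in the thread.

    try_conformer is a hypothetical callable that returns coordinates,
    or None when conformer generation fails for the current seed.
    """
    coords = try_conformer()
    if coords is not None:
        return coords, "conformer"
    if ccd_ideal_coords is not None:
        # Fallback: idealised coordinates supplied with the CCD input.
        return ccd_ideal_coords, "ccd_ideal"
    # No reference coordinates at all: per the thread, no PAE frames can be
    # built and the confidence metrics for this ligand are not produced.
    return None, "missing"
```

With a SMILES-only ligand there are no idealised coordinates to fall back on, so a failed conformer attempt lands in the last branch, which matches the case reported in this issue.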
This is a band-aid fix, but by adding the |
Input is one protein + N copies of the same ligand.
Depending on the value of N (40, 50, 60, 80, 100, ..., 200), I get between 1 and 6 rdkit warnings during "constructing SMILES reference structure". The warning message is:
Also, if I get one rdkit warning, I also get the following (the number of lines = number of atoms in the ligand):
The structure inference proceeds without warning or error, and the ligands with an rdkit warning have coordinates.
However, all metrics related to that ligand are null in summary_confidences.json:
The number of problematic ligands varies between runs with different ligands, and sometimes between different seeds within the same run, e.g.:
For 30+ runs with N >= 40: they all get at least one warning (with associated null metrics).
For all runs with N <= 30: no rdkit warning.
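To count the affected ligands across many runs without opening each file by hand, a small script along these lines could list which positions hold null in a parsed summary_confidences.json. It is a rough sketch: the `null_entries` helper is hypothetical, it only inspects list-valued fields, and the key names in the example are illustrative rather than a statement of the exact file schema.

```python
import json
from typing import Any, Dict, List


def null_entries(summary: Dict[str, Any]) -> Dict[str, List[int]]:
    """Map each list-valued key to the indices whose entry is (or contains) null."""
    hits: Dict[str, List[int]] = {}
    for key, value in summary.items():
        if not isinstance(value, list):
            continue  # scalar fields are ignored in this sketch
        bad = []
        for i, item in enumerate(value):
            if item is None:
                bad.append(i)
            elif isinstance(item, list) and any(x is None for x in item):
                bad.append(i)  # a row of a per-chain-pair matrix with a null
        if bad:
            hits[key] = bad
    return hits


# Illustrative content only; real files would be read with json.load(open(path)).
example = json.loads('{"chain_iptm": [0.8, null], "chain_ptm": [0.9, null], "ptm": 0.85}')
print(null_entries(example))
```

The returned indices are positions within each list, so they still need to be matched to chain IDs using the chain order of the input.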
The structure of the problematic ligand appears normal.
If it weren't for the null metrics associated with that ligand, I would not worry. Maybe all is fine, and it might just be a problem with the metrics computation routine if there is somehow something "wrong" with that ligand at the start (i.e. "Found identical coordinates: Assigning as colinear.").