Handling duplicated SMILES with Libinvent #166
-
Hi everyone, I am trying to use Libinvent to propose R-groups to decorate a scaffold, with maize as the scoring workflow. As you can see, the first few compounds all have the same structure; only the attachment point in the R-group SMILES differs. Could you help me figure out what is happening?
Replies: 4 comments 6 replies
-
What exactly is the question? I do not see any of your structures, just scaffold SMILES.
-
Hi Hannes, sorry about the long delay, I completely forgot I had asked this question... What I meant is: I was wondering how these duplicates are handled in the REINVENT code, since they all get a score of 0 except for the first one. Could you explain why it has been set up like this? To me it looks like this will give the model "mixed signals" about what to optimise for, making the exploration less efficient. Again, sorry for the delay in replying and thanks for your assistance.
-
So, the idea behind zero-scoring duplicate SMILES is to promote a level of diversity. We ran internal tests with this switched off and found that it did not seem to benefit the learning rate, but it did lower diversity. You also need to keep in mind that the final aggregated total score is a single float value calculated as the average of the individual SMILES scores, so the effect may be rather minimal in practice. If you start sampling an excessive number of duplicates, you are running out of chemical space anyway and should probably stop RL.
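To make the mechanics described above concrete, here is a minimal sketch of zero-scoring duplicates and then averaging the batch. It is not the REINVENT implementation: the function name `zero_score_duplicates`, its `key` parameter, and the example SMILES and scores are all hypothetical and only illustrate the idea under the assumption that the first occurrence keeps its score.

```python
from statistics import mean
from typing import Callable, List, Sequence


def zero_score_duplicates(
    smiles: Sequence[str],
    scores: Sequence[float],
    key: Callable[[str], str] = lambda s: s,
) -> List[float]:
    """Keep the score of the first occurrence and set later duplicates to 0.

    `key` is a hypothetical hook that maps a SMILES string to the identity
    used for duplicate detection. The default compares raw strings; a
    canonicalisation function could be passed instead.
    """
    seen = set()
    filtered = []
    for smi, score in zip(smiles, scores):
        identity = key(smi)
        if identity in seen:
            # Duplicate of something already sampled in this batch.
            filtered.append(0.0)
        else:
            seen.add(identity)
            filtered.append(score)
    return filtered


# Made-up batch: the same molecule sampled three times plus one other.
batch = ["c1ccccc1O", "c1ccccc1O", "c1ccccc1O", "c1ccccc1N"]
raw_scores = [0.9, 0.9, 0.9, 0.6]

filtered_scores = zero_score_duplicates(batch, raw_scores)
batch_average = mean(filtered_scores)

print(filtered_scores)  # [0.9, 0.0, 0.0, 0.6]
print(batch_average)    # 0.375
```

In this sketch the default `key` only catches identical strings. Catching the same molecule written with different SMILES would need a canonicalising `key`, which is left out here to keep the example free of dependencies.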
-
Hi Hannes, thanks for your reply! Just to clarify what you mean: is the model being updated on the final "complete" (scaffold + R-group) SMILES using the average score of all "complete" SMILES in each step? I.e.:
Will the final score that the model uses be 33 [(100 + 0 + 0) / 3]?