Generate compounds containing a substructure #64

ypevzner · 2024-04-25T16:08:08Z

ypevzner
Apr 25, 2024

Hello,
Is it possible to tell REINVENT to generate compounds that contain a specific substructure?
It seems there's a MatchingSubstructure config parameter, but that seems to be an exclusion criterion rather than an inclusion one, unless I'm misunderstanding it.
Thank you.
Yuri.

Answered by halx

halx · 2024-04-25T17:36:34Z

halx
Apr 25, 2024
Maintainer

Hi,

welcome to the REINVENT community and many thanks for your interest in the software.

MatchingSubstructure is implemented as a penalty (currently the only one), which means that the final score is multiplied by 1.0 if the generated compound matches the SMARTS pattern (a single string) or by 0.5 if it does not. So it seems that this is what you are looking for.

Many thanks,
Hannes.
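
The multiplication described above can be illustrated with a minimal Python sketch. This is not REINVENT's actual code: the function name `apply_substructure_penalty` and its parameters are hypothetical, and the `matches` flag is assumed to come from a SMARTS matcher (for example RDKit's `HasSubstructMatch`) run on the generated compound.

```python
def apply_substructure_penalty(
    score: float, matches: bool, penalty_factor: float = 0.5
) -> float:
    """Keep the score on a SMARTS match, otherwise scale it by the penalty factor.

    Hypothetical helper for illustration only; 0.5 is the factor quoted in the thread.
    """
    multiplier = 1.0 if matches else penalty_factor
    return score * multiplier


# A compound with total score 0.8 that contains the substructure keeps its score.
print(apply_substructure_penalty(0.8, matches=True))   # 0.8
# The same score without the substructure is halved.
print(apply_substructure_penalty(0.8, matches=False))  # 0.4
```

Read this way, the component works as an inclusion criterion: non-matching compounds are not discarded, only down-weighted.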

5 replies

ypevzner Apr 29, 2024
Author

Thank you for your response, Hannes. It sounds like this penalty is applied last. When docking is also performed as part of scoring, is it possible to have this penalty applied before docking, so that a compound is thrown out before it even gets to docking? Docking is a rather time-consuming step, and it would be nice not to waste time on it if the generated compound doesn't have the matching substructure. Thank you!
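
What is being asked for here amounts to a pre-filter in front of the expensive scoring step. The following is a rough Python sketch of that idea only, not of how REINVENT orders penalties and scoring components. All names (`score_with_prefilter`, `fake_docking`, `contains_benzene_text`) are hypothetical, and the text check is a crude stand-in for a real SMARTS match.

```python
from typing import Callable, List


def score_with_prefilter(
    smiles_batch: List[str],
    passes_filter: Callable[[str], bool],
    expensive_score: Callable[[str], float],
) -> List[float]:
    """Call the expensive scorer only for molecules that pass the filter.

    Rejected molecules receive 0.0 without the expensive scorer being invoked.
    """
    scores = []
    for smiles in smiles_batch:
        if passes_filter(smiles):
            scores.append(expensive_score(smiles))
        else:
            scores.append(0.0)
    return scores


# Toy stand-ins so the sketch runs without any chemistry toolkit.
docking_calls: List[str] = []


def fake_docking(smiles: str) -> float:
    """Pretend docking: record the call and return a constant score."""
    docking_calls.append(smiles)
    return 0.9


def contains_benzene_text(smiles: str) -> bool:
    """Crude substring check, NOT a real SMARTS substructure match."""
    return "c1ccccc1" in smiles


batch = ["c1ccccc1O", "CCO", "CC(=O)Nc1ccccc1"]
scores = score_with_prefilter(batch, contains_benzene_text, fake_docking)
print(scores)              # [0.9, 0.0, 0.9]
print(len(docking_calls))  # 2, the docking stand-in never saw "CCO"
```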

enricogandini Oct 4, 2024

Thanks @ypevzner for asking this question, and @halx for answering it so clearly.

For what it's worth, I had the same impression as @ypevzner: that MatchingSubstructure was an exclusion criterion. I think this misunderstanding comes from reading the SCORING.md file.

MatchingSubstructure: penalty applied to final score when SMARTS pattern is found

In my opinion this sentence is a bit misleading. By reading it, I understand "the score will be penalized if the SMARTS pattern is found, otherwise the score will not be penalized".

Instead, from @halx's answer, I understand that the "penalty mechanism" means that, if the SMARTS pattern is found, the final score will not be changed (i.e., multiplied by 1). If the SMARTS pattern is not found, the final score will be penalized (i.e., multiplied by 0.5).

Maybe a better description could be included in SCORING.md? Something along the lines of

MatchingSubstructure: preserve the final score if the SMARTS pattern is found, otherwise penalize it.

halx Oct 5, 2024
Maintainer

Hi,

the phrasing could be better, I agree. At this point I would also restate that we are happy to include such changes via pull requests.

Many thanks,
Hannes.

enricogandini Oct 21, 2024

Thanks @halx, I created a small PR to address this: #150

halx Oct 22, 2024
Maintainer

Many thanks for this. Could you please also mention that the penalty factor is 0.5?

halx · 2024-04-30T05:43:11Z

halx
Apr 30, 2024
Maintainer

Yes, penalties are applied last in the current implementation. What you are really asking for is having this as a filter which would run first and would ensure that further scoring only happens for molecules that pass the filter. Now, there are several things here to consider.

It takes a while until the agent learns to generate molecules with the preferred pattern. Compounds not matching the SMARTS pattern will still be scored, with the final score multiplied by 0.5. So there will be some sampling inefficiency in the RL run. You can easily check that by running only the penalty (probably together with some other scoring function to create sensible molecules). Also, a matching substructure alone will not guarantee a sensible docked structure.

A major reason why we have staged learning, aka curriculum learning, is to start out with scoring components that are cheap to compute and then phase in computationally more demanding components at a later stage. The idea here is obviously to invoke e.g. docking only at a time when the agent is expected to generate molecules with the preferred properties with high probability.
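
The staged idea can be sketched as a toy Python loop. This is a simplified illustration under stated assumptions, not REINVENT's implementation or configuration format: the `Stage` fields, `run_stages`, and `toy_mean_score` are hypothetical names, and the linearly rising score merely stands in for an agent that improves over time.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One hypothetical learning stage with its own scoring components."""
    name: str
    components: List[str]
    target_score: float  # move on once the mean score reaches this value
    max_steps: int       # give up on the stage after this many steps


def run_stages(
    stages: List[Stage], mean_score: Callable[[Stage, int], float]
) -> List[Tuple[str, int]]:
    """Run stages in order and report how many steps each one used."""
    history = []
    for stage in stages:
        steps_used = stage.max_steps
        for step in range(1, stage.max_steps + 1):
            if mean_score(stage, step) >= stage.target_score:
                steps_used = step
                break
        history.append((stage.name, steps_used))
    return history


def toy_mean_score(stage: Stage, step: int) -> float:
    """Toy learning curve: the mean score rises by 0.1 per step, capped at 1.0."""
    return min(1.0, step / 10)


stages = [
    # Cheap components first, so the agent learns the substructure quickly.
    Stage("cheap", ["MatchingSubstructure", "QED"], target_score=0.7, max_steps=20),
    # Docking is only phased in once the first stage has finished.
    Stage("with_docking", ["MatchingSubstructure", "docking"], target_score=0.5, max_steps=10),
]
print(run_stages(stages, toy_mean_score))  # [('cheap', 7), ('with_docking', 5)]
```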

What you really seem to be asking for is a fixed sub-pattern in the molecule. It may therefore be possible to use our constrained (conditioned) priors Linkinvent and/or Libinvent. The more "natural" one would seem to be Linkinvent, as the idea there is to link two fragments with a common scaffold/linker (think PROTACs). But its current implementation requires exactly two fragments, and we will support, I believe, 1-4 fragments only with a new prior to be published later. The idea of Libinvent is to decorate a scaffold with R-groups. Here we already support 1-4 attachments. I am a bit doubtful whether this can work for you, because the training data for Libinvent may not support the type of molecules you are after. But it may be worth a try to see if you can push an RL agent into the chemical space you are looking for.
