Why didn't we use random scaffold split? #47

Open
lvkd84 opened this issue Jan 18, 2022 · 1 comment

Comments

@lvkd84

lvkd84 commented Jan 18, 2022

Hi,
Thank you for publishing such an amazing paper!
I have a question regarding the splitting of the datasets used in the experiments. As shown in the paper and in the code, you used scaffold splitting. However, it would also be reasonable to use random scaffold splitting on the small datasets reported. Is there an intuition for preferring scaffold splitting over random scaffold splitting?

@linfeng-du

I have the same question. I would also like to know whether the results in the paper are averaged over a deterministic scaffold split with different model seeds, or over randomized scaffold splits with different splitting seeds.

I think one possible explanation is that scaffold splitting generates out-of-distribution train/val/test sets, since the molecules in each set contain completely different backbone structures (scaffolds). It might therefore be difficult to tune a model that generalizes well across a number of OOD scenarios (i.e., different splitting seeds using randomized_scaffold_split).
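
To make the distinction concrete, here is a minimal sketch of the two strategies as I understand them. It is not taken from this repository: the function names (`group_by_scaffold`, `assign_groups`, `scaffold_split`, `random_scaffold_split`) and the 80/10/10 fractions are assumptions for illustration, and the scaffold keys are passed in as plain strings (in practice they would be Murcko scaffold SMILES, e.g. computed with RDKit). Both variants keep every scaffold group inside a single split; the only difference is the order in which groups are assigned.

```python
import random
from collections import defaultdict


def group_by_scaffold(scaffolds):
    """Map each scaffold key to the indices of the molecules that share it."""
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    return list(groups.values())


def assign_groups(groups, n, frac_train=0.8, frac_valid=0.1):
    """Fill train, then valid, then test, never splitting a scaffold group."""
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in groups:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Deterministic variant: largest scaffold groups are assigned first."""
    groups = group_by_scaffold(scaffolds)
    # Ties are broken by the smallest molecule index so the order is stable.
    groups.sort(key=lambda g: (-len(g), g[0]))
    return assign_groups(groups, len(scaffolds), frac_train, frac_valid)


def random_scaffold_split(scaffolds, seed, frac_train=0.8, frac_valid=0.1):
    """Randomized variant: group order depends on the splitting seed."""
    groups = group_by_scaffold(scaffolds)
    random.Random(seed).shuffle(groups)
    return assign_groups(groups, len(scaffolds), frac_train, frac_valid)


# Hypothetical toy data: 10 molecules spread over 4 scaffolds.
scaffolds = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
det_train, det_valid, det_test = scaffold_split(scaffolds)
rnd_train, rnd_valid, rnd_test = random_scaffold_split(scaffolds, seed=0)
```

With the deterministic variant there is only one possible split, so any variance in reported results can only come from model seeds. With the randomized variant every splitting seed yields a different OOD partition, so the variance mixes model noise with split difficulty.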
