Why didn't we use random scaffold split? #47

lvkd84 · 2022-01-18T21:22:30Z

Hi,
Thank you for publishing such an amazing paper!
I have a question about how the datasets used in the experiments were split. As shown in the paper and in the code, you used scaffold splitting. However, random scaffold splitting would also be reasonable for the small datasets reported. Is there an intuition for preferring scaffold splitting over random scaffold splitting?

linfeng-du · 2022-01-26T02:45:43Z

I have the same question. I would also like to know whether the results in the paper are averaged over deterministic scaffold splits with different model seeds, or over randomized scaffold splits with different splitting seeds.

I think a possible explanation is that scaffold splitting generates out-of-distribution train/val/test sets, since the molecules in them contain completely different backbone structures (scaffolds). It might therefore be difficult to tune a model that generalizes well across a number of OOD scenarios (i.e., different splitting seeds using randomized_scaffold_split).
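To make the distinction concrete, here is a minimal sketch of the two strategies as I understand them. It is not the repository's implementation. It assumes the scaffold string of each molecule has already been computed (for example with RDKit's `MurckoScaffoldSmiles`), and the function names `group_by_scaffold`, `assign_groups`, `scaffold_split`, and `random_scaffold_split` are hypothetical names chosen for illustration, as is the 80/10/10 default.

```python
import random
from collections import defaultdict


def group_by_scaffold(scaffolds):
    """Return lists of molecule indices, one list per distinct scaffold."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    return list(groups.values())


def assign_groups(groups, n_total, frac_train=0.8, frac_valid=0.1):
    """Fill train, then valid, then test, never splitting a scaffold group."""
    train, valid, test = [], [], []
    train_cutoff = frac_train * n_total
    valid_cutoff = (frac_train + frac_valid) * n_total
    for group in groups:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


def scaffold_split(scaffolds, **fractions):
    """Deterministic variant: largest scaffold groups are assigned first."""
    groups = sorted(group_by_scaffold(scaffolds), key=lambda g: (-len(g), g[0]))
    return assign_groups(groups, len(scaffolds), **fractions)


def random_scaffold_split(scaffolds, seed, **fractions):
    """Randomized variant: scaffold groups are shuffled with a splitting seed."""
    groups = group_by_scaffold(scaffolds)
    random.Random(seed).shuffle(groups)
    return assign_groups(groups, len(scaffolds), **fractions)
```

In both variants no scaffold is shared between train, validation, and test. The deterministic version always yields the same partition, so repeated runs can only vary the model seed. The randomized version yields a different OOD partition per splitting seed, which is the scenario my question above is about.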
