Why do we manually loop through each batch? #22

Open
linminhtoo opened this issue Apr 28, 2022 · 3 comments

Comments


linminhtoo commented Apr 28, 2022

Hello authors,

I am in the process of modifying dMaSIF for the downstream task of protein-ligand binding affinity prediction. While reading and modifying your code, I noticed that in data_iteration.iterate (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L290) we extract the individual proteins/protein pairs from each batch and then run a separate forward pass on each of them.

Effectively, doesn't this equate to a batch_size of 1? Even though the benchmark_scripts set the --batch_size argument to 64, it is not actually used, and the batch_size is hardcoded to 1 (https://github.com/FreyrS/dMaSIF/blob/master/main_training.py#L51).

Is there a reason for doing this, rather than just doing a forward pass on the entire batch?

As a side note, this line (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L299) also suggests that the code is hardcoded to a batch_size of 1. My understanding is that it should be
P1["rand_rot"] = protein_pair.rand_rot1.view(-1, 3, 3)[protein_it]
instead of
P1["rand_rot"] = protein_pair.rand_rot1.view(-1, 3, 3)[0]
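To illustrate the difference between the two indexing choices, here is a framework-free sketch. It assumes (this is not confirmed by the repository) that rand_rot1 holds one 3x3 rotation per protein, concatenated along the first axis. The helper view_as_3x3 and the sample matrices are hypothetical stand-ins for the tensor call .view(-1, 3, 3).

```python
# Framework-free sketch (assumption: one 3x3 rotation per protein,
# concatenated into a single flat buffer when samples are batched).

def view_as_3x3(flat):
    """Split a flat list of 9*B numbers into B row-major 3x3 matrices."""
    if len(flat) % 9 != 0:
        raise ValueError("length must be a multiple of 9")
    return [
        [flat[start + 3 * r: start + 3 * r + 3] for r in range(3)]
        for start in range(0, len(flat), 9)
    ]

# Hypothetical batch of two proteins: identity and a 90-degree rotation about z.
identity = [1, 0, 0, 0, 1, 0, 0, 0, 1]
rot_z90 = [0, -1, 0, 1, 0, 0, 0, 0, 1]
rand_rot1 = identity + rot_z90  # concatenated along the first axis

mats = view_as_3x3(rand_rot1)
for protein_it in range(len(mats)):
    hardcoded = mats[0]           # what indexing with [0] selects
    per_item = mats[protein_it]   # what indexing with [protein_it] selects
    print(protein_it, hardcoded == per_item)
```

With a single protein per batch the two choices coincide, which may be why the hardcoded index goes unnoticed; with two or more proteins, every protein after the first would receive the first protein's rotation.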

Thank you, I appreciate your help.

@Wendysigh

Hi @linminhtoo, I have the same question as you. The scripts set batch_size=64, while the batch size is actually hardcoded to 1.

I noticed that this line (https://github.com/FreyrS/dMaSIF/blob/master/data_iteration.py#L353) also enforces a batch size of 1 when optimizing the model.
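As a toy illustration of why updating the model inside the per-protein loop amounts to a batch size of 1, the sketch below compares one optimizer step per sample against a single step on the batch-averaged gradient. It is not dMaSIF code: the one-parameter model y = w * x, the data, and the function names are all hypothetical.

```python
# Toy model y = w * x with squared-error loss and plain SGD, no framework.

def grad(w, x, y):
    """Derivative of (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x

def step_per_sample(w, batch, lr):
    """Loop over the loaded batch and update after every sample
    (effective batch size 1, even though several samples were loaded)."""
    for x, y in batch:
        w -= lr * grad(w, x, y)
    return w

def step_whole_batch(w, batch, lr):
    """A single update using the gradient averaged over the whole batch."""
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical data, true w = 2
w_loop = step_per_sample(0.0, batch, lr=0.01)
w_batch = step_whole_batch(0.0, batch, lr=0.01)
print(w_loop, w_batch)
```

The two procedures end at different weights after seeing the same data: the loop applies three updates, each computed from a single sample, while the batched version applies one averaged update.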

Owner

FreyrS commented Jun 10, 2022

Hi @linminhtoo,

You're absolutely right: we generate the surfaces for a whole batch but then iterate through them individually.
The reason is that I found that a larger batch size makes training unstable.
In a follow-up work that I'm currently working on, we were able to solve these issues, and training with larger batch sizes is no longer a problem. I'll try to update this code accordingly after we finish the experiments for the follow-up.


camel2000 commented Sep 15, 2023

@FreyrS @linminhtoo @Wendysigh @jeanfeydy
I modified the code to run inference in batches, but I found that the output embeddings change with the batch_size. Is this normal?

batch_size=1:
[image]

batch_size=2:
[image]

(Both outputs are for the first example in the masif-site test.)
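One way to judge such a discrepancy is to measure its size. Floating-point addition is not associative, so a different batch layout can change results in the last few digits without anything being wrong; a much larger gap would more likely point to a batching bug. The sketch below is a minimal, framework-free check. The embedding values and the helper names max_abs_diff and all_close are hypothetical, and the tolerances are illustrative rather than recommended.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally shaped 2D lists."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def all_close(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """True if every pair of elements agrees within the given tolerances."""
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for ra, rb in zip(a, b)
        for x, y in zip(ra, rb)
    )

# Hypothetical embeddings of the same point cloud from two runs.
emb_bs1 = [[0.1 + 0.2, 1.0], [2.5, -0.75]]   # 0.1 + 0.2 carries a rounding error
emb_bs2 = [[0.3, 1.0], [2.5, -0.75]]
emb_bad = [[0.9, 1.0], [2.5, -0.75]]         # a clearly different result

print(max_abs_diff(emb_bs1, emb_bs2), all_close(emb_bs1, emb_bs2))
print(max_abs_diff(emb_bs1, emb_bad), all_close(emb_bs1, emb_bad))
```

Reporting the maximum absolute difference alongside the screenshots would make it easier to tell rounding noise from a real mismatch.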
