Following the README instructions, I downloaded the CrossDocked dataset, unzipped it, and am trying to run the preprocessing script on it with and without the `--ca_only` flag. One run completes without errors, but the other fails with the following error:
KeyError "'R' not in amino acid dict (.data/crossdocked_pocket10/WNK1_HUMAN_202_483_0/5tf9_A_rec_5wdy_a6s_lig_tt_min_0_pocket10.pdb, .data/crossdocked_pocket10/WNK1_HUMAN_202_483_0/5tf9_A_rec_5wdy_a6s_lig_tt_min_0.sdf)" WNK1_HUMAN_202_483_0/5tf9_A_rec_5wdy_a6s_lig_tt_min_0_pocket10.pdb WNK1_HUMAN_202_483_0/5tf9_A_rec_5wdy_a6s_lig_tt_min_0.sdf
#failed: 10: 100%|█████████████| 10/10 [00:00<00:00, 128.31it/s]
Traceback (most recent call last):
File "/home/stratis/repos/DiffSBDD/process_crossdock.py", line 364, in <module>
lig_coords = np.concatenate(lig_coords, axis=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: need at least one array to concatenate
It looks like in the second case the script fails to find certain entries in the amino acid dict and therefore skips every protein-ligand complex. That leaves `lig_coords` as an empty list, which `np.concatenate` cannot handle.
Looking at the `dataset_params` dictionary, there seem to be two sets of preprocessing parameters: `crossdock_full` and `crossdock`. If I change line 24 of the preprocessing script from `dataset_info = dataset_params['crossdock_full']` to `dataset_info = dataset_params['crossdock']`, preprocessing with `--ca_only` runs without any errors, but I'm not sure the resulting data is preprocessed correctly. Is something wrong with the preprocessing script, or am I doing something wrong on my side?
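To make the suspected failure mode concrete, here is a minimal sketch of how I understand it. This is not the code from `process_crossdock.py`: the two encoder dictionaries, the helper names, and the toy data are all hypothetical stand-ins I made up for illustration. It only shows how a key missing from the encoder can cause every complex to be skipped, so that `np.concatenate` receives an empty list.

```python
import numpy as np

# Hypothetical encoders (assumption): one keyed by element symbols,
# one keyed by one-letter residue codes. The real dictionaries live
# in the repository and may look different.
ELEMENT_ENCODER = {"C": 0, "N": 1, "O": 2, "S": 3}
RESIDUE_ENCODER = {"A": 0, "G": 1, "K": 2, "R": 3}


def encode_pocket(residues, encoder):
    """Map residue codes to integers; raise KeyError on an unknown code."""
    try:
        return np.array([encoder[r] for r in residues])
    except KeyError as e:
        raise KeyError(f"{e} not in amino acid dict")


def collect_ligand_coords(complexes, encoder):
    """Keep ligand coordinates only for complexes whose pocket encodes cleanly."""
    lig_coords, n_failed = [], 0
    for residues, coords in complexes:
        try:
            encode_pocket(residues, encoder)
        except KeyError:
            n_failed += 1  # the complex is skipped, as in the '#failed' counter
            continue
        lig_coords.append(coords)
    return lig_coords, n_failed


def concatenate_or_fail(arrays, n_failed):
    """Fail with a clearer message than NumPy's when everything was skipped."""
    if not arrays:
        raise RuntimeError(
            f"all {n_failed} complexes were skipped; check the encoder in use"
        )
    return np.concatenate(arrays, axis=0)


# Toy data (assumption): two pockets given as residue codes plus ligand coordinates.
complexes = [
    (["R", "A"], np.zeros((3, 3))),
    (["K", "G"], np.ones((2, 3))),
]

kept_bad, failed_bad = collect_ligand_coords(complexes, ELEMENT_ENCODER)
kept_ok, failed_ok = collect_ligand_coords(complexes, RESIDUE_ENCODER)
all_coords = concatenate_or_fail(kept_ok, failed_ok)
```

With the mismatched encoder, `kept_bad` stays empty, and passing it to `np.concatenate` raises the same `ValueError` as in the traceback above. A guard like `concatenate_or_fail` would at least make the root cause easier to spot.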
Hi Stratis,
I think the process_crossdock.py file is indeed outdated and should be updated. As far as I can tell, your solution should be fine as a temporary fix because dataset_params['crossdock'] contains the correct amino acid types required for the coarse-grained model (maybe @yuanqidu can confirm). We will try to upload a correct version as soon as possible.
Sorry for the inconvenience!