testing DeepRank-Mut #33
Comments
Hello and thanks for showing interest in DeepRank-Mut. |
Thank you for your help. |
To design a script, let me start by asking how your data is organized.
Do you perhaps have a sample of your data table? That could help us design a preprocessing script. |
I have all the requested information. Please find attached a zip archive with the PDB and PSSM files, as well as a tentative script for the first two steps (it contains the information about the known mutations, but I don't know how to proceed with more than one mutation). I have no idea how to write the third script. |
That's an odd looking PSSM. How does one read this?
|
It's in JSON format instead of a matrix. We will convert it if necessary. But it is not mandatory, right? |
I think you need to convert it. Sorry! |
I've looked through your generate.py script. You can use it for preprocessing known variant data (1) just as well as for preprocessing unknown variant data (3), with only slight modifications for (3): A) leave out the |
|
|
Hi, thank you for providing the 3 scripts. I created the table.csv and the generate.py. Here is the error I'm getting now:
Also, two other minor points:
It would be very useful to have a working example of these scripts, if possible... |
Sorry about the syntax error. I fixed it. I'm trying to fix the pytest script, but it will take some time. Sorry, currently things are quite busy on our side. |
Sorry, there are still some errors. python generate.py I don't see how to solve it. There is also an indentation error here: But that was easy to solve. |
Indeed. I pushed the fix. Sorry! |
Ok, now it's running. But I got this error.
I don't understand why my molecules are removed. The substitutions that I have indicated in the table.csv are valid. |
Were any logs created? If not, then add this to the beginning of your script:
This should output the errors that cause your variants to be skipped. |
|
This suggests that something might have changed in the output of pdb2sql. But I don't see this happening at my end.
|
I'm not able to use pytest. I'm attaching the PDB file, the table file and the generate.py script; maybe you can test them.
(deeprank) [imerelli@slurmlogin DeepRank-Mut]$ pytest test/operate/test_pdb.py
(deeprank) [imerelli@slurmlogin DeepRank-Mut]$ pytest |
So apparently pdb2sql behaves differently on some PDB files. I made a unit test and a fix for it. |
Ok, the loading of the PDB file seems fixed. Now I have problems with the PSSM matrix. I was able to obtain the matrix in the format required by your software using psi-blast and a Python conversion script (I wasn't able to use https://github.com/DeepRank/pssmgen), but now I have this problem:
The PDB and PSSM files are attached. |
It expects the chain id in the pssm filename. Like: |
Ok, now the pssm file is read, but there are still errors in parsing it
The pssm file is the one attached before, it has the following structure, which seems identical to the one you suggested:
|
Your first line says |
Ok, thank you. Unfortunately, I have another error:
|
OK, pdb2sql seems to have a problem with your PDB file. Where did you get this PDB file? The original 1CR4 file from the PDB looks different. |
Thank you. It works. Now I have moved on to the second script:
|
Dear Ivan Merelli, We highly appreciate your continued interest in getting DeepRank-Mut to work for your data. Thank you for using our package. For instance, the learning task at hand is classification instead of regression. Hence, the correct line of commands would be
The error is thrown as the classification task uses TensorFlow for plots. The argument 'plot' is from the master version of DeepRank for protein complexes. Also, I notice you do not have validation or test datasets; you can divide your training data if you wish by specifying the following in the learn.py script: You also have the option of feeding your own validation and test datasets. I would recommend going through the code in DataSet.py, NeuralNet.py and model3d.py to see what options would work best for your learning task. Thanks again for your interest. |
Dear Gayatri Ramakrishnan, Thank you for your help. Your tool is very interesting, but objectively difficult to use. Essentially, there still isn't a working example in the repository. The problem is also that there is not much documentation and there are some inaccuracies in the explanation, such as that pytest is not usable and that the PSSM matrix is actually mandatory, while it is listed as optional. I can't go through all your code to understand how it works; it was already very complicated to create the PSSM matrix. I just want to be able to model mutations in my protein, and we are trying to do this, so I thank you. I think that setting up a working example could also be useful for you. Once this working example is completed, you can certainly use it for documentation, so this work is important for everyone. That said, I did not understand your suggestion. Perhaps neural_net should be model? Should dataset be data_set? Do I need to import cnn_class? And yet it still tells me
Also, I wouldn't know where to insert divide_trainset= [0.8, 0.2], since that variable does not exist. Once the database is created with the first script, could you please provide me with a script to perform the learning of the network in the simplest way possible? |
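The value divide_trainset= [0.8, 0.2] quoted above presumably means that 80% of the training entries are kept for training and 20% are held out for validation. The sketch below only illustrates that idea in plain Python; the function name split_by_fractions, the seed, and the variant names are hypothetical, and it is not how DeepRank-Mut implements the option.

```python
import random


def split_by_fractions(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive parts sized by the given fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    parts, start = [], 0
    for index, fraction in enumerate(fractions):
        if index == len(fractions) - 1:
            end = len(shuffled)  # the last part takes whatever remains
        else:
            end = start + int(round(fraction * len(shuffled)))
        parts.append(shuffled[start:end])
        start = end
    return parts


# Hypothetical variant identifiers, used only to show the resulting sizes.
variants = [f"variant_{i}" for i in range(10)]
train, valid = split_by_fractions(variants, [0.8, 0.2])
print(len(train), len(valid))
```

With ten entries and fractions of 0.8 and 0.2, the first part holds eight entries and the second holds two.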
Actually, PSSM is optional. But if you omit it, then you must remove 'deeprank.features.neighbour_profile' from the feature list. The OutputExporter problem can be solved by importing it, as shown in the readme. |
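Dropping the PSSM-dependent feature amounts to filtering one entry out of the list of feature module names. The sketch below assumes the feature list is an ordinary Python list of strings; apart from 'deeprank.features.neighbour_profile', the module names are placeholders invented for illustration, as is the helper drop_feature.

```python
def drop_feature(feature_modules, unwanted):
    """Return a new list without the unwanted feature module name."""
    return [name for name in feature_modules if name != unwanted]


# Placeholder names, except for the neighbour_profile module named in the thread.
feature_modules = [
    "example.features.placeholder_a",
    "deeprank.features.neighbour_profile",
    "example.features.placeholder_b",
]

without_pssm = drop_feature(feature_modules, "deeprank.features.neighbour_profile")
print(without_pssm)
```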
Hi, thank you for your help. I certainly made some progress. I solved some issues according to your suggestions, but now there is something beyond my ability. I'm pasting the code here (maybe you can paste it in your readme) along with the result of running it:
|
It seems like you are combining regression ( |
Like I mentioned earlier, please do not use regression for classification tasks. Unfortunately, this package isn't hardcoded for plug-n-play scenarios. We do not intend to make it that way; instead we have made it modular. The current repo will soon be archived as DeepRank2 gets finalized and released. I do agree that an example workflow would be good to add. Thanks for the input. |
Thank you. It worked. Now I will create a new database with the unseen variants to make predictions using the last script. Then I will upload everything here in case you need it. A very naive question meanwhile: is it possible to also obtain (maybe download from the database?) the PDB structures of the mutants to perform further analysis? |
PDB structures are downloadable from the wwpdb. Instructions are here: Not sure if that's what you mean. |
Hi,
|
Concerning the PDB structures, I was wondering if I can download from DeepRank the pdb coordinates of the modelled proteins with the variants, for example to perform docking experiments after the prediction of their CLASS (BENIGN or PATHOGENIC). |
I made a recent push to allow the model to run on unlabeled data. |
Ok, thank you. I downloaded the latest version from GitHub, but I still get errors.
|
Whoops! Looks like your output is slightly different from mine. Doesn't matter! |
Sorry, still not working. I'm attaching the files to reproduce the analysis. Files with the 2 suffix relate to the inference part, while the others relate to the learning part.
|
So this error happened during normalization. Sorry for not testing this. It should be fixed now. |
I see the computation proceed a little further, but now I have this error:
|
Looks like torch has trouble interpreting your pretrained model. |
I retrained the model, but I have the same error. Here is a link to the pretrained model; it's too big for GitHub |
So it appeared that there was a bug in loading the optimizer settings from the preloaded model. But you don't even need an optimizer in step 3. So I made it optional in my last push. |
Okay, the computation of learn2.py ran smoothly. But now, where can I find the results? I mean, where are the predictions about whether my variants are benign or pathogenic? |
Sorry, I forgot about that. I pushed a fix. You'll need to pull, use the Output will go to a file in the output directory you set for it. |
Sorry, but after the last pull I have this error:
Here is my script:
|
Remove the |
Ok, it works! Very last question. How can I interpret these results? cat output-test-epoch-0.csv |
The target is set to -1, because it's unknown. I'm sorry that we didn't make a better output exporter for this. I hope you can work with this format. |
So, the attached examples are to be considered pathogenic because the right column is higher than the left column. Right? |
Exactly. |
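The rule confirmed above (the right column being higher than the left means pathogenic) can be written as a small helper. This is a sketch under assumptions: the exact column layout of output-test-epoch-0.csv is not shown in the thread, so the function takes the two scores directly; the score pairs are made-up illustrative values, and treating a tie as BENIGN is an arbitrary choice.

```python
def label_from_scores(left_score, right_score):
    """Return PATHOGENIC when the right score is strictly higher, otherwise BENIGN."""
    return "PATHOGENIC" if right_score > left_score else "BENIGN"


# Made-up (left, right) score pairs, not real DeepRank-Mut output.
example_rows = [(0.2, 0.8), (0.9, 0.1)]
labels = [label_from_scores(left, right) for left, right in example_rows]
print(labels)
```

The target column stays at -1 for unlabeled variants, so only the two score columns carry the prediction.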
Hi, I'm trying to use DeepRank-Mut. The first problem is that I don't understand how to run the tests: the documentation says to enter the test directory and run pytest, but this command is not valid.
However, from the root directory I can run the test scripts that are in the test directory. Here is the output. While test/test_tools.py and test/test_atomic_features.py provide an output, test/test_generate.py and test/test_learn.py do not provide any output. Is this expected? Is there something that I can do differently?