New biobb_pytorch Molecular dynamics autoencoder wrapper #173

PauAndrio · 2024-06-26T14:13:40Z

Wrappers for the new collection of blocks to train and apply autoencoder models to molecular dynamics data.

bgruening · 2024-07-12T10:37:13Z

Please have a look at the linting logs: https://github.com/galaxycomputationalchemistry/galaxy-tools-compchem/actions/runs/9906049158/job/27368207101?pr=173

You can also run these checks locally with planemo lint.

bgruening · 2024-07-13T18:43:21Z

tools/biobb_pytorch/biobb_apply_mdae.xml

+
+    <outputs>
+      <data format="npy" name="output_reconstructed_data_npy_path" label="output_reconstructed_data_npy_path" />
+      <data format="npy" name="output_latent_space_npy_path" label="output_latent_space_npy_path" />


If this output is optional you can use a filter here to only define this output when your boolean is true.
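
A minimal sketch of the filter approach, assuming a hypothetical boolean parameter named keep_latent_space (the wrapper's actual parameter name is not shown in this thread). The output names and the npy format come from the diff above; the labels are illustrative only.

```xml
<inputs>
    <!-- Hypothetical boolean; the real wrapper may use a different name -->
    <param name="keep_latent_space" type="boolean" checked="false"
           label="Write the latent space output"/>
</inputs>
<outputs>
    <!-- Required output: always created -->
    <data format="npy" name="output_reconstructed_data_npy_path"
          label="Reconstructed data"/>
    <!-- Optional output: only defined when the boolean above is true -->
    <data format="npy" name="output_latent_space_npy_path"
          label="Latent space">
        <filter>keep_latent_space</filter>
    </data>
</outputs>
```

With a filter like this, the latent-space dataset should not appear in the history unless the user checks the box, so the command section would not need to touch or link a placeholder file for it.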

bgruening · 2024-07-16T16:03:32Z

tools/biobb_pytorch/biobb_apply_mdae.xml

+      `#Required Outputs using file extension`
+      touch '$output_reconstructed_data_npy_path' &&
+      ln -s '$output_reconstructed_data_npy_path' ./output_reconstructed_data_npy_path.$output_reconstructed_data_npy_path.ext &&
+
+      `#Optional Outputs using file extension`
+      #if $output_latent_space_npy_path:
+        touch '$output_latent_space_npy_path' &&
+        ln -s '$output_latent_space_npy_path' ./output_latent_space_npy_path.$output_latent_space_npy_path.ext &&
+      #end if


Suggested change

-      `#Required Outputs using file extension`
-      touch '$output_reconstructed_data_npy_path' &&
-      ln -s '$output_reconstructed_data_npy_path' ./output_reconstructed_data_npy_path.$output_reconstructed_data_npy_path.ext &&
-
-      `#Optional Outputs using file extension`
-      #if $output_latent_space_npy_path:
-        touch '$output_latent_space_npy_path' &&
-        ln -s '$output_latent_space_npy_path' ./output_latent_space_npy_path.$output_latent_space_npy_path.ext &&
-      #end if

This looks very complicated. I guess you can remove this and try something like the approach described below.

PauAndrio · 2024-07-22

Thank you for your feedback. I removed the part you mentioned, used a literal in the command, and utilized the "from_work_dir" attribute. This has significantly simplified things.

bgruening · 2024-07-22

Cool, glad you found that useful. And sorry this PR is taking so long; my hope is that if we do this one properly, the rest will be a piece of cake :-)

PauAndrio · 2024-07-25

Thank you. I have also removed the comments inside the command section to avoid potential issues with quotes or other special characters in the future. Is there anything else that needs to be changed before we merge?

bgruening · 2024-07-16T16:12:25Z

You could also, and that is the simplest way possible, just write in the command section: --output ./foo.npz.

And then in your outputs do <data name="blabla" from_work_dir="foo.npz"/>. The from_work_dir attribute will then pick up the file, and you don't need a mv or ln.
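
As a sketch of this pattern, the fragment below reuses the --output_performance_npz_path flag and file name that appear in the train_mdae wrapper in this PR. The npz format value and the label are assumptions, and the remaining train_mdae arguments are left out.

```xml
<command detect_errors="exit_code"><![CDATA[
    ## Other train_mdae arguments are omitted from this sketch
    train_mdae
        --output_performance_npz_path ./output_performance_npz_path.npz
]]></command>
<outputs>
    <!-- Galaxy collects the file written in the job working directory;
         no touch, mv or ln is needed in the command -->
    <data name="output_performance_npz_path" format="npz"
          from_work_dir="output_performance_npz_path.npz"
          label="Training performance"/>
</outputs>
```

The tool writes to a fixed literal path, and from_work_dir tells Galaxy which file in the working directory becomes the dataset.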

bgruening · 2024-07-16T16:12:54Z

https://docs.galaxyproject.org/en/latest/dev/schema.html#id82

bgruening

Just an idea. Is that something that is interesting?

bgruening · 2024-09-21T22:49:22Z

tools/biobb_pytorch/biobb_apply_mdae.xml

+
+    <inputs>
+      <param name="input_data_npy_path"  type="data" format="npy"  optional="False" label="Input NPY file" help="Input data file"/>
+      <param name="input_model_pth_path" type="data" format="pth"  optional="False" label="Input PTH file" help="Input model file"/>


Pickled objects are a bit dangerous. Would it be possible to use ONNX? https://pytorch.org/docs/stable/onnx.html

bgruening · 2024-10-05

Ping, any comments here?
Pickles can execute arbitrary code when they are loaded, which makes them a bad exchange format: you need to trust whoever produced them.

PauAndrio · 2024-10-07

While ONNX is a great choice for model export, as you suggested, and exporting models to ONNX is straightforward, there is currently no built-in functionality in PyTorch for importing ONNX models back into PyTorch. This has been a long-standing request from the PyTorch community since 2019, but as of now, there's no reliable or officially supported method for importing ONNX models back into PyTorch for further training or fine-tuning.

Given this limitation, switching entirely to ONNX would mean losing the ability to easily retrain, modify, or extend our models within the PyTorch ecosystem. For these reasons, despite the risks associated with pickling, we still favor the use of PyTorch’s torch.save() and torch.load() methods for model serialization. We're aware that these methods are not ideal from a security perspective, but they provide the necessary flexibility for ongoing model development.

Our intent in developing these tools is to provide a way to create or improve very specialized models, not to deploy them.

bgruening · 2024-11-03

Yeah, it's a bit of a pain. I had hoped that https://github.com/Talmaj/onnx2pytorch would work for you.

bgruening · 2024-09-21T22:51:42Z

tools/biobb_pytorch/biobb_apply_mdae.xml

+    <inputs>
+      <param name="input_data_npy_path"  type="data" format="npy"  optional="False" label="Input NPY file" help="Input data file"/>
+      <param name="input_model_pth_path" type="data" format="pth"  optional="False" label="Input PTH file" help="Input model file"/>
+      <param name="config_json" type="data" format="json" optional="True" label="Configuration file" help="File containing tool settings"/>


Are users supposed to create such JSON files?

How many parameters do such files have on average? Is that 1-2 or more like 20-30?

PauAndrio · 2024-09-25

The biobb_object has 38 configurable properties, and in addition, train_mdae introduces 16 specific properties, bringing the total to 54 potential parameters that users could modify in the JSON files. However, all of these properties come with default values, so users only need to tweak those that are relevant to their specific use case. We’ve put considerable effort into documenting these parameters and guiding users to focus on the most impactful ones, ensuring they can make adjustments as needed without being overwhelmed.

bgruening · 2024-09-21T22:57:04Z

tools/biobb_pytorch/biobb_train_mdae.xml

+      ;
+      ]]>
+    </command>
+


Suggested change

+    <configfiles>
+        <configfile name="train_config">
+{
+  "properties": {
+    "num_epochs": $num_epoch,
+    "seed": $seed
+  }
+}
+        </configfile>
+    </configfiles>

This way you can create those configfiles on the fly and ask your users for the inputs
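
To make the suggestion concrete, here is a hedged sketch of how the pieces could fit together. The parameter names num_epoch and seed come from the suggested configfile; their types, default values and labels are assumptions, and the other train_mdae arguments are omitted. Galaxy fills in the configfile variable with the path of the generated file, so the sketch passes it to --config quoted.

```xml
<command detect_errors="exit_code"><![CDATA[
    ## Other train_mdae arguments are omitted from this sketch
    train_mdae
        --config '$train_config'
]]></command>
<configfiles>
    <!-- Rendered per job; $num_epoch and $seed are replaced by the user's inputs -->
    <configfile name="train_config">
{
  "properties": {
    "num_epochs": $num_epoch,
    "seed": $seed
  }
}
    </configfile>
</configfiles>
<inputs>
    <!-- Hypothetical definitions; the defaults shown are illustrative only -->
    <param name="num_epoch" type="integer" value="100" min="1"
           label="Number of training epochs"/>
    <param name="seed" type="integer" value="42"
           label="Random seed"/>
</inputs>
```

With this layout users never upload a JSON file themselves: the form fields are validated by Galaxy and the JSON is generated for each job.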

PauAndrio · 2024-09-25

That’s something we’re planning for a future release, where the most relevant properties will be integrated into Galaxy's UI through sliders, multi-select options, number validators, filters, etc., making the configuration process more user-friendly. However, for now, I’d like to keep things as simple as possible and focus on getting my first tool published in the Galaxy Toolshed.

bgruening · 2024-09-21T22:57:33Z

tools/biobb_pytorch/biobb_train_mdae.xml

+      #if $config_json:
+        ln -s '$config_json' ./config_json.$config_json.ext &&
+      #end if


Suggested change

-      #if $config_json:
-        ln -s '$config_json' ./config_json.$config_json.ext &&
-      #end if

bgruening · 2024-09-21T22:57:50Z

tools/biobb_pytorch/biobb_train_mdae.xml

+      train_mdae
+
+      #if $config_json:
+        --config ./config_json.$config_json.ext


Suggested change

-        --config ./config_json.$config_json.ext
+        --config ./$train_config

bgruening

@PauAndrio looks good to me. Just this one security issue with the Python pickles.

PauAndrio · 2024-10-07T13:27:41Z

@PauAndrio looks good to me. Just this one security issue with the Python pickles.

Sorry, I thought this was already answered:

While ONNX is a great choice for model export, as you suggested, and exporting models to ONNX is straightforward, there is currently no built-in functionality in PyTorch for importing ONNX models back into PyTorch. This has been a long-standing request from the PyTorch community since 2019, but as of now, there's no reliable or officially supported method for importing ONNX models back into PyTorch for further training or fine-tuning.

Given this limitation, switching entirely to ONNX would mean losing the ability to easily retrain, modify, or extend our models within the PyTorch ecosystem. For these reasons, despite the risks associated with pickling, we still favor the use of PyTorch’s torch.save() and torch.load() methods for model serialization. We're aware that these methods are not ideal from a security perspective, but they provide the necessary flexibility for ongoing model development.

Our intent in developing these tools is to provide a way to create or improve very specialized models, not to deploy them.

bgruening

Thanks @PauAndrio and sorry for the long turn-around.

Please change the shed.yml and then we merge.

bgruening · 2024-11-03T17:43:50Z

tools/biobb_pytorch/.shed.yml

+owner: chemteam
+description: "biobb_pytorch is the Biobb module collection to create and train ML & DL models using the popular [PyTorch](https://pytorch.org/) Python library."
+homepage_url: https://github.com/bioexcel/biobb_pytorch
+long_description: |


Can you convert this file into a suite-based file? See https://github.com/galaxyproject/tools-iuc/blob/main/tools/semibin/.shed.yml

bgruening · 2024-11-03T17:44:12Z

tools/biobb_pytorch/biobb_train_mdae.xml

+      #if $output_performance_npz_path:
+        --output_performance_npz_path ./output_performance_npz_path.npz
+      #end if
+      ;


Suggested change

-      ;

PauAndrio · 2024-12-02T10:25:25Z

Hi, @bgruening,

Please, let me know if there is anything else that I have to do to resolve and merge this pull request?

Regards,
Pau

bgruening · 2024-12-05T17:10:33Z

Let's get this in. Thanks a lot @PauAndrio!

d54bb05 New biobb_pytorch Molecular dynamics autoencoder wrapper

gbayarri approved these changes Jun 26, 2024

simonbray reviewed Jun 27, 2024

simonbray requested review from bgruening and blankenberg June 27, 2024 12:31

bgruening reviewed Jun 27, 2024 (tools/biobb_pytorch/.shed.yml, outdated)

PauAndrio and others added 4 commits June 28, 2024 09:52
faaeb9e Update tools/biobb_pytorch/biobb_apply_mdae.xml (Co-authored-by: Simon Bray)
5509029 Update tools/biobb_pytorch/biobb_apply_mdae.xml (Co-authored-by: Simon Bray)
b413da0 Update tools/biobb_pytorch/biobb_apply_mdae.xml (Co-authored-by: Simon Bray)
c9460d3 Fixing PR comments and suggestions

PauAndrio requested a review from bgruening June 28, 2024 09:15

741a02b Removing temporal test reports wrongly uploaded

bgruening reviewed Jul 5, 2024

PauAndrio and others added 7 commits July 8, 2024 18:07
bc83afa Update tools/biobb_pytorch/biobb_apply_mdae.xml (Co-authored-by: Björn Grüning)
435c75c Update .shed.yml
18c2565 Update tools/biobb_pytorch/biobb_train_mdae.xml (Co-authored-by: Björn Grüning)
18e481b Update tools/biobb_pytorch/biobb_train_mdae.xml (Co-authored-by: Björn Grüning)
5ae9eec Update tools/biobb_pytorch/biobb_apply_mdae.xml (Co-authored-by: Björn Grüning)
2b562df Update biobb_apply_mdae.xml
021d2ea Update biobb_train_mdae.xml

bgruening reviewed Jul 8, 2024 (tools/biobb_pytorch/biobb_apply_mdae.xml, outdated)

bgruening reviewed Jul 8, 2024 (tools/biobb_pytorch/test-data/.DS_Store, outdated)

PauAndrio added 2 commits July 12, 2024 11:52
2d8b35a Removing .DS_Store
6e74d3f Fixing input files description

5513c18 Fixing linting issues

bgruening reviewed Jul 13, 2024 (7 reviews, 6 of them on tools/biobb_pytorch/biobb_apply_mdae.xml, outdated)

PauAndrio added 2 commits July 15, 2024 11:12
c3bfe94 Changing .shed categories
aa48ad5 Removing named inputs_outputs

bgruening reviewed Jul 16, 2024

5fb9146 Using from_work_dir attribute to avoid touch and ln on outputs

bgruening reviewed Sep 21, 2024

aca822a Update tools/biobb_pytorch/biobb_train_mdae.xml (Co-authored-by: Björn Grüning)

bgruening reviewed Oct 5, 2024

PauAndrio requested a review from bgruening October 7, 2024 13:27

bgruening approved these changes Nov 3, 2024

bgruening merged commit 891dd7d into galaxycomputationalchemistry:master Dec 5, 2024 (12 checks passed)
