Merge pull request #153 from weberlab-hhu/deprecatation_settings
deprecation warnings and disabling of unused args
alisandra authored Nov 20, 2024
2 parents 1df0ce2 + 630b185 commit 73e6aca
Showing 7 changed files with 89 additions and 39 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -12,7 +12,7 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: ['3.8', '3.9']
+        python-version: ['3.10']

     steps:
     - uses: actions/checkout@v2
17 changes: 9 additions & 8 deletions README.md
@@ -158,7 +158,7 @@ fasta2h5.py --species Arabidopsis_lyrata --h5-output-path Arabidopsis_lyrata.h5
 # improve prediction quality at subsequence ends by creating and overlapping
 # sliding-window predictions.)
 HybridModel.py --load-model-path $HOME/.local/share/Helixer/models/land_plant/land_plant_v0.3_a_0080.h5 \
-  --test-data Arabidopsis_lyrata.h5 --overlap --val-test-batch-size 32 -v
+  --test-data Arabidopsis_lyrata.h5 --overlap --val-test-batch-size 32 -v --predict-phase

 # order of input parameters:
 # helixer_post_bin <genome.h5> <predictions.h5> <window_size> <edge_threshold> <peak_threshold> <min_coding_length> <output.gff3>
@@ -181,13 +181,14 @@ the transcriptome, using a standard parser, for instance [gffread](https://githu
 | --h5-output-path | / | **Required**; HDF5 output file for the encoded data. Must end with ".h5". |
 | --species | / | **Required**; Species name. Will be added to the .h5 file. |
 ###### HybridModel.py
-| Parameter | Default | Explanation |
-|:----------------------|:--------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|
-| -l/--load-model-path | / | Path to a trained/pretrained model checkpoint. (HDF5 format) |
-| -t/--test-data | / | Path to one test HDF5 file. |
-| --overlap | False | Add to improve prediction quality at subsequence ends by creating and overlapping sliding-window predictions (with proportional increase in time usage). |
-| --val-test-batch-size | 32 | Batch size for validation/test data |
-| -v/--verbose | False | Add to run HybridModel.py in verbosity mode (additional information will be printed) |
+| Parameter | Default | Explanation |
+|:----------------------|:--------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| -l/--load-model-path | / | Path to a trained/pretrained model checkpoint. (HDF5 format) |
+| -t/--test-data | / | Path to one test HDF5 file. |
+| --overlap | False | Add to improve prediction quality at subsequence ends by creating and overlapping sliding-window predictions (with proportional increase in time usage). |
+| --val-test-batch-size | 32 | Batch size for validation/test data |
+| -v/--verbose | False | Add to run HybridModel.py in verbosity mode (additional information will be printed) |
+| --predict-phase | False | Add this to also predict phases for CDS (recommended); format: [None, 0, 1, 2]; 'None' is used for non-CDS regions, within CDS regions 0, 1, 2 correspond to phase (number of base pairs until the start of the next codon) |
 ###### helixer_post_bin
 (positional arguments, not specified via name but order)
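
Reviewer note on the new `--predict-phase` row: the phase definition ("number of base pairs until the start of the next codon") may be easier to follow with a small worked example. The sketch below is only an illustration of that definition and is not taken from Helixer's code. The function name `per_base_phase` and the boolean CDS mask are hypothetical, and it assumes a single plus-strand transcript whose reading frame carries across introns.

```python
from typing import List, Optional


def per_base_phase(is_cds: List[bool]) -> List[Optional[int]]:
    """Label each position with None (non-CDS) or a phase in {0, 1, 2}.

    Hypothetical helper: phase is the number of bases left until the
    next codon starts, so the first base of every codon gets phase 0.
    """
    phases: List[Optional[int]] = []
    cds_offset = 0  # CDS bases seen so far; non-CDS bases are not counted
    for coding in is_cds:
        if not coding:
            phases.append(None)
            continue
        phases.append((3 - cds_offset % 3) % 3)
        cds_offset += 1
    return phases


# Four CDS bases, a two-base gap (e.g. an intron), then two more CDS bases.
mask = [False, True, True, True, True, False, False, True, True]
phases = per_base_phase(mask)
print(phases)
```

Under these assumptions the second CDS segment resumes at phase 2 rather than 0, because the codon that began just before the gap still needs two more bases.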

2 changes: 1 addition & 1 deletion docs/fine_tuning.md
@@ -452,7 +452,7 @@ HybridModel.py -v --batch-size 140 --val-test-batch-size 280 \
 --class-weights "[0.7, 1.6, 1.2, 1.2]" --transition-weights "[1, 12, 3, 1, 12, 3]" \
 --predict-phase --learning-rate 0.0001 --resume-training --fine-tune \
 --load-model-path <$HOME/.local/share/Helixer/models/land_plant/land_plant_v0.3_a_0080.h5> \
---input-coverage --coverage-norm log --data-dir --save-model-path <best_tuned_rnaseq_model.h5>
+--input-coverage --coverage-norm log --data-dir <fine_tuning_data_dir> --save-model-path <best_tuned_rnaseq_model.h5>
 ```
 ###### Previous parameters
 | Parameter | Default | Explanation |
2 changes: 1 addition & 1 deletion docs/helixer_options.md
@@ -109,7 +109,7 @@ command line.
 |:------------------|:--------|:----------------------------------------------------------------------------------------------------------|
 | --float-precision | float32 | Precision of model weights and biases |
 | --gpu-id | 1 | Sets GPU index, use if you want to train on one GPU on a multi-GPU machine without a job scheduler system |
-| --workers | 1 | Number of threads used to fetch input data. Consider setting to match the number of GPUs |
+| --workers | 1 | Number of threads used to fetch input data for training. Consider setting to match the number of GPUs |

 ### Miscellaneous parameters
 | Parameter | Default | Explanation |
20 changes: 18 additions & 2 deletions helixer/prediction/HelixerModel.py
@@ -529,12 +529,12 @@ def __init__(self, cli_args=None):
                                       "region will be cropped. (default: subsequence_length * 3 / 4)")
         # resources
         self.parser.add_argument('--float-precision', type=str, default='float32')
-        self.parser.add_argument('--cpus', type=int, default=8)
+        self.parser.add_argument('--cpus', type=int, default=8, help=argparse.SUPPRESS)
         self.parser.add_argument('--gpu-id', type=int, default=-1,
                                  help='sets GPU index, use if you want to train on one GPU on a multi-GPU machine '
                                       'without a job scheduler system')
         self.parser.add_argument('--workers', type=int, default=1,
-                                 help='umber of threads used to fetch input data; '
+                                 help='number of threads used to fetch input data for training; '
                                       'consider setting to match the number of GPUs')
         # misc flags
         self.parser.add_argument('--save-every-check', action='store_true')
@@ -566,6 +566,18 @@ def parse_args(self):
         takes a list of cli arguments from self.cli_args. This can be used to invoke a HelixerModel from
         another script."""
         args = vars(self.parser.parse_args(args=self.cli_args))
+
+        # hack to deprecate a few args
+        for arg in ['large_eval_folder', 'cpus', 'stretch_transition_weights', 'coverage_weights', 'coverage_offset',
+                    'calculate_uncertainty', 'no_utrs', 'load_predictions']:
+            default = self.parser.get_default(arg)
+            if args[arg] != default:
+                print(colored(
+                    f"Warning: the argument '{arg}' is deprecated and will be "
+                    f"removed in the future. The argument will have no effect.", 'yellow'))
+                # set arg back to default
+                args[arg] = default
+
         self.__dict__.update(args)

         if self.nni:
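
Reviewer note on the deprecation hunk above: the approach (hide the flag with `argparse.SUPPRESS`, compare the parsed value against `parser.get_default()`, warn, then reset) can be tried in isolation. The sketch below is a minimal standalone version under stated assumptions, not the project's code. The names `build_parser`, `parse_with_deprecations`, and `DEPRECATED` are hypothetical, only two of the deprecated flags are reproduced, and plain strings stand in for the `colored` output.

```python
import argparse
from typing import Dict, List, Tuple

# Hypothetical subset of the deprecated argument names (argparse dest form).
DEPRECATED = ['cpus', 'no_utrs']


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # help=argparse.SUPPRESS hides the flag from --help but still accepts it.
    parser.add_argument('--cpus', type=int, default=8, help=argparse.SUPPRESS)
    parser.add_argument('--no-utrs', action='store_true', help=argparse.SUPPRESS)
    parser.add_argument('--workers', type=int, default=1)
    return parser


def parse_with_deprecations(parser: argparse.ArgumentParser,
                            cli_args: List[str]) -> Tuple[Dict, List[str]]:
    """Parse cli_args, then warn about and reset any deprecated argument."""
    args = vars(parser.parse_args(args=cli_args))
    warnings = []
    for name in DEPRECATED:
        default = parser.get_default(name)
        if args[name] != default:
            warnings.append(f"Warning: the argument '{name}' is deprecated "
                            "and will have no effect.")
            args[name] = default  # force the value back to its default
    return args, warnings


args, warnings = parse_with_deprecations(
    build_parser(), ['--cpus', '4', '--workers', '2'])
for line in warnings:
    print(line)
```

One property of this pattern worth keeping in mind: a user who passes a deprecated flag with its default value (here `--cpus 8`) gets no warning, because only a differing value is detected.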
@@ -1050,3 +1062,7 @@ def insert_coverage_before_hat(self, oldmodel, dense_at):

         model = Model(raw_input, output)
         return model
+
+if __name__ == '__main__':
+    print(colored("ERROR: 'HelixerModel.py' is not meant to be executed by the user. "
+                  "Please use 'Helixer.py' or 'HybridModel.py'.", 'red'))
58 changes: 58 additions & 0 deletions pyproject.toml
@@ -0,0 +1,58 @@
[build-system]
requires = ["setuptools < 72.0"]
build-backend = "setuptools.build_meta"

[project]
name = "helixer"
version = "0.3.4"
description = "Deep Learning fun on gene structure data"
readme = "README.md"
requires-python = ">=3.10.12"
classifiers = [
"Programming Language :: Python :: 3",
"Operating System :: POSIX :: Linux",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
]
dependencies = [
"geenuff @ git+https://github.com/weberlab-hhu/[email protected]",
"sqlalchemy==1.3.22",
"tensorflow>=2.6.2",
"tensorflow-addons>=0.21.0",
"nni",
"seaborn",
"Keras<3.0.0",
"keras_layer_normalization",
"terminaltables",
"HTSeq",
"intervaltree",
"numpy",
"h5py",
"multiprocess",
"numcodecs",
"appdirs"
]
authors = [
{name = "Alisandra K. Denton"},
{name = "Felix Holst"},
{name = "Janina Mass"},
{name = "Anthony Bolger"},
{name = "Felicitas Kindel"},
{name = "Christopher Guenther"},
]

[project.urls]
"Homepage" = "https://github.com/weberlab-hhu/Helixer"
"Documentation" = "https://github.com/weberlab-hhu/Helixer"

[tool.setuptools]
packages = [
"helixer",
"helixer.core",
"helixer.prediction",
"helixer.evaluation",
"helixer.tests",
"helixer.export"
]
package-data = {helixer = ["testdata/*.fa", "testdata/*.gff"]}
script-files = ["Helixer.py", "fasta2h5.py", "geenuff2h5.py", "helixer/prediction/HybridModel.py",
"scripts/fetch_helixer_models.py"]
27 changes: 1 addition & 26 deletions setup.py
@@ -1,28 +1,3 @@
 from setuptools import setup

-setup(
-    name='helixer',
-    version='0.3.4',
-    description='Deep Learning fun on gene structure data',
-    packages=['helixer', 'helixer.core', 'helixer.prediction', 'helixer.evaluation', 'helixer.tests', 'helixer.export'],
-    package_data={'helixer': ['testdata/*.fa', 'testdata/*.gff']},
-    install_requires=["geenuff @ git+https://github.com/weberlab-hhu/[email protected]",
-                      "sqlalchemy==1.3.22",
-                      "tensorflow>=2.6.2",
-                      "tensorflow-addons>=0.21.0",
-                      "nni",
-                      "seaborn",
-                      "Keras<3.0.0",
-                      "keras_layer_normalization",
-                      "terminaltables",
-                      "HTSeq",
-                      "intervaltree",
-                      "numpy",
-                      "h5py",
-                      "multiprocess",
-                      "numcodecs",
-                      "appdirs",
-                      ],
-    scripts=["Helixer.py", "fasta2h5.py", "geenuff2h5.py", "helixer/prediction/HybridModel.py", "scripts/fetch_helixer_models.py"],
-    zip_safe=False,
-)
+setup(zip_safe=False)
