Commit

Added only --ps option for pure DCA
niklases committed Jan 5, 2024
1 parent 4a338c7 commit 1bdd482
Showing 9 changed files with 106 additions and 10 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -387,3 +387,4 @@ scripts/Setup/linux/apc.png
datasets/ANEH/KARS160122_PLS_LOOCV_ML_Model_Performance.png
datasets/ANEH/CV_performance/KARS160122_PLS_LOOCV_5-fold-CV.png
datasets/ANEH/CV_performance/KARS160122_PLS_LOOCV_CV_Results.txt
datasets/AVGFP/Predictions_Hybrid_TopTS.txt
53 changes: 53 additions & 0 deletions .vscode/launch.json
@@ -104,6 +104,23 @@
]
},

{
"name": "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP",
"type": "python",
"request": "launch",
"env": {"PYTHONPATH": "${workspaceFolder}"},
"program": "${workspaceFolder}/pypef/main.py",
"console": "integratedTerminal",
"justMyCode": true,
"cwd": "${workspaceFolder}/datasets/AVGFP/",
"args": [
"hybrid",
//"-m", "GREMLIN", // optional, not required
"--ps", "TS.fasl",
"--params", "GREMLIN"
]
},

{ // PLMC zero-shot steps:
// 1. $pypef param_inference --params uref100_avgfp_jhmmer_119_plmc_42.6.params
// 2. $pypef hybrid -t TS.fasl --params PLMC
@@ -136,6 +153,42 @@
"--params", "PLMC",
"--threads", "24"
]
},

{
"name": "Python: PyPEF hybrid/only-PS-zero-shot PLMC-DCA avGFP",
"type": "python",
"request": "launch",
"env": {"PYTHONPATH": "${workspaceFolder}"},
"program": "${workspaceFolder}/pypef/main.py",
"console": "integratedTerminal",
"justMyCode": true,
"cwd": "${workspaceFolder}/datasets/AVGFP/",
"args": [
"hybrid",
//"-m", "PLMC", // optional, not required
"--ps", "TS.fasl",
"--params", "uref100_avgfp_jhmmer_119_plmc_42.6.params",
"--threads", "24"
]
},

{
"name": "Python: PyPEF hybrid/only-PS-zero-shot PLMC-DCA variant 2 avGFP",
"type": "python",
"request": "launch",
"env": {"PYTHONPATH": "${workspaceFolder}"},
"program": "${workspaceFolder}/pypef/main.py",
"console": "integratedTerminal",
"justMyCode": true,
"cwd": "${workspaceFolder}/datasets/AVGFP/",
"args": [
"hybrid",
//"-m", "PLMC", // optional, not required
"--ps", "TS.fasl",
"--params", "PLMC",
"--threads", "24"
]
}
]
}
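Each new launch configuration runs `pypef/main.py` with the listed `args`, using `datasets/AVGFP/` as working directory and the workspace folder on `PYTHONPATH`. The sketch below, assuming a hypothetical helper `launch_config_to_command` that is not part of PyPEF, rebuilds the equivalent shell command from a configuration's `args`, which makes it easy to reproduce a debug run from the terminal. The program path is shortened to a relative `pypef/main.py` for illustration.

```python
import shlex


def launch_config_to_command(program: str, args: list[str]) -> str:
    """Hypothetical helper: join a launch config's program and args into one shell command."""
    # VS Code starts `python <program> <args...>`; shlex.join quotes tokens only where needed.
    return shlex.join(["python", program, *args])


# "args" of "Python: PyPEF hybrid/only-PS-zero-shot GREMLIN-DCA avGFP"
gremlin_args = ["hybrid", "--ps", "TS.fasl", "--params", "GREMLIN"]
# "args" of "Python: PyPEF hybrid/only-PS-zero-shot PLMC-DCA variant 2 avGFP"
plmc_args = ["hybrid", "--ps", "TS.fasl", "--params", "PLMC", "--threads", "24"]

for args in (gremlin_args, plmc_args):
    print(launch_config_to_command("pypef/main.py", args))
```

Run from `datasets/AVGFP/`, the first command corresponds to `pypef hybrid --ps TS.fasl --params GREMLIN`; the commented-out `-m` argument in the configurations is optional and can be appended the same way.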
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"markdown.extension.toc.updateOnSave": false
}
17 changes: 14 additions & 3 deletions README.md
@@ -17,8 +17,9 @@ Preprint available at bioRxiv: https://doi.org/10.1101/2022.06.07.495081.
<sup>*§*</sup><sub>Equal contribution</sub> <br>

---

## Table of Contents
- [PyPEF: Pythonic Protein Engineering Framework](#pypef-pythonic-protein-engineering-framework)
[PyPEF: Pythonic Protein Engineering Framework](#pypef-pythonic-protein-engineering-framework)
- [Quick Installation](#quick-installation)
- [Requirements](#requirements)
- [Running Examples](#running-examples)
@@ -398,15 +399,19 @@ python3 ./pypef/main.py
```
5.2 After [installing plmc](https://github.com/debbiemarkslab/plmc#compilation), generate the evolutionary coupling file, which is used for encoding sequences. For example, set `-le` to the value output by `sto2a2m`:
```
plmc -o ANEH_72.6.params -le 72.6 -m 100 -g -f WT_ANEH ANEH_jhmmer.a2m
```
The output parameter (.params) file can be used for encoding sequences with the DCA-based encoding technique (`-e dca`) by providing it to PyPEF; e.g. for pure ML modeling:
```
pypef ml -e dca -l LS.fasl -t TS.fasl --regressor pls --params ANEH_72.6.params
```
Or for hybrid modeling:
```
pypef hybrid -l LS.fasl -t TS.fasl --params ANEH_72.6.params
```
@@ -420,21 +425,27 @@ To make zero-shot predictions using PyPEF (plmc-DCA or GREMLIN-DCA) just do not
```
pypef param_inference --msa uref100_avgfp_jhmmer_119.a2m
pypef hybrid -t AVGFP_TS.fasl --params GREMLIN
pypef hybrid -t TS.fasl --params GREMLIN
```
using the GREMLIN parameters, or,
```
pypef param_inference --params uref100_avgfp_jhmmer_119_plmc_42.6.params
pypef hybrid -t TS.fasl --params PLMC
```
using the plmc parameters.
Other well-performing zero-shot prediction methods with available source code are (list not complete, see ProteinGym [repository](https://github.com/OATML-Markslab/ProteinGym) and [website](https://proteingym.org/) for a more detailed overview of available methods and achieved performances):
Other well-performing zero-shot prediction methods with available source code are:
- ESM-1v/ESM-2 (https://github.com/facebookresearch/esm)
- DeepSequence (https://github.com/debbiemarkslab/DeepSequence)
- EVcouplings (plmc-DCA, https://github.com/debbiemarkslab/EVcouplings)
- EVE (https://github.com/OATML/EVE)
- Tranception (https://github.com/OATML-Markslab/Tranception)
This list is by no means complete; see the ProteinGym [repository](https://github.com/OATML-Markslab/ProteinGym) and [website](https://proteingym.org/) for a more detailed overview of available methods and their achieved performances (as well as for many benchmark data sets).
<a name="api-usage"></a>
## API Usage for Sequence Encoding
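The zero-shot commands above score variants with a statistical DCA model alone. As a generic illustration of that idea (not PyPEF's implementation, which loads fitted parameters from the saved PLMC or GREMLIN files), a Potts-type model assigns each sequence an energy E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), and a variant can be scored by its energy difference to the wild type. The fields `h` and couplings `J` below are made-up toy values on a 3-residue, 2-letter alphabet; under the sign convention assumed here, a positive score means the variant is predicted to be more favorable than the wild type.

```python
from itertools import combinations

# Toy single-site fields h[i][a] (made-up values, not fitted by plmc or GREMLIN)
h = [
    {"A": 0.5, "B": -0.5},
    {"A": 0.0, "B": 1.0},
    {"A": -1.0, "B": 0.2},
]
# Toy pair couplings J[(i, j)][(a, b)] for i < j; missing entries contribute 0
J = {
    (0, 1): {("A", "B"): 0.3, ("B", "B"): -0.2},
    (1, 2): {("B", "A"): 0.4},
}


def statistical_energy(seq: str) -> float:
    """E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j)."""
    energy = sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def zero_shot_score(wild_type: str, variant: str) -> float:
    """Zero-shot fitness proxy: energy of the variant minus energy of the wild type."""
    return statistical_energy(variant) - statistical_energy(wild_type)


wt = "ABA"
for variant in ("BBA", "ABB"):
    print(variant, round(zero_shot_score(wt, variant), 3))
```

Real models span all sequence positions with 20 amino acids plus a gap state, but the scoring rule has the same shape; no training data is needed, which is why only a test or prediction set has to be passed.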
2 changes: 1 addition & 1 deletion pypef/dca/dca_run.py
@@ -61,7 +61,7 @@ def run_pypef_hybrid_modeling(arguments):
label=arguments['--label']
)

elif arguments['--params'] and arguments['--model']:
elif arguments['--params'] and arguments['--model'] or arguments['--ps']:
prediction_dict = {}
prediction_dict.update({
'drecomb': arguments['--drecomb'],
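The new condition in `dca_run.py` relies on Python's operator precedence: `and` binds tighter than `or`, so `arguments['--params'] and arguments['--model'] or arguments['--ps']` is evaluated as `(params and model) or ps`. The sketch below checks this for every combination of present and absent options, assuming (as the `arguments['--ps']` lookups suggest) a docopt-style dict in which absent options are `None`. The function names are illustrative only. Note that, provided no earlier branch matched, `--ps` on its own is enough to enter this branch.

```python
from itertools import product


def takes_branch(params, model, ps) -> bool:
    # Condition exactly as written in the new elif line
    return bool(params and model or ps)


def takes_branch_explicit(params, model, ps) -> bool:
    # Same condition with the implicit grouping spelled out
    return bool((params and model) or ps)


# None models an absent option, "x" a given one
for combo in product([None, "x"], repeat=3):
    assert takes_branch(*combo) == takes_branch_explicit(*combo)
    print(combo, takes_branch(*combo))
```

Writing the parentheses explicitly, as in `takes_branch_explicit`, documents the intended grouping without changing behavior.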
9 changes: 6 additions & 3 deletions pypef/dca/hybrid_model.py
@@ -1094,11 +1094,14 @@ def predict_ps( # also predicting "pmult" dict directories
in the respective created folders).
"""
logger.info(f'Taking model from saved model (Pickle file): {model_pickle_file}...')

if model_pickle_file is None:
model_pickle_file = params_file
logger.info(f'Trying to load model from saved parameters (Pickle file): {model_pickle_file}...')
else:
logger.info(f'Loading model from saved model (Pickle file): {model_pickle_file}...')
model, model_type = get_model_and_type(model_pickle_file)

if model_type == 'PLMC':
if model_type == 'PLMC' or model_type == 'GREMLIN':
logger.info(f'No hybrid model provided – falling back to a statistical DCA model.')
elif model_type == 'Hybrid':
beta_1, beta_2, reg = model.beta_1, model.beta_2, model.regressor
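The updated `predict_ps` falls back to the saved parameter file when no model pickle is given, and treats both `PLMC` and `GREMLIN` model types as pure statistical DCA models. The following is a simplified, hypothetical restatement of that selection logic; `resolve_model_file`, `prediction_mode` and `hybrid_prediction` are illustrative names, not PyPEF functions. How `beta_1` and `beta_2` are applied is an assumption here: the sketch combines the DCA and regressor predictions as a weighted sum, as the attribute names suggest.

```python
from typing import Optional


def resolve_model_file(model_pickle_file: Optional[str], params_file: str) -> str:
    # No --model given: load the saved parameters (e.g. PLMC or GREMLIN) instead
    if model_pickle_file is None:
        return params_file
    return model_pickle_file


def prediction_mode(model_type: str) -> str:
    if model_type in ("PLMC", "GREMLIN"):
        # No hybrid model provided: fall back to a statistical DCA model
        return "statistical DCA"
    if model_type == "Hybrid":
        return "hybrid"
    # Sketch choice: reject anything else explicitly
    raise ValueError(f"Unknown model type: {model_type}")


def hybrid_prediction(y_dca: float, y_ml: float, beta_1: float, beta_2: float) -> float:
    # Assumed combination rule: weighted sum of DCA and regressor predictions
    return beta_1 * y_dca + beta_2 * y_ml


print(prediction_mode("GREMLIN"), resolve_model_file(None, "GREMLIN"))
```

Using `model_type in ("PLMC", "GREMLIN")` is equivalent to the two chained `==` comparisons in the diff.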
3 changes: 2 additions & 1 deletion pypef/main.py
@@ -145,7 +145,8 @@
pypef shift_pos --input CSV_FILE --offset OFFSET
[--sep CSV_COLUMN_SEPARATOR] [--mutation_sep MUTATION_SEPARATOR] [--fitness_key FITNESS_KEY]
pypef sto2a2m --sto STO_MSA_FILE [--inter_gap INTER_GAP] [--intra_gap INTRA_GAP]
pypef hybrid --ts TEST_SET
pypef hybrid
[--ts TEST_SET] [--ps PREDICTION_SET]
[--model MODEL] [--params PARAM_FILE]
[--ls LEARNING_SET] [--label] [--threads THREADS]
pypef hybrid --model MODEL --params PARAM_FILE
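With this change, `--ts` is no longer mandatory for `pypef hybrid`, so a prediction set alone can be passed. The sketch below is an `argparse` analog of the new usage line, meant only to show which option combinations now parse; PyPEF parses its own usage string differently, and the short flags `-t`, `-p`, `-m` and `-l` are taken from the test scripts as assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """argparse analog of the `pypef hybrid` usage line (illustrative, not PyPEF's parser)."""
    parser = argparse.ArgumentParser(prog="pypef")
    sub = parser.add_subparsers(dest="command", required=True)
    hybrid = sub.add_parser("hybrid")
    hybrid.add_argument("-t", "--ts", dest="test_set")        # now optional
    hybrid.add_argument("-p", "--ps", dest="prediction_set")  # newly usable on its own
    hybrid.add_argument("-m", "--model")
    hybrid.add_argument("--params", dest="param_file")
    hybrid.add_argument("-l", "--ls", dest="learning_set")
    hybrid.add_argument("--label", action="store_true")
    hybrid.add_argument("--threads", type=int)
    return parser


args = build_parser().parse_args(["hybrid", "-p", "TS.fasl", "--params", "PLMC", "--threads", "24"])
print(args)
```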
12 changes: 11 additions & 1 deletion scripts/CLI/run_cli_tests_linux.sh
@@ -368,11 +368,21 @@ echo
$pypef hybrid -m PLMC -t TS.fasl --params PLMC --threads $threads
echo

# pure statistical
# Hybrid: pure statistical
$pypef hybrid -t TS.fasl --params PLMC --threads $threads
echo
$pypef hybrid -p TS.fasl --params PLMC --threads $threads
echo
# Same as above command
$pypef hybrid -p TS.fasl -m PLMC --params PLMC --threads $threads
echo
$pypef hybrid -t TS.fasl --params GREMLIN
echo
$pypef hybrid -p TS.fasl --params GREMLIN
echo
# Same as above command
$pypef hybrid -p TS.fasl -m GREMLIN --params GREMLIN
echo
$pypef hybrid -m GREMLIN -t TS.fasl --params GREMLIN
echo
$pypef hybrid -l LS.fasl -t TS.fasl --params GREMLIN
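After this commit, the "Hybrid: pure statistical" block tests both DCA parameter sets with a test set (`-t`), with a prediction set only (`-p`), and with a prediction set plus the explicit statistical model (`-p ... -m`). The sketch below generates that command matrix, for example to drive the same checks from Python; `pure_statistical_commands` is a hypothetical helper, and it assumes `pypef` is on `PATH` with `TS.fasl` in the working directory.

```python
import shlex


def pure_statistical_commands(threads: int) -> list[list[str]]:
    """Hypothetical generator for the pure statistical hybrid test commands."""
    commands = []
    for params in ("PLMC", "GREMLIN"):
        # In the script, only the PLMC runs pass --threads
        extra = ["--threads", str(threads)] if params == "PLMC" else []
        commands.append(["pypef", "hybrid", "-t", "TS.fasl", "--params", params, *extra])
        commands.append(["pypef", "hybrid", "-p", "TS.fasl", "--params", params, *extra])
        # Same as the previous command, with the statistical model named explicitly
        commands.append(["pypef", "hybrid", "-p", "TS.fasl", "-m", params, "--params", params, *extra])
    return commands


for cmd in pure_statistical_commands(threads=24):
    print(shlex.join(cmd))
```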
16 changes: 15 additions & 1 deletion scripts/CLI/run_cli_tests_win.ps1
@@ -511,13 +511,27 @@ pypef hybrid -m PLMC -t TS.fasl --params PLMC --threads $threads
ExitOnExitCode
Write-Host

# pure statistical
# Hybrid: pure statistical
pypef hybrid -t TS.fasl --params PLMC --threads $threads
ExitOnExitCode
Write-Host
pypef hybrid -p TS.fasl --params PLMC --threads $threads
ExitOnExitCode
Write-Host
# Same as above command
pypef hybrid -p TS.fasl -m PLMC --params PLMC --threads $threads
ExitOnExitCode
Write-Host
pypef hybrid -t TS.fasl --params GREMLIN
ExitOnExitCode
Write-Host
pypef hybrid -p TS.fasl --params GREMLIN
ExitOnExitCode
Write-Host
# Same as above command
pypef hybrid -p TS.fasl -m GREMLIN --params GREMLIN
ExitOnExitCode
Write-Host
pypef hybrid -m GREMLIN -t TS.fasl --params GREMLIN
ExitOnExitCode
Write-Host
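The PowerShell tests call `ExitOnExitCode` after every command. Its definition is not part of this diff, so the sketch below assumes it aborts the run as soon as the previous command returned a non-zero exit code. `run_or_exit` is a hypothetical Python counterpart that runs one command, stops with the same exit code on failure, and otherwise prints a blank separator line like `Write-Host` without arguments.

```python
import subprocess
import sys


def run_or_exit(cmd: list[str]) -> None:
    """Run one test command; exit immediately with its code if it fails (assumed ExitOnExitCode behavior)."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"Command failed with exit code {result.returncode}: {cmd}", file=sys.stderr)
        sys.exit(result.returncode)
    print()  # blank separator line, like Write-Host


# Usage (requires pypef on PATH and TS.fasl in the working directory):
# run_or_exit(["pypef", "hybrid", "-p", "TS.fasl", "--params", "GREMLIN"])
```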