Adds modules nextflow pseudocode #79

Merged 40 commits on Nov 24, 2020

@cgpu (Collaborator) commented Nov 24, 2020

This PR adds the Nextflow-specific files to the root of the repo. The files were auto-generated with the nf-core create command and then edited to simplify them and remove boilerplate.

To respect the rule, "we do not choose to modify code behaviour by commenting in and out code chunks",
@cgpu requested a review from gsheynkman November 24, 2020 20:41
@cgpu merged commit 6e42ac3 into dev Nov 24, 2020
@cgpu deleted the adds-modules-nextflow-pseudocode branch November 24, 2020 20:43
gsheynkman added a commit that referenced this pull request Dec 10, 2020
* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Adds Author Name in README (#15)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* add long read info basics

* gui app manifest gitignore fix

* Adds @trishorts name in README.md (#18)

* add author name to readme.md

* add one line to refresh commit

* add author name

Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Adds @gsheynkman name to README.md (#16)

* Added authorname in README

Co-authored-by: cgpu <[email protected]>

* Adds Rachel Miller to the author names in the README (#14)

* Adds Rachel Miller to the author names in the README

* Minor typo

Co-authored-by: cgpu <[email protected]>

* added author and ORCID (#12)

Co-authored-by: cgpu <[email protected]>

* Refined database orig code (#29)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* aggregation of FL and CPM by cluster

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Bj8th orf calling (#25)

* Add author name in README.md

* orf calling updated to run from command line

Co-authored-by: gsheynkman <[email protected]>

* Adds nextflow files and folders based on nf-core template (#26)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Removes igenomes config

* Adds tentative LICENSE (MIT)

* Adds nudge for asking help via GH issues

* added pull scripts from the zenodo site

* weighted protein inference

* fixes

* remove script to hopefully avoid merge conflict..

* update mzLib

* require equal long read weight for indistinguishable proteins

* add contrib

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* update main readme with author names (#38)

* update README contributions

* new readme

* fix minor errors in readme (#40)

* update README contributions

* new readme

* fix readme errors

* excel compatible tsv by default

* accept thermo license by default

* Updated READMEs, uploaded scripts for isoform mapping, protein clustering. (#41)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* delete files that shouldn't be part of the previous commit, old scripts accidentally put in DatabaseAnnotation

* updated contributions (#44)

* Attempt to create a container for blast mapping process within nextflow. (#48)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* adding cpat and transdecoder containers (#36)

* added pull scripts from the zenodo site

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* Update README.md

* Updated version of previous files with less typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with less typos

* Update README.md

* Preliminary module for analyzing peptide space

* Added commands to run CPAT. 'output' directory to .gitignore. (#53)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Add nextflow-related files, sundries. (#55)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Adding Simi's transcriptomic and peptide analysis scripts. (#56)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* argparse modification (#57)

* adds lr_orfcalling.nf and lr_orfcalling_nextflow.config in the module LR_ORFCalling (#59)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* Protein Inference Analysis Module Custom Script (#64)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Add files via upload

Update of peptide analysis jupyter notebook script

* Need to merge with latest dev (disregarding differences in dev_gloria, which are outdated) (#67)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* Update orf_calling.py

Removed a line to match spacing with the py file in dev.

* Convert jupyter notebook into python

* ORF Calling - bug fix (#70)

* argparse modification

* small import bug fix

Co-authored-by: gsheynkman <[email protected]>

* Update README.md (#65)

* Update README.md

* Update README.md

spelling fixes

* Update README.md

* ORF Filtering bug fixes and RefineDB (#71)

* argparse modification

* small import bug fix

* fixed bugs in orf_filter. module for refining db

* refine orf working

Co-authored-by: gsheynkman <[email protected]>

* Adds Dockerfile, environment.yml for SQANTI3 (#72)

* Adds Dockerfile, environment.yml for SQANTI3

* Improves container files

Co-authored-by: EC2 Default User <[email protected]>

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Updated peptide analysis file  (#66)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with less typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with less typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* 6frm readme (#45)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* Started code for protein group mapping

* add toy tables for the protein inference mapping

* edited 6frm translate readme

* delete mock files for protein inference (protein group) comparisons. Rachel and Kyndalanne have continued to work on this and these may be outdated.

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Protein Inference (#74)

* Separate module for greedy protein inference

* protein_inference bug fix

* added rescue to greedy algorithm

* connected peptides changed to set

* small bug fix. cleaned up notebook

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module  (#75)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with less typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with less typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Simi and Gloria - Update referencetable, transcriptome, and peptide modules (#78)

* Files in progress to create three modules: ReferenceTables, TranscriptomeAnalysis, PeptideAnalysis. Also, debugged orf_calling.py, found that minus strand ORFs not included.

* Prepared a script that makes reference tables

* Updated Transcriptomic Script

* Updated Transcriptomic Script (#77)

Co-authored-by: kyuubi430 <[email protected]>

* Remove files for making three modules with simi.

* Cleaned up referencetable module, Simi to edit.

* Modified Reference Tables Script

* Deleted plots.

* Simi and Gloria finalized the prepare_reference_tables. Works on commandline. Correct outputs to results/PG_ReferenceTables.

* Small edits to peptide_analysis, not done, push to Simi.

* Modified the names of output files from Prepare Reference Table script

* Changed file names in reference tables script and modified the transcriptome summary

* Delete unneeded files in transcriptome summary module.

* Finalized ReferenceTables. tested Transcriptome Summary. Started modifying the PeptideAnalysis.

* Made the transcriptome summary script command line executable

* Made the peptide analysis script command line runnable

* In process of modifying MMprocessing script

* Move scripts between TranscriptomeSummary and PeptideAnalysis modules. Code related to MM peptide/protein processing will now be exclusively in PeptideAnalysis.

* Added fasta/tsv and the results directory to gitignore

* Delete jurkat_orf_refined.fasta

Don't want to include *fasta in pull request.

* Delete genes_in_refined.tsv

Don't want to include *tsv output file in PR.
Added *tsv to gitignore, so shouldn't upload in future PR.

Co-authored-by: kyuubi430 <[email protected]>

* Adds modules nextflow pseudocode (#79)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Updates CONTRIBUTING.md

* Updates ISSUE_TEMPLATE

* Update PULL_REQUEST_TEMPLATE.md

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Corrects typo

* Removes igenomes config

* Fixes typos caught by review-dog

* Adds tentative LICENSE

* Adds environment.yml with pandas, numpy, biopython

* Adds CCS process

* Adds pbbam (required for ccs --chunk subsequent routine)

* Adds pbindex, ccs processes (w/ parallel --chunks)

* Removes redundant bai (pbi is needed)

* Adds temp process mock ccs and flag for testing

* Deletes commented out section

To respect the rule, "we do not choose to modify code behaviour by commenting in and out code chunks",

* Makes the section note more informative

* Dev rmmiller protein inference (#83)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Make protein inference analysis script command line executable

* spelling fixes

* Update PI_proteinInferenceAnalysis.py

fix merge conflicts

* Adds nextflow referencetable, notes to SQANTI module. (#88)

* Initiated files for nextflowifying reference-table module.

* Add nextflow code for reference table module. Successfully run on Lifebit/CloudOS.

* Rename transcript script. Add notes on requirements and commands to run SQANTI.

* Adds nextflow refined db (#80)

* Moves refine_orfs.py

* Restores tsv vs csv in refine_orfs.py

* Adds Nextflow files for refined db generation

* Refactor Dockerfile to eliminate duplication of env name

* Updates README.md in the modules/LR_ORFCalling subdirectory and lr_orfcalling.nf and lr_orfcalling_nextflow.config (#63)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* added to README.md instructions for executing nextflow within a jupyter notebook for debugging

* Update README.md

* Update README.md

* Updates ORF calling Nextflow files (#73)

* Updates ORF calling Nextflow snippet

* Deletes redundant global container definition

* Removes superfluous new line

* Clean up of comments

* Updates channel syntax to simplify to 1 line

* Adds TransDecoder in log.info summary

* Adds nextflow_run.sh with commands, expected exits

* Improves code readability, adds file exists check

Co-authored-by: EC2 Default User <[email protected]>

Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: bj8th <[email protected]>

* Add nextflow io (#100)

* added input output to readmes and modified to run via main

* refine database readme

* pull request changes made

* Clean up peptideanalysismodule (#99)

* Clean up peptide analysis module.

* Remove README

* Bj8th readme (#102)

* added readme information to several modules. updated modules to run from command line

* added source modules

Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: rob <[email protected]>
Co-authored-by: trishorts <[email protected]>
Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: gsheynkman <[email protected]>
Co-authored-by: rmmiller22 <[email protected]>
Co-authored-by: Anne Deslattes Mays <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
bj8th added a commit that referenced this pull request Dec 10, 2020
* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Adds Author Name in README (#15)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* add long read info basics

* gui app manifest gitignore fix

* Adds @trishorts name in README.md (#18)

* add author name to readme.md

* add one line to refresh commit

* add author name

Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Adds @gsheynkman name to README.md (#16)

* Added authorname in README

Co-authored-by: cgpu <[email protected]>

* Adds Rachel Miller to the author names in the README (#14)

* Adds Rachel Miller to the author names in the README

* Minor typo

Co-authored-by: cgpu <[email protected]>

* added author and ORCID (#12)

Co-authored-by: cgpu <[email protected]>

* Refined database orig code (#29)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* aggregation of FL and CPM by cluster

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Bj8th orf calling (#25)

* Add author name in README.md

* orf calling updated to run from command line

Co-authored-by: gsheynkman <[email protected]>

* Adds nextflow files and folders based on nf-core template (#26)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Removes igenomes config

* Adds tentative LICENSE (MIT)

* Adds nudge for asking help via GH issues

* added pull scripts from the zenodo site

* weighted protein inference

* fixes

* remove script to hopefully avoid merge conflict..

* update mzLib

* require equal long read weight for indistinguishable proteins

* add contrib

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* update main readme with author names (#38)

* update README contributions

* new readme

* fix minor errors in readme (#40)

* update README contributions

* new readme

* fix readme errors

* excel compatible tsv by default

* accept thermo license by default

* Updated READMEs, uploaded scripts for isoform mapping, protein clustering. (#41)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* delete files that shouldn't be part of the previous commit, old scripts accidentally put in DatabaseAnnotation

* updated contributions (#44)

* Attempt to create a container for blast mapping process within nextflow. (#48)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* adding cpat and transdecoder containers (#36)

* added pull scripts from the zenodo site

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* Update README.md

* Updated version of previous files with less typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with less typos

* Update README.md

* Preliminary module for analyzing peptide space

* Added commands to run CPAT. 'output' directory to .gitignore. (#53)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Add nextflow-related files, sundries. (#55)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Adding Simi's transcriptomic and peptide analysis scripts. (#56)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* argparse modification (#57)

* adds lr_orfcalling.nf and lr_orfcalling_nextflow.config in the module LR_ORFCalling (#59)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* Protein Inference Analysis Module Custom Script (#64)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Add files via upload

Update of peptide analysis jupyter notebook script

* Need to merge with latest dev (disregarding differences in dev_gloria, which are outdated) (#67)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* Update orf_calling.py

Removed a line to match spacing with the py file in dev.

* Convert jupyter notebook into python

* ORF Calling - bug fix (#70)

* argparse modification

* small import bug fix

Co-authored-by: gsheynkman <[email protected]>

* Update README.md (#65)

* Update README.md

* Update README.md

spelling fixes

* Update README.md

* ORF Filtering bug fixes and RefineDB (#71)

* argparse modification

* small import bug fix

* fixed bugs in orf_filter. module for refining db

* refine orf working

Co-authored-by: gsheynkman <[email protected]>

* Adds Dockerfile, environment.yml for SQANTI3 (#72)

* Adds Dockerfile, environment.yml for SQANTI3

* Improves container files

Co-authored-by: EC2 Default User <[email protected]>

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Updated peptide analysis file  (#66)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with less typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with less typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* 6frm readme (#45)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* Started code for protein group mapping

* add toy tables for the protein inference mapping

* edited 6frm translate readme

* delete mock files for protein inference (protein group) comparisons. Rachel and Kyndalanne have continued to work on this and these may be outdated.

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Protein Inference (#74)

* Separate module for greedy protein inference

* protein_inference bug fix

* added rescue to greedy algorithm

* connected peptides changed to set

* small bug fix. cleaned up notebook

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module  (#75)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Simi and Gloria - Update referencetable, transcriptome, and peptide modules (#78)

* Files in progress to create three modules: ReferenceTables, TranscriptomeAnalysis, PeptideAnalysis. Also, debugged orf_calling.py, found that minus strand ORFs not included.

* Prepared a script that makes reference tables

* Updated Transcriptomic Script

* Updated Transcriptomic Script (#77)

Co-authored-by: kyuubi430 <[email protected]>

* Remove files for making three modules with simi.

* Cleaned up referencetable module, Simi to edit.

* Modified Reference Tables Script

* Deleted plots.

* Simi and Gloria finalized the prepare_reference_tables. Works on commandline. Correct outputs to results/PG_ReferenceTables.

* Small edits to peptide_analysis, not done, push to Simi.

* Modified the names of output files from Prepare Reference Table script

* Changed file names in reference tables script and modified the transcriptome summary

* Delete unneeded files in transcriptome summary module.

* Finalized ReferenceTables. tested Transcriptome Summary. Started modifying the PeptideAnalysis.

* Made the transcriptome summary script command line executable

* Made the peptide analysis script command line runnable

* In process of modifying MMprocessing script

* Move scripts between TranscriptomeSummary and PeptideAnalysis modules. Code related to MM peptide/protein processing will now be exclusively in PeptideAnalysis.

* Added fasta/tsv and the results directory to gitignore

* Delete jurkat_orf_refined.fasta

Don't want to include *fasta in pull request.

* Delete genes_in_refined.tsv

Don't want to include *tsv output file in PR.
Added *tsv to gitignore, so shouldn't upload in future PR.

Co-authored-by: kyuubi430 <[email protected]>

* Adds modules nextflow pseudocode (#79)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Updates CONTRIBUTING.md

* Updates ISSUE_TEMPLATE

* Update PULL_REQUEST_TEMPLATE.md

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Corrects typo

* Removes igenomes config

* Fixes typos caught by review-dog

* Adds tentative LICENSE

* Adds environment.yml with pandas, numpy, biopython

* Adds CCS process

* Adds pbbam (required for ccs --chunk subsequent routine)

* Adds pbindex, ccs processes (w/ parallel --chunks)

* Removes redundant bai (pbi is needed)

* Adds temp process mock ccs and flag for testing

* Deletes commented out section

To respect the rule, "we do not choose to modify code behaviour by commenting in and out code chunks",

* Makes the section note more informative

* Dev rmmiller protein inference (#83)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Make protein inference analysis script command line executable

* spelling fixes

* Update PI_proteinInferenceAnalysis.py

fix merge conflicts

* Adds nextflow referencetable, notes to SQANTI module. (#88)

* Initiated files for nextflowifying reference-table module.

* Add nextflow code for reference table module. Successfully run on Lifebit/CloudOS.

* Rename transcript script. Add notes on requirements and commands to run SQANTI.

* Adds nextflow refined db (#80)

* Moves refine_orfs.py

* Restores tsv vs csv in refine_orfs.py

* Adds Nextflow files for refined db generation

* Refactor Dockerfile to eliminate duplication of env name

* Updates README.md in the modules/LR_ORFCalling subdirectory and lr_orfcalling.nf and lr_orfcalling_nextflow.config (#63)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* added to README.md instructions for executing nextflow within a jupyter notebook for debugging

* Update README.md

* Update README.md

* Updates ORF calling Nextflow files (#73)

* Updates ORF calling Nextflow snippet

* Deletes redundant global container definition

* Removes superfluous new line

* Clean up of comments

* Updates channel syntax to simplify to 1 line

* Adds TransDecoder in log.info summary

* Adds nextflow_run.sh with commands, expected exits

* Improves code readability, adds file exists check

Co-authored-by: EC2 Default User <[email protected]>

Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: bj8th <[email protected]>

* Clean up peptide analysis module.

* Remove README

* Add nextflow io (#100)

* added input output to readmes and modified to run via main

* refine database readme

* pull request changes made

* Clean up peptideanalysismodule (#99)

* Clean up peptide analysis module.

* Remove README

* Add module to make gencode protein database.

* Add module to make PacBio CDS.

* Bj8th readme (#102)

* added readme information to several modules. updated modules to run from command line

* added source modules

Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: rob <[email protected]>
Co-authored-by: trishorts <[email protected]>
Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: rmmiller22 <[email protected]>
Co-authored-by: Anne Deslattes Mays <[email protected]>
Co-authored-by: bj8th <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
bj8th added a commit that referenced this pull request Dec 14, 2020
* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Adds Author Name in README (#15)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* add long read info basics

* gui app manifest gitignore fix

* Adds @trishorts name in README.md (#18)

* add author name to readme.md

* add one line to refresh commit

* add author name

Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Adds @gsheynkman name to README.md (#16)

* Added author name in README

Co-authored-by: cgpu <[email protected]>

* Adds Rachel Miller to the author names in the README (#14)

* Adds Rachel Miller to the author names in the README

* Minor typo

Co-authored-by: cgpu <[email protected]>

* added author and ORCID (#12)

Co-authored-by: cgpu <[email protected]>

* Refined database orig code (#29)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* aggregation of FL and CPM by cluster

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Bj8th orf calling (#25)

* Add author name in README.md

* orf calling updated to run from command line

Co-authored-by: gsheynkman <[email protected]>

* Adds nextflow files and folders based on nf-core template (#26)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Removes igenomes config

* Adds tentative LICENSE (MIT)

* Adds nudge for asking help via GH issues

* added pull scripts from the zenodo site

* weighted protein inference

* fixes

* remove script to hopefully avoid merge conflict..

* update mzLib

* require equal long read weight for indistinguishable proteins

* add contrib

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* update main readme with author names (#38)

* update README contributions

* new readme

* fix minor errors in readme (#40)

* update README contributions

* new readme

* fix readme errors

* excel compatible tsv by default

* accept thermo license by default

* Updated READMEs, uploaded scripts for isoform mapping, protein clustering. (#41)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* delete files that shouldn't be part of the previous commit, old scripts accidentally put in DatabaseAnnotation

* updated contributions (#44)

* Attempt to create a container for blast mapping process within nextflow. (#48)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* adding cpat and transdecoder containers (#36)

* added pull scripts from the zenodo site

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Added commands to run CPAT. 'output' directory to .gitignore. (#53)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Add nextflow-related files, sundries. (#55)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Adding Simi's transcriptomic and peptide analysis scripts. (#56)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* argparse modification (#57)

* adds lr_orfcalling.nf and lr_orfcalling_nextflow.config in the module LR_ORFCalling (#59)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* Protein Inference Analysis Module Custom Script (#64)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Add files via upload

Update of peptide analysis jupyter notebook script

* Need to merge with latest dev (disregarding differences in dev_gloria, which are outdated) (#67)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* Update orf_calling.py

Removed a line to match spacing with the py file in dev.

* Convert jupyter notebook into python

* ORF Calling - bug fix (#70)

* argparse modification

* small import bug fix

Co-authored-by: gsheynkman <[email protected]>

* Update README.md (#65)

* Update README.md

* Update README.md

spelling fixes

* Update README.md

* ORF Filtering bug fixes and RefineDB (#71)

* argparse modification

* small import bug fix

* fixed bugs in orf_filter. module for refining db

* refine orf working

Co-authored-by: gsheynkman <[email protected]>

* Adds Dockerfile, environment.yml for SQANTI3 (#72)

* Adds Dockerfile, environment.yml for SQANTI3

* Improves container files

Co-authored-by: EC2 Default User <[email protected]>

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Updated peptide analysis file (#66)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* 6frm readme (#45)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* Started code for protein group mapping

* add toy tables for the protein inference mapping

* edited 6frm translate readme

* delete mock files for protein inference (protein group) comparisons. Rachel and Kyndalanne have continued to work on this and these may be outdated.

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Protein Inference (#74)

* Separate module for greedy protein inference

* protein_inference bug fix

* added rescue to greedy algorithm

* connected peptides changed to set

* small bug fix. cleaned up notebook

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module (#75)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Simi and Gloria - Update referencetable, transcriptome, and peptide modules (#78)

* Files in progress to create three modules: ReferenceTables, TranscriptomeAnalysis, PeptideAnalysis. Also, debugged orf_calling.py, found that minus strand ORFs not included.

* Prepared a script that makes reference tables

* Updated Transcriptomic Script

* Updated Transcriptomic Script (#77)

Co-authored-by: kyuubi430 <[email protected]>

* Remove files for making three modules with simi.

* Cleaned up referencetable module, Simi to edit.

* Modified Reference Tables Script

* Deleted plots.

* Simi and Gloria finalized the prepare_reference_tables. Works on commandline. Correct outputs to results/PG_ReferenceTables.

* Small edits to peptide_analysis, not done, push to Simi.

* Modified the names of output files from Prepare Reference Table script

* Changed file names in reference tables script and modified the transcriptome summary

* Delete unneeded files in transcriptome summary module.

* Finalized ReferenceTables. tested Transcriptome Summary. Started modifying the PeptideAnalysis.

* Made the transcriptome summary script command line executable

* Made the peptide analysis script command line runnable

* In process of modifying MMprocessing script

* Move scripts between TranscriptomeSummary and PeptideAnalysis modules. Code related to MM peptide/protein processing will now be exclusively in PeptideAnalysis.

* Added fasta/tsv and the results directory to gitignore

* Delete jurkat_orf_refined.fasta

Don't want to include *fasta in pull request.

* Delete genes_in_refined.tsv

Don't want to include *tsv output file in PR.
Added *tsv to gitignore, so shouldn't upload in future PR.

Co-authored-by: kyuubi430 <[email protected]>

* Adds modules nextflow pseudocode (#79)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Updates CONTRIBUTING.md

* Updates ISSUE_TEMPLATE

* Update PULL_REQUEST_TEMPLATE.md

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Corrects typo

* Removes igenomes config

* Fixes typos caught by review-dog

* Adds tentative LICENSE

* Adds environment.yml with pandas, numpy, biopython

* Adds CCS process

* Adds pbbam (required for ccs --chunk subsequent routine)

* Adds pbindex, ccs processes (w/ parallel --chunks)

* Removes redundant bai (pbi is needed)

* Adds temp process mock ccs and flag for testing

* Deletes commented out section

To respect the rule, "we do not choose to modify code behaviour by commenting in and out code chunks",

* Makes the section note more informative

* Dev rmmiller protein inference (#83)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Make protein inference analysis script command line executable

* spelling fixes

* Update PI_proteinInferenceAnalysis.py

fix merge conflicts

* rescue algorithm implemented

* eliminate added conflict files

* config file remove

* spelling fix

Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: rob <[email protected]>
Co-authored-by: trishorts <[email protected]>
Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: gsheynkman <[email protected]>
Co-authored-by: Anne Deslattes Mays <[email protected]>
Co-authored-by: bj8th <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
bj8th added a commit that referenced this pull request Dec 28, 2020
* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Adds Author Name in README (#15)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* add long read info basics

* gui app manifest gitignore fix

* Adds @trishorts name in README.md (#18)

* add author name to readme.md

* add one line to refresh commit

* add author name

Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>

* Adds @gsheynkman name to README.md (#16)

* Added author name in README

Co-authored-by: cgpu <[email protected]>

* Adds Rachel Miller to the author names in the README (#14)

* Adds Rachel Miller to the author names in the README

* Minor typo

Co-authored-by: cgpu <[email protected]>

* added author and ORCID (#12)

Co-authored-by: cgpu <[email protected]>

* Refined database orig code (#29)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* aggregation of FL and CPM by cluster

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Bj8th orf calling (#25)

* Add author name in README.md

* orf calling updated to run from command line

Co-authored-by: gsheynkman <[email protected]>

* Adds nextflow files and folders based on nf-core template (#26)

* Adds nf-core template for nextflow pips

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Removes igenomes config

* Adds tentative LICENSE (MIT)

* Adds nudge for asking help via GH issues

* added pull scripts from the zenodo site

* weighted protein inference

* fixes

* remove script to hopefully avoid merge conflict..

* update mzLib

* require equal long read weight for indistinguishable proteins

* add contrib

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* update main readme with author names (#38)

* update README contributions

* new readme

* fix minor errors in readme (#40)

* update README contributions

* new readme

* fix readme errors

* excel compatible tsv by default

* accept thermo license by default

* Updated READMEs, uploaded scripts for isoform mapping, protein clustering. (#41)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* delete files that shouldn't be part of the previous commit, old scripts accidentally put in DatabaseAnnotation

* updated contributions (#44)

* Attempt to create a container for blast mapping process within nextflow. (#48)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* adding cpat and transdecoder containers (#36)

* added pull scripts from the zenodo site

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Added commands to run CPAT. 'output' directory to .gitignore. (#53)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Add nextflow-related files, sundries. (#55)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Adding Simi's transcriptomic and peptide analysis scripts. (#56)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate data files.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* argparse modification (#57)

* adds lr_orfcalling.nf and lr_orfcalling_nextflow.config in the module LR_ORFCalling (#59)

* adding cpat and transdecoder containers

* adding the empty data directory with README.md explaining why it is empty on github

* github actions  caught my spelling error

* left out  in front of the conda commands for both these containers

* added debugged containers

* moved test.config to conf/executor/test.config

* fixed syntax error executor -> executors

* created local lr_orfcalling.nf and _nextflow.config to aid in local testing and debugging before final merge into pipeline

* adding main.nf and nextflow.config

* Protein Inference Analysis Module Custom Script (#64)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Add files via upload

Update of peptide analysis jupyter notebook script

* Need to merge with latest dev (disregarding differences in dev_gloria, which are outdated) (#67)

* Updated various readmes. Uploaded scripts for isoform mappings, protein clustering, etc.

* Add scripts that run blast and parse blast results to find at-length matches between isoforms. Need to combine with Anne's pipeline.

* add scripts for blast mapping between protein databases

* Attempt to create nextflow container to do blast searching for isoform accession mapping

* correct spelling error

* Made amendments to orf_calling.py.

* CPAT commands. Added output directory to gitignore. Output directory to hold intermediate datafiles.

* Gloria adding Simi's transcriptome analysis and peptide analysis scripts.

* Update orf_calling.py

Removed a line to match spacing with the py file in dev.

* Convert jupyter notebook into python

* ORF Calling - bug fix (#70)

* argparse modification

* small import bug fix

Co-authored-by: gsheynkman <[email protected]>

* Update README.md (#65)

* Update README.md

* Update README.md

spelling fixes

* Update README.md

* ORF Filtering bug fixes and RefineDB (#71)

* argparse modification

* small import bug fix

* fixed bugs in orf_filter. module for refining db

* refine orf working

Co-authored-by: gsheynkman <[email protected]>

* Adds Dockerfile, environment.yml for SQANTI3 (#72)

* Adds Dockerfile, environment.yml for SQANTI3

* Improves container files

Co-authored-by: EC2 Default User <[email protected]>

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Updated peptide analysis file (#66)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* 6frm readme (#45)

* Add initial code to extract and cluster pacbio protein sequences, based on input from LR_ORFCalling

* Started code for protein group mapping

* add toy tables for the protein inference mapping

* edited 6frm translate readme

* delete mock files for protein inference (protein group) comparisons. Rachel and Kyndalanne have continued to work on this and these may be outdated.

Co-authored-by: Robert Millikin <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>

* Protein Inference (#74)

* Separate module for greedy protein inference

* protein_inference bug fix

* added rescue to greedy algorithm

* connected peptides changed to set

* small bug fix. cleaned up notebook

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module (#75)

* Adds author name in README.md

* Adds author name in README.md

* Deletes temp file

* Adds author name in README.md

* Modified README.md File in LR_TranscriptomeSummary

* Add files via upload

This adds the genomic data compilation and comparison jupyter notebook script and adds several custom module dependencies.

* Update README.md

* Updated version of previous files with fewer typos

* Delete Transcriptomic_Proteomic_Comparison.ipynb

* Delete m_MMprocess.py

* Delete m_gen_maps.py

* Delete m_make_gene_length_table.py

* Delete m_sqantitable.py

* Delete m_squantitable.py

* Updated version with fewer typos

* Update README.md

* Preliminary module for analyzing peptide space

* Add files via upload

Update of peptide analysis jupyter notebook script

* Convert jupyter notebook into python

* Updated peptide_analysis script for review and added required files/tables

* Update peptide_analysis.py

* Updated .gitignore with a local data file

* Updated peptide_analysis.py to include new path info

* Delete gene_based_info.tsv

* Delete trans_to_gene.tsv

* Removed unnecessary files from Transcriptome Module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Removed unnecessary files from Transcriptome module

* Simi and Gloria - Update referencetable, transcriptome, and peptide modules (#78)

* Files in progress to create three modules: ReferenceTables, TranscriptomeAnalysis, PeptideAnalysis. Also, debugged orf_calling.py, found that minus strand ORFs not included.

* Prepared a script that makes reference tables

* Updated Transcriptomic Script

* Updated Transcriptomic Script (#77)

Co-authored-by: kyuubi430 <[email protected]>

* Remove files for making three modules with simi.

* Cleaned up referencetable module, Simi to edit.

* Modified Reference Tables Script

* Deleted plots.

* Simi and Gloria finalized the prepare_reference_tables. Works on commandline. Correct outputs to results/PG_ReferenceTables.

* Small edits to peptide_analysis, not done, push to Simi.

* Modified the names of output files from Prepare Reference Table script

* Changed file names in reference tables script and modified the transcriptome summary

* Delete unneeded files in transcriptome summary module.

* Finalized ReferenceTables. tested Transcriptome Summary. Started modifying the PeptideAnalysis.

* Made the transcriptome summary script command line executable

* Made the peptide analysis script command line runnable

* In process of modifying MMprocessing script

* Move scripts between TranscriptomeSummary and PeptideAnalysis modules. Code related to MM peptide/protein processing will now be exclusively in PeptideAnalysis.

* Added fasta/tsv and the results directory to gitignore

* Delete jurkat_orf_refined.fasta

Don't want to include *fasta in pull request.

* Delete genes_in_refined.tsv

Don't want to include *tsv output file in PR.
Added *tsv to gitignore, so shouldn't upload in future PR.

Co-authored-by: kyuubi430 <[email protected]>

* Adds modules nextflow pseudocode (#79)

* Adds nf-core template for nextflow pipelines

* Cleans up template main.nf and adds swag cli message

* Updates nextflow.config

* Adds Dockerfile and env yaml updates

* Removes redundant files from assets

* Deleted nf schema json

* Removes redundant configs

* Updates README with template structure

* Updates docs/

* Updates repo name in changelog

* Updates template test.config

* Adds bin folder and template wrapper R script

* Adds pbccs in env.yml

* Changes the location of pipeline info, logs

* Adds .github folder

* Removes redundant files from GH actions

* Updates CONTRIBUTING.md

* Updates ISSUE_TEMPLATE

* Update PULL_REQUEST_TEMPLATE.md

* Removes AWS tests

* Adds misspelling test

* Removes linting.yml

* Corrects typo

* Removes igenomes config

* Fixes typos caught by review-dog

* Adds tentative LICENSE

* Adds environment.yml with pandas, numpy, biopython

* Adds CCS process

* Adds pbbam (required for ccs --chunk subsequent routine)

* Adds pbindex, ccs processes (w/ parallel --chunks)

* Removes redundant bai (pbi is needed)

* Adds temp process mock ccs and flag for testing

* Deletes commented out section

To respect the rule, "we do not choose to modify code behaviour by commenting in and out code chunks",

* Makes the section note more informative

* Dev rmmiller protein inference (#83)

* Adds Rachel Miller to the author names in the README

* custom script for the comparison of protein group output from MetaMorpheus searches using different protein database reference models

* Make protein inference analysis script command line executable

* spelling fixes

* Update PI_proteinInferenceAnalysis.py

fix merge conflicts

* rescue algorithm implemented

* eliminate added conflict files

* config file remove

* spelling fix

* Update mzLib version- includes database parsing changes

* Test Update

Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: kyuubi430 <[email protected]>
Co-authored-by: rob <[email protected]>
Co-authored-by: trishorts <[email protected]>
Co-authored-by: Michael Shortreed <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: gsheynkman <[email protected]>
Co-authored-by: Anne Deslattes Mays <[email protected]>
Co-authored-by: bj8th <[email protected]>
Co-authored-by: Gloria Sheynkman <[email protected]>
Co-authored-by: cgpu <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
2 participants