
v1.3.0

cody-mar10 released this on 23 Oct 16:21

Major changes for 1.3

  • Stabilized support for detecting multi-scaffold genomes and for optional features in the graph-formatted data file.
  • Attributes that hold internal data in the GenomeDataset object are now prefixed with the biological feature level. For example, protein embeddings were previously stored in .data but are now stored in .protein_data. The batch objects are unchanged.

Major changes for 1.2

  • The prediction output file now contains fields other than data, because genome fragmentation (for large genomes) and multi-scaffold genomes are now supported. Depending on the dataset, there are up to 3 fields (fragment, scaffold, and genome), which hold the protein-based embeddings of contiguous genomic segments and of the collections of scaffolds that make up a genome.
  • Genomic scaffolds in the GenomeDataset can be artificially fragmented. This serves several purposes:
    1. Scaffolds that encode more proteins than a pretrained PST expects can still be used for fine-tuning and inference.
    2. Choosing a smaller fragment size reduces the memory burden.
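To illustrate the idea of artificial fragmentation, here is a minimal sketch assuming each scaffold is represented as an ordered list of protein identifiers and that fragments are contiguous runs of at most a chosen number of proteins. The helpers fragment_scaffold and fragment_genome and the parameter max_fragment_size are hypothetical names for illustration only; they are not part of the GenomeDataset or PST API, and the library's actual splitting rules may differ.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fragment_scaffold(proteins: Sequence[T], max_fragment_size: int) -> List[List[T]]:
    """Split one scaffold's ordered proteins into contiguous fragments.

    Hypothetical helper (not the PST API). Each fragment holds at most
    `max_fragment_size` proteins, and the original gene order is kept.
    """
    if max_fragment_size < 1:
        raise ValueError("max_fragment_size must be >= 1")
    return [
        list(proteins[start : start + max_fragment_size])
        for start in range(0, len(proteins), max_fragment_size)
    ]


def fragment_genome(
    scaffolds: Sequence[Sequence[T]], max_fragment_size: int
) -> Tuple[List[List[T]], List[int]]:
    """Fragment every scaffold and record which scaffold each fragment came from.

    Hypothetical helper (not the PST API). The returned index list lets
    fragment-level results be grouped back by scaffold or genome.
    """
    fragments: List[List[T]] = []
    scaffold_of_fragment: List[int] = []
    for scaffold_idx, proteins in enumerate(scaffolds):
        for fragment in fragment_scaffold(proteins, max_fragment_size):
            fragments.append(fragment)
            scaffold_of_fragment.append(scaffold_idx)
    return fragments, scaffold_of_fragment


if __name__ == "__main__":
    # A toy genome: one scaffold with 10 proteins, another with 2.
    genome = [[f"p{i}" for i in range(10)], ["q0", "q1"]]
    frags, owner = fragment_genome(genome, max_fragment_size=4)
    print([len(f) for f in frags])  # [4, 4, 2, 2]
    print(owner)  # [0, 0, 0, 1]
```

In this sketch a smaller max_fragment_size caps how many proteins the model would see at once, which is presumably where the memory savings come from, while the fragment-to-scaffold index keeps it possible to relate fragment-level embeddings back to their scaffold and genome.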