Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: adjust protein ksize for record/manifest #3019

Merged
merged 3 commits into from
Feb 21, 2024

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Feb 20, 2024

Protein k-mer sizes are k=k*3 internally, but k in a manifest.

Record is used within manifest, so should reflect k.

This should not impact selection at sig level. Insignature::Select:

 valid = if let Some(ksize) = selection.ksize() {
                let k = s.ksize() as u32;
                k == ksize || k == ksize * 3

So here we match exact ksize or k=k*3, regardless of ksize or molecule type. This is b/c we don't have access to is_protein or any other property unless we load the minhash.

we may want to be more explicit at some point, but this solves the immediate problem

Copy link

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (484c7ea) 86.70% compared to head (45a7ad8) 86.71%.

Additional details and impacted files
@@           Coverage Diff           @@
##           latest    #3019   +/-   ##
=======================================
  Coverage   86.70%   86.71%           
=======================================
  Files         135      135           
  Lines       15395    15396    +1     
  Branches     2624     2624           
=======================================
+ Hits        13349    13350    +1     
  Misses       1745     1745           
  Partials      301      301           
Flag Coverage Δ
hypothesis-py 25.56% <ø> (ø)
python 92.85% <ø> (ø)
rust 59.88% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bluegenes bluegenes changed the title WIP: adjust protein ksizes MRG: adjust protein ksizes Feb 21, 2024
@bluegenes bluegenes changed the title MRG: adjust protein ksizes MRG: adjust protein ksize for record/manifest Feb 21, 2024
@bluegenes
Copy link
Contributor Author

ready for review @sourmash-bio/devs

@bluegenes bluegenes merged commit fa4ae0b into latest Feb 21, 2024
41 checks passed
@bluegenes bluegenes deleted the fix-prot-ksize-manifest branch February 21, 2024 19:51
ctb added a commit that referenced this pull request Feb 23, 2024
)

Updates `src/core/CHANGELOG.md` with release notes; see below.

## Release checklist

from
#2987 (comment):

- [x] verify minimum supported rust version (MSRV) for writing release
notes (see #2988 for an
example); MSRV is checked by CI in `.github/workflows/rust.yml`,
`minimum_rust_version`
- [x] write release notes using `git log --oneline r0.12.0..latest
src/core | cut -d\ -f2- > /tmp/out.txt`
- [x] verify that the ChangeLog is up to date:
https://github.com/sourmash-bio/sourmash/blob/latest/src/core/CHANGELOG.md
- [x] bump version in `src/core/Cargo.toml`

## Release notes for r0.13.0

MSRV: 1.65

Changes/additions:

* Calculate all gather stats in rust; use for rocksdb gather (#2943)
* adjust protein ksize for record/manifest (#3019)
* Allow changing storage location for a collection in RevIndex (#3015)
* make core Manifest booleans python compatible (core) (#3007)

Updates:
* Bump roaring from 0.10.2 to 0.10.3 (#3014)
* Bump histogram from 0.9.0 to 0.9.1 (#3002)
* Bump chrono from 0.4.33 to 0.4.34 (#3000)
* Bump web-sys from 0.3.67 to 0.3.68 (#2998)
* Bump num-iter from 0.1.43 to 0.1.44 (#2997)
* Bump wasm-bindgen-test from 0.3.40 to 0.3.41 (#2996)
bluegenes added a commit to sourmash-bio/sourmash_plugin_branchwater that referenced this pull request Feb 23, 2024
This PR updates manifest generation in `manysketch` to use `Record` and `Manifest` utils from core. This also makes the code significantly simpler, since we can generate manifest information directly from the signature, rather than saving sketch parameters alongside sketches. 

Required `0.13.0` core due to these fixes:
- sourmash-bio/sourmash#3007
- sourmash-bio/sourmash#3019

* Fixes #203
@ctb ctb mentioned this pull request Mar 21, 2024
ctb added a commit that referenced this pull request Mar 21, 2024
# Release notes for sourmash v4.8.7

Note: This release changes the way `sourmash multigather` names output
files in some situations. Please see
#2722 for details.

Minor new features:

* support proper manifest creation with `--relpath` for `sig check` and
`sig collect` (#3054)
* fix `multigather` output by adding md5sum along with
`-U/--output-add-query-md5sum` (#2722)
* enable loading lineages from annotated gather with match_name instead
of name (#3078)

Bug fixes:

* fix output for `sketch ... --singleton` (#3066)
* fix `calculate_gather_stats` `threshold=0` bug (#3052)

Cleanup and documentation updates:

* adjust protein ksize for record/manifest (#3019)
* Resolve `sourmash gather --help` issue (#3032)
* rework the manifest documentation; do misc cleanup (#3027)
* add branchwater web to docs (#3018)

Developer updates:

* make core Manifest booleans python compatible (core) (#3007)
* safer ksize selection while still accommodating k=k*3 (#3028)
* fix clippy beta issues (#3088)
* tell dependabot to ignore upgrades to `byteorder`, `chrono`,
`once_cell`, and `wasm-bindgen` (#3065)
* update rust changelog for r0.13.0 in preparation for release (#3033)
* Allow changing storage location for a collection in RevIndex (#3015)
* Fix tox and nix configs so all tox tests execute correctly (#2992)
* Calculate all gather stats in rust; use for rocksdb gather (#2943)
* bump screed req to 1.1.3 (#3067)
* bump to v4.8.7-dev (#2989)

Dependabot updates:

* Bump DeterminateSystems/magic-nix-cache-action from 1 to 3 (#2994)
* Bump DeterminateSystems/magic-nix-cache-action from 3 to 4 (#3085)
* Bump DeterminateSystems/nix-installer-action from 4 to 9 (#2995)
* Bump DeterminateSystems/nix-installer-action from 9 to 10 (#3083)
* Bump chrono from 0.4.33 to 0.4.34 (#3000)
* Bump conda-incubator/setup-miniconda from 3.0.1 to 3.0.2 (#3046)
* Bump conda-incubator/setup-miniconda from 3.0.2 to 3.0.3 (#3057)
* Bump histogram from 0.9.0 to 0.9.1 (#3002)
* Bump itertools from 0.12.0 to 0.12.1 (#3043)
* Bump log from 0.4.20 to 0.4.21 (#3062)
* Bump num-iter from 0.1.43 to 0.1.44 (#2997)
* Bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#3084)
* Bump rayon from 1.8.1 to 1.9.0 (#3058)
* Bump roaring from 0.10.2 to 0.10.3 (#3014)
* Bump serde from 1.0.196 to 1.0.197 (#3045)
* Bump serde_json from 1.0.113 to 1.0.114 (#3044)
* Bump tempfile from 3.10.0 to 3.10.1 (#3059)
* Bump thiserror from 1.0.56 to 1.0.57 (#2999)
* Bump thiserror from 1.0.57 to 1.0.58 (#3082)
* Bump wasm-bindgen from 0.2.91 to 0.2.92 (#3060)
* Bump wasm-bindgen-test from 0.3.40 to 0.3.41 (#2996)
* Bump wasm-bindgen-test from 0.3.41 to 0.3.42 (#3063)
* Bump web-sys from 0.3.67 to 0.3.68 (#2998)
* Bump web-sys from 0.3.68 to 0.3.69 (#3061)
* Revert "Bump wasm-bindgen from 0.2.91 to 0.2.92 (#3060)" (#3064)
* Update asv to 0.6.2 (#3025)
* Update pytest requirement from <8.1.0,>=6.2.4 to >=6.2.4,<8.2.0
(#3075)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants