[0.1.0-beta.1] Feat/compare #3

Merged: 12 commits, Nov 22, 2023
17 changes: 17 additions & 0 deletions .github/workflows/release.yaml
@@ -0,0 +1,17 @@
name: automatic_test_build_publish
on:
  release:
    types: [created]
  workflow_dispatch: {}
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository with submodules
        uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Cargo publish
        run: cargo publish --token ${CRATES_TOKEN}
        env:
          CRATES_TOKEN: ${{ secrets.CRATES_TOKEN }}
36 changes: 7 additions & 29 deletions Cargo.lock

8 changes: 4 additions & 4 deletions Cargo.toml
@@ -1,17 +1,17 @@
 [package]
 name = "jam-rs"
-version = "0.1.0"
+version = "0.1.0-beta.1"
 edition = "2021"
 repository = "https://github.com/St4NNi/jam-rs"
 license = "MIT"
-description = "Just another minhash (Jam) implementation in Rust"
+description = "Just another (genomic) minhash (Jam) implementation in Rust"

 [dependencies]
 anyhow = "1.0.75"
 bincode = "1.3.3"
-flate2 = "1.0.27"
+flate2 = "1.0.28"
 needletail = "0.5.1"
-rayon = "1.7.0"
+rayon = "1.8.0"
 xxhash-rust = { version = "0.8.7", features = ["xxh3"]}
 bytemuck = "1.14.0"
 serde = { version = "1", features = ["derive"] }
14 changes: 14 additions & 0 deletions README.md
@@ -11,6 +11,20 @@ Implements parts of the ScaledMinHash / FracMinHash algorithm described in [sour

Unlike traditional implementations like [sourmash](https://joss.theoj.org/papers/10.21105/joss.00027) or [mash](https://doi.org/10.1186/s13059-016-0997-x) this version tries to specialise more on estimating the containment of small sequences in large sets. This is intended to be used to screen terabytes of data in just a few seconds / minutes.
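
To make the containment idea concrete, here is a minimal sketch of FracMinHash-style containment estimation. It is not the jam-rs implementation: the function names (`hash_kmer`, `frac_sketch`, `containment`) and the `scale` parameter are hypothetical, the standard library's `DefaultHasher` stands in for xxh3 so that the example needs no external crates, and canonical (reverse-complement aware) k-mers are left out for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a single k-mer. ASSUMPTION: DefaultHasher is a stand-in for xxh3.
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    kmer.hash(&mut hasher);
    hasher.finish()
}

/// Keep only the hashes that fall into the lowest 1/scale fraction of the
/// u64 range. With scale = 1 every k-mer hash is kept.
fn frac_sketch(seq: &[u8], k: usize, scale: u64) -> HashSet<u64> {
    assert!(k > 0, "k must be positive");
    assert!(scale > 0, "scale must be positive");
    let threshold = u64::MAX / scale;
    seq.windows(k)
        .map(hash_kmer)
        .filter(|&h| h <= threshold)
        .collect()
}

/// Estimated containment of `query` in `db`: the fraction of query hashes
/// that also occur in the database sketch. An empty query yields 0.0.
fn containment(query: &HashSet<u64>, db: &HashSet<u64>) -> f64 {
    if query.is_empty() {
        return 0.0;
    }
    let shared = query.iter().filter(|h| db.contains(*h)).count();
    shared as f64 / query.len() as f64
}
```

Because containment is normalised by the size of the query sketch rather than by the union of both sketches, a short query can score highly against a much larger database sequence, which is the situation this paragraph describes.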

### Installation

A pre-release is published on [crates.io](https://crates.io/). To install it you need `cargo` and the Rust toolchain; the easiest way to get both is via [rustup.rs](https://rustup.rs/). Then run:

```bash
cargo install jam-rs
```

If you want the bleeding edge development release you can install via git:

```bash
cargo install --git https://github.com/St4NNi/jam-rs
```

### Comparison

- [xxhash3](https://github.com/DoumanAsh/xxhash-rust) or [ahash-fallback](https://github.com/tkaitchuck/aHash/wiki/AHash-fallback-algorithm) (for kmer < 32) instead of [murmurhash3](https://github.com/mhallin/murmurhash3-rs)
13 changes: 11 additions & 2 deletions src/cli.rs
@@ -5,7 +5,7 @@ use std::path::PathBuf;
 #[derive(Debug, Parser)]
 #[command(name = "jam")]
 #[command(bin_name = "jam")]
-#[command(version = "0.1.0")]
+#[command(version = "0.1.0-beta.1")]
 #[command(
     about = "Just another minhasher, obviously blazingly fast",
     long_about = "A heavily optimized minhash implementation that focuses less on accuracy and more on quick scans of large datasets."
@@ -93,13 +93,22 @@ pub enum Commands {
         input: PathBuf,
         /// Database sketch(es)
         #[arg(short, long)]
-        database: PathBuf,
+        database: Vec<PathBuf>,
         /// Output to file instead of stdout
         #[arg(short, long)]
         #[arg(value_parser = clap::value_parser!(std::path::PathBuf))]
         output: Option<PathBuf>,
         /// Cut-off value for similarity
         #[arg(short, long, default_value = "0.0")]
         cutoff: f64,
+        /// Use the Stats params for restricting results
+        #[arg(long)]
+        stats: bool,
+        /// Use GC stats with an upper bound of x% and a lower bound of y%
+        #[arg(long)]
+        gc_lower: Option<u8>,
+        /// Use GC stats with an upper bound of x% and a lower bound of y%
+        #[arg(long)]
+        gc_upper: Option<u8>,
     },
 }
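
The diff does not show how the new `gc_lower` / `gc_upper` options are consumed, so the following is only a sketch of one plausible reading: treat both as optional, inclusive percentage bounds on a sequence's GC content. The helper names `gc_percent` and `gc_in_bounds` are hypothetical and do not appear in this PR.

```rust
/// Hypothetical helper: GC content of a sequence as a whole percentage.
/// Only A, C, G and T (either case) are counted; returns None when the
/// sequence contains none of them.
fn gc_percent(seq: &[u8]) -> Option<u8> {
    let mut gc = 0usize;
    let mut total = 0usize;
    for base in seq {
        match base.to_ascii_uppercase() {
            b'G' | b'C' => {
                gc += 1;
                total += 1;
            }
            b'A' | b'T' => total += 1,
            _ => {} // ambiguous bases such as N are ignored
        }
    }
    if total == 0 {
        None
    } else {
        Some((gc * 100 / total) as u8)
    }
}

/// Hypothetical filter mirroring the `gc_lower: Option<u8>` and
/// `gc_upper: Option<u8>` arguments. ASSUMPTION: bounds are inclusive and
/// a missing bound means "no restriction on that side".
fn gc_in_bounds(gc: u8, lower: Option<u8>, upper: Option<u8>) -> bool {
    lower.map_or(true, |lo| gc >= lo) && upper.map_or(true, |hi| gc <= hi)
}
```

Keeping both bounds as `Option<u8>` lets a caller restrict only one side, which matches the two independent `#[arg(long)]` flags in the hunk above.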