KFF 1.1 ideas #8

yoann-dufresne · 2021-04-06T07:19:49Z

Upcoming ideas for v1.1. Do not hesitate to propose other features!
(list updated with new ideas)

i section: Index section. It records the distance to some of the following sections (not necessarily all of them). This will allow a file to be read in parallel.
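A minimal sketch of how a reader could use such an index to split work, assuming a hypothetical model (not the KFF specification): each index entry is the byte distance, in the uncompressed file, from the end of the index section to the start of a listed section. The names `entry_distances`, `chunk_ranges` and `assign_round_robin` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical model, not the KFF specification: each index entry is the
# distance in bytes (uncompressed file) from the end of the index section
# to the start of one of the following sections.

def entry_distances(section_sizes: List[int], indexed: List[int]) -> List[int]:
    """Distances to the sections whose positions are listed in `indexed`.

    Assumes sections are stored back to back right after the index, and that
    `indexed` is sorted and includes section 0.
    """
    starts = [0]
    for size in section_sizes[:-1]:
        starts.append(starts[-1] + size)
    return [starts[i] for i in indexed]

def chunk_ranges(distances: List[int], index_end: int, file_end: int) -> List[Tuple[int, int]]:
    """Byte ranges (start, end) between consecutive indexed sections.

    When the index does not list every section, one range may hold several
    consecutive sections, which a worker then reads sequentially.
    """
    starts = [index_end + d for d in distances]
    return list(zip(starts, starts[1:] + [file_end]))

def assign_round_robin(ranges: List[Tuple[int, int]], n_workers: int) -> List[List[Tuple[int, int]]]:
    """Distribute the byte ranges over workers in round-robin order."""
    return [ranges[w::n_workers] for w in range(n_workers)]
```

For example, with four sections of 100, 250, 50 and 80 bytes where only sections 0 and 2 are indexed, an index ending at byte 40 gives two ranges, (40, 390) and (390, 520), each covering two sections. Indexing only some sections keeps the index small while still giving workers independent starting points.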

natir · 2021-04-06T08:07:13Z

What about a specialization of the raw and minimizer sections for counts?

About the section index: is it a distance in raw bytes? How does it work if we have a compressed version of a KFF file?

yoann-dufresne · 2021-04-06T08:38:40Z

Data specialization is a good point to think about.
But I think that version 1 of KFF will focus on the sequence part only.
Maybe v2.0 will include such ideas, but we first have to publish the simplest version of the format.

yoann-dufresne · 2021-04-06T08:39:38Z

For the index, it is a distance in the uncompressed file.
I do not know how it would work inside compressed files. Do you have any suggestions?

natir · 2021-04-06T09:17:39Z

I agree: specialization would be hard and/or inelegant to include without breaking compatibility.

I think that if we want parallel or random access in a compressed KFF file, we have to apply the method chosen for BAM.

We don't compress the whole file; we compress blocks, and the index indicates the beginning of these blocks.
We could apply this in version 1.1, and we could imagine that each block may be compressed or not. When reading the file, we use the magic number to know which decompression algorithm to use.
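A minimal sketch of the BAM-like idea, assuming gzip as the only compression algorithm and a hypothetical layout where the index simply stores the byte offset of each block. Each block is compressed independently (or left as is), so any block can be decoded alone, and the gzip magic number (`1f 8b`) tells the reader whether to decompress. `write_blocks` and `read_block` are illustrative names, not part of any KFF library.

```python
import gzip
import io
from typing import List, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)

def write_blocks(blocks: List[bytes], compress_flags: List[bool]) -> Tuple[bytes, List[int]]:
    """Concatenate blocks, compressing each one independently.

    Returns the file content and the index: the start offset of each block.
    """
    out = io.BytesIO()
    index = []
    for block, compress in zip(blocks, compress_flags):
        index.append(out.tell())
        out.write(gzip.compress(block) if compress else block)
    return out.getvalue(), index

def read_block(data: bytes, index: List[int], i: int) -> bytes:
    """Random access to block i using only the index."""
    start = index[i]
    end = index[i + 1] if i + 1 < len(index) else len(data)
    raw = data[start:end]
    # Assumption: an uncompressed block never starts with the gzip magic
    # number, so the first bytes are enough to pick the decoder.
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw
```

Supporting another algorithm would mean adding its magic number to the check in `read_block`. In BAM's BGZF, every block is a gzip member, so the whole file also stays readable by a plain gzip decompressor; mixing uncompressed blocks gives up that property.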

yoann-dufresne · 2021-05-06T12:22:40Z

MAJOR UPDATE: The index section is moving to version 1.0

lrobidou · 2024-08-15T02:12:19Z

Useful for at least my use case:

a flag indicating that a minimizer appears in only one minimizer section and nowhere else
a flag indicating that all k-mers in the block share a single count
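A minimal sketch of what these two flags could buy a reader, assuming hypothetical flag bits (these values are not part of the KFF specification): a block whose k-mers share one count stores that count once, and a search for a minimizer can stop at the first section flagged as its only occurrence.

```python
from typing import List, Tuple

# Hypothetical flag bits, NOT part of the KFF specification.
FLAG_MINIMIZER_UNIQUE = 0b01  # this minimizer appears in no other section
FLAG_SINGLE_COUNT = 0b10      # every k-mer in the block has the same count

def encode_block_counts(counts: List[int]) -> Tuple[int, List[int]]:
    """Return (flags, stored_counts): store a shared count only once."""
    if counts and all(c == counts[0] for c in counts):
        return FLAG_SINGLE_COUNT, [counts[0]]
    return 0, list(counts)

def decode_block_counts(flags: int, stored: List[int], n_kmers: int) -> List[int]:
    """Expand the stored counts back to one count per k-mer."""
    if flags & FLAG_SINGLE_COUNT:
        return stored * n_kmers
    return list(stored)

def find_minimizer_sections(sections: List[Tuple[str, int]], minimizer: str) -> List[int]:
    """Indices of sections holding `minimizer`; sections are (minimizer, flags)."""
    hits = []
    for i, (m, flags) in enumerate(sections):
        if m == minimizer:
            hits.append(i)
            if flags & FLAG_MINIMIZER_UNIQUE:
                break  # the flag guarantees no later section has this minimizer
    return hits
```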

Random idea:
A flag indicating that an index section is present and contains information about the data, e.g. the first k-mer with a count above x is in section u, and the last k-mer with a count below x is in section v. This would allow a binary search on the data.
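A minimal sketch of that binary search, assuming the sections are sorted by count so that their count ranges do not overlap, and assuming a hypothetical index that stores the minimum and maximum count of each section. Python's `bisect` module does the search; the function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional

# Assumption: sections are sorted by count, and the hypothetical index stores
# the smallest (min_counts) and largest (max_counts) count of each section,
# both lists being non-decreasing.

def first_section_above(max_counts: List[int], x: int) -> Optional[int]:
    """First section holding a k-mer with count > x (section u), or None."""
    i = bisect_right(max_counts, x)
    return i if i < len(max_counts) else None

def last_section_below(min_counts: List[int], x: int) -> Optional[int]:
    """Last section holding a k-mer with count < x (section v), or None."""
    i = bisect_left(min_counts, x) - 1
    return i if i >= 0 else None
```

For instance, with four sections covering counts 1 to 4, 5 to 9, 10 to 19 and 20 to 50, a threshold of 9 gives u = 2 and v = 1, after which only those sections need to be scanned.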

yoann-dufresne added the "enhancement" label (New feature or request) on Apr 6, 2021.
