Create high-level and low-level APIs #27

Open
arq5x opened this issue Apr 21, 2015 · 3 comments

Comments

@arq5x
Collaborator

arq5x commented Apr 21, 2015

Motivated by suggestions from @brentp and Manuel Rivas (https://twitter.com/manuelrivascruz/status/590250919570186240), it is clear that a simple API would be beneficial. In particular, providing a well-designed C API will have multiple benefits:

  • It will allow the GQT indexing, querying, and bitmap comparison operations to be accessible to other programming languages (in particular, Python and R).
  • It will allow GQT's efficient genotype bitmaps to be used by others for the application of statistics, method development, etc. The existing PCA functionality, for instance, uses the WAH-encoded bitmaps to quickly compute metrics that are otherwise very computationally intensive.
  • It will make it easier for others to understand and therefore contribute to the codebase.

In my mind, the API needs high-level functions such as the following to support the first point:

// create a GQT index of a VCF, VCF.GZ, or BCF file; returns an indicator of success
int gqt_index(const char *vcf_fn, const char *gqt_fn);

// open a GQT index; return a pointer to a GQT file struct containing file handle, index information, etc.
gqt_file *gqt_open(const char *fn);

// open a sample database; return a pointer to a struct with a SQLite database handle.
smp_file *db_open(const char *fn);

// query the gqt index to return the set of variant row numbers that match
// the set of phenotype and genotype criteria
int *gqt_query(const char **phenotype_rules, const char **genotype_rules); 

// close
int gqt_close(gqt_file *);
int db_close(smp_file *);

In addition, in order to support goals 2 and 3, users will need a lower-level API to gain access to the set of bitmaps for a given individual, compare bit vectors, etc. This one is a bit trickier and will require discussion, since users will need to be able to access individual bit vectors for individual samples, compare bit vectors across samples, and apply comparison operators that assess all of an individual's bit vectors for a given attribute (that is, genotypes, depths, phases, PLs, etc.).
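
To make the bit-vector comparison idea concrete, here is a minimal sketch. It is not GQT code: `bv_and` and `bv_popcount` are hypothetical names, and the vectors are assumed to be plain, uncompressed arrays of 32-bit words with one bit per variant for a given sample, whereas GQT itself stores WAH-encoded bitmaps. A real low-level API would probably operate on the compressed form directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GQT): bitwise AND of two uncompressed
 * bit vectors, each n_words 32-bit words long. The result is written to out.
 * Assumption: bit i is set when the sample has the attribute at variant i. */
void bv_and(const uint32_t *a, const uint32_t *b,
            uint32_t *out, size_t n_words)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n_words; ++i)
        out[i] = a[i] & b[i];
}

/* Hypothetical helper: count the set bits in an uncompressed bit vector,
 * e.g. the number of variants shared by two samples after bv_and. */
size_t bv_popcount(const uint32_t *v, size_t n_words)
{
    assert(v != NULL || n_words == 0);
    size_t count = 0;
    for (size_t i = 0; i < n_words; ++i) {
        uint32_t w = v[i];
        while (w != 0) {
            w &= w - 1;   /* clear the lowest set bit */
            ++count;
        }
    }
    return count;
}
```

Under these assumptions, ANDing the vectors of two samples and counting the set bits gives the number of variants at which both samples carry the attribute in question.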

@ekg

ekg commented Apr 21, 2015

I wholeheartedly support an API to GQT! 👍

Another simple yet very general access pattern is to apply a function over a range in the db. The input to this could be a query and a callback function.
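
One possible shape for that pattern, as a rough sketch only: `row_callback`, `apply_over_rows`, and `sum_rows` are hypothetical names, and the array of row numbers stands in for whatever a query would return. Nothing here reflects an existing GQT interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once per matching variant row.
 * Returning non-zero asks the driver to stop early. */
typedef int (*row_callback)(int row, void *user_data);

/* Hypothetical driver: calls fn on each row number in rows, in order.
 * Assumption: rows is the result set of some earlier query.
 * Returns the number of rows on which the callback was invoked. */
size_t apply_over_rows(const int *rows, size_t n_rows,
                       row_callback fn, void *user_data)
{
    assert(fn != NULL);
    assert(rows != NULL || n_rows == 0);
    size_t visited = 0;
    for (size_t i = 0; i < n_rows; ++i) {
        ++visited;
        if (fn(rows[i], user_data) != 0)
            break;   /* callback requested early termination */
    }
    return visited;
}

/* Example callback: accumulates row numbers into a long owned by the caller. */
int sum_rows(int row, void *user_data)
{
    long *total = user_data;
    *total += row;
    return 0;
}
```

Passing state through the `void *user_data` pointer keeps the driver generic, which is the usual C idiom for callbacks and also maps reasonably well onto Python and R bindings.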

@pontikos

pontikos commented Aug 3, 2016

Hi, great program! Has there been any progress on that front?

@ryanlayer
Owner

Not yet. But the API is at the top of my list of improvements once a few
other projects are cleared out.

Ryan Layer
