1. It will allow the GQT indexing, querying, and bitmap comparison operations to be accessible to other programming languages (in particular, Python and R).
2. It will allow GQT's efficient genotype bitmaps to be used by others for the application of statistics, method development, etc. For example, the existing PCA functionality shows how the WAH-encoded bitmaps can be used to quickly compute metrics that are otherwise very computationally intensive.
3. It will make it easier for others to understand and therefore contribute to the codebase.
In my mind, the API needs high-level functions such as the following to support point 1:
// create a GQT index of a VCF, VCF.GZ, or BCF file. Returns indicator of success
int gqt_index(const char *vcf_fn, const char *gqt_fn);
// open a GQT index; return a pointer to a GQT file struct containing file handle, index information, etc.
gqt_file *gqt_open(const char *fn);
// open a sample database; return a pointer to a struct with SQLite database handle.
smp_file *db_open(const char *fn);
// query the gqt index to return the set of variant row numbers that match
// the set of phenotype and genotype criteria
int *gqt_query(const char **phenotype_rules, const char **genotype_rules);
// close
int gqt_close(gqt_file *);
int db_close(smp_file *);
In addition, in order to support goals 2 and 3, users will need a lower-level API to gain access to the set of bitmaps for a given individual, compare bit vectors, etc. This one is a bit trickier and will require discussion, since users will need to be able to access individual bit vectors for individual samples, compare bit vectors across samples, and apply comparison operators that assess all of an individual's bit vectors for a given attribute (that is, genotypes, depths, phases, PLs, etc.).
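To make the discussion of the lower-level API more concrete, here is a minimal sketch of one comparison primitive: counting the variants at which two bit vectors are both set. Everything in it is an assumption for illustration only. The `bitvec` struct and the `bitvec_and_count` name are hypothetical, and the bitmaps are plain uncompressed 64-bit words rather than the WAH-encoded form GQT actually stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, uncompressed stand-in for one of a sample's bitmaps:
 * bit i is set when the sample has a given attribute value (for
 * example, a given genotype) at variant i. Not GQT's real layout. */
typedef struct {
    const uint64_t *words;
    size_t n_words;
} bitvec;

/* Portable population count for a single 64-bit word. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count the variants at which both bit vectors are set.
 * Both vectors must cover the same number of words. */
size_t bitvec_and_count(const bitvec *a, const bitvec *b)
{
    assert(a != NULL && b != NULL);
    assert(a->n_words == b->n_words);

    size_t total = 0;
    for (size_t i = 0; i < a->n_words; i++)
        total += popcount64(a->words[i] & b->words[i]);
    return total;
}
```

A real version would presumably operate on the compressed words directly, which is one of the design questions this part of the API needs to settle.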
Another simple yet very general access pattern is to apply a function over a range in the db. The input to this could be a query and a callback function.
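A rough sketch of what that callback pattern could look like, assuming the query result is already available as an array of variant row numbers. The names `row_callback`, `for_each_row`, `window_counter`, and `count_in_window` are all hypothetical and not part of GQT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once per matching variant row.
 * Returning nonzero stops the iteration early. */
typedef int (*row_callback)(int row, void *user_data);

/* Apply cb to each of the n_rows entries in rows, in order.
 * Returns the number of rows the callback was invoked on. */
size_t for_each_row(const int *rows, size_t n_rows,
                    row_callback cb, void *user_data)
{
    assert(cb != NULL);
    assert(rows != NULL || n_rows == 0);

    size_t visited = 0;
    for (size_t i = 0; i < n_rows; i++) {
        visited++;
        if (cb(rows[i], user_data) != 0)
            break;
    }
    return visited;
}

/* Example user state: count the rows that fall inside [lo, hi]. */
typedef struct {
    int lo, hi;      /* inclusive window of row numbers */
    size_t count;    /* rows seen inside the window */
} window_counter;

/* Example callback that updates a window_counter and never stops early. */
int count_in_window(int row, void *user_data)
{
    window_counter *w = user_data;
    if (row >= w->lo && row <= w->hi)
        w->count++;
    return 0;
}
```

The `void *user_data` argument is the usual C way to let the caller carry its own state through the iteration, and it also maps fairly directly onto closures in Python or R bindings.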
Motivated by suggestions from @brentp and Manuel Rivas (https://twitter.com/manuelrivascruz/status/590250919570186240), it is clear that a simple API would be beneficial. In particular, providing a well-designed C API will have multiple benefits, namely the three listed above.