Reads Info tag is poor representation of arrays and ints #445
Comments
Not really a string. I'm not sure about empty or not as option; IMHO this should read
The current definition is intended to be a map of string -> []string (I guess the string key type is implied in Avro?), e.g. the following JSON:
The string-only approach is definitely a bit inflexible. Possibly this could be improved along with or after the proposed move to proto.
My problem with jamming it into a string is that the SAM equivalent carries type info with it. Without that info you can go SAM -> GA4GH, but you can't then go back the other way without knowing what type the tag is supposed to be. For unregistered or lowercase tags this may not be possible. If we make this a union of int, float, string and array, I can look at the JSON and infer the type: "" is a string, 6 suggests an integer, 6.0 a float and [] an array. Not sure how that would translate into Proto though.
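The inference described above can be sketched in Python. This is only an illustration, not anything from the schema: `infer_sam_type` is a hypothetical helper, and the mapping to the SAM type codes `i`, `f`, `Z` and `B` is an assumption about how a converter might choose them. The tags `XS`, `XZ` and `ZB` in the example are made up.

```python
import json


def infer_sam_type(value):
    """Guess a SAM optional-field type code from a JSON-decoded value.

    Hypothetical helper: the JSON-to-SAM mapping here is an assumption.
    """
    # bool is a subclass of int in Python, so reject it before the int check.
    if isinstance(value, bool):
        raise TypeError("booleans have no SAM tag type")
    if isinstance(value, int):
        return "i"  # 6 -> integer
    if isinstance(value, float):
        return "f"  # 6.0 -> float
    if isinstance(value, str):
        return "Z"  # "" -> string
    if isinstance(value, list):
        return "B"  # [] -> array (element subtype is not recoverable here)
    raise TypeError(f"unsupported tag value: {value!r}")


# Example tags; XS, XZ and ZB are made-up names for illustration.
tags = json.loads('{"NM": 6, "XS": 6.0, "XZ": "", "ZB": []}')
inferred = {tag: infer_sam_type(value) for tag, value in tags.items()}
```

Even with a union this stays lossy: a single character (`A`) and a hex string (`H`) both look like JSON strings, and an empty array gives no hint of its element subtype.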
@calbach Thanks for the empty array confirmation. And yes,
So would anyone accept the following:
This doesn't really cover all the cases in http://samtools.github.io/hts-specs/SAMv1.pdf section 1.5 but it covers a lot more than we currently do.
This suggestion is in a good direction, I think. Some kind of rich representation of attributes is important. We Sequence annotations https://github.com/ga4gh/schemas/blob/rna/src/main/resources/avro/sequenceAnnotations.avdl record Attributes { I am unsure how useful it is to have scalar values as well as It would be good to converge on a single attribute structure for
The info tag is string-only at the moment; unfortunately, this makes for a rather poor representation of arrays and integers. Given that this tag appears to be intended to represent BAM/SAM/CRAM info tags, it might be wise to make this a union or similar?
https://github.com/ga4gh/schemas/blob/d2b3380992d920150c6d40f884c48cda0d50f321/src/main/resources/avro/reads.avdl#L280
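To make the problem concrete, here is a minimal Python sketch of a string-only conversion. It assumes the info field is a map of tag to list of strings; `to_string_info` is a hypothetical helper, not part of any GA4GH tooling, and the tags `XZ` and `ZB` are made up.

```python
def to_string_info(sam_fields):
    """Flatten SAM 'TAG:TYPE:VALUE' fields into a map of tag -> list of strings.

    Hypothetical converter, used only to illustrate the loss of type info.
    """
    info = {}
    for field in sam_fields:
        # Split on the first two colons only; the value itself may contain colons.
        tag, type_code, value = field.split(":", 2)
        if type_code == "B":
            # Array values look like "c,1,2": drop the leading subtype letter.
            info[tag] = value.split(",")[1:]
        else:
            info[tag] = [value]
    return info


# An integer tag and a string tag end up with identical values.
info = to_string_info(["NM:i:6", "XZ:Z:6", "ZB:B:c,1,2"])
```

Here `info["NM"]` and `info["XZ"]` are both `["6"]`, so nothing in the output says which one was an integer.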