-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop duplicate FORMAT tags and fix an error with dropping large tags #1752
Conversation
fmt_aux_t *jfmt = &fmt[j]; | ||
if ( jfmt->size==-1 ) continue; // already marked for removal | ||
if ( ifmt->key!=jfmt->key ) continue; | ||
static int warned = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Htslib is meant to be thread safe, and also callable multiple times. We shouldn't be using static variables like this as they can lead to thread complaints with automated checking tools (such as two threads modifying the same variable without intervening locks) and also if we parse file A and file B then we'll never report warnings on file B if file A produced one.
Personally, I think we should warn for everything. If it gets too spammy, people can turn down the warning level with the appropriate command line options to set verbosity. (Or just fix their buggy inputs.)
We could however have it as a non-standard function level scope so we warn once for the function as a whole. This reduces spamming a little if the entire string is duplicated.
vcf.c
Outdated
v->n_fmt--; | ||
if ( i+1<v->n_fmt ) memmove(&fmt[i],&fmt[i+1],sizeof(*fmt)*v->n_fmt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worrying that this never worked before, but the new version also doesn't work and is now infact a security flaw:
$ ./test/test_view poc_simple.vcf
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1
[W::vcf_parse_format_alloc4] Duplicate FORMAT tag GT at chr1:10327
*** stack smashing detected ***: <unknown> terminated
Aborted
Note it also didn't work with a FORMAT string of GT:GT:X. That got replaced by GT (no X) because you're decrementing v->n_fmt too early so the if ( i+1<v->n_fmt )
means it doesn't do the copy of the remaining item.
To avoid the buffer overflow, you need to reduce the amount copied:
diff --git a/htscodecs b/htscodecs
--- a/htscodecs
+++ b/htscodecs
@@ -1 +1 @@
-Subproject commit ffda7310c4b3292955561d6c3b1743cb82bfe26b
+Subproject commit ffda7310c4b3292955561d6c3b1743cb82bfe26b-dirty
diff --git a/vcf.c b/vcf.c
index f08dbc07..1ed0f23d 100644
--- a/vcf.c
+++ b/vcf.c
@@ -3297,8 +3297,9 @@ static int vcf_parse_format_gt6(kstring_t *s, const bcf_hdr_t *h, bcf1_t *v,
while ( i < v->n_fmt ) {
if ( fmt[i].size==-1 )
{
+ if ( i+1<v->n_fmt )
+ memmove(&fmt[i],&fmt[i+1],sizeof(*fmt)*(v->n_fmt-(i+1)));
v->n_fmt--;
- if ( i+1<v->n_fmt ) memmove(&fmt[i],&fmt[i+1],sizeof(*fmt)*v->n_fmt);
}
else
i++;
The reason this isn't spotted by the address sanitizer in the tests is that we're only testing with 2 format keys, and we always allocate MAX_N_FMT entries. So it's overflowing the array slots we think it's using, but as far as the CPU is concerned it's still keeping within the bounds of the memory we allocated. To exploit it you need to have an entry in slot x where 2x >= 255 (+/- a few perhaps).
My little test file is this:
##fileformat=VCFv4.1
##INFO=<ID=S,Number=1,dype=String,Description="blah">
##INFO=<ID=I,Number=1,Type=Integer,Description="blah">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=a0,Number=1,Type=String>
##FORMAT=<ID=a1,Number=1,Type=String>
##FORMAT=<ID=a2,Number=1,Type=String>
##FORMAT=<ID=a3,Number=1,Type=String>
##FORMAT=<ID=a4,Number=1,Type=String>
##FORMAT=<ID=a5,Number=1,Type=String>
##FORMAT=<ID=a6,Number=1,Type=String>
##FORMAT=<ID=a7,Number=1,Type=String>
##FORMAT=<ID=a8,Number=1,Type=String>
##FORMAT=<ID=a9,Number=1,Type=String>
##FORMAT=<ID=a10,Number=1,Type=String>
##FORMAT=<ID=a11,Number=1,Type=String>
##FORMAT=<ID=a12,Number=1,Type=String>
##FORMAT=<ID=a13,Number=1,Type=String>
##FORMAT=<ID=a14,Number=1,Type=String>
##FORMAT=<ID=a15,Number=1,Type=String>
##FORMAT=<ID=a16,Number=1,Type=String>
##FORMAT=<ID=a17,Number=1,Type=String>
##FORMAT=<ID=a18,Number=1,Type=String>
##FORMAT=<ID=a19,Number=1,Type=String>
##FORMAT=<ID=a20,Number=1,Type=String>
##FORMAT=<ID=a21,Number=1,Type=String>
##FORMAT=<ID=a22,Number=1,Type=String>
##FORMAT=<ID=a23,Number=1,Type=String>
##FORMAT=<ID=a24,Number=1,Type=String>
##FORMAT=<ID=a25,Number=1,Type=String>
##FORMAT=<ID=a26,Number=1,Type=String>
##FORMAT=<ID=a27,Number=1,Type=String>
##FORMAT=<ID=a28,Number=1,Type=String>
##FORMAT=<ID=a29,Number=1,Type=String>
##FORMAT=<ID=a30,Number=1,Type=String>
##FORMAT=<ID=a31,Number=1,Type=String>
##FORMAT=<ID=a32,Number=1,Type=String>
##FORMAT=<ID=a33,Number=1,Type=String>
##FORMAT=<ID=a34,Number=1,Type=String>
##FORMAT=<ID=a35,Number=1,Type=String>
##FORMAT=<ID=a36,Number=1,Type=String>
##FORMAT=<ID=a37,Number=1,Type=String>
##FORMAT=<ID=a38,Number=1,Type=String>
##FORMAT=<ID=a39,Number=1,Type=String>
##FORMAT=<ID=a40,Number=1,Type=String>
##FORMAT=<ID=a41,Number=1,Type=String>
##FORMAT=<ID=a42,Number=1,Type=String>
##FORMAT=<ID=a43,Number=1,Type=String>
##FORMAT=<ID=a44,Number=1,Type=String>
##FORMAT=<ID=a45,Number=1,Type=String>
##FORMAT=<ID=a46,Number=1,Type=String>
##FORMAT=<ID=a47,Number=1,Type=String>
##FORMAT=<ID=a48,Number=1,Type=String>
##FORMAT=<ID=a49,Number=1,Type=String>
##FORMAT=<ID=a50,Number=1,Type=String>
##FORMAT=<ID=a51,Number=1,Type=String>
##FORMAT=<ID=a52,Number=1,Type=String>
##FORMAT=<ID=a53,Number=1,Type=String>
##FORMAT=<ID=a54,Number=1,Type=String>
##FORMAT=<ID=a55,Number=1,Type=String>
##FORMAT=<ID=a56,Number=1,Type=String>
##FORMAT=<ID=a57,Number=1,Type=String>
##FORMAT=<ID=a58,Number=1,Type=String>
##FORMAT=<ID=a59,Number=1,Type=String>
##FORMAT=<ID=a60,Number=1,Type=String>
##FORMAT=<ID=a61,Number=1,Type=String>
##FORMAT=<ID=a62,Number=1,Type=String>
##FORMAT=<ID=a63,Number=1,Type=String>
##FORMAT=<ID=a64,Number=1,Type=String>
##FORMAT=<ID=a65,Number=1,Type=String>
##FORMAT=<ID=a66,Number=1,Type=String>
##FORMAT=<ID=a67,Number=1,Type=String>
##FORMAT=<ID=a68,Number=1,Type=String>
##FORMAT=<ID=a69,Number=1,Type=String>
##FORMAT=<ID=a70,Number=1,Type=String>
##FORMAT=<ID=a71,Number=1,Type=String>
##FORMAT=<ID=a72,Number=1,Type=String>
##FORMAT=<ID=a73,Number=1,Type=String>
##FORMAT=<ID=a74,Number=1,Type=String>
##FORMAT=<ID=a75,Number=1,Type=String>
##FORMAT=<ID=a76,Number=1,Type=String>
##FORMAT=<ID=a77,Number=1,Type=String>
##FORMAT=<ID=a78,Number=1,Type=String>
##FORMAT=<ID=a79,Number=1,Type=String>
##FORMAT=<ID=a80,Number=1,Type=String>
##FORMAT=<ID=a81,Number=1,Type=String>
##FORMAT=<ID=a82,Number=1,Type=String>
##FORMAT=<ID=a83,Number=1,Type=String>
##FORMAT=<ID=a84,Number=1,Type=String>
##FORMAT=<ID=a85,Number=1,Type=String>
##FORMAT=<ID=a86,Number=1,Type=String>
##FORMAT=<ID=a87,Number=1,Type=String>
##FORMAT=<ID=a88,Number=1,Type=String>
##FORMAT=<ID=a89,Number=1,Type=String>
##FORMAT=<ID=a90,Number=1,Type=String>
##FORMAT=<ID=a91,Number=1,Type=String>
##FORMAT=<ID=a92,Number=1,Type=String>
##FORMAT=<ID=a93,Number=1,Type=String>
##FORMAT=<ID=a94,Number=1,Type=String>
##FORMAT=<ID=a95,Number=1,Type=String>
##FORMAT=<ID=a96,Number=1,Type=String>
##FORMAT=<ID=a97,Number=1,Type=String>
##FORMAT=<ID=a98,Number=1,Type=String>
##FORMAT=<ID=a99,Number=1,Type=String>
##FORMAT=<ID=a100,Number=1,Type=String>
##FORMAT=<ID=a101,Number=1,Type=String>
##FORMAT=<ID=a102,Number=1,Type=String>
##FORMAT=<ID=a103,Number=1,Type=String>
##FORMAT=<ID=a104,Number=1,Type=String>
##FORMAT=<ID=a105,Number=1,Type=String>
##FORMAT=<ID=a106,Number=1,Type=String>
##FORMAT=<ID=a107,Number=1,Type=String>
##FORMAT=<ID=a108,Number=1,Type=String>
##FORMAT=<ID=a109,Number=1,Type=String>
##FORMAT=<ID=a110,Number=1,Type=String>
##FORMAT=<ID=a111,Number=1,Type=String>
##FORMAT=<ID=a112,Number=1,Type=String>
##FORMAT=<ID=a113,Number=1,Type=String>
##FORMAT=<ID=a114,Number=1,Type=String>
##FORMAT=<ID=a115,Number=1,Type=String>
##FORMAT=<ID=a116,Number=1,Type=String>
##FORMAT=<ID=a117,Number=1,Type=String>
##FORMAT=<ID=a118,Number=1,Type=String>
##FORMAT=<ID=a119,Number=1,Type=String>
##FORMAT=<ID=a120,Number=1,Type=String>
##FORMAT=<ID=a121,Number=1,Type=String>
##FORMAT=<ID=a122,Number=1,Type=String>
##FORMAT=<ID=a123,Number=1,Type=String>
##FORMAT=<ID=a124,Number=1,Type=String>
##FORMAT=<ID=a125,Number=1,Type=String>
##FORMAT=<ID=a126,Number=1,Type=String>
##FORMAT=<ID=a127,Number=1,Type=String>
##FORMAT=<ID=a128,Number=1,Type=String>
##FORMAT=<ID=a129,Number=1,Type=String>
##FORMAT=<ID=a130,Number=1,Type=String>
##FORMAT=<ID=a131,Number=1,Type=String>
##FORMAT=<ID=a132,Number=1,Type=String>
##FORMAT=<ID=a133,Number=1,Type=String>
##FORMAT=<ID=a134,Number=1,Type=String>
##FORMAT=<ID=a135,Number=1,Type=String>
##FORMAT=<ID=a136,Number=1,Type=String>
##FORMAT=<ID=a137,Number=1,Type=String>
##FORMAT=<ID=a138,Number=1,Type=String>
##FORMAT=<ID=a139,Number=1,Type=String>
##FORMAT=<ID=a140,Number=1,Type=String>
##FORMAT=<ID=a141,Number=1,Type=String>
##FORMAT=<ID=a142,Number=1,Type=String>
##FORMAT=<ID=a143,Number=1,Type=String>
##FORMAT=<ID=a144,Number=1,Type=String>
##FORMAT=<ID=a145,Number=1,Type=String>
##FORMAT=<ID=a146,Number=1,Type=String>
##FORMAT=<ID=a147,Number=1,Type=String>
##FORMAT=<ID=a148,Number=1,Type=String>
##FORMAT=<ID=a149,Number=1,Type=String>
##FORMAT=<ID=a150,Number=1,Type=String>
##FORMAT=<ID=a151,Number=1,Type=String>
##FORMAT=<ID=a152,Number=1,Type=String>
##FORMAT=<ID=a153,Number=1,Type=String>
##FORMAT=<ID=a154,Number=1,Type=String>
##FORMAT=<ID=a155,Number=1,Type=String>
##FORMAT=<ID=a156,Number=1,Type=String>
##FORMAT=<ID=a157,Number=1,Type=String>
##FORMAT=<ID=a158,Number=1,Type=String>
##FORMAT=<ID=a159,Number=1,Type=String>
##FORMAT=<ID=a160,Number=1,Type=String>
##FORMAT=<ID=a161,Number=1,Type=String>
##FORMAT=<ID=a162,Number=1,Type=String>
##FORMAT=<ID=a163,Number=1,Type=String>
##FORMAT=<ID=a164,Number=1,Type=String>
##FORMAT=<ID=a165,Number=1,Type=String>
##FORMAT=<ID=a166,Number=1,Type=String>
##FORMAT=<ID=a167,Number=1,Type=String>
##FORMAT=<ID=a168,Number=1,Type=String>
##FORMAT=<ID=a169,Number=1,Type=String>
##FORMAT=<ID=a170,Number=1,Type=String>
##FORMAT=<ID=a171,Number=1,Type=String>
##FORMAT=<ID=a172,Number=1,Type=String>
##FORMAT=<ID=a173,Number=1,Type=String>
##FORMAT=<ID=a174,Number=1,Type=String>
##FORMAT=<ID=a175,Number=1,Type=String>
##FORMAT=<ID=a176,Number=1,Type=String>
##FORMAT=<ID=a177,Number=1,Type=String>
##FORMAT=<ID=a178,Number=1,Type=String>
##FORMAT=<ID=a179,Number=1,Type=String>
##FORMAT=<ID=a180,Number=1,Type=String>
##FORMAT=<ID=a181,Number=1,Type=String>
##FORMAT=<ID=a182,Number=1,Type=String>
##FORMAT=<ID=a183,Number=1,Type=String>
##FORMAT=<ID=a184,Number=1,Type=String>
##FORMAT=<ID=a185,Number=1,Type=String>
##FORMAT=<ID=a186,Number=1,Type=String>
##FORMAT=<ID=a187,Number=1,Type=String>
##FORMAT=<ID=a188,Number=1,Type=String>
##FORMAT=<ID=a189,Number=1,Type=String>
##FORMAT=<ID=a190,Number=1,Type=String>
##FORMAT=<ID=a191,Number=1,Type=String>
##FORMAT=<ID=a192,Number=1,Type=String>
##FORMAT=<ID=a193,Number=1,Type=String>
##FORMAT=<ID=a194,Number=1,Type=String>
##FORMAT=<ID=a195,Number=1,Type=String>
##FORMAT=<ID=a196,Number=1,Type=String>
##FORMAT=<ID=a197,Number=1,Type=String>
##FORMAT=<ID=a198,Number=1,Type=String>
##FORMAT=<ID=a199,Number=1,Type=String>
##contig=<ID=chr1,length=249250621,assembly=b37>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1
chr1 10327 . T C . . S=foo;I=12345 a0:a1:a2:a3:a4:a5:a6:a7:a8:a9:a10:a11:a12:a13:a14:a15:a16:a17:a18:a19:a20:a21:a22:a23:a24:a25:a26:a27:a28:a29:a30:a31:a32:a33:a34:a35:a36:a37:a38:a39:a40:a41:a42:a43:a44:a45:a46:a47:a48:a49:a50:a51:a52:a53:a54:a55:a56:a57:a58:a59:a60:a61:a62:a63:a64:a65:a66:a67:a68:a69:a70:a71:a72:a73:a74:a75:a76:a77:a78:a79:a80:a81:a82:a83:a84:a85:a86:a87:a88:a89:a90:a91:a92:a93:a94:a95:a96:a97:a98:a99:a100:a101:a102:a103:a104:a105:a106:a107:a108:a109:a110:a111:a112:a113:a114:a115:a116:a117:a118:a119:a120:a121:a122:a123:a124:a125:a126:a127:a128:a129:a130:a131:a132:a133:a134:a135:a136:a137:a138:a139:a140:a141:a142:a143:a144:a145:a146:a147:a148:a149:a150:a151:a152:a153:a154:a155:a156:a157:a158:a159:a160:a161:a162:a163:a164:a165:a166:a167:a168:a169:a170:a171:a172:a173:a174:a175:a176:a177:a178:a179:a180:a181:a182:a183:a184:a185:a186:a187:a188:a189:a190:a191:a192:a193:a194:a195:a196:a197:a198:GT:GT:a199 a0:a1:a2:a3:a4:a5:a6:a7:a8:a9:a10:a11:a12:a13:a14:a15:a16:a17:a18:a19:a20:a21:a22:a23:a24:a25:a26:a27:a28:a29:a30:a31:a32:a33:a34:a35:a36:a37:a38:a39:a40:a41:a42:a43:a44:a45:a46:a47:a48:a49:a50:a51:a52:a53:a54:a55:a56:a57:a58:a59:a60:a61:a62:a63:a64:a65:a66:a67:a68:a69:a70:a71:a72:a73:a74:a75:a76:a77:a78:a79:a80:a81:a82:a83:a84:a85:a86:a87:a88:a89:a90:a91:a92:a93:a94:a95:a96:a97:a98:a99:a100:a101:a102:a103:a104:a105:a106:a107:a108:a109:a110:a111:a112:a113:a114:a115:a116:a117:a118:a119:a120:a121:a122:a123:a124:a125:a126:a127:a128:a129:a130:a131:a132:a133:a134:a135:a136:a137:a138:a139:a140:a141:a142:a143:a144:a145:a146:a147:a148:a149:a150:a151:a152:a153:a154:a155:a156:a157:a158:a159:a160:a161:a162:a163:a164:a165:a166:a167:a168:a169:a170:a171:a172:a173:a174:a175:a176:a177:a178:a179:a180:a181:a182:a183:a184:a185:a186:a187:a188:a189:a190:a191:a192:a193:a194:a195:a196:a197:a198:0/1:0/1:a199
You should probably try it right up to the maximum supported of 255 to check there are no boundary cases.
This is now fixed and tested extensively, should be ready to go. |
I'll try rebasing and pushing back. Some of these build errors are due to you basing this on an old htslib, which used htscodecs 1.4 instead of the current release (1.6). |
0fd148c
to
03899cb
Compare
The removal of large tags introduced by b49eea4 and 9db565d could not work correctly, the memmove pointers were wrong! Resolves samtools#1733
03899cb
to
ae0ecfb
Compare
The removal of large tags introduced by b49eea4 and 9db565d could not work correctly, the memmove pointers were wrong!
Resolves #1733
This is a fix of the botched pull request #1736