From 5d8ef6d5c574c051eacd331dc6e90ed477c0865b Mon Sep 17 00:00:00 2001
From: jp0317
Date: Mon, 12 Aug 2024 21:53:03 +0000
Subject: [PATCH 1/3] adding some bad parquet files

---
 bad_data/README.md                | 6 +++++-
 data/bad-dict-page-header.parquet | Bin 0 -> 533 bytes
 data/bad-levels.parquet           | Bin 0 -> 609 bytes
 3 files changed, 5 insertions(+), 1 deletion(-)
 create mode 100755 data/bad-dict-page-header.parquet
 create mode 100644 data/bad-levels.parquet

diff --git a/bad_data/README.md b/bad_data/README.md
index 472865b..baafde6 100644
--- a/bad_data/README.md
+++ b/bad_data/README.md
@@ -21,4 +21,8 @@ These are files used for reproducing various bugs that have been reported.
 * PARQUET-1481.parquet: tests a case where a schema Thrift value has been
-  corrupted
+  corrupted.
+* bad-dict-page-header.parquet: tests a case where the number of values
+  stored in dictionary page header is negative.
+* bad-levels.parquet: tests a case where a page has insufficient repetition
+  levels.
diff --git a/data/bad-dict-page-header.parquet b/data/bad-dict-page-header.parquet
new file mode 100755
index 0000000000000000000000000000000000000000..7d14d5ec7995c3b3826f767723824c0e1f04a47b
GIT binary patch
literal 533
zcmY+Cze>YU6vj`kN&lo#tM=R`5a?uZsDlyRGnBRvN-1s%rIeteNo}cvtMm=roq~h2
zkKyPb_y*2SBItKR#R&IzPj0@G@0^^Z9}M=G_(j8N)_hzRxI`p~$k(uu+SF+U=)j=w
zuL5EbAd$+z1QlQ*bro?9tb;1p0GnV7)Bpns>cCcKogdW-d7t60Y=f}f8gv8Y91Pm6
z-ch#~prq>J?CI^`7}2(Mt!k0^e5d-#E95j`sk1<(EMz5!MX1h5MzFZrWRblqd=kv;E2dau$w6v~Z-bJEdbjD^Lvu+3yNKc6JSi|7VUxS=PbI9h76==7gf
z8BgLkO6J4yePYuh%Oubv-vZYco>r7l2a$!J#nihWz{;K5+
pQjE~MFE1VP4P;4aL#7Sr>MKZ}cFe2MYP%)sCB0&r{P#Xk`oC+A9#E0>L
zd<w$Q$h5N8yYuE!~Bm7t*>!wr1ke1GfaIK(tf>tnP}3#c%JNH
z0B8+zXjOnUxC6KW7(hvmz;yxcX;=cdqlon;Vs%X|>1H1C)>t*xEoxTCC1^YX1Vy8f
zyLx%@qk#C;GO1|g^LCyFWnW4OFsFOaB>IgqS>HzXbK9m8KIy&>$wfUKaMy`WJhAy7
z7ks(eA)oMMEz=HZ#nr#Qya=FJWcu;6H=ixTI2%l-N2_qY7*5_#bGd?3$TlCqYgX!y
OhDlH6C&ieDkLwo|hlFAP

literal 0
HcmV?d00001

From 89bd400cacd81b2c532b23df2de8b9510edaabe8 Mon Sep 17 00:00:00 2001
From: mwish
Date: Thu, 15 Aug 2024 17:30:08 +0800
Subject: [PATCH 2/3] move to bad-data

---
 {data => bad_data}/bad-dict-page-header.parquet | Bin
 {data => bad_data}/bad-levels.parquet           | Bin
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename {data => bad_data}/bad-dict-page-header.parquet (100%)
 rename {data => bad_data}/bad-levels.parquet (100%)

diff --git a/data/bad-dict-page-header.parquet b/bad_data/bad-dict-page-header.parquet
similarity index 100%
rename from data/bad-dict-page-header.parquet
rename to bad_data/bad-dict-page-header.parquet
diff --git a/data/bad-levels.parquet b/bad_data/bad-levels.parquet
similarity index 100%
rename from data/bad-levels.parquet
rename to bad_data/bad-levels.parquet

From d54d465aea18bee596a1a430f9b3e45eafe04430 Mon Sep 17 00:00:00 2001
From: mwish
Date: Thu, 15 Aug 2024 17:47:00 +0800
Subject: [PATCH 3/3] fmt?

---
 ....parquet => ARROW-RS-GH-6229-DICTHEADER.parquet} | Bin
 ...vels.parquet => ARROW-RS-GH-6229-LEVELS.parquet} | Bin
 bad_data/README.md                                  | 6 +++---
 3 files changed, 3 insertions(+), 3 deletions(-)
 rename bad_data/{bad-dict-page-header.parquet => ARROW-RS-GH-6229-DICTHEADER.parquet} (100%)
 rename bad_data/{bad-levels.parquet => ARROW-RS-GH-6229-LEVELS.parquet} (100%)

diff --git a/bad_data/bad-dict-page-header.parquet b/bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet
similarity index 100%
rename from bad_data/bad-dict-page-header.parquet
rename to bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet
diff --git a/bad_data/bad-levels.parquet b/bad_data/ARROW-RS-GH-6229-LEVELS.parquet
similarity index 100%
rename from bad_data/bad-levels.parquet
rename to bad_data/ARROW-RS-GH-6229-LEVELS.parquet
diff --git a/bad_data/README.md b/bad_data/README.md
index 0b7c022..30802a5 100644
--- a/bad_data/README.md
+++ b/bad_data/README.md
@@ -22,10 +22,10 @@ These are files used for reproducing various bugs that have been reported.
 * PARQUET-1481.parquet: tests a case where a schema Thrift value has been
   corrupted.
-* bad-dict-page-header.parquet: tests a case where the number of values
+* ARROW-RS-GH-6229-DICTHEADER.parquet: tests a case where the number of values
   stored in dictionary page header is negative.
-* bad-levels.parquet: tests a case where a page has insufficient repetition
-  levels.
+* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient
+  repetition levels.
 * ARROW-GH-41321.parquet: test case of https://github.com/apache/arrow/issues/41321
   where decoded rep / def levels is less than num_values in page_header.
 * ARROW-GH-41317.parquet: test case of https://github.com/apache/arrow/issues/41317
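Both new files exist to make a reader fail cleanly instead of crashing or over-allocating. The Python sketch below illustrates the two checks they exercise: rejecting a negative `num_values` in a dictionary page header (a Thrift `i32`, so a corrupt file can carry any signed value), and detecting a level stream that yields fewer levels than the page header's `num_values`. The names `check_dictionary_page_header`, `decode_levels` and `CorruptPageError` are hypothetical and are not arrow-rs or Arrow C++ API. The sketch assumes the level bytes use the RLE/bit-packed hybrid encoding and have already had any length prefix stripped.

```python
from __future__ import annotations

# Hypothetical reader-side checks for the corruptions described in
# bad_data/README.md. Names are illustrative, not any library's API.


class CorruptPageError(ValueError):
    """Raised when a page header or level stream is inconsistent."""


def check_dictionary_page_header(num_values: int) -> int:
    # num_values is a signed i32 in the Thrift header, so a corrupt file
    # can make it negative; reject that before allocating anything.
    if num_values < 0:
        raise CorruptPageError(f"negative dictionary num_values: {num_values}")
    return num_values


def _read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    # Unsigned LEB128 varint, used for RLE/bit-packed run headers.
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise CorruptPageError("truncated run header")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_levels(buf: bytes, bit_width: int, num_values: int) -> list[int]:
    """Decode num_values levels from an RLE/bit-packed hybrid stream.

    Raises CorruptPageError if the stream ends before num_values levels
    have been produced (the "insufficient levels" case).
    """
    levels: list[int] = []
    pos = 0
    value_bytes = (bit_width + 7) // 8
    mask = (1 << bit_width) - 1
    while len(levels) < num_values:
        if pos >= len(buf):
            raise CorruptPageError(
                f"expected {num_values} levels, decoded only {len(levels)}"
            )
        header, pos = _read_uleb128(buf, pos)
        if header & 1 == 0:
            # RLE run: one little-endian value repeated (header >> 1) times.
            count = header >> 1
            if pos + value_bytes > len(buf):
                raise CorruptPageError("truncated RLE value")
            value = int.from_bytes(buf[pos:pos + value_bytes], "little")
            pos += value_bytes
            levels.extend([value] * count)
        else:
            # Bit-packed run: (header >> 1) groups of 8 values, LSB first.
            groups = header >> 1
            nbytes = groups * bit_width
            if pos + nbytes > len(buf):
                raise CorruptPageError("truncated bit-packed run")
            chunk = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            levels.extend(
                (chunk >> (i * bit_width)) & mask for i in range(groups * 8)
            )
    # The last run may yield more than num_values levels (for example,
    # padding in a bit-packed group), so trim to the requested count.
    return levels[:num_values]
```

For example, `decode_levels(bytes([0x08, 0x01]), 1, 4)` reads one RLE run of four 1s, while asking the same bytes for 5 levels raises `CorruptPageError`. That is the shape of failure the LEVELS file is meant to provoke. Raising a dedicated error, rather than padding with zeros, keeps a truncated page from being silently misread.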