
Closes #848: Limit Length of File Lines Logged #1563

Merged

Conversation

stress-tess
Member

This PR (Closes #848):

  • Adds a limit to the length of file/dset names logged

I wasn't sure what n should be. I made it 1000 characters (so lengths greater than 2000 are truncated) because that seemed reasonable to me. I'm certainly open to suggestions for different values.
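The truncation described above can be sketched as follows. This is a hypothetical helper, not the actual Arkouda implementation; the name `truncate_for_log` and the `n=1000` default are assumptions based on the PR description:

```python
def truncate_for_log(s: str, n: int = 1000) -> str:
    """Keep only the first and last n characters of a long string,
    joining them with '...' so logged lines stay bounded."""
    if len(s) <= 2 * n:
        return s  # short enough to log in full
    return s[:n] + "..." + s[-n:]

# Strings longer than 2*n collapse to 2*n + 3 characters.
long_name = "A" * 5000
print(len(truncate_for_log(long_name)))  # -> 2003
```

With this shape, any log line built from file/dset names is capped regardless of how many characters the caller passes in.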

If I could get @Ethan-DeBandi99's help testing that this works as intended, that would be awesome!

@Ethan-DeBandi99
Contributor

@pierce314159 - confirming that this is a functional solution. I threw together a quick Jupyter Notebook and pulled down your PR.

I updated this line of ak.pdarrayIO.read (ln 192) to add a [ before the {json.dumps(filenames)}. This throws off the JSON read on the Chapel side, because you get [[ with only a single closing bracket.

rep_msg = generic_msg(
    cmd=cmd,
    args=f"{strictTypes} {len(datasets)} {len(filenames)} {allow_errors} {calc_string_offsets} "
    f"{json.dumps(datasets)} | [{json.dumps(filenames)}",
)

The output I received is:

RuntimeError: Could not decode json filenames via tempfile (100 files: [["/Users/ethandebandi/Documents/test_data/HB6BAV3OYBLLRZRBQCSX027USLMLFDP4XDXV8UO917AAWYXUSJBO6T7T9P1SK8I766JZTNVKPCOO9ZRYEY974BDU6LGDK74R1AHL_0", "/Users/ethandebandi/Documents/test_data/EFKR25MM8U4JYSI79N6A4RZLJNXCAJCI2Z892F3EE2B13HXHRN9B3BELABFESKLYH239KZICTRILDTLIBRKFLGFOGOLTFP6XLRP4_1", "/Users/ethandebandi/Documents/test_data/SA79MOC7DQWBHT8J0VUBM0IL2KFQVE40Y4RUDSNSOAAFR8RAW4SYXQN77YOBU14A7XVXVH4TRFCIC7BRZC8EEGQ1BAV1DX5DLAMH_2", "/Users/ethandebandi/Documents/test_data/AIPD7Q1V3GDDQVXLGQ88YUTZ5P6K09WMB2GVAJBFRMHF4MX8I29WP3YBD1O8LRKYFC89ITTF9CPWHYU7D38ZKYQRVRBWH67BGC6H_3", "/Users/ethandebandi/Documents/test_data/53OTJ95N16NKTP29719BL1K9F7BQR5VVCOBYQHVNO1VZWNDBVLYA3NSMV0SN8PK2HUPBM9F9QDZ8T0MWCV7XUUM7K3ZUY09VO57S_4", "/Users/ethandebandi/Documents/test_data/FK1NPW93PHEECVQYAQ6BLH0JTXPK6W5KJEZU8V459ED67S8BAD9BHZOK3AY17DUJ8M4GZJPTFJ492WBVC7JKS4U26JJYCBDQOC0R_5", "/Users/ethandebandi/Documents/test_data/X5W7UULKEQ13HYII3NFCWXNUM9PTC0NZQ62UHOMPN7CUULF4SUPGP84QE692SRJBBHEKCQU8OKOE52KDZ...ts/test_data/SB7BEWNJDNUS7VLQ0NUZY55NYRK23LA2RFXKTP2KU7U0KOUUIQZTNDJKTLBIFS1CEDHA3SO7ERHY20Y31A0IVDP30P8P0NP1X15C_93", "/Users/ethandebandi/Documents/test_data/6SA39Y3GCZ8LU0158ILO2I1MMVAAHMPPV2OANESBUSBLGIU1VC1F6DABMF6MVNUBAG8BEHXYA38NBPZUGLTVZKZXTPD9WUDYCRAX_94", "/Users/ethandebandi/Documents/test_data/OZYC1Y34K1G218OL86TIJ3IB8542FNSWM1GGRGW4864AAF5H864Q54MKOE8VCXP4K57TPS7YBQPYSNTO15AJHDL3XAMZFE5P3QG6_95", "/Users/ethandebandi/Documents/test_data/KVJH5JU1UK8G54Z0SD9FD6K7BQ3AD6U2C7B2S7H6SPIBDZK791SGP977JL29Z3982YBN6PP0DYC4PCPF7R1DOXXY6WNTPLB9W1BR_96", "/Users/ethandebandi/Documents/test_data/20YYRROCO70ZJNT2KIANUQAR4CW8238FVXNRC7CA2B6NOUELQKB9UPECSK7NUIN44SXXWKA7BLIXKIDB0AS8GIXWGL43B17ELMJH_97", "/Users/ethandebandi/Documents/test_data/X9LSGSSAO3USZQPCLRHB6S7U29RV9EBM5S1J4PJ74SBTLKMJTNGZ9AWGUVP004EFF7JO33UCTDR5PI0E68B03SGPZD8J01MPHKGB_98", 
"/Users/ethandebandi/Documents/test_data/RWWAAEOVFOPSDM0TKTKEIE7AO0TFSINUYU5OSC6X4IFRAY7PK6BOLWTRUSDKK97Y8F777T6QE104SEADM7PRR1428Y12LRK5GEBU_99"])

which does include the ... break and the first/last 1000 characters.
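The decode failure itself is just standard JSON strictness: the injected `[` yields a doubled opening bracket with only one close, which no JSON parser will accept. A minimal Python illustration (the filename strings here are placeholders, not the actual test files):

```python
import json

# Mimics the payload shape produced by the injected "[":
# two opening brackets but only one closing bracket.
malformed = '[["file_0", "file_1"]'
try:
    json.loads(malformed)
except json.JSONDecodeError as err:
    print("decode failed:", err)
```

The Chapel server hits the equivalent failure when it tries to decode the filenames, which is what surfaces as the `RuntimeError` above.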

In case anyone wants to double check, here is the code I used:

import arkouda as ak
import h5py
import random
import string

ak.connect()

# make a bunch of files with long random names
file_list = []
base_dir = "<path_to_dir>/test_data/"
chars = string.ascii_uppercase + string.digits
for i in range(100):
    fname = ''.join(random.choice(chars) for _ in range(100)) + f"_{i}"
    file_list.append(base_dir + fname)
    # create an empty HDF5 file
    with h5py.File(base_dir + fname, 'w'):
        pass

# try to read all the files - this is the line that should produce the error
ak.read(file_list)

ak.shutdown()

Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 left a comment


This looks good. As per my previous comment, it functions as intended. We may want to reduce the character limit somewhat, but maybe that should be a setting.

@Ethan-DeBandi99 Ethan-DeBandi99 merged commit 6ea2918 into Bears-R-Us:master Jul 11, 2022
@stress-tess stress-tess deleted the 848_Limit_Len_Log_Lines branch July 20, 2022 16:32
Successfully merging this pull request may close these issues.

Limit Length of Logging Lines
3 participants