
Closes #848: Limit Length of File Lines Logged #1563

Merged

Conversation

stress-tess
Member

This PR (Closes #848):

  • Adds a limit to the length of file/dset names logged

I wasn't sure what n should be. I made it 1000 characters (so lengths greater than 2000 are truncated) because that seemed reasonable to me. I'm certainly open to suggestions for different values.
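The truncation described above can be sketched as follows. This is a hypothetical helper, not the actual Arkouda implementation; the name `truncate_for_log` and the `n=1000` default are assumptions based on the PR description:

```python
def truncate_for_log(s: str, n: int = 1000) -> str:
    """Keep only the first and last n characters of a long string,
    joining them with '...' so logged lines stay bounded."""
    if len(s) <= 2 * n:
        return s  # short enough to log in full
    return s[:n] + "..." + s[-n:]

# Strings longer than 2*n collapse to 2*n + 3 characters.
long_name = "A" * 5000
print(len(truncate_for_log(long_name)))  # -> 2003
```

With this shape, any log line built from file/dset names is capped regardless of how many characters the caller passes in.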

If I could get @Ethan-DeBandi99's help testing that this works as intended, that would be awesome!

@Ethan-DeBandi99
Contributor

@pierce314159 - confirming that this is a functional solution. I threw together a quick Jupyter Notebook and pulled down your PR.

I updated this line of ak.pdarrayIO.read (ln 192) to add a [ before the {json.dumps(filenames)}. This throws off the JSON read on the Chapel side, because you get [[ with only a single closing bracket.

rep_msg = generic_msg(
    cmd=cmd,
    args=f"{strictTypes} {len(datasets)} {len(filenames)} {allow_errors} {calc_string_offsets} "
    f"{json.dumps(datasets)} | [{json.dumps(filenames)}",
)

The output I received is:

RuntimeError: Could not decode json filenames via tempfile (100 files: [["/Users/ethandebandi/Documents/test_data/HB6BAV3OYBLLRZRBQCSX027USLMLFDP4XDXV8UO917AAWYXUSJBO6T7T9P1SK8I766JZTNVKPCOO9ZRYEY974BDU6LGDK74R1AHL_0", "/Users/ethandebandi/Documents/test_data/EFKR25MM8U4JYSI79N6A4RZLJNXCAJCI2Z892F3EE2B13HXHRN9B3BELABFESKLYH239KZICTRILDTLIBRKFLGFOGOLTFP6XLRP4_1", "/Users/ethandebandi/Documents/test_data/SA79MOC7DQWBHT8J0VUBM0IL2KFQVE40Y4RUDSNSOAAFR8RAW4SYXQN77YOBU14A7XVXVH4TRFCIC7BRZC8EEGQ1BAV1DX5DLAMH_2", "/Users/ethandebandi/Documents/test_data/AIPD7Q1V3GDDQVXLGQ88YUTZ5P6K09WMB2GVAJBFRMHF4MX8I29WP3YBD1O8LRKYFC89ITTF9CPWHYU7D38ZKYQRVRBWH67BGC6H_3", "/Users/ethandebandi/Documents/test_data/53OTJ95N16NKTP29719BL1K9F7BQR5VVCOBYQHVNO1VZWNDBVLYA3NSMV0SN8PK2HUPBM9F9QDZ8T0MWCV7XUUM7K3ZUY09VO57S_4", "/Users/ethandebandi/Documents/test_data/FK1NPW93PHEECVQYAQ6BLH0JTXPK6W5KJEZU8V459ED67S8BAD9BHZOK3AY17DUJ8M4GZJPTFJ492WBVC7JKS4U26JJYCBDQOC0R_5", "/Users/ethandebandi/Documents/test_data/X5W7UULKEQ13HYII3NFCWXNUM9PTC0NZQ62UHOMPN7CUULF4SUPGP84QE692SRJBBHEKCQU8OKOE52KDZ...ts/test_data/SB7BEWNJDNUS7VLQ0NUZY55NYRK23LA2RFXKTP2KU7U0KOUUIQZTNDJKTLBIFS1CEDHA3SO7ERHY20Y31A0IVDP30P8P0NP1X15C_93", "/Users/ethandebandi/Documents/test_data/6SA39Y3GCZ8LU0158ILO2I1MMVAAHMPPV2OANESBUSBLGIU1VC1F6DABMF6MVNUBAG8BEHXYA38NBPZUGLTVZKZXTPD9WUDYCRAX_94", "/Users/ethandebandi/Documents/test_data/OZYC1Y34K1G218OL86TIJ3IB8542FNSWM1GGRGW4864AAF5H864Q54MKOE8VCXP4K57TPS7YBQPYSNTO15AJHDL3XAMZFE5P3QG6_95", "/Users/ethandebandi/Documents/test_data/KVJH5JU1UK8G54Z0SD9FD6K7BQ3AD6U2C7B2S7H6SPIBDZK791SGP977JL29Z3982YBN6PP0DYC4PCPF7R1DOXXY6WNTPLB9W1BR_96", "/Users/ethandebandi/Documents/test_data/20YYRROCO70ZJNT2KIANUQAR4CW8238FVXNRC7CA2B6NOUELQKB9UPECSK7NUIN44SXXWKA7BLIXKIDB0AS8GIXWGL43B17ELMJH_97", "/Users/ethandebandi/Documents/test_data/X9LSGSSAO3USZQPCLRHB6S7U29RV9EBM5S1J4PJ74SBTLKMJTNGZ9AWGUVP004EFF7JO33UCTDR5PI0E68B03SGPZD8J01MPHKGB_98", 
"/Users/ethandebandi/Documents/test_data/RWWAAEOVFOPSDM0TKTKEIE7AO0TFSINUYU5OSC6X4IFRAY7PK6BOLWTRUSDKK97Y8F777T6QE104SEADM7PRR1428Y12LRK5GEBU_99"])

which does include the ... break and the first/last 1000 characters.
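The decode failure itself is just standard JSON strictness: the injected `[` yields a doubled opening bracket with only one close, which no JSON parser will accept. A minimal Python illustration (the filename strings here are placeholders, not the actual test files):

```python
import json

# Mimics the payload shape produced by the injected "[":
# two opening brackets but only one closing bracket.
malformed = '[["file_0", "file_1"]'
try:
    json.loads(malformed)
except json.JSONDecodeError as err:
    print("decode failed:", err)
```

The Chapel server hits the equivalent failure when it tries to decode the filenames, which is what surfaces as the `RuntimeError` above.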

In case anyone wants to double check, here is the code I used:

import arkouda as ak
import h5py
import random
import string

ak.connect()

# make a bunch of files with long random names
file_list = []
base_dir = "<path_to_dir>/test_data/"
chars = string.ascii_uppercase + string.digits
for i in range(100):
    fname = ''.join(random.choice(chars) for _ in range(100)) + f"_{i}"
    file_list.append(base_dir + fname)
    # create an empty HDF5 file
    with h5py.File(base_dir + fname, 'w'):
        pass

# try to read all the files - this is the line that should produce the error
ak.read(file_list)

ak.shutdown()

Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 left a comment


This looks good. As per my previous comment, it functions as intended. We may want to reduce the character limit somewhat, but maybe that should be a setting.

@Ethan-DeBandi99 Ethan-DeBandi99 merged commit 6ea2918 into Bears-R-Us:master Jul 11, 2022
@stress-tess stress-tess deleted the 848_Limit_Len_Log_Lines branch July 20, 2022 16:32
Successfully merging this pull request may close these issues.

Limit Length of Logging Lines
3 participants