parquet file is opened unintentionally #114

Closed
dvirtz opened this issue Jan 24, 2024 · 10 comments · Fixed by #115

@dvirtz
Owner

dvirtz commented Jan 24, 2024

I have a possibly related issue. I too cannot view parquets in VSCode with the extension installed. I have a function I've created that's doing the following (Python 3.9 code)

`def save_df_as_parquet( df, fname ):
    half_floats = df.select_dtypes(include="float16")
    df[half_floats.columns] = half_floats.astype("float32")
    df.to_parquet( save_dir + fname + ".parquet", engine = 'pyarrow')
    #Is this needed? VSCode keeps trying to open the parquet
    return 0

_ = save_df_as_parquet(y1_test_pred_df, fname = "y1_test_pred_df")
`

I've tried many different versions of this, one where I don't return anything and don't save the function output to a value, but no matter what I try, every time this code runs, the parquet viewer extension pops up in the bottom right and says it's trying to open the file. I'm not interested in opening the file, I'm simply interested in writing a parquet file. For now I've disabled the extension.

Originally posted by @at672 in #112 (comment)

@dvirtz
Owner Author

dvirtz commented Jan 24, 2024

@at672 I tried running the following file but no file was opened:

import pandas as pd
import pyarrow as pa

def save_df_as_parquet( df: pd.DataFrame, fname: str ):
  half_floats = df.select_dtypes(include="float16")
  df[half_floats.columns] = half_floats.astype("float32")
  df.to_parquet( fname + ".parquet", engine = 'pyarrow')
  #Is this needed? VSCode keeps trying to open the parquet
  return 0

_ = save_df_as_parquet(pd.DataFrame(), fname = "y1_test_pred_df")

Can you create a minimal example that reproduces the behaviour you reported in the above comment?
You can attach it to this issue or upload it somewhere I can download it from.

@at672

at672 commented Jan 25, 2024

Hmm... I'm not able to reproduce it either when I isolated the function and imports like you did in a new file. I disabled and re-enabled the extension, and it's behaving as intended: I do not issue a command to view a parquet file, and the extension does not try to open up the parquet file.

However, I am encountering an issue similar to the original poster's in #112: I'm unable to open a parquet file (37 MB, to be precise, so not a very large file) when I ask the extension to open it. It says Assertion Failed: argument is undefined or null.

Happy to discuss this here or in the other thread.

image

@dvirtz
Owner Author

dvirtz commented Jan 25, 2024

Can you please upload the file here?

@at672

at672 commented Jan 25, 2024

I tried uploading it, but GitHub says it exceeds the file size limit. This is the most granular file I have. I could load it into pandas, slice the dataframe, and export a smaller file, but that defeats the purpose to a degree, because the file in question represents one day's worth of high-frequency trading data. In practice I would rarely need to view less than a day at a time.

When I have some time I can create a smaller file and upload.

image

@dvirtz
Owner Author

dvirtz commented Jan 26, 2024

Perhaps you could upload it to Google Drive or a similar service?

@at672

at672 commented Jan 26, 2024

Here is a zip folder containing the file. Please let me know when you have downloaded it so I can remove it from drive after.

https://drive.google.com/file/d/19YSgp4UwGf2I8nxeAKfT2DXvTvGk1-fQ/view?usp=drive_link

Edit: I might need to give you access, I'm not sure if the link automatically does or not. If so you can privately message me your email and I can give you access.

@dvirtz
Owner Author

dvirtz commented Jan 27, 2024

Thanks, I was able to download the file and reproduce the issue.

dvirtz self-assigned this Jan 28, 2024
dvirtz added a commit that referenced this issue Jan 28, 2024
VSCode only supports files up to 50MB so stop before resulting JSON gets to this size.
See microsoft/vscode#31078

Fixes #114, #74
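
The commit message above describes stopping before the generated JSON reaches the 50 MB file size that VSCode supports. A minimal Python sketch of that idea follows. It is not the extension's actual code: the function name, the one-JSON-object-per-line output, and the exact byte threshold are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Assumed threshold, taken from the "50MB" figure in the commit message.
# The exact value the extension uses is not shown in this thread.
MAX_BYTES = 50 * 1024 * 1024


def json_lines_within_limit(
    rows: Iterable[Dict[str, Any]], max_bytes: int = MAX_BYTES
) -> Iterator[str]:
    """Yield one JSON line per row, stopping before the total size exceeds max_bytes."""
    written = 0
    for row in rows:
        line = json.dumps(row)
        # Count the encoded size plus one byte for the trailing newline.
        size = len(line.encode("utf-8")) + 1
        if written + size > max_bytes:
            break  # The next row would push the output past the limit.
        written += size
        yield line
```

Because the check happens before each row is emitted, the output is truncated at a row boundary and every emitted line remains valid JSON.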
dvirtz pushed a commit that referenced this issue Jan 28, 2024
## [2.6.4](v2.6.3...v2.6.4) (2024-01-28)

### Bug Fixes

* opening large files ([6d2164f](6d2164f)), closes [#114](#114) [#74](#74)
@dvirtz
Owner Author

dvirtz commented Jan 28, 2024

🎉 This issue has been resolved in version 2.6.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

@at672

at672 commented Jan 29, 2024

Thank you. I am able to open it up now. One question though, when opening it up, it loads as a JSON. I have a screenshot below.

Is this the intended behavior? Or is there a way to view it in an Excel-style layout (column names in the top row, values in the rows below)?

image

@dvirtz
Owner Author

dvirtz commented Jan 29, 2024

Currently, only JSON is supported.
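
For readers who want a spreadsheet-style view in the meantime, one possible workaround outside the extension is to convert the JSON output to CSV. The sketch below assumes the JSON view holds one flat object per line, which is not confirmed in this thread, and `json_lines_to_csv` is a hypothetical helper, not part of the extension.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines: str) -> str:
    """Convert text with one JSON object per line into CSV with a header row."""
    records = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

    # Collect column names in the order they first appear.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty cells.
    writer.writerows(records)
    return buffer.getvalue()
```

The resulting text can be saved with a `.csv` extension and opened in a spreadsheet application. Nested values (lists or objects) are written using their Python string form, so this is only suitable for flat records.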
