[Python] Processes killed and semaphore objects leaked when reading pandas data #28936
Comments
Weston Pace / @westonpace: Second, you should determine how much memory you have available. The Linux command "free -h" can be used to get this information. To convert from Pandas safely you will probably need around double the amount of memory required to store the dataframe. If you do not have this much memory, you can convert the table in parts.
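(For illustration, not part of the original comment: a minimal Python sketch of the memory check described above. The use of psutil is an assumption; the thread itself only mentions the shell command "free -h".)

import pandas as pd
import psutil

def enough_memory_for_conversion(df: pd.DataFrame) -> bool:
    # Approximate in-memory size of the dataframe, in bytes.
    df_bytes = df.memory_usage(deep=True).sum()
    # Memory currently available on the machine, per psutil.
    available = psutil.virtual_memory().available
    # Per the advice above, budget roughly double the dataframe's size
    # for a safe pa.Table.from_pandas conversion.
    return available > 2 * df_bytes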
Koyomi Akaguro: Regarding converting the table in parts, do you mean splitting the dataframe, converting each part to a pa.Table, and then combining them?
Weston Pace / @westonpace:
Yes, but you will need to make sure to delete the old parts of the dataframe once they are no longer needed. For example...

import pyarrow as pa

# iloc's end index is exclusive, so the second slice starts exactly
# where the first one ends and no row is dropped.
df_1 = df.iloc[:1000000, :]
df_2 = df.iloc[1000000:, :]
del df  # release the original dataframe before converting

table_1 = pa.Table.from_pandas(df_1)
del df_1
table_2 = pa.Table.from_pandas(df_2)
del df_2
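(For illustration, not part of the comment above: once both parts have been converted, they can be combined with pyarrow's concat_tables, which covers the "then combine" step asked about earlier.)

table = pa.concat_tables([table_1, table_2])  # schemas match because both parts came from the same dataframe
del table_1
del table_2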
When I run
pa.Table.from_pandas(df)
for a >1 GB dataframe, it reports:
Killed: 9
../anaconda3/envs/py38/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
Above all, how can I remove this message?
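(For illustration: a hypothetical reproduction of the report above. The actual dataframe was not shared, so the data below is only a stand-in of a similar size.)

import numpy as np
import pandas as pd
import pyarrow as pa

# Roughly 1.6 GB of float64 data as a stand-in for the reporter's dataframe.
df = pd.DataFrame(np.random.rand(20_000_000, 10))

# On a machine without enough free memory, this call is where the
# process is killed and the resource_tracker warning appears.
table = pa.Table.from_pandas(df)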
Environment:
OS name and version: macOS 11.4
Python version: 3.8.10
Pyarrow version: 4.0.1
Reporter: Koyomi Akaguro
Assignee: Weston Pace / @westonpace
Related issues:
Note: This issue was originally created as ARROW-13254. Please see the migration documentation for further details.