[PyROOT][DF] Improve RDataFrame.AsNumpy performance #6496

Merged
merged 2 commits into from
Oct 1, 2020

@stwunsch stwunsch commented Sep 29, 2020

  • Fixes a warning raised by newer Python versions (first commit)
  • Adds a test that we actually adopt the memory of the C++ objects, and fixes the adoption, which was broken when using numpy.array

We should first merge the respective benchmarks in rootbench. That PR is ongoing here: root-project/rootbench#195

* Fixed broken memory adoption formerly triggered by numpy.array.
  Switched to the numpy.asarray adoption mechanism.
* Added tests confirming the adoption of the memory from the C++ objects
@stwunsch stwunsch requested a review from etejedor September 29, 2020 09:06
@stwunsch stwunsch self-assigned this Sep 29, 2020
@phsft-bot
Collaborator

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos7-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac1015/cxx17, windows10/cxx14

@phsft-bot
Collaborator

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2020-09-29T09:12:52.793Z] LINK : fatal error LNK1104: cannot open file 'C:\build\workspace\root-pullrequests-build\build\bin\libCore.dll' [C:\build\workspace\root-pullrequests-build\build\core\Core.vcxproj]

@phsft-bot
Collaborator

Build failed on ROOT-performance-centos7-multicore/default.
Running on olhswep22.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@@ -76,7 +76,7 @@ def RDataFrameAsNumpy(df, columns=None, exclude=None):
     for column in columns:
         cpp_reference = result_ptrs[column].GetValue()
         if hasattr(cpp_reference, "__array_interface__"):
-            tmp = numpy.array(cpp_reference) # This adopts the memory of the C++ object.
+            tmp = numpy.asarray(cpp_reference) # This adopts the memory of the C++ object.
Contributor

Looks good, @stwunsch! So now only asarray adopts, and array copies?

Contributor Author

I checked the adoption when we put this into legacy PyROOT, but we never had a test for it. For some reason, the numpy.array constructor no longer seems to adopt the memory. I'm pretty sure it did before! However, numpy.asarray does, and we now have a test that checks this.
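
For illustration, the copy-versus-adopt difference can be reproduced without ROOT. The following is only a minimal sketch: CppLikeBuffer is a hypothetical stand-in for a C++ object that exposes __array_interface__, backed here by a ctypes array rather than real C++ storage. It relies on numpy.array copying by default and numpy.asarray avoiding a copy when none is needed.

```python
import ctypes
import numpy


class CppLikeBuffer:
    """Hypothetical stand-in for a C++ object that exposes __array_interface__."""

    def __init__(self, values):
        # Memory owned outside of NumPy, playing the role of the C++ storage.
        self._storage = (ctypes.c_double * len(values))(*values)

    @property
    def __array_interface__(self):
        return {
            "shape": (len(self._storage),),
            "typestr": numpy.dtype(numpy.float64).str,
            # (address of the first element, read-only flag)
            "data": (ctypes.addressof(self._storage), False),
            "version": 3,
        }


buf = CppLikeBuffer([1.0, 2.0, 3.0])
copied = numpy.array(buf)     # default copy=True: NumPy allocates a new buffer
adopted = numpy.asarray(buf)  # no copy needed: a view on the existing buffer

# Write through the original storage and see which array notices.
buf._storage[0] = 42.0
print(copied[0], adopted[0])
```

In this sketch only the array created with numpy.asarray sees the write to the underlying storage, which is the behaviour the new test is meant to pin down for AsNumpy.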

df = ROOT.ROOT.RDataFrame(1).Define("x", "1.0")
npy = df.AsNumpy(["x"])
pyarr = npy["x"]
cpparr = pyarr.result_ptr.GetValue()
Contributor

Is this result_ptr the RDataFrame result pointer that you store?

Contributor Author

yep, exactly!
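
As a rough sketch of why storing the owner on the array matters: an array that adopts foreign memory needs something to keep that memory alive. One plausible pattern, not necessarily the exact PyROOT implementation, is an ndarray subclass that carries a reference to the owning object. Owner and AdoptedArray below are hypothetical names, and a ctypes array stands in for the C++ data.

```python
import ctypes
import gc
import weakref

import numpy


class Owner:
    """Hypothetical stand-in for the result pointer that owns the data."""

    def __init__(self, values):
        self.storage = (ctypes.c_double * len(values))(*values)


class AdoptedArray(numpy.ndarray):
    """ndarray view that keeps a reference to the object owning its memory."""

    def __new__(cls, owner):
        # View the owner's memory without copying, then attach the owner.
        view = numpy.ctypeslib.as_array(owner.storage).view(cls)
        view.result_ptr = owner
        return view

    def __array_finalize__(self, obj):
        # Propagate the owner to slices and other views derived from this array.
        self.result_ptr = getattr(obj, "result_ptr", None)


owner = Owner([1.0, 2.0, 3.0])
owner_ref = weakref.ref(owner)
arr = AdoptedArray(owner)

# Dropping the last direct name does not destroy the owner,
# because the array still refers to it through result_ptr.
del owner
gc.collect()
tail = arr[1:]
print(owner_ref() is not None, tail.result_ptr is arr.result_ptr)
```

With this layout the test snippet above can reach the underlying object through pyarr.result_ptr and compare its data against the array's buffer.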

@stwunsch stwunsch merged commit 05ba9fc into root-project:master Oct 1, 2020