Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] RooDataSet.from_numpy gives wrong result when the input arrays are not c-contiguous #13605

Closed
1 task done
AlkaidCheng opened this issue Sep 5, 2023 · 2 comments · Fixed by #13609
Closed
1 task done

Comments

@AlkaidCheng
Copy link

AlkaidCheng commented Sep 5, 2023

Check duplicate issues.

  • Checked for duplicates

Description

If the input numpy arrays to RooDataSet.from_numpy are not c-contiguous, then the resulting dataset will have messed up values.

Reproducer

import itertools
import numpy as np

import ROOT

obs_1 = ROOT.RooRealVar('obs_1' , 'obs_1', 70, 70, 190)
obs_1.setBins(30)
obs_2 = ROOT.RooRealVar('obs_2' , 'obs_2', 100, 100, 180)
obs_2.setBins(80)

val_obs_1 = []
val_obs_2 = []
for i in range(obs_1.numBins()):
    obs_1.setBin(i)
    val_obs_1.append(obs_1.getVal())
for i in range(obs_2.numBins()):
    obs_2.setBin(i)
    val_obs_2.append(obs_2.getVal())    

# so that all combination of values are in the dataset
val_cart_product = np.array(list(itertools.product(val_obs_1, val_obs_2)))
data = {
    'obs_1': val_cart_product[:, 0],
    'obs_2': val_cart_product[:, 1]
}

dataset = ROOT.RooDataSet.from_numpy(data, ROOT.RooArgSet(obs_1, obs_2))

This gives (memory dependent):

>>> dataset.to_numpy()
{'obs_1': array([ 72. , 100.5,  72. , ..., 178.5, 128. , 179.5]),
 'obs_2': array([100.5,  72. , 101.5, ..., 128. , 179.5, 132. ])}

The expected output should be

>>> dataset.to_numpy()
{'obs_1': array([ 72.,  72.,  72., ..., 188., 188., 188.]),
 'obs_2': array([100.5, 101.5, 102.5, ..., 177.5, 178.5, 179.5])}

This happens because the arrays data['obs_1'] and data['obs_2'] are not c-contiguous:

>>> data['obs_1'].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

If we make the arrays c-contiguous, the resulting dataset is correct:

for key, arr in data.items():
    if not arr.flags.c_contiguous:
        data[key] = np.ascontiguousarray(arr)
dataset = ROOT.RooDataSet.from_numpy(data, ROOT.RooArgSet(obs_1, obs_2))

this gives

>>> dataset.to_numpy()
{'obs_1': array([ 72.,  72.,  72., ..., 188., 188., 188.]),
 'obs_2': array([100.5, 101.5, 102.5, ..., 177.5, 178.5, 179.5])}

This issue arises because the copying of numpy data to std::vector assumes the numpy array to be c-contiguous:
https://github.com/root-project/root/blob/master/bindings/pyroot/pythonizations/python/ROOT/_pythonization/_roofit/_roodataset.py#L129-L132

ROOT version

ROOT 6.26/08+

Installation method

lxplus

Operating system

Linux

Additional context

No response

@guitargeek
Copy link
Contributor

Thanks a lot for reporting this! It will be fixed in the next ROOT releases.

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants