xr.corr produces incorrect output for complex arrays #7340

Closed
4 tasks done
mattragoza opened this issue Dec 1, 2022 · 4 comments · Fixed by #7392

@mattragoza

What happened?

I create a DataArray full of complex numbers, and I compute the correlation of the DataArray with itself.

What did you expect to happen?

The absolute value of the correlation coefficient should be equal to 1, up to numerical precision. However, this is not the case. The returned correlation coefficient is around 0.26, and it changes depending on the number of values in the array.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

array = xr.DataArray([
    -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j,
    -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j,
    0.00000000e+00+0.00000000e+00j, -2.42585590e-02+1.42052459e-02j,
    -5.53404148e-03+4.60188062e-03j, -4.68829482e-03+4.90179019e-03j,
    -7.02331258e-03+8.75908673e-03j, -1.31233383e-01+1.86572484e-01j,
    -4.05137401e-03+6.59972035e-03j, -4.20701822e-03+7.29813816e-03j,
    -3.56487231e-03+6.51759430e-03j, -3.68077200e-03+7.04388575e-03j,
    -8.16459981e-02+1.70084145e-01j, -5.11737898e-03+1.98164995e-02j,
    6.72772914e-04-7.28110367e-05j,  2.13957504e-03-1.82525995e-03j,
    1.60369835e-03-1.54029189e-03j,  8.77788719e-02-8.45568854e-02j,
    1.04277417e-01-9.38854749e-02j,  7.58465696e-03-6.07906563e-03j,
    8.00776452e-03-5.70470615e-03j,  8.36166252e-03-5.14978313e-03j,
    0.00000000e+00+0.00000000e+00j,  0.00000000e+00+0.00000000e+00j,
    0.00000000e+00+0.00000000e+00j,  7.26422461e-03+4.40382166e-04j,
    4.01364547e-03+1.09269127e-03j, -1.99069471e-01-1.20355081e-01j,
    1.56511579e-01+2.59839758e-01j,  9.14046953e-04+5.42262898e-03j,
    -8.37800782e-04+5.67555708e-03j, -3.36561822e-03+7.50108018e-03j,
    -4.22682090e-03+5.36279242e-03j,  5.95438564e-02-3.48209841e-02j,
    -6.77184281e-03+2.10711488e-03j, -4.84293269e-03+3.78698499e-04j,
    -5.13547723e-03-6.86765713e-04j,  4.48392070e-01+1.54568226e-01j,
    -3.17412047e-01-2.35431216e-01j, -2.95731737e-03-3.39078899e-03j,
    -1.95111443e-03-3.77545168e-03j, -2.82719903e-04-1.61393513e-03j,
    7.20241467e-04-1.73515565e-03j, -1.96675563e-01-4.42259734e-02j,
    0.00000000e+00+0.00000000e+00j,  4.84813452e-03+7.60742077e-03j,
    6.31707602e-03+1.51808252e-02j,  2.99277774e-03+1.18667410e-02j,
    5.64640060e-04+1.58372118e-02j, -1.74137347e-03+1.70383706e-02j,
    -5.91398408e-03+2.30008930e-02j, -7.12027831e-03+1.87732435e-02j,
    9.30919156e-02-1.65255887e-01j, -2.09716130e-01+2.30490479e-01j,
    -1.80115101e-02+1.37248240e-02j, -1.85851718e-02+9.23420957e-03j,
    -1.88459965e-02+5.12854226e-03j,  1.09175874e+00-9.17875627e-02j,
    -1.63766142e-02-5.32431671e-03j, -1.24749963e-02-9.63714407e-03j,
    -7.58657222e-03-1.27728267e-02j, -1.99052439e-03-1.35879033e-02j,
    -5.70595470e-01+2.27742231e+00j,  1.24516564e-02-1.21867738e-02j,
    1.82174257e-02-8.67884733e-03j,  2.27204879e-02-3.77097224e-03j,
    2.66143091e-02+2.68683768e-03j,  1.06983372e+00+3.19301893e-01j,
    -6.86033738e-01-4.72910865e-01j,  3.00291320e-02+3.10297521e-02j,
    2.22880055e-02+3.45332319e-02j,  1.61724440e-02+4.04122368e-02j,
    9.78881043e-03+4.96053678e-02j, -6.51085120e-03+5.27227722e-02j,
    -1.76752380e-02+5.26095806e-02j, -3.81856382e-02+6.41735764e-02j,
    0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j
])
r = np.abs(xr.corr(array, array).item())
assert np.isclose(r, 1.0), r

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

The exact output I get for the self-contained example above is:


AssertionError                            Traceback (most recent call last)
Cell In [44], line 46
      3 array = xr.DataArray([
      4     -4.21904583e-03-1.53714478e-03j, -4.24663044e-03-1.12832926e-03j,
      5     -4.26968892e-03-4.87451439e-04j, -6.99917538e-03+3.07376860e-04j,
   (...)
     43     0.00000000e+00+0.00000000e+00j, -4.32481463e-02+3.88706950e-02j
     44 ])
     45 r = np.abs(xr.corr(array, array).item())
---> 46 assert np.isclose(r, 1.0), r

AssertionError: 0.2664911388214005


Anything else we need to know?

Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]

Xarray version is '2022.9.0'

Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-193.28.1.el8_2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.11.0
distributed: None
matplotlib: 3.6.2
cartopy: None
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: None
IPython: 8.5.0
sphinx: None


</details>
@mattragoza added the "bug" and "needs triage" labels on Dec 1, 2022
@max-sixty
Collaborator

Thanks @mattragoza . Does numpy compute this correctly?

@mattragoza
Author

Yes. If I use a np.array and call np.corrcoef, I get a 2x2 correlation matrix full of ones.
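
For readers who want to reproduce this check without the full data, here is a minimal sketch of the NumPy comparison. The four-element array `z` is a made-up stand-in, not the values from the report.

```python
import numpy as np

# Hypothetical small complex sample (not the data from the report).
z = np.array([1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.1 + 4j])

# np.corrcoef treats each argument as one variable and returns a 2x2 matrix.
c = np.corrcoef(z, z)

# For an array correlated with itself, every entry should have magnitude 1.
print(np.abs(c))
```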

@mattragoza
Author

I'm now noticing that scipy.stats.pearsonr returns the same value as xr.corr.

@mattragoza
Author

The problem is in https://github.com/pydata/xarray/blob/main/xarray/core/computation.py#L1402-L1404. You need to conjugate one of the arrays before computing the dot product.
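
To illustrate why the conjugate matters, here is a rough NumPy-only sketch. It is not xarray's actual code: the helper names `corr_no_conj` and `corr_conj` and the sample array `z` are hypothetical, and the formulas are a simplified Pearson-style calculation meant only to show the effect.

```python
import numpy as np

def corr_no_conj(a, b):
    """Pearson-style ratio with no conjugation (hypothetical, shows the problem)."""
    da, db = a - a.mean(), b - b.mean()
    cov = (da * db).mean()  # plain product: phases add instead of cancelling
    return cov / (a.std() * b.std())

def corr_conj(a, b):
    """Same ratio, but with one operand conjugated (hypothetical helper)."""
    da, db = a - a.mean(), b - b.mean()
    cov = (da * np.conj(db)).mean()  # da * conj(da) == |da|**2 when b is a
    return cov / (a.std() * b.std())

# Hypothetical small complex sample (not the data from the report).
z = np.array([1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.1 + 4j])

print(abs(corr_no_conj(z, z)))  # falls below 1 for this sample
print(abs(corr_conj(z, z)))     # 1 up to floating-point error
```

`np.std` already takes absolute values for complex input, so only the covariance term needs the conjugate in this sketch.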

@dcherian removed the "needs triage" label on Jan 15, 2023