NumPy asarray op does not respect dtype for lists #896

connorbrinton · 2023-08-09T21:45:36Z

How to reproduce the behaviour

Here's the smallest example I could get that reproduces this issue:

import numpy as np
from thinc.api import get_current_ops, use_ops

# Define some test data with uint64 values
data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]

# Start using NumPy ops
with use_ops("numpy"):
    ops = get_current_ops()

    # Call ops.asarray with an explicit dtype
    thinc_array = ops.asarray(data, dtype="uint64")

    # Use np.asarray directly
    numpy_array = np.asarray(data, dtype="uint64")

    # Check that the two arrays are the same
    # This line raises an error
    np.testing.assert_array_equal(thinc_array, numpy_array)

I believe the root of the issue is the combination of (i) NumpyOps.asarray calling np.array without passing a dtype and (ii) NumPy's type promotion rules.
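The promotion rule in (ii) can be checked directly with NumPy's own promotion helper. This is a small sketch showing that int64 and uint64 have no common integer type, so NumPy promotes the pair to float64, and that the large value from the example only fits in uint64:

```python
import numpy as np

# Signed and unsigned 64-bit integers have no common integer type,
# so NumPy's promotion rules choose float64 for the pair.
common = np.result_type(np.int64, np.uint64)
print(common)  # float64

# The large value from the example is above the int64 maximum
# but still within the uint64 range.
big = 11648197037703959513
print(big > np.iinfo(np.int64).max)    # True
print(big <= np.iinfo(np.uint64).max)  # True
```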

This bug can be tricky to reproduce because it only appears for certain values. For example, if you remove the first row from the sample data above, the error disappears. The value 11648197037703959513 in the first row exceeds the int64 maximum, and under NumPy's promotion rules this makes np.array fall back to float64 when it is called without a dtype. Because float64 cannot represent integers this large exactly, the values are rounded before NumpyOps.asarray converts them back to integers.
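The precision loss can be isolated without Thinc. The sketch below forces the same float64 intermediate explicitly (an assumption standing in for the inferred dtype) and compares it with building the array in uint64 from the start. It also checks that the second row on its own is inferred as a signed integer type, which is why dropping the first row hides the bug:

```python
import numpy as np

data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]

# Lossy path: go through float64 (53-bit significand), then cast back.
via_float = np.array(data, dtype=np.float64).astype(np.uint64)

# Passing the dtype during construction keeps every value exact.
exact = np.array(data, dtype=np.uint64)

print(via_float.tolist() == data)  # False
print(exact.tolist() == data)      # True

# Without the first row every value fits in int64, so NumPy infers an
# integer dtype and a later cast to uint64 does not change the values.
second_row = np.array(data[1:])
print(second_row.dtype)  # int64 on common 64-bit platforms
print(second_row.astype(np.uint64).tolist() == data[1:])  # True
```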

If NumpyOps.asarray is given a dtype, I think that dtype should also be used when converting lists of values into NumPy arrays 🙂
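A minimal sketch of the suggested behaviour, contrasting "infer, then cast" with "pass the dtype during conversion". Both function names are hypothetical illustrations, not Thinc's API, and neither is claimed to match the change made in the linked pull request:

```python
import numpy as np

data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]


def asarray_cast_after(data, dtype=None):
    """Hypothetical sketch of the reported behaviour (not Thinc's code):
    the dtype is inferred first and the result is cast afterwards."""
    arr = np.array(data)  # inferred dtype; float64 for the data above
    return arr if dtype is None else arr.astype(dtype)


def asarray_dtype_first(data, dtype=None):
    """Hypothetical sketch of the suggested fix: hand the requested dtype
    to NumPy while the list is being converted."""
    return np.array(data, dtype=dtype)


fixed = asarray_dtype_first(data, dtype="uint64")
print(fixed.dtype)             # uint64
print(fixed.tolist() == data)  # True
```

With the dtype applied during conversion, NumPy never creates an intermediate float64 array, so there is no rounding step to lose precision.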

Your Environment

Operating System: macOS Ventura 13.5
Python Version Used: 3.9.16
Thinc Version Used: 8.1.10
Environment Information: M1 Mac, Poetry virtual environment


adrianeboyd · 2023-08-10T06:46:36Z

Thanks for the report, that's definitely a bug.

We've run into related problems with NumPy before, following the changes in v1.24: numpy/numpy#22733

connorbrinton changed the title from "NumPy asarray ops does not respect dtype for lists" to "NumPy asarray op does not respect dtype for lists" Aug 9, 2023

adrianeboyd added the labels "bug" (Bugs and behaviour differing from documentation) and "feat / ops" (Backends and maths) Aug 10, 2023

adrianeboyd linked a pull request Aug 10, 2023 that will close this issue

Preserve values with dtype for NumpyOps/CupyOps.asarray #897

Merged


adrianeboyd closed this as completed in #897 Aug 11, 2023
