Here's the smallest example I could get that reproduces this issue:
```python
import numpy as np
from thinc.api import get_current_ops, use_ops

# Define some test data with uint64 values
data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]

# Start using NumPy ops
with use_ops("numpy"):
    ops = get_current_ops()

    # Call ops.asarray with an explicit dtype
    thinc_array = ops.asarray(data, dtype="uint64")

    # Use np.asarray directly
    numpy_array = np.asarray(data, dtype="uint64")

    # Check that the two arrays are the same
    # This line raises an error
    np.testing.assert_array_equal(thinc_array, numpy_array)
```
This bug can be tricky to reproduce since it can disappear if you don't have the right kind of numbers in your data. For example, if you remove the first row from the sample data given above, the error disappears. This is because the value 11648197037703959513 from the first row causes NumPy to use the float64 type when np.array is called without a dtype. As a result of float imprecision, this ends up modifying the exact values of the large integers before they're converted back to integers in NumpyOps.asarray.
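The precision loss can be seen with NumPy alone, without Thinc. The snippet below is only an illustrative sketch: the variable names are mine, and the two-step conversion is an approximation of what I think happens inside NumpyOps.asarray, not a copy of its code.

```python
import numpy as np

data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]

# Without a dtype, NumPy has to infer one from the values. The first row
# mixes small ints with a value that only fits in uint64.
inferred = np.array(data)
print(inferred.dtype)

# With only the second row, every value fits in a signed 64-bit integer.
print(np.array(data[1:]).dtype)

# Converting the inferred array to uint64 afterwards cannot restore digits
# that were already rounded away.
round_trip = inferred.astype("uint64")

# Passing the dtype up front converts each Python int directly.
direct = np.asarray(data, dtype="uint64")

print(round_trip.tolist() == data)
print(direct.tolist() == data)
```

The large values change because float64 has a 53-bit significand, so integers above 2**53 are only representable at coarser and coarser spacing.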
If NumpyOps.asarray is given a dtype, I think that dtype should also be used when converting lists of values into NumPy arrays 🙂
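A rough sketch of what I mean is below. The function `asarray_with_dtype` is a hypothetical stand-in that I made up for illustration. It is not Thinc's actual implementation, and the real method likely handles more input types than this.

```python
from typing import Any, Optional

import numpy as np


def asarray_with_dtype(data: Any, dtype: Optional[str] = None) -> np.ndarray:
    """Hypothetical conversion helper, not Thinc's actual code."""
    if isinstance(data, np.ndarray):
        # Existing arrays are only cast when a dtype is requested.
        return data if dtype is None else data.astype(dtype, copy=False)
    # Forward the dtype so NumPy converts each value directly instead of
    # inferring an intermediate float64 array first.
    return np.array(data, dtype=dtype)


data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]
result = asarray_with_dtype(data, dtype="uint64")
np.testing.assert_array_equal(result, np.asarray(data, dtype="uint64"))
```

When no dtype is given, this falls back to NumPy's own inference, so behaviour for callers that don't pass a dtype stays the same.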
How to reproduce the behaviour
I believe that the root of the issue is the combination of (i) Thinc calling np.array without passing a dtype in NumpyOps.asarray and (ii) NumPy's type promotion rules.
Your Environment