Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NumPy asarray op does not respect dtype for lists #896

Closed
connorbrinton opened this issue Aug 9, 2023 · 1 comment · Fixed by #897
Closed

NumPy asarray op does not respect dtype for lists #896

connorbrinton opened this issue Aug 9, 2023 · 1 comment · Fixed by #897
Labels
bug Bugs and behaviour differing from documentation feat / ops Backends and maths

Comments

@connorbrinton
Copy link
Contributor

How to reproduce the behaviour

Here's the smallest example I could get that reproduces this issue:

import numpy as np
from thinc.api import get_current_ops, use_ops

# Define some test data with uint64 values
data = [[15, 16, 11648197037703959513], [22, 23, 4867388482626701284]]

# Start using NumPy ops
with use_ops("numpy"):
    ops = get_current_ops()

    # Call ops.asarray with an explicit dtype
    thinc_array = ops.asarray(data, dtype="uint64")

    # Use np.asarray directly
    numpy_array = np.asarray(data, dtype="uint64")

    # Check that the two arrays are the same
    # This line raises an error
    np.testing.assert_array_equal(thinc_array, numpy_array)

I believe that the root of the issue is that (i) Thinc calls np.array without passing a dtype in NumpyOps.asarray and (ii) NumPy's promotion rules.

This bug can be tricky to reproduce since it can disappear if you don't have the right kind of numbers in your data. For example, if you remove the first row from the sample data given above, the error disappears. This is because the value 11648197037703959513 from the first row causes NumPy to use the float64 type when np.array is called without a dtype. As a result of float imprecision, this ends up modifying the exact values of the large integers before they're converted back to integers in NumpyOps.asarray.

If NumpyOps.asarray is given a dtype, I think that dtype should also be used when converting lists of values into NumPy arrays 🙂

Your Environment

  • Operating System: macOS Ventura 13.5
  • Python Version Used: 3.9.16
  • Thinc Version Used: 8.1.10
  • Environment Information: M1 Mac, Poetry virtual environment
@connorbrinton connorbrinton changed the title NumPy asarray ops does not respect dtype for lists NumPy asarray op does not respect dtype for lists Aug 9, 2023
@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / ops Backends and maths labels Aug 10, 2023
@adrianeboyd
Copy link
Contributor

Thanks for the report, that's definitely a bug.

We've run into related problems before, just with numpy following the changes in v1.24: numpy/numpy#22733

@adrianeboyd adrianeboyd linked a pull request Aug 10, 2023 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / ops Backends and maths
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants