
[dataloader] Multiple warnings printed when torch.as_tensor is applied to a read-only NumPy array #37581

Closed
vadimkantorov opened this issue Apr 30, 2020 · 22 comments
Assignees: mruberry
Labels: module: dataloader (Related to torch.utils.data.DataLoader and Sampler), module: numpy (Related to NumPy support, and also NumPy compatibility of our operators), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@vadimkantorov (Contributor) commented Apr 30, 2020

I get this warning printed every time I do:

```python
import scipy.io.wavfile
import torch

sample_rate_, signal = scipy.io.wavfile.read(audio_path)
signal = torch.as_tensor(signal)
```

```
UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.
```

Maybe it's related to the fact that this code gets executed within DataLoader worker threads? So I get one warning per thread, which is still nasty.

It would be nice to be able to suppress this warning through PyTorch itself, without having to clone the tensor. This situation is pretty common, and I'd suggest it shouldn't require figuring out how to suppress general Python warnings :) E.g., torch.as_tensor could take a flag like writable=True that would ensure the user is conscious of what's going on.

cc @ssnl

@pbelevich added the module: dataloader, module: numpy, and triage review labels on May 1, 2020
@mruberry self-assigned this on May 4, 2020
@mruberry (Collaborator) commented May 4, 2020

We're actually planning to update our warning code to make these warnings less intrusive, but a workaround is to mark the NumPy array as writeable before calling torch.as_tensor() (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.setflags.html).
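
For example, a minimal sketch of that workaround (note this assumes the array owns its memory):

```python
import numpy as np
import torch

arr = np.arange(10)
arr.setflags(write=False)   # simulate a read-only array

arr.setflags(write=True)    # re-mark it as writeable (only possible when the
                            # array owns its memory)
t = torch.as_tensor(arr)    # no warning now
```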

@ezyang (Contributor) commented May 4, 2020

You mean multiprocessing dataloader? I'm pretty sure TORCH_WARN_ONCE triggers only once per process execution.

Another workaround would be to just manually suppress the warning using Python's warning handlers.
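
For example, a sketch using the standard warnings module (the message pattern is taken from the warning text above; audio_path is a placeholder):

```python
import warnings

import scipy.io.wavfile
import torch

audio_path = "audio.wav"  # placeholder path

with warnings.catch_warnings():
    # suppress only the read-only-array warning, leave everything else alone
    warnings.filterwarnings(
        "ignore",
        message="The given NumPy array is not writeable",
        category=UserWarning,
    )
    sample_rate_, signal = scipy.io.wavfile.read(audio_path)
    signal = torch.as_tensor(signal)
```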

@pbelevich added the triaged label and removed the triage review label on May 4, 2020
@albanD (Collaborator) commented May 4, 2020

I think this is made worse by #15849, which prevents the warn-once behavior from working properly.

@vadimkantorov (Contributor, Author) commented:

Setflags workaround doesn't work:

```
ValueError: cannot set WRITEABLE flag to True of this array
```
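
A hypothetical reproduction, assuming the array is backed by a read-only memory map (e.g. from scipy.io.wavfile.read(..., mmap=True)) and therefore does not own its memory:

```python
import numpy as np

np.zeros(16, dtype=np.uint8).tofile("audio.raw")      # placeholder file
a = np.memmap("audio.raw", dtype=np.uint8, mode="r")  # read-only, non-owning
a.setflags(write=True)  # ValueError: cannot set WRITEABLE flag to True of this array
```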

@mruberry (Collaborator) commented May 6, 2020

> Setflags workaround doesn't work:
>
> `ValueError: cannot set WRITEABLE flag to True of this array`

That's interesting... this kind of increases the importance of making a copy or warning our users. I guess @ezyang's suggestion is the thing in this case, then.

@vadimkantorov (Contributor, Author) commented:

A special, more intuitive suppression flag for torch.as_tensor could be useful.

@vadimkantorov (Contributor, Author) commented:

Related: pytorch/vision#2194

@ssnl (Collaborator) commented May 13, 2020

Given that we don't support non-writable tensors, shouldn't as_tensor just copy the data for non-writable NumPy arrays?

@vadimkantorov (Contributor, Author) commented May 13, 2020

Often these non-writable NumPy arrays appear as wrappers around contiguous memory from an external source (like a file read). An optional no-copy behavior is often desirable even in these cases: e.g., audio after decompression can be huge, and a copy 1) requires a second large free chunk of memory, and 2) introduces copying overhead.
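
A sketch of that tradeoff (the file name and dtype are placeholders):

```python
import numpy as np
import torch

# hypothetical stand-in for a large decompressed audio signal on disk
np.zeros(1_000_000, dtype=np.int16).tofile("audio.raw")

signal = np.memmap("audio.raw", dtype=np.int16, mode="r")  # read-only view

t = torch.as_tensor(signal)         # zero-copy: wraps the mapped memory, but warns
t = torch.as_tensor(signal.copy())  # silent, but costs a second signal-sized
                                    # allocation plus the copy itself
```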

@ssnl (Collaborator) commented May 13, 2020

Fair points. I wonder if we should just support non-writable tensors... given that we already have notions of views and in-place ops.

@albanD (Collaborator) commented May 13, 2020

The problem with that is that the notions of views and in-place ops are only tracked by the autograd. So everything that is done outside the scope of the autograd (most of our low-level code) will not respect these constructs :/

@ssnl (Collaborator) commented May 13, 2020

@albanD Hmm, but suppose we just add an `assert self->storage().is_mutable()` (or something) in the codegen for in-place methods and for non-const `storage.data_ptr()`. What else could trigger in-place writes?

@albanD (Collaborator) commented May 13, 2020

Oh, I'm not saying it's impossible. But it's a bit of work, I think.

Do all the methods use `storage.data_ptr()` to access the content? Don't we ever go around it?
Also, Tensors created from blobs can be written to by anything else.
Or a user could get the `data_ptr()` in Python and write into it by other means.

Also, I think the JIT would benefit from knowing that a Tensor is immutable :)

But that feels like overkill for this issue.
I think the default being a warning is good here. And adding a flag like "ignore_non_writable" to the from_numpy() method (or as_tensor, whichever is the right one to use here) would do the trick to avoid the spam in some special cases.
@mruberry what do you think?

@mruberry (Collaborator) commented May 13, 2020

We decided to warn when creating a tensor from a read-only NumPy array. Users can mark the array as writeable, make a copy of it, or avoid writing to it. Disallowing the creation seemed onerous.

In the future it'd be great to support read-only tensors.

@vadimkantorov (Contributor, Author) commented:

> We decided to warn when creating a tensor from a read-only NumPy array.

Then I propose that the DataLoader docs show an example of how to suppress warnings in all workers (it can be useful in other scenarios as well!).

Marking an array as writable does not work, copying can be wasteful (since uncompressed audio is quite large), and avoiding writes by itself does not reduce the spam of many, many warnings in the DataLoader setting.
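
A hypothetical sketch of what such a docs example could look like, using worker_init_fn to install the filter in every worker process (the dataset here is a placeholder):

```python
import warnings

import torch
from torch.utils.data import DataLoader, TensorDataset

def suppress_readonly_warning(worker_id):
    # runs once in each worker process, before any samples are loaded
    warnings.filterwarnings(
        "ignore",
        message="The given NumPy array is not writeable",
        category=UserWarning,
    )

dataset = TensorDataset(torch.arange(8))  # placeholder dataset
loader = DataLoader(dataset, num_workers=4,
                    worker_init_fn=suppress_readonly_warning)

for batch in loader:  # each worker has installed the filter before loading
    pass
```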

@mruberry (Collaborator) commented:

I don't know if this scenario is common enough / alarming enough to warrant a note in the docs, especially since suppressing warnings is more of a Python feature than a PyTorch-specific issue. Would we, for example, want docs describing every time PyTorch could throw a warning and showing how to use Python's warning-filtering mechanisms?

I could see a brief blog post on the issue of writeable/read-only NumPy arrays, warnings, and filtering them being interesting, though.

@vadimkantorov (Contributor, Author) commented May 17, 2020

The problem is with DataLoader, where warnings are printed many times (by default they would be printed only once), and the DataLoader-specific mechanism for suppressing them (worker_init_fn?) is advanced.

I'll try again with a warnings filter and will let you know.

@mruberry (Collaborator) commented:

You can also catch the warnings with a guard, or set the warning filter.

@ssnl (Collaborator) commented May 18, 2020

The problem is not with DataLoader. The problem is with multiprocessing. I'm not sure if we should list all multiprocessing pitfalls in our DataLoader docs...

@vadimkantorov (Contributor, Author) commented:

ok

@mhnatiuk commented:

What is the use case for warning about the array being read-only? Probably like 99% of users, I use DataLoader to (surprisingly enough) **load data**, and I consider forcing the user to have a writable array one of the most absurd patterns I have yet seen in PyTorch. I have 4M rows of data; I won't copy it, because it's stored and loaded using numpy.memmap, and I don't want to allow anything to write to it. I also think that in a multiprocessing context that becomes even a requirement! I can't follow the logic behind this decision.

@mruberry (Collaborator) commented:

PyTorch doesn't (yet) support read-only tensors, so you're making the data writable when you create a PyTorch tensor from it. That's why we warn.
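
A small sketch of what the warning is protecting against: writes through the tensor land in the supposedly read-only array's memory:

```python
import numpy as np
import torch

arr = np.zeros(3)
arr.setflags(write=False)   # arr is now read-only from NumPy's point of view

t = torch.as_tensor(arr)    # UserWarning: the given NumPy array is not writeable
t[0] = 1.0                  # ...but the shared memory can still be written through t
print(arr)                  # [1. 0. 0.]
```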
