[dataloader] Multiple warnings printed when torch.as_tensor applied to readonly NumPy tensor #37581
Comments
We're actually planning to update our warning code to make these warnings less intrusive, but a workaround is to mark the NumPy array as writeable before calling `torch.as_tensor`.
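A minimal sketch of that workaround, assuming the array owns its memory and merely has the writeable flag cleared:

```python
import numpy as np
import torch

arr = np.arange(10, dtype=np.float32)
arr.flags.writeable = False   # simulate an array handed to us as read-only

arr.setflags(write=True)      # works here because the array owns writeable memory
t = torch.as_tensor(arr)      # no warning: the array is writeable again
```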
You mean the multiprocessing DataLoader? Another workaround would be to just manually suppress the warning using Python's warning handlers.
I think this is made worse due to this: #15849
The setflags workaround doesn't work.
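One case where `setflags(write=True)` fails is an array wrapping an immutable buffer, e.g. from `np.frombuffer`; a small sketch of that failure mode:

```python
import numpy as np

buf = b"\x00" * 16                           # immutable Python bytes
arr = np.frombuffer(buf, dtype=np.float32)   # zero-copy, read-only view over the bytes

# Raises ValueError: the underlying buffer is immutable,
# so NumPy refuses to flip the WRITEABLE flag back on.
arr.setflags(write=True)
```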
That's interesting... this kind of increases the importance of making a copy or warning our users. I guess @ezyang's suggestion is the thing in this case, then.
A special, more intuitive suppression flag to `torch.as_tensor` could also be an option.
Related: pytorch/vision#2194
Given that we don't support non-writable tensors, shouldn't `as_tensor` just copy the data for non-writable NumPy arrays?
Often these non-writable NumPy arrays appear as wrappers around some contiguous memory from an external source (like a file that was read). An optional no-copy behavior is often desirable even in these cases: e.g. audio after decompression can be huge, and a copy 1) requires a second large chunk of free memory and 2) introduces copying overhead.
Fair points. I wonder if we should just support non-writable tensors... given that we already have notions of views and in-place ops.
The problem with that is that the notions of view and in-place are only true for autograd. So everything that is done outside the scope of autograd (most of our low-level code) will not respect these constructs :/
@albanD Hmm, but suppose we just add an
Oh, I'm not saying it's impossible. But it is a bit of work, I think. Do all the methods use it? Also, I think the JIT would benefit from knowing that a Tensor is immutable :) But that feels quite overkill for this issue.
We decided to warn when creating a tensor from a read-only NumPy array. Users can mark the array as writeable, make a copy of it, or avoid writing to it. Disallowing the creation seemed onerous. In the future it'd be great to support read-only tensors.
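For reference, a short sketch of the copy route (the flag-flipping route is sketched further up):

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.float32)
arr.flags.writeable = False        # stand-in for data backed by read-only storage

t1 = torch.as_tensor(arr.copy())   # the copy is writeable, so wrapping it emits no warning
t2 = torch.tensor(arr)             # torch.tensor always copies the data into a fresh tensor
```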
Then I propose that the DataLoader docs show an example of how to suppress warnings in all threads (it can be useful in other scenarios as well!). Marking an array as writable does not work, copying can be wasteful (since uncompressed audio is quite large), and avoiding writes by itself does not reduce the spam of many, many warnings in the DataLoader setting.
I don't know if this scenario is common enough / alarming enough to warrant a note in the docs, especially since suppressing warnings is more of a Python feature than a PyTorch-specific issue. Would we, for example, want docs describing every time PyTorch could throw a warning and showing how to use Python's warning filtering mechanisms? I could see a brief blog post on the issue of writeable/read-only NumPy arrays, warnings, and filtering them being interesting, though.
A problem is with DataLoader, where warnings are printed many times (by default they would be printed only once), and the specific mechanism for DataLoader (worker_init?) is advanced. I'll try again with warnings.ignore and will let you know.
You can also catch the warnings with a guard, or set the warning filter.
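A sketch of both, plus a per-worker filter via `worker_init_fn` for the DataLoader case; the message prefix matched below is an assumption and the exact wording may differ between PyTorch versions:

```python
import warnings
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

readonly = np.frombuffer(b"\x00" * 16, dtype=np.float32)   # read-only stand-in array

# Option 1: a local guard around the conversion.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    t = torch.as_tensor(readonly)

# Option 2: a process-wide filter keyed on the start of the message text.
warnings.filterwarnings("ignore", message="The given NumPy array is not writ")

# Option 3: install the filter inside every DataLoader worker, since filters set
# in the parent may not reach workers started with the "spawn" method.
def suppress_readonly_warning(worker_id):
    warnings.filterwarnings("ignore", message="The given NumPy array is not writ")

dataset = TensorDataset(torch.arange(8))   # stand-in dataset, just for the sketch
loader = DataLoader(dataset, num_workers=2, worker_init_fn=suppress_readonly_warning)
```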
The problem is not with DataLoader. The problem is with multiprocessing. I'm not sure if we should list all multiprocessing pitfalls in our DataLoader docs...
ok
What is the use case for warning about an array being read-only? Probably like 99% of users, I use DataLoader to (surprisingly enough) __load data__, and I consider forcing the user to have a writable array one of the most absurd patterns I have yet seen in PyTorch. I have 4M rows of data; I won't copy it because it's stored and loaded using numpy.memmap, and I don't want to allow anything to write to it. I also think that in a multiprocessing context that even becomes a requirement! I can't follow the logic behind this decision.
PyTorch doesn't (yet) support read-only tensors, so you're making the data writable when you create a PyTorch tensor from it. That's why we warn.
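A minimal sketch of the memory-mapped scenario described above (the file name, shape, and the small stand-in file written first are all made up so the snippet runs on its own):

```python
import numpy as np
import torch

# Stand-in for a large preprocessed dataset that already lives on disk.
np.save("features.npy", np.random.rand(1000, 16).astype(np.float32))

data = np.load("features.npy", mmap_mode="r")   # read-only memory map, nothing is copied
row = np.asarray(data[0])                       # still a read-only view into the file

t = torch.as_tensor(row)   # zero-copy, shares the mapped memory, but triggers the warning
```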
I get this warning printed every time I do:
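A minimal stand-in for that call (the array source here is an assumption; `np.frombuffer` simulates a read-only buffer such as decoded audio):

```python
import numpy as np
import torch

samples = np.frombuffer(b"\x01\x00" * 1024, dtype=np.int16)  # zero-copy, read-only view
waveform = torch.as_tensor(samples)   # triggers the non-writable-array UserWarning
```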
Maybe it's related to the fact that this code gets executed within multi-threaded DataLoader workers? So I get one warning per worker, which is still nasty.
It would be nice to be able to suppress this warning in a PyTorch way, without having to clone the tensor. This situation is pretty common, and I'd like to suggest it doesn't deserve figuring out how to suppress general Python warnings :), e.g. by passing some flag to `torch.as_tensor`, maybe `writable = True`, that would ensure the user is conscious of what's going on. cc @ssnl