Enable TensorPrimitives to perform in-place operations (#92820)
Some operations would produce incorrect results if the same span was passed as both an input and an output. When vectorization was employed but the span's length wasn't an exact multiple of the vector width, we'd do the standard trick of performing one last operation on the final vector's worth of data, which overlaps elements that were already processed. That trick relies on the operation being idempotent, and if an earlier store has already overwritten the input with a new value because the same memory is used for input and output, some operations won't be idempotent. This fixes that by masking off the already processed elements. It adds tests to validate that in-place use works, and it updates the docs to carve out this valid form of overlap.
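
The failure mode described above can be reproduced with a small scalar simulation. This is only a sketch, not the actual TensorPrimitives code: the lane count `WIDTH`, the function names `apply_unmasked` and `apply_masked`, and the use of Python lists in place of spans are all assumptions made for illustration.

```python
WIDTH = 4  # simulated vector width in lanes (an assumption for this sketch)


def apply_unmasked(src, dst, op, width=WIDTH):
    """Process full vectors, then redo the last `width` elements unconditionally."""
    n = len(src)
    if n < width:
        # Too short to vectorize: plain scalar loop.
        for k in range(n):
            dst[k] = op(src[k])
        return
    i = 0
    while i + width <= n:
        dst[i:i + width] = [op(v) for v in src[i:i + width]]
        i += width
    if i != n:
        # Overlapping tail: re-reads elements [n - width, i), which have
        # already been overwritten when src and dst are the same buffer.
        last = n - width
        dst[last:n] = [op(v) for v in src[last:n]]


def apply_masked(src, dst, op, width=WIDTH):
    """Same as above, but the tail only stores lanes not yet processed."""
    n = len(src)
    if n < width:
        for k in range(n):
            dst[k] = op(src[k])
        return
    i = 0
    while i + width <= n:
        dst[i:i + width] = [op(v) for v in src[i:i + width]]
        i += width
    if i != n:
        last = n - width
        results = [op(v) for v in src[last:n]]
        for lane in range(width):
            # Mask: keep the existing destination value for lanes below i.
            if last + lane >= i:
                dst[last + lane] = results[lane]


def add_one(v):
    return v + 1


in_place_bad = [1, 2, 3, 4, 5, 6]
apply_unmasked(in_place_bad, in_place_bad, add_one)

in_place_ok = [1, 2, 3, 4, 5, 6]
apply_masked(in_place_ok, in_place_ok, add_one)

separate = [0] * 6
apply_unmasked([1, 2, 3, 4, 5, 6], separate, add_one)
```

With six elements and four lanes, the unmasked tail covers indices 2 through 5, so in-place use applies `add_one` twice to indices 2 and 3. The masked variant stores only indices 4 and 5 from the tail, and the unmasked variant remains correct when input and output are distinct buffers.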
stephentoub authored Sep 29, 2023
1 parent 56251ec commit e3d37a8
Showing 4 changed files with 740 additions and 96 deletions.