Propagate device data names #1157

Merged

Conversation

@antoniojkim (Collaborator)

Need to propagate device data names in order to be able to distinguish between inputs and weights.

An example of its intended usage has been added to the reference backend. The idea is to assign names to tensors on the Python side (e.g. perhaps in the dataloader) so that you know which tensors are which during the actual compile.

Also, export remaining headers that weren't exported before.

CC: @ke1337 @henrytwo
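
As a rough illustration of the intended Python-side usage (the module and helper names below are placeholders, not confirmed APIs; the actual entry point is whatever the reference lazy backend exports in this PR):

```python
import torch

# Placeholder module and helper name, used only for illustration; the real
# entry point lives in the reference lazy backend and may be named differently.
from reference_lazy_backend import set_parameter_name

# Build a small model on the lazy device (assumes the lazy backend has already
# been initialized, e.g. via the usual torch._lazy setup).
model = torch.nn.Linear(4, 2).to(device="lazy")

# Tag each weight with its parameter name so the backend can tell weights
# apart from runtime inputs when it receives the traced computation.
for name, param in model.named_parameters():
    set_parameter_name(param, name)
```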

@antoniojkim self-assigned this on Aug 4, 2022
@silvasean (Contributor) left a comment

Can you add a test for this?

setup.py (review thread resolved)
@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 876ec6e to 4ba4a1a on August 8, 2022, 14:31
@antoniojkim requested a review from silvasean on August 8, 2022, 14:31
@silvasean (Contributor)

I'm not sure I understand what the test is testing. It seems to call the API but I don't see it checking the result.

@antoniojkim (Collaborator, Author) commented Aug 9, 2022

> I'm not sure I understand what the test is testing. It seems to call the API but I don't see it checking the result.

There's unfortunately no good way to check the result. I originally checked that the number of inputs was greater than zero, but apparently this is not true for all ops, so some ops fail that check.

I also tried saving the input names and checking them against the values that are set, but this also doesn't work, since multiple tests are run in parallel and there is thus a race condition in the storage and retrieval of this value.

As such, I decided to only add it as a spot check, especially since this is not something that is needed by all lazy backends (for example, the TorchScript backend has no use for it). I figured it's fine to just print the number of inputs in the reference lazy backend so that we can do a visual check of the output to verify that it's working as intended.

@silvasean please let me know if my reasoning here is sufficient.

@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch 3 times, most recently from 76dad5d to b48684d on August 9, 2022, 18:34
@antoniojkim (Collaborator, Author)

@silvasean Gentle reminder to please review again

@silvasean (Contributor)

Sorry, I've been OoO.

What I meant to say is that I see `std::cout << num_inputs << " input tensors found" << std::endl;`, but I don't see where you check that it actually gets printed. And I don't think that calling this API in the end-to-end tests is very useful.

I would recommend that you add unit tests like we did with the "eager mode" here: https://github.com/llvm/torch-mlir/tree/main/python/test/eager_mode

I'm imagining a 3-line test where you call `lazy_backend.set_parameter_name(e, f"some_name")` and then either print the graph, print the node, or get the parameter name (or something along those lines), print it out, and check that "some_name" is in the output.
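
A rough sketch of a test along those lines, using placeholder names for the backend module and the accessor (neither is a confirmed API from this PR):

```python
import torch

# Placeholder import; the reference lazy backend module may be named differently.
import lazy_backend

# Assumes the lazy device has already been initialized by the test harness.
e = torch.randn(2, 2, device="lazy")

# Attach a name, then read it back and print it so the test output can be
# checked (e.g. with FileCheck) for the expected string.
lazy_backend.set_parameter_name(e, "some_name")
print(lazy_backend.get_parameter_name(e))
# Expected: the printed value contains "some_name"
```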

@antoniojkim (Collaborator, Author)

> What I meant to say is that I see `std::cout << num_inputs << " input tensors found" << std::endl;`, but I don't see where you check that it actually gets printed. And I don't think that calling this API in the end-to-end tests is very useful.

There is no concrete check. I only meant for it to be a "check by inspection". It's not required by all backends, and I thought it was sufficient to be able to see it in the test logs.

> I would recommend that you add unit tests like we did with the "eager mode" here: https://github.com/llvm/torch-mlir/tree/main/python/test/eager_mode

The problem with doing this is that the tests are run in parallel in the same process. This causes race conditions as there is only a single instance of the static lazy tensor C++ symbols.

> I'm imagining a 3-line test where you call `lazy_backend.set_parameter_name(e, f"some_name")` and then either print the graph, print the node, or get the parameter name (or something along those lines), print it out, and check that "some_name" is in the output.

This is what I was hoping to add as well, but unless I'm mistaken, I don't think this is possible with the current test infrastructure, for the reasons I listed above.

@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 09a605f to 4ae1f0b on August 12, 2022, 21:02
@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 4ae1f0b to 545b776 on August 15, 2022, 13:39
@antoniojkim (Collaborator, Author)

@silvasean I've added some unit tests. Please review again when you get a chance.
