Propagate device data names #1157

Merged

Conversation

@antoniojkim (Collaborator)

Need to propagate device data names in order to be able to distinguish between inputs and weights.

An example of its intended usage has been added to the reference backend. The idea is to assign names to tensors on the Python side (e.g. perhaps in the dataloader) so that you know which tensors are which during the actual compile.

Also, export remaining headers that weren't exported before.

CC: @ke1337 @henrytwo
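
As a rough illustration of the intended Python-side usage (the module and helper names below are placeholders, not confirmed APIs; the actual entry point is whatever the reference lazy backend exports in this PR):

```python
import torch

# Placeholder module and helper name, used only for illustration; the real
# entry point lives in the reference lazy backend and may be named differently.
from reference_lazy_backend import set_parameter_name

# Build a small model on the lazy device (assumes the lazy backend has already
# been initialized, e.g. via the usual torch._lazy setup).
model = torch.nn.Linear(4, 2).to(device="lazy")

# Tag each weight with its parameter name so the backend can tell weights
# apart from runtime inputs when it receives the traced computation.
for name, param in model.named_parameters():
    set_parameter_name(param, name)
```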

@antoniojkim self-assigned this on Aug 4, 2022
@silvasean (Contributor) left a comment

Can you add a test for this?

setup.py (review thread resolved)
@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 876ec6e to 4ba4a1a on August 8, 2022, 14:31
@antoniojkim requested a review from silvasean on August 8, 2022, 14:31
@silvasean (Contributor)

I'm not sure I understand what the test is testing. It seems to call the API but I don't see it checking the result.

@antoniojkim (Collaborator, Author) commented Aug 9, 2022

> I'm not sure I understand what the test is testing. It seems to call the API but I don't see it checking the result.

There's unfortunately no good way to check the result. I originally checked that the number of inputs was greater than zero, but apparently this is not true for all ops, so some ops fail that check.

I also tried saving the input names and checking them against the values that are set, but this also doesn't work, since multiple tests are run in parallel and there is thus a race condition in the storage and retrieval of this value.

As such, I decided to only add it as a spot check, especially since this is not something that is needed by all lazy backends (for example, the TorchScript backend has no use for it). I figured it's fine to just print the number of inputs in the reference lazy backend so that we can do a visual check of the output to verify that it's working as intended.

@silvasean please let me know if my reasoning here is sufficient.

@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch 3 times, most recently from 76dad5d to b48684d on August 9, 2022, 18:34
@antoniojkim (Collaborator, Author)

@silvasean Gentle reminder to please review again

@silvasean (Contributor)

Sorry, I've been OoO.

What I meant to say is that I see `std::cout << num_inputs << " input tensors found" << std::endl;`, but I don't see where you check that it actually gets printed. And I don't think that calling this API in the end-to-end tests is very useful.

I would recommend that you add unit tests like we did with the "eager mode" here: https://github.com/llvm/torch-mlir/tree/main/python/test/eager_mode

I'm imagining a 3-line test where you call `lazy_backend.set_parameter_name(e, f"some_name")` and then either print the graph, print the node, or get the parameter name (or something along those lines), print it out, and check that "some_name" is in the output.
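
A rough sketch of a test along those lines, using placeholder names for the backend module and the accessor (neither is a confirmed API from this PR):

```python
import torch

# Placeholder import; the reference lazy backend module may be named differently.
import lazy_backend

# Assumes the lazy device has already been initialized by the test harness.
e = torch.randn(2, 2, device="lazy")

# Attach a name, then read it back and print it so the test output can be
# checked (e.g. with FileCheck) for the expected string.
lazy_backend.set_parameter_name(e, "some_name")
print(lazy_backend.get_parameter_name(e))
# Expected: the printed value contains "some_name"
```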

@antoniojkim (Collaborator, Author)

> What I meant to say is that I see `std::cout << num_inputs << " input tensors found" << std::endl;`, but I don't see where you check that it actually gets printed. And I don't think that calling this API in the end-to-end tests is very useful.

There is no concrete check. I only meant for it to be a "check by inspection". It's not required by all backends, and I thought it was sufficient to be able to see it in the test logs.

> I would recommend that you add unit tests like we did with the "eager mode" here: https://github.com/llvm/torch-mlir/tree/main/python/test/eager_mode

The problem with doing this is that the tests are run in parallel in the same process. This causes race conditions as there is only a single instance of the static lazy tensor C++ symbols.

> I'm imagining a 3-line test where you call `lazy_backend.set_parameter_name(e, f"some_name")` and then either print the graph, print the node, or get the parameter name (or something along those lines), print it out, and check that "some_name" is in the output.

This is what I was hoping to add as well, but unless I'm mistaken, I don't think this is possible with the current test infrastructure, for the reasons I listed above.

@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 09a605f to 4ae1f0b on August 12, 2022, 21:02
@antoniojkim force-pushed the antoniojkim/propagate_device_data_names branch from 4ae1f0b to 545b776 on August 15, 2022, 13:39
@antoniojkim (Collaborator, Author)

@silvasean I've added some unit tests. Please review again when you get a chance.
