
Push for finishing transition to Python dtype functions #1807

Closed
23 tasks done
ramiro050 opened this issue Jan 16, 2023 · 4 comments · Fixed by #2105
ramiro050 (Collaborator) commented Jan 16, 2023

Hi everyone,

I would like some help from the community to finish the transition from RefineTypes.cpp, which encodes how to compute the resulting dtypes of each op's output tensors, to dtype functions written in Python. For more information on the rationale behind this transition, see the Custom Op RFC.

The transition will take place on the dtype-functions-staging branch so that it lands in main in a single go; otherwise, many large workloads could fail to lower through Torch-MLIR. The failure mode is this: if some ops are handled by RefineTypes while others use the Python dtype functions applied in the TorchDtypeRefinementPipeline, the two passes cycle back and forth, each propagating dtype information until it encounters an op the current pass does not handle. For workloads with many ops, this cycling can easily hit the iteration limit in the LowerToBackendContract pass, causing lowering through Torch-MLIR to fail. Currently, only a few ops in main use dtype functions, and this is already affecting some user workloads.

To avoid breaking user workloads, or forcing users to raise the iteration limit at the cost of Torch-MLIR performance, we will instead do the entire transition on the dtype-functions-staging branch, where we can temporarily XFAIL complex tests as needed.

The contribution process

  1. Choose an item from the work list below and edit this issue by adding your name alongside the checklist item you're working on to avoid duplicate work
    • If you don't have edit access, just mention it as a comment, and I will update the issue
  2. Make a new branch off of the dtype-functions-staging branch
  3. Remove the op or ops linked in the checklist item from RefineTypes.cpp
  4. Add dtype functions with proper testing to abstract_interp_lib_gen.py. See the docs for information on how to write a dtype function
  5. Run ./build_tools/update_abstract_interp_lib.sh to update the abstract interp library
  6. Your patch is only expected to pass the e2e tests that test a single op. If your patch causes tests that use many ops like ResNet18 or Mobilenet to fail, mark those tests as XFAIL
  7. Make a PR to merge your branch into dtype-functions-staging (see the PR for ops that return i1s for reference)
  8. Tag me (@ramiro050) for review
  9. Once merged, mark the item in the work list as finished
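As a rough illustration of step 4, here is a minimal sketch of what a dtype function can look like. The function names, the rank/dtype tuple convention, and the integer dtype codes below are assumptions made for illustration only; see the docs and abstract_interp_lib_gen.py for the actual conventions used in torch-mlir.

```python
# Hypothetical sketch of dtype functions, loosely modeled on the idea
# described above. All names and conventions here are illustrative
# assumptions, not the actual abstract_interp_lib_gen.py API.
from typing import Tuple

# Illustrative integer codes standing in for torch dtype enum values.
INT64 = 4
FLOAT32 = 6
BOOL = 11

def gt_scalar_dtype(self_rank_dtype: Tuple[int, int]) -> int:
    """Comparison ops produce boolean tensors regardless of input dtype."""
    _rank, _dtype = self_rank_dtype
    return BOOL

def cos_dtype(self_rank_dtype: Tuple[int, int]) -> int:
    """Elementwise float ops promote integer inputs to a float dtype;
    float inputs keep their dtype."""
    _rank, dtype = self_rank_dtype
    return FLOAT32 if dtype == INT64 else dtype
```

The key point is that each function computes only the result dtype from the input ranks and dtypes, with no tensor data involved, which is what lets the library be tested against eager PyTorch behavior.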

Work List

Happy to help with any questions/issues!

li-plus (Collaborator) commented Jan 17, 2023

I'm willing to help. Could I take the first 3 tasks?

ramiro050 (Collaborator, Author) commented

> I'm willing to help. Could I take the first 3 tasks?

Awesome! I've updated the work list

powderluv (Collaborator) commented

@gpetters94 can you please help chip away at these ops too?

gpetters94 (Collaborator) commented

@ramiro050 I can take Embedding and Softmax for now.

@ramiro050 ramiro050 linked a pull request May 9, 2023 that will close this issue