
Generalize CategoryMapper test. #1172

Merged: 2 commits, Feb 14, 2022
Conversation

etiotto
Collaborator

@etiotto etiotto commented Feb 9, 2022

In this PR I generalized the test infrastructure for the CategoryMapper operator by providing templated functions to cut down on code duplication. As a side effect, I modified the runtime interfaces that query OMTensor properties so that the input OMTensor(s) are passed as const.

Signed-off-by: Ettore Tiotto [email protected]

@etiotto etiotto self-assigned this Feb 9, 2022
@etiotto
Collaborator Author

etiotto commented Feb 9, 2022

@NathanielMcVicar @sstamenova I saw this error on the Win CI:

##[error]No hosted parallelism has been purchased or granted. To request a free parallelism grant, please fill out the following form https://aka.ms/azpipelines-parallelism-request

Maybe it is transitory?

@NathanielMcVicar
Collaborator

Maybe it is transitory?

Unfortunately, it doesn't seem to be. I'm going to fill out the request form since this seems to meet the definition of a public project. I suspect when we moved over to an ONNX build subscription to host more artifacts we lost our 2 free parallel jobs.

@NathanielMcVicar
Collaborator

This appears to be a widespread issue affecting AzDO pipelines across organizations. The ONNX team is aware and working to root-cause the issue. See for example conda-forge/conda-forge.github.io#1604. Thanks @jcwchen!

@etiotto etiotto force-pushed the ModelBuilderv2 branch 2 times, most recently from 1e7da33 to 53a3324 Compare February 10, 2022 15:52
@etiotto
Collaborator Author

etiotto commented Feb 10, 2022

This appears to be a widespread issue affecting AzDO pipelines across organizations. The ONNX team is aware and working to root-cause the issue. See for example conda-forge/conda-forge.github.io#1604. Thanks @jcwchen!

Thank you very much @NathanielMcVicar and @jcwchen for the quick response and investigation.

@etiotto etiotto marked this pull request as ready for review February 10, 2022 18:07
@NathanielMcVicar
Collaborator

@etiotto Windows-CI issues are resolved.

src/Runtime/OMTensorHelper.h
- auto outDataPtr = (const char **)omTensorGetDataPtr(out);
- auto expectedDataPtr = (const char **)(omTensorGetDataPtr(expected));
+ const auto *outDataPtr = static_cast<T *>(omTensorGetDataPtr(out));
+ const auto *expDataPtr = static_cast<T *>(omTensorGetDataPtr(expected));
Collaborator

Can you comment on whether that cast is safer/OK compared to what was done in the numerical tests, namely:

  for (int64_t i = 0; i < I; ++i) {
    for (int64_t j = 0; j < J; ++j) {
      omTensorGetElem<float>(ref, {i, j}) = 0;
      for (int64_t k = 0; k < K; k++) {
        float aVal, bVal;
        if (aTrans == 0)
          aVal = omTensorGetElem<float>(a.get(), {i, k});
        else
          aVal = omTensorGetElem<float>(a.get(), {k, i});
        if (bTrans == 0)
          bVal = omTensorGetElem<float>(b.get(), {k, j});
        else
          bVal = omTensorGetElem<float>(b.get(), {j, k});
        omTensorGetElem<float>(ref, {i, j}) += aVal * bVal;
      }
    }
  }

I agree that the quoted code above is dreadfully slow, as it creates a vector for each access.

Collaborator Author

The implementation of omTensorGetElem is:

template <typename T>
T &omTensorGetElem(const OMTensor *omt, std::vector<int64_t> indexes) {
  int64_t elemOffset = omTensorComputeElemOffset(omt, indexes);
  return ((T *)omt->_alignedPtr)[elemOffset];
}

The type cast on the return statement is a C-style cast. A C-style cast (T *) is not equivalent to a C++ static_cast<T *>() because a C-style cast is not type safe, whereas static_cast allows the compiler to report errors. For example:

void foo(char *p) {
  int *pi = (int *)p;   // no compiler error here
  *pi = 3;              // ooops: just overwrote 3 extra bytes in mem. pointed to by p!!!
}

If static_cast<int*>(p) were used, the compiler would catch the potential runtime problem at compile time ==> safer!

If the intention was to "reinterpret" the memory pointed to by p, then one should use a C++ reinterpret_cast.

Of course, all of this assumes the code section is compiled by a C++ compiler.

Collaborator Author

We probably should modify the runtime implementation of omTensorGetElem to use C++-style casts. But that is another PR.

Collaborator

We probably should modify the runtime implementation of omTensorGetElem to use C++-style casts. But that is another PR.

That file is unusual: it has to compile as either C or C++, with some of the code being a C++-only "extension" whose declarations live in a separate helper header file...

Collaborator Author

ok

Signed-off-by: Ettore Tiotto <[email protected]>
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

LGTM

@etiotto
Collaborator Author

etiotto commented Feb 14, 2022

@jenkins-droid test this please

@etiotto etiotto merged commit 54941f9 into onnx:main Feb 14, 2022
@jenkins-droid
Collaborator

Jenkins Linux s390x Build #3854 [push] Generalize CategoryMappe... started at 09:14

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #3863 [push] Generalize CategoryMappe... started at 08:14

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #2970 [push] Generalize CategoryMappe... started at 09:15

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #3854 [push] Generalize CategoryMappe... passed after 37 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #2970 [push] Generalize CategoryMappe... passed after 47 min

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #3863 [push] Generalize CategoryMappe... passed after 52 min

YW-Ethan pushed a commit to YW-Ethan/onnx-mlir that referenced this pull request Mar 9, 2022
Signed-off-by: Ettore Tiotto <[email protected]>

Co-authored-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Ethan Wang <[email protected]>