Fix radix sort test #34929

Merged
merged 1 commit into from
Aug 18, 2021

fwyzard
Contributor

@fwyzard fwyzard commented Aug 18, 2021

PR description:

Fix the compilation errors in HeterogeneousCore/CUDAUtilities/test/radixSort_t.cu:

  • take into account the actual size of the type being sorted, instead of always assuming a 64-bit type;
  • use a union instead of a cast to keep the compiler happy about the aliasing rules.
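
As an illustration of the first point, an unsigned integer type of matching size can be selected with a small trait. This is only a sketch: the test uses a helper named `uintT_t<T>`, but its definition does not appear in this thread, so the `uintN` template below is an assumed implementation and not necessarily the one in radixSort_t.cu.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not taken from the PR): maps a size in bytes
// to the unsigned integer type with exactly that size.
template <std::size_t Bytes>
struct uintN;
template <>
struct uintN<1> { using type = std::uint8_t; };
template <>
struct uintN<2> { using type = std::uint16_t; };
template <>
struct uintN<4> { using type = std::uint32_t; };
template <>
struct uintN<8> { using type = std::uint64_t; };

// Assumed definition of uintT_t: the unsigned type as wide as T.
template <typename T>
using uintT_t = typename uintN<sizeof(T)>::type;

// The selected type always has the same size as the type being sorted,
// instead of being fixed at 64 bits.
static_assert(sizeof(uintT_t<float>) == sizeof(float), "size mismatch");
static_assert(sizeof(uintT_t<double>) == sizeof(double), "size mismatch");
static_assert(std::is_same<uintT_t<std::int16_t>, std::uint16_t>::value, "unexpected type");
```

With a trait like this, a bit shift computed from `sizeof(T)` stays within the width of the integer it is applied to.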

PR validation:

gpuRadixSort_t compiles and runs fine.

Take into account the actual size of the type being sorted.

Use a union instead of a cast to keep the compiler happy about the
aliasing rules.
@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

type bugfix

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

please test

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

please test with cms-externals/eigen-git-mirror#7 for CMSSW_12_1_X/slc7_amd64_gcc11

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

+heterogeneous

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34929/24737

  • This PR adds an extra 12KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • HeterogeneousCore/CUDAUtilities (heterogeneous)

can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

Fixes #34917.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17851/summary.html
COMMIT: 0b6eae2
CMSSW: CMSSW_12_1_X_2021-08-17-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34929/17851/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17851/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17851/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 3000352
  • DQMHistoTests: Total failures: 11
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3000318
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@perrotta
Contributor

+1

@cmsbuild cmsbuild merged commit 05c2b42 into cms-sw:master Aug 18, 2021
@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

The slc7_amd64_gcc11 test (where this was actually broken) is still running...

@perrotta
Contributor

The slc7_amd64_gcc11 test (where this was actually broken) is still running...

Ah, sorry Andrea. I saw the message that the tests had ended successfully (together with your "+1" for the heterogeneous category) and I thought they were the relevant ones.
OK, let's see how the relevant test ends up, and stay ready to revert in case of issues.

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

No problem, testing locally it worked, so I'm optimistic :-)

@cmsbuild
Contributor

-1

Failed Tests: Build HeaderConsistency
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17853/summary.html
COMMIT: 0b6eae2
CMSSW: CMSSW_12_1_X_2021-08-16-1100/slc7_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34929/17853/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17853/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d04a7e/17853/git-merge-result

Build

I found compilation error when building:

>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrig.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrigProd.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrigTest.cc
>> Compiling edm plugin /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/SealModule.cc
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrigTest.cc: In member function 'virtual void DTTrigTest::beginRun(const edm::Run&, const edm::EventSetup&)':
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrigTest.cc:198:23: error: 'this' pointer is null [-Werror=nonnull]
  198 |     my_trig->createTUs(iEventSetup);
      |     ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
In file included from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/interface/DTTrigTest.h:26,
                 from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/src/DTTrigTest.cc:17:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_1_X_2021-08-16-1100/src/L1Trigger/DTTrigger/interface/DTTrig.h:78:8: note: in a call to non-static member function 'void DTTrig::createTUs(const edm::EventSetup&)'


@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021

OK, there are other (preexisting, I assume) errors in CondCore/EcalPlugins, FastSimulation/TrackingRecHitProducer and L1Trigger/DTTrigger - but indeed HeterogeneousCore/CUDAUtilities looks good.

uintT_t<T> u;
} c;
c.t = t;
c.u = c.u >> shift << shift;
Contributor

I'm pretty sure this is actually undefined behavior, as you are not supposed to write a union through one member and read it through another. I believe the recommended way to do this is to use memcpy.

uintT_t<T> u;
memcpy(&u, &t, sizeof(u));
u = u >> shift << shift;
memcpy(&t, &u, sizeof(t));

I played around with that recently on godbolt (for reasons other than this PR) and found compilers know about memcpy and optimize out the calls and just do the 'right' thing.
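
For reference, a self-contained version of the memcpy approach could look like the sketch below. The function name `clearLowBits` is hypothetical and chosen only for illustration, and the example is fixed to `float` and `std::uint32_t` rather than the templated types used in the test.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration: clears the `shift` least significant
// bits of the object representation of a float, copying the bytes with
// memcpy instead of reading an inactive union member.
// `shift` must be smaller than 32.
inline float clearLowBits(float value, unsigned shift) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float is assumed to be 32 bits wide");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));   // float bytes -> integer
  bits = bits >> shift << shift;              // drop the low bits
  std::memcpy(&value, &bits, sizeof(value));  // integer bytes -> float
  return value;
}
```

Both copies have a size known at compile time, which is what gives an optimising compiler the chance to replace them with plain register moves.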

Contributor

I think section 11.5.6.3 of the C++ 20 standard demonstrates that it is undefined behavior.

Contributor

I'm pretty sure the union pattern is used elsewhere in CMSSW too, so we should address those as well.

@fwyzard
Contributor Author

fwyzard commented Aug 18, 2021 via email

@Dr15Jones
Contributor

Looks like C++ 20 is finally trying to address this

https://en.cppreference.com/w/cpp/numeric/bit_cast
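
A rough sketch of what the same bit manipulation might look like with `std::bit_cast` is shown below. The function name `clearLowBitsCast` is hypothetical, and the memcpy branch is only an assumed fallback so that the snippet also builds with toolchains that do not provide `std::bit_cast`.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__has_include)
#if __has_include(<bit>)
#include <bit>
#endif
#endif

// Hypothetical helper for illustration: clears the `shift` least significant
// bits of the object representation of a float. `shift` must be below 32.
inline float clearLowBitsCast(float value, unsigned shift) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float is assumed to be 32 bits wide");
#if defined(__cpp_lib_bit_cast)
  // C++20: reinterpret the bytes without a union, a cast or memcpy.
  auto bits = std::bit_cast<std::uint32_t>(value);
  bits = bits >> shift << shift;
  return std::bit_cast<float>(bits);
#else
  // Fallback for older standards: copy the bytes explicitly.
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  bits = bits >> shift << shift;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
#endif
}
```

`std::bit_cast` requires both types to have the same size and to be trivially copyable, so a size mismatch is rejected at compile time.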

@fwyzard
Contributor Author

fwyzard commented Aug 19, 2021

I've done some tests using memcpy instead of a union, or using memset:

template <int N, typename T, typename SFINAE = std::enable_if_t<N <= sizeof(T)>>
void truncate1(T& t) {
    const int shift = 8 * (sizeof(T) - N);
    union {
        T t;
        uintT_t<T> u;
    } c;
    c.t = t;
    c.u = c.u >> shift << shift;
    t = c.t;
}
template <int N, typename T, typename SFINAE = std::enable_if_t<N <= sizeof(T)>>
void truncate2(T& t) {
    const int shift = 8 * (sizeof(T) - N);
    uintT_t<T> u;
    std::memcpy(&u, &t, sizeof(T));
    u = u >> shift << shift;
    std::memcpy(&t, &u, sizeof(T));
}
template <int N, typename T, typename SFINAE = std::enable_if_t<N <= sizeof(T)>>
void truncate3(T& t) {
    char* bytes = reinterpret_cast<char*>(&t);
    static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__, "Only little endian architectures are supported");
    std::memset(bytes, 0x00, sizeof(T) - N);
}

Host compilers (GCC 11.2, clang 12.0) always generate the same code for truncate1 and truncate2:

  • GCC always generates an AND with a mask
  • clang generates either an AND with a mask, some combination of moves and shifts, or a direct store of 0 to a byte/word/dword

Device compilers (NVCC 11.3, clang 11.0 --cuda) show more variety:

  • both generate an AND with a mask for truncate1
  • NVCC does not optimise out the calls to memcpy in truncate2, resulting in very verbose and inefficient code
  • clang always generates the same code as for truncate1 (unlike on the host)

All compilers (host and device) always generate one or more stores of 0 for truncate3.
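
To make the relation between the shift-based and the memset-based variants concrete, the sketch below applies both to plain unsigned integers, where no type punning is needed. The function names are hypothetical and `N` is restricted to at least 1, an assumption added here so that the shift never equals the full width of the type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__, "Only little endian architectures are supported");

// Hypothetical helper: keeps the N most significant bytes of an unsigned
// integer and zeroes the rest, using a pair of shifts.
template <int N, typename T>
T keepHighBytesShift(T value) {
  static_assert(N >= 1 && N <= static_cast<int>(sizeof(T)), "N must be in [1, sizeof(T)]");
  constexpr int shift = 8 * (sizeof(T) - N);
  return static_cast<T>(value >> shift << shift);
}

// Hypothetical helper: same result, obtained by zeroing bytes in memory.
// On a little endian machine the least significant bytes come first,
// so they are the ones at the lowest addresses.
template <int N, typename T>
T keepHighBytesMemset(T value) {
  static_assert(N >= 1 && N <= static_cast<int>(sizeof(T)), "N must be in [1, sizeof(T)]");
  std::memset(&value, 0x00, sizeof(T) - N);
  return value;
}
```

For example, `keepHighBytesShift<2>(std::uint32_t{0x12345678})` and `keepHighBytesMemset<2>(std::uint32_t{0x12345678})` both yield `0x12340000`. On a big endian machine the memset variant would zero the most significant bytes instead, which is why the static_assert is there.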

@fwyzard
Contributor Author

fwyzard commented Aug 19, 2021

Summary

  • in host code we could replace the union and assignment with memcpy - the result does not change.
  • in device code we should avoid the memcpy
  • in all cases, it's not clear to me if memset is better or worse than the union

So... keep things as they are for the time being, and revisit once we have C++20 ?

@fwyzard fwyzard deleted the fix_radix_sort_test_121x branch August 19, 2021 08:16
@fwyzard
Contributor Author

fwyzard commented Aug 19, 2021

P.S. my play area on godbolt is here: https://godbolt.org/z/75csPcK1q

@makortel
Contributor

So... keep things as they are for the time being, and revisit once we have C++20 ?

I'd be fine with that. Hopefully we move to C++20 before we need to consider compilers other than gcc or clang for pieces of code that use a union for type punning (that appears to be guaranteed behavior in gcc; I couldn't quickly find what clang guarantees).
