[Buildrules] Treat ROCM_CXXFLAGS/LDFLAGS as valid compiler flags #8271

Merged: 2 commits merged into IB/CMSSW_13_0_X/master on Jan 30, 2023

smuzaffar
Contributor

No description provided.

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release managers for this.
cms-bot commands are listed here

@smuzaffar
Contributor Author

please test with cms-sw/cmssw#40619

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30237/summary.html
COMMIT: a511b62
CMSSW: CMSSW_13_0_X_2023-01-29-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8271/30237/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30237/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30237/git-merge-result

Build

I found a compilation error when building:

>> Package HeterogeneousCore/ROCmServices built
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-01-29-1100/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities.cpp
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-01-29-1100/src/HeterogeneousCore/ROCmServices/bin/isRocmDeviceSupported.hip.cc
>> Building binary rocmComputeCapabilities
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/../lib/gcc/x86_64-redhat-linux-gnu/11.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/isRocmDeviceSupported.hip.cc.o:(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
collect2: error: ld returned 1 exit status
>> Deleted: tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/rocmComputeCapabilities
gmake: *** [tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/rocmComputeCapabilities] Error 1
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-01-29-1100/src/HeterogeneousCore/ROCmServices/bin/rocmIsEnabled.cpp
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-01-29-1100/src/HeterogeneousCore/ROCmServices/bin/isRocmDeviceSupported.hip.cc
>> Building binary rocmIsEnabled
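The linker failure above has a recognisable shape: a HIP object (isRocmDeviceSupported.hip.cc.o) whose .hipFatBinSegment section refers to a __hip_fatbin symbol that nothing defines. As an illustration only, here is a minimal Python sketch of how a CI script could pick this error out of a build log and list the affected objects. It is not part of cms-bot; the function name find_unlinked_hip_objects and the regular expression are hypothetical, modelled on the single ld line quoted in this log.

```python
import re

# Modelled on the ld message in the build log: a HIP object whose
# .hipFatBinSegment refers to an undefined __hip_fatbin symbol.
# ld may quote the symbol as `__hip_fatbin' or '__hip_fatbin'.
HIP_FATBIN_ERROR = re.compile(
    r"(?P<obj>\S+\.hip\.cc\.o):\(\.hipFatBinSegment\+0x[0-9a-fA-F]+\): "
    r"undefined reference to [`']__hip_fatbin'"
)


def find_unlinked_hip_objects(log_text: str) -> list[str]:
    """Return each HIP object file for which ld reported a missing __hip_fatbin.

    Objects are listed once, in the order they first appear in the log.
    """
    found: list[str] = []
    for match in HIP_FATBIN_ERROR.finditer(log_text):
        obj = match.group("obj")
        if obj not in found:
            found.append(obj)
    return found


if __name__ == "__main__":
    sample = (
        "/usr/bin/ld: tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/"
        "rocmComputeCapabilities/isRocmDeviceSupported.hip.cc.o:"
        "(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'\n"
        "collect2: error: ld returned 1 exit status\n"
    )
    for obj in find_unlinked_hip_objects(sample):
        print("HIP object with unresolved __hip_fatbin:", obj)
```

Matching on the section name as well as the symbol keeps the check narrow, so unrelated undefined references in the same log are not reported.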


@smuzaffar
Contributor Author

test parameters:

  • full_cmssw = true

@smuzaffar
Contributor Author

please test

@smuzaffar
Contributor Author

smuzaffar commented Jan 29, 2023

@fwyzard, any idea why we get "isRocmDeviceSupported.hip.cc.o:(.hipFatBinSegment+0x8): undefined reference to '__hip_fatbin'" followed by "collect2: error: ld returned 1 exit status"? I tried adding --hip-link to hipcc, but still no success.

@fwyzard
Contributor

fwyzard commented Jan 30, 2023

I think it's because the final link is being done with g++ instead of hipcc.
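The diagnosis points at the build rule rather than at the ROCm libraries: the driver that performs the final link has to be one that also handles the HIP side of the objects. A minimal Python sketch of that decision follows. It is not the SCRAM build-rule implementation; the function names, the .hip.cc.o suffix test (taken from the object names in the build log), and the hip_flags hook are hypothetical. It only routes the link through hipcc with --hip-link when HIP objects are present. Any further flags the real rule might need (for example -fgpu-rdc, if the objects are compiled as relocatable device code) are an assumption left to the caller.

```python
# Hypothetical rule: objects compiled from HIP sources end in ".hip.cc.o",
# matching the naming in the build log (isRocmDeviceSupported.hip.cc.o).
HIP_OBJECT_SUFFIX = ".hip.cc.o"


def needs_hip_link(objects: list[str]) -> bool:
    """True if any object was compiled from a HIP source file."""
    return any(obj.endswith(HIP_OBJECT_SUFFIX) for obj in objects)


def link_command(
    objects: list[str],
    output: str,
    libs: tuple[str, ...] = (),
    hip_flags: tuple[str, ...] = (),
    cxx: str = "g++",
    hipcc: str = "hipcc",
) -> list[str]:
    """Return the argv for the final link of a binary.

    When HIP objects are present, hipcc (not g++) drives the link and gets
    --hip-link plus any caller-supplied hip_flags; otherwise the host
    compiler links exactly as before.
    """
    if needs_hip_link(objects):
        cmd = [hipcc, "--hip-link", *hip_flags]
    else:
        cmd = [cxx]
    cmd += objects
    cmd += [f"-l{lib}" for lib in libs]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    objs = ["rocmComputeCapabilities.cpp.o", "isRocmDeviceSupported.hip.cc.o"]
    print(" ".join(link_command(objs, "rocmComputeCapabilities", libs=("amdhip64",))))
```

Leaving the g++ path untouched for binaries without HIP objects confines the change to the targets that actually need it, which matters when switching the linker for a whole release is a larger decision.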

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30238/summary.html
COMMIT: a511b62
CMSSW: CMSSW_13_0_X_2023-01-29-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8271/30238/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30238/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-aa10a1/30238/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 3040 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555470
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor Author

smuzaffar commented Jan 30, 2023

> I think it's because the final link is being done with g++ instead of hipcc.

@fwyzard, that is true, and using hipcc for linking needs careful thought; I will do that in the next PR. By the way, I was able to link and run a HIP application when I explicitly linked against the rccl library (i.e. I added <lib name="rccl"/> to rocm.xml). So can we still use g++ and just link rccl?

@fwyzard
Contributor

fwyzard commented Jan 30, 2023

No, that is not enough to make the program work.

$ g++ -O2 ... tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/rocmComputeCapabilities.cpp.o tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/isRocmDeviceSupported.hip.cc.o ... -lamdhip64 -lrccl -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/rocmComputeCapabilities

does link, but the resulting binary does not run:

$ ./tmp/el8_amd64_gcc11/src/HeterogeneousCore/ROCmServices/bin/rocmComputeCapabilities/rocmComputeCapabilities
"Cannot find Symbol"
Aborted (core dumped)

I think this is because it is still missing the generation and linking of the device-side code and kernel launches.

@smuzaffar smuzaffar merged commit e071390 into IB/CMSSW_13_0_X/master Jan 30, 2023
@cmsbuild
Contributor

Pull request #8271 was updated.

3 participants