Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update UTM to 0.11.1 #8375

Merged
merged 1 commit into from
Mar 14, 2023
Merged

Conversation

epalencia
Copy link
Contributor

An updated utm library (version 0.11.1) is available: https://gitlab.cern.ch/cms-l1t-utm/utm/-/tree/utm_0.11.1/

Follow up of #8352.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @epalencia (Enrique Palencia Cortezon) for branch IB/CMSSW_13_1_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@aandvalenzuela
Copy link
Contributor

please test

@smuzaffar
Copy link
Contributor

@epalencia , do we need cms-sw/cmssw#41036 to test this PR?

@aloeliger
Copy link
Contributor

@smuzaffar It seems to be working so far, and if I'm reading the jenkins workflow right, I think it even passed the unit tests it was failing. cms-sw/cmssw#41036 is still a good code cleaning PR, but possibly not necessary for integration of this firmware external.

@smuzaffar
Copy link
Contributor

yes unit tests passed, so v0.11.1 looks good.
I will start test for other archs ( arm/power) to make sure this works for all.

@smuzaffar
Copy link
Contributor

please test for el8_aarch64_gcc11

@smuzaffar
Copy link
Contributor

please test for el8_ppc64le_gcc11

@aloeliger
Copy link
Contributor

aloeliger commented Mar 13, 2023

@epalencia, @arnobaer, Looks like (hopefully) 11.1 passes unit tests.

Looking at the diffs, I'm not positive, but I think this might have been the problem here: https://gitlab.cern.ch/cms-l1t-utm/utm/-/compare/utm_0.11.0...utm_0.11.1?from_project_id=4792&straight=false#5218e3632e92cf74920f5db96dc28add42fda4cc

Thank you for your help debugging this @arnobaer.

For my elucidation, what exactly did this change do? The objects are no longer referencing the underlying objects, which go out of scope and are destroyed rendering comment and datetime poorly defined? That doesn't seem right to me (that seems like it would be a more traditional segmentation fault).

@arnobaer
Copy link

arnobaer commented Mar 13, 2023

For my elucidation, what exactly did this change do? The objects are no longer referencing the underlying objects, which go out of scope and are destroyed rendering comment and datetime poorly defined? That doesn't seem right to me (that seems like it would be a more traditional segmentation fault).

@aloeliger the patch fixes a segmentation fault issue when building utm Python bindings in a manylinux2014 (CentOS 7) docker container AND patching the binaries using auditwheel to pack dependent system libraries into the resulting universal binary wheel. When calling getTriggerMenu() from the Python interpreter a segmentation fault occurred. Debugging revealed that the segmentation fault occurs in tmeventsetup::esTriggerMenu::setComment() and tmeventsetup::esTriggerMenu::setDateTime() when assigning to the member variables.

Both strings comment and datetime are by definition optional in the xml format and in the map provided by tmtable, so I overloaded tmtable::get() to accept an additional reference to an optional default value. In this case an empty string that eventually goes out of scope. But before this happens the reference is assigned (and I assume copied) to esTriggerMenu::commet_ by esTriggerMenu.setComment() (but maybe I'm wrong). By explicitly copying the reference returned by tmtable::get() for comment and datetime I fixed the segmentation fault affecting the binary Python wheels.

using Row = std::map<std::string, std::string>;
const Row::mapped_type& get(const Row& row, const std::string& key) {
    if (not row.count(key)) throw std::runtime_error(key);
    return row.at(key);
}
const Row::mapped_type& get(const Row& row, const std::string& key, const Row::mapped_type& optional) {
    return row.count(key) ? row.at(key) : optional;
}
{
    const auto& comment = get(row, "comment", ""); // removing & fixed the issue
    tm.setComment(comment); // inside here the segmentation fault occurs
}

Note: I was not able to reproduce the segmentation fault by building and linking a test program on Ubunu 18.04 (gcc 7.5), CentOS 7 (gcc 4.8) and MacOS 10.15 (clang 12.0).

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7df4f1/31242/summary.html
COMMIT: 078740a
CMSSW: CMSSW_13_1_X_2023-03-12-2300/el8_aarch64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8375/31242/install.sh to create a dev area with all the needed externals and cmssw changes.

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7df4f1/31243/summary.html
COMMIT: 078740a
CMSSW: CMSSW_13_1_X_2023-03-12-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8375/31243/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreConcurrency had ERRORS

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7df4f1/31241/summary.html
COMMIT: 078740a
CMSSW: CMSSW_13_1_X_2023-03-13-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8375/31241/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 15 lines from the logs
  • Reco comparison results: 1 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3550756
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3550731
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@aloeliger
Copy link
Contributor

-1

Failed Tests: UnitTests Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7df4f1/31243/summary.html COMMIT: 078740a CMSSW: CMSSW_13_1_X_2023-03-12-2300/el8_ppc64le_gcc11 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8375/31243/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreConcurrency had ERRORS

@smuzaffar Is this expected? There's not much output from this:

===== Test "testFWCoreConcurrency" ====
Running .........
---> test testFWCoreConcurrency had ERRORS
TestTime:3600
^^^^ End Test testFWCoreConcurrency ^^^^

and it doesn't look like something this library broke.

@makortel
Copy link
Contributor

-1
Failed Tests: UnitTests Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7df4f1/31243/summary.html COMMIT: 078740a CMSSW: CMSSW_13_1_X_2023-03-12-2300/el8_ppc64le_gcc11 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8375/31243/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:
---> test testFWCoreConcurrency had ERRORS

@smuzaffar Is this expected? There's not much output from this:

===== Test "testFWCoreConcurrency" ====
Running .........
---> test testFWCoreConcurrency had ERRORS
TestTime:3600
^^^^ End Test testFWCoreConcurrency ^^^^

and it doesn't look like something this library broke.

That test failing occasionally on ppc64le is a known problem (cms-sw/cmssw#32086).

@aloeliger
Copy link
Contributor

Then we're good? This UTM firmware is okay to go?

@smuzaffar
Copy link
Contributor

+externals

looks good

@smuzaffar smuzaffar merged commit f730306 into cms-sw:IB/CMSSW_13_1_X/master Mar 14, 2023
@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_1_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants