Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestIntegrationFinalPath failure #36522

Closed
makortel opened this issue Dec 16, 2021 · 19 comments · Fixed by #36801
Closed

TestIntegrationFinalPath failure #36522

makortel opened this issue Dec 16, 2021 · 19 comments · Fixed by #36801

Comments

@makortel
Copy link
Contributor

TestIntegrationFinalPath failed in CMSSW_12_3_X_2021-12-15-2300 slc7_amd64_gcc900 with

/data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_3_X_2021-12-15-2300/tmp/slc7_amd64_gcc900 /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_3_X_2021-12-15-2300
1,7c1
< found thing '' TEST
< found thing '' TEST
< found thing '' TEST
< found thing 'beginLumi' TEST
< found thing 'endLumi' TEST
< found thing 'beginRun' TEST
< found thing 'endRun' TEST
---
> Binary file finalpath.log matches
differences for test_finalpath_cfg.py --path --task --schedule: status 1
=== Log file ===
=== End log file ===

---> test TestIntegrationFinalPath had ERRORS

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_12_3_X_2021-12-15-2300/unitTestLogs/FWCore/Integration#/

@makortel
Copy link
Contributor Author

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

So something caused grep to interpret finalpath.log as a binary file?

cmsRun ${LOCAL_TEST_DIR}/test_finalpath_cfg.py -- --path --task --schedule >& finalpath.log || die "failed test_finalpath_cfg.py --path --task --schedule" $?
grep "thing '.*' TEST" finalpath.log | diff finalpath_expected_found.log - || die "differences for test_finalpath_cfg.py --path --task --schedule" $?

@makortel
Copy link
Contributor Author

Occurred again in CMSSW_12_3_X_2021-12-24-2300

@makortel
Copy link
Contributor Author

makortel commented Jan 5, 2022

I managed to reproduce this accidentally (as a byproduct of testing something else). The finalpath.log has

%MSG-i FoundProduct:  PreStreamEndLumi 05-Jan-2022 17:00:11 CET  End Run: 1 Stream: 0^@: 0
found thing 'beginRun' TEST
%MSG
%MSG-i FoundProduct:  PreStreamEndLumi 05-Jan-2022 17:00:11 CET  End Run: 1 Stream: 0^@: 0
found thing 'endRun' TEST
%MSG

and the ^@ makes grep to think the file as binary. I.e. the real culprit could be #33224.

@dan131riley
Copy link

^@ (control-@) is how browsers and pagers (and some editors) conventionally render the nul character, so we're writing out a \0 somewhere.

@Dr15Jones
Copy link
Contributor

In order to help diagnose the underlying problem, the actual message should have been

%MSG-i FoundProduct:  PostGlobalEndRun 25-Jan-2022 16:47:03 UTC PostGlobalEndRun
found thing 'beginRun' TEST
%MSG
%MSG-i FoundProduct:  PostGlobalEndRun 25-Jan-2022 16:47:03 UTC PostGlobalEndRun
found thing 'endRun' TEST
%MSG

@Dr15Jones
Copy link
Contributor

Intersting enough, the \0 Dan points out is exactly where \0 would be if one were to write PostGlobalEndRun and then stick the \0 at the end.

The item holding the information in question is a std::string_view which doesn't have to use \0 as its end condition.

@Dr15Jones
Copy link
Contributor

The incorrect messages seems to contain MessageDrop::runEvent from the preStreamEndRun transition

auto& buffer = transitionInfoCache_[iContext.streamID().value()];
auto v = fill_buffer(buffer, "End Run: ", iContext.eventID().run(), " Stream: ", iContext.streamID().value());
edm::MessageDrop::instance()->runEvent = v;
edm::MessageDrop::instance()->setSinglet("PreStreamEndRun"); // changelog 17

@Dr15Jones
Copy link
Contributor

I was also able to reproduce the problem. The full log file is

['cmsRun', '/root/buildArea/CMSSW_12_3_0_pre2/src/FWCore/Integration/test/test_finalpath_cfg.py', '--', '--path', '--task']
adding Path
%MSG-i ThreadStreamSetup:  (NoModuleName) 25-Jan-2022 17:23:33 UTC pre-events
setting # threads 3
setting # streams 1
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 25-Jan-2022 17:23:35.156 UTC
%MSG-i FoundProduct:  GetProductCheckerOutputModule:out  25-Jan-2022 17:23:35 UTC Run: 1 Event: 1
found thing '' TEST
%MSG
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 25-Jan-2022 17:23:35.157 UTC
%MSG-i FoundProduct:  GetProductCheckerOutputModule:out  25-Jan-2022 17:23:35 UTC Run: 1 Event: 2
found thing '' TEST
%MSG
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 25-Jan-2022 17:23:35.157 UTC
%MSG-i FoundProduct:  GetProductCheckerOutputModule:out  25-Jan-2022 17:23:35 UTC Run: 1 Event: 3
found thing '' TEST
%MSG
%MSG-i FoundProduct:  PostGlobalEndLumi 25-Jan-2022 17:23:35 UTC  PostGlobalEndLumi
found thing 'beginLumi' TEST
%MSG
%MSG-i FoundProduct:  PostGlobalEndLumi 25-Jan-2022 17:23:35 UTC  PostGlobalEndLumi
found thing 'endLumi' TEST
%MSG
%MSG-i FoundProduct:  PreStreamEndLumi 25-Jan-2022 17:23:35 UTC  End Run: 1 Stream: 0^@: 0
found thing 'beginRun' TEST
%MSG
%MSG-i FoundProduct:  PreStreamEndLumi 25-Jan-2022 17:23:35 UTC  End Run: 1 Stream: 0^@: 0
found thing 'endRun' TEST
%MSG

@Dr15Jones
Copy link
Contributor

I think I now understand the origin of the problem. The MessageLogger does not listen to the ES module nor the OutputModule write signals. Therefore the context it uses for such messages is just whatever context was used for the last known transition to be run on that thread.

@makortel
Copy link
Contributor Author

Thanks @Dr15Jones. Should we make MessageLogger to listen ES module and OutputModule write signals then?

Do these observations also imply we're using std::string_view in some place within MessageLogger where we shouldn't?

@Dr15Jones
Copy link
Contributor

Thanks @Dr15Jones. Should we make MessageLogger to listen ES module and OutputModule write signals then?

I'm working on that

Do these observations also imply we're using std::string_view in some place within MessageLogger where we shouldn't?

Not so far.

@makortel
Copy link
Contributor Author

+1

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

@perrotta
Copy link
Contributor

@makortel I imagine that since you signed this issue, it can also get closed: however, I'd let you do it, just in case you wanted to have it still open for other purposes.

@makortel
Copy link
Contributor Author

@perrotta It was already closed by the merge of #36801

@perrotta
Copy link
Contributor

@perrotta It was already closed by the merge of #36801

You are right Matti, of course!
Sorry for the noise, I didn't realize it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants