Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHC01423I in the log is badly formatted #480

Closed
ghost opened this issue Mar 12, 2022 · 11 comments
Closed

HHC01423I in the log is badly formatted #480

ghost opened this issue Mar 12, 2022 · 11 comments
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected. HELP! Help is needed from someone more experienced or I'm simply overloaded with too much work right now! L Linux only issue, such as with tuntap networking that doesn't occur on Windows.

Comments

@ghost
Copy link

ghost commented Mar 12, 2022

The current code in hscmisc.c at line 1327:

// "Calling termination routines"
 WRMSG( HHC01423, "I" );

shows as:

14:26:07 HHC01420I Begin Hercules shutdown
14:26:07 14:26:07 HHC01423I Calling termination routines
HHC01423I Calling termination routines

Changing the code to:

// "Calling termination routines"
 fprintf( stdout, MSG( HHC01423, "I" ));

looks better:

14:45:23 HHC01420I Begin Hercules shutdown
14:45:23 HHC01423I Calling termination routines

I might be wrong of course; you know the code better than I do!

enrico

@Fish-Git Fish-Git self-assigned this Mar 12, 2022
@Fish-Git Fish-Git added BUG The issue describes likely incorrect product functionality that likely needs corrected. IN PROGRESS... I'm working on it! (Or someone else is!) Researching... The issue is being looked into or additional information is being gathered/located. and removed BUG The issue describes likely incorrect product functionality that likely needs corrected. IN PROGRESS... I'm working on it! (Or someone else is!) labels Mar 12, 2022
@Fish-Git
Copy link
Member

Fish-Git commented Mar 12, 2022

On Windows, it works perfectly.

On Linux (KDE Neon 5.24) however, I am seeing a properly formatted HHC01423I message appear in the logfile twice. That is to say, all lines of the logfile appears to be properly formatted, but the HHC01423I message appears to be duplicated. Furthermore, the message does not appear on the screen at all. It appears to be missing from the screen. Very weird!

Continuing research...

@Fish-Git Fish-Git added BUG The issue describes likely incorrect product functionality that likely needs corrected. L Linux only issue, such as with tuntap networking that doesn't occur on Windows. and removed Researching... The issue is being looked into or additional information is being gathered/located. labels Mar 12, 2022
@Fish-Git
Copy link
Member

Fish-Git commented Mar 12, 2022

Okay, when I make the suggested change, now I get difference (worse IMO) results:

On Windows, the message completely disappears from both the screen and the logfile. That is to say, message HHC01423I is nowhere to be found anywhere, neither on the screen nor in the logfile!

On Linux (KDE Neon 5.24), the message still does not appear on the screen (i.e. same (incorrect!) behavior as before), but it does now only appear once in the logfile (and still appears to be properly formatted too).

So in my opinion, your suggested change only makes matters worse. While it does indeed slightly improve(?) things on Linux, it only makes matters worse on Windows. Without the change things work perfectly on Windows. With the change, the message completely disappears!

So I am going to reject your suggested change.

More research is obviously needed (by someone else more experienced with Linux! Not me!) as the problem appears to only occur on Linux and not Windows.

I suspect it has something to do with the way our file redirection is being handled for Linux builds (i.e. our calls to dup2, which I have personally never understood the need for), but as I said, that's for someone else to figure out.

I am going to leave this issue open but am going to remove myself as the assignee, in the hope that someone else more experienced at Linux can take ownership of it instead, and figure out what the frick is going on and how best to fix it.

Thank you, Enrico, for reporting it.   ;-)

@Fish-Git Fish-Git removed their assignment Mar 12, 2022
@ghost
Copy link
Author

ghost commented Mar 14, 2022

I just tested on Fedora 35:

The hercules console:

HHC01427I Main storage released
HHC01427I Expanded storage released
HHC01422I Configuration released
HHC01424I All termination routines complete
HHC01425I Hercules shutdown complete
HHC01412I Hercules terminated
HHC00101I Thread id 00007fa000738640, prio 2, name 'rubato_thread' ended
HHC00101I Thread id 00007fa000839640, prio 7, name 'timer_thread' ended
HHC00101I Thread id 00007fa00279e240, prio 5, name 'panel_display' ended

The log:

12:58:52 HHC01603I quit
12:58:52 HHC01420I Begin Hercules shutdown
12:58:52 HHC01423I Calling termination routines
12:58:52 HHC01423I Calling termination routines
12:58:52 HHC01427I Main storage released
12:58:52 HHC01427I Expanded storage released
12:58:52 HHC01422I Configuration released
12:58:52 HHC00101I Thread id 00007fa000738640, prio 2, name 'rubato_thread' ended
12:58:52 HHC00101I Thread id 00007fa000839640, prio 7, name 'timer_thread' ended
12:58:52 HHC00101I Thread id 00007fa00279e240, prio 5, name 'panel_display' ended

The HHC01423I message is not clobbered, but is duplicated .

I am curious, and will keep testing for a while.

@wrljet
Copy link
Member

wrljet commented Mar 14, 2022

I tried it on a freshly updated Fedora 35 and got even stranger results.

Running Hercules just bare without a config file, so it comes up and shuts down, I do get the HHC01423I msg on the terminal, with two copies in the log file.

vs:

Running with a config file, I don't get the HHC01423I msg on the terminal, but I still get two copies in the log file.

I suspect there are issues in the way Hercules shuts down, because on different systems I get different messages and in different orders.

AND...

On Ubuntu 21.10, kernel 5.13.0, if I start Hercules with a config file, and just type quit, it hangs on exit.

I have seen this behavior before and spent days trying to narrow it down, booting endless different Linuxes and building with various compilers. All I can say at the moment is the only thing that stands out is Linux 5.13.

I saw this hang-on-exit behavior before, but it went away before I could track anything down. I suspect some of these rolling release distributions might have had Linux 5.13 on them for a while and have now moved on.

P.S. I updated Kubtuntu 21.10 to kernel 5.16 and it still hangs on exit on that system.

Bill

@wrljet
Copy link
Member

wrljet commented Mar 17, 2022

I've run into similar weirdness. A bunch of lines in the log file never appeared in the console window. FWIW, the Hercules Automatic Operator is being used here. This is on Debian 10.

The attached logs bug-herclog.txt and bug-herclog-notime.txt are the same, but I removed the timestamps
off the left edge to make diffing against bug-herccon.txt easier.

Bill

@wrljet
Copy link
Member

wrljet commented Mar 23, 2022

I've now run into similar issue where some msgs (specifically HHC00809I Processor CP00: disabled wait state) sometimes doesn't show up in the console output, and is then missed by the Automatic Operator.

@Fish-Git
Copy link
Member

I've now run into similar issue where some msgs (specifically HHC00809I Processor CP00: disabled wait state) sometimes doesn't show up in the console output, and is then missed by the Automatic Operator.

Interesting!

Just out of curiosity, is there a sudden flurry of messages preceding the HHC00809I Processor CP00: disabled wait state message that was missed?

@wrljet
Copy link
Member

wrljet commented Mar 24, 2022

Next time it happens I'll grab the log and console output.
I'm chasing so many problems at once I can't see straight sometimes.

@Fish-Git
Copy link
Member

*** MORE INFORMATION ***

(for whoever decides to try and tackle this problem)

I suspect the problem is somewhere within one or more of the following sections of code:

hyperion/panel.c

Lines 3314 to 3350 in d0ccfbc

static void panel_cleanup(void *unused)
{
int i;
PANMSG* p;
UNREFERENCED(unused);
log_wakeup(NULL);
if(topmsg)
{
set_screen_color( stderr, COLOR_DEFAULT_FG, COLOR_DEFAULT_BG );
clear_screen( stderr );
/* Scroll to last full screen's worth of messages */
scroll_to_bottom_screen();
/* Display messages in scrolling area */
for (i=0, p = topmsg; i < SCROLL_LINES && p != curmsg->next; i++, p = p->next)
{
set_pos (i+1, 1);
set_color (COLOR_DEFAULT_FG, COLOR_DEFAULT_BG);
write_text (p->msg, MSG_SIZE);
}
}
sysblk.panel_init = 0; /* Panel is no longer running */
/* Restore the terminal mode */
set_or_reset_console_mode( keybfd, 0 );
/* Position to next line */
fwrite("\n",1,1,stderr);
set_screen_color(stderr, COLOR_DEFAULT_FG, COLOR_DEFAULT_BG);
fflush(stderr);
}

hyperion/hconsole.c

Lines 577 to 619 in d0ccfbc

#else // !defined( WIN32 )
#ifdef HAVE_TERMIOS_H
static struct termios saved_kbattr; // (saved value so we can later restore it)
#endif
// 'save_and_set' = 1 --> just what it says; 0 --> restore from saved value.
int set_or_reset_console_mode( int keybrd_fd, short save_and_set )
{
#ifdef HAVE_TERMIOS_H
struct termios kbattr;
if ( save_and_set )
{
// Put the terminal into cbreak mode
tcgetattr( keybrd_fd, &saved_kbattr ); // (save for later restore)
kbattr = saved_kbattr; // (make copy)
kbattr.c_lflag &= ~(ECHO | ICANON); // (set desired values...)
kbattr.c_cc[VMIN] = 0;
kbattr.c_cc[VTIME] = 0;
tcsetattr( keybrd_fd, TCSANOW, &kbattr );
}
else
{
// Restore the terminal mode
tcsetattr( STDIN_FILENO, TCSANOW, &saved_kbattr ); // (restore prev value)
}
#else
UNREFERENCED( keybrd_fd );
UNREFERENCED( save_and_set );
#endif
return 0;
}

hyperion/logmsg.c

Lines 228 to 308 in d0ccfbc

/*-------------------------------------------------------------------*/
/* _flog_write_pipe helper function */
/*-------------------------------------------------------------------*/
/* logger is non-blocking to prevent stalls so we try for a short */
/* time to get the data through. Strictly log writes are atomic as */
/* they should be < PIPE_BUF bytes (see pipe(7) but we do try and */
/* handle the case where we are passed more data */
/*-------------------------------------------------------------------*/
static int do_write_pipe( int fd, const char *msg, int len)
{
// Note: logger write fd is non-blocking
int retry = 5;
int written = 0;
int rc;
while ((rc = write_pipe( fd, msg, len )) != len && retry--)
{
if (rc == -1)
{
if (errno == EAGAIN)
{
// logger is backlogged; wait a bit before retrying
usleep(10000);
continue;
}
break;
}
if (rc < len)
{
// Short write (may occur if write >= PIPE_BUF)
len -= rc;
msg += rc;
written += rc;
}
}
return rc == -1 ? -1 : written + len;
}
/*-------------------------------------------------------------------*/
/* internal helper function: write message to logger facility pipe */
/*-------------------------------------------------------------------*/
static void _flog_write_pipe( FILE* f, const char* msg )
{
/* Send message through logger facility pipe to panel.c,
or display it directly to the terminal via fprintf
if this is a utility message or we're shutting down
or we're otherwise unable to send it through the pipe.
*/
int rc = 0;
int len = (int) strlen( msg );
if (0
|| sysblk.shutdown
|| stdout != f
|| !logger_syslogfd[ LOG_WRITE ]
|| (rc = do_write_pipe( logger_syslogfd[ LOG_WRITE ], msg, len )) < 0
)
{
// Something went wrong or we're shutting down.
// Write the message to the screen instead...
fprintf( f, "%s", msg ); /* (write msg to screen) */
// PROGRAMMING NOTE: the external GUI receives messages
// not only via its logfile stream but also via its stderr
// stream too, so if there's an external gui, we skip the
// logfile write in order to prevent duplicate messages.
//
// If no external GUI exists however, then we need to
// write the message to the logfile WITH a timstamp.
if (1
&& sysblk.shutdown // (shutting down?)
&& !extgui // (no external gui?)
)
{
// (then we need to timestamp the logmsg...)
// (note: call does nothing if no logfile exists!)
logger_timestamped_logfile_write( msg, len );
}
}
}

 
panel.c "registers" the panel_cleanup function during startup/initialization (to cause it to be called during shutdown), and shortly thereafter calls set_or_reset_console_mode to set the terminal to "cbreak mode". During shutdown, set_or_reset_console_mode is then called to "undo" what the earlier call did (i.e. restore normal terminal mode).

Additionally, the logmsg.c functions do_write_pipe and _flog_write_pipe functions -- which are responsible for sending messages through the pipe to either panel.c (to display the message on the console/terminal) or to the logger thread (to write the message to the hardcopy log file) -- both have error handling logic that could very likely come into play (especially on certain Linux systems I'm guessing).

Notice for example the do_write_pipe function (which is called by the _flog_write_pipe function) only retries a maximum of 5 times if/when its attempt to write the message to the pipe fails for whatever reason.

I suspect one or more of these code points might be where the problem lies.

As stated, I'm not a Linux person, so I have no idea whether the above code is sound (sane/correct/proper) or not, so someone else more experienced with how Linux works is going to need to look into this issue.

@Fish-Git Fish-Git added the HELP! Help is needed from someone more experienced or I'm simply overloaded with too much work right now! label Sep 21, 2022
@JamesWekel
Copy link
Contributor

Fish,

As I still consider myself as a newbie to Hercules development, I thought that I might focus my newbie eyes on this issue rather than just on instruction performance.

So I reviewed a few of my Hercules logs. On a Raspberry PI4 (running Armbian Jammy; a Ubuntu version) , I had a MVS TK5 log with multiple duplicate lines, not just the 'Calling termination routines' line. For example,

20:48:56 HHC01603I quit
20:48:56 HHC02262I Script 9: processing paused for 60000 milliseconds...
20:48:56 HHC01420I Begin Hercules shutdown
20:48:56 HHC01100I 0:000E Printer: client <unknown>, IP 172.20.11.129 disconnected from device 1403
20:48:56 HHC01100I 0:000E Printer: client <unknown>, IP 172.20.11.129 disconnected from device 1403
20:48:56 HHC00101I Thread id 0000007fad7bf0e0, prio 0, name '3705 device(1:0661) thread' ended
20:48:56 HHC00101I Thread id 0000007fad7bf0e0, prio 0, name '3705 device(1:0661) thread' ended
20:48:56 HHC00101I Thread id 0000007f2eeff0e0, prio 0, name 'hao_thread' ended
20:48:56 HHC00101I Thread id 0000007f2eeff0e0, prio 0, name 'hao_thread' ended
20:48:56 HHC00101I Thread id 0000007facd1f0e0, prio 0, name '3705 device(1:0666) thread' ended
20:48:56 HHC00101I Thread id 0000007facd1f0e0, prio 0, name '3705 device(1:0666) thread' ended
20:48:56 HHC00101I Thread id 0000007fae0c60e0, prio 0, name 'socket_thread' ended
20:48:56 HHC00101I Thread id 0000007fae0c60e0, prio 0, name 'socket_thread' ended
20:48:56 HHC00101I Thread id 0000007facaff0e0, prio 0, name '3705 device(1:0667) thread' ended
20:48:56 HHC00101I Thread id 0000007facaff0e0, prio 0, name '3705 device(1:0667) thread' ended
20:48:56 HHC00101I Thread id 0000007fadfb60e0, prio 0, name 'socket_thread' ended
20:48:56 HHC00101I Thread id 0000007fadfb60e0, prio 0, name 'socket_thread' ended
20:48:56 HHC00101I Thread id 0000007fad15f0e0, prio 0, name '3705 device(1:0664) thread' ended
20:48:56 HHC00101I Thread id 0000007fad15f0e0, prio 0, name '3705 device(1:0664) thread' ended
20:48:56 HHC00101I Thread id 0000007fad37f0e0, prio 0, name '3705 device(1:0663) thread' ended
20:48:56 HHC00101I Thread id 0000007fad37f0e0, prio 0, name '3705 device(1:0663) thread' ended
20:48:56 HHC01423I Calling termination routines
20:48:56 HHC01423I Calling termination routines
20:48:56 HHC00101I Thread id 0000007fac49f0e0, prio 0, name '3705 device(1:066A) thread' ended
20:48:56 HHC00101I Thread id 0000007fac49f0e0, prio 0, name '3705 device(1:066A) thread' ended
20:48:56 HHC00101I Thread id 0000007fac6bf0e0, prio 0, name '3705 device(1:0669) thread' ended
20:48:56 HHC00101I Thread id 0000007fac6bf0e0, prio 0, name '3705 device(1:0669) thread' ended
20:48:56 HHC00101I Thread id 0000007fac8df0e0, prio 0, name '3705 device(1:0668) thread' ended
20:48:56 HHC00101I Thread id 0000007fac8df0e0, prio 0, name '3705 device(1:0668) thread' ended
20:48:56 HHC00101I Thread id 0000007fac27f0e0, prio 0, name '3705 device(1:066B) thread' ended
20:48:56 HHC00101I Thread id 0000007fac27f0e0, prio 0, name '3705 device(1:066B) thread' ended
20:48:56 HHC00101I Thread id 0000007fad59f0e0, prio 0, name '3705 device(1:0662) thread' ended
20:48:56 HHC00101I Thread id 0000007fad59f0e0, prio 0, name '3705 device(1:0662) thread' ended
20:48:56 HHC00101I Thread id 0000007fad9df0e0, prio 0, name '3705 device(1:0660) thread' ended
20:48:56 HHC00101I Thread id 0000007fad9df0e0, prio 0, name '3705 device(1:0660) thread' ended
20:48:56 HHC00101I Thread id 0000007fafcef0e0, prio 0, name 'http_server' ended
20:48:56 HHC00101I Thread id 0000007fafaec0e0, prio 0, name 'Processor CP00' ended
20:48:56 HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches

here is the complete log: 3033-1.log

As I was testing a potential fix, I observed missing lines from the console. For example, the console showed (partial output):

HHC02262I Script 10: processing paused for 5000 milliseconds...
HHC02263I Script 10: processing resumed...
HHC01603I stopall
HHC01603I * pausing for 5 seconds, please stand by.
HHC02262I Script 10: processing paused for 5000 milliseconds...
HHC02263I Script 10: processing resumed...
HHC01603I sysclear
HHC02311I sysclear completed
HHC01603I * pausing for 5 seconds, please stand by.
HHC02262I Script 10: processing paused for 5000 milliseconds...
HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
HHC00101I Thread id 00007fe4ffbfb640, prio -1, name '3705 device(1:0665) thread' ended
HHC00101I Thread id 00007fe4ffdfd640, prio -1, name '3705 device(1:0664) thread' ended
HHC00101I Thread id 00007fe4ff7f7640, prio -1, name '3705 device(1:0667) thread' ended
HHC00101I Thread id 00007fe53e726640, prio -1, name 'http_server' ended
HHC00101I Thread id 00007fe53e523640, prio -1, name 'Processor CP00' ended
HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches
HHC00334I 0:0390                                                      readaheads   misses

but the log showed

15:23:33 HHC02262I Script 10: processing paused for 5000 milliseconds...
15:23:38 HHC02263I Script 10: processing resumed...
15:23:38 HHC01603I stopall
15:23:38 HHC01603I * pausing for 5 seconds, please stand by.
15:23:38 HHC02262I Script 10: processing paused for 5000 milliseconds...
15:23:43 HHC02263I Script 10: processing resumed...
15:23:43 HHC01603I sysclear
15:23:43 HHC02311I sysclear completed
15:23:43 HHC01603I * pausing for 5 seconds, please stand by.
15:23:43 HHC02262I Script 10: processing paused for 5000 milliseconds...
15:23:48 HHC02263I Script 10: processing resumed...
15:23:48 HHC01603I quit
15:23:48 HHC02262I Script 10: processing paused for 60000 milliseconds...
15:23:48 HHC01420I Begin Hercules shutdown
15:23:48 HHC00101I Thread id 00007fe4fdddd640, prio -1, name 'hao_thread' ended
15:23:48 HHC00101I Thread id 00007fe4ff1f1640, prio -1, name '3705 device(1:066A) thread' ended
15:23:48 HHC00101I Thread id 00007fe4fefef640, prio -1, name '3705 device(1:066B) thread' ended
15:23:48 HHC00101I Thread id 00007fe4ff3f3640, prio -1, name '3705 device(1:0669) thread' ended
15:23:48 HHC00101I Thread id 00007fe4ff5f5640, prio -1, name '3705 device(1:0668) thread' ended
15:23:48 HHC00101I Thread id 00007fe4fffff640, prio -1, name '3705 device(1:0663) thread' ended
15:23:48 HHC00101I Thread id 00007fe53c414640, prio -1, name '3705 device(1:0661) thread' ended
15:23:48 HHC00101I Thread id 00007fe53c212640, prio -1, name '3705 device(1:0662) thread' ended
15:23:48 HHC00101I Thread id 00007fe53c616640, prio -1, name '3705 device(1:0660) thread' ended
15:23:48 HHC00101I Thread id 00007fe53cb1b640, prio -1, name 'socket_thread' ended
15:23:48 HHC00092W Warning in function USLEEP() at hscmisc.c(1479): rc=0, EINTR retrys count=20
15:23:48 HHC01423I Calling termination routines
15:23:48 HHC00092W Warning in function USLEEP() at timer.c(336): rc=0, EINTR retrys count=17**
15:23:48 HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
15:23:48 HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
15:23:48 HHC00092W Warning in function USLEEP() at comm3705.c(1227): rc=0, EINTR retrys count=20
15:23:48 HHC00101I Thread id 00007fe4ffbfb640, prio -1, name '3705 device(1:0665) thread' ended
15:23:48 HHC00101I Thread id 00007fe4ffdfd640, prio -1, name '3705 device(1:0664) thread' ended
15:23:48 HHC00101I Thread id 00007fe4ff7f7640, prio -1, name '3705 device(1:0667) thread' ended
15:23:48 HHC00101I Thread id 00007fe53e726640, prio -1, name 'http_server' ended
15:23:48 HHC00101I Thread id 00007fe53e523640, prio -1, name 'Processor CP00' ended
15:23:48 HHC00333I 0:0390   32/64       size free  nbr st   reads  writes l2reads    hits switches

17 lines missing starting with '15:23:48 HHC02263I Script 10: processing resumed...'.

Both problems seem to be related to the delay timing in do_shutdown_now()_between setting sysblk.shutdown = TRUE;
and the execution of 'logger_unredirect();' (for the duplicate log lines) and 'panel_cleanup' as part of 'hdl_atexit()' (for missing console lines).

For your consideration regarding this issue, I'll submit a pull request with my 'partial' fix. I'm still missing a few console lines.

Jim

@Fish-Git
Copy link
Member

Fish-Git commented Mar 1, 2024

Both Bill and I have confirmed that Jame's PR #634 is good, so I am closing this issue as resolved.

@Fish-Git Fish-Git closed this as completed Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected. HELP! Help is needed from someone more experienced or I'm simply overloaded with too much work right now! L Linux only issue, such as with tuntap networking that doesn't occur on Windows.
Projects
None yet
Development

No branches or pull requests

3 participants