Pattern error message #130
Hi Michael,
Thanks for your answer.
Please find below the output of the `nhc -ax` command on cluster-nfs15.
PS: As a possible workaround, for the time being, you could change to
a glob expression like `cluster-n[0-9][0-9]` or a regular expression
like `/^cluster-n[[:digit:]]+$/`.
We use the same nhc.conf on all our nodes (heterogeneous nodes, in SLURM
or not), and there are 42 patterns (pdsh-style patterns with {}) to modify.
Since the errors only appear on our spiro-nfs[01-15] nodes, and the
cause of these messages has been identified and has no impact, we will be patient.
Many thanks for your support !
Best regards,
Bruno
Bruno AGNERAY - DSI
Service Infrastructure Système et Réseaux / Calcul Scientifique Intensif
Tél: +33 1 46 73 44 10
Mail ***@***.***
ONERA - The French Aerospace Lab - Centre de Châtillon
29, avenue de la Division Leclerc - BP 72 - 92322 CHÂTILLON CEDEX
On 20/04/2023 at 22:26, Michael Jennings wrote:
I think what's going on here is that NHC is getting confused by the
fact that the leading portion (what the code refers to as the
`PREFIX`) of the hostname `cluster-nfs15`, when matched against the
range expression `cluster-n[01-99]`, is taken to be `cluster-n`. Once
that gets trimmed off, it then tries to treat the remainder of the
hostname (i.e., `fs15`) as a number that it then tries to compare with
the range `01-99` to see if the "number" `fs15` falls within that range.
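The prefix-trimming step described above can be sketched in plain Bash (a hypothetical standalone illustration, not NHC's actual matching code):

```shell
# Strip the matched PREFIX from the hostname, as the range matcher does:
HOST=cluster-nfs15
PREFIX=cluster-n              # leading portion matched against the range expression
REMAINDER=${HOST#"$PREFIX"}   # Bash prefix removal
echo "$REMAINDER"             # prints: fs15 -- this is then treated as a "number"
```

Because the prefix match stops at `cluster-n`, the non-numeric leftover `fs15` is what gets handed to the numeric range comparison.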
Because Bash auto-interprets numbers in bases other than 10 under
certain circumstances, the range-matching code prepends `10#` to the
numeric variables
<https://github.com/mej/nhc/blob/1.4.3/scripts/common.nhc#L201> to
ensure they get treated as base-10 numbers in all cases. In this
situation, however, `fs` is getting erroneously lumped into the
numeric value, and as the error message says, `f` and `s` don't fall
within the range of digits that are valid for base-10 numbers.
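The base-forcing behavior can be seen in any Bash shell; a minimal sketch of why `10#` is prepended, and why `fs15` then breaks:

```shell
# With a leading zero and no base prefix, Bash reads the number as octal:
echo $(( 015 ))      # prints: 13
# Prepending 10# forces base-10 interpretation:
echo $(( 10#015 ))   # prints: 15
# But non-digit characters are invalid in a forced base-10 number,
# which triggers the "value too great for base" error from the report:
[[ 10#fs15 -lt 10#99 ]] 2>/dev/null || echo "arithmetic error"
```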
I'll see if I can reproduce the problem myself by hand, but if you'd
be willing to attach the output from running `nhc -ax` on that
`cluster-nfs15` host, that'd help a lot! 😀 In the meantime, though,
the error shouldn't be causing any actual breakage -- range expression
matching should still be working accurately, right?
Thanks for reporting the bug!
PS: As a possible workaround, for the time being, you could change to
a glob expression like `cluster-n[0-9][0-9]` or a regular expression
like `/^cluster-n[[:digit:]]+$/`.
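The suggested glob can be checked directly in Bash; a quick sketch with assumed hostnames (simplified relative to NHC's own matcher):

```shell
# The glob requires exactly two digits after "cluster-n", so storage
# node names like cluster-nfs15 fail cleanly instead of ever reaching
# the numeric range-parsing code:
for HOST in cluster-n15 cluster-nfs15; do
    if [[ $HOST == cluster-n[0-9][0-9] ]]; then
        echo "$HOST: match"
    else
        echo "$HOST: no match"
    fi
done
# prints:
#   cluster-n15: match
#   cluster-nfs15: no match
```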
***@***.*** ~]# nhc -ax
…***@***.***:342:nhcmain_parse_cmdline()]> dbg 'BASH tracing active.'
***@***.***:99:dbg()]> local PREFIX=
***@***.***:101:dbg()]> [[ '' == \1 ]]
***@***.***:328:nhcmain_parse_cmdline()]> getopts :D:ac:de:fhl:n:qt:vx OPTION
***@***.***:347:nhcmain_parse_cmdline()]> shift 1
***@***.***:348:nhcmain_parse_cmdline()]> [[ ! -z '' ]]
***@***.***:352:nhcmain_parse_cmdline()]> return 0
***@***.***:729:main()]> nhcmain_load_sysconfig
***@***.***:359:nhcmain_load_sysconfig()]> [[ -f /etc/sysconfig/nhc ]]
***@***.***:730:main()]> nhcmain_finalize_env
***@***.***:367:nhcmain_finalize_env()]> CONFDIR=/etc/nhc
***@***.***:368:nhcmain_finalize_env()]> CONFFILE=/etc/nhc/nhc.conf
***@***.***:369:nhcmain_finalize_env()]> INCDIR=/etc/nhc/scripts
***@***.***:370:nhcmain_finalize_env()]> HELPERDIR=/usr/libexec/nhc
***@***.***:371:nhcmain_finalize_env()]> ONLINE_NODE=/usr/libexec/nhc/node-mark-online
***@***.***:372:nhcmain_finalize_env()]> OFFLINE_NODE=/usr/libexec/nhc/node-mark-offline
***@***.***:373:nhcmain_finalize_env()]> LOGFILE='>>/var/log/nhc.log 2>&1'
***@***.***:374:nhcmain_finalize_env()]> RESULTFILE=/var/run/nhc/nhc.status
***@***.***:375:nhcmain_finalize_env()]> DEBUG=0
***@***.***:376:nhcmain_finalize_env()]> TS=0
***@***.***:377:nhcmain_finalize_env()]> SILENT=0
***@***.***:378:nhcmain_finalize_env()]> VERBOSE=0
***@***.***:379:nhcmain_finalize_env()]> MARK_OFFLINE=1
***@***.***:380:nhcmain_finalize_env()]> DETACHED_MODE=0
***@***.***:381:nhcmain_finalize_env()]> DETACHED_MODE_FAIL_NODATA=0
***@***.***:382:nhcmain_finalize_env()]> TIMEOUT=30
***@***.***:383:nhcmain_finalize_env()]> NHC_CHECK_ALL=1
***@***.***:384:nhcmain_finalize_env()]> NHC_CHECK_FORKED=0
***@***.***:385:nhcmain_finalize_env()]> export NHC_SID=0
***@***.***:385:nhcmain_finalize_env()]> NHC_SID=0
***@***.***:388:nhcmain_finalize_env()]> kill -s 0 -- -784937
***@***.***:389:nhcmain_finalize_env()]> [[ 0 -eq 0 ]]
***@***.***:391:nhcmain_finalize_env()]> dbg 'NHC process 784937 is session leader.'
***@***.***:99:dbg()]> local PREFIX=
***@***.***:101:dbg()]> [[ 0 == \1 ]]
***@***.***:392:nhcmain_finalize_env()]> NHC_SID=-784937
***@***.***:405:nhcmain_finalize_env()]> [[ -n '' ]]
***@***.***:410:nhcmain_finalize_env()]> [[ >>/var/log/nhc.log 2>&1 != \>\>\/\v\a\r\/\l\o\g\/\n\h\c\.\l\o\g\ \2\>\&\1 ]]
***@***.***:413:nhcmain_finalize_env()]> [[ >>/var/log/nhc.log 2>&1 == \- ]]
***@***.***:418:nhcmain_finalize_env()]> [[ -z '' ]]
***@***.***:419:nhcmain_finalize_env()]> nhcmain_find_rm
***@***.***:455:nhcmain_find_rm()]> local DIR
***@***.***:456:nhcmain_find_rm()]> local -a DIRLIST
***@***.***:458:nhcmain_find_rm()]> [[ -d /var/spool/torque ]]
***@***.***:461:nhcmain_find_rm()]> [[ -n '' ]]
***@***.***:468:nhcmain_find_rm()]> type -a -p -f -P scontrol
***@***.***:471:nhcmain_find_rm()]> type -a -p -f -P pbsnodes
***@***.***:474:nhcmain_find_rm()]> type -a -p -f -P qselect
***@***.***:477:nhcmain_find_rm()]> type -a -p -f -P badmin
***@***.***:477:nhcmain_find_rm()]> type -a -p -f -P sbatchd
***@***.***:482:nhcmain_find_rm()]> [[ -z '' ]]
***@***.***:483:nhcmain_find_rm()]> dbg 'Unable to detect resource manager.'
***@***.***:99:dbg()]> local PREFIX=
***@***.***:101:dbg()]> [[ 0 == \1 ]]
***@***.***:484:nhcmain_find_rm()]> return 1
***@***.***:420:nhcmain_finalize_env()]> ONLINE_NODE=:
***@***.***:421:nhcmain_finalize_env()]> OFFLINE_NODE=:
***@***.***:422:nhcmain_finalize_env()]> MARK_OFFLINE=0
***@***.***:425:nhcmain_finalize_env()]> [[ '' == \s\g\e ]]
***@***.***:436:nhcmain_finalize_env()]> [[ 0 -ne 0 ]]
***@***.***:443:nhcmain_finalize_env()]> [[ -n '' ]]
***@***.***:445:nhcmain_finalize_env()]> [[ 0 -eq 1 ]]
***@***.***:451:nhcmain_finalize_env()]> export NAME CONFDIR CONFFILE INCDIR HELPERDIR ONLINE_NODE OFFLINE_NODE LOGFILE DEBUG TS SILENT TIMEOUT NHC_RM
***@***.***:731:main()]> [[ -n '' ]]
***@***.***:736:main()]> nhcmain_redirect_output
***@***.***:489:nhcmain_redirect_output()]> [[ -n >>/var/log/nhc.log 2>&1 ]]
***@***.***:490:nhcmain_redirect_output()]> exec
***@***.***:710:nhcmain_finish()]> exit 0
Hi,
We are using lbnl-nhc version 1.4.3-1.
We have nodes named like cluster-n[01-99] and storage nodes named like cluster-nfs[01-99].
We have the following lines in our nhc.conf file:
{cluster-n[01-99]} || export NHC_RM=
{cluster-nfs[01-99]} || export NHC_RM=
When executing the command 'nhc -a' on a storage node in cluster-nfs, we encounter an error message like:
/etc/nhc/scripts/common.nhc: line 201: [[: 10#fs15: value too great for base (error token is "10#fs15")
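The message can be reproduced outside NHC; a minimal sketch, assuming the range check boils down to a `[[ ... ]]` arithmetic comparison of the `10#`-prefixed value:

```shell
# "fs15" contains non-digit characters, so forcing base-10 fails:
[[ 10#fs15 -ge 10#01 ]]
# -> bash: [[: 10#fs15: value too great for base (error token is "10#fs15")
# The comparison returns nonzero but does not abort the script.
```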
Regards,
Bruno