DAOS-16111 test: new utility daos_sys_logscan #15629

kccain · 2024-12-17T16:37:08Z

Scan a list of engine logfiles to produce a nested dictionary of pools and a sequence of key events such as pool leadership terms, pool map version updates (due to target state changes), rebuild start and progress update events and total rebuild duration. This first version focuses on finding the pool service leader engine log file and producing this information. Future updates to the tool can potentially include finer-grain tracking of operations across all pool storage engine log files.

The supporting class LogLine in cart_logparse.py has a tiny change to support this new utility.
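
For orientation, the nested dictionary can be pictured roughly as below. This is only a sketch inferred from the map-update code quoted in the review comments of this PR (the keys `maps`, `carryover`, `from_ver`, `time` and `rebuild_gens` come from there); the helper names `new_pools` and `record_map_update`, the sample pool identifier and the timestamp string are hypothetical and not part of the utility.

```python
from collections import defaultdict


def new_pools():
    """Return an empty mapping: pool UUID -> term -> {"maps": {version: info}}."""
    return defaultdict(lambda: defaultdict(lambda: {"maps": {}}))


def record_map_update(pools, puuid, term, from_ver, to_ver, timestamp):
    """Store one pool map version update under a leadership term (hypothetical helper)."""
    pools[puuid][term]["maps"][to_ver] = {
        "carryover": False,
        "from_ver": from_ver,
        "time": timestamp,
        "rebuild_gens": {},
    }


pools = new_pools()
record_map_update(pools, "pool-uuid-1", term=2, from_ver=5, to_ver=6,
                  timestamp="12/17-16:37:08.00")
print(pools["pool-uuid-1"][2]["maps"][6]["from_ver"])
```

Keying by pool, then term, then map version keeps every event reachable from the leadership term in which it was logged.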

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follow the guidelines outlined here.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

Signed-off-by: Kenneth Cain <[email protected]>

github-actions · 2024-12-17T16:37:28Z

Ticket title is 'rebuild enhancement: uniform identifier in log messages'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-16111

daosbuild1 · 2024-12-17T16:50:10Z

Test stage Build RPM on Leap 15.5 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/1/execution/node/327/log

daosbuild1 · 2024-12-17T16:57:45Z

Test stage Build RPM on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/1/execution/node/437/log

daosbuild1 · 2024-12-17T17:00:57Z

Test stage Build RPM on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/1/execution/node/432/log

daosbuild1 · 2024-12-17T17:08:04Z

Test stage Build DEB on Ubuntu 20.04 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/1/execution/node/431/log

daltonbohning · 2024-12-17T19:20:39Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+    # rebuild aborted/errored
+    # rebuild completed/success
+
+    re_rank_assign = re.compile("ds_mgmt_drpc_set_rank.*set rank to (\d+)")


W605 (invalid escape sequence) is usually resolved by using a raw string.

Suggested change

-    re_rank_assign = re.compile("ds_mgmt_drpc_set_rank.*set rank to (\d+)")
+    re_rank_assign = re.compile(r"ds_mgmt_drpc_set_rank.*set rank to (\d+)")
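
A minimal sketch of the raw-string form in use. In a normal string literal Python itself tries to interpret `\d` as an escape, which flake8 reports as W605 and which newer Python versions warn about; with the `r` prefix the backslash is passed to the regex engine unchanged. The sample message text is made up for the example and is not taken from a real engine log.

```python
import re

# Raw string: the backslash in \d reaches the regex engine untouched.
re_rank_assign = re.compile(r"ds_mgmt_drpc_set_rank.*set rank to (\d+)")

# Hypothetical message text, invented for this example.
sample_msg = "ds_mgmt_drpc_set_rank() set rank to 3"

match = re_rank_assign.match(sample_msg)
rank = int(match.group(1)) if match else -1
print(rank)
```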

daltonbohning · 2024-12-17T19:27:17Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+            rank = -1
+            for line in log_iter.new_iter(pid=pid):
+                msg = line.get_msg()
+                host = line.hostname
+                datetime = line.time_stamp
+                # Find engine rank assignment (early in log)
+                match = self.re_rank_assign.match(msg)
+                if match:
+                    rank = int(match.group(1))
+                    print(f"========== rank {rank} logfile {fname} ==========")
+                    continue


What if rank == -1 at the end of this loop?
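
One possible way to handle this case, sketched under assumptions: look for the rank first and skip the file when none is found. The function signatures below are hypothetical (they take a plain iterable of message strings rather than a log iterator), and the sample messages are invented.

```python
import re

RE_RANK_ASSIGN = re.compile(r"ds_mgmt_drpc_set_rank.*set rank to (\d+)")


def find_rank(messages):
    """Return the first rank assignment found in messages, or None if there is none."""
    for msg in messages:
        match = RE_RANK_ASSIGN.match(msg)
        if match:
            return int(match.group(1))
    return None


def scan_file(fname, messages):
    """Return the rank for fname, or None (after printing a note) when it must be skipped."""
    rank = find_rank(messages)
    if rank is None:
        print(f"WARN file={fname}: cannot find rank assignment in log file - skipping")
    return rank


found = scan_file("engine0.log", ["startup", "ds_mgmt_drpc_set_rank() set rank to 7"])
missing = scan_file("engine1.log", ["startup", "no rank here"])
```

Returning `None` instead of `-1` makes the "no rank" case impossible to confuse with a real rank value.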

daltonbohning · 2024-12-17T19:42:36Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+                match = self.re_pmap_update.match(msg)
+                if match:
+                    puuid = match.group(1)
+                    from_ver = int(match.group(2))
+                    to_ver = int(match.group(3))
+                    # ignore if this engine is not the leader
+                    if puuid not in self._pools or rank != self._cur_ldr_rank[puuid]:
+                        continue
+                    term = self._cur_term[puuid]
+                    self._pools[puuid][term]["maps"][to_ver] = {"carryover": False, "from_ver": from_ver, "time": datetime, "rebuild_gens": {}}
+                    #print(f"FOUND pool {puuid} map update {from_ver}->{to_ver} rank {rank}\t{host}\tPID {pid}\t{fname}")
+                    continue


This function has a lot of these of the format

    match = ...
    if match:
        ...

I think it would be easier to manage if each was a helper function. Then this function could either loop over each one, or call out to each function. It's generally not pythonic to have large functions.

There are many ways to do this, but some general examples might be (depending on what you need)

    def fun1(arg):
        print(arg)

    def fun2(arg):
        print(arg)

    all_funs = (fun1, fun2)
    for _fun in all_funs:
        _fun('test')

    def fun1(arg1, arg2):
        print(arg1, arg2)

    def fun2(arg1, arg2):
        print(arg1, arg2)

    all_funs_args = (
        (fun1, ('arg1', 'arg2')),
        (fun2, ('arg1', 'arg2')))
    for _fun, args in all_funs_args:
        _fun(*args)
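
Applied to the regex-matching loop, the same idea might look like the sketch below: a table that pairs each compiled pattern with a small handler method. The class `MiniScanner`, its handler names and the map-update pattern are hypothetical; the real map-update regex is not shown in this review, so a stand-in is used.

```python
import re


class MiniScanner:
    """Hypothetical, stripped-down scanner showing regex/handler dispatch."""

    re_rank_assign = re.compile(r"ds_mgmt_drpc_set_rank.*set rank to (\d+)")
    # Stand-in pattern, invented for this example.
    re_pmap_update = re.compile(r"pool (\S+) map update (\d+)->(\d+)")

    def __init__(self):
        self.rank = -1
        self.map_updates = []
        # Each entry pairs a compiled regex with the method that handles a match.
        self._handlers = (
            (self.re_rank_assign, self._on_rank_assign),
            (self.re_pmap_update, self._on_pmap_update),
        )

    def _on_rank_assign(self, match):
        self.rank = int(match.group(1))

    def _on_pmap_update(self, match):
        puuid = match.group(1)
        from_ver = int(match.group(2))
        to_ver = int(match.group(3))
        self.map_updates.append((puuid, from_ver, to_ver))

    def scan(self, messages):
        for msg in messages:
            for regex, handler in self._handlers:
                match = regex.match(msg)
                if match:
                    handler(match)
                    break  # first matching pattern wins, like `continue` in the long loop


scanner = MiniScanner()
scanner.scan([
    "ds_mgmt_drpc_set_rank() set rank to 1",
    "pool abcd map update 3->4",
    "some unrelated message",
])
```

Adding a new event type then means adding one pattern, one handler and one table entry, while `scan` stays short.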

daltonbohning · 2024-12-17T19:46:21Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+                        continue
+                    term = self._cur_term[puuid]
+                    if term < 1:
+                        print(f"WARN pool {puuid} I don't know what term it is ({term})!")


Depending on what tracks overall failure, might want to store warnings and errors somewhere instead of simply printing. Roughly e.g.

    def _warn(self, message):
        print(f"WARN: {message}")
        self._warnings.append(message)  # if you want to store the message
        self._warnings += 1  # if you just want a count

And then later can do

    if self._warnings:
        print("Whoah something bad!")
        sys.exit(1)  # or return 1, etc.
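
A runnable sketch of the "store the message" variant. The snippet above offers `append` and `+= 1` as alternatives; they cannot both apply to the same attribute, so this version keeps a list. The names `WarningLog`, `exit_code` and `check_terms`, and the sample pool names, are hypothetical.

```python
class WarningLog:
    """Hypothetical collector: prints each warning and keeps it for the final status."""

    def __init__(self):
        self._warnings = []

    def warn(self, message):
        print(f"WARN: {message}")
        self._warnings.append(message)

    def exit_code(self):
        """Return 1 if any warning was recorded, else 0."""
        return 1 if self._warnings else 0


def check_terms(terms):
    """Warn about pools whose term is unknown (< 1) and return an exit status."""
    log = WarningLog()
    for puuid, term in terms.items():
        if term < 1:
            log.warn(f"pool {puuid}: unknown term ({term})")
    return log.exit_code()


status_bad = check_terms({"pool-a": 2, "pool-b": 0})
status_ok = check_terms({"pool-a": 2})
```

A command-line entry point could pass the returned status to `sys.exit()`.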

Signed-off-by: Kenneth Cain <[email protected]>

daosbuild1 · 2024-12-20T03:47:12Z

Test stage Build RPM on Leap 15.5 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/2/execution/node/361/log

daosbuild1 · 2024-12-20T03:48:56Z

Test stage Build RPM on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/2/execution/node/368/log

daosbuild1 · 2024-12-20T03:52:03Z

Test stage Build RPM on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/2/execution/node/367/log

daosbuild1 · 2024-12-20T03:56:22Z

Test stage Build DEB on Ubuntu 20.04 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/2/execution/node/349/log

knard-intel

Small suggested fixes.

knard-intel · 2024-12-20T09:01:32Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+    def _warn(self, wmsg, fname, line):
+        full_msg = f"WARN file={fname}, line={line.lineno}: " + wmsg
+        self._warnings.append(full_msg)


Suggested change

-    def _warn(self, wmsg, fname, line):
-        full_msg = f"WARN file={fname}, line={line.lineno}: " + wmsg
-        self._warnings.append(full_msg)
+    def _warn(self, wmsg, fname, line=None):
+        full_msg = f"WARN file={fname}"
+        if line:
+            full_msg += f", line={line.lineno}"
+        full_msg += f": {wmsg}"
+        self._warnings.append(full_msg)
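
A small self-contained sketch of the suggested method, called with and without a line. It swaps the suggestion's `if line:` for `if line is not None:`, which avoids dropping the line number should a line object ever evaluate as falsy; whether that can happen with `LogLine` is not established here. The `Scanner` class and the `SimpleNamespace` stand-in for a log line are assumptions made for the example.

```python
from types import SimpleNamespace


class Scanner:
    """Hypothetical minimal holder for the suggested _warn() method."""

    def __init__(self):
        self._warnings = []

    def _warn(self, wmsg, fname, line=None):
        full_msg = f"WARN file={fname}"
        # "is not None" keeps the line number even if the line object is falsy.
        if line is not None:
            full_msg += f", line={line.lineno}"
        full_msg += f": {wmsg}"
        self._warnings.append(full_msg)


scanner = Scanner()
fake_line = SimpleNamespace(lineno=42)  # stand-in for an object with a lineno attribute
scanner._warn("unknown term", "engine.log", fake_line)
scanner._warn("cannot find rank assignment in log file - skipping", "engine.log")
```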

knard-intel · 2024-12-20T09:02:56Z

src/tests/ftest/cart/util/daos_sys_logscan.py

+        # Find rank assignment log line for this file. Can't do much without it.
+        self._file_to_rank[fname] = rank
+        if rank == -1 and not self.find_rank(log_iter):
+            self._warn(f"cannot find rank assignment in log file - skipping", fname, line)


Suggested change

-            self._warn(f"cannot find rank assignment in log file - skipping", fname, line)
+            self._warn("cannot find rank assignment in log file - skipping", fname)

- fix up some flake8 python linting issues Signed-off-by: Kenneth Cain <[email protected]>

daosbuild1 · 2024-12-20T16:28:44Z

Test stage Build RPM on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/3/execution/node/356/log

daosbuild1 · 2024-12-20T16:29:33Z

Test stage Build RPM on Leap 15.5 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/3/execution/node/357/log

daosbuild1 · 2024-12-20T16:29:45Z

Test stage Build RPM on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/3/execution/node/351/log

daosbuild1 · 2024-12-20T16:33:33Z

Test stage Build DEB on Ubuntu 20.04 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/3/execution/node/345/log

Signed-off-by: Kenneth Cain <[email protected]>

daosbuild1 · 2024-12-20T17:24:10Z

Test stage Build RPM on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/4/execution/node/357/log

daosbuild1 · 2024-12-20T17:25:28Z

Test stage Build RPM on Leap 15.5 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/4/execution/node/358/log

daosbuild1 · 2024-12-20T17:28:57Z

Test stage Build RPM on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/4/execution/node/352/log

daosbuild1 · 2024-12-20T17:34:11Z

Test stage Build DEB on Ubuntu 20.04 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15629/4/execution/node/338/log

daltonbohning reviewed Dec 17, 2024


first refactor, address review feedback

c691881

Signed-off-by: Kenneth Cain <[email protected]>

knard-intel reviewed Dec 20, 2024


- legacy rebuild start/status_check regexs to match on daos 2.6 logging

5c0afdf

- fix up some flake8 python linting issues Signed-off-by: Kenneth Cain <[email protected]>

more cleanup

f00b62c

Signed-off-by: Kenneth Cain <[email protected]>
