Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract text from screenshot at test failure and errors from VM logs #528

Merged
merged 14 commits into from
Jan 19, 2024

Conversation

keirazhang
Copy link
Contributor

@keirazhang keirazhang commented Jan 4, 2024

This fix include:

  1. Added a new tool tools/extractor.py to extract text from screenshot and extract errors from VM's log file.
  2. Detected errors in Linux VM's serial port log file at test failure.
  3. Extracted text from screenshot at test failure.
  4. Added a tag fail_message to append errors into failed_tasks.log

@keirazhang keirazhang added this to the 3.1 milestone Jan 4, 2024
@keirazhang
Copy link
Contributor Author

The extractor.py can extract Call Trace or Traceback from logs like below:

Found call trace in log file:
Call Trace:
 <TASK>
 ? hrtimer_interrupt+0x12b/0x250
 __irq_exit_rcu+0xa2/0xd0
 irq_exit_rcu+0xe/0x20
 sysvec_apic_timer_interrupt+0x43/0xb0
 asm_sysvec_apic_timer_interrupt+0x1b/0x20
 </TASK>
Call Trace:
 <TASK>
 ? call_rcu+0xe/0x20
 kfree_skbmem+0x58/0xb0
 consume_skb+0x50/0xe0
 arp_process+0x2c8/0x9d0
 ? kmem_cache_alloc+0x17e/0x340
 ? __skb_clone+0x2e/0x140
 arp_rcv.part.0+0xef/0x1b0
 arp_rcv+0x28/0x50
 __netif_receive_skb_core.constprop.0+0x4a9/0xe20
 ? kmalloc_reserve+0x31/0xa0
 __netif_receive_skb_one_core+0x3f/0xa0
 __netif_receive_skb+0x15/0x60
 netif_receive_skb+0x130/0x180
 vmxnet3_rq_rx_complete+0xb63/0x1060 [vmxnet3]
 ? kfree_skbmem+0x58/0xb0
 vmxnet3_poll_rx_only+0x39/0xe0 [vmxnet3]
 __napi_poll+0x30/0x1f0
 net_rx_action+0x185/0x2d0
 __do_softirq+0xda/0x330
 ? __pfx_smpboot_thread_fn+0x10/0x10
 run_ksoftirqd+0x37/0x60
 smpboot_thread_fn+0xe0/0x1e0
 kthread+0xeb/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x29/0x50
 </TASK>
Found error messages in log file:
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]: ERROR:Unknown state for interface NetworkctlListState(idx=1, name='lo', type='loopback', operational='n/a', administrative='unmanaged'): n/a
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]: Traceback (most recent call last):
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:   File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:     self.handle_state(iface_name,
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:   File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:     raise UnknownState(operational_state)
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]: ERROR:Unknown state for interface NetworkctlListState(idx=2, name='ens33', type='ether', operational='n/a', administrative='unmanaged'): n/a
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]: Traceback (most recent call last):
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:   File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:     self.handle_state(iface_name,
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:   File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Oct 31 04:48:39 ubuntu networkd-dispatcher[2684]:     raise UnknownState(operational_state)
Oct 31 04:49:11 ubuntu pulseaudio[2911]: GetManagedObjects() failed: org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.bluez': timed out (service_start_timeout=25000ms)
Oct 31 04:52:39 ubuntu ubiquity: py_compile.py: error: the following arguments are required: filenames
Oct 31 04:58:40 ubuntu in-target: ERROR: /etc/gdm3/custom.conf doesn't exist. Please check the config file for gdm3.service

tools/extractor.py Outdated Show resolved Hide resolved
tools/extractor.py Show resolved Hide resolved
common/wait_vm_deploy_checkpoint.yml Outdated Show resolved Hide resolved
common/README.md Outdated Show resolved Hide resolved
common/wait_vm_deploy_checkpoint.yml Outdated Show resolved Hide resolved
common/collect_serial_port_log.yml Outdated Show resolved Hide resolved
common/vm_take_screenshot.yml Show resolved Hide resolved
common/vm_take_screenshot.yml Show resolved Hide resolved
common/vm_take_screenshot.yml Show resolved Hide resolved
common/vm_set_power_state.yml Show resolved Hide resolved
README.md Show resolved Hide resolved
@keirazhang keirazhang force-pushed the add_extractor branch 3 times, most recently from 53ed5bb to 5b65adb Compare January 16, 2024 06:31
common/extract_errors_from_log.yml Outdated Show resolved Hide resolved
common/extract_errors_from_log.yml Outdated Show resolved Hide resolved
common/extract_errors_from_log.yml Show resolved Hide resolved
common/extract_errors_from_log.yml Outdated Show resolved Hide resolved
common/extract_text_from_screenshot.yml Outdated Show resolved Hide resolved
common/test_rescue.yml Show resolved Hide resolved
common/vm_guest_send_key.yml Outdated Show resolved Hide resolved
common/vm_guest_send_key.yml Show resolved Hide resolved
@keirazhang
Copy link
Contributor Author

A positive detection about Windows WinBSOD:
screenshot_at_15_nvme_vhba_device_ops

2024-01-16 09:27:22,016 | Failed at Play [15_nvme_vhba_device_ops] *******************

2024-01-16 09:27:22,016 | TASK [15_nvme_vhba_device_ops][Restart OS using win_reboot module] 
task path: /home/worker/workspace/Ansible_Regression_Windows_10_32/ansible-vsphere-gos-validation/windows/utils/win_shutdown_restart.yml:23
An exception occurred during task execution. The full traceback is:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 611, in connect
    self.sock = sock = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 212, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7efc1d480250>, 'Connection to 10.83.0.83 timed out. (connect timeout=5)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 874, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 874, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 874, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.83.0.83', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7efc1d480250>, 'Connection to 10.83.0.83 timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ansible/plugins/connection/psrp.py", line 392, in _connect
    self.runspace.open()
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/powershell.py", line 548, in open
    self.shell.open(options, open_content)
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/shell.py", line 205, in open
    response = self.wsman.create(self.resource_uri, shell, option_set=options if len(options.values) else None)
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/wsman.py", line 290, in create
    res = self.invoke(WSManAction.CREATE, resource_uri, resource, option_set, selector_set, timeout)
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/wsman.py", line 470, in invoke
    response = self.transport.send(xml)
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/wsman.py", line 837, in send
    return self._send_request(prep_request)
  File "/usr/local/lib/python3.10/dist-packages/pypsrp/wsman.py", line 840, in _send_request
    response = self.session.send(request, timeout=(self.connection_timeout, self.read_timeout))  # type: ignore[union-attr] # This should not happen
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 507, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='10.83.0.83', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7efc1d480250>, 'Connection to 10.83.0.83 timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 546, in _wrap_conn_err
    func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ansible/plugins/connection/psrp.py", line 425, in reset
    self._connect()
  File "/usr/local/lib/python3.10/dist-packages/ansible/plugins/connection/psrp.py", line 402, in _connect
    raise AnsibleConnectionFailure(
ansible.errors.AnsibleConnectionFailure: Failed to connect to the host via PSRP: HTTPSConnectionPool(host='10.83.0.83', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7efc1d480250>, 'Connection to 10.83.0.83 timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 352, in _do_until_success_or_timeout
    return _do_until_success_or_condition(
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 430, in _do_until_success_or_condition
    raise last_error
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 388, in _do_until_success_or_condition
    _reset_connection(task_action, connection, host_context)
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 574, in _reset_connection
    _wrap_conn_err(connection.reset)
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 552, in _wrap_conn_err
    raise AnsibleError(e)
ansible.errors.AnsibleError: Failed to connect to the host via PSRP: HTTPSConnectionPool(host='10.83.0.83', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7efc1d480250>, 'Connection to 10.83.0.83 timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 229, in reboot_host
    _do_until_success_or_timeout(
  File "/root/.ansible/collections/ansible_collections/ansible/windows/plugins/plugin_utils/_reboot.py", line 363, in _do_until_success_or_timeout
    raise Exception(
Exception: Timed out waiting for last boot time check (timeout=600.0)

fatal: [localhost -> 10.83.0.83]: FAILED! => {
    "changed": true,
    "elapsed": 764,
    "msg": "Timed out waiting for last boot time check (timeout=600.0)",
    "rebooted": true,
    "unreachable": false
}
error message:
Timed out waiting for last boot time check (timeout=600.0)


2024-01-16 09:27:30,016 | TASK [15_nvme_vhba_device_ops][Display extracted text from screenshot] 
task path: /home/worker/workspace/Ansible_Regression_Windows_10_32/ansible-vsphere-gos-validation/common/test_rescue.yml:60
ok: [localhost] => {
    "msg": [
        "Extracted text from image:",
        "Your device ran into a problem and needs to restart. You can restart.",
        "For more information about this issue and possible fixes, visit https://www.windows.com/stopcode",
        "If you call a support person, give them this info:",
        "Stop code: DRIVER IRQL NOT LESS OR EQUAL",
        "What failed: stornvme.sys"
    ]
}
error message:
Extracted text from image:
Your device ran into a problem and needs to restart. You can restart.
For more information about this issue and possible fixes, visit https://www.windows.com/stopcode
If you call a support person, give them this info:
Stop code: DRIVER IRQL NOT LESS OR EQUAL
What failed: stornvme.sys

Tomorrow9
Tomorrow9 previously approved these changes Jan 19, 2024
Copy link
Member

@Tomorrow9 Tomorrow9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

keirazhang and others added 14 commits January 19, 2024 15:20
Signed-off-by: Qi Zhang <[email protected]>
Signed-off-by: Qi Zhang <[email protected]>
Signed-off-by: Qi Zhang <[email protected]>
Signed-off-by: Qi Zhang <[email protected]>
Signed-off-by: Qi Zhang <[email protected]>
@keirazhang keirazhang merged commit c8f201b into vmware:main Jan 19, 2024
@keirazhang keirazhang deleted the add_extractor branch March 7, 2024 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants