Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zed syslog entries drop important info #10967

Merged
merged 4 commits into from
Oct 19, 2020
Merged

zed syslog entries drop important info #10967

merged 4 commits into from
Oct 19, 2020

Conversation

don-brady
Copy link
Contributor

Motivation and Context

ZED will log zevents summaries to the syslog, however the log entries tend to drop event details that can be useful for diagnosis. This is especially true for ereport events, like io, checksum, and delay.

A typical ereport zevent entry looked like:

zed: eid=12727 class=checksum pool_guid=0xC3E5BD371EB1E534 vdev_path=/dev/disk/by-id/scsi-36000c29c9ec70e1e1d4e912ee09330e2-part1
zed: eid=12732 class=io pool_guid=0xC3E5BD371EB1E534 vdev_path=/dev/disk/by-id/scsi-36000c29c9ec70e1e1d4e912ee09330e2-part1
zed: eid=12740 class=delay pool_guid=0xC3E5BD371EB1E534 vdev_path=/dev/disk/by-id/scsi-36000c29c9ec70e1e1d4e912ee09330e2-part1

I propose that we add additional event information, like:

algorithm=fletcher4
size=512
offset=20971520
priority=4
flags=0x1888b0
bookmark=54:35:0:0
delay=32531ms

So that the above entries become:

zed: eid=11214 class=checksum pool='error_pool' vdev=scsi-36000c29c9ec70e1e1d4e912ee09330e2-part1 algorithm=fletcher4 size=512 offset=20971520 priority=4 flags=0x1888b0 bookmark=54:35:0:0
zed: eid=11278 class=io pool='error_pool' vdev=scsi-36000c29c9ec70e1e1d4e912ee09330e2-part1 size=512 offset=20971520 priority=4 err=5 flags=0x1888b0 bookmark=54:1:0:0
zed: eid=12793 class=delay pool='demo' vdev=scsi-36000c297399fae4233e9ef38c03e9706-part1 size=131072 offset=138426368 priority=0 err=5 flags=0x180880 delay=32531ms bookmark=54:3:0:0

Description

Updated the all-syslog.sh script to log additional event information.

Added an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc for choosing GUIDs over names for pool and vdev.

Changed the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event events. These events tend to be frequent, convey no meaningful info, and are already logged in the zpool history.

How Has This Been Tested?

Used ZTS zpool_events suite to generate events and confirmed that new information is now logged to the syslog.
Every new pair that was added was accounted for.
Checked script with shellcheck

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for this. It just looks like there are a couple of existing testing which need to be updated to expect the new defaults.

Tests with results other than PASS that are unexpected:
    FAIL events/events_001_pos (expected PASS)
    FAIL events/zed_rc_filter (expected PASS)

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 23, 2020
@allanjude
Copy link
Contributor

@mattmacy @freqlabs What do we think of moving /etc/devd/zfs.conf into the openzfs repo, so we can track more closely changes like these?

@ghost
Copy link

ghost commented Sep 24, 2020

@mattmacy @freqlabs What do we think of moving /etc/devd/zfs.conf into the openzfs repo, so we can track more closely changes like these?

Fine with me, especially if we can add some tests with that. I don't think anything currently tests the devd events for ZFS.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) Status: Code Review Needed Ready for review and testing Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Code Review Needed Ready for review and testing Status: Accepted Ready to integrate (reviewed, tested) labels Sep 25, 2020
@don-brady
Copy link
Contributor Author

Looks like there are still some intermittent failures for ZTS functional/events/events_001_pos. I'll investigate.

@don-brady
Copy link
Contributor Author

Before my config change, the steady stream of history events had a side effect of pushing out the time to settle the log. When those history events were removed, the log was copied sooner and missed the tail event the test was expecting.

@codecov
Copy link

codecov bot commented Oct 16, 2020

Codecov Report

Merging #10967 into master will decrease coverage by 0.16%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10967      +/-   ##
==========================================
- Coverage   79.69%   79.52%   -0.17%     
==========================================
  Files         396      396              
  Lines      125402   125402              
==========================================
- Hits        99936    99732     -204     
- Misses      25466    25670     +204     
Flag Coverage Δ
#kernel 80.35% <ø> (-0.10%) ⬇️
#user 65.47% <ø> (+0.75%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
module/os/linux/spl/spl-kmem-cache.c 78.44% <0.00%> (-11.00%) ⬇️
cmd/ztest/ztest.c 74.73% <0.00%> (-5.69%) ⬇️
module/lua/lmem.c 83.33% <0.00%> (-4.17%) ⬇️
module/zcommon/zfs_fletcher.c 75.65% <0.00%> (-2.64%) ⬇️
module/zfs/zil.c 91.33% <0.00%> (-1.87%) ⬇️
module/zfs/zio_compress.c 92.72% <0.00%> (-1.82%) ⬇️
lib/libzfs/os/linux/libzfs_pool_os.c 69.64% <0.00%> (-1.79%) ⬇️
module/zfs/zio.c 87.20% <0.00%> (-1.33%) ⬇️
module/zfs/spa_log_spacemap.c 93.40% <0.00%> (-1.15%) ⬇️
module/icp/api/kcf_mac.c 37.71% <0.00%> (-1.15%) ⬇️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bdb095...7fc006c. Read the comment docs.

@behlendorf behlendorf removed the Status: Revision Needed Changes are required for the PR to be accepted label Oct 16, 2020
@don-brady
Copy link
Contributor Author

Added file_wait_event function to synchronize with the event landing in the zed debug log rather than waiting for an arbitrary amount of time.

This make the test less flaky and shortens the actual test turnover time.

# wait for 'event' to show up in the log 'file'
function file_wait_event # file event timeout
{
file=$1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unused. Perhaps have it default to $ZED_DEBUG_LOG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some external callers to file_wait that can possibly be converted to file_wait_event so I left the file input available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an unused variable invites confusion. Even in my initial comment I didn't clearly understand what its purpose was intended to be.

Don Brady added 2 commits October 16, 2020 15:35
Signed-off-by: Don Brady <[email protected]>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 19, 2020
@behlendorf behlendorf merged commit 13d6598 into openzfs:master Oct 19, 2020
behlendorf pushed a commit that referenced this pull request Oct 19, 2020
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes #10967
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes openzfs#10967
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Pavel Zakharov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes openzfs#10967
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants