os/bluestore: Add health warning for bluestore fragmentation #61214

Open: 2 commits to be merged into base branch main


aclamk (Contributor) commented Jan 2, 2025

Changed "bluestore/fragmentation_micros" from a quick but imprecise score to a slower but more representative one.
Introduced the config option "bluestore_warn_on_free_fragmentation", which controls when the free space fragmentation score becomes a health warning.

Currently, calculating the fragmentation score might not be instant for severely fragmented disks, and it might induce stalls in write IO. The config value "bluestore_fragmentation_check_period" controls the score calculation period.

In the future, the costly score calculation will be replaced with a method that continuously updates the score.
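
To make the interaction between the two options concrete, here is a minimal, self-contained sketch of a periodic check with a warning threshold. It is not the code from this PR: `FragmentationMonitor`, `tick`, and `warning_active` are hypothetical names, the score is assumed to lie in [0, 1], the warning is assumed to trigger at or above the threshold, and a period of zero is assumed to disable the check.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical stand-in for the periodic check described above; not Ceph code.
class FragmentationMonitor {
public:
  using Clock = std::chrono::steady_clock;

  // period: minimum time between score calculations (assumed: 0 disables).
  // warn_threshold: assumed to be a score in [0, 1].
  // compute_score: the potentially slow score calculation.
  FragmentationMonitor(std::chrono::seconds period, double warn_threshold,
                       std::function<double()> compute_score)
    : period_(period),
      warn_threshold_(warn_threshold),
      compute_score_(std::move(compute_score)) {
    assert(warn_threshold_ >= 0.0 && warn_threshold_ <= 1.0);
  }

  // Recomputes the score only if more than `period` has elapsed since the
  // previous calculation. Returns true if the score was recomputed.
  bool tick(Clock::time_point now) {
    if (period_.count() == 0) return false;
    if (checked_once_ && now - last_check_ <= period_) return false;
    last_check_ = now;
    checked_once_ = true;
    last_score_ = compute_score_();
    return true;
  }

  double last_score() const { return last_score_; }

  // The warning reflects the most recently computed score.
  bool warning_active() const {
    return checked_once_ && last_score_ >= warn_threshold_;
  }

private:
  std::chrono::seconds period_;
  double warn_threshold_;
  std::function<double()> compute_score_;
  Clock::time_point last_check_{};
  bool checked_once_ = false;
  double last_score_ = 0.0;
};
```

Passing the current time into `tick` keeps the sketch testable without sleeping; a caller would typically pass `FragmentationMonitor::Clock::now()` from whatever thread is chosen to absorb the cost of the calculation.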

@aclamk aclamk requested a review from a team as a code owner January 2, 2025 15:40
auto now = mono_clock::now();
timespan elapsed = now - last_fragmentation_check;
if (elapsed > make_timespan(period)) {
double score = store->alloc->get_fragmentation_score();
Contributor review comment:

Can we log at a pretty low level (0?) before and after this call, so that we can identify relevant stalls, if any?
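
One way to picture this suggestion is a small wrapper that writes a line before and after the expensive call and records how long it took. The sketch below is generic C++ and only an illustration: `timed_call` and `timed_call_demo` are hypothetical names, and a plain `std::ostream` stands in for the project's own logging facility.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: logs before and after calling `fn`, including the
// elapsed wall-clock time, and returns whatever `fn` returned.
template <typename Fn>
auto timed_call(std::ostream& log, const std::string& what, Fn&& fn) {
  using Clock = std::chrono::steady_clock;
  log << what << ": begin\n";
  const auto start = Clock::now();
  auto result = std::forward<Fn>(fn)();
  const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
      Clock::now() - start);
  log << what << ": end, took " << elapsed.count() << " us, result "
      << result << "\n";
  return result;
}

// Example use with a dummy score function; returns the captured log text.
inline std::string timed_call_demo() {
  std::ostringstream log;
  double score = timed_call(log, "get_fragmentation_score",
                            [] { return 0.25; });
  assert(score == 0.25);
  (void)score;  // silence unused-variable warnings when NDEBUG is defined
  return log.str();
}
```

Because both lines carry the same label, a stall shows up in the log as a long gap between a "begin" and its matching "end".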

@aclamk aclamk force-pushed the wip-aclamk-bs-fragmentation-health branch from 8921afc to a98cef0 Compare January 27, 2025 12:52
aclamk (Contributor, Author) commented Jan 29, 2025

jenkins test make check

@aclamk aclamk added the aclamk-testing-nauvoo bluestore testing label Jan 29, 2025
aclamk (Contributor, Author) commented Feb 5, 2025

aclamk (Contributor, Author) commented Feb 5, 2025

jenkins test api

@aclamk aclamk requested a review from ifed01 February 5, 2025 19:37
if (elapsed > make_timespan(period)) {
last_fragmentation_check = now;
double score = 0;
if (store->alloc) {
Contributor review comment:

Maybe do the above together with the "period != 0" check:

if (period != 0 && store->alloc)
...
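
A minimal sketch of what the combined guard could look like in isolation is given below. The types are hypothetical stand-ins (`Allocator`, `Store`, and `maybe_get_score` are not the BlueStore classes or functions), and treating a zero period as "check disabled" is an assumption based on the comment above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the store and its allocator; not Ceph types.
struct Allocator {
  double score;
  double get_fragmentation_score() const { return score; }
};

struct Store {
  std::unique_ptr<Allocator> alloc;  // may be null, e.g. before it is set up
};

// Computes the score only when the check is enabled (period != 0) and an
// allocator exists. Returns true and writes *out if the score was computed.
inline bool maybe_get_score(const Store& store, double period, double* out) {
  assert(out != nullptr);
  if (period != 0 && store.alloc) {
    *out = store.alloc->get_fragmentation_score();
    return true;
  }
  return false;
}
```

Folding both conditions into one `if` avoids a nested block and makes it explicit that neither a disabled check nor a missing allocator reaches the score calculation.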

aclamk (Contributor, Author) commented Feb 11, 2025

jenkins test api

@aclamk aclamk removed the aclamk-testing-nauvoo bluestore testing label Feb 11, 2025
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@aclamk aclamk force-pushed the wip-aclamk-bs-fragmentation-health branch from 0c7c174 to 1861330 Compare February 12, 2025 16:11
Changed "bluestore/fragmentation_micros" from a quick but imprecise
score to a slower but more representative one.
Introduced the config option "bluestore_warn_on_free_fragmentation",
which controls when the free space fragmentation score becomes a
health warning.

Currently, calculating the fragmentation score might not be instant
for severely fragmented disks, and it might induce stalls in write IO.
The config value "bluestore_fragmentation_check_period" controls the
score calculation period.

In the future, the costly score calculation will be replaced with a
method that continuously updates the score.

Signed-off-by: Adam Kupczyk <[email protected]>
Using fmt::format requires libfmt for linking

Signed-off-by: Adam Kupczyk <[email protected]>
@aclamk aclamk force-pushed the wip-aclamk-bs-fragmentation-health branch from 1861330 to 3c5ae6c Compare February 12, 2025 16:15
aclamk (Contributor, Author) commented Feb 13, 2025

jenkins test make check

aclamk (Contributor, Author) commented Feb 13, 2025

[ FAILED ] TestLibRBD.TestPendingAio
failed on arm64, interesting.

aclamk (Contributor, Author) commented Feb 13, 2025

jenkins test api

neha-ojha (Member) commented:

jenkins test make check

neha-ojha (Member) commented:

The make check failure is unrelated:

Failed to load class: cas (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so: undefined symbol: _Z26cls_get_manifest_ref_countPvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Failed to load class: cmpomap (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cmpomap.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cmpomap.so: undefined symbol: _Z28cls_cxx_map_get_vals_by_keysPvRKSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS6_ESaIS6_EEPSt3mapIS6_N4ceph6buffer7v15_2_04listES8_SaISt4pairIKS6_SH_EEE
Failed to load class: fifo (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_fifo.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_fifo.so: undefined symbol: _Z20cls_gen_random_bytesPci
Failed to load class: log (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so: undefined symbol: _Z24cls_cxx_map_write_headerPvPN4ceph6buffer7v15_2_04listE
Failed to load class: rgw (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so: undefined symbol: _Z19cls_get_object_infoPv
Failed to load class: user (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so: undefined symbol: _Z24cls_cxx_map_write_headerPvPN4ceph6buffer7v15_2_04listE

neha-ojha (Member) commented:

jenkins test make check

3 participants