More debug info for DeltaTree (query_id, snapshot lifetime) #2431

Merged 6 commits on Jul 21, 2021

@JaySon-Huang JaySon-Huang commented Jul 19, 2021

What problem does this PR solve?

Issue Number: related to #2199, #2322

Problem Summary:

After #2229, the max snapshot lifetime can be obtained by running a SQL query on TiDB [1], but checking it that way is inconvenient.

[1]

select tiflash_instance,max(STORAGE_STABLE_OLDEST_SNAPSHOT_LIFETIME),max(STORAGE_DELTA_OLDEST_SNAPSHOT_LIFETIME),max(STORAGE_META_OLDEST_SNAPSHOT_LIFETIME)
from information_schema.tiflash_tables
where 1=1 and (STORAGE_STABLE_OLDEST_SNAPSHOT_LIFETIME > 600 OR STORAGE_DELTA_OLDEST_SNAPSHOT_LIFETIME > 600 OR STORAGE_META_OLDEST_SNAPSHOT_LIFETIME > 600) 
group by tiflash_instance;

What is changed and how it works?

  • Log query_id and read_tso in the DMVersionFilterBlockInputStream destructor so that a query can be tracked more clearly through the logs
  • Add a failpoint FailPoints::pause_after_copr_streams_acquired to mock the case where snapshots are not released at the coprocessor level
  • Each time AsynchronousMetrics::update runs, it collects the max oldest (stable/delta/meta) snapshot lifetime for each TiFlash instance and reports it to Prometheus. The values are shown in the MVCC snapshots panel.
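
For illustration, the per-instance aggregation described in the last item could look roughly like the sketch below. It is not the code from this PR: `TableSnapshotStat`, `InstanceSnapshotStat`, `collectMaxOldestSnapshotLifetime` and `exceedsThreshold` are hypothetical names, and the lifetimes are assumed to be in seconds, as the `> 600` filter in the SQL above suggests. The sketch only covers folding per-table values into one maximum per snapshot kind; reporting to Prometheus is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-table statistics; field names are illustrative only
// and are not taken from the TiFlash source.
struct TableSnapshotStat
{
    double stable_oldest_lifetime_sec = 0.0;
    double delta_oldest_lifetime_sec = 0.0;
    double meta_oldest_lifetime_sec = 0.0;
};

// Hypothetical instance-wide result: one maximum per snapshot kind,
// i.e. the three values that would be exported as gauges.
struct InstanceSnapshotStat
{
    double max_stable_sec = 0.0;
    double max_delta_sec = 0.0;
    double max_meta_sec = 0.0;
};

// Fold the per-table values into the maximum for each snapshot kind.
InstanceSnapshotStat collectMaxOldestSnapshotLifetime(const std::vector<TableSnapshotStat> & tables)
{
    InstanceSnapshotStat result;
    for (const auto & t : tables)
    {
        result.max_stable_sec = std::max(result.max_stable_sec, t.stable_oldest_lifetime_sec);
        result.max_delta_sec = std::max(result.max_delta_sec, t.delta_oldest_lifetime_sec);
        result.max_meta_sec = std::max(result.max_meta_sec, t.meta_oldest_lifetime_sec);
    }
    return result;
}

// Same condition as the WHERE clause of the SQL above: any of the three
// lifetimes is above the threshold.
bool exceedsThreshold(const InstanceSnapshotStat & s, double threshold_sec)
{
    return s.max_stable_sec > threshold_sec || s.max_delta_sec > threshold_sec || s.max_meta_sec > threshold_sec;
}

// Small usage example with made-up numbers.
void exampleUsage()
{
    std::vector<TableSnapshotStat> tables = {{12.5, 3.0, 700.0}, {650.0, 1.0, 2.0}};
    InstanceSnapshotStat s = collectMaxOldestSnapshotLifetime(tables);
    assert(exceedsThreshold(s, 600.0));
}
```

With values like these exported per instance, the check that previously needed the SQL query becomes a glance at the panel.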

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch:

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Side effects

Release note

  • No release note

@JaySon-Huang
Contributor Author

/run-all-tests

@lidezhu lidezhu left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 20, 2021
@JaySon-Huang
Contributor Author

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 21, 2021
@ti-srebot
Collaborator

Your auto merge job has been accepted, waiting for:

  • 2360

@ti-srebot
Collaborator

/run-all-tests

@JaySon-Huang
Contributor Author

cherry-pick to release-5.1 in #2564

@JaySon-Huang
Contributor Author

cherry-pick to release-5.0 in #2568

flowbehappy pushed a commit that referenced this pull request Aug 4, 2021
* Ignore sequence hole among PageFile meta (#2312)

* Fix bug for GC may skip unexpected WriteBatches (#2356)

* Add length check while running PageStorage GC (#2394)

* PageStorage skip non continuous sequence safely (#2435)

* Fix PageStorage GC with high valid rate PageFile (#2436)

* More debug info for DeltaTree (query_id, snapshot lifetime) (#2431)

* Fix deadlock on `removeExpiredSnapshots` (#2461)

* Add grafana panels for write throughput per instance (#2524)
JaySon-Huang added a commit that referenced this pull request Aug 4, 2021
* More debug info for DeltaTree (query_id, snapshot lifetime) (#2431)
* Fix deadlock on `removeExpiredSnapshots` (#2461)
ti-chi-bot pushed a commit that referenced this pull request Sep 1, 2021
windtalker added a commit that referenced this pull request Sep 6, 2021