Add visibility into arc_read through a new kstat #1712

Closed

prakashsurya
Member

This change is an attempt to add visibility into the arc_read calls
occurring on a system, in real time. To do this, a list was added to the
in-memory SPA data structure for a pool, with each element on the list
corresponding to a call to arc_read. These entries are then exported
through the kstat interface, where they can be interpreted in userspace.

For each arc_read call, the following information is exported:

  • A unique identifier (uint64_t)
  • The time the entry was added to the list (hrtime_t)
    (not wall clock time; relative to the other entries on the list)
  • The objset ID (uint64_t)
  • The object number (uint64_t)
  • The indirection level (uint64_t)
  • The block ID (uint64_t)
  • The name of the function originating the arc_read call (char[24])
  • The arc_flags from the arc_read call (uint32_t)

From this exported information one can see, in real time, exactly what
is being read, what function is generating the read, and whether or not
the read was found to be already cached.
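As a rough illustration of the record described above, here is a minimal userspace sketch in C. It is not the code from this pull request: the struct name, field names, `sketch_hrtime_t`, and the `SKETCH_FLAG_CACHED` bit are all hypothetical stand-ins chosen to mirror the listed field types. The real arc_flags values are defined in the ZFS sources and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for hrtime_t (nanoseconds, not wall clock time). */
typedef long long sketch_hrtime_t;

/* Hypothetical "found in cache" bit; the real arc_flags values may differ. */
#define	SKETCH_FLAG_CACHED	(1u << 4)

/* One entry per arc_read call, mirroring the fields listed above. */
typedef struct sketch_read_entry {
	uint64_t	uid;		/* unique identifier */
	sketch_hrtime_t	start;		/* time the entry was added */
	uint64_t	objset;		/* objset ID */
	uint64_t	object;		/* object number */
	uint64_t	level;		/* indirection level */
	uint64_t	blkid;		/* block ID */
	char		origin[24];	/* name of the calling function */
	uint32_t	aflags;		/* flags from the arc_read call */
} sketch_read_entry_t;

/* Fill in an entry, truncating the origin name to fit char[24]. */
void
sketch_read_entry_init(sketch_read_entry_t *e, uint64_t uid,
    sketch_hrtime_t start, uint64_t objset, uint64_t object,
    uint64_t level, uint64_t blkid, const char *origin, uint32_t aflags)
{
	assert(e != NULL && origin != NULL);
	memset(e, 0, sizeof (*e));
	e->uid = uid;
	e->start = start;
	e->objset = objset;
	e->object = object;
	e->level = level;
	e->blkid = blkid;
	/* Copy at most 23 bytes; the memset keeps the buffer NUL terminated. */
	strncpy(e->origin, origin, sizeof (e->origin) - 1);
	e->aflags = aflags;
}

/* Return 1 if the (hypothetical) cached bit is set, else 0. */
int
sketch_read_was_cached(const sketch_read_entry_t *e)
{
	return ((e->aflags & SKETCH_FLAG_CACHED) != 0);
}
```

A userspace consumer could apply a check like `sketch_read_was_cached()` to each exported entry to separate cache hits from reads that went to disk, assuming the flag layout is known.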

There is still some work to be done, but this should serve as a good
starting point.

Specifically, dbuf_read calls are not accounted for in the currently
exported information. Thus, a follow-up patch should probably be added
to export the reads that never call into arc_read (they only hit the
dbuf hash table). In addition, it might be nice to create a utility
similar to "arcstat.py" to digest the exported information and display
it in a more readable format, or perhaps to log the information and
allow it to be "replayed" at a later time.

Signed-off-by: Prakash Surya [email protected]

@prakashsurya
Member Author

See also: openzfs/spl#288

@ColdCanuck
Contributor

How much does this add to the kernel's stack?

I ask because I am perilously close to stack overflow at the best of times. When using ZFS in anger, the stack depth averages 5500 and regularly peaks at 7000 on an 8192-byte stack. I have infrequent crashes with no logs, which might be attributed to stack overflow. I would not view making ZFS more unstable, in order to add a feature of limited utility to the majority of users, as an improvement.

By all means add debugging, but please keep it out of the mainstream.

@behlendorf
Contributor

@ColdCanuck You raise a valid concern; we don't want to do anything that will destabilize the code. However, in this case there's virtually no impact on the stack, so that shouldn't be a problem. We just wanted to get this posted for some initial review.

Prakash Surya added 2 commits September 16, 2013 12:37
This change removes the dependence on the new KSTAT_TYPE_TXG type, and
instead uses the KSTAT_TYPE_RAW type. Adding new types is not portable
to the other distributions, so it is best to refrain from this practice
and stick to the known types.

To maintain a usable /proc interface, even for raw types, the SPL added
formatting callbacks that can be registered to convert the data into a
readable text format. This patch makes use of these callbacks for
displaying the txg's history kstat.
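
The callback idea can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions, not the SPL interface: the record layout, the function names, and the return convention (0 on success, -1 if the buffer is too small) are hypothetical, and the real txg history record has its own fields.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical raw record; the real txg history record differs. */
typedef struct sketch_raw_rec {
	uint64_t	txg;
	uint64_t	nread;
	uint64_t	nwritten;
} sketch_raw_rec_t;

/* Hypothetical header callback: write one line of column names. */
int
sketch_raw_headers(char *buf, size_t size)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, size, "%-8s %-12s %-12s\n",
	    "txg", "nread", "nwritten");
	/* snprintf reports truncation by returning n >= size. */
	return ((n < 0 || (size_t)n >= size) ? -1 : 0);
}

/* Hypothetical data callback: render one raw record as a text row. */
int
sketch_raw_data(char *buf, size_t size, const void *data)
{
	const sketch_raw_rec_t *r = data;
	int n;

	assert(buf != NULL && r != NULL);
	n = snprintf(buf, size,
	    "%-8" PRIu64 " %-12" PRIu64 " %-12" PRIu64 "\n",
	    r->txg, r->nread, r->nwritten);
	return ((n < 0 || (size_t)n >= size) ? -1 : 0);
}
```

Keeping the binary record separate from its text rendering is what lets a raw kstat stay portable while still being readable through /proc.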

Signed-off-by: Prakash Surya <[email protected]>
@prakashsurya
Member Author

Closing in favor of #1748

@prakashsurya prakashsurya deleted the read-stats branch May 30, 2014 17:04