volshell: change dt() output to show where pointers lead #1028

Draft
wants to merge 10 commits into
base: develop
from

Conversation

eve-mem
Contributor

@eve-mem eve-mem commented Nov 2, 2023

Hello,

In volshell, using the display_type command to show information about objects can be difficult when pointers are involved.

For example, the output for a struct that contains pointers tells you that certain members are pointers, but not what type of struct they point to. If you want to see that struct you need to add .dereference(), which sometimes feels cumbersome. (I know I misspell dereference a lot! I even managed to do so while trying to make this PR... d5e0dec)

These changes are to make it easier to use volshell interactively while you are exploring and understanding what there is to be found.

Without changes

Here is an example volshell session without the changes, to show how this helps when using volshell interactively. Note that when we use dt() on a task_struct object we don't know what kind of struct stack, fs, files, etc. are, only that they are pointers. The result just shows symbol_table_name1!pointer.

(layer_name) >>> dt(task)
symbol_table_name1!task_struct (1784 bytes)
   0x0 :   state                           symbol_table_name1!long int                     1
   0x8 :   stack                           symbol_table_name1!pointer                      149533613694976
  0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b     0x8800044a5890
  0x14 :   flags                           symbol_table_name1!unsigned int                 4202752
  0x18 :   ptrace                          symbol_table_name1!unsigned int                 0
<SNIP>
 0x3b0 :   sysvsem                         symbol_table_name1!sysv_sem                     0x8800044a5c30
 0x3b8 :   last_switch_count               symbol_table_name1!long unsigned int            0
 0x3c0 :   thread                          symbol_table_name1!thread_struct                0x8800044a5c40
 0x478 :   fs                              symbol_table_name1!pointer                      149534111635776
 0x480 :   files                           symbol_table_name1!pointer                      149534107486592
 0x488 :   nsproxy                         symbol_table_name1!pointer                      281472852413568
 0x490 :   signal                          symbol_table_name1!pointer                      149534074071360
 0x498 :   sighand                         symbol_table_name1!pointer                      149534106539008
 0x4a0 :   blocked                         symbol_table_name1!unnamed_3710bba5abf5ee06     0x8800044a5d20
 0x4a8 :   real_blocked                    symbol_table_name1!unnamed_3710bba5abf5ee06     0x8800044a5d28
 0x4b0 :   saved_sigmask                   symbol_table_name1!unnamed_3710bba5abf5ee06     0x8800044a5d30
<SNIP>

If we try to access task.files we need to manually add the .dereference():

(layer_name) >>> dt(task.files) 
symbol_table_name1!pointer (8 bytes)
(layer_name) >>> dt(task.files.dereference())  
symbol_table_name1!files_struct (704 bytes)
  0x0 :   count                  symbol_table_name1!unnamed_4c3f6f38ad08303b     0x88001f5bc980
  0x8 :   fdt                    symbol_table_name1!pointer                      149534063719424
 0x10 :   fdtab                  symbol_table_name1!fdtable                      0x88001f5bc990
 0x80 :   file_lock              symbol_table_name1!spinlock                     0x88001f5bca00
 0x84 :   next_fd                symbol_table_name1!int                          3
 0x88 :   close_on_exec_init     symbol_table_name1!embedded_fd_set              0x88001f5bca08
 0x90 :   open_fds_init          symbol_table_name1!embedded_fd_set              0x88001f5bca10
 0x98 :   fd_array               symbol_table_name1!array                        ['149533653493888', '149533653493888', '149533653493888', '149533653493888', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']

Again, to access task.files.fdt we have to manually add .dereference():

(layer_name) >>> dt(task.files.fdt)  
symbol_table_name1!pointer (8 bytes)
(layer_name) >>> dt(task.files.fdt.dereference())    
symbol_table_name1!fdtable (56 bytes)
  0x0 :   max_fds           symbol_table_name1!unsigned int     256
  0x8 :   fd                symbol_table_name1!pointer          149534056130560
 0x10 :   close_on_exec     symbol_table_name1!pointer          149534063720992
 0x18 :   open_fds          symbol_table_name1!pointer          149534063720960
 0x20 :   rcu               symbol_table_name1!rcu_head         0x88001cbff420
 0x30 :   next              symbol_table_name1!pointer          0

Lastly, to access task.files.fdt.fd we need to dereference twice before we reach the result we were looking for:

(layer_name) >>> dt(task.files.fdt.fd)
symbol_table_name1!pointer (8 bytes)
(layer_name) >>> dt(task.files.fdt.fd.dereference()) 
symbol_table_name1!pointer (8 bytes)
(layer_name) >>> dt(task.files.fdt.fd.dereference().dereference()) 
symbol_table_name1!file (208 bytes)
  0x0 :   f_u               symbol_table_name1!unnamed_6733425fe3c7f4c9     0x8800044c6880
 0x10 :   f_path            symbol_table_name1!path                         0x8800044c6890
 0x20 :   f_op              symbol_table_name1!pointer                      281472850426064
 0x28 :   f_lock            symbol_table_name1!spinlock                     0x8800044c68a8
 0x2c :   f_sb_list_cpu     symbol_table_name1!int                          0
 0x30 :   f_count           symbol_table_name1!unnamed_ab9338acf0e8cd0d     0x8800044c68b0
 0x38 :   f_flags           symbol_table_name1!unsigned int                 32770
 0x3c :   f_mode            symbol_table_name1!unsigned int                 3
 0x40 :   f_pos             symbol_table_name1!long long int                0
 0x48 :   f_owner           symbol_table_name1!fown_struct                  0x8800044c68c8
 0x68 :   f_cred            symbol_table_name1!pointer                      149534040826368
 0x70 :   f_ra              symbol_table_name1!file_ra_state                0x8800044c68f0
 0x90 :   f_version         symbol_table_name1!long long unsigned int       0
 0x98 :   f_security        symbol_table_name1!pointer                      0
 0xa0 :   private_data      symbol_table_name1!pointer                      149534051881536
 0xa8 :   f_ep_links        symbol_table_name1!list_head                    0x8800044c6928
 0xb8 :   f_tfile_llink     symbol_table_name1!list_head                    0x8800044c6938
 0xc8 :   f_mapping         symbol_table_name1!pointer                      149534029076488

With changes

Here is the same session with these changes. Now when we use dt() on a task_struct object we can see which members are pointers, as they are marked with a * as you'd normally see in C, and we also get to see what type these members point to. For example, here we can easily see that files is a pointer to a files_struct.

(layer_name) >>> dt(task)
symbol_table_name1!task_struct (1784 bytes) 
   0x0 :   state                           symbol_table_name1!long int                      1
   0x8 :   stack                           *symbol_table_name1!void                         149533613694976
  0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b      0x8800044a5890
  0x14 :   flags                           symbol_table_name1!unsigned int                  4202752
  0x18 :   ptrace                          symbol_table_name1!unsigned int                  0
<SNIP>
 0x3b0 :   sysvsem                         symbol_table_name1!sysv_sem                      0x8800044a5c30
 0x3b8 :   last_switch_count               symbol_table_name1!long unsigned int             0
 0x3c0 :   thread                          symbol_table_name1!thread_struct                 0x8800044a5c40
 0x478 :   fs                              *symbol_table_name1!fs_struct                    149534111635776
 0x480 :   files                           *symbol_table_name1!files_struct                 149534107486592
 0x488 :   nsproxy                         *symbol_table_name1!nsproxy                      281472852413568
 0x490 :   signal                          *symbol_table_name1!signal_struct                149534074071360
 0x498 :   sighand                         *symbol_table_name1!sighand_struct               149534106539008
 0x4a0 :   blocked                         symbol_table_name1!unnamed_3710bba5abf5ee06      0x8800044a5d20
 0x4a8 :   real_blocked                    symbol_table_name1!unnamed_3710bba5abf5ee06      0x8800044a5d28
 0x4b0 :   saved_sigmask                   symbol_table_name1!unnamed_3710bba5abf5ee06      0x8800044a5d30
<SNIP>

Now when we attempt to follow these pointers in volshell it is much easier to do so, as we don't need to add the .dereference() ourselves. (Although if we did add a .dereference() ourselves it would still work as normal.)

The output shows "(dereferenced once)" to indicate that we followed a pointer to get to the files_struct.

(layer_name) >>> dt(task.files) 
symbol_table_name1!files_struct (704 bytes) (dereferenced once)
  0x0 :   count                  symbol_table_name1!unnamed_4c3f6f38ad08303b     0x88001f5bc980
  0x8 :   fdt                    *symbol_table_name1!fdtable                     149534063719424
 0x10 :   fdtab                  symbol_table_name1!fdtable                      0x88001f5bc990
 0x80 :   file_lock              symbol_table_name1!spinlock                     0x88001f5bca00
 0x84 :   next_fd                symbol_table_name1!int                          3
 0x88 :   close_on_exec_init     symbol_table_name1!embedded_fd_set              0x88001f5bca08
 0x90 :   open_fds_init          symbol_table_name1!embedded_fd_set              0x88001f5bca10
 0x98 :   fd_array               symbol_table_name1!array                        ['149533653493888', '149533653493888', '149533653493888', '149533653493888', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']

Now when showing task.files.fdt we can see right away that fd is actually a double pointer to a file.

(layer_name) >>> dt(task.files.fdt)  
symbol_table_name1!fdtable (56 bytes) (dereferenced once)
  0x0 :   max_fds           symbol_table_name1!unsigned int                  256
  0x8 :   fd                **symbol_table_name1!file                        149534056130560
 0x10 :   close_on_exec     *symbol_table_name1!unnamed_7753a428f5f98ad9     149534063720992
 0x18 :   open_fds          *symbol_table_name1!unnamed_7753a428f5f98ad9     149534063720960
 0x20 :   rcu               symbol_table_name1!rcu_head                      0x88001cbff420
 0x30 :   next              *symbol_table_name1!fdtable                      0
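
As a rough illustration of the * prefix shown in the output above, the display name can be built by walking the chain of pointer subtypes and adding one star per level. This is only a sketch: TypeInfo and display_name are hypothetical stand-ins invented for this example, not volatility3 classes or the code in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeInfo:
    """Hypothetical stand-in for a type description (not a volatility3 class)."""

    name: str
    subtype: Optional["TypeInfo"] = None  # set only when this type is a pointer


def display_name(type_info: TypeInfo) -> str:
    """Return the pointed-to type name prefixed with one '*' per pointer level."""
    stars = 0
    current = type_info
    # Walk the chain of pointer subtypes until a non-pointer type is reached.
    while current.subtype is not None:
        stars += 1
        current = current.subtype
    return "*" * stars + current.name


file_type = TypeInfo("symbol_table_name1!file")
# fd is a pointer to a pointer to a file, as in the fdtable output above.
fd_type = TypeInfo(
    "symbol_table_name1!pointer",
    TypeInfo("symbol_table_name1!pointer", file_type),
)

print(display_name(fd_type))    # **symbol_table_name1!file
print(display_name(file_type))  # symbol_table_name1!file
```

Only the type name changes here; the value shown for the member is still the raw pointer value.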

Finally accessing task.files.fdt.fd is just as easy even though we need to follow a double pointer to get to the result.

(layer_name) >>> dt(task.files.fdt.fd)  
symbol_table_name1!file (208 bytes) (dereferenced 2 times)
  0x0 :   f_u               symbol_table_name1!unnamed_6733425fe3c7f4c9     0x8800044c6880
 0x10 :   f_path            symbol_table_name1!path                         0x8800044c6890
 0x20 :   f_op              *symbol_table_name1!file_operations             281472850426064
 0x28 :   f_lock            symbol_table_name1!spinlock                     0x8800044c68a8
 0x2c :   f_sb_list_cpu     symbol_table_name1!int                          0
 0x30 :   f_count           symbol_table_name1!unnamed_ab9338acf0e8cd0d     0x8800044c68b0
 0x38 :   f_flags           symbol_table_name1!unsigned int                 32770
 0x3c :   f_mode            symbol_table_name1!unsigned int                 3
 0x40 :   f_pos             symbol_table_name1!long long int                0
 0x48 :   f_owner           symbol_table_name1!fown_struct                  0x8800044c68c8
 0x68 :   f_cred            *symbol_table_name1!cred                        149534040826368
 0x70 :   f_ra              symbol_table_name1!file_ra_state                0x8800044c68f0
 0x90 :   f_version         symbol_table_name1!long long unsigned int       0
 0x98 :   f_security        *symbol_table_name1!void                        0
 0xa0 :   private_data      *symbol_table_name1!void                        149534051881536
 0xa8 :   f_ep_links        symbol_table_name1!list_head                    0x8800044c6928
 0xb8 :   f_tfile_llink     symbol_table_name1!list_head                    0x8800044c6938
 0xc8 :   f_mapping         *symbol_table_name1!address_space               149534029076488

I personally find these changes very helpful, and I would love to have your thoughts about these suggestions.

@ikelos
Member

ikelos commented Nov 5, 2023

So, I'm interested to know how people would use dt to get information about the pointer itself? My expectation was that when dt() was called on a pointer, it would display the offsets from the pointer, and then indent in some way to show the followed object? This kinda works, but we have to make sure people can still get to any information they need (including the pointer itself) rather than always masking it. This could possibly be a no_follow boolean added to dt that, when set, will not follow the pointer, but I can't tell if that's a better option than always outputting the pointer structure and then what it points to?

Also might people start wanting something similar for arrays and/or arrays of pointers, etc?

Member

@ikelos ikelos left a comment


Looks good, just want to find a way to ensure you could still examine the pointer, if that's what you're interested in, rather than just what it dereferences.

@eve-mem
Contributor Author

eve-mem commented Nov 6, 2023

Thanks @ikelos! I always appreciate your feedback.

These are just mock-ups, without any code behind them yet, to hash out what it could look like.

At the moment, if you use dt() on something simple it'll just display the type on its own without the actual value behind it, e.g. it currently works like this:

(layer_name) >>> dt(task.tgid) 
symbol_table_name1!int (4 bytes) 
(layer_name) >>> dt(task.files)  
symbol_table_name1!pointer (8 bytes)

We could change that to do something like this to show the actual values, which even on its own I think is a useful change.

(layer_name) >>> dt(task.tgid) 
symbol_table_name1!int (4 bytes): 8600
(layer_name) >>> dt(task.files)  
symbol_table_name1!pointer (8 bytes): 149534107486592

If we did that, it would mean we could do something like this to indent when following pointers to the struct. I think this seems better than the (dereferenced once) or (dereferenced n times) messages. What do you think?

(layer_name) >>> dt(task.files.fdt.fd)  
symbol_table_name1!pointer (8 bytes): 149534107486592
    symbol_table_name1!pointer (8 bytes): 149534063719424
        symbol_table_name1!file (208 bytes)
        0x0 :   f_u               symbol_table_name1!unnamed_6733425fe3c7f4c9     0x8800044c6880
        0x10 :   f_path            symbol_table_name1!path                         0x8800044c6890
        0x20 :   f_op              *symbol_table_name1!file_operations             281472850426064
        0x28 :   f_lock            symbol_table_name1!spinlock                     0x8800044c68a8
        0x2c :   f_sb_list_cpu     symbol_table_name1!int                          0
        0x30 :   f_count           symbol_table_name1!unnamed_ab9338acf0e8cd0d     0x8800044c68b0
        0x38 :   f_flags           symbol_table_name1!unsigned int                 32770
        0x3c :   f_mode            symbol_table_name1!unsigned int                 3
        0x40 :   f_pos             symbol_table_name1!long long int                0
        0x48 :   f_owner           symbol_table_name1!fown_struct                  0x8800044c68c8
        0x68 :   f_cred            *symbol_table_name1!cred                        149534040826368
        0x70 :   f_ra              symbol_table_name1!file_ra_state                0x8800044c68f0
        0x90 :   f_version         symbol_table_name1!long long unsigned int       0
        0x98 :   f_security        *symbol_table_name1!void                        0
        0xa0 :   private_data      *symbol_table_name1!void                        149534051881536
        0xa8 :   f_ep_links        symbol_table_name1!list_head                    0x8800044c6928
        0xb8 :   f_tfile_llink     symbol_table_name1!list_head                    0x8800044c6938
        0xc8 :   f_mapping         *symbol_table_name1!address_space               149534029076488

Also, in case it wasn't clear in my first explanation: when the display adds the * to indicate that a member is a pointer to a different type, it's still returning the actual value of that member. It hasn't followed any pointers to, say, return the actual location of the struct being pointed to. It's just a shortcut to seeing the type you'd get to, with the * showing it's actually a pointer in this struct.

e.g. without this change:

(layer_name) >>> dt(task.files.fdt)  
symbol_table_name1!pointer (8 bytes)
(layer_name) >>> dt(task.files.fdt.dereference())    
symbol_table_name1!fdtable (56 bytes)
  0x0 :   max_fds           symbol_table_name1!unsigned int     256
  0x8 :   fd                symbol_table_name1!pointer          149534056130560
 0x10 :   close_on_exec     symbol_table_name1!pointer          149534063720992
 0x18 :   open_fds          symbol_table_name1!pointer          149534063720960
 0x20 :   rcu               symbol_table_name1!rcu_head         0x88001cbff420
 0x30 :   next              symbol_table_name1!pointer          0

with this change:

(layer_name) >>> dt(task.files.fdt)  
symbol_table_name1!fdtable (56 bytes) (dereferenced once)
  0x0 :   max_fds           symbol_table_name1!unsigned int                  256
  0x8 :   fd                **symbol_table_name1!file                        149534056130560   # (still the same as before even though it's a double pointer)
 0x10 :   close_on_exec     *symbol_table_name1!unnamed_7753a428f5f98ad9     149534063720992   # (still the same as before even though it's a pointer)
 0x18 :   open_fds          *symbol_table_name1!unnamed_7753a428f5f98ad9     149534063720960
 0x20 :   rcu               symbol_table_name1!rcu_head                      0x88001cbff420
 0x30 :   next              *symbol_table_name1!fdtable                      0

I do think people might want to do something similar for arrays too, I could try adding that after this?

@ikelos
Member

ikelos commented Nov 6, 2023

Oh, for some reason I thought we did already show the values by the side of a type, but I guess only if there are members? Yeah, that looks like a reasonable solution, but we probably only want to descend the tree until we hit a non-pointer (or worse, a loop), otherwise we might keep going forever.

I've changed this to a draft pull so I know it's not ready to be merged yet. Should be easy to change back when we're ready...

@ikelos ikelos marked this pull request as draft November 6, 2023 08:22
@eve-mem
Contributor Author

eve-mem commented Nov 7, 2023

Hello again, I've made some changes so it works in the way described above. What do you think? I'm less sure my Python is up to standard now, however...

There is now a check to stop infinite loops: it'll only dereference 8 times. I'd really hope that means all correct use of pointers is handled with a big margin for error, while being small enough not to be a huge problem if we do get stuck in an infinite loop.

I've made changes so that if an object of a type with no members is given to dt(), it now outputs the value as well.

e.g.

(layer_name) >>> dt(task.tgid)         
 symbol_table_name1!int (4 bytes): 8600
(layer_name) >>> dt(task.comm)             
 symbol_table_name1!array (16 bytes): ['98', '97', '115', '104', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']

When an object which has members is given, the value comes from object.vol.offset, e.g. 0x8800044a5880 is where this task_struct is in the output below.

I've also added a special case for pointers: when they are displayed the value is now shown in hex, whereas previously it came back as an int, e.g. stack shows 0x880001ed2000 where previously it would show 149533613694976. I personally prefer that, as I find it easier to see the difference between kernel/user etc. in hex than as an int. However, this is another new change for this single PR.

(layer_name) >>> dt(task)                           
 symbol_table_name1!task_struct (1784 bytes): 0x8800044a5880
    0x0 :   state                           symbol_table_name1!long int                      1
    0x8 :   stack                           *symbol_table_name1!void                         0x880001ed2000
   0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b      0x8800044a5890
   0x14 :   flags                           symbol_table_name1!unsigned int                  4202752
   0x18 :   ptrace                          symbol_table_name1!unsigned int                  0
   0x20 :   wake_entry                      symbol_table_name1!llist_node                    0x8800044a58a0
   0x28 :   on_cpu                          symbol_table_name1!int                           0
   0x2c :   on_rq                           symbol_table_name1!int                           0
   0x30 :   prio                            symbol_table_name1!int                           120
   0x34 :   static_prio                     symbol_table_name1!int                           120
   0x38 :   normal_prio                     symbol_table_name1!int                           120
   0x3c :   rt_priority                     symbol_table_name1!unsigned int                  0
   0x40 :   sched_class                     *symbol_table_name1!sched_class                  0xffff814052a0
   0x48 :   se                              symbol_table_name1!sched_entity                  0x8800044a58c8
<SNIP>

Now when dt() follows a pointer to an object it shows the actual value of the pointer, then indents the values for the object. Note here that showing the pointer value in hex makes it really obvious that the task.files pointer points to 0x88001f5bc980 and the files_struct is at 0x88001f5bc980. Showing the pointer value as an int makes that relationship a little harder to see.

(layer_name) >>> dt(task.files) 
 symbol_table_name1!pointer (8 bytes): 0x88001f5bc980
     symbol_table_name1!files_struct (704 bytes): 0x88001f5bc980
       0x0 :   count                  symbol_table_name1!unnamed_4c3f6f38ad08303b     0x88001f5bc980
       0x8 :   fdt                    *symbol_table_name1!fdtable                     0x88001cbff400
      0x10 :   fdtab                  symbol_table_name1!fdtable                      0x88001f5bc990
      0x80 :   file_lock              symbol_table_name1!spinlock                     0x88001f5bca00
      0x84 :   next_fd                symbol_table_name1!int                          3
      0x88 :   close_on_exec_init     symbol_table_name1!embedded_fd_set              0x88001f5bca08
      0x90 :   open_fds_init          symbol_table_name1!embedded_fd_set              0x88001f5bca10
      0x98 :   fd_array               symbol_table_name1!array                        ['0x8800044c6880', '0x8800044c6880', '0x8800044c6880', '0x8800044c6880', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0', '0x0']
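
For anyone comparing the old and new output, the decimal values from the earlier sessions and the hex values shown here are the same numbers. A quick check in plain Python (no volatility3 needed), using two values taken from the output in this thread:

```python
# Decimal pointer values from the earlier dt(task) output.
stack_ptr = 149533613694976
files_ptr = 149534107486592

# hex() gives the form used in the new output.
print(hex(stack_ptr))  # 0x880001ed2000
print(hex(files_ptr))  # 0x88001f5bc980

# The same formatting is available in f-strings via the '#x' format spec.
print(f"{files_ptr:#x}")  # 0x88001f5bc980
```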

Lastly, if we follow a few pointers it keeps indenting until we find the struct, find a pointer we can't follow, or hit the limit of following 8 pointers.

(layer_name) >>> dt(task.files.fdt.fd) 
 symbol_table_name1!pointer (8 bytes): 0x88001c4c2800
    symbol_table_name1!pointer (8 bytes): 0x8800044c6880
         symbol_table_name1!file (208 bytes): 0x8800044c6880
           0x0 :   f_u               symbol_table_name1!unnamed_6733425fe3c7f4c9     0x8800044c6880
          0x10 :   f_path            symbol_table_name1!path                         0x8800044c6890
          0x20 :   f_op              *symbol_table_name1!file_operations             0xffff814378d0
          0x28 :   f_lock            symbol_table_name1!spinlock                     0x8800044c68a8
          0x2c :   f_sb_list_cpu     symbol_table_name1!int                          0
          0x30 :   f_count           symbol_table_name1!unnamed_ab9338acf0e8cd0d     0x8800044c68b0
          0x38 :   f_flags           symbol_table_name1!unsigned int                 32770
          0x3c :   f_mode            symbol_table_name1!unsigned int                 3
          0x40 :   f_pos             symbol_table_name1!long long int                0
          0x48 :   f_owner           symbol_table_name1!fown_struct                  0x8800044c68c8
          0x68 :   f_cred            *symbol_table_name1!cred                        0x88001b62a200
          0x70 :   f_ra              symbol_table_name1!file_ra_state                0x8800044c68f0
          0x90 :   f_version         symbol_table_name1!long long unsigned int       0
          0x98 :   f_security        *symbol_table_name1!void                        0x0
          0xa0 :   private_data      *symbol_table_name1!void                        0x88001c0b5240
          0xa8 :   f_ep_links        symbol_table_name1!list_head                    0x8800044c6928
          0xb8 :   f_tfile_llink     symbol_table_name1!list_head                    0x8800044c6938
          0xc8 :   f_mapping         *symbol_table_name1!address_space               0x88001aaf5808
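
The behaviour described above (indent once per followed pointer, stop at a struct, at a pointer that can't be followed, or after 8 dereferences) can be sketched roughly as below. This is a simplified illustration, not the code in this PR: Struct, Pointer, describe and MAX_DEREFERENCE_COUNT are hypothetical names, and the classes are toy stand-ins for volatility3 objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

MAX_DEREFERENCE_COUNT = 8  # the limit mentioned in this comment


@dataclass
class Struct:
    """Hypothetical stand-in for a struct object."""

    type_name: str
    offset: int


@dataclass
class Pointer:
    """Hypothetical stand-in for a pointer; target is None if it cannot be followed."""

    value: int
    target: Optional[Union["Pointer", Struct]] = None


def describe(obj: Union[Pointer, Struct]) -> List[str]:
    """Return one line per level, indenting each time a pointer is followed."""
    lines = []
    depth = 0  # number of dereferences performed so far
    current = obj
    while isinstance(current, Pointer):
        lines.append("    " * depth + f"pointer: {current.value:#x}")
        if current.target is None or depth >= MAX_DEREFERENCE_COUNT:
            # Null/unreadable pointer, or the dereference limit was reached.
            return lines
        current = current.target
        depth += 1
    lines.append("    " * depth + f"{current.type_name}: {current.offset:#x}")
    return lines


# A double pointer to a file, using the addresses from the output above.
file_obj = Struct("file", 0x8800044C6880)
fd = Pointer(0x88001C4C2800, Pointer(0x8800044C6880, file_obj))
print("\n".join(describe(fd)))

# A pointer that points to itself stops after 8 dereferences instead of looping forever.
loop = Pointer(0x1000)
loop.target = loop
print(len(describe(loop)))  # 9 lines: the starting pointer plus 8 dereferences
```

A fixed cap is simpler than tracking visited addresses, at the cost of printing a few redundant levels when a pointer chain really does loop.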

@eve-mem eve-mem marked this pull request as ready for review November 7, 2023 16:04
Member

@ikelos ikelos left a comment


Sorry, I started reviewing this a while ago and then it slid off my radar. A few comments but it's well on its way. :)

volatility3/cli/volshell/generic.py
# if we aren't able to follow the pointers anymore then there will
# be no more information to display as we've already printed the
# details of this pointer
return

if hasattr(volobject.vol, "members"):
Member


Not sure this is a good way of telling a struct from a pointer? Can't immediately think of how to do it, but I think there is a single overarching struct type that covers the 3 different children, probably better to isinstance that?

Contributor Author


Great question - I'll dig into it.

Contributor Author


I've been exploring this issue and found that when dealing with actual objects or types with offsets, checking if it's an instance of objects.AggregateType works really well. 👍

However, I ran into difficulties when working with templates, such as when executing dt('task_struct') without any offsets. Both cases appear as ObjectTemplate types, making it challenging to distinguish between them with type checks:

(layer_name) >>> type(self.context.symbol_space.get_type('symbol_table_name1!task_struct'))       
<class 'volatility3.framework.objects.templates.ObjectTemplate'>
(layer_name) >>> type(self.context.symbol_space.get_type('symbol_table_name1!int'))            
<class 'volatility3.framework.objects.templates.ObjectTemplate'>

I found .vol.object_class promising for getting the types:

(layer_name) >>> self.context.symbol_space.get_type('symbol_table_name1!task_struct').vol.object_class
<class 'volatility3.framework.symbols.linux.extensions.task_struct'>
(layer_name) >>> self.context.symbol_space.get_type('symbol_table_name1!int').vol.object_class         
<class 'volatility3.framework.objects.Integer'>

However, the type of .vol.object_class is abc.ABCMeta (it is a class, not an instance), making isinstance checks infeasible as far as I understand. Which is kind of understandable since they're not actual instances.

(layer_name) >>> type(self.context.symbol_space.get_type('symbol_table_name1!task_struct').vol.object_class) 
<class 'abc.ABCMeta'>
(layer_name) >>> type(self.context.symbol_space.get_type('symbol_table_name1!int').vol.object_class)          
<class 'abc.ABCMeta'>

I've looked for solutions and found some discussions like this one on StackOverflow, but I'm still uncertain about the best approach.

It might be possible to adjust the templates to facilitate isinstance checks, but it's currently beyond my expertise. 😕 For now, I've resorted to using hasattr to differentiate between types, though it feels like a workaround:

(layer_name) >>> hasattr(self.context.symbol_space.get_type('symbol_table_name1!task_struct'), "members")                  
True
(layer_name) >>> hasattr(self.context.symbol_space.get_type('symbol_table_name1!int'), "members")         
False

I'm open to suggestions or guidance on how to approach this more effectively. 🙃 Let me know if you have any good ideas!
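
One possible direction, offered tentatively: since .vol.object_class is a class rather than an instance, issubclass may be the check that fits where isinstance does not. The sketch below uses toy stand-in classes defined with abc.ABCMeta to mirror the type() output above. They are not the real volatility3 classes, and whether the real templates behave the same way is an assumption that would need checking.

```python
import abc


class ObjectInterface(metaclass=abc.ABCMeta):
    """Stand-in base class; ABCMeta mirrors the type() output shown above."""


class AggregateType(ObjectInterface):
    """Stand-in for a struct-like type with members."""


class Integer(ObjectInterface):
    """Stand-in for a primitive type."""


class task_struct(AggregateType):
    """Stand-in for an extension class built on the aggregate type."""


# The class object itself is an instance of the metaclass, not of AggregateType.
print(type(task_struct))                       # <class 'abc.ABCMeta'>
print(isinstance(task_struct, AggregateType))  # False

# issubclass compares classes directly, so it works without an instance.
print(issubclass(task_struct, AggregateType))  # True
print(issubclass(Integer, AggregateType))      # False
```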

@eve-mem eve-mem marked this pull request as draft December 7, 2023 09:30
eve-mem and others added 3 commits February 21, 2024 13:14
… both where the pointer is and where it is pointing to. Add extra output for null or unreadable pointers.
@eve-mem eve-mem marked this pull request as ready for review February 21, 2024 22:26
@eve-mem
Contributor Author

eve-mem commented Feb 21, 2024

Hello @ikelos - I think this is ready for review again.

As a recap, since it's been a few months: this PR is all about trying to make it easier in volshell to use dt() on objects where there are pointers, showing the actual data right away without you needing to keep adding dereference() onto the commands.

It's the difference between knowing that mm is a pointer, and knowing that mm is a pointer to an mm_struct that we're able to actually follow if needed.

# before the changes
(layer_name) >>> dt(task)
symbol_table_name1!task_struct (1784 bytes)
<snip>
  0x1a8 :   mm                              symbol_table_name1!pointer                      149534103296064
<snip>

# with the changes
(layer_name) >>> dt(task)
symbol_table_name1!task_struct (1784 bytes) @ 0x8800044a5880:
<snip>
  0x1a8 :   mm                              *symbol_table_name1!mm_struct                    0x88001f1bd840
<snip>

I've now updated it to display pointers better, showing both their location and where they point to. Hopefully it's more obvious and user friendly now. It also includes messages for null pointers or pointers that can't be followed.

Simple things still work as expected:

(layer_name) >>> dt('void')
symbol_table_name1!void
(layer_name) >>> dt('int')  
symbol_table_name1!int (4 bytes, little endian, signed)

Structs that include pointers show they're pointers with the * and then show what the actual type is that is being pointed to. e.g. we can see clearly that mm points to an mm_struct.

(layer_name) >>> dt('task_struct') 
symbol_table_name1!task_struct (1784 bytes):
    0x0 :   state                           symbol_table_name1!long int
    0x8 :   stack                           *symbol_table_name1!void
   0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b
<snip>
  0x1a8 :   mm                              *symbol_table_name1!mm_struct
  0x1b0 :   active_mm                       *symbol_table_name1!mm_struct
<snip>

With an actual object we get the values too, and the header shows where the struct is located, much like the repr for objects (this is more useful later when we're following pointers). You can see here that the splice_pipe pointer is null and has been highlighted as such.

(layer_name) >>> dt(task)
symbol_table_name1!task_struct (1784 bytes) @ 0x8800044a5880:
    0x0 :   state                           symbol_table_name1!long int                      1
    0x8 :   stack                           *symbol_table_name1!void                         0x880001ed2000
   0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b      0x8800044a5890
   0x14 :   flags                           symbol_table_name1!unsigned int                  4202752
   0x18 :   ptrace                          symbol_table_name1!unsigned int                  0
   0x20 :   wake_entry                      symbol_table_name1!llist_node                    0x8800044a58a0
   0x28 :   on_cpu                          symbol_table_name1!int                           0
<snip>
  0x688 :   splice_pipe                     *symbol_table_name1!pipe_inode_info              0x0 (null pointer)
<snip>

This also works when given just a type and an offset, as you'd expect.

(layer_name) >>> dt('task_struct', task.vol.offset) 
symbol_table_name1!task_struct (1784 bytes) @ 0x8800044a5880:
    0x0 :   state                           symbol_table_name1!long int                      1
    0x8 :   stack                           *symbol_table_name1!void                         0x880001ed2000
   0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b      0x8800044a5890
   0x14 :   flags                           symbol_table_name1!unsigned int                  4202752
   0x18 :   ptrace                          symbol_table_name1!unsigned int                  0
   0x20 :   wake_entry                      symbol_table_name1!llist_node                    0x8800044a58a0
   0x28 :   on_cpu                          symbol_table_name1!int                           0
<snip>

Here is an example where it's parsing some invalid data - you can see that the stack pointer has been highlighted as unreadable. That's due to it being fed wrong data, but the same message would appear if it happened due to smear.

(layer_name) >>> dt('task_struct', task.vol.offset - 1) 
symbol_table_name1!task_struct (1784 bytes) @ 0x8800044a587f:
    0x0 :   state                           symbol_table_name1!long int                      256
    0x8 :   stack                           *symbol_table_name1!void                         0x1ed200000 (unreadable pointer)
   0x10 :   usage                           symbol_table_name1!unnamed_4c3f6f38ad08303b      0x8800044a588f
   0x14 :   flags                           symbol_table_name1!unsigned int                  1075904512
   0x18 :   ptrace                          symbol_table_name1!unsigned int                  0
   0x20 :   wake_entry                      symbol_table_name1!llist_node                    0x8800044a589f
<snip>
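
The shifted values in that output can be reproduced by hand. The following is a minimal sketch, not part of the PR: it assumes little-endian fields, assumes the byte just before each field is zero, assumes the unmasked stack pointer is the canonical kernel address 0xffff880001ed2000, and assumes addresses are displayed truncated to 48 bits. The helper name read_shifted is made up for illustration.

```python
import struct

# Assumption: displayed addresses are truncated to the low 48 bits.
ADDRESS_MASK = (1 << 48) - 1


def read_shifted(fmt, value, preceding_byte=0):
    """Pack `value` with `fmt`, then re-read it starting one byte early.

    `preceding_byte` is the byte assumed to sit just before the field.
    """
    data = bytes([preceding_byte]) + struct.pack(fmt, value)
    (shifted,) = struct.unpack_from(fmt, data, 0)
    return shifted


# Field values taken from the dt(task) output at the correct offset.
state = read_shifted("<q", 1)
flags = read_shifted("<I", 4202752)
stack = read_shifted("<Q", 0xFFFF880001ED2000) & ADDRESS_MASK

print(state, flags, hex(stack))
```

Reading one byte early multiplies the low bytes by 256 and drops the top byte, which is why state goes from 1 to 256 and the stack pointer no longer lands on a readable address.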

Here is an example of following a pointer right away to get to the actual struct. It's showing that the pointer located at 0x8800044a5d00 points to the location 0x88001f5bc980. Then on the indented header for the actual files_struct you can see that it is located @ 0x88001f5bc980 as you'd expect.

(layer_name) >>> dt(task.files)
symbol_table_name1!pointer (8 bytes) @ 0x8800044a5d00 -> 0x88001f5bc980
    symbol_table_name1!files_struct (704 bytes) @ 0x88001f5bc980:
       0x0 :   count                  symbol_table_name1!unnamed_4c3f6f38ad08303b     0x88001f5bc980
       0x8 :   fdt                    *symbol_table_name1!fdtable                     0x88001cbff400
      0x10 :   fdtab                  symbol_table_name1!fdtable                      0x88001f5bc990
      0x80 :   file_lock              symbol_table_name1!spinlock                     0x88001f5bca00
      0x84 :   next_fd                symbol_table_name1!int                          3
      0x88 :   close_on_exec_init     symbol_table_name1!embedded_fd_set              0x88001f5bca08
      0x90 :   open_fds_init          symbol_table_name1!embedded_fd_set              0x88001f5bca10
<snip>

Here is an example of following a double pointer:

(layer_name) >>> dt(task.files.fdt.fd) 
symbol_table_name1!pointer (8 bytes) @ 0x88001cbff408 -> 0x88001c4c2800
    symbol_table_name1!pointer (8 bytes) @ 0x88001c4c2800 -> 0x8800044c6880
        symbol_table_name1!file (208 bytes) @ 0x8800044c6880
           0x0 :   f_u               symbol_table_name1!unnamed_6733425fe3c7f4c9     0x8800044c6880
          0x10 :   f_path            symbol_table_name1!path                         0x8800044c6890
          0x20 :   f_op              *symbol_table_name1!file_operations             0xffff814378d0
          0x28 :   f_lock            symbol_table_name1!spinlock                     0x8800044c68a8
          0x2c :   f_sb_list_cpu     symbol_table_name1!int                          0
          0x30 :   f_count           symbol_table_name1!unnamed_ab9338acf0e8cd0d     0x8800044c68b0
          0x38 :   f_flags           symbol_table_name1!unsigned int                 32770
<snip>

Here is an example where trying to follow the pointers to the struct fails. The "null pointer" message hopefully makes it clear why no more information is displayed. In the past you'd have used .dereference() to get here but would, rightly, have been hit with a full traceback, as it's an invalid address and so can't be followed.

(layer_name) >>> dt(task.files.fdt.fd) 
symbol_table_name1!pointer (8 bytes) @ 0xffff8162c858 -> 0xffff8162c8d8
    symbol_table_name1!pointer (8 bytes) @ 0xffff8162c8d8 -> 0x0 (null pointer)

To make that clear here is the raw data behind the two pointers in that last example:

(layer_name) >>> dq(0xffff8162c858, 8)
0xffff8162c858    ffffffff8162c8d8                     .....b..
(layer_name) >>> dq(0xffff8162c8d8, 8) 
0xffff8162c8d8    0000000000000000                     ........
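
To illustrate how those two raw quadwords turn into the two lines of dt() output, here is a minimal sketch of the idea. It is not the code from this PR: the dict named memory stands in for a real layer, describe_pointer_chain is a made-up helper, and truncating addresses to 48 bits is an assumption based on the addresses shown above.

```python
import struct

# Assumption: displayed addresses are truncated to the low 48 bits.
ADDRESS_MASK = (1 << 48) - 1


def describe_pointer_chain(memory, address, depth):
    """Follow up to `depth` pointers starting at `address`.

    `memory` is a stand-in for a memory layer: a dict mapping an address
    to the 8 raw little-endian bytes stored there. Returns one line per
    pointer, stopping early at a null or unreadable pointer.
    """
    lines = []
    for level in range(depth):
        (value,) = struct.unpack("<Q", memory[address])
        target = value & ADDRESS_MASK
        line = f"{'    ' * level}pointer @ {address:#x} -> {target:#x}"
        if target == 0:
            lines.append(line + " (null pointer)")
            break
        if target not in memory:
            lines.append(line + " (unreadable pointer)")
            break
        lines.append(line)
        address = target
    return lines


# The two quadwords from the dq() output, stored little-endian.
memory = {
    0xFFFF8162C858: bytes.fromhex("ffffffff8162c8d8")[::-1],
    0xFFFF8162C8D8: bytes(8),
}
chain = describe_pointer_chain(memory, 0xFFFF8162C858, depth=2)
print("\n".join(chain))
```

The loop stops as soon as it reads a zero, so the second pointer is reported as null instead of raising an error.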

Sorry for the huge comments... I just wanted to be as clear as I could be. If there is anything I've missed or that isn't clear, please let me know.

Thanks!

Member

@ikelos ikelos left a comment

Looks good, just need to get suitable consistency between the == and isinstance testing to determine pointers, and then this should be good to merge (and it looks awesome, thanks!). 5:D

    a pointer otherwise it returns just the normal type name."""
    pointer_marker = "*" * depth
    try:
        if member_type.vol.object_class == objects.Pointer:
Member

Might be better to use isinstance than equality? It's interesting that this is a different test from the isinstance used below for display_type. They should probably match to avoid weird discrepancies! 5:P

        else:
            return member_type_name
    except AttributeError:
        pass  # not all objects get a `object_class`, and those that don't are not pointers.
Member

They should all get a type_name, but then you're having to do string comparisons (although pointer should be a built-in type, so the name shouldn't change?). Happy with either route...

@eve-mem
Contributor Author

eve-mem commented Feb 23, 2024

Hey @ikelos

I was looking into why I wasn't using an isinstance check in _get_type_name_with_pointer and it's to handle when we're being given just a type and not an actual object.

e.g. here the isinstance check could work on mmap or mmap_cache as they are actually pointers. We'd also have to pass the whole member object over though to _get_type_name_with_pointer.

(layer_name) >>> dt(task.mm)  
symbol_table_name1!pointer (8 bytes) @ 0x8800044a5a28 -> 0x88001f1bd840
    symbol_table_name1!mm_struct (920 bytes) @ 0x88001f1bd840:
        0x0 :   mmap                  *symbol_table_name1!vm_area_struct               0x88001b6b4818
        0x8 :   mm_rb                 symbol_table_name1!rb_root                       0x88001f1bd848
       0x10 :   mmap_cache            *symbol_table_name1!vm_area_struct               0x88001b5d1ad8

Whereas here they aren't actually real pointers, so the isinstance check doesn't work - or at least I can't work out how to do it correctly.

(layer_name) >>> dt('mm_struct')   
symbol_table_name1!mm_struct (920 bytes):
    0x0 :   mmap                  *symbol_table_name1!vm_area_struct
    0x8 :   mm_rb                 symbol_table_name1!rb_root
   0x10 :   mmap_cache            *symbol_table_name1!vm_area_struct

I can't work out a different way of doing it at the moment. So either leave it as it is with the == rather than isinstance, or I can mull it over and work out something better - what do you think?

Also - I wanted to ask if there were any version numbers to update anywhere? Does volshell count as part of the framework, so the minor number needs bumping? (Sorry - I'm becoming a bit of a version number addict)
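
One possible direction, shown only as a rough sketch: test the class with issubclass when only a type is available, and with isinstance when there is a real object. Everything below uses stand-in classes invented for illustration (ObjectInterface, Pointer, StructType, TaggedPointer, Template); none of it is volatility3's real code, and it assumes that vol.object_class holds a class, as the == comparison in the diff suggests.

```python
from types import SimpleNamespace


class ObjectInterface:
    """Stand-in base class for objects (hypothetical)."""


class Pointer(ObjectInterface):
    """Stand-in pointer class (hypothetical)."""


class StructType(ObjectInterface):
    """Stand-in struct class (hypothetical)."""


class TaggedPointer(Pointer):
    """Hypothetical future subclass of the pointer class."""


class Template:
    """Stand-in type template: records the class it would construct."""

    def __init__(self, object_class):
        self.vol = SimpleNamespace(object_class=object_class)


def is_pointer(thing):
    """True for a pointer instance or a template that would build one."""
    if isinstance(thing, Pointer):
        return True
    object_class = getattr(getattr(thing, "vol", None), "object_class", None)
    return isinstance(object_class, type) and issubclass(object_class, Pointer)


print(is_pointer(Pointer()), is_pointer(Template(Pointer)))
print(is_pointer(Template(TaggedPointer)), is_pointer(Template(StructType)))
```

Unlike ==, the issubclass test also accepts subclasses of the pointer class, so the type-only and real-object paths give the same answer.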

@ikelos
Member

ikelos commented Feb 23, 2024

Nope, volshell is a separate utility. I guess we could update its version number but it's far less important (it isn't expected for people to depend on volshell in its own right). I guess we could give volshell its own version number, but that might get complicated (given it kinda comes packaged with the framework). You can always jump the patch number on the framework if you like, but it really isn't an API change (particularly since it's only in the display of output, not of the actual function or operation of volshell). The version numbers shouldn't be an all consuming thing, they're just to make sure people can easily tell if their puzzle pieces can fit together properly... 5;)

As to the other piece, operating on a type name rather than an object is fine, but we should try to keep the checks the same, so either both operate on the name or both operate on the type. If only operating on the type name, it should be possible to split on bang and check for "pointer" at the end, but that might make inheritance of pointer in the future more difficult (I don't immediately foresee a need for this, but as soon as I say that someone will come up with one, so again, just something to be aware of and document somewhere)... 5:)
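
The name-based route could look something like the sketch below. It is only an illustration: the BANG constant and the names_pointer helper are defined locally for the example, and the type names are the ones seen in the dt() output earlier in this thread.

```python
# Separator between the symbol table name and the type name,
# as seen in names like "symbol_table_name1!pointer".
BANG = "!"


def names_pointer(type_name):
    """True when the part after the last bang is exactly "pointer"."""
    return type_name.split(BANG)[-1] == "pointer"


print(names_pointer("symbol_table_name1!pointer"))
print(names_pointer("symbol_table_name1!task_struct"))
```

Because this is an exact string match, a future type that inherits from pointer under a different name would not be recognised, which is the trade-off mentioned above.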

@eve-mem
Contributor Author

eve-mem commented Feb 23, 2024

Makes sense re versions.

I'll play around with those checks a bit more - I think I get it now. 😀

@eve-mem eve-mem marked this pull request as draft February 23, 2024 11:20