Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use file size in file offset -> virtual offset translation #875

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

danielocfb
Copy link
Collaborator

We were seeing an issue where normalization + file offset symbolization was reporting an incorrect symbolization result for certain addresses in a certain binary.
In short, in a binary with segments:

  > Type  Offset     VirtAddr           PhysAddr           FileSiz    MemSiz     Flg Align
  > LOAD  0x949be40  0x000000000969be40 0x000000000969be40 0x11d6c930 0x11d6c930 R E 0x200000
  > LOAD  0x1b208780 0x000000001b409780 0x000000001b409780 0x01b5e8   0x01c1f0   RW  0x1000
  > LOAD  0x1b224980 0x000000001b426980 0x000000001b426980 0x343888   0xca594c   RW  0x1000
  > LOAD  0x1b600000 0x000000001c200000 0x000000001c200000 0x4a358ac  0x4a358ac  R E 0x200000

We were normalizing absolute address 0x1c23b4d0 to file offset 0x1b63b4d0. That's based on the calculation

  > 0x1c23b4d0 - 0x1c200000 /*VirtAddr*/ + 0x1b600000 /*Offset*/ = 0x1b63b4d0

When subsequently symbolizing the file offset we were looking up the wrong segment, because we took into account the segment's memory size:

  > 0x1b63b4d0 is in [0x1b224980 /*Offset*/, 0x1b224980 + 0xca594c /*MemSiz*/ = 0x1beca2cc]

Because this first fitting segment comes before the one we used as part of normalization, we continued symbolizing virtual offset

  > 0x1b63b4d0 - 0x1b224980 /*Offset*/ + 0x1b426980 /*VirtAddr*/ = 0x1b83d4d0

which lead to a wrong symbolization. By matching up segments based on the file size instead, we pick the correct segment instead:

  > 0x1b63b4d0 is *not* in [0x1b224980 /*Offset*/, 0x1b224980 + 0x343888 /*FileSiz*/ = 0x1b568208]
  > 0x1b63b4d0 is in [0x1b600000 /*Offset*/, 0x1b600000 + 0x4a358ac /*FileSiz*/ = 0x200358ac]

So we end up symbolizing virtual offset

  > 0x1b63b4d0 - 0x1b600000 /*Offset*/ + 0x1c200000 /*VirtAddr*/ = 0x1c23b4d0

which leads to the correct symbolization result.

We were seeing an issue where normalization + file offset symbolization
was reporting an incorrect symbolization result for certain addresses in
a certain binary.
In short, in a binary with segments:

  > Type  Offset     VirtAddr           PhysAddr           FileSiz    MemSiz     Flg Align
  > LOAD  0x949be40  0x000000000969be40 0x000000000969be40 0x11d6c930 0x11d6c930 R E 0x200000
  > LOAD  0x1b208780 0x000000001b409780 0x000000001b409780 0x01b5e8   0x01c1f0   RW  0x1000
  > LOAD  0x1b224980 0x000000001b426980 0x000000001b426980 0x343888   0xca594c   RW  0x1000
  > LOAD  0x1b600000 0x000000001c200000 0x000000001c200000 0x4a358ac  0x4a358ac  R E 0x200000

We were normalizing absolute address 0x1c23b4d0 to file offset
0x1b63b4d0. That's based on the calculation

  > 0x1c23b4d0 - 0x1c200000 /*VirtAddr*/ + 0x1b600000 /*Offset*/ = 0x1b63b4d0

When subsequently symbolizing the file offset we were looking up the
wrong segment, because we took into account the segment's memory size:

  > 0x1b63b4d0 is in [0x1b224980 /*Offset*/, 0x1b224980 + 0xca594c /*MemSiz*/ = 0x1beca2cc]

Because this first fitting segment comes before the one we used as part
of normalization, we continued symbolizing virtual offset

  > 0x1b63b4d0 - 0x1b224980 /*Offset*/ + 0x1b426980 /*VirtAddr*/ = 0x1b83d4d0

which lead to a wrong symbolization. By matching up segments based on
the file size instead, we pick the correct segment instead:

  > 0x1b63b4d0 is *not* in [0x1b224980 /*Offset*/, 0x1b224980 + 0x0343888 /*FileSiz*/ = 0x1b568208]
  > 0x1b63b4d0 is       in [0x1b600000 /*Offset*/, 0x1b600000 + 0x4a358ac /*FileSiz*/ = 0x200358ac]

So we end up symbolizing virtual offset

  > 0x1b63b4d0 - 0x1b600000 /*Offset*/ + 0x1c200000 /*VirtAddr*/ = 0x1c23b4d0

which leads to the correct symbolization result.

Hence, fix the logic to work with the segment's file size instead.

Signed-off-by: Daniel Müller <[email protected]>
Copy link
Member

@anakryiko anakryiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danielocfb danielocfb force-pushed the topic/file-size branch 2 times, most recently from 1954f9d to 53b1623 Compare October 29, 2024 23:41
@danielocfb danielocfb merged commit 1a4e107 into libbpf:main Oct 29, 2024
37 checks passed
@danielocfb danielocfb deleted the topic/file-size branch October 29, 2024 23:46
Copy link

codecov bot commented Oct 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.39%. Comparing base (59c58c8) to head (53b1623).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #875   +/-   ##
=======================================
  Coverage   95.39%   95.39%           
=======================================
  Files          55       55           
  Lines       10522    10522           
=======================================
  Hits        10037    10037           
  Misses        485      485           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

d-e-s-o added a commit to d-e-s-o/blazesym that referenced this pull request Oct 30, 2024
Add a regression test for the change introduced with pull request libbpf#875.
It's unfortunately very involved and cumbersome to come up with a
complete end-to-end test and so with this change we add a unit test
based on partial data from the binary in question.

Signed-off-by: Daniel Müller <[email protected]>
d-e-s-o added a commit that referenced this pull request Oct 30, 2024
Add a regression test for the change introduced with pull request #875.
It's unfortunately very involved and cumbersome to come up with a
complete end-to-end test and so with this change we add a unit test
based on partial data from the binary in question.

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit to d-e-s-o/blazesym that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request libbpf#875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit to d-e-s-o/blazesym that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request libbpf#875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit to d-e-s-o/blazesym that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request libbpf#875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

For the record, the binary looks roughly as follows:

  $ readelf --segments --wide test-block.bin
  > Program Headers:
  >   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  >   LOAD           0x000000 0x00000000000ff000 0x00000000000ff000 0x000200 0x301000 RW  0x1000
  >   LOAD           0x000200 0x0000000000400200 0x0000000000400200 0x000030 0x000030 R   0x1000
  >   LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00005b 0x00005b R E 0x1000
  >   LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000050 0x000050 R   0x1000
  >   [...]

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit to d-e-s-o/blazesym that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request libbpf#875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

For the record, the binary looks roughly as follows:

  $ readelf --segments --wide test-block.bin
  > Program Headers:
  >   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  >   LOAD           0x000000 0x00000000000ff000 0x00000000000ff000 0x000200 0x301000 RW  0x1000
  >   LOAD           0x000200 0x0000000000400200 0x0000000000400200 0x000030 0x000030 R   0x1000
  >   LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00005b 0x00005b R E 0x1000
  >   LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000050 0x000050 R   0x1000
  >   [...]

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit to d-e-s-o/blazesym that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request libbpf#875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

For the record, the binary looks roughly as follows:

  $ readelf --segments --wide test-block.bin
  > Program Headers:
  >   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  >   LOAD           0x000000 0x00000000000ff000 0x00000000000ff000 0x0001c8 0x301000 RW  0x1000
  >   LOAD           0x0001c8 0x00000000004001c8 0x00000000004001c8 0x000030 0x000030 R   0x1000
  >   LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00005b 0x00005b R E 0x1000
  >   LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000038 0x000038 R   0x1000
  >   [...]

Signed-off-by: Daniel Müller <[email protected]>
danielocfb pushed a commit that referenced this pull request Oct 31, 2024
Add an end-to-end test for the fix provided by pull request #875. The
test basically normalizes an address in a specially crafted binary and
symbolizes the resulting file offset. It fails without commit
1a4e107 ("Use file size in file offset -> virtual offset
translation"), because then the file offset to virtual offset
translation produces a virtual offset that can't be symbolized to the
expected _start function.

For the record, the binary looks roughly as follows:

  $ readelf --segments --wide test-block.bin
  > Program Headers:
  >   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  >   LOAD           0x000000 0x00000000000ff000 0x00000000000ff000 0x0001c8 0x301000 RW  0x1000
  >   LOAD           0x0001c8 0x00000000004001c8 0x00000000004001c8 0x000030 0x000030 R   0x1000
  >   LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00005b 0x00005b R E 0x1000
  >   LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000038 0x000038 R   0x1000
  >   [...]

Signed-off-by: Daniel Müller <[email protected]>
javierhonduco added a commit to javierhonduco/lightswitch that referenced this pull request Nov 15, 2024
Saw libbpf/blazesym#875 by pure chance and we
have the same bug in our normalization logic.

Thanks so much, @danielocfb!

Test Plan
=========

Integration test pass. Would be good to add a regression tests for this.
javierhonduco added a commit to javierhonduco/lightswitch that referenced this pull request Nov 15, 2024
Saw libbpf/blazesym#875 by pure chance and we
have the same bug in our normalization logic.

Thanks so much, @danielocfb!

Test Plan
=========

Integration test pass. Would be good to add a regression tests for this.
javierhonduco added a commit to javierhonduco/lightswitch that referenced this pull request Nov 15, 2024
Saw libbpf/blazesym#875 by pure chance and we
have the same bug in our normalization logic.

Thanks so much, @danielocfb!

Test Plan
=========

Integration test pass. Would be good to add a regression tests for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants