Turn off instruction flow control annotations by default #84607


jasonmolenda
Collaborator

Walter Erquinigo added optional instruction annotations for x86 instructions in 2022 for the `thread trace dump instruction` command, along with code in DisassemblerLLVMC to annotate instructions that change flow control; see https://reviews.llvm.org/D128477

This was added as an option to `disassemble`, and the trace dump command enables it by default, but several other instruction dumpers were changed to display the annotations by default as well. The annotations are only implemented for Intel instructions, so our disassembly on other targets ends up looking like

(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc   unknown     stp    x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd   unknown     stp    x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd   unknown     add    x29, sp, #0x10
0x1000086f0: 0xd11843ff   unknown     sub    sp, sp, #0x610
0x1000086f4: 0x910c63e8   unknown     add    x8, sp, #0x318

instead of `disassemble`'s output style of

lldb`main:
lldb[0x1000086e4] <+0>:  stp    x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>:  stp    x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>:  add    x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub    sp, sp, #0x610
lldb[0x1000086f4] <+16>: add    x8, sp, #0x318

Adding symbolic annotations for assembly instructions is something I'm interested in too. We may have users investigating a crash or apparently incorrect behavior who must debug optimized assembly but are not familiar with the ISA they're looking at; short of flipping through a many-thousand-page PDF to understand each instruction, they're lost. They don't write assembly or work at that level, but to understand a bug, they have to understand what the instructions are actually doing.

But the annotations that exist today don't move us forward much on that front. I'd argue that the flow control instructions on Intel are not hard to understand from their names, though that might just be my personal bias. Much trickier instructions exist in any event.

Displaying this information by default for all targets when we only have annotations for one class of instructions on one target is not a good default.

Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`) command

commit 5009f9d5010a7e34ae15f962dac8505ea11a8716
Author: Greg Clayton <gclayton@apple.com>
Date:   Thu Oct 27 17:55:14 2011 +0000
[...]
    eFormatInstruction will print out disassembly with bytes and it will use the
    current target's architecture. The format character for this is "i" (which
    used to be being used for the integer format, but the integer format also has
    "d", so we gave the "i" format to disassembly), the long format is
    "instruction".

he had DumpDataExtractor's DumpInstructions print the bytes of the instruction (that's the first field after the address in the `x/5i` output above). I would argue this is only useful for people who are debugging the disassembler itself, and I don't want it displayed by default either.
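
To make the two extra columns concrete, here is a minimal C++ sketch of how a pair of boolean flags could gate them. `Insn` and `FormatInsn` are hypothetical names invented for this illustration and are not LLDB types or API; only the flag names `show_bytes` and `show_control_flow_kind` come from the patch, and the column spacing is copied from the `x/5i` output above.

```cpp
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical stand-in for one disassembled instruction (not an LLDB type).
struct Insn {
  std::uint64_t address;
  std::uint32_t opcode;
  std::string control_flow_kind; // "unknown" when the target has no annotation
  std::string mnemonic;
  std::string operands;
};

// Hypothetical formatter: each flag switches one optional column on or off.
std::string FormatInsn(const Insn &insn, bool show_bytes,
                       bool show_control_flow_kind) {
  std::ostringstream out;
  char buf[32];

  // The address column is always printed in this sketch.
  std::snprintf(buf, sizeof(buf), "0x%llx: ",
                static_cast<unsigned long long>(insn.address));
  out << buf;

  // Optional column 1: the raw opcode bytes.
  if (show_bytes) {
    std::snprintf(buf, sizeof(buf), "0x%08x   ",
                  static_cast<unsigned>(insn.opcode));
    out << buf;
  }

  // Optional column 2: the control-flow annotation.
  if (show_control_flow_kind)
    out << insn.control_flow_kind << "     ";

  out << insn.mnemonic << "    " << insn.operands;
  return out.str();
}
```

Calling `FormatInsn` with both flags set on an `Insn` holding address `0x1000086ec`, opcode `0x910043fd`, kind `unknown`, mnemonic `add`, and operands `x29, sp, #0x10` reproduces the corresponding `x/5i` line above; with both flags cleared, only the address, mnemonic, and operands remain.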

tl;dr this patch removes both fields from `memory read -f i`, and I think this is the right call today. While I'm really interested in instruction annotation, I don't think `x/i` is the right place to have it enabled by default unless it's really compelling on at least some of our major targets.

@llvmbot
Member

llvmbot commented Mar 9, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/84607.diff

3 Files Affected:

  • (modified) lldb/source/Core/DumpDataExtractor.cpp (+2-2)
  • (modified) lldb/source/Expression/IRExecutionUnit.cpp (+1-1)
  • (modified) lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp (+1-1)
diff --git a/lldb/source/Core/DumpDataExtractor.cpp b/lldb/source/Core/DumpDataExtractor.cpp
index 986c9a181919ee..826edd7bab046e 100644
--- a/lldb/source/Core/DumpDataExtractor.cpp
+++ b/lldb/source/Core/DumpDataExtractor.cpp
@@ -150,8 +150,8 @@ static lldb::offset_t DumpInstructions(const DataExtractor &DE, Stream *s,
       if (bytes_consumed) {
         offset += bytes_consumed;
         const bool show_address = base_addr != LLDB_INVALID_ADDRESS;
-        const bool show_bytes = true;
-        const bool show_control_flow_kind = true;
+        const bool show_bytes = false;
+        const bool show_control_flow_kind = false;
         ExecutionContext exe_ctx;
         exe_scope->CalculateExecutionContext(exe_ctx);
         disassembler_sp->GetInstructionList().Dump(
diff --git a/lldb/source/Expression/IRExecutionUnit.cpp b/lldb/source/Expression/IRExecutionUnit.cpp
index 0682746e448e30..e4e131d70d4319 100644
--- a/lldb/source/Expression/IRExecutionUnit.cpp
+++ b/lldb/source/Expression/IRExecutionUnit.cpp
@@ -201,7 +201,7 @@ Status IRExecutionUnit::DisassembleFunction(Stream &stream,
                                       UINT32_MAX, false, false);
 
   InstructionList &instruction_list = disassembler_sp->GetInstructionList();
-  instruction_list.Dump(&stream, true, true, /*show_control_flow_kind=*/true,
+  instruction_list.Dump(&stream, true, true, /*show_control_flow_kind=*/false,
                         &exe_ctx);
 
   return ret;
diff --git a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
index 7ff5cd2c23b075..c4a171ec7d01b1 100644
--- a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
@@ -83,7 +83,7 @@ bool UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly(
       const uint32_t addr_byte_size = m_arch.GetAddressByteSize();
       const bool show_address = true;
       const bool show_bytes = true;
-      const bool show_control_flow_kind = true;
+      const bool show_control_flow_kind = false;
       m_cfa_reg_info = *m_inst_emulator_up->GetRegisterInfo(
           unwind_plan.GetRegisterKind(), unwind_plan.GetInitialCFARegister());
       m_fp_is_cfa = false;

Member

@JDevlieghere JDevlieghere left a comment

LGTM

@jasonmolenda jasonmolenda merged commit bdbad0d into llvm:main Mar 11, 2024
6 checks passed
@jasonmolenda jasonmolenda deleted the disable-flow-control-annotations-by-default-in-memory-read branch March 11, 2024 17:21
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Mar 11, 2024
(cherry picked from commit bdbad0d)
@PiJoules
Contributor

Hi, I suspect this led to the test failure we're seeing at https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8753741062398610017/overview.

Script:
--
/b/s/w/ir/x/w/lldb_install/python3/bin/python3 /b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/b/s/w/ir/x/w/cipd/bin/llvm-ar --env OBJCOPY=/b/s/w/ir/x/w/cipd/bin/llvm-objcopy --env LLVM_LIBS_DIR=/b/s/w/ir/x/w/llvm_build/./lib --env LLVM_INCLUDE_DIR=/b/s/w/ir/x/w/llvm_build/include --env LLVM_TOOLS_DIR=/b/s/w/ir/x/w/llvm_build/./bin --libcxx-include-dir /b/s/w/ir/x/w/llvm_build/include/c++/v1 --libcxx-include-target-dir /b/s/w/ir/x/w/llvm_build/include/x86_64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /b/s/w/ir/x/w/llvm_build/./lib/x86_64-unknown-linux-gnu --arch x86_64 --build-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex --lldb-module-cache-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /b/s/w/ir/x/w/llvm_build/./bin/lldb --compiler /b/s/w/ir/x/w/llvm_build/./bin/clang --dsymutil /b/s/w/ir/x/w/llvm_build/./bin/dsymutil --llvm-tools-dir /b/s/w/ir/x/w/llvm_build/./bin --lldb-libs-dir /b/s/w/ir/x/w/llvm_build/./lib /b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/functionalities/data-formatter/builtin-formats -p TestBuiltinFormats.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 19.0.0git (https://llvm.googlesource.com/a/llvm-project revision 8467457afc61d70e881c9817ace26356ef757733)
  clang revision 8467457afc61d70e881c9817ace26356ef757733
  llvm revision 8467457afc61d70e881c9817ace26356ef757733
Skipping the following test categories: ['dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
PASS: LLDB (/b/s/w/ir/x/w/llvm_build/bin/clang-x86_64) :: test (TestBuiltinFormats.TestCase.test)
PASS: LLDB (/b/s/w/ir/x/w/llvm_build/bin/clang-x86_64) :: testAllPlatforms (TestBuiltinFormats.TestCase.testAllPlatforms)
FAIL: LLDB (/b/s/w/ir/x/w/llvm_build/bin/clang-x86_64) :: test_instruction (TestBuiltinFormats.TestCase.test_instruction)
PASS: LLDB (/b/s/w/ir/x/w/llvm_build/bin/clang-x86_64) :: test_pointer (TestBuiltinFormats.TestCase.test_pointer)
======================================================================
FAIL: test_instruction (TestBuiltinFormats.TestCase.test_instruction)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 150, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 450, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/functionalities/data-formatter/builtin-formats/TestBuiltinFormats.py", line 310, in test_instruction
    self.assertIn(
AssertionError: '  addq   0xa(%rdi), %r8\n' not found in '(int) $0 = addq   0xa(%rdi), %r8\n'
Config=x86_64-/b/s/w/ir/x/w/llvm_build/bin/clang
----------------------------------------------------------------------
Ran 4 tests in 0.541s

FAILED (failures=1)
--

Could you take a look and send out a fix or revert? Thanks.
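
For readers skimming the log, the assertion message reduces to a plain substring mismatch. The sketch below restates it in C++ (the language of the patch) rather than the test's Python: `Contains` is a hypothetical helper standing in for Python's `in` operator, and the two string literals are copied from the `AssertionError` line. The log does not say why the test expected two leading spaces, so this only demonstrates the mismatch, not its cause.

```cpp
#include <string>

// Hypothetical helper mirroring Python's `needle in haystack` check.
bool Contains(const std::string &haystack, const std::string &needle) {
  return haystack.find(needle) != std::string::npos;
}

// Expected substring from the assertion message: two spaces precede "addq".
const std::string kExpected = "  addq   0xa(%rdi), %r8\n";

// Actual output from the assertion message: a single space follows "=".
const std::string kActual = "(int) $0 = addq   0xa(%rdi), %r8\n";
```

Here `Contains(kActual, kExpected)` is false because the actual output has only one space before `addq`, while the same check without the leading spaces succeeds.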

jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Mar 13, 2024
…-on-disassembly

Turn off instruction flow control annotations by default (llvm#84607)